[Fwd: TermEnum usage]

2010-07-22 Thread Vincent DARON
Having received no answers, I'm reposting once. Do I have to post a bug report?

Let me know

Thanks a lot

Vincent DARON
ASK
---BeginMessage---
Hi all

I'm using Lucene.NET 2.9.2.2 from SVN.

I'm trying to iterate over the terms of a field in my index. To do so, I'm using
IndexReader.Terms(f), which returns a TermEnum.

The classic usage of an iterator is the following pattern:

TermEnum enu = reader.Terms(new Term(myfield));
while (enu.Next())
{
    ProcessTerm(enu.Term());
}

But it seems that the TermEnum is already on the first item BEFORE the
first call to Next. The previous code will therefore always skip the
first Term.

Bug ?

Thanks

Vincent DARON
ASK
---End Message---


Re: [Fwd: TermEnum usage]

2010-07-22 Thread Ben West
Hey Vincent,

I am not a dev, but for example look at FuzzyQuery.cs (starting at line 148):

do 
{
  float score = 0.0f;
  Term t = enumerator.Term();
  if (t != null) 
  {
// some stuff with t
  }
}
while (enumerator.Next());

You can see that it expects the enumerator to already have a term in it before it 
calls Next() [i.e. it uses do...while rather than just while]. So I think 
this is expected behavior, although it may not be intuitive.
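
For what it's worth, the Java API documents the same contract: IndexReader.terms(Term) positions the enumeration at the first term at or after the given term, so the current term should be consumed before the first call to next(). Here is a minimal Java sketch of that pattern (the field name and the processTerm callback are placeholders, not code from Lucene):

TermEnum termEnum = reader.terms(new Term("myfield", ""));
try {
    do {
        Term t = termEnum.term();
        if (t == null || !"myfield".equals(t.field())) {
            break;            // empty enumeration, or we ran past the field
        }
        processTerm(t);       // placeholder for whatever work is needed
    } while (termEnum.next());
} finally {
    termEnum.close();
}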

Hope this helps,
-Ben

--- On Thu, 7/22/10, Vincent DARON vda...@ask.be wrote:

 From: Vincent DARON vda...@ask.be
 Subject: [Fwd: TermEnum usage]
 To: lucene-net-dev lucene-net-dev@lucene.apache.org
 Date: Thursday, July 22, 2010, 10:10 AM
 Having received no answers, I'm reposting once. Do I have to post a bug report?
 
 Let me know
 
 Thanks a lot
 
 Vincent DARON
 ASK
 


  


Re: API changes between 2.9.2 and 2.9.3

2010-07-22 Thread Andi Vajda


On Jul 22, 2010, at 2:09, Bill Janssen jans...@parc.com wrote:


Andi Vajda va...@apache.org wrote:


Porting your stuff to 3.0 is thus highly recommended instead
of complaining about broken (my bad) long-deprecated APIs.


Hey, take 2.9.3 down, and announce no further pylucene support for 2.x,
and I'll stop talking about it.


The value in 2.9.3 is really just in the Lucene fixes since 2.9.2. If  
you want them without the new JCC which is tripping you up, take a  
2.9.2 build tree and change the Lucene svn url near the top of the  
Makefile to point at the 2.9.3 sources. This should just work (tm).


Andi..



Bill


Re: API changes between 2.9.2 and 2.9.3

2010-07-22 Thread Bill Janssen
Andi Vajda va...@apache.org wrote:

 
 On Jul 22, 2010, at 2:09, Bill Janssen jans...@parc.com wrote:
 
  Andi Vajda va...@apache.org wrote:
 
  Porting your stuff to 3.0 is thus highly recommended instead
  of complaining about broken (my bad) long-deprecated APIs.
 
  Hey, take 2.9.3 down, and announce no further pylucene support for
  2.x,
  and I'll stop talking about it.
 
 The value in 2.9.3 is really just in the Lucene fixes since 2.9.2. If
 you want them without the new JCC which is tripping you up, take a
 2.9.2 build tree and change the Lucene svn url near the top of the
 Makefile to point at the 2.9.3 sources. This should just work (tm).

Another fix is to edit the common-build.xml file in the Lucene subtree
to remove the 1.4 restriction.  That lets it build with Java 5, which
adds the Iterable interface, and things work as they did before, even with JCC 2.6.

Bill


Re: API changes between 2.9.2 and 2.9.3

2010-07-22 Thread Andi Vajda


On Jul 22, 2010, at 17:52, Bill Janssen jans...@parc.com wrote:


Andi Vajda va...@apache.org wrote:



On Jul 22, 2010, at 2:09, Bill Janssen jans...@parc.com wrote:


Andi Vajda va...@apache.org wrote:


Porting your stuff to 3.0 is thus highly recommended instead
of complaining about broken (my bad) long-deprecated APIs.


Hey, take 2.9.3 down, and announce no further pylucene support for
2.x,
and I'll stop talking about it.


The value in 2.9.3 is really just in the Lucene fixes since 2.9.2. If
you want them without the new JCC which is tripping you up, take a
2.9.2 build tree and change the Lucene svn url near the top of the
Makefile to point at the 2.9.3 sources. This should just work (tm).


Another fix is to edit the common-build.xml file in the Lucene subtree
to remove the 1.4 restriction.  That lets it build with Java 5 and  
that
adds the Iterable interface, and things work as they did, even with  
jcc 2.6.


Even better. Still, none of the Lucene 2.9 code uses any of the Java
1.5 features directly, which is why Lucene 3.0 is a still better choice.


Andi..




Bill


[jira] Commented: (SOLR-1731) ArrayIndexOutOfBoundsException when highlighting

2010-07-22 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891018#action_12891018
 ] 

Leonhard Maylein commented on SOLR-1731:


We have the same problem whenever we search for a word which has synonyms 
defined.

 ArrayIndexOutOfBoundsException when highlighting
 

 Key: SOLR-1731
 URL: https://issues.apache.org/jira/browse/SOLR-1731
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.4
Reporter: Tim Underwood
Priority: Minor

 I'm seeing an java.lang.ArrayIndexOutOfBoundsException when trying to 
 highlight for certain queries.  The error seems to be an issue with the 
 combination of the ShingleFilterFactory, PositionFilterFactory and the 
 LengthFilterFactory. 
 Here's my fieldType definition:
 <fieldType name="textSku" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="8" outputUnigrams="true"/>
     <filter class="solr.PositionFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/> <!-- works if this is commented out -->
   </analyzer>
 </fieldType>
 Here's the field definition:
 <field name="sku_new" type="textSku" indexed="true" stored="true" omitNorms="true"/>
 Here's a sample doc:
 <add>
 <doc>
   <field name="id">1</field>
   <field name="sku_new">A 1280 C</field>
 </doc>
 </add>
 Doing a query for sku_new:"A 1280 C" and requesting highlighting throws the 
 exception (full stack trace below):  
 http://localhost:8983/solr/select/?q=sku_new%3A%22A+1280+C%22&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=sku_new&fl=*
 If I comment out the LengthFilterFactory from my query analyzer section 
 everything seems to work.  Commenting out just the PositionFilterFactory also 
 makes the exception go away and seems to work for this specific query.
 Full stack trace:
 java.lang.ArrayIndexOutOfBoundsException: -1
 at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:202)
 at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
 at 
 org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
 at 
 org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
 at 
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
 at 
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
 at 
 org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
 at 
 

[jira] Updated: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-07-22 Thread Stanislaw Osinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislaw Osinski updated SOLR-1804:


Attachment: SOLR-1804-carrot2-3.4.0-dev.patch

OK, here's another shot. This time, the language model factory includes support 
for Chinese. To avoid compilation issues, the classes are loaded through 
reflection. Not pretty, but it works. If there's a way to have access to the Smart 
Chinese classes at compilation time, let me know and I can remove the reflection 
stuff, so that the refactoring is more reliable.
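
For illustration, a minimal sketch of the reflection approach described above (not code from the patch; the analyzer class name and its no-arg constructor are assumptions):

{code}
// Load the Smart Chinese analyzer only if it is on the classpath, avoiding a
// compile-time dependency; fall back to the default language model otherwise.
Object analyzer = null;
try {
  Class<?> clazz = Class.forName("org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer");
  analyzer = clazz.newInstance();
} catch (ClassNotFoundException e) {
  // smartcn jar not present: keep analyzer == null and use the default model
} catch (InstantiationException e) {
  throw new RuntimeException(e);
} catch (IllegalAccessException e) {
  throw new RuntimeException(e);
}
{code}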

 Upgrade Carrot2 to 3.2.0
 

 Key: SOLR-1804
 URL: https://issues.apache.org/jira/browse/SOLR-1804
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Clustering
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Attachments: SOLR-1804-carrot2-3.4.0-dev.patch


 http://project.carrot2.org/release-3.2.0-notes.html
 Carrot2 is now LGPL free, which means we should be able to bundle the binary!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



unsubscribe

2010-07-22 Thread Peter Bruhn Andersen




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891085#action_12891085
 ] 

Michael McCandless commented on LUCENE-2324:


This is looking awesome Michael!  I love the removal of *PerThread --
they are all logically absorbed into DWPT, so everything is now per
thread.

I still see usage of docStoreOffset, but aren't we doing away with
shared doc stores with the cutover to DWPT?

I think you can further simplify DocumentsWriterPerThread.DocWriter;
in fact I think you can remove it & all subclasses in consumers!  The
consumers can simply directly write their files.  The only reason this
class was created was because we have to interleave docs when writing
the doc stores; this is no longer needed since doc stores are again
private to the segment.  I think we don't need PerDocBuffer, either.
And this also simplifies RAM usage tracking!

Also, we don't need separate closeDocStore; it should just be closed
during flush.

I like the ThreadAffinityDocumentsWriterThreadPool; it's the default
right (I see some tests explicitly setting it on IWC; not sure why)?

We should make the in-RAM deletes impl somehow pluggable?


 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1799:


Attachment: LUCENE-1799_big.patch

Attached is a really, really rough patch that sets BOCU-1 as the default 
encoding.

Beware: it's a work in progress and a lot of the patch is auto-generated 
(Eclipse), so some things need to be reverted.

Most tests pass; the idea is to find bugs in tests etc. that abuse 
BytesRef / assume UTF-8 encoding, things like that.


 Unicode compression
 ---

 Key: LUCENE-1799
 URL: https://issues.apache.org/jira/browse/LUCENE-1799
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Affects Versions: 2.4.1
Reporter: DM Smith
Priority: Minor
 Attachments: LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, 
 LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799_big.patch


 In lucene-1793, there is the off-topic suggestion to provide compression of 
 Unicode data. The motivation was a custom encoding in a Russian analyzer. The 
 original supposition was that it provided a more compact index.
 This led to the comment that a different or compressed encoding would be a 
 generally useful feature. 
 BOCU-1 was suggested as a possibility. This is a patented algorithm by IBM 
 with an implementation in ICU. If Lucene provides its own implementation, a 
 freely available, royalty-free license would need to be obtained.
 SCSU is another Unicode compression algorithm that could be used. 
 An advantage of these methods is that they work on the whole of Unicode. If 
 that is not needed an encoding such as iso8859-1 (or whatever covers the 
 input) could be used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1799) Unicode compression

2010-07-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891101#action_12891101
 ] 

Robert Muir commented on LUCENE-1799:
-

By the way, that patch is huge because I just sucked in the ICU charset stuff to have 
an implementation that works for testing... 

It's not intended to ever stay that way; we would just implement the stuff we 
need without this code, but it makes it easier to test since you don't need any 
external jars or have to muck with the build system at all.


 Unicode compression
 ---

 Key: LUCENE-1799
 URL: https://issues.apache.org/jira/browse/LUCENE-1799
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Affects Versions: 2.4.1
Reporter: DM Smith
Priority: Minor
 Attachments: LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, 
 LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799_big.patch


 In lucene-1793, there is the off-topic suggestion to provide compression of 
 Unicode data. The motivation was a custom encoding in a Russian analyzer. The 
 original supposition was that it provided a more compact index.
 This led to the comment that a different or compressed encoding would be a 
 generally useful feature. 
 BOCU-1 was suggested as a possibility. This is a patented algorithm by IBM 
 with an implementation in ICU. If Lucene provides its own implementation, a 
 freely available, royalty-free license would need to be obtained.
 SCSU is another Unicode compression algorithm that could be used. 
 An advantage of these methods is that they work on the whole of Unicode. If 
 that is not needed an encoding such as iso8859-1 (or whatever covers the 
 input) could be used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2537) FSDirectory.copy() impl is unsafe

2010-07-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887935#action_12887935
 ] 

Shai Erera edited comment on LUCENE-2537 at 7/22/10 8:09 AM:
-

Oh .. found the thread where we discussed that on the list; the last thing I actually 
posted to it was the following text:

{quote}
I've Googled around a bit and came across this: 
http://markmail.org/message/l67bierbmmedrfw5. Apparently, there's a long 
standing bug against SUN since May 2006 
(http://bugs.sun.com/view_bug.do?bug_id=6431344) that's still open and reports 
the exact same behavior that I'm seeing.

If I understand correctly, this might be a Windows limitation and is expected 
to work well on Linux. I'll give it a try. But this makes me wonder whether we should 
keep the current behavior for Linux-based directories and fall back to the 
chunks approach for Windows ones. Since eventually I'll be running on Linux, I 
don't want to lose performance ...

This isn't the first time we've witnessed the "write once, run everywhere" 
misconception of Java :). I'm wondering whether in general we should have a 
Windows/Linux FSDirectory impl, or handlers, to prepare for future cases as 
well. Mike already started this with LUCENE-2500 (DirectIOLinuxDirectory). 
Instead of writing a Directory, perhaps we could have a handler object or 
something, or a generic LinuxDirectory that impls some stuff the 'linux' way. 
In FSDirectory we already have code which detects the OS and JRE used to decide 
between Simple, NIO and MMAP Directories ...
{quote}

  was (Author: shaie):
Oh .. found the thread we discussed that on the list, to which I've 
actually last posted w/ the following text:

{quote}
I've Googled around a bit and came across this: 
http://markmail.org/message/l67bierbmmedrfw5. Apparently, there's a long 
standing bug against SUN since May 2006 
(http://bugs.sun.com/view_bug.do?bug_id=6431344) that's still open and reports 
the exact same behavior that I'm seeing.

If I understand correctly, this might be a Windows limitation and is expected 
to work well on Linux. I'll give it a try. But this makes me think if we should 
keep the current behavior for Linux-based directories, and fallback to the 
chunks approach for Windows ones? Since eventually I'll be running on Linux, I 
don't want to lose performance ...

This isn't the first that we've witnessed the write once, run everywhere 
misconception of Java :). I'm thinking if in general we should have a 
Windows/Linux FSDirectory impl, or handlers, to prepare for future cases as 
well. Mike already started this with LUCENE-2500 (DirectIOLinuxDirectory). 
Instead of writing a Directory, perhaps we could have a handler object or 
something, or a generic LinuxDirectory that impls some stuff the 'linux' way. 
In FSDirectory we already have code which detects the OS and JRE used to decide 
between Simple, NIO and MMAP Directories ...
{code}
  
 FSDirectory.copy() impl is unsafe
 -

 Key: LUCENE-2537
 URL: https://issues.apache.org/jira/browse/LUCENE-2537
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1, 4.0


 There are a couple of issues with it:
 # FileChannel.transferFrom documents that it may not copy the number of bytes 
 requested, however we don't check the return value. So need to fix the code 
 to read in a loop until all bytes were copied..
 # When calling addIndexes() w/ very large segments (few hundred MBs in size), 
 I ran into the following exception (Java 1.6 -- Java 1.5's exception was 
 cryptic):
 {code}
 Exception in thread main java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
 at 
 sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
 at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
 at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
 ... 7 more
 {code}
 I changed the impl to something like this:
 {code}
 long numWritten = 0;
 long numToWrite = input.size();
 long bufSize = 1 << 26;
 while (numWritten < numToWrite) {
   numWritten += output.transferFrom(input, numWritten, bufSize);
 }
 {code}
 And the code successfully adds the indexes. This code uses chunks of 64MB, 
 however that might be too large for some applications, so we definitely need 
 a smaller one. The question is how small so that performance won't be 
 affected, and it'd be great if we can let it be configurable, however since 
 that API is 

[jira] Reopened: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb reopened SOLR-1999:



Sorry, but that is still advertising nightly builds to the general public, 
albeit indirectly.

If a developer really wants to find nightly builds, they should be able to do 
so via the developer pages, not the pages intended for all users.

 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2537) FSDirectory.copy() impl is unsafe

2010-07-22 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2537:
---

Attachment: FileCopyTest.java

I wrote a test which compares FileChannel API to intermediate buffer copies. 
The test runs each method 3 times and reports the best time of each. It can be 
run w/ different file and chunk sizes.

Here are the results of copying a 1GB file using different chunk sizes (the 
chunk is used as the intermediate buffer size as well).

Machine spec:
* Linux, 64-bit (IBM) JVM
* 2xQuad (+hyper-threading) - 16 cores overall
* 16GB RAM
* SAS HD

||Chunk Size||FileChannel||Intermediate Buffer||Diff||
|64K|1865|1528|{color:red}-18%{color}|
|128K|1660|1526|{color:red}-9%{color}|
|512K|1514|1493|{color:red}-2%{color}|
|1M|1552|2072|{color:green}+33%{color}|
|2M|1488|1559|{color:green}5%{color}|
|4M|1596|1831|{color:green}13%{color}|
|16M|1563|1964|{color:green}21%{color}|
|64M|1494|2442|{color:green}39%{color}|
|128M|1469|2445|{color:green}40%{color}|

For small buffer sizes, copying via an intermediate byte[] is preferable. However, 
the FileChannel method performs pretty consistently, regardless of the 
buffer size (except for the first run), while the byte[] approach degrades a 
lot as the buffer size increases.

I think, given these results, we can use the FileChannel method w/ a chunk size 
of 4 (or even 2) MB, to be on the safe side and don't eat up too much RAM?
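
For reference, here is a minimal sketch of the chunked FileChannel copy being discussed (plain java.nio, not the actual FSDirectory.copy() patch), using a 2MB chunk and checking transferFrom's return value:

{code}
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class ChunkedCopy {
  private static final long CHUNK_SIZE = 1 << 21; // 2 MB, per the numbers above

  public static void copy(String src, String dst) throws IOException {
    FileChannel input = new FileInputStream(src).getChannel();
    FileChannel output = new FileOutputStream(dst).getChannel();
    try {
      long numToWrite = input.size();
      long numWritten = 0;
      while (numWritten < numToWrite) {
        // transferFrom may copy fewer bytes than requested, so loop on its return value
        numWritten += output.transferFrom(input, numWritten,
            Math.min(CHUNK_SIZE, numToWrite - numWritten));
      }
    } finally {
      input.close();
      output.close();
    }
  }
}
{code}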

 FSDirectory.copy() impl is unsafe
 -

 Key: LUCENE-2537
 URL: https://issues.apache.org/jira/browse/LUCENE-2537
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1, 4.0

 Attachments: FileCopyTest.java


 There are a couple of issues with it:
 # FileChannel.transferFrom documents that it may not copy the number of bytes 
 requested, however we don't check the return value. So need to fix the code 
 to read in a loop until all bytes were copied..
 # When calling addIndexes() w/ very large segments (few hundred MBs in size), 
 I ran into the following exception (Java 1.6 -- Java 1.5's exception was 
 cryptic):
 {code}
 Exception in thread main java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
 at 
 sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
 at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
 at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
 ... 7 more
 {code}
 I changed the impl to something like this:
 {code}
 long numWritten = 0;
 long numToWrite = input.size();
 long bufSize = 1 << 26;
 while (numWritten < numToWrite) {
   numWritten += output.transferFrom(input, numWritten, bufSize);
 }
 {code}
 And the code successfully adds the indexes. This code uses chunks of 64MB, 
 however that might be too large for some applications, so we definitely need 
 a smaller one. The question is how small so that performance won't be 
 affected, and it'd be great if we can let it be configurable, however since 
 that API is called by other API, such as addIndexes, not sure it's easily 
 controllable.
 Also, I read somewhere (can't remember now where) that on Linux the native 
 impl is better and does copy in chunks. So perhaps we should make a Linux 
 specific impl?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2537) FSDirectory.copy() impl is unsafe

2010-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891123#action_12891123
 ] 

Michael McCandless commented on LUCENE-2537:


Nice results Shai!

bq. I think, given these results, we can use the FileChannel method w/ a chunk 
size of 4 (or even 2) MB, to be on the safe side and don't eat up too much RAM?

+1

 FSDirectory.copy() impl is unsafe
 -

 Key: LUCENE-2537
 URL: https://issues.apache.org/jira/browse/LUCENE-2537
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1, 4.0

 Attachments: FileCopyTest.java


 There are a couple of issues with it:
 # FileChannel.transferFrom documents that it may not copy the number of bytes 
 requested, however we don't check the return value. So need to fix the code 
 to read in a loop until all bytes were copied..
 # When calling addIndexes() w/ very large segments (few hundred MBs in size), 
 I ran into the following exception (Java 1.6 -- Java 1.5's exception was 
 cryptic):
 {code}
 Exception in thread main java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
 at 
 sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
 at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
 at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
 ... 7 more
 {code}
 I changed the impl to something like this:
 {code}
 long numWritten = 0;
 long numToWrite = input.size();
 long bufSize = 1 << 26;
 while (numWritten < numToWrite) {
   numWritten += output.transferFrom(input, numWritten, bufSize);
 }
 {code}
 And the code successfully adds the indexes. This code uses chunks of 64MB, 
 however that might be too large for some applications, so we definitely need 
 a smaller one. The question is how small so that performance won't be 
 affected, and it'd be great if we can let it be configurable, however since 
 that API is called by other API, such as addIndexes, not sure it's easily 
 controllable.
 Also, I read somewhere (can't remember now where) that on Linux the native 
 impl is better and does copy in chunks. So perhaps we should make a Linux 
 specific impl?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2553) IOException: read past EOF

2010-07-22 Thread Kyle L. (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle L. updated LUCENE-2553:


Description: 
We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
at 
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
at 
com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
at 
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
at org.apache.lucene.search.Searcher.search(Searcher.java:67)
...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an 
unordered manner:

{code}
private class AllHitsUnsortedCollector extends Collector {

private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
private IndexReader reader;
private int baselineDocumentId;
private List<Document> matchingDocuments = new ArrayList<Document>();

@Override
public boolean acceptsDocsOutOfOrder() {
return true;
}

@Override
public void collect(int docId) throws IOException {

int documentId = baselineDocumentId + docId;
Document document = reader.document(documentId, getFieldSelector());

if (document == null) {
logger.info("Null document from search results!");
} else {
matchingDocuments.add(document);
}
}

@Override
public void setNextReader(IndexReader segmentReader, int baseDocId) 
throws IOException {
this.reader = segmentReader;
this.baselineDocumentId = baseDocId;
}

@Override
public void setScorer(Scorer scorer) throws IOException {
// do nothing
}

public List<Document> getMatchingDocuments() {
return matchingDocuments;
}
}

{code}

The exception arises when users perform searches while indexing/optimization is 
occurring. Our {{IndexReader}} is read-only. From the documentation I have 
read, a read-only {{IndexReader}} instance should be immune from any 
uncommitted index changes and should return consistent results during indexing 
and optimization. As this exception occurs during indexing/optimization, it 
seems to me that the read-only {{IndexReader}} is somehow stumbling upon the 
uncommitted content? 

The problem is difficult to replicate as it is sporadic in nature and so far 
has only occurred in Production.

We have rebuilt the indexes a number of times, but that does not seem to 
alleviate the issue.

Any other information I can provide that will help isolate the issue? 

Most likely the other possibility is that the {{Collector}} we have written is 
doing something it shouldn't. Any pointers?
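
For context, here is a minimal sketch (assumptions noted in the comments; not our production code) of how a collector like the one above is driven against a read-only reader in Lucene 2.9:

{code}
// 'directory' and 'query' are assumed to exist; true = open the reader read-only.
IndexReader reader = IndexReader.open(directory, true);
IndexSearcher searcher = new IndexSearcher(reader);
AllHitsUnsortedCollector collector = new AllHitsUnsortedCollector();
searcher.search(query, collector);
List<Document> matches = collector.getMatchingDocuments();
{code}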

  was:
We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
at 
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
at 
com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
at 
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
at org.apache.lucene.search.Searcher.search(Searcher.java:67)
...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an 
unordered manner:

{code}
private class AllHitsUnsortedCollector extends Collector {

private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
private IndexReader reader;
private int baselineDocumentId;
private List<Document> matchingDocuments = new ArrayList<Document>();

@Override
public boolean acceptsDocsOutOfOrder() {
return true;
}

@Override
public void 

[jira] Updated: (LUCENE-2553) IOException: read past EOF

2010-07-22 Thread Kyle L. (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle L. updated LUCENE-2553:


Description: 
We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
at 
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
at 
com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
at 
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
at org.apache.lucene.search.Searcher.search(Searcher.java:67)
...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an 
unordered manner:

{code}
private class AllHitsUnsortedCollector extends Collector {

private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
private IndexReader reader;
private int baselineDocumentId;
private List<Document> matchingDocuments = new ArrayList<Document>();

@Override
public boolean acceptsDocsOutOfOrder() {
return true;
}

@Override
public void collect(int docId) throws IOException {

int documentId = baselineDocumentId + docId;
Document document = reader.document(documentId, getFieldSelector());

if (document == null) {
logger.info("Null document from search results!");
} else {
matchingDocuments.add(document);
}
}

@Override
public void setNextReader(IndexReader segmentReader, int baseDocId) 
throws IOException {
this.reader = segmentReader;
this.baselineDocumentId = baseDocId;
}

@Override
public void setScorer(Scorer scorer) throws IOException {
// do nothing
}

public List<Document> getMatchingDocuments() {
return matchingDocuments;
}
}

{code}

The exception arises when users perform searches while indexing/optimization is 
occurring. Our {{IndexReader}} is read-only. From the documentation I have 
read, a read-only {{IndexReader}} instance should be immune from any 
uncommitted index changes and should return consistent results during indexing 
and optimization. As this exception occurs during indexing/optimization, it 
seems to me that the read-only {{IndexReader}} is somehow stumbling upon the 
uncommitted content? 

The problem is difficult to replicate as it is sporadic in nature and so far 
has only occurred in Production.

We have rebuilt the indexes a number of times, but that does not seem to 
alleviate the issue.

Any other information I can provide that will help isolate the issue? 

The most likely other possibility is that the {{Collector}} we have written is 
doing something it shouldn't. Any pointers?

  was:
We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
at 
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
at 
com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
at 
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
at org.apache.lucene.search.Searcher.search(Searcher.java:67)
...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an 
unordered manner:

{code}
private class AllHitsUnsortedCollector extends Collector {

private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
private IndexReader reader;
private int baselineDocumentId;
private List<Document> matchingDocuments = new ArrayList<Document>();

@Override
public boolean acceptsDocsOutOfOrder() {
return true;
}

@Override
public void 

[jira] Updated: (LUCENE-2537) FSDirectory.copy() impl is unsafe

2010-07-22 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2537:
---

Attachment: LUCENE-2537.patch

Patch copies the files in chunks of 2MB. All core tests pass. I'll wait a day 
or two in case someone wants to suggest a different approach or chunk size 
limit before I commit.

 FSDirectory.copy() impl is unsafe
 -

 Key: LUCENE-2537
 URL: https://issues.apache.org/jira/browse/LUCENE-2537
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1, 4.0

 Attachments: FileCopyTest.java, LUCENE-2537.patch


 There are a couple of issues with it:
 # FileChannel.transferFrom documents that it may not copy the number of bytes 
 requested, however we don't check the return value. So need to fix the code 
 to read in a loop until all bytes were copied..
 # When calling addIndexes() w/ very large segments (few hundred MBs in size), 
 I ran into the following exception (Java 1.6 -- Java 1.5's exception was 
 cryptic):
 {code}
 Exception in thread main java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
 at 
 sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
 at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
 at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
 ... 7 more
 {code}
 I changed the impl to something like this:
 {code}
 long numWritten = 0;
 long numToWrite = input.size();
 long bufSize = 1 << 26;
 while (numWritten < numToWrite) {
   numWritten += output.transferFrom(input, numWritten, bufSize);
 }
 {code}
 And the code successfully adds the indexes. This code uses chunks of 64MB, 
 however that might be too large for some applications, so we definitely need 
 a smaller one. The question is how small so that performance won't be 
 affected, and it'd be great if we can let it be configurable, however since 
 that API is called by other API, such as addIndexes, not sure it's easily 
 controllable.
 Also, I read somewhere (can't remember now where) that on Linux the native 
 impl is better and does copy in chunks. So perhaps we should make a Linux 
 specific impl?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-64) strict hierarchical facets

2010-07-22 Thread SolrFan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891207#action_12891207
 ] 

SolrFan commented on SOLR-64:
-

Can the patch please be updated to the latest trunk? Thanks

 strict hierarchical facets
 --

 Key: SOLR-64
 URL: https://issues.apache.org/jira/browse/SOLR-64
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: Next

 Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, 
 SOLR-64.patch


 Strict Facet Hierarchies... each tag has at most one parent (a tree).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-792) Tree Faceting Component

2010-07-22 Thread SolrFan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891208#action_12891208
 ] 

SolrFan commented on SOLR-792:
--

Hi, can this patch please be updated against the current 1.4 trunk? thanks.

 Tree Faceting Component
 ---

 Key: SOLR-792
 URL: https://issues.apache.org/jira/browse/SOLR-792
 Project: Solr
  Issue Type: New Feature
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Minor
 Attachments: SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, 
 SOLR-792.patch, SOLR-792.patch


 A component to do multi-level faceting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-64) strict hierarchical facets

2010-07-22 Thread Aleksander Stensby (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891213#action_12891213
 ] 

Aleksander Stensby commented on SOLR-64:


I'm currently on holidays until July 27.

If urgent, please contact Gisele O'Connor:
email: gisele.o.con...@integrasco.com
phone: +47 90283809

Best regards,
 Aleksander

-- 
Aleksander M. Stensby
Integrasco A/S
E-mail: aleksander.sten...@integrasco.com
Tel.: +47 41 22 82 72
www.integrasco.com
http://twitter.com/Integrasco
http://facebook.com/Integrasco

Please consider the environment before printing all or any of this e-mail


 strict hierarchical facets
 --

 Key: SOLR-64
 URL: https://issues.apache.org/jira/browse/SOLR-64
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: Next

 Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, 
 SOLR-64.patch


 Strict Facet Hierarchies... each tag has at most one parent (a tree).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891228#action_12891228
 ] 

Michael Busch commented on LUCENE-2324:
---

Thanks, Mike - great feedback! (as always)

{quote}
I still see usage of docStoreOffset, but aren't we doing away with
shared doc stores with the cutover to DWPT?
{quote}

Do we want all segments that one DWPT writes to share the same
doc store, i.e. one doc store per DWPT, or remove doc stores 
entirely?


{quote}
I think you can further simplify DocumentsWriterPerThread.DocWriter;
in fact I think you can remove it & all subclasses in consumers!
{quote}

I agree!  Now that a high number of testcases pass it's less scary
to modify even more code :)  - will do this next.


{quote}
Also, we don't need separate closeDocStore; it should just be closed
during flush.
{quote}

OK sounds good.


{quote}
I like the ThreadAffinityDocumentsWriterThreadPool; it's the default
right (I see some tests explicitly setting it on IWC; not sure why)?
{quote}

It's actually only TestStressIndexing2 and it sets it to use a different 
number of max thread states than the default.


{quote}
We should make the in-RAM deletes impl somehow pluggable?
{quote}

Do you mean so that it's customizable how deletes are handled? 
E.g. doing live deletes vs. lazy deletes on flush?
I think that's a good idea.  E.g. at Twitter we'll do live deletes always
to get the lowest latency (and we don't have too many deletes),
but that's probably not the best default for everyone.
So I agree that making this customizable is a good idea.

It'd also be nice to have a more efficient data structure to buffer the
deletes.  With many buffered deletes the java hashmap approach
will not be very efficient.  Terms could be written into a byte pool,
but what should we do with queries?

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [Fwd: TermEnum usage]

2010-07-22 Thread Digy
It is expected behavior. Please see 

http://lucene.apache.org/java/2_9_2/api/all/org/apache/lucene/index/IndexReader.html#terms%28org.apache.lucene.index.Term%29

DIGY

-Original Message-
From: Vincent DARON [mailto:vda...@ask.be] 
Sent: Thursday, July 22, 2010 6:10 PM
To: lucene-net-dev
Subject: [Fwd: TermEnum usage]

Having received no answers, I'm reposting once. Do I have to post a bug report?

Let me know

Thanks a lot

Vincent DARON
ASK



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891241#action_12891241
 ] 

Yonik Seeley commented on LUCENE-2324:
--

bq. It'd also be nice to have a more efficient data structure to buffer the 
deletes. With many buffered deletes the java hashmap approach will not be very 
efficient. Terms could be written into a byte pool, but what should we do with 
queries?

IMO, terms are an order of magnitude more important than queries.  Most deletes 
will be by some sort of unique id, and will be in the same field.

Perhaps a single byte[] with length prefixes (like the field cache has).  A 
single int could then represent a term (it would just be an offset into the 
byte[], which is field-specific, so no need to store the field each time).

We could then build a treemap or hashmap that natively used an int[]... but 
that may not be necessary (depending on how deletes are applied).  Perhaps a 
sort could be done right before applying, and duplicate terms could be handled 
at that time.
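
For illustration, a hypothetical sketch of that idea (all names made up, not Lucene code): each buffered delete term is appended to one growable byte[] with a 2-byte length prefix, and the per-delete bookkeeping is just an int offset into that array:

{code}
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

// Buffers delete terms for a single field as length-prefixed UTF-8 bytes.
// Duplicate terms are allowed; they would be resolved when deletes are applied.
class DeleteTermBuffer {
  private byte[] bytes = new byte[1024];   // the shared byte pool
  private int upto;                        // next free position in the pool
  private int[] offsets = new int[16];     // one int per buffered delete
  private int count;

  void add(String termText) throws UnsupportedEncodingException {
    byte[] utf8 = termText.getBytes("UTF-8");   // assumes term length < 65536
    if (upto + 2 + utf8.length > bytes.length) {
      bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, upto + 2 + utf8.length));
    }
    if (count == offsets.length) {
      offsets = Arrays.copyOf(offsets, offsets.length * 2);
    }
    offsets[count++] = upto;                     // the delete is just this offset
    bytes[upto++] = (byte) (utf8.length >>> 8);  // 2-byte length prefix
    bytes[upto++] = (byte) utf8.length;
    System.arraycopy(utf8, 0, bytes, upto, utf8.length);
    upto += utf8.length;
  }

  int numBufferedDeletes() {
    return count;
  }
}
{code}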

Anyway, I'm only casually following this issue, but it's looking like really 
cool stuff!

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891256#action_12891256
 ] 

Michael McCandless commented on LUCENE-2324:


{quote}
bq. I still see usage of docStoreOffset, but aren't we doing away with shared 
doc stores with the cutover to DWPT?

Do we want all segments that one DWPT writes to share the same
doc store, i.e. one doc store per DWPT, or remove doc stores 
entirely?
{quote}

Oh good question... a single DWPT can in fact continue to share doc
store across the segments it flushes.

Hmm, but... this opto only helps in that we don't have to merge the
doc stores if we merge segments that already share their doc stores.
But if (say) I have 2 threads indexing, and I'm indexing lots of docs
and each DWPT has written 5 segments, we will then merge these 10
segments, and must merge the doc stores at that point.  So the sharing
isn't really buying us much (just not closing old files & opening new
ones, which is presumably negligible)?

{quote}
bq. I think you can further simplify DocumentsWriterPerThread.DocWriter; in 
fact I think you can remove it & all subclasses in consumers!

I agree! Now that a high number of testcases pass it's less scary
to modify even more code  - will do this next.

bq. Also, we don't need separate closeDocStore; it should just be closed during 
flush.

OK sounds good.
{quote}

Super :)

{quote}
bq. I like the ThreadAffinityDocumentsWriterThreadPool; it's the default right 
(I see some tests explicitly setting it on IWC; not sure why)?

It's actually only TestStressIndexing2 and it sets it to use a different 
number of max thread states than the default.
{quote}

Ahh OK great.

{quote}
bq. We should make the in-RAM deletes impl somehow pluggable?

Do you mean so that it's customizable how deletes are handled? 
{quote}

Actually I was worried about the long[] sequenceIDs (adding 8 bytes
RAM per buffered doc) -- this could be a biggish hit to RAM efficiency
for small docs.

{quote} E.g. doing live deletes vs. lazy deletes on flush?
I think that's a good idea. E.g. at Twitter we'll do live deletes always
to get the lowest latency (and we don't have too many deletes),
but that's probably not the best default for everyone.
So I agree that making this customizable is a good idea.
{quote}

Yeah, this too :)

Actually deletions today are not applied on flush -- they continue to
be buffered beyond flush, and then get applied just before a merge
kicks off.  I think we should keep this (as an option and probably as
the default) -- it's important for apps w/ large indices that don't use
NRT (and don't pool readers) because it's costly to open readers.

So it sounds like we should support lazy (apply-before-merge like
today) and live (live means resolve deleted Term/Query -> docID(s)
synchronously inside deleteDocuments, right?).

Live should also be less performant because of less temporal locality
(vs lazy).

{quote}
It'd also be nice to have a more efficient data structure to buffer the
deletes. With many buffered deletes the java hashmap approach
will not be very efficient. Terms could be written into a byte pool,
but what should we do with queries?
{quote}

I agree w/ Yonik: let's worry only about delete by Term (not Query)
for now.

Maybe we could reuse (factor out) TermsHashPerField's custom hash
here, for the buffered Terms?  It efficiently maps a BytesRef -> int.

Another thing: it looks like finishFlushedSegment is sync'd on the IW
instance, but, it need not be sync'd for all of that?  EG
readerPool.get(), applyDeletes, building the CFS, may not need to be
inside the sync block?


 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to 

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891262#action_12891262
 ] 

Michael Busch commented on LUCENE-2324:
---

{quote}
Perhaps a single byte[] with length prefixes (like the field cache has).  A 
single int could then represent a term (it would just be an offset into the 
byte[], which is field-specific, so no need to store the field each time).
{quote}

Yeah that's pretty much how TermsHashPerField works.  I agree with Mike, 
let's reuse that code.
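
As a stripped-down illustration of that idea -- length-prefixed terms packed into
one shared byte[] and addressed by int offsets, with the field implied by the pool
instance -- here is a toy version (not the actual TermsHashPerField code):

import java.util.Arrays;

// Toy length-prefixed term storage in a single byte[] pool. An int offset into
// the pool identifies a term; the field never needs to be stored per term.
class TermBytePool {
  private byte[] pool = new byte[1024];
  private int upto = 0;                       // next free position in the pool

  /** Appends the term bytes (prefixed by a 2-byte length) and returns its offset. */
  int add(byte[] termBytes) {
    int needed = 2 + termBytes.length;
    if (upto + needed > pool.length) {
      pool = Arrays.copyOf(pool, Math.max(pool.length * 2, upto + needed));
    }
    int offset = upto;
    pool[upto++] = (byte) (termBytes.length >>> 8);   // length prefix, high byte
    pool[upto++] = (byte) termBytes.length;           // length prefix, low byte
    System.arraycopy(termBytes, 0, pool, upto, termBytes.length);
    upto += termBytes.length;
    return offset;                                    // a single int represents the term
  }

  /** Reads the term stored at the given offset back out of the pool. */
  byte[] get(int offset) {
    int len = ((pool[offset] & 0xFF) << 8) | (pool[offset + 1] & 0xFF);
    return Arrays.copyOfRange(pool, offset + 2, offset + 2 + len);
  }
}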


{quote}
Hmm, but... this opto only helps in that we don't have to merge the
doc stores if we merge segments that already share their doc stores.
But if (say) I have 2 threads indexing, and I'm indexing lots of docs
and each DWPT has written 5 segments, we will then merge these 10
segments, and must merge the doc stores at that point. So the sharing
isn't really buying us much (just not closing old files  opening new
ones, which is presumably negligible)?
{quote}

Yeah that's true.  I agree it won't help much. I think we should just 
remove the doc stores, great simplification (which should also make 
parallel indexing a bit easier :) ).  


{quote}
Another thing: it looks like finishFlushedSegment is sync'd on the IW
instance, but, it need not be sync'd for all of that? EG
readerPool.get(), applyDeletes, building the CFS, may not need to be
inside the sync block?
{quote}

Thanks for the hint.  I need to carefully go over all the synchronization, 
there are likely more problems.  

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891264#action_12891264
 ] 

Yonik Seeley commented on LUCENE-2324:
--

bq. Yeah that's pretty much how TermsHashPerField works. I agree with Mike, 
let's reuse that code.

Do we even need to maintain a hash over it though, or can we simply keep a list 
(and allow dup terms until it's time to apply them)?
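
A toy sketch of that alternative (made-up names, not Lucene code): append each
buffered delete to parallel growable arrays, duplicates and all, and only
sort/dedup when the deletes are actually applied:

import java.util.Arrays;

class BufferedDeleteList {
  private int[] termOffsets = new int[16];   // offsets into a shared term byte pool
  private int[] docIDUptos = new int[16];    // delete applies to docIDs < this value
  private int size = 0;

  void add(int termOffset, int docIDUpto) {
    if (size == termOffsets.length) {
      termOffsets = Arrays.copyOf(termOffsets, size * 2);
      docIDUptos = Arrays.copyOf(docIDUptos, size * 2);
    }
    termOffsets[size] = termOffset;           // duplicates are fine until apply time
    docIDUptos[size] = docIDUpto;
    size++;
  }

  int size() { return size; }
}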

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-752) Allow better Field Compression options

2010-07-22 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891284#action_12891284
 ] 

David Smiley commented on SOLR-752:
---

I spent some time today attempting to implement this with my own Solr FieldType 
that extends TextField.  As I tried to implement it, I realized that I couldn't 
really do it.  FieldType has a method createField(...) that is necessary to 
implement in order to set binary data (i.e. byte[]) on a Field.  This method 
demands I return a org.apache.lucene.document.Field which is final.  If I 
create the field with binary data, by default it's not indexed or tokenized.  I 
can get those booleans to flip by simply invoking f.setTokenStream(null).  
However, I can't set omitNorms() to false, nor can I set booleans for the term 
vector fields.  There may be other issues but at this point I gave up to work 
on other more important priorities of mine.

 Allow better Field Compression options
 --

 Key: SOLR-752
 URL: https://issues.apache.org/jira/browse/SOLR-752
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor

 See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression
 It would be good if Solr handled field compression outside of Lucene's 
 Field.COMPRESS capabilities, since those capabilities are less than ideal 
 when it comes to control over compression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-752) Allow better Field Compression options

2010-07-22 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891305#action_12891305
 ] 

David Smiley commented on SOLR-752:
---

I already looked at BinaryField and TrieField for inspiration.  BinaryField 
assumes you're not going to index the data.  And TrieField doesn't set binary 
data value on the Field.

Yes, I think the next step is to make createField() return Fieldable.  But I'm 
not a committer...

Instead or in addition... I have to wonder, why not modify Lucene's Field class 
to allow me to set the Index, Store, and TermVector enums AND specify binary 
data on a suitable constructor?  Arguably an existing constructor taking String 
would be hijacked to take Object and then do the right thing.  That would be a 
small change, whereas implementing another subclass of AbstractField is more 
complex and would likely reproduce much of what's in Field already.

 Allow better Field Compression options
 --

 Key: SOLR-752
 URL: https://issues.apache.org/jira/browse/SOLR-752
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor

 See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression
 It would be good if Solr handled field compression outside of Lucene's 
 Field.COMPRESS capabilities, since those capabilities are less than ideal 
 when it comes to control over compression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2009) Contrib ant test targets do not respect sys props testcase,testpackage,and testpackageroot

2010-07-22 Thread Mark Miller (JIRA)
Contrib ant test targets do not respect sys props testcase,testpackage,and 
testpackageroot
--

 Key: SOLR-2009
 URL: https://issues.apache.org/jira/browse/SOLR-2009
 Project: Solr
  Issue Type: Bug
  Components: Build
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: Next


Very annoying using these props with core tests unless you use the junit target 
rather than test. Also would be nice if they worked regardless for future dev.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

2010-07-22 Thread James Dyer (JIRA)
Improvements to SpellCheckComponent Collate functionality
-

 Key: SOLR-2010
 URL: https://issues.apache.org/jira/browse/SOLR-2010
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, spellchecker
Affects Versions: 1.4.1
 Environment: Tested against trunk revision 966633
Reporter: James Dyer
Priority: Minor


Improvements to SpellCheckComponent Collate functionality

Our project requires a better Spell Check Collator.  I'm contributing this as a 
patch to get suggestions for improvements and in case there is a broader need 
for these features.

1. Only return collations that are guaranteed to result in hits if re-queried 
(applying original fq params also).  This is especially helpful when there is 
more than one correction per query.  The 1.4 behavior does not verify that a 
particular combination will actually return hits.
2. Provide the option to get multiple collation suggestions
3. Provide extended collation results including the # of hits re-querying will 
return and a breakdown of each misspelled word and its correction.

This patch is similar to what is described in SOLR-507 item #1.  Also, this 
patch provides a viable workaround for the problem discussed in SOLR-1074.  A 
dictionary could be created that combines the terms from the multiple fields.  
The collator then would prune out any spurious suggestions this would cause.

This patch adds the following spellcheck parameters:

1. spellcheck.maxCollationTries - maximum # of collation possibilities to try 
before giving up.  Lower values ensure better performance.  Higher values may 
be necessary to find a collation that can return results.  Default is 0, which 
maintains backwards-compatible behavior (do not check collations).

2. spellcheck.maxCollations - maximum # of collations to return.  Default is 1, 
which maintains backwards-compatible behavior.

3. spellcheck.collateExtendedResult - if true, returns an expanded response 
format detailing collations found.  default is false, which maintains 
backwards-compatible behavior.  When true, output is like this (in context):

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="hopq">
      <int name="numFound">94</int>
      <int name="startOffset">7</int>
      <int name="endOffset">11</int>
      <arr name="suggestion">
        <str>hope</str>
        <str>how</str>
        <str>hope</str>
        <str>chops</str>
        <str>hoped</str>
        etc
      </arr>
    </lst>
    <lst name="faill">
      <int name="numFound">100</int>
      <int name="startOffset">16</int>
      <int name="endOffset">21</int>
      <arr name="suggestion">
        <str>fall</str>
        <str>fails</str>
        <str>fail</str>
        <str>fill</str>
        <str>faith</str>
        <str>all</str>
        etc
      </arr>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(how AND fails)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">how</str>
        <str name="faill">fails</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(hope AND faith)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">hope</str>
        <str name="faill">faith</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(chops AND all)</str>
      <int name="hits">1</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">chops</str>
        <str name="faill">all</str>
      </lst>
    </lst>
  </lst>
</lst>

In addition, SOLRJ is updated to include 
SpellCheckResponse.getCollatedResults(), which will return the expanded 
Collation format.  getCollatedResult(), which returns a single String, is 
retained for backwards-compatibility.  Other APIs were not changed but will 
still work provided that spellcheck.collateExtendedResult is false.
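
As a rough SolrJ usage sketch of the above: getCollatedResults() is the method
this patch adds, the spellcheck.* parameter names are the ones listed above, and
the server URL and query are just example values:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class CollateExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("Title:(hopq AND faill)");
    q.set("spellcheck", true);
    q.set("spellcheck.collate", true);
    q.set("spellcheck.maxCollationTries", 10);    // try up to 10 candidate collations
    q.set("spellcheck.maxCollations", 3);         // return up to 3 that produce hits
    q.set("spellcheck.collateExtendedResult", true);

    QueryResponse rsp = server.query(q);
    SpellCheckResponse spell = rsp.getSpellCheckResponse();

    // getCollatedResults() is the new method described above; the exact shape of
    // the returned objects is defined by the patch, so just print the list here.
    System.out.println(spell.getCollatedResults());

    // The pre-existing single-String API still works when
    // spellcheck.collateExtendedResult is false:
    System.out.println(spell.getCollatedResult());
  }
}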

This likely will not return valid results if using Shards.  Rather, a more 
robust interaction with the index would be necessary than what exists in 

[jira] Updated: (SOLR-2009) Contrib ant test targets do not respect sys props testcase,testpackage,and testpackageroot

2010-07-22 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2009:
--

Attachment: SOLR-2009.patch

 Contrib ant test targets do not respect sys props testcase,testpackage,and 
 testpackageroot
 --

 Key: SOLR-2009
 URL: https://issues.apache.org/jira/browse/SOLR-2009
 Project: Solr
  Issue Type: Bug
  Components: Build
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: Next

 Attachments: SOLR-2009.patch


 Very annoying using these props with core tests unless you use the junit 
 target rather than test. Also would be nice if they worked regardless for 
 future dev.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

2010-07-22 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2010:
-

Attachment: SOLR-2010.patch

Tested against branch version #96633

 Improvements to SpellCheckComponent Collate functionality
 -

 Key: SOLR-2010
 URL: https://issues.apache.org/jira/browse/SOLR-2010
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, spellchecker
Affects Versions: 1.4.1
 Environment: Tested against trunk revision 966633
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2010.patch


 Improvements to SpellCheckComponent Collate functionality
 Our project requires a better Spell Check Collator.  I'm contributing this as 
 a patch to get suggestions for improvements and in case there is a broader 
 need for these features.
 1. Only return collations that are guaranteed to result in hits if re-queried 
 (applying original fq params also).  This is especially helpful when there is 
 more than one correction per query.  The 1.4 behavior does not verify that a 
 particular combination will actually return hits.
 2. Provide the option to get multiple collation suggestions
 3. Provide extended collation results including the # of hits re-querying 
 will return and a breakdown of each misspelled word and its correction.
 This patch is similar to what is described in SOLR-507 item #1.  Also, this 
 patch provides a viable workaround for the problem discussed in SOLR-1074.  A 
 dictionary could be created that combines the terms from the multiple fields. 
  The collator then would prune out any spurious suggestions this would cause.
 This patch adds the following spellcheck parameters:
 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try 
 before giving up.  Lower values ensure better performance.  Higher values may 
 be necessary to find a collation that can return results.  Default is 0, 
 which maintains backwards-compatible behavior (do not check collations).
 2. spellcheck.maxCollations - maximum # of collations to return.  Default is 
 1, which maintains backwards-compatible behavior.
 3. spellcheck.collateExtendedResult - if true, returns an expanded response 
 format detailing collations found.  default is false, which maintains 
 backwards-compatible behavior.  When true, output is like this (in context):
 <lst name="spellcheck">
   <lst name="suggestions">
     <lst name="hopq">
       <int name="numFound">94</int>
       <int name="startOffset">7</int>
       <int name="endOffset">11</int>
       <arr name="suggestion">
         <str>hope</str>
         <str>how</str>
         <str>hope</str>
         <str>chops</str>
         <str>hoped</str>
         etc
       </arr>
     </lst>
     <lst name="faill">
       <int name="numFound">100</int>
       <int name="startOffset">16</int>
       <int name="endOffset">21</int>
       <arr name="suggestion">
         <str>fall</str>
         <str>fails</str>
         <str>fail</str>
         <str>fill</str>
         <str>faith</str>
         <str>all</str>
         etc
       </arr>
     </lst>
     <lst name="collation">
       <str name="collationQuery">Title:(how AND fails)</str>
       <int name="hits">2</int>
       <lst name="misspellingsAndCorrections">
         <str name="hopq">how</str>
         <str name="faill">fails</str>
       </lst>
     </lst>
     <lst name="collation">
       <str name="collationQuery">Title:(hope AND faith)</str>
       <int name="hits">2</int>
       <lst name="misspellingsAndCorrections">
         <str name="hopq">hope</str>
         <str name="faill">faith</str>
       </lst>
     </lst>
     <lst name="collation">
       <str name="collationQuery">Title:(chops AND all)</str>
       <int name="hits">1</int>
       <lst name="misspellingsAndCorrections">
         <str name="hopq">chops</str>
         <str name="faill">all</str>
       </lst>
     </lst>
   </lst>
 </lst>
 In addition, SOLRJ is updated to include 
 SpellCheckResponse.getCollatedResults(), which will return the expanded 
 Collation format.  getCollatedResult(), which returns a single String, is 
 retained for 

[jira] Commented: (SOLR-1240) Numerical Range faceting

2010-07-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891321#action_12891321
 ] 

Hoss Man commented on SOLR-1240:


bq. Rather than embedding meta to the list containing the counts, perhaps we 
should bite the bullet and add an additional level for the counts.

yeah ... i'm on board with that idea.  it's a trivial change.

any comments on the implementation?

i think it's fairly solid -- the one wish i have though is to try and gut the 
existing date faceting code to just use the new code -- but i can't see a very 
easy way to do that while dealing with the different param names .. suggestions?

 Numerical Range faceting
 

 Key: SOLR-1240
 URL: https://issues.apache.org/jira/browse/SOLR-1240
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Gijs Kunze
Priority: Minor
 Attachments: SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, 
 SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch


 For faceting numerical ranges using many facet.query query arguments leads to 
 unmanageably large queries as the fields you facet over increase. Adding the 
 same faceting parameter for numbers which already exists for dates should fix 
 this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2011) Solr should get it's temp dir like lucene - first checking the tempDir sys prop

2010-07-22 Thread Mark Miller (JIRA)
Solr should get it's temp dir like lucene - first checking the tempDir sys prop
---

 Key: SOLR-2011
 URL: https://issues.apache.org/jira/browse/SOLR-2011
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: Next




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891334#action_12891334
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

{quote}I think we should just remove the doc stores{quote}

Right, I think we should remove sharing doc stores between
segments. And in general, RT apps will likely not want to use
doc stores if they are performing numerous updates and/or
deletes. We can explicitly state this in the javadocs.

I'm thinking we could explore efficient deleted docs as sequence
ids in a different issue, specifically storing them in a short[]
and wrapping around.  

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2012) stats component, min/max on a field with no values

2010-07-22 Thread Jonathan Rochkind (JIRA)
stats component, min/max on a field with no values
--

 Key: SOLR-2012
 URL: https://issues.apache.org/jira/browse/SOLR-2012
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Jonathan Rochkind


: 
: When I use the stats component on a field that has no values in the result set
: (ie, stats.missing == rowCount), I'd expect that 'min'and 'max' would be
: blank.
: 
: Instead, they seem to be the smallest and largest float values or something,
: min = 1.7976931348623157E308, max = 4.9E-324 .
: 
: Is this a bug?

off the top of my head it sounds like it ... would you mind opening an 
issue in Jira please?

-Hoss
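
For what it's worth, the reported numbers are exactly Double.MAX_VALUE and
Double.MIN_VALUE, the classic symptom of min/max accumulators that are seeded
with sentinel values and never updated; a minimal illustration of that pattern
(not the actual StatsComponent code):

// Minimal illustration of the reported symptom: with no values for the field,
// the sentinels leak out instead of being reported as "missing".
public class StatsMinMaxDemo {
  public static void main(String[] args) {
    double min = Double.MAX_VALUE;   // 1.7976931348623157E308
    double max = Double.MIN_VALUE;   // 4.9E-324 (smallest positive double, not the most negative)

    double[] values = {};            // no values for the field in the result set
    for (double v : values) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }

    System.out.println("min=" + min + " max=" + max);
  }
}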

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2555) Remove shared doc stores

2010-07-22 Thread Michael Busch (JIRA)
Remove shared doc stores


 Key: LUCENE-2555
 URL: https://issues.apache.org/jira/browse/LUCENE-2555
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch


With per-thread DocumentsWriters sharing doc stores across segments doesn't 
make much sense anymore.

See also LUCENE-2324.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-2009) Contrib ant test targets do not respect sys props testcase,testpackage,and testpackageroot

2010-07-22 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-2009.
---

Resolution: Fixed

more to do here later, but this initial fix is in.

 Contrib ant test targets do not respect sys props testcase,testpackage,and 
 testpackageroot
 --

 Key: SOLR-2009
 URL: https://issues.apache.org/jira/browse/SOLR-2009
 Project: Solr
  Issue Type: Bug
  Components: Build
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: Next

 Attachments: SOLR-2009.patch


 Very annoying using these props with core tests unless you use the junit 
 target rather than test. Also would be nice if they worked regardless for 
 future dev.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2554) preflex codec doesn't order terms correctly

2010-07-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891364#action_12891364
 ] 

Robert Muir commented on LUCENE-2554:
-

the perf issues here are really from our contrived tests... it's good to use 
_TestUtil.randomUnicodeString, but it gives you the impression there is 
something wrong with this dance and there really isn't.

I added _TestUtil.randomRealisticUnicodeString in r966878; you can swap this 
into some of these slow tests and see it's definitely the problem.
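
For example, a small standalone demo of the swap (only the two _TestUtil
generators mentioned here are assumed; the seed and loop are arbitrary):

import java.util.Random;
import org.apache.lucene.util._TestUtil;

public class RandomStringSwapDemo {
  public static void main(String[] args) {
    Random random = new Random(42);
    for (int i = 0; i < 5; i++) {
      // was: _TestUtil.randomUnicodeString(random) -- arbitrary code points,
      // which makes the preflex surrogate dance look far worse than it is.
      String term = _TestUtil.randomRealisticUnicodeString(random);
      System.out.println(term);
    }
  }
}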


 preflex codec doesn't order terms correctly
 ---

 Key: LUCENE-2554
 URL: https://issues.apache.org/jira/browse/LUCENE-2554
 Project: Lucene - Java
  Issue Type: Test
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2554.patch


 The surrogate dance in the preflex codec (which must dynamically remap terms 
 from UTF16 order to unicode code point order) is buggy.
 To better test it, I want to add a test-only codec, preflexrw, that is able 
 to write indices in the pre-flex format.  Then we should also fix tests to 
 randomly pick codecs (including preflexrw) so we better test all of our 
 codecs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891369#action_12891369
 ] 

Sebb commented on SOLR-1999:


See:

http://www.apache.org/dev/release.html#what

Do not include any links on the project website that might encourage 
non-developers to download and use nightly builds, snapshots, release 
candidates, or any other similar package.

 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891376#action_12891376
 ] 

Hoss Man commented on SOLR-1999:



Developers are members of the general public -- any page a developer can see 
can be seen by anybody else as well.

While i agree the previous link was bad, i quite frankly don't understand your 
concern with the current situation

HEADER.html doesn't even mention nightly builds -- it directs people interested 
in (unofficial, unreleased) source code for Solr to [a wiki 
page|http://wiki.apache.org/solr/HackingSolr] which makes it very clear its 
audience is developers, and which has info on how to check out the development 
branches.

Admittedly that HackingSolr page does mention that we have a nightly build 
system, so a non-developer might click the link about hacking on the source and 
then get interested in the nightly builds -- but it doesn't even link directly 
to any builds -- instead it links to a [hudson 
page|http://hudson.zones.apache.org/hudson/view/Lucene/] where there is a list 
of branches that have builds, and if you click on one of those you can get a 
[branch build status 
page|http://hudson.zones.apache.org/hudson/view/Lucene/job/Solr-trunk/] and 
from there you can scroll all the way to the bottom to click on [an artifacts 
link|http://hudson.zones.apache.org/hudson/view/Lucene/job/Solr-trunk/lastSuccessfulBuild/artifact/]
 and from *there* you can actually click on a link to download something that 
could be called a nightly build.

That seems like it fits the definition of developer pages, not the pages 
intended for all users.

I'm hard pressed to imagine a way to make it harder for non-developers to find 
those builds while still linking to those hudson pages for developers

 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891378#action_12891378
 ] 

Robert Muir commented on SOLR-1999:
---

bq. Do not include any links on the project website that might encourage 
non-developers to download and use nightly builds, snapshots, release 
candidates, or any other similar package.

Personally I think this is a load of crap. How should we get quality releases 
without encouraging users to test things before it's officially released?

Getting feedback from users that are willing to deal with trunk and patches, 
and letting things bake in trunk is really valuable, and I think it's also a 
step towards encouraging them to participate in development.


 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891379#action_12891379
 ] 

Sebb commented on SOLR-1999:


The download pages are intended for all users of the software, and must only 
include released (voted on) software.

It is not appropriate to mention non-released code on the official page for 
releases.

 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891384#action_12891384
 ] 

Hoss Man commented on SOLR-1999:


bq. It is not appropriate to mention non-released code on the official page for 
releases.

why?

i can (moderately) understand that we should not encourage non-developers to 
use unofficial versions, and i recognize that linking directly to nightlys from 
the official release page is a very bad idea .. but how far down the rabbit 
hole do we have to go to avoid links to links to links to links for nightly 
builds?

Even following the letter of the policy you linked to, i don't see how 
anyone could possibly construe that we are encourag(ing) non-developers to 
download and use nightly builds, snapshots, release candidates, or any other 
similar package 


 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2011) Solr should get it's temp dir like lucene - first checking the tempDir sys prop

2010-07-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2011:
--

Attachment: SOLR-2011.patch

attached is an initial patch... (it only fixes solr core, but I think we can 
fix contrib build.xml's the same way).

One benefit is that since temp stuff goes in build/ like lucene: on windows, 
'ant clean' 
will remove spellchecker indexes or other leftover stuff that couldn't be 
deleted in tearDown(), 
rather than littering your system temp directory.
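
The Lucene-style lookup being referred to is roughly this pattern -- a sketch of
the idea, not the exact patch (the class and method names are made up):

import java.io.File;

public final class TempDirUtil {
  private TempDirUtil() {}

  public static File getTempDir(String description) {
    // Prefer the tempDir system property (set by the build to something under
    // build/), falling back to the JVM's default temp directory.
    String base = System.getProperty("tempDir", System.getProperty("java.io.tmpdir"));
    File dir = new File(base, description + "." + System.currentTimeMillis());
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new RuntimeException("could not create temp dir: " + dir);
    }
    return dir;
  }
}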


 Solr should get it's temp dir like lucene - first checking the tempDir sys 
 prop
 ---

 Key: SOLR-2011
 URL: https://issues.apache.org/jira/browse/SOLR-2011
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: Next

 Attachments: SOLR-2011.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-22 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891414#action_12891414
 ] 

Michael Busch commented on LUCENE-2555:
---

What shall we do about index backward-compatibility?

I guess 4.0 has to be able to read shared doc stores?  So a lot of that code we 
can't remove? :(

 Remove shared doc stores
 

 Key: LUCENE-2555
 URL: https://issues.apache.org/jira/browse/LUCENE-2555
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch


 With per-thread DocumentsWriters sharing doc stores across segments doesn't 
 make much sense anymore.
 See also LUCENE-2324.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891422#action_12891422
 ] 

Jason Rutherglen commented on LUCENE-2555:
--

Maybe we should break backwards-compatibility for the RT branch?  Or just ship 
an RT specific JAR to keep things simple?

 Remove shared doc stores
 

 Key: LUCENE-2555
 URL: https://issues.apache.org/jira/browse/LUCENE-2555
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch


 With per-thread DocumentsWriters sharing doc stores across segments doesn't 
 make much sense anymore.
 See also LUCENE-2324.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2554) preflex codec doesn't order terms correctly

2010-07-22 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2554:
---

Attachment: LUCENE-2554.patch

Fixed the test failures -- all tests should pass.

 preflex codec doesn't order terms correctly
 ---

 Key: LUCENE-2554
 URL: https://issues.apache.org/jira/browse/LUCENE-2554
 Project: Lucene - Java
  Issue Type: Test
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2554.patch, LUCENE-2554.patch


 The surrogate dance in the preflex codec (which must dynamically remap terms 
 from UTF16 order to unicode code point order) is buggy.
 To better test it, I want to add a test-only codec, preflexrw, that is able 
 to write indices in the pre-flex format.  Then we should also fix tests to 
 randomly pick codecs (including preflexrw) so we better test all of our 
 codecs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891440#action_12891440
 ] 

Yonik Seeley commented on SOLR-1999:


I've been around the ASF long enough now to know that what seems like iron clad 
policy, often isn't.
It's often just someone editing a page to reflect what they think should be the 
policy,  and no one else complaining too much - even in cases when there 
clearly was no consensus.

Related to this issue, I remember the last big thread back in '06 on the infra 
list.  And in that case too, it was a single individual that took it upon 
themselves to add the text you now see (and there certainly was no previous 
consensus or even discussion on the text added).

Trying to draw sharp lines between developers and users is a lost cause... 
lucene and solr are for developers themselves and it's one big continuum 
between user and developer.  Having people use nightly builds is very important 
for lucene/solr development.  Having a pointer to developer resources from 
*anywhere* should be fine.

The *only* important point I see is to clearly communicate that a nightly build 
is not an official ASF release.

 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Hudson: Lucene-trunk #1246

2010-07-22 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/1246/changes

Changes:

[rmuir] add randomRealisticUnicodeString, all chars in the same unicode block

[uschindler] As BytesRef has now native order use them in numeric tests. The 
contents are raw byte[] and no strings, it should compare native

[rmuir] add random prefixquerytest (hopefully easy to debug preflex issues with)

[rmuir] fix some bytesref abuse in these tests

--
[...truncated 2710 lines...]
[junit] Testsuite: org.apache.lucene.search.TestPrefixRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 23.153 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestQueryTermVector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.019 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestQueryWrapperFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.009 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestRegexpQuery
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.028 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestRegexpRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 107.362 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestRegexpRandom2
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 35.416 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestScoreCachingWrappingScorer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.02 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestScorerPerf
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.572 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSetNorm
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.006 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSimilarity
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.008 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSimpleExplanations
[junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 2.778 sec
[junit] 
[junit] Testsuite: 
org.apache.lucene.search.TestSimpleExplanationsOfNonMatches
[junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 0.133 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSloppyPhraseQuery
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.253 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSort
[junit] Tests run: 24, Failures: 0, Errors: 0, Time elapsed: 6.451 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSpanQueryFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.012 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermRangeFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 18.939 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermRangeQuery
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.047 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermScorer
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.011 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermVectors
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.321 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestThreadSafe
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.786 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 1.123 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.013 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.004 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestWildcard
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.038 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestWildcardRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 26.908 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 7.058 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestDocValues
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.007 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery
[junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.21 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestOrdValues
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 

[jira] Created: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Adriano Crestani (JIRA)
CharTermAttribute cloning memory consumption


 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Priority: Minor
 Fix For: 3.1


The memory consumption problem with cloning a CharTermAttributeImpl object was 
raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Adriano Crestani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano Crestani updated LUCENE-2556:
-

Attachment: CharTermAttributeMemoryConsumptionDemo.java

This java application demonstrates how much memory 
CharTermAttributeImpl.clone() might consume in some scenarios.

 CharTermAttribute cloning memory consumption
 

 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Priority: Minor
 Fix For: 3.1

 Attachments: CharTermAttributeMemoryConsumptionDemo.java


 The memory consumption problem with cloning a CharTermAttributeImpl object 
 was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Adriano Crestani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano Crestani updated LUCENE-2556:
-

Attachment: lucene_2556_adriano_crestani_07_23_2010.patch

This patch optimizes the cloning of the CharTermAttributeImpl internal buffer. 
It keeps using clone() to clone the internal buffer when 
CharTermAttribute.length() is at least 150 and at least 75% of the 
internal buffer length; otherwise, it uses System.arraycopy(...) to clone it 
using CharTermAttribute.length() as the new internal buffer size.

It performs this optimization because in some scenarios, like cloning long 
arrays, clone() is usually faster than System.arraycopy(...). 
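
A minimal standalone illustration of that trade-off (illustrative only; the
buffer size, term and class name are made up, and the real change lives in
CharTermAttributeImpl.clone()):

public class CloneVsArraycopyDemo {
  public static void main(String[] args) {
    // The attribute's internal buffer is usually oversized relative to length(),
    // so a full clone() copies (and retains) unused capacity.
    char[] termBuffer = new char[64];
    int termLength = 5;
    "hello".getChars(0, termLength, termBuffer, 0);

    // Full-buffer clone: simple and fast, but the copy keeps all 64 chars alive.
    char[] clonedFull = termBuffer.clone();

    // Length-sized copy: allocates only what the term needs (5 chars here),
    // at the cost of the branch and a System.arraycopy call.
    char[] clonedTight = new char[termLength];
    System.arraycopy(termBuffer, 0, clonedTight, 0, termLength);

    System.out.println(clonedFull.length + " vs " + clonedTight.length);
  }
}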

 CharTermAttribute cloning memory consumption
 

 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Priority: Minor
 Fix For: 3.1

 Attachments: CharTermAttributeMemoryConsumptionDemo.java, 
 lucene_2556_adriano_crestani_07_23_2010.patch


 The memory consumption problem with cloning a CharTermAttributeImpl object 
 was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2556:
--

Attachment: LUCENE-2556.patch

Here is the patch; I see no problem with applying it to 3.x and trunk.

 CharTermAttribute cloning memory consumption
 

 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Priority: Minor
 Fix For: 3.1

 Attachments: CharTermAttributeMemoryConsumptionDemo.java, 
 LUCENE-2556.patch, lucene_2556_adriano_crestani_07_23_2010.patch


 The memory consumption problem with cloning a CharTermAttributeImpl object 
 was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-2556:
-

Assignee: Uwe Schindler

 CharTermAttribute cloning memory consumption
 

 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.1

 Attachments: CharTermAttributeMemoryConsumptionDemo.java, 
 LUCENE-2556.patch, lucene_2556_adriano_crestani_07_23_2010.patch


 The memory consumption problem with cloning a CharTermAttributeImpl object 
 was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891481#action_12891481
 ] 

Uwe Schindler commented on LUCENE-2556:
---

{quote}
This patch optimizes the cloning of the CharTermAttributeImpl internal buffer. 
It keeps using clone() to clone the internal buffer when 
CharTermAttribute.length() is at least 150 and at least 75% of the internal 
buffer length; otherwise, it uses System.arraycopy(...) to clone it using 
CharTermAttribute.length() as the new internal buffer size. 
It performs this optimization because in some scenarios, like cloning long 
arrays, clone() is usually faster than System.arraycopy(...). 
{quote}

Haven't seen your patch yet. I don't know if the two extra calculations justify 
the branching, because terms are mostly short...

If we take your patch, the allocations should in all cases be done with 
ArrayUtil.oversize() to be consistent with the allocation strategy of the rest 
of CTA.

 CharTermAttribute cloning memory consumption
 

 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.1

 Attachments: CharTermAttributeMemoryConsumptionDemo.java, 
 LUCENE-2556.patch, lucene_2556_adriano_crestani_07_23_2010.patch


 The memory consumption problem with cloning a CharTermAttributeImpl object 
 was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr debugging using eclipse

2010-07-22 Thread pavan kumar donepudi
HI,

Can anyone help me with the instructions on how to use eclipse for solr
development. I want to configure Solr in eclipse and should be able to debug.

Thanks & Regards,
Pavan


Re: Solr debugging using eclipse

2010-07-22 Thread Li Li
create a web project
copy all source codes to src
copy all jsp to WebContent
configure tomcat with -Dsolr.solr.home=

2010/7/23 pavan kumar donepudi pavan.donep...@gmail.com:
 HI,
 Can anyone help me with the instructions on how to use eclipse for solr
 development. I want to configure Solr in eclipse and should be able to debug.
 Thanks & Regards,
 Pavan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org