[Fwd: TermEnum usage]

2010-07-22 Thread Vincent DARON
Having received no answers, I'm reposting once. Do I have to post a bug report?

Let me know

Thanks a lot

Vincent DARON
ASK
---BeginMessage---
Hi all

I'm using Lucene.NET 2.9.2.2 from SVN.

I'm trying to iterate over the terms of a field in my index. To do so, I'm using
IndexReader.Terms(f), which returns a TermEnum.

The classic usage of an iterator is the following pattern:

TermEnum enu = reader.Terms(new Term(myfield));
while (enu.Next())
{
    ProcessTerm(enu.Term());
}

But it seems that the TermEnum is already on the first item BEFORE the
first call to Next. The previous code will therefore always skip the
first Term.

Bug ?

Thanks

Vincent DARON
ASK
---End Message---


Re: [Fwd: TermEnum usage]

2010-07-22 Thread Ben West
Hey Vincent,

I am not a dev, but for example look at FuzzyQuery.cs (starting at line 148):

do 
{
  float score = 0.0f;
  Term t = enumerator.Term();
  if (t != null) 
  {
// some stuff with t
  }
}
while (enumerator.Next());

You can see that it expects the enumerator to already have a term in it before it 
calls Next() [i.e. it uses do...while rather than just while]. So I think 
this is expected behavior, although it may not be intuitive.
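
For what it's worth, the Java API documents the same contract: IndexReader.terms(Term) positions the enumeration at the first term at or after the given term, so the current term should be consumed before the first call to next(). Here is a minimal Java sketch of that pattern (the field name and the processTerm callback are placeholders, not code from Lucene):

TermEnum termEnum = reader.terms(new Term("myfield", ""));
try {
    do {
        Term t = termEnum.term();
        if (t == null || !"myfield".equals(t.field())) {
            break;            // empty enumeration, or we ran past the field
        }
        processTerm(t);       // placeholder for whatever work is needed
    } while (termEnum.next());
} finally {
    termEnum.close();
}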

Hope this helps,
-Ben

--- On Thu, 7/22/10, Vincent DARON vda...@ask.be wrote:

 From: Vincent DARON vda...@ask.be
 Subject: [Fwd: TermEnum usage]
 To: lucene-net-dev lucene-net-dev@lucene.apache.org
 Date: Thursday, July 22, 2010, 10:10 AM
 Having received no answers, I'm reposting once. Do I have to post a bug report?
 
 Let me know
 
 Thanks a lot
 
 Vincent DARON
 ASK
 


  


Re: API changes between 2.9.2 and 2.9.3

2010-07-22 Thread Andi Vajda


On Jul 22, 2010, at 2:09, Bill Janssen jans...@parc.com wrote:


Andi Vajda va...@apache.org wrote:


Porting your stuff to 3.0 is thus highly recommended instead
of complaining about broken (my bad) long-deprecated APIs.


Hey, take 2.9.3 down, and announce no further pylucene support for 2.x,
and I'll stop talking about it.


The value in 2.9.3 is really just in the Lucene fixes since 2.9.2. If  
you want them without the new JCC which is tripping you up, take a  
2.9.2 build tree and change the Lucene svn url near the top of the  
Makefile to point at the 2.9.3 sources. This should just work (tm).


Andi..



Bill


Re: API changes between 2.9.2 and 2.9.3

2010-07-22 Thread Bill Janssen
Andi Vajda va...@apache.org wrote:

 
 On Jul 22, 2010, at 2:09, Bill Janssen jans...@parc.com wrote:
 
  Andi Vajda va...@apache.org wrote:
 
  Porting your stuff to 3.0 is thus highly recommended instead
  of complaining about broken (my bad) long-deprecated APIs.
 
  Hey, take 2.9.3 down, and announce no further pylucene support for
  2.x,
  and I'll stop talking about it.
 
 The value in 2.9.3 is really just in the Lucene fixes since 2.9.2. If
 you want them without the new JCC which is tripping you up, take a
 2.9.2 build tree and change the Lucene svn url near the top of the
 Makefile to point at the 2.9.3 sources. This should just work (tm).

Another fix is to edit the common-build.xml file in the Lucene subtree
to remove the 1.4 restriction.  That lets it build with Java 5, which
adds the Iterable interface, and things work as they did before, even with JCC 2.6.

Bill


Re: API changes between 2.9.2 and 2.9.3

2010-07-22 Thread Andi Vajda


On Jul 22, 2010, at 17:52, Bill Janssen jans...@parc.com wrote:


Andi Vajda va...@apache.org wrote:



On Jul 22, 2010, at 2:09, Bill Janssen jans...@parc.com wrote:


Andi Vajda va...@apache.org wrote:


Porting your stuff to 3.0 is thus highly recommended instead
of complaining about broken (my bad) long-deprecated APIs.


Hey, take 2.9.3 down, and announce no further pylucene support for
2.x,
and I'll stop talking about it.


The value in 2.9.3 is really just in the Lucene fixes since 2.9.2. If
you want them without the new JCC which is tripping you up, take a
2.9.2 build tree and change the Lucene svn url near the top of the
Makefile to point at the 2.9.3 sources. This should just work (tm).


Another fix is to edit the common-build.xml file in the Lucene subtree
to remove the 1.4 restriction.  That lets it build with Java 5 and  
that
adds the Iterable interface, and things work as they did, even with  
jcc 2.6.


Even better. Still, none of the Lucene 2.9 code uses any of the Java
1.5 features directly, which is why Lucene 3.0 is a still better choice.


Andi..




Bill


[jira] Commented: (SOLR-1731) ArrayIndexOutOfBoundsException when highlighting

2010-07-22 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891018#action_12891018
 ] 

Leonhard Maylein commented on SOLR-1731:


We have the same problem whenever we search for a word which has synonyms 
defined.

 ArrayIndexOutOfBoundsException when highlighting
 

 Key: SOLR-1731
 URL: https://issues.apache.org/jira/browse/SOLR-1731
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.4
Reporter: Tim Underwood
Priority: Minor

 I'm seeing an java.lang.ArrayIndexOutOfBoundsException when trying to 
 highlight for certain queries.  The error seems to be an issue with the 
 combination of the ShingleFilterFactory, PositionFilterFactory and the 
 LengthFilterFactory. 
 Here's my fieldType definition:
 <fieldType name="textSku" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="8" outputUnigrams="true"/>
     <filter class="solr.PositionFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/> <!-- works if this is commented out -->
   </analyzer>
 </fieldType>
 Here's the field definition:
 <field name="sku_new" type="textSku" indexed="true" stored="true" omitNorms="true"/>
 Here's a sample doc:
 <add>
 <doc>
   <field name="id">1</field>
   <field name="sku_new">A 1280 C</field>
 </doc>
 </add>
 Doing a query for sku_new:"A 1280 C" and requesting highlighting throws the 
 exception (full stack trace below):  
 http://localhost:8983/solr/select/?q=sku_new%3A%22A+1280+C%22&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=sku_new&fl=*
 If I comment out the LengthFilterFactory from my query analyzer section 
 everything seems to work.  Commenting out just the PositionFilterFactory also 
 makes the exception go away and seems to work for this specific query.
 Full stack trace:
 java.lang.ArrayIndexOutOfBoundsException: -1
 at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:202)
 at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
 at 
 org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
 at 
 org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
 at 
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
 at 
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
 at 
 org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
 at 
 

[jira] Updated: (SOLR-1804) Upgrade Carrot2 to 3.2.0

2010-07-22 Thread Stanislaw Osinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislaw Osinski updated SOLR-1804:


Attachment: SOLR-1804-carrot2-3.4.0-dev.patch

OK, here's another shot. This time, the language model factory includes support 
for Chinese. To avoid compilation issues, the classes are loaded through 
reflection. Not pretty, but it works. If there's a way to have access to the Smart 
Chinese classes at compilation time, let me know and I can remove the reflection 
stuff, so that the refactoring is more reliable.
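
For illustration, a minimal sketch of the reflection approach described above (not code from the patch; the analyzer class name and its no-arg constructor are assumptions):

{code}
// Load the Smart Chinese analyzer only if it is on the classpath, avoiding a
// compile-time dependency; fall back to the default language model otherwise.
Object analyzer = null;
try {
  Class<?> clazz = Class.forName("org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer");
  analyzer = clazz.newInstance();
} catch (ClassNotFoundException e) {
  // smartcn jar not present: keep analyzer == null and use the default model
} catch (InstantiationException e) {
  throw new RuntimeException(e);
} catch (IllegalAccessException e) {
  throw new RuntimeException(e);
}
{code}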

 Upgrade Carrot2 to 3.2.0
 

 Key: SOLR-1804
 URL: https://issues.apache.org/jira/browse/SOLR-1804
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Clustering
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Attachments: SOLR-1804-carrot2-3.4.0-dev.patch


 http://project.carrot2.org/release-3.2.0-notes.html
 Carrot2 is now LGPL free, which means we should be able to bundle the binary!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



unsubscribe

2010-07-22 Thread Peter Bruhn Andersen




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891085#action_12891085
 ] 

Michael McCandless commented on LUCENE-2324:


This is looking awesome Michael!  I love the removal of *PerThread --
they are all logically absorbed into DWPT, so everything is now per
thread.

I still see usage of docStoreOffset, but aren't we doing away with
shared doc stores with the cutover to DWPT?

I think you can further simplify DocumentsWriterPerThread.DocWriter;
in fact I think you can remove it & all subclasses in consumers!  The
consumers can simply directly write their files.  The only reason this
class was created was because we have to interleave docs when writing
the doc stores; this is no longer needed since doc stores are again
private to the segment.  I think we don't need PerDocBuffer, either.
And this also simplifies RAM usage tracking!

Also, we don't need separate closeDocStore; it should just be closed
during flush.

I like the ThreadAffinityDocumentsWriterThreadPool; it's the default
right (I see some tests explicitly setting it on IWC; not sure why)?

We should make the in-RAM deletes impl somehow pluggable?


 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1799) Unicode compression

2010-07-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1799:


Attachment: LUCENE-1799_big.patch

Attached is a really, really rough patch that sets BOCU-1 as the default 
encoding.

Beware: it's a work in progress and a lot of the patch is auto-generated 
(Eclipse), so some things need to be reverted.

Most tests pass; the idea is to find bugs in tests etc. that abuse 
BytesRef / assume UTF-8 encoding, things like that.


 Unicode compression
 ---

 Key: LUCENE-1799
 URL: https://issues.apache.org/jira/browse/LUCENE-1799
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Affects Versions: 2.4.1
Reporter: DM Smith
Priority: Minor
 Attachments: LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, 
 LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799_big.patch


 In lucene-1793, there is the off-topic suggestion to provide compression of 
 Unicode data. The motivation was a custom encoding in a Russian analyzer. The 
 original supposition was that it provided a more compact index.
 This led to the comment that a different or compressed encoding would be a 
 generally useful feature. 
 BOCU-1 was suggested as a possibility. This is a patented algorithm by IBM 
 with an implementation in ICU. If Lucene provides its own implementation, a 
 freely available, royalty-free license would need to be obtained.
 SCSU is another Unicode compression algorithm that could be used. 
 An advantage of these methods is that they work on the whole of Unicode. If 
 that is not needed an encoding such as iso8859-1 (or whatever covers the 
 input) could be used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1799) Unicode compression

2010-07-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891101#action_12891101
 ] 

Robert Muir commented on LUCENE-1799:
-

By the way, that patch is huge because I just sucked in the ICU charset stuff to have 
an implementation that works for testing... 

It's not intended to ever stay that way; we would just implement the stuff we 
need without this code, but it makes it easier to test since you don't need any 
external jars or have to muck with the build system at all.


 Unicode compression
 ---

 Key: LUCENE-1799
 URL: https://issues.apache.org/jira/browse/LUCENE-1799
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Affects Versions: 2.4.1
Reporter: DM Smith
Priority: Minor
 Attachments: LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, 
 LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799.patch, LUCENE-1799_big.patch


 In lucene-1793, there is the off-topic suggestion to provide compression of 
 Unicode data. The motivation was a custom encoding in a Russian analyzer. The 
 original supposition was that it provided a more compact index.
 This led to the comment that a different or compressed encoding would be a 
 generally useful feature. 
 BOCU-1 was suggested as a possibility. This is a patented algorithm by IBM 
 with an implementation in ICU. If Lucene provides its own implementation, a 
 freely available, royalty-free license would need to be obtained.
 SCSU is another Unicode compression algorithm that could be used. 
 An advantage of these methods is that they work on the whole of Unicode. If 
 that is not needed an encoding such as iso8859-1 (or whatever covers the 
 input) could be used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2537) FSDirectory.copy() impl is unsafe

2010-07-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887935#action_12887935
 ] 

Shai Erera edited comment on LUCENE-2537 at 7/22/10 8:09 AM:
-

Oh .. found the thread where we discussed that on the list; the last thing I actually 
posted to it was the following text:

{quote}
I've Googled around a bit and came across this: 
http://markmail.org/message/l67bierbmmedrfw5. Apparently, there's a long 
standing bug against SUN since May 2006 
(http://bugs.sun.com/view_bug.do?bug_id=6431344) that's still open and reports 
the exact same behavior that I'm seeing.

If I understand correctly, this might be a Windows limitation and is expected 
to work well on Linux. I'll give it a try. But this makes me wonder whether we should 
keep the current behavior for Linux-based directories and fall back to the 
chunks approach for Windows ones. Since eventually I'll be running on Linux, I 
don't want to lose performance ...

This isn't the first time we've witnessed the "write once, run everywhere" 
misconception of Java :). I'm wondering whether in general we should have a 
Windows/Linux FSDirectory impl, or handlers, to prepare for future cases as 
well. Mike already started this with LUCENE-2500 (DirectIOLinuxDirectory). 
Instead of writing a Directory, perhaps we could have a handler object or 
something, or a generic LinuxDirectory that impls some stuff the 'linux' way. 
In FSDirectory we already have code which detects the OS and JRE used to decide 
between Simple, NIO and MMAP Directories ...
{quote}

  was (Author: shaie):
Oh .. found the thread we discussed that on the list, to which I've 
actually last posted w/ the following text:

{quote}
I've Googled around a bit and came across this: 
http://markmail.org/message/l67bierbmmedrfw5. Apparently, there's a long 
standing bug against SUN since May 2006 
(http://bugs.sun.com/view_bug.do?bug_id=6431344) that's still open and reports 
the exact same behavior that I'm seeing.

If I understand correctly, this might be a Windows limitation and is expected 
to work well on Linux. I'll give it a try. But this makes me think if we should 
keep the current behavior for Linux-based directories, and fallback to the 
chunks approach for Windows ones? Since eventually I'll be running on Linux, I 
don't want to lose performance ...

This isn't the first that we've witnessed the write once, run everywhere 
misconception of Java :). I'm thinking if in general we should have a 
Windows/Linux FSDirectory impl, or handlers, to prepare for future cases as 
well. Mike already started this with LUCENE-2500 (DirectIOLinuxDirectory). 
Instead of writing a Directory, perhaps we could have a handler object or 
something, or a generic LinuxDirectory that impls some stuff the 'linux' way. 
In FSDirectory we already have code which detects the OS and JRE used to decide 
between Simple, NIO and MMAP Directories ...
{code}
  
 FSDirectory.copy() impl is unsafe
 -

 Key: LUCENE-2537
 URL: https://issues.apache.org/jira/browse/LUCENE-2537
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1, 4.0


 There are a couple of issues with it:
 # FileChannel.transferFrom documents that it may not copy the number of bytes 
 requested, however we don't check the return value. So need to fix the code 
 to read in a loop until all bytes were copied..
 # When calling addIndexes() w/ very large segments (few hundred MBs in size), 
 I ran into the following exception (Java 1.6 -- Java 1.5's exception was 
 cryptic):
 {code}
 Exception in thread main java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
 at 
 sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
 at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
 at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
 ... 7 more
 {code}
 I changed the impl to something like this:
 {code}
 long numWritten = 0;
 long numToWrite = input.size();
 long bufSize = 1 << 26;
 while (numWritten < numToWrite) {
   numWritten += output.transferFrom(input, numWritten, bufSize);
 }
 {code}
 And the code successfully adds the indexes. This code uses chunks of 64MB, 
 however that might be too large for some applications, so we definitely need 
 a smaller one. The question is how small so that performance won't be 
 affected, and it'd be great if we can let it be configurable, however since 
 that API is 

[jira] Reopened: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb reopened SOLR-1999:



Sorry, but that is still advertising nightly builds to the general public, 
albeit indirectly.

If a developer really wants to find nightly builds, they should be able to do 
so via the developer pages, not the pages intended for all users.

 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2537) FSDirectory.copy() impl is unsafe

2010-07-22 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2537:
---

Attachment: FileCopyTest.java

I wrote a test which compares FileChannel API to intermediate buffer copies. 
The test runs each method 3 times and reports the best time of each. It can be 
run w/ different file and chunk sizes.

Here are the results of copying a 1GB file using different chunk sizes (the 
chunk is used as the intermediate buffer size as well).

Machine spec:
* Linux, 64-bit (IBM) JVM
* 2xQuad (+hyper-threading) - 16 cores overall
* 16GB RAM
* SAS HD

||Chunk Size||FileChannel||Intermediate Buffer||Diff||
|64K|1865|1528|{color:red}-18%{color}|
|128K|1660|1526|{color:red}-9%{color}|
|512K|1514|1493|{color:red}-2%{color}|
|1M|1552|2072|{color:green}+33%{color}|
|2M|1488|1559|{color:green}5%{color}|
|4M|1596|1831|{color:green}13%{color}|
|16M|1563|1964|{color:green}21%{color}|
|64M|1494|2442|{color:green}39%{color}|
|128M|1469|2445|{color:green}40%{color}|

For small buffer sizes, copying via an intermediate byte[] is preferable. However, 
the FileChannel method performs pretty consistently, regardless of the 
buffer size (except for the first run), while the byte[] approach degrades a 
lot as the buffer size increases.

I think, given these results, we can use the FileChannel method w/ a chunk size 
of 4 (or even 2) MB, to be on the safe side and don't eat up too much RAM?
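
For reference, here is a minimal sketch of the chunked FileChannel copy being discussed (plain java.nio, not the actual FSDirectory.copy() patch), using a 2MB chunk and checking transferFrom's return value:

{code}
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class ChunkedCopy {
  private static final long CHUNK_SIZE = 1 << 21; // 2 MB, per the numbers above

  public static void copy(String src, String dst) throws IOException {
    FileChannel input = new FileInputStream(src).getChannel();
    FileChannel output = new FileOutputStream(dst).getChannel();
    try {
      long numToWrite = input.size();
      long numWritten = 0;
      while (numWritten < numToWrite) {
        // transferFrom may copy fewer bytes than requested, so loop on its return value
        numWritten += output.transferFrom(input, numWritten,
            Math.min(CHUNK_SIZE, numToWrite - numWritten));
      }
    } finally {
      input.close();
      output.close();
    }
  }
}
{code}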

 FSDirectory.copy() impl is unsafe
 -

 Key: LUCENE-2537
 URL: https://issues.apache.org/jira/browse/LUCENE-2537
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1, 4.0

 Attachments: FileCopyTest.java


 There are a couple of issues with it:
 # FileChannel.transferFrom documents that it may not copy the number of bytes 
 requested, however we don't check the return value. So need to fix the code 
 to read in a loop until all bytes were copied..
 # When calling addIndexes() w/ very large segments (few hundred MBs in size), 
 I ran into the following exception (Java 1.6 -- Java 1.5's exception was 
 cryptic):
 {code}
 Exception in thread main java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
 at 
 sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
 at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
 at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
 ... 7 more
 {code}
 I changed the impl to something like this:
 {code}
 long numWritten = 0;
 long numToWrite = input.size();
 long bufSize = 1 << 26;
 while (numWritten < numToWrite) {
   numWritten += output.transferFrom(input, numWritten, bufSize);
 }
 {code}
 And the code successfully adds the indexes. This code uses chunks of 64MB, 
 however that might be too large for some applications, so we definitely need 
 a smaller one. The question is how small so that performance won't be 
 affected, and it'd be great if we can let it be configurable, however since 
 that API is called by other API, such as addIndexes, not sure it's easily 
 controllable.
 Also, I read somewhere (can't remember now where) that on Linux the native 
 impl is better and does copy in chunks. So perhaps we should make a Linux 
 specific impl?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2537) FSDirectory.copy() impl is unsafe

2010-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891123#action_12891123
 ] 

Michael McCandless commented on LUCENE-2537:


Nice results Shai!

bq. I think, given these results, we can use the FileChannel method w/ a chunk 
size of 4 (or even 2) MB, to be on the safe side and don't eat up too much RAM?

+1

 FSDirectory.copy() impl is unsafe
 -

 Key: LUCENE-2537
 URL: https://issues.apache.org/jira/browse/LUCENE-2537
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1, 4.0

 Attachments: FileCopyTest.java


 There are a couple of issues with it:
 # FileChannel.transferFrom documents that it may not copy the number of bytes 
 requested, however we don't check the return value. So need to fix the code 
 to read in a loop until all bytes were copied..
 # When calling addIndexes() w/ very large segments (few hundred MBs in size), 
 I ran into the following exception (Java 1.6 -- Java 1.5's exception was 
 cryptic):
 {code}
 Exception in thread main java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
 at 
 sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
 at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
 at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
 ... 7 more
 {code}
 I changed the impl to something like this:
 {code}
 long numWritten = 0;
 long numToWrite = input.size();
 long bufSize = 1 << 26;
 while (numWritten < numToWrite) {
   numWritten += output.transferFrom(input, numWritten, bufSize);
 }
 {code}
 And the code successfully adds the indexes. This code uses chunks of 64MB, 
 however that might be too large for some applications, so we definitely need 
 a smaller one. The question is how small so that performance won't be 
 affected, and it'd be great if we can let it be configurable, however since 
 that API is called by other API, such as addIndexes, not sure it's easily 
 controllable.
 Also, I read somewhere (can't remember now where) that on Linux the native 
 impl is better and does copy in chunks. So perhaps we should make a Linux 
 specific impl?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2553) IOException: read past EOF

2010-07-22 Thread Kyle L. (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle L. updated LUCENE-2553:


Description: 
We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
at 
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
at 
com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
at 
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
at org.apache.lucene.search.Searcher.search(Searcher.java:67)
...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an 
unordered manner:

{code}
private class AllHitsUnsortedCollector extends Collector {

private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
private IndexReader reader;
private int baselineDocumentId;
private List<Document> matchingDocuments = new ArrayList<Document>();

@Override
public boolean acceptsDocsOutOfOrder() {
return true;
}

@Override
public void collect(int docId) throws IOException {

int documentId = baselineDocumentId + docId;
Document document = reader.document(documentId, getFieldSelector());

if (document == null) {
logger.info("Null document from search results!");
} else {
matchingDocuments.add(document);
}
}

@Override
public void setNextReader(IndexReader segmentReader, int baseDocId) 
throws IOException {
this.reader = segmentReader;
this.baselineDocumentId = baseDocId;
}

@Override
public void setScorer(Scorer scorer) throws IOException {
// do nothing
}

public List<Document> getMatchingDocuments() {
return matchingDocuments;
}
}

{code}

The exception arises when users perform searches while indexing/optimization is 
occurring. Our {{IndexReader}} is read-only. From the documentation I have 
read, a read-only {{IndexReader}} instance should be immune from any 
uncommitted index changes and should return consistent results during indexing 
and optimization. As this exception occurs during indexing/optimization, it 
seems to me that the read-only {{IndexReader}} is somehow stumbling upon the 
uncommitted content? 

The problem is difficult to replicate as it is sporadic in nature and so far 
has only occurred in Production.

We have rebuilt the indexes a number of times, but that does not seem to 
alleviate the issue.

Any other information I can provide that will help isolate the issue? 

Most likely the other possibility is that the {{Collector}} we have written is 
doing something it shouldn't. Any pointers?
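
For context, here is a minimal sketch (assumptions noted in the comments; not our production code) of how a collector like the one above is driven against a read-only reader in Lucene 2.9:

{code}
// 'directory' and 'query' are assumed to exist; true = open the reader read-only.
IndexReader reader = IndexReader.open(directory, true);
IndexSearcher searcher = new IndexSearcher(reader);
AllHitsUnsortedCollector collector = new AllHitsUnsortedCollector();
searcher.search(query, collector);
List<Document> matches = collector.getMatchingDocuments();
{code}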

  was:
We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
at 
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
at 
com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
at 
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
at org.apache.lucene.search.Searcher.search(Searcher.java:67)
...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an 
unordered manner:

{code}
private class AllHitsUnsortedCollector extends Collector {

private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
private IndexReader reader;
private int baselineDocumentId;
private List<Document> matchingDocuments = new ArrayList<Document>();

@Override
public boolean acceptsDocsOutOfOrder() {
return true;
}

@Override
public void 

[jira] Updated: (LUCENE-2553) IOException: read past EOF

2010-07-22 Thread Kyle L. (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle L. updated LUCENE-2553:


Description: 
We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
at 
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
at 
com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
at 
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
at org.apache.lucene.search.Searcher.search(Searcher.java:67)
...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an 
unordered manner:

{code}
private class AllHitsUnsortedCollector extends Collector {

private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
private IndexReader reader;
private int baselineDocumentId;
private List<Document> matchingDocuments = new ArrayList<Document>();

@Override
public boolean acceptsDocsOutOfOrder() {
return true;
}

@Override
public void collect(int docId) throws IOException {

int documentId = baselineDocumentId + docId;
Document document = reader.document(documentId, getFieldSelector());

if (document == null) {
logger.info("Null document from search results!");
} else {
matchingDocuments.add(document);
}
}

@Override
public void setNextReader(IndexReader segmentReader, int baseDocId) 
throws IOException {
this.reader = segmentReader;
this.baselineDocumentId = baseDocId;
}

@Override
public void setScorer(Scorer scorer) throws IOException {
// do nothing
}

public List<Document> getMatchingDocuments() {
return matchingDocuments;
}
}

{code}

The exception arises when users perform searches while indexing/optimization is 
occurring. Our {{IndexReader}} is read-only. From the documentation I have 
read, a read-only {{IndexReader}} instance should be immune from any 
uncommitted index changes and should return consistent results during indexing 
and optimization. As this exception occurs during indexing/optimization, it 
seems to me that the read-only {{IndexReader}} is somehow stumbling upon the 
uncommitted content? 

The problem is difficult to replicate as it is sporadic in nature and so far 
has only occurred in Production.

We have rebuilt the indexes a number of times, but that does not seem to 
alleviate the issue.

Any other information I can provide that will help isolate the issue? 

The most likely other possibility is that the {{Collector}} we have written is 
doing something it shouldn't. Any pointers?

  was:
We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
at 
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
at 
com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
at 
org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
at org.apache.lucene.search.Searcher.search(Searcher.java:67)
...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an 
unordered manner:

{code}
private class AllHitsUnsortedCollector extends Collector {

private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
private IndexReader reader;
private int baselineDocumentId;
private List<Document> matchingDocuments = new ArrayList<Document>();

@Override
public boolean acceptsDocsOutOfOrder() {
return true;
}

@Override
public void 

[jira] Updated: (LUCENE-2537) FSDirectory.copy() impl is unsafe

2010-07-22 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2537:
---

Attachment: LUCENE-2537.patch

Patch copies the files in chunks of 2MB. All core tests pass. I'll wait a day 
or two in case someone wants to suggest a different approach or chunk size 
limit before I commit.

 FSDirectory.copy() impl is unsafe
 -

 Key: LUCENE-2537
 URL: https://issues.apache.org/jira/browse/LUCENE-2537
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1, 4.0

 Attachments: FileCopyTest.java, LUCENE-2537.patch


 There are a couple of issues with it:
 # FileChannel.transferFrom documents that it may not copy the number of bytes 
 requested, however we don't check the return value. So need to fix the code 
 to read in a loop until all bytes were copied..
 # When calling addIndexes() w/ very large segments (few hundred MBs in size), 
 I ran into the following exception (Java 1.6 -- Java 1.5's exception was 
 cryptic):
 {code}
 Exception in thread main java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
 at 
 sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
 at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
 at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
 ... 7 more
 {code}
 I changed the impl to something like this:
 {code}
 long numWritten = 0;
 long numToWrite = input.size();
 long bufSize = 1 << 26;
 while (numWritten < numToWrite) {
   numWritten += output.transferFrom(input, numWritten, bufSize);
 }
 {code}
 And the code successfully adds the indexes. This code uses chunks of 64MB, 
 however that might be too large for some applications, so we definitely need 
 a smaller one. The question is how small so that performance won't be 
 affected, and it'd be great if we can let it be configurable, however since 
 that API is called by other API, such as addIndexes, not sure it's easily 
 controllable.
 Also, I read somewhere (can't remember now where) that on Linux the native 
 impl is better and does copy in chunks. So perhaps we should make a Linux 
 specific impl?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-64) strict hierarchical facets

2010-07-22 Thread SolrFan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891207#action_12891207
 ] 

SolrFan commented on SOLR-64:
-

Can the patch please be updated to the latest trunk? Thanks

 strict hierarchical facets
 --

 Key: SOLR-64
 URL: https://issues.apache.org/jira/browse/SOLR-64
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: Next

 Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, 
 SOLR-64.patch


 Strict Facet Hierarchies... each tag has at most one parent (a tree).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-792) Tree Faceting Component

2010-07-22 Thread SolrFan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891208#action_12891208
 ] 

SolrFan commented on SOLR-792:
--

Hi, can this patch please be updated against the current 1.4 trunk? thanks.

 Tree Faceting Component
 ---

 Key: SOLR-792
 URL: https://issues.apache.org/jira/browse/SOLR-792
 Project: Solr
  Issue Type: New Feature
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Minor
 Attachments: SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, 
 SOLR-792.patch, SOLR-792.patch


 A component to do multi-level faceting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-64) strict hierarchical facets

2010-07-22 Thread Aleksander Stensby (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891213#action_12891213
 ] 

Aleksander Stensby commented on SOLR-64:


I'm currently on holidays until July 27.

If urgent, please contact Gisele O'Connor:
email: gisele.o.con...@integrasco.com
phone: +47 90283809

Best regards,
 Aleksander

-- 
Aleksander M. Stensby
Integrasco A/S
E-mail: aleksander.sten...@integrasco.com
Tel.: +47 41 22 82 72
www.integrasco.com
http://twitter.com/Integrasco
http://facebook.com/Integrasco

Please consider the environment before printing all or any of this e-mail


 strict hierarchical facets
 --

 Key: SOLR-64
 URL: https://issues.apache.org/jira/browse/SOLR-64
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: Next

 Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, 
 SOLR-64.patch


 Strict Facet Hierarchies... each tag has at most one parent (a tree).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891228#action_12891228
 ] 

Michael Busch commented on LUCENE-2324:
---

Thanks, Mike - great feedback! (as always)

{quote}
I still see usage of docStoreOffset, but aren't we doing away with
shared doc stores with the cutover to DWPT?
{quote}

Do we want all segments that one DWPT writes to share the same
doc store, i.e. one doc store per DWPT, or remove doc stores 
entirely?


{quote}
I think you can further simplify DocumentsWriterPerThread.DocWriter;
in fact I think you can remove it & all subclasses in consumers!
{quote}

I agree!  Now that a high number of testcases pass it's less scary
to modify even more code :)  - will do this next.


{quote}
Also, we don't need separate closeDocStore; it should just be closed
during flush.
{quote}

OK sounds good.


{quote}
I like the ThreadAffinityDocumentsWriterThreadPool; it's the default
right (I see some tests explicitly setting it on IWC; not sure why)?
{quote}

It's actually only TestStressIndexing2 and it sets it to use a different 
number of max thread states than the default.


{quote}
We should make the in-RAM deletes impl somehow pluggable?
{quote}

Do you mean so that it's customizable how deletes are handled? 
E.g. doing live deletes vs. lazy deletes on flush?
I think that's a good idea.  E.g. at Twitter we'll do live deletes always
to get the lowest latency (and we don't have too many deletes),
but that's probably not the best default for everyone.
So I agree that making this customizable is a good idea.

It'd also be nice to have a more efficient data structure to buffer the
deletes.  With many buffered deletes the java hashmap approach
will not be very efficient.  Terms could be written into a byte pool,
but what should we do with queries?

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [Fwd: TermEnum usage]

2010-07-22 Thread Digy
It is expected behavior. Please see 

http://lucene.apache.org/java/2_9_2/api/all/org/apache/lucene/index/IndexReader.html#terms%28org.apache.lucene.index.Term%29

DIGY

-Original Message-
From: Vincent DARON [mailto:vda...@ask.be] 
Sent: Thursday, July 22, 2010 6:10 PM
To: lucene-net-dev
Subject: [Fwd: TermEnum usage]

Having received no answers, I'm reposting once. Do I have to post a bug report?

Let me know

Thanks a lot

Vincent DARON
ASK



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891241#action_12891241
 ] 

Yonik Seeley commented on LUCENE-2324:
--

bq. It'd also be nice to have a more efficient data structure to buffer the 
deletes. With many buffered deletes the java hashmap approach will not be very 
efficient. Terms could be written into a byte pool, but what should we do with 
queries?

IMO, terms are an order of magnitude more important than queries.  Most deletes 
will be by some sort of unique id, and will be in the same field.

Perhaps a single byte[] with length prefixes (like the field cache has).  A 
single int could then represent a term (it would just be an offset into the 
byte[], which is field-specific, so no need to store the field each time).

We could then build a treemap or hashmap that natively used an int[]... but 
that may not be necessary (depending on how deletes are applied).  Perhaps a 
sort could be done right before applying, and duplicate terms could be handled 
at that time.
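
For illustration, a hypothetical sketch of that idea (all names made up, not Lucene code): each buffered delete term is appended to one growable byte[] with a 2-byte length prefix, and the per-delete bookkeeping is just an int offset into that array:

{code}
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

// Buffers delete terms for a single field as length-prefixed UTF-8 bytes.
// Duplicate terms are allowed; they would be resolved when deletes are applied.
class DeleteTermBuffer {
  private byte[] bytes = new byte[1024];   // the shared byte pool
  private int upto;                        // next free position in the pool
  private int[] offsets = new int[16];     // one int per buffered delete
  private int count;

  void add(String termText) throws UnsupportedEncodingException {
    byte[] utf8 = termText.getBytes("UTF-8");   // assumes term length < 65536
    if (upto + 2 + utf8.length > bytes.length) {
      bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, upto + 2 + utf8.length));
    }
    if (count == offsets.length) {
      offsets = Arrays.copyOf(offsets, offsets.length * 2);
    }
    offsets[count++] = upto;                     // the delete is just this offset
    bytes[upto++] = (byte) (utf8.length >>> 8);  // 2-byte length prefix
    bytes[upto++] = (byte) utf8.length;
    System.arraycopy(utf8, 0, bytes, upto, utf8.length);
    upto += utf8.length;
  }

  int numBufferedDeletes() {
    return count;
  }
}
{code}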

Anyway, I'm only casually following this issue, but it's looking like really 
cool stuff!

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891256#action_12891256
 ] 

Michael McCandless commented on LUCENE-2324:


{quote}
bq. I still see usage of docStoreOffset, but aren't we doing away with shared 
doc stores with the cutover to DWPT?

Do we want all segments that one DWPT writes to share the same
doc store, i.e. one doc store per DWPT, or remove doc stores 
entirely?
{quote}

Oh good question... a single DWPT can in fact continue to share doc
store across the segments it flushes.

Hmm, but... this opto only helps in that we don't have to merge the
doc stores if we merge segments that already share their doc stores.
But if (say) I have 2 threads indexing, and I'm indexing lots of docs
and each DWPT has written 5 segments, we will then merge these 10
segments, and must merge the doc stores at that point.  So the sharing
isn't really buying us much (just not closing old files & opening new
ones, which is presumably negligible)?

{quote}
bq. I think you can further simplify DocumentsWriterPerThread.DocWriter; in 
fact I think you can remove it & all subclasses in consumers!

I agree! Now that a high number of testcases pass it's less scary
to modify even more code  - will do this next.

bq. Also, we don't need separate closeDocStore; it should just be closed during 
flush.

OK sounds good.
{quote}

Super :)

{quote}
bq. I like the ThreadAffinityDocumentsWriterThreadPool; it's the default right 
(I see some tests explicitly setting it on IWC; not sure why)?

It's actually only TestStressIndexing2 and it sets it to use a different 
number of max thread states than the default.
{quote}

Ahh OK great.

{quote}
bq. We should make the in-RAM deletes impl somehow pluggable?

Do you mean so that it's customizable how deletes are handled? 
{quote}

Actually I was worried about the long[] sequenceIDs (adding 8 bytes
RAM per buffered doc) -- this could be a biggish hit to RAM efficiency
for small docs.

{quote} E.g. doing live deletes vs. lazy deletes on flush?
I think that's a good idea. E.g. at Twitter we'll do live deletes always
to get the lowest latency (and we don't have too many deletes),
but that's probably not the best default for everyone.
So I agree that making this customizable is a good idea.
{quote}

Yeah, this too :)

Actually deletions today are not applied on flush -- they continue to
be buffered beyond flush, and then get applied just before a merge
kicks off.  I think we should keep this (as an option and probably as
the default) -- it's important for apps w/ large indices that don't use
NRT (and don't pool readers) because it's costly to open readers.

So it sounds like we should support lazy (apply-before-merge like
today) and live (live means resolve deleted Term/Query -> docID(s)
synchronously inside deleteDocuments, right?).

Live should also be less performant because of less temporal locality
(vs lazy).

{quote}
It'd also be nice to have a more efficient data structure to buffer the
deletes. With many buffered deletes the java hashmap approach
will not be very efficient. Terms could be written into a byte pool,
but what should we do with queries?
{quote}

I agree w/ Yonik: let's worry only about delete by Term (not Query)
for now.

Maybe we could reuse (factor out) TermsHashPerField's custom hash
here, for the buffered Terms?  It efficiently maps a BytesRef -> int.

Another thing: it looks like finishFlushedSegment is sync'd on the IW
instance, but, it need not be sync'd for all of that?  EG
readerPool.get(), applyDeletes, building the CFS, may not need to be
inside the sync block?


 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to 

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891262#action_12891262
 ] 

Michael Busch commented on LUCENE-2324:
---

{quote}
Perhaps a single byte[] with length prefixes (like the field cache has).  A 
single int could then represent a term (it would just be an offset into the 
byte[], which is field-specific, so no need to store the field each time).
{quote}

Yeah that's pretty much how TermsHashPerField works.  I agree with Mike, 
let's reuse that code.
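
As a stripped-down illustration of that idea -- length-prefixed terms packed into
one shared byte[] and addressed by int offsets, with the field implied by the pool
instance -- here is a toy version (not the actual TermsHashPerField code):

import java.util.Arrays;

// Toy length-prefixed term storage in a single byte[] pool. An int offset into
// the pool identifies a term; the field never needs to be stored per term.
class TermBytePool {
  private byte[] pool = new byte[1024];
  private int upto = 0;                       // next free position in the pool

  /** Appends the term bytes (prefixed by a 2-byte length) and returns its offset. */
  int add(byte[] termBytes) {
    int needed = 2 + termBytes.length;
    if (upto + needed > pool.length) {
      pool = Arrays.copyOf(pool, Math.max(pool.length * 2, upto + needed));
    }
    int offset = upto;
    pool[upto++] = (byte) (termBytes.length >>> 8);   // length prefix, high byte
    pool[upto++] = (byte) termBytes.length;           // length prefix, low byte
    System.arraycopy(termBytes, 0, pool, upto, termBytes.length);
    upto += termBytes.length;
    return offset;                                    // a single int represents the term
  }

  /** Reads the term stored at the given offset back out of the pool. */
  byte[] get(int offset) {
    int len = ((pool[offset] & 0xFF) << 8) | (pool[offset + 1] & 0xFF);
    return Arrays.copyOfRange(pool, offset + 2, offset + 2 + len);
  }
}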


{quote}
Hmm, but... this opto only helps in that we don't have to merge the
doc stores if we merge segments that already share their doc stores.
But if (say) I have 2 threads indexing, and I'm indexing lots of docs
and each DWPT has written 5 segments, we will then merge these 10
segments, and must merge the doc stores at that point. So the sharing
isn't really buying us much (just not closing old files  opening new
ones, which is presumably negligible)?
{quote}

Yeah that's true.  I agree it won't help much. I think we should just 
remove the doc stores, great simplification (which should also make 
parallel indexing a bit easier :) ).  


{quote}
Another thing: it looks like finishFlushedSegment is sync'd on the IW
instance, but, it need not be sync'd for all of that? EG
readerPool.get(), applyDeletes, building the CFS, may not need to be
inside the sync block?
{quote}

Thanks for the hint.  I need to carefully go over all the synchronization, 
there are likely more problems.  

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891264#action_12891264
 ] 

Yonik Seeley commented on LUCENE-2324:
--

bq. Yeah that's pretty much how TermsHashPerField works. I agree with Mike, 
let's reuse that code.

Do we even need to maintain a hash over it though, or can we simply keep a list 
(and allow dup terms until it's time to apply them)?
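
A toy sketch of that alternative (made-up names, not Lucene code): append each
buffered delete to parallel growable arrays, duplicates and all, and only
sort/dedup when the deletes are actually applied:

import java.util.Arrays;

class BufferedDeleteList {
  private int[] termOffsets = new int[16];   // offsets into a shared term byte pool
  private int[] docIDUptos = new int[16];    // delete applies to docIDs < this value
  private int size = 0;

  void add(int termOffset, int docIDUpto) {
    if (size == termOffsets.length) {
      termOffsets = Arrays.copyOf(termOffsets, size * 2);
      docIDUptos = Arrays.copyOf(docIDUptos, size * 2);
    }
    termOffsets[size] = termOffset;           // duplicates are fine until apply time
    docIDUptos[size] = docIDUpto;
    size++;
  }

  int size() { return size; }
}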

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-752) Allow better Field Compression options

2010-07-22 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891284#action_12891284
 ] 

David Smiley commented on SOLR-752:
---

I spent some time today attempting to implement this with my own Solr FieldType 
that extends TextField.  As I tried to implement it, I realized that I couldn't 
really do it.  FieldType has a method createField(...) that is necessary to 
implement in order to set binary data (i.e. byte[]) on a Field.  This method 
demands I return a org.apache.lucene.document.Field which is final.  If I 
create the field with binary data, by default it's not indexed or tokenized.  I 
can get those booleans to flip by simply invoking f.setTokenStream(null).  
However, I can't set omitNorms() to false, nor can I set booleans for the term 
vector fields.  There may be other issues but at this point I gave up to work 
on other more important priorities of mine.

 Allow better Field Compression options
 --

 Key: SOLR-752
 URL: https://issues.apache.org/jira/browse/SOLR-752
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor

 See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression
 It would be good if Solr handled field compression outside of Lucene's 
 Field.COMPRESS capabilities, since those capabilities are less than ideal 
 when it comes to control over compression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-752) Allow better Field Compression options

2010-07-22 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891305#action_12891305
 ] 

David Smiley commented on SOLR-752:
---

I already looked at BinaryField and TrieField for inspiration.  BinaryField 
assumes you're not going to index the data.  And TrieField doesn't set binary 
data value on the Field.

Yes, I think the next step is to make createField() return Fieldable.  But I'm 
not a committer...

Instead or in addition... I have to wonder, why not modify Lucene's Field class 
to allow me to set the Index, Store, and TermVector enums AND specify binary 
data on a suitable constructor?  Arguably an existing constructor taking String 
would be hijacked to take Object and then do the right thing.  That would be a 
small change, whereas implementing another subclass of AbstractField is more 
complex and would likely reproduce much of what's in Field already.

 Allow better Field Compression options
 --

 Key: SOLR-752
 URL: https://issues.apache.org/jira/browse/SOLR-752
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor

 See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression
 It would be good if Solr handled field compression outside of Lucene's 
 Field.COMPRESS capabilities, since those capabilities are less than ideal 
 when it comes to control over compression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2009) Contrib ant test targets do not respect sys props testcase,testpackage,and testpackageroot

2010-07-22 Thread Mark Miller (JIRA)
Contrib ant test targets do not respect sys props testcase,testpackage,and 
testpackageroot
--

 Key: SOLR-2009
 URL: https://issues.apache.org/jira/browse/SOLR-2009
 Project: Solr
  Issue Type: Bug
  Components: Build
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: Next


Very annoying using these props with core tests unless you use the junit target 
rather than test. Also would be nice if they worked regardless for future dev.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

2010-07-22 Thread James Dyer (JIRA)
Improvements to SpellCheckComponent Collate functionality
-

 Key: SOLR-2010
 URL: https://issues.apache.org/jira/browse/SOLR-2010
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, spellchecker
Affects Versions: 1.4.1
 Environment: Tested against trunk revision 966633
Reporter: James Dyer
Priority: Minor


Improvements to SpellCheckComponent Collate functionality

Our project requires a better Spell Check Collator.  I'm contributing this as a 
patch to get suggestions for improvements and in case there is a broader need 
for these features.

1. Only return collations that are guaranteed to result in hits if re-queried 
(applying original fq params also).  This is especially helpful when there is 
more than one correction per query.  The 1.4 behavior does not verify that a 
particular combination will actually return hits.
2. Provide the option to get multiple collation suggestions
3. Provide extended collation results including the # of hits re-querying will 
return and a breakdown of each misspelled word and its correction.

This patch is similar to what is described in SOLR-507 item #1.  Also, this 
patch provides a viable workaround for the problem discussed in SOLR-1074.  A 
dictionary could be created that combines the terms from the multiple fields.  
The collator then would prune out any spurious suggestions this would cause.

This patch adds the following spellcheck parameters:

1. spellcheck.maxCollationTries - maximum # of collation possibilities to try 
before giving up.  Lower values ensure better performance.  Higher values may 
be necessary to find a collation that can return results.  Default is 0, which 
maintains backwards-compatible behavior (do not check collations).

2. spellcheck.maxCollations - maximum # of collations to return.  Default is 1, 
which maintains backwards-compatible behavior.

3. spellcheck.collateExtendedResult - if true, returns an expanded response 
format detailing collations found.  default is false, which maintains 
backwards-compatible behavior.  When true, output is like this (in context):

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="hopq">
      <int name="numFound">94</int>
      <int name="startOffset">7</int>
      <int name="endOffset">11</int>
      <arr name="suggestion">
        <str>hope</str>
        <str>how</str>
        <str>hope</str>
        <str>chops</str>
        <str>hoped</str>
        etc
      </arr>
    </lst>
    <lst name="faill">
      <int name="numFound">100</int>
      <int name="startOffset">16</int>
      <int name="endOffset">21</int>
      <arr name="suggestion">
        <str>fall</str>
        <str>fails</str>
        <str>fail</str>
        <str>fill</str>
        <str>faith</str>
        <str>all</str>
        etc
      </arr>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(how AND fails)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">how</str>
        <str name="faill">fails</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(hope AND faith)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">hope</str>
        <str name="faill">faith</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(chops AND all)</str>
      <int name="hits">1</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">chops</str>
        <str name="faill">all</str>
      </lst>
    </lst>
  </lst>
</lst>

In addition, SOLRJ is updated to include 
SpellCheckResponse.getCollatedResults(), which will return the expanded 
Collation format.  getCollatedResult(), which returns a single String, is 
retained for backwards-compatibility.  Other APIs were not changed but will 
still work provided that spellcheck.collateExtendedResult is false.
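
As a rough SolrJ usage sketch of the above: getCollatedResults() is the method
this patch adds, the spellcheck.* parameter names are the ones listed above, and
the server URL and query are just example values:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class CollateExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("Title:(hopq AND faill)");
    q.set("spellcheck", true);
    q.set("spellcheck.collate", true);
    q.set("spellcheck.maxCollationTries", 10);    // try up to 10 candidate collations
    q.set("spellcheck.maxCollations", 3);         // return up to 3 that produce hits
    q.set("spellcheck.collateExtendedResult", true);

    QueryResponse rsp = server.query(q);
    SpellCheckResponse spell = rsp.getSpellCheckResponse();

    // getCollatedResults() is the new method described above; the exact shape of
    // the returned objects is defined by the patch, so just print the list here.
    System.out.println(spell.getCollatedResults());

    // The pre-existing single-String API still works when
    // spellcheck.collateExtendedResult is false:
    System.out.println(spell.getCollatedResult());
  }
}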

This likely will not return valid results if using Shards.  Rather, a more 
robust interaction with the index would be necessary than what exists in 

[jira] Updated: (SOLR-2009) Contrib ant test targets do not respect sys props testcase,testpackage,and testpackageroot

2010-07-22 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2009:
--

Attachment: SOLR-2009.patch

 Contrib ant test targets do not respect sys props testcase,testpackage,and 
 testpackageroot
 --

 Key: SOLR-2009
 URL: https://issues.apache.org/jira/browse/SOLR-2009
 Project: Solr
  Issue Type: Bug
  Components: Build
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: Next

 Attachments: SOLR-2009.patch


 Very annoying using these props with core tests unless you use the junit 
 target rather than test. Also would be nice if they worked regardless for 
 future dev.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

2010-07-22 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2010:
-

Attachment: SOLR-2010.patch

Tested against branch version #96633

 Improvements to SpellCheckComponent Collate functionality
 -

 Key: SOLR-2010
 URL: https://issues.apache.org/jira/browse/SOLR-2010
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, spellchecker
Affects Versions: 1.4.1
 Environment: Tested against trunk revision 966633
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2010.patch


 Improvements to SpellCheckComponent Collate functionality
 Our project requires a better Spell Check Collator.  I'm contributing this as 
 a patch to get suggestions for improvements and in case there is a broader 
 need for these features.
 1. Only return collations that are guaranteed to result in hits if re-queried 
 (applying original fq params also).  This is especially helpful when there is 
 more than one correction per query.  The 1.4 behavior does not verify that a 
 particular combination will actually return hits.
 2. Provide the option to get multiple collation suggestions
 3. Provide extended collation results including the # of hits re-querying 
 will return and a breakdown of each misspelled word and its correction.
 This patch is similar to what is described in SOLR-507 item #1.  Also, this 
 patch provides a viable workaround for the problem discussed in SOLR-1074.  A 
 dictionary could be created that combines the terms from the multiple fields. 
  The collator then would prune out any spurious suggestions this would cause.
 This patch adds the following spellcheck parameters:
 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try 
 before giving up.  Lower values ensure better performance.  Higher values may 
 be necessary to find a collation that can return results.  Default is 0, 
 which maintains backwards-compatible behavior (do not check collations).
 2. spellcheck.maxCollations - maximum # of collations to return.  Default is 
 1, which maintains backwards-compatible behavior.
 3. spellcheck.collateExtendedResult - if true, returns an expanded response 
 format detailing collations found.  default is false, which maintains 
 backwards-compatible behavior.  When true, output is like this (in context):
 <lst name="spellcheck">
   <lst name="suggestions">
     <lst name="hopq">
       <int name="numFound">94</int>
       <int name="startOffset">7</int>
       <int name="endOffset">11</int>
       <arr name="suggestion">
         <str>hope</str>
         <str>how</str>
         <str>hope</str>
         <str>chops</str>
         <str>hoped</str>
         etc
       </arr>
     </lst>
     <lst name="faill">
       <int name="numFound">100</int>
       <int name="startOffset">16</int>
       <int name="endOffset">21</int>
       <arr name="suggestion">
         <str>fall</str>
         <str>fails</str>
         <str>fail</str>
         <str>fill</str>
         <str>faith</str>
         <str>all</str>
         etc
       </arr>
     </lst>
     <lst name="collation">
       <str name="collationQuery">Title:(how AND fails)</str>
       <int name="hits">2</int>
       <lst name="misspellingsAndCorrections">
         <str name="hopq">how</str>
         <str name="faill">fails</str>
       </lst>
     </lst>
     <lst name="collation">
       <str name="collationQuery">Title:(hope AND faith)</str>
       <int name="hits">2</int>
       <lst name="misspellingsAndCorrections">
         <str name="hopq">hope</str>
         <str name="faill">faith</str>
       </lst>
     </lst>
     <lst name="collation">
       <str name="collationQuery">Title:(chops AND all)</str>
       <int name="hits">1</int>
       <lst name="misspellingsAndCorrections">
         <str name="hopq">chops</str>
         <str name="faill">all</str>
       </lst>
     </lst>
   </lst>
 </lst>
 In addition, SOLRJ is updated to include 
 SpellCheckResponse.getCollatedResults(), which will return the expanded 
 Collation format.  getCollatedResult(), which returns a single String, is 
 retained for 

[jira] Commented: (SOLR-1240) Numerical Range faceting

2010-07-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891321#action_12891321
 ] 

Hoss Man commented on SOLR-1240:


bq. Rather than embedding meta to the list containing the counts, perhaps we 
should bite the bullet and add an additional level for the counts.

yeah ... i'm on board with that idea.  it's a trivial change.

any comments on the implementation?

i think it's fairly solid -- the one wish i have though is to try and gut the 
existing date faceting code to just use the new code -- but i can't see a very 
easy way to do that while dealing with the different param names .. suggestions?

 Numerical Range faceting
 

 Key: SOLR-1240
 URL: https://issues.apache.org/jira/browse/SOLR-1240
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Gijs Kunze
Priority: Minor
 Attachments: SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, 
 SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch, SOLR-1240.patch


 For faceting numerical ranges using many facet.query query arguments leads to 
 unmanageably large queries as the fields you facet over increase. Adding the 
 same faceting parameter for numbers which already exists for dates should fix 
 this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2011) Solr should get it's temp dir like lucene - first checking the tempDir sys prop

2010-07-22 Thread Mark Miller (JIRA)
Solr should get it's temp dir like lucene - first checking the tempDir sys prop
---

 Key: SOLR-2011
 URL: https://issues.apache.org/jira/browse/SOLR-2011
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: Next




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-07-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891334#action_12891334
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

{quote}I think we should just remove the doc stores{quote}

Right, I think we should remove sharing doc stores between
segments. And in general, RT apps will likely not want to use
doc stores if they are performing numerous updates and/or
deletes. We can explicitly state this in the javadocs.

I'm thinking we could explore efficient deleted docs as sequence
ids in a different issue, specifically storing them in a short[]
and wrapping around.  

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2012) stats component, min/max on a field with no values

2010-07-22 Thread Jonathan Rochkind (JIRA)
stats component, min/max on a field with no values
--

 Key: SOLR-2012
 URL: https://issues.apache.org/jira/browse/SOLR-2012
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Jonathan Rochkind


: 
: When I use the stats component on a field that has no values in the result set
: (ie, stats.missing == rowCount), I'd expect that 'min'and 'max' would be
: blank.
: 
: Instead, they seem to be the smallest and largest float values or something,
: min = 1.7976931348623157E308, max = 4.9E-324 .
: 
: Is this a bug?

off the top of my head it sounds like it ... would you mind opening an 
issue in Jira please?

-Hoss
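
For what it's worth, the reported numbers are exactly Double.MAX_VALUE and
Double.MIN_VALUE, the classic symptom of min/max accumulators that are seeded
with sentinel values and never updated; a minimal illustration of that pattern
(not the actual StatsComponent code):

// Minimal illustration of the reported symptom: with no values for the field,
// the sentinels leak out instead of being reported as "missing".
public class StatsMinMaxDemo {
  public static void main(String[] args) {
    double min = Double.MAX_VALUE;   // 1.7976931348623157E308
    double max = Double.MIN_VALUE;   // 4.9E-324 (smallest positive double, not the most negative)

    double[] values = {};            // no values for the field in the result set
    for (double v : values) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }

    System.out.println("min=" + min + " max=" + max);
  }
}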

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2555) Remove shared doc stores

2010-07-22 Thread Michael Busch (JIRA)
Remove shared doc stores


 Key: LUCENE-2555
 URL: https://issues.apache.org/jira/browse/LUCENE-2555
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch


With per-thread DocumentsWriters sharing doc stores across segments doesn't 
make much sense anymore.

See also LUCENE-2324.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-2009) Contrib ant test targets do not respect sys props testcase,testpackage,and testpackageroot

2010-07-22 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-2009.
---

Resolution: Fixed

more to do here later, but this initial fix is in.

 Contrib ant test targets do not respect sys props testcase,testpackage,and 
 testpackageroot
 --

 Key: SOLR-2009
 URL: https://issues.apache.org/jira/browse/SOLR-2009
 Project: Solr
  Issue Type: Bug
  Components: Build
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: Next

 Attachments: SOLR-2009.patch


 Very annoying using these props with core tests unless you use the junit 
 target rather than test. Also would be nice if they worked regardless for 
 future dev.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2554) preflex codec doesn't order terms correctly

2010-07-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891364#action_12891364
 ] 

Robert Muir commented on LUCENE-2554:
-

the perf issues here are really from our contrived tests... it's good to use 
_TestUtil.randomUnicodeString, but it gives you the impression there is 
something wrong with this dance and there really isn't.

I added _TestUtil.randomRealisticUnicodeString in r966878; you can swap this 
into some of these slow tests and see it's definitely the problem.
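
For example, a small standalone demo of the swap (only the two _TestUtil
generators mentioned here are assumed; the seed and loop are arbitrary):

import java.util.Random;
import org.apache.lucene.util._TestUtil;

public class RandomStringSwapDemo {
  public static void main(String[] args) {
    Random random = new Random(42);
    for (int i = 0; i < 5; i++) {
      // was: _TestUtil.randomUnicodeString(random) -- arbitrary code points,
      // which makes the preflex surrogate dance look far worse than it is.
      String term = _TestUtil.randomRealisticUnicodeString(random);
      System.out.println(term);
    }
  }
}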


 preflex codec doesn't order terms correctly
 ---

 Key: LUCENE-2554
 URL: https://issues.apache.org/jira/browse/LUCENE-2554
 Project: Lucene - Java
  Issue Type: Test
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2554.patch


 The surrogate dance in the preflex codec (which must dynamically remap terms 
 from UTF16 order to unicode code point order) is buggy.
 To better test it, I want to add a test-only codec, preflexrw, that is able 
 to write indices in the pre-flex format.  Then we should also fix tests to 
 randomly pick codecs (including preflexrw) so we better test all of our 
 codecs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891369#action_12891369
 ] 

Sebb commented on SOLR-1999:


See:

http://www.apache.org/dev/release.html#what

Do not include any links on the project website that might encourage 
non-developers to download and use nightly builds, snapshots, release 
candidates, or any other similar package.

 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891376#action_12891376
 ] 

Hoss Man commented on SOLR-1999:



Developers are members of the general public -- any page a developer can see 
can be seen by anybody else as well.

While i agree the previous link was bad, i quite frankly don't understand your 
concern with the current situation

HEADER.html doesn't even mention nightly builds -- it directs people interested 
in (unofficial, unreleased) source code for Solr to [a wiki 
page|http://wiki.apache.org/solr/HackingSolr] which makes it very clear its 
audience is developers, and which has info on how to check out the development 
branches.

Admittedly that HackingSolr page does mention that we have a nightly build 
system, so a non-developer might click the link about hacking on the source and 
then get interested in the nightly builds -- but it doesn't even link directly 
to any builds -- instead it links to a [hudson 
page|http://hudson.zones.apache.org/hudson/view/Lucene/] where there is a list 
of branches that have builds, and if you click on one of those you can get a 
[branch build status 
page|http://hudson.zones.apache.org/hudson/view/Lucene/job/Solr-trunk/] and 
from there you can scroll all the way to the bottom to click on [an artifacts 
link|http://hudson.zones.apache.org/hudson/view/Lucene/job/Solr-trunk/lastSuccessfulBuild/artifact/]
 and from *there* you can actually click on a link to download something that 
could be called a nightly build.

That seems like it fits the definition of developer pages, not the pages 
intended for all users.

I'm hard pressed to imagine a way to make it harder for non-developers to find 
those builds while still linking to those hudson pages for developers

 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891378#action_12891378
 ] 

Robert Muir commented on SOLR-1999:
---

bq. Do not include any links on the project website that might encourage 
non-developers to download and use nightly builds, snapshots, release 
candidates, or any other similar package.

Personally I think this is a load of crap. How should we get quality releases 
without encouraging users to test things before it's officially released?

Getting feedback from users that are willing to deal with trunk and patches, 
and letting things bake in trunk is really valuable, and I think it's also a 
step towards encouraging them to participate in development.


 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891379#action_12891379
 ] 

Sebb commented on SOLR-1999:


The download pages are intended for all users of the software, and must only 
include released (voted on) software.

It is not appropriate to mention non-released code on the official page for 
releases.

 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891384#action_12891384
 ] 

Hoss Man commented on SOLR-1999:


bq. It is not appropriate to mention non-released code on the official page for 
releases.

why?

i can (moderately) understand that we should not encourage non-developers to 
use unofficial versions, and i recognize that linking directly to nightlys from 
the official release page is a very bad idea .. but how far down the rabbit 
hole do we have to go to avoid links to links to links to links for nightly 
builds?

Even following the letter of the policy you linked to, i don't see how 
anyone could possibly construe that we are encourag(ing) non-developers to 
download and use nightly builds, snapshots, release candidates, or any other 
similar package 


 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2011) Solr should get it's temp dir like lucene - first checking the tempDir sys prop

2010-07-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2011:
--

Attachment: SOLR-2011.patch

attached is an initial patch... (it only fixes solr core, but I think we can 
fix contrib build.xml's the same way).

One benefit is that since temp stuff goes in build/ like lucene: on windows, 
'ant clean' 
will remove spellchecker indexes or other leftover stuff that couldn't be 
deleted in tearDown(), 
rather than littering your system temp directory.
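
The Lucene-style lookup being referred to is roughly this pattern -- a sketch of
the idea, not the exact patch (the class and method names are made up):

import java.io.File;

public final class TempDirUtil {
  private TempDirUtil() {}

  public static File getTempDir(String description) {
    // Prefer the tempDir system property (set by the build to something under
    // build/), falling back to the JVM's default temp directory.
    String base = System.getProperty("tempDir", System.getProperty("java.io.tmpdir"));
    File dir = new File(base, description + "." + System.currentTimeMillis());
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new RuntimeException("could not create temp dir: " + dir);
    }
    return dir;
  }
}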


 Solr should get it's temp dir like lucene - first checking the tempDir sys 
 prop
 ---

 Key: SOLR-2011
 URL: https://issues.apache.org/jira/browse/SOLR-2011
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: Next

 Attachments: SOLR-2011.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-22 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891414#action_12891414
 ] 

Michael Busch commented on LUCENE-2555:
---

What shall we do about index backward-compatibility?

I guess 4.0 has to be able to read shared doc stores?  So a lot of that code we 
can't remove? :(

 Remove shared doc stores
 

 Key: LUCENE-2555
 URL: https://issues.apache.org/jira/browse/LUCENE-2555
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch


 With per-thread DocumentsWriters sharing doc stores across segments doesn't 
 make much sense anymore.
 See also LUCENE-2324.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891422#action_12891422
 ] 

Jason Rutherglen commented on LUCENE-2555:
--

Maybe we should break backwards-compatibility for the RT branch?  Or just ship 
an RT specific JAR to keep things simple?

 Remove shared doc stores
 

 Key: LUCENE-2555
 URL: https://issues.apache.org/jira/browse/LUCENE-2555
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch


 With per-thread DocumentsWriters sharing doc stores across segments doesn't 
 make much sense anymore.
 See also LUCENE-2324.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2554) preflex codec doesn't order terms correctly

2010-07-22 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2554:
---

Attachment: LUCENE-2554.patch

Fixed the test failures -- all tests should pass.

 preflex codec doesn't order terms correctly
 ---

 Key: LUCENE-2554
 URL: https://issues.apache.org/jira/browse/LUCENE-2554
 Project: Lucene - Java
  Issue Type: Test
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2554.patch, LUCENE-2554.patch


 The surrogate dance in the preflex codec (which must dynamically remap terms 
 from UTF16 order to unicode code point order) is buggy.
 To better test it, I want to add a test-only codec, preflexrw, that is able 
 to write indices in the pre-flex format.  Then we should also fix tests to 
 randomly pick codecs (including preflexrw) so we better test all of our 
 codecs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1999) Download HEADER should not have pointer to nightly builds

2010-07-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891440#action_12891440
 ] 

Yonik Seeley commented on SOLR-1999:


I've been around the ASF long enough now to know that what seems like iron clad 
policy, often isn't.
It's often just someone editing a page to reflect what they think should be the 
policy,  and no one else complaining too much - even in cases when there 
clearly was no consensus.

Related to this issue, I remember the last big thread back in '06 on the infra 
list.  And in that case too, it was a single individual that took it upon 
themselves to add the text you now see (and there certainly was no previous 
consensus or even discussion on the text added).

Trying to draw sharp lines between developers and users is a lost cause... 
lucene and solr are for developers themselves and it's one big continuum 
between user and developer.  Having people use nightly builds is very important 
for lucene/solr development.  Having a pointer to developer resources from 
*anywhere* should be fine.

The *only* important point I see is to clearly communicate that a nightly build 
is not an official ASF release.

 Download HEADER should not have pointer to nightly builds
 -

 Key: SOLR-1999
 URL: https://issues.apache.org/jira/browse/SOLR-1999
 Project: Solr
  Issue Type: Bug
 Environment: http://www.apache.org/dist/lucene/solr/HEADER.html
Reporter: Sebb
Assignee: Hoss Man

 The file HEADER.html should not have a pointer to nightly builds.
 Nightly builds should be reserved for developers, and not advertised to the 
 general public.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Hudson: Lucene-trunk #1246

2010-07-22 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/1246/changes

Changes:

[rmuir] add randomRealisticUnicodeString, all chars in the same unicode block

[uschindler] As BytesRef has now native order use them in numeric tests. The 
contents are raw byte[] and no strings, it should compare native

[rmuir] add random prefixquerytest (hopefully easy to debug preflex issues with)

[rmuir] fix some bytesref abuse in these tests

--
[...truncated 2710 lines...]
[junit] Testsuite: org.apache.lucene.search.TestPrefixRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 23.153 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestQueryTermVector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.019 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestQueryWrapperFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.009 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestRegexpQuery
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.028 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestRegexpRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 107.362 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestRegexpRandom2
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 35.416 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestScoreCachingWrappingScorer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.02 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestScorerPerf
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.572 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSetNorm
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.006 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSimilarity
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.008 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSimpleExplanations
[junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 2.778 sec
[junit] 
[junit] Testsuite: 
org.apache.lucene.search.TestSimpleExplanationsOfNonMatches
[junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 0.133 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSloppyPhraseQuery
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.253 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSort
[junit] Tests run: 24, Failures: 0, Errors: 0, Time elapsed: 6.451 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSpanQueryFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.012 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermRangeFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 18.939 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermRangeQuery
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.047 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermScorer
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.011 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermVectors
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.321 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestThreadSafe
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.786 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 1.123 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.013 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.004 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestWildcard
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.038 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestWildcardRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 26.908 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 7.058 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestDocValues
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.007 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery
[junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.21 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestOrdValues
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 

[jira] Created: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Adriano Crestani (JIRA)
CharTermAttribute cloning memory consumption


 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Priority: Minor
 Fix For: 3.1


The memory consumption problem with cloning a CharTermAttributeImpl object was 
raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Adriano Crestani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano Crestani updated LUCENE-2556:
-

Attachment: CharTermAttributeMemoryConsumptionDemo.java

This java application demonstrates how much memory 
CharTermAttributeImpl.clone() might consume in some scenarios.

 CharTermAttribute cloning memory consumption
 

 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Priority: Minor
 Fix For: 3.1

 Attachments: CharTermAttributeMemoryConsumptionDemo.java


 The memory consumption problem with cloning a CharTermAttributeImpl object 
 was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Adriano Crestani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano Crestani updated LUCENE-2556:
-

Attachment: lucene_2556_adriano_crestani_07_23_2010.patch

This patch optimizes the cloning of the CharTermAttributeImpl internal buffer. 
It keeps using clone() to clone the internal buffer when 
CharTermAttribute.length() is at least 150 and at least 75% of the 
internal buffer length; otherwise, it uses System.arraycopy(...) to clone it 
using CharTermAttribute.length() as the new internal buffer size.

It performs this optimization because in some scenarios, like cloning long 
arrays, clone() is usually faster than System.arraycopy(...). 
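
A minimal standalone illustration of that trade-off (illustrative only; the
buffer size, term and class name are made up, and the real change lives in
CharTermAttributeImpl.clone()):

public class CloneVsArraycopyDemo {
  public static void main(String[] args) {
    // The attribute's internal buffer is usually oversized relative to length(),
    // so a full clone() copies (and retains) unused capacity.
    char[] termBuffer = new char[64];
    int termLength = 5;
    "hello".getChars(0, termLength, termBuffer, 0);

    // Full-buffer clone: simple and fast, but the copy keeps all 64 chars alive.
    char[] clonedFull = termBuffer.clone();

    // Length-sized copy: allocates only what the term needs (5 chars here),
    // at the cost of the branch and a System.arraycopy call.
    char[] clonedTight = new char[termLength];
    System.arraycopy(termBuffer, 0, clonedTight, 0, termLength);

    System.out.println(clonedFull.length + " vs " + clonedTight.length);
  }
}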

 CharTermAttribute cloning memory consumption
 

 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Priority: Minor
 Fix For: 3.1

 Attachments: CharTermAttributeMemoryConsumptionDemo.java, 
 lucene_2556_adriano_crestani_07_23_2010.patch


 The memory consumption problem with cloning a CharTermAttributeImpl object 
 was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2556:
--

Attachment: LUCENE-2556.patch

Here is the patch; I see no problem with applying it to 3.x and trunk.

 CharTermAttribute cloning memory consumption
 

 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Priority: Minor
 Fix For: 3.1

 Attachments: CharTermAttributeMemoryConsumptionDemo.java, 
 LUCENE-2556.patch, lucene_2556_adriano_crestani_07_23_2010.patch


 The memory consumption problem with cloning a CharTermAttributeImpl object 
 was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-2556:
-

Assignee: Uwe Schindler

 CharTermAttribute cloning memory consumption
 

 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.1

 Attachments: CharTermAttributeMemoryConsumptionDemo.java, 
 LUCENE-2556.patch, lucene_2556_adriano_crestani_07_23_2010.patch


 The memory consumption problem with cloning a CharTermAttributeImpl object 
 was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2556) CharTermAttribute cloning memory consumption

2010-07-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891481#action_12891481
 ] 

Uwe Schindler commented on LUCENE-2556:
---

{quote}
This patch optimizes the cloning of the CharTermAttributeImpl internal buffer. 
It keeps using clone() to clone the internal buffer when 
CharTermAttribute.length() is at least 150 and at least 75% of the internal 
buffer length; otherwise, it uses System.arraycopy(...) to clone it using 
CharTermAttribute.length() as the new internal buffer size. 
It performs this optimization because in some scenarios, like cloning long 
arrays, clone() is usually faster than System.arraycopy(...). 
{quote}

Haven't seen your patch yet. I don't know if the two extra calculations justify 
the branching, because terms are mostly short...

If we take your patch, the allocations should in all cases be done with 
ArrayUtil.oversize() to be consistent with the allocation strategy of the rest 
of CTA.

 CharTermAttribute cloning memory consumption
 

 Key: LUCENE-2556
 URL: https://issues.apache.org/jira/browse/LUCENE-2556
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Adriano Crestani
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.1

 Attachments: CharTermAttributeMemoryConsumptionDemo.java, 
 LUCENE-2556.patch, lucene_2556_adriano_crestani_07_23_2010.patch


 The memory consumption problem with cloning a CharTermAttributeImpl object 
 was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr debugging using eclipse

2010-07-22 Thread pavan kumar donepudi
HI,

Can anyone help me with the instructions on how to use eclipse for solr
development. I want to configure Solr in eclipse and should be able to debug.

Thanks & Regards,
Pavan


Re: Solr debugging using eclipse

2010-07-22 Thread Li Li
create a web project
copy all source codes to src
copy all jsp to WebContent
configure tomcat with -Dsolr.solr.home=

2010/7/23 pavan kumar donepudi pavan.donep...@gmail.com:
 HI,
 Can anyone help me with the instructions on how to use eclipse for solr
 development. I want to configure Solr in eclipse and should be able to debug.
 Thanks & Regards,
 Pavan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org