from:"Shai Erera \(JIRA\)"

[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857388#action_12857388
]

Shai Erera commented on LUCENE-2396:

Robert, I think this is great! Can we move more analyzers from core here? I
think, however, that a backwards section in CHANGES is important because it
alerts users about those analyzers whose runtime behavior changed. Otherwise,
how would the poor users know that? It doesn't mean you need to maintain back
compat support, but at least alert them when things change.

Even if we eventually decide to remove API bw completely, a section in CHANGES
will still be required to help users upgrade easily.

remove version from contrib/analyzers.
--

Key: LUCENE-2396
URL: https://issues.apache.org/jira/browse/LUCENE-2396
Project: Lucene - Java
Issue Type: Task
Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
Attachments: LUCENE-2396.patch

Contrib/analyzers has no backwards-compatibility policy, so let's remove
Version so the API is consumable.
if you think we shouldn't do this, then instead explicitly state and vote on
what the backwards compatibility policy for contrib/analyzers should be
instead, or move it all to core.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857396#action_12857396
 ] 

Shai Erera commented on LUCENE-2396:


Static? Weren't you against that!? 

But if we remove back compat from analyzers why do we need Version? Or is this 
API bw that we remove?

 remove version from contrib/analyzers.
 --

 Key: LUCENE-2396
 URL: https://issues.apache.org/jira/browse/LUCENE-2396
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2396.patch


 Contrib/analyzers has no backwards-compatibility policy, so let's remove 
 Version so the API is consumable.
 if you think we shouldn't do this, then instead explicitly state and vote on 
 what the backwards compatibility policy for contrib/analyzers should be 
 instead, or move it all to core.


[jira] Created: (LUCENE-2397) SnapshotDeletionPolicy.snapshot() throws NPE if no commits happened

2010-04-15 Thread Shai Erera (JIRA)

SnapshotDeletionPolicy.snapshot() throws NPE if no commits happened
---

 Key: LUCENE-2397
 URL: https://issues.apache.org/jira/browse/LUCENE-2397
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1


SDP throws NPE if no commits occurred and snapshot() was called. I will replace 
it w/ throwing IllegalStateException. I'll also move TestSDP from o.a.l to 
o.a.l.index. I'll post a patch soon.
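
The intended behavior could look roughly like the sketch below. It is not the actual patch: SnapshotSketch, onCommit and the String commit name are hypothetical stand-ins for SnapshotDeletionPolicy and its commit points, used only to show the guard that replaces the NPE.

```java
// Hypothetical stand-in for SnapshotDeletionPolicy; all names are illustrative.
class SnapshotSketch {
    private String lastCommit;   // stays null until the first commit happens
    private String snapshot;

    /** Records the most recent commit (stand-in for the policy's commit callback). */
    synchronized void onCommit(String commitName) {
        lastCommit = commitName;
    }

    /** Fails with a descriptive IllegalStateException instead of an NPE. */
    synchronized String snapshot() {
        if (lastCommit == null) {
            throw new IllegalStateException("no index commits to snapshot");
        }
        snapshot = lastCommit;
        return snapshot;
    }
}
```

With this guard, calling snapshot() before any commit reports the misuse explicitly rather than failing somewhere deeper in the call.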


[jira] Resolved: (LUCENE-2316) Define clear semantics for Directory.fileLength

2010-04-14 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2316.


Lucene Fields: [New, Patch Available]  (was: [New])
 Assignee: Shai Erera
   Resolution: Fixed

Committed revision 933879.

 Define clear semantics for Directory.fileLength
 ---

 Key: LUCENE-2316
 URL: https://issues.apache.org/jira/browse/LUCENE-2316
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2316.patch


 On this thread: 
 http://mail-archives.apache.org/mod_mbox/lucene-java-dev/201003.mbox/%3c126142c1003121525v24499625u1589bbef4c079...@mail.gmail.com%3e
  it was mentioned that Directory's fileLength behavior is not consistent 
 between Directory implementations if the given file name does not exist. 
 FSDirectory returns a 0 length while RAMDirectory throws FNFE.
 The problem is that the semantics of fileLength() are not defined. As 
 proposed in the thread, we'll define the following semantics:
 * Returns the length of the file denoted by <code>name</code> if the file 
 exists. The return value may be anything between 0 and Long.MAX_VALUE.
 * Throws FileNotFoundException if the file does not exist. Note that you can 
 call dir.fileExists(name) if you are not sure whether the file exists or not.
 For backwards we'll create a new method w/ clear semantics. Something like:
 {code}
 /**
  * @deprecated the method will become abstract when #fileLength(name) has
  * been removed.
  */
 public long getFileLength(String name) throws IOException {
   long len = fileLength(name);
   if (len == 0 && !fileExists(name)) {
     throw new FileNotFoundException(name);
   }
   return len;
 }
 {code}
 The first line just calls the current impl. If it throws exception for a 
 non-existing file, we're ok. The second line verifies whether a 0 length is 
 for an existing file or not and throws an exception appropriately.
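
To make the proposed contract concrete, here is a minimal sketch of an implementation that follows it. ToyDirectory is a hypothetical in-memory stand-in, not Lucene's Directory or RAMDirectory; only the fileExists/fileLength names are taken from the discussion.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Toy in-memory directory, used only to illustrate the proposed contract.
class ToyDirectory {
    private final Map<String, byte[]> files = new HashMap<>();

    void writeFile(String name, byte[] content) {
        files.put(name, content.clone());
    }

    boolean fileExists(String name) {
        return files.containsKey(name);
    }

    /** Returns the length of an existing file (possibly 0); throws if it does not exist. */
    long fileLength(String name) throws FileNotFoundException {
        byte[] content = files.get(name);
        if (content == null) {
            throw new FileNotFoundException(name);
        }
        return content.length;
    }
}
```

The point of the contract is that a 0 return value always means "exists and is empty", so callers no longer need a separate fileExists() check to interpret it.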


[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856845#action_12856845
]

Shai Erera commented on LUCENE-2159:

This looks like a nice tool. But all it does is create multiple copies of the
same segment(s) right? So what exactly do you want to test with it? What
worries me is that we'll be multiplying the lexicon, posting lists, statistics
etc., therefore I'm not sure how reliable the tests will be (whatever they
are), except for measuring things related to a large number of segments (like
merge performance). Am I right?

I also think this class better fits in benchmark rather than misc, as it's
really for perf. testing/measurements and not as a generic utility ... You can
create a Task out of it, like ExpandIndexTask, which one can include in an
algorithm.

Tool to expand the index for perf/stress testing.
-

Key: LUCENE-2159
URL: https://issues.apache.org/jira/browse/LUCENE-2159
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Affects Versions: 3.0
Reporter: John Wang
Attachments: ExpandIndex.java

Sometimes it is useful to take a small-ish index and expand it into a large
index with K segments for perf/stress testing.
This tool does that. See attached class.


[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856877#action_12856877
 ] 

Shai Erera commented on LUCENE-2159:


bq. I understand having a general performance suite to test regression is a 
good thing. But we found having a more focused test for segmentation and merge 
is important.

Are you saying that because of the benchmark proposal? I still think that an 
ExpandIndexTask will be useful for benchmark and fits better there, than in 
contrib/misc. We can have that task together w/ a predefined .alg for using it 
...

 Tool to expand the index for perf/stress testing.
 -

 Key: LUCENE-2159
 URL: https://issues.apache.org/jira/browse/LUCENE-2159
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 3.0
Reporter: John Wang
 Attachments: ExpandIndex.java


 Sometimes it is useful to take a small-ish index and expand it into a large 
 index with K segments for perf/stress testing. 
 This tool does that. See attached class.


[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856911#action_12856911
]

Shai Erera commented on LUCENE-2159:

Which is fine - I think this would be a neat task to add to benchmark, w/
specific documentation on how to use it and for what purposes. If you can also
write a sample .alg file which e.g. creates a small index and then expands it,
that'd be great.

I've looked at the different PerfTask implementations in benchmark, and I'm
wondering whether we should do the following:
* Create an AddIndexesTask which receives one or more Directories as input and
calls writer.addIndexesNoOptimize
* If one wants, he can add an OptimizeTask call afterwards.
* Write an expandIndex.alg which initially creates an index of size N from one
content source and then calls the AddIndexesTask several times. The .alg file
is meant to be an example as well as people can change it to create bigger or
smaller indexes, use other content sources and switch between RAM/FS
directories.

How's that sound?

Tool to expand the index for perf/stress testing.
-

Sometimes it is useful to take a small-ish index and expand it into a large
index with K segments for perf/stress testing.
This tool does that. See attached class.


[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856917#action_12856917
]

Shai Erera commented on LUCENE-2159:

bq. There is an excellent section on it in LIA2

Indeed !

Ok so to create a task, you just extend PerfTask. You can look under
contrib/benchmark/src/java/o.a.l/benchmark/byTask/tasks for many examples.
OptimizeTask seems relevant here (i.e. it calls an IW API and receives a
parameter).

For writing .alg files, that's SUPER simple, just look under
contrib/benchmark/conf for many existing examples. You can post a patch once
you feel comfortable enough with it and I can help you with any struggles you
run into. Another great source (besides LIA2) on writing .alg files
is the package.html under
contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask.

Tool to expand the index for perf/stress testing.
-

Sometimes it is useful to take a small-ish index and expand it into a large
index with K segments for perf/stress testing.
This tool does that. See attached class.


[jira] Resolved: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-13 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2386.


Resolution: Fixed

Committed revision 933613. (take #2)

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically !) back :).


[jira] Updated: (LUCENE-2316) Define clear semantics for Directory.fileLength

2010-04-13 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2316:
---

Attachment: LUCENE-2316.patch

Patch clarifies the contract, fixes the directories to adhere to it and adds a 
CHANGES entry under the backwards section. All tests pass.

 Define clear semantics for Directory.fileLength
 ---

 Key: LUCENE-2316
 URL: https://issues.apache.org/jira/browse/LUCENE-2316
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2316.patch




[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855870#action_12855870
]

Shai Erera commented on LUCENE-2386:

I'm not sure if we're arguing about the same thing here ... why when I open an
IW on empty Directory I need an empty segment that's created, and from now on
never changed, populated or even read? That just seems wrong to me ... when I
fixed the tests to not rely on the buggy behavior, I noticed several which
count the list of commits (especially the IDP ones) w/ a documentation like 1
for opening + N for committing ...

It just looks weird that when you open IW a commit happens, a set of empty
files are created, but from now on they are never modified, until IDP kicks in,
after the second commit ... it's nothing like initing the Directory to be able
to receive input ..

And I don't know what's the benefit of doing new IW() followed by
IR.open() ... that IR will always see 0 documents, until you call reopen (if
commit happened in between). So what's the convenience here? That your code can
call IR.open once, and from that point forward just 'reopen()'? That seems like
a small advantage to me, really. Maybe what we should do is fix IR.open to return a
null IR in case the directory hasn't been populated w/ anything yet. Then you
can check easily if you should call open() (==null) or reopen (otherwise). Or
create a blank stub of IR which emulates an empty Dir, and when reopen is
called works well (if the Directory is not empty now) ...

BTW, FWIW, Solr's code did not break from this change at all ... it was the
combination of FSDir and NoLF/SingleInstanceLF that broke some tests that used
it ... I don't know how many apps out there are using that combination, but I'd
bet it's small? I use that combination, however in my case an IR is opened only
after a commit signal/event is raised (so I don't check isCurrent often or
attempt to reopen()). What I'm trying to say is that this combination is
dangerous, and the application needs to ensure that only one IW is open at any
given time, and I'm sure such apps are more sophisticated than opening IW and
then IR just for the convenience of it.

IndexWriter commits unnecessarily on fresh Directory

Key: LUCENE-2386
URL: https://issues.apache.org/jira/browse/LUCENE-2386
Project: Lucene - Java
Issue Type: Bug
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.1

Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch,
LUCENE-2386.patch, LUCENE-2386.patch



[jira] Commented: (LUCENE-2316) Define clear semantics for Directory.fileLength

2010-04-12 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855873#action_12855873
 ] 

Shai Erera commented on LUCENE-2316:


Well ... dir.fileLength is also used by SegmentInfos.sizeInBytes to compute the 
size of all the files in the Directory. If we remove fileLength, then SI will 
need to call dir.openInput.length() and then close it? Seems like a lot of work 
to me, for just obtaining the length of the file. So I agree that if you have 
an IndexInput at hand, you should call its length() method rather than 
Dir.fileLength. But otherwise, if you just have a name at hand, a 
dir.fileLength is convenient?

I'm also ok w/ the bw break rather than going through the new/deprecate cycle.

 Define clear semantics for Directory.fileLength
 ---

 Key: LUCENE-2316
 URL: https://issues.apache.org/jira/browse/LUCENE-2316
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Priority: Minor
 Fix For: 3.1




[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855875#action_12855875
]

Shai Erera commented on LUCENE-2392:

Mike - it'll also be great if we can store the length of the document in a
custom way. I think what I'm saying is that if we can open up the norms
computation to custom code - that will do what I want, right? Maybe we can have
a class like DocLengthProvider which apps can plug in if they want to customize
how that length is computed. Wherever we write the norms, we'll call that impl,
which by default will do what Lucene does today?
I think though that it's not a field-level setting, but an IW one?

Enable flexible scoring
---

Key: LUCENE-2392
URL: https://issues.apache.org/jira/browse/LUCENE-2392
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 3.1

Attachments: LUCENE-2392.patch

This is a first step (nowhere near committable!), implementing the
design iterated to in the recent "Baby steps towards making Lucene's
scoring more flexible" java-dev thread.
The idea is (if you turn it on for your Field; it's off by default) to
store full stats in the index, into a new _X.sts file, per doc (X
field) in the index.
And then have FieldSimilarityProvider impls that compute doc's boost
bytes (norms) from these stats.
The patch is able to index the stats, merge them when segments are
merged, and provides an iterator-only API. It also has starting point
for per-field Sims that use the stats iterator API to compute boost
bytes. But it's not at all tied into actual searching! There's still
tons left to do, eg, how does one configure via Field/FieldType which
stats one wants indexed.
All tests pass, and I added one new TestStats unit test.
The stats I record now are:
- field's boost
- field's unique term count (a b c a a b -- 3)
- field's total term count (a b c a a b -- 6)
- total term count per-term (sum of total term count for all docs
that have this term)
Still need at least the total term count for each field.
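
As a worked illustration of the two per-field counts listed above, the following minimal sketch computes them for the example text "a b c a a b". FieldTermCounts and its methods are hypothetical names, and whitespace splitting is only an assumed stand-in for a real analyzer; this is not the code in the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: whitespace tokenization stands in for a real analyzer.
class FieldTermCounts {
    private static String[] tokens(String fieldText) {
        String trimmed = fieldText.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Total term count: every occurrence is counted. */
    static int totalTermCount(String fieldText) {
        return tokens(fieldText).length;
    }

    /** Unique term count: each distinct term is counted once. */
    static int uniqueTermCount(String fieldText) {
        Set<String> unique = new HashSet<>(Arrays.asList(tokens(fieldText)));
        return unique.size();
    }
}
```

For "a b c a a b" the unique count is 3 and the total count is 6, matching the figures in the list.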


[jira] Commented: (LUCENE-2373) Change StandardTermsDictWriter to work with streaming and append-only filesystems

2010-04-12 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855877#action_12855877
 ] 

Shai Erera commented on LUCENE-2373:


I'd rather not count on file length as well ... so a put/getTermDictSize method 
on Codec will allow one to implement it however one wants, if running on HDFS 
for example?

 Change StandardTermsDictWriter to work with streaming and append-only 
 filesystems
 -

 Key: LUCENE-2373
 URL: https://issues.apache.org/jira/browse/LUCENE-2373
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Andrzej Bialecki 
 Fix For: 3.1


 Since early 2.x times Lucene used a skip/seek/write trick to patch the length 
 of the terms dict into a place near the start of the output data file. This 
 however made it impossible to use Lucene with append-only filesystems such as 
 HDFS.
 In the post-flex trunk the following code in StandardTermsDictWriter 
 initiates this:
 {code}
 // Count indexed fields up front
 CodecUtil.writeHeader(out, CODEC_NAME, VERSION_CURRENT); 
 out.writeLong(0); // leave space for end index pointer
 {code}
 and completes this in close():
 {code}
   out.seek(CodecUtil.headerLength(CODEC_NAME));
   out.writeLong(dirStart);
 {code}
 I propose to change this layout so that this pointer is stored simply at the 
 end of the file. It's always 8 bytes long, and we know the final length of 
 the file from Directory, so it's a single additional seek(length - 8) to read 
 it, which is not much considering the benefits.
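
The proposed layout can be sketched with plain java.io, independent of Lucene. This is only an illustration under assumptions: TrailerPointerSketch, the body bytes and the UTF string standing in for the terms dict directory are all hypothetical; the idea shown is just "write sequentially, put the pointer in the last 8 bytes, read it back with seek(length - 8)".

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative append-only layout: [body][directory][8-byte pointer to directory].
class TrailerPointerSketch {
    /** Writes strictly sequentially; no seek back to patch a header. */
    static void write(File file, byte[] body, String directory) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            out.write(body);
            long dirStart = out.size();   // bytes written so far (int counter, fine for a sketch)
            out.writeUTF(directory);
            out.writeLong(dirStart);      // trailer: always the last 8 bytes
        }
    }

    /** Reads the pointer from the end of the file, then jumps to the directory. */
    static String readDirectory(File file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            in.seek(in.length() - 8);
            long dirStart = in.readLong();
            in.seek(dirStart);
            return in.readUTF();
        }
    }
}
```

The reader pays one extra seek, while the writer never has to revisit bytes it has already emitted, which is the property an append-only filesystem needs.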


[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855892#action_12855892
]

Shai Erera commented on LUCENE-2386:

bq. what is the proper way (after this fix) to open an IR over possibly-empty
directory?

You can simply call commit() immediately after you open IW. If that's what you
need then it will work for you.

You're right that if I add docs, delete them and then commit, I'll get an empty
segment. The same is true if you do new IW() and then iw.close() w/ no
addDocument in between. The point here was that we should not create a commit
unless the user
has specifically asked for it. Calling close() means asking for a commit, per
close semantics and contract. But if the app called new IW, add docs and
crashed in the middle, the Directory will still remain empty ... which is sort
of what, IMO, should happen.

I agree it's a matter of perspective. I think that when autoCommit was removed,
so should have been this code. I don't know if it was left behind for a good
reason, or simply because when someone tried to do it, he found out it's not
that simple (like I have :)).

IndexWriter commits unnecessarily on fresh Directory

Key: LUCENE-2386
URL: https://issues.apache.org/jira/browse/LUCENE-2386
Project: Lucene - Java
Issue Type: Bug
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.1

Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch,
LUCENE-2386.patch, LUCENE-2386.patch


[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855913#action_12855913
]

Shai Erera commented on LUCENE-2392:

I'd like to withdraw my request from above. I misunderstood that the stats I
need are stored per-field per-doc. So that will allow me to compute the
docLength as I want.

Enable flexible scoring
---

Attachments: LUCENE-2392.patch


[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855924#action_12855924
]

Shai Erera commented on LUCENE-2386:

I don't think that people need to write that emptiness-detection-then-commit
code ... if they care, they can simply immediately call commit() after they
open IW.

bq. Isn't opening IW with CREATE* mode called specifically asking for?

It depends on how you interpret the mode ... for example, you cannot pass
OpenMode.APPEND for an empty Directory, because IW throws an exception. The
modes are just meant to tell IW how to behave:
* APPEND - I know there is an index in the Directory, and I'd like to append to
it.
* CREATE - I don't care if there is an index in the Directory -- create a new
one, zeroing out all segments.
* CREATE_OR_APPEND - If there is an index, open it, otherwise create a new one.
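
The three modes above can be modeled roughly as follows. This is a toy model, not IndexWriter's code: OpenModeSketch and startsFresh are hypothetical names, and IllegalStateException is only a stand-in for whatever exception IW actually throws on APPEND over an empty Directory.

```java
// Toy model of the three open modes; names and the exception type are illustrative.
class OpenModeSketch {
    enum OpenMode { APPEND, CREATE, CREATE_OR_APPEND }

    /**
     * Returns true if the writer starts from an empty index,
     * false if it continues from the existing one.
     */
    static boolean startsFresh(OpenMode mode, boolean indexExists) {
        switch (mode) {
            case APPEND:
                if (!indexExists) {
                    throw new IllegalStateException("no index to append to");
                }
                return false;
            case CREATE:
                return true;          // existing segments are ignored
            case CREATE_OR_APPEND:
                return !indexExists;  // create only when nothing is there
            default:
                throw new AssertionError(mode);
        }
    }
}
```

Note that nothing in this model says when a commit happens; the mode only decides what the writer starts from.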

So if you pass CREATE on an already populated index, IW doesn't do the implicit
commit until you call commit() yourself. But if you pass CREATE on an empty
index, IW suddenly calls commit()? That's an inconsistency whose only purpose is
to allow you to open an IR immediately after the new IW() call, regardless of
what was there. And if you open that IR, then if the index was populated you see
the previous set of documents, but if it wasn't you see nothing, even though you
meant to say "override what's there"?

I've checked what FileOutputStream does, using the following code:
{code}
File file = new File("d:/temp/tmpfile");
FileOutputStream fos = new FileOutputStream(file);
fos.write(3);
fos.close();

fos = new FileOutputStream(file);
FileInputStream fis = new FileInputStream(file);
System.out.println(fis.read());
{code}

* Second line creates an empty file immediately, not waiting for close() or
flush() -- which resembles the behavior that you're suggesting we should keep
w/ IW (which is today's behavior).
* Fourth line closes the file, flushing and writing the content.
* Fifth line *recreates* the file, empty, again w/o calling close. So it zeros
out the file content immediately, even before you write a single byte to it.
* Sixth+Seventh lines prove it by attempting to read from the file, and the
output printed is -1.
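
For anyone who wants to reproduce the check, here is a self-contained variant of the snippet above. It is only a sketch: the class name TruncateOnOpenDemo and the use of a temp file instead of d:/temp/tmpfile are my own choices, not part of the original experiment.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class TruncateOnOpenDemo {

  /** Writes one byte, reopens the file for writing, and returns what read() sees. */
  static int readAfterReopen() {
    try {
      // Assumption: a temp file stands in for the fixed path used in the comment.
      File file = File.createTempFile("tmpfile", ".bin");
      file.deleteOnExit();

      // Write a single byte and close, so the content reaches the file.
      try (FileOutputStream fos = new FileOutputStream(file)) {
        fos.write(3);
      }

      // Reopening without append mode truncates the file right away,
      // before anything is written and before close() is called.
      try (FileOutputStream reopened = new FileOutputStream(file);
           FileInputStream fis = new FileInputStream(file)) {
        return fis.read(); // -1 signals end of stream, i.e. the file is empty
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(readAfterReopen());
  }
}
```

The second FileOutputStream is deliberately left unused: opening it is what empties the file.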

I've wrapped the FOS w/ a BufferedOS and the behavior is still the same. What
I'm trying to show is that we don't fully adhere to the CREATE mode, and
rightfully so if you ask me - we shouldn't zero out the segments until the
application calls commit(). But we choose to adhere differently to the CREATE*
mode if the index is already populated. That's inconsistent behavior, at least
from my perspective. It's also harder to explain and document, e.g. "you should
call commit() if you used CREATE, in case you want to zero out everything
immediately, and the Directory is not empty, but you don't need to call
commit() if the directory was empty, Lucene will do it for you". So how will the
app know whether it should call commit()? It will need to write a sort of
emptiness-detection-then-commit?

I am willing to consider the following semantics:
* APPEND - assumes an index exists and opens it.
* CREATE - zeros out everything that's in the directory *immediately*, and also
prepares an empty directory.
* CREATE_OR_APPEND - either loads an existing index, or is able to work on the
empty directory. No implicit commit is happening by IW if the index does not
exist.

But I think CREATE is too dangerous, and so I prefer to stick w/ the proposed
change to the patch so far -- if you open an index in CREATE*, you should call
commit before you can read it. That will adhere to the semantics of what the
application wanted, whether it meant to zero out an existing Directory, or
create a new one from scratch.
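
To make the proposed rule concrete, here is a toy model of it. This is not Lucene code: ToyDirectory, ToyWriter and the helper methods are hypothetical names, and the "index" is just a list of strings. It only illustrates that with no implicit commit, CREATE behaves the same on an empty and on a populated directory, and nothing changes for readers until commit() is called.

```java
import java.util.ArrayList;
import java.util.List;

class ToyOpenModeDemo {

  enum Mode { APPEND, CREATE, CREATE_OR_APPEND }

  /** Toy directory: holds only the last committed state. */
  static class ToyDirectory {
    List<String> committed; // null means "no index here yet"
  }

  /** Toy writer: never commits implicitly, whatever the mode. */
  static class ToyWriter {
    private final ToyDirectory dir;
    private final List<String> pending;

    ToyWriter(ToyDirectory dir, Mode mode) {
      if (mode == Mode.APPEND && dir.committed == null) {
        throw new IllegalStateException("APPEND requires an existing index");
      }
      this.dir = dir;
      boolean startEmpty = mode == Mode.CREATE || dir.committed == null;
      this.pending = startEmpty
          ? new ArrayList<String>()
          : new ArrayList<String>(dir.committed);
    }

    void add(String doc) { pending.add(doc); }

    /** Only here does the directory change for readers. */
    void commit() { dir.committed = new ArrayList<String>(pending); }
  }

  /** What a reader sees after CREATE on a directory that already holds one doc. */
  static List<String> visibleAfterCreateOnPopulated(boolean callCommit) {
    ToyDirectory dir = new ToyDirectory();
    dir.committed = new ArrayList<String>();
    dir.committed.add("doc1");
    ToyWriter writer = new ToyWriter(dir, Mode.CREATE);
    if (callCommit) {
      writer.commit();
    }
    return dir.committed;
  }

  /** Whether a reader can open anything after CREATE on an empty directory. */
  static boolean readableAfterCreateOnEmpty(boolean callCommit) {
    ToyDirectory dir = new ToyDirectory();
    ToyWriter writer = new ToyWriter(dir, Mode.CREATE);
    if (callCommit) {
      writer.commit();
    }
    return dir.committed != null;
  }

  public static void main(String[] args) {
    System.out.println(visibleAfterCreateOnPopulated(false));
    System.out.println(visibleAfterCreateOnPopulated(true));
    System.out.println(readableAfterCreateOnEmpty(false));
    System.out.println(readableAfterCreateOnEmpty(true));
  }
}
```

In this model the application never needs emptiness detection: it calls commit() whenever it wants its CREATE to become visible.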

IndexWriter commits unnecessarily on fresh Directory

Key: LUCENE-2386
URL: https://issues.apache.org/jira/browse/LUCENE-2386
Project: Lucene - Java
Issue Type: Bug
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.1

Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch,
LUCENE-2386.patch, LUCENE-2386.patch

[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856063#action_12856063
]

Shai Erera commented on LUCENE-2386:

So just call new IW(), then rollback and ensure dir.listAll() returns an
empty list? Or also index stuff, making sure a flush occurs and then rollback?
I'm not sure that the latter is related to that issue ...

IndexWriter commits unnecessarily on fresh Directory

Key: LUCENE-2386
URL: https://issues.apache.org/jira/browse/LUCENE-2386
Project: Lucene - Java
Issue Type: Bug
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.1

Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch,
LUCENE-2386.patch, LUCENE-2386.patch


[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

Patch includes the proposed test in TestIndexWriter. I think this is ready for 
commit, if there are no more objections.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically !) back :).


[jira] Resolved: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2386.


Lucene Fields: [New, Patch Available]  (was: [New])
   Resolution: Fixed

Committed revision 932868.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch



[jira] Commented: (LUCENE-1709) Parallelize Tests

2010-04-11 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855713#action_12855713
 ] 

Shai Erera commented on LUCENE-1709:


Committed revision 932878 with the following:
# benchmark tests force sequential run
# threadsPerProcessor defaults to 1 and can be overridden by 
-DthreadsPerProcessor=value
# A CHANGES entry

 Parallelize Tests
 -

 Key: LUCENE-1709
 URL: https://issues.apache.org/jira/browse/LUCENE-1709
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-1709-2.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, runLuceneTests.py

   Original Estimate: 48h
  Remaining Estimate: 48h

 The Lucene tests can be parallelized to make for a faster testing system.  
 This task from ANT can be used: 
 http://ant.apache.org/manual/CoreTasks/parallel.html
 Previous discussion: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/69669
 Notes from Mike M.:
 {quote}
 I'd love to see a clean solution here (the tests are embarrassingly
 parallelizable, and we all have machines with good concurrency these
 days)... I have a rather hacked up solution now, that uses
 -Dtestpackage=XXX to split the tests up.
 Ideally I would be able to say use N threads and it'd do the right
 thing... like the -j flag to make.
 {quote}


[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855727#action_12855727
 ] 

Shai Erera commented on LUCENE-2386:


Committed revision 932917 for the revert.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch



[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

Fixes IndexFileDeleter, adds a proper test to TestIndexWriter. Haven't run all
the tests yet though, but the added test passes now with the fix.

IndexWriter commits unnecessarily on fresh Directory

Key: LUCENE-2386
URL: https://issues.apache.org/jira/browse/LUCENE-2386
Project: Lucene - Java
Issue Type: Bug
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.1

Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch,
LUCENE-2386.patch


[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855767#action_12855767
]

Shai Erera commented on LUCENE-2386:

About IndexReader.listCommits ... the javadocs state this: "There must be at
least one commit in the Directory, else this method throws
java.io.IOException". So I'll change it to reflect that the right exception type
is thrown (IndexNotFoundException) and revert the change to
DirReader.listCommits which returns an empty list.

IndexWriter commits unnecessarily on fresh Directory

Key: LUCENE-2386
URL: https://issues.apache.org/jira/browse/LUCENE-2386
Project: Lucene - Java
Issue Type: Bug
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.1

Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch,
LUCENE-2386.patch


[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

Patch w/ proposed fixes. All tests pass, including Solr's :).

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch



[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-10 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

Patch updated to latest rev. + the proposed name change -- 
IndexNotFoundException. All tests pass. I plan to commit this later today.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch



[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855344#action_12855344
 ] 

Shai Erera commented on LUCENE-2386:


Ok I've added the following to DirReader:

{code}
try {
  latest.read(dir, codecs);
} catch (FileNotFoundException e) {
  if (e.getMessage().startsWith("no segments* file found in")) {
    // Might be that the Directory is empty, in which case just return an
    // empty collection.
    return Collections.emptyList();
  } else {
    throw e;
  }
}
{code}

And now that test passes.

I'll continue discovering tests that fail ... probably backwards will have its 
share too :).

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855369#action_12855369
 ] 

Shai Erera commented on LUCENE-2386:


I already did that ... just didn't post back. Created 
SegmentsFileNotFoundException.
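
A rough illustration of why a dedicated exception type helps: callers can select on the type instead of on the message text. This sketch is not the patch itself. NoSegmentsFileException, readSegments and listCommits below are hypothetical stand-ins, and the "directory" is just a list of file names.

```java
import java.io.FileNotFoundException;
import java.util.Collections;
import java.util.List;

class TypedExceptionDemo {

  /** Hypothetical stand-in for a dedicated "no segments file" exception. */
  static class NoSegmentsFileException extends FileNotFoundException {
    NoSegmentsFileException(String message) {
      super(message);
    }
  }

  /** Hypothetical reader: fails with the dedicated type when nothing is there. */
  static List<String> readSegments(List<String> files) throws FileNotFoundException {
    if (files.isEmpty()) {
      throw new NoSegmentsFileException("no segments* file found");
    }
    return files;
  }

  /** Handles the empty case by exception type, not by inspecting getMessage(). */
  static List<String> listCommits(List<String> files) {
    try {
      return readSegments(files);
    } catch (NoSegmentsFileException e) {
      // Empty directory: report "no commits" instead of failing.
      return Collections.emptyList();
    } catch (FileNotFoundException e) {
      // Any other missing file is still a real error.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(listCommits(Collections.<String>emptyList()));
    System.out.println(listCommits(Collections.singletonList("segments_1")));
  }
}
```

Because the stand-in extends FileNotFoundException, existing catch blocks for the parent type keep working.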

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch



[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-04-09 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855379#action_12855379
 ] 

Shai Erera commented on LUCENE-1879:


I have found such a version ... and it fails too :). At least the one I received.

But never mind that ... as long as we both agree the implementation should 
change. I didn't mean to say anything bad about what you did .. I know the 
limitations you had to work with.

 Parallel incremental indexing
 -

 Key: LUCENE-1879
 URL: https://issues.apache.org/jira/browse/LUCENE-1879
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
 Fix For: 3.1

 Attachments: parallel_incremental_indexing.tar


 A new feature that allows building parallel indexes and keeping them in sync 
 on a docID level, independent of the choice of the MergePolicy/MergeScheduler.
 Find details on the wiki page for this feature:
 http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing 
 Discussion on java-dev:
 http://markmail.org/thread/ql3oxzkob7aqf3jd


[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

Patch fixes all tests as well as changes to IndexWriter, IndexFileDeleter,
DirectoryReader and SegmentInfos.

I'd like to commit this shortly, before all the files get changed by a
malicious other commit :). (kidding of course)

IndexWriter commits unnecessarily on fresh Directory

Key: LUCENE-2386
URL: https://issues.apache.org/jira/browse/LUCENE-2386
Project: Lucene - Java
Issue Type: Bug
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.1

Attachments: LUCENE-2386.patch, LUCENE-2386.patch


[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855457#action_12855457
 ] 

Shai Erera commented on LUCENE-2386:


Ok sounds good. Is there a preferred package for exceptions? Or is o.a.l.index 
ok?

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch



[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854885#action_12854885
]

Shai Erera commented on LUCENE-2074:

Uwe, must this be coupled with that issue? This one has been waiting for a long
time (why? for the JFlex 1.5 release?) and protecting against a huge buffer
allocation can be a real quick and tiny fix. This one also focuses on getting
Unicode 5 to work, which is unrelated to the buffer size. But the buffer size is
not so critical an issue that we need to move fast on it ... so it's your call.
I just thought they are two unrelated problems.

Use a separate JFlex generated Unicode 4 by Java 5 compatible
StandardTokenizer
---

Key: LUCENE-2074
URL: https://issues.apache.org/jira/browse/LUCENE-2074
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 3.1

Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch,
LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch,
LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch,
LUCENE-2074.patch

The current trunk version of StandardTokenizerImpl was generated by Java 1.4
(according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should
regenerate the file.
After regeneration the Tokenizer behaves differently for some characters.
Because of that we should only use the new TokenizerImpl when
Version.LUCENE_30 or LUCENE_31 is used as matchVersion.


[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854887#action_12854887
 ] 

Shai Erera commented on LUCENE-2074:


bq. I plan to commit this soon! 

That's great news !

BTW - what are you going to do w/ the JFlex 1.5 binary? Are you going to check 
it in somewhere? because it hasn't been released last I checked. I'm asking for 
general knowledge, because I know the scripts are downloading it, or rely on it 
to exist somewhere.

In that case, then yes, let's fix it here.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch



[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2010-04-08 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854920#action_12854920
 ] 

Shai Erera commented on LUCENE-1482:


I still think that calling isDebugEnabled is better, because the message 
formatting stuff may do unnecessary things like casting, autoboxing etc. IMO, 
if logging is enabled, evaluating it twice is not a big deal ... it's a simple 
check.
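
A small sketch of the guard pattern I mean. To keep it runnable with only the JDK it uses java.util.logging, so isLoggable(Level.FINE) plays the role of SLF4J's isDebugEnabled(); the class name, the counter and the message are made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardedLoggingDemo {

  private static final Logger logger =
      Logger.getLogger(GuardedLoggingDemo.class.getName());

  /** Counts how often the (potentially costly) message was actually built. */
  static int messagesBuilt = 0;

  /** Stand-in for message formatting that involves concatenation, boxing, etc. */
  static String buildMessage(int docCount, long bytes) {
    messagesBuilt++;
    return "flushed " + docCount + " docs, " + bytes + " bytes";
  }

  /** The guard makes sure the message is only built when it will be logged. */
  static void flush(int docCount, long bytes) {
    if (logger.isLoggable(Level.FINE)) {
      logger.fine(buildMessage(docCount, bytes));
    }
  }

  /** Runs flush() with the logger at the given level; returns how many messages were built. */
  static int buildsAtLevel(Level level) {
    logger.setLevel(level);
    messagesBuilt = 0;
    flush(10, 2048L);
    return messagesBuilt;
  }

  public static void main(String[] args) {
    System.out.println(buildsAtLevel(Level.INFO));
    System.out.println(buildsAtLevel(Level.FINE));
  }
}
```

With debug-level logging off, the only cost left is the level check itself.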

I'm glad someone here thinks logging will be useful though :). I hope there 
will be a quorum here to proceed w/ that.

Note that I also offered to not create any dependency on SLF4J, but rather 
extract infoStream to a static InfoStream class, which will avoid passing it 
around everywhere, and give the flexibility to output stuff from other classes 
which don't have an infoStream at hand.
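
Roughly what I have in mind for the static alternative, as a sketch only. The nested InfoStream class, its set/isEnabled/message methods and the "IW" component tag are assumptions for illustration and do not correspond to an existing Lucene API.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class StaticInfoStreamDemo {

  /** Hypothetical static holder, so classes don't need the stream passed in. */
  static final class InfoStream {
    private static volatile PrintStream out; // null means output is disabled

    private InfoStream() {}

    static void set(PrintStream stream) {
      out = stream;
    }

    static boolean isEnabled() {
      return out != null;
    }

    static void message(String component, String msg) {
      PrintStream stream = out; // read once to avoid a race with set(null)
      if (stream != null) {
        stream.println(component + ": " + msg);
      }
    }
  }

  /** Any class can emit messages without holding an infoStream field. */
  static String captureOneMessage() {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    InfoStream.set(new PrintStream(buffer, true));
    try {
      if (InfoStream.isEnabled()) {
        InfoStream.message("IW", "flush started");
      }
    } finally {
      InfoStream.set(null); // switch output off again
    }
    return buffer.toString().trim();
  }

  public static void main(String[] args) {
    System.out.println(captureOneMessage());
  }
}
```

The trade-off is that a single static stream is still all-or-nothing, unlike per-class loggers.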

 Replace infoSteram by a logging framework (SLF4J)
 -

 Key: LUCENE-1482
 URL: https://issues.apache.org/jira/browse/LUCENE-1482
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-1482-2.patch, LUCENE-1482.patch, 
 slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar


 Lucene makes use of infoStream to output messages in its indexing code only. 
 For debugging purposes, when the search application is run on the customer 
 side, getting messages from other code flows, like search, query parsing, 
 analysis etc can be extremely useful.
 There are two main problems with infoStream today:
 1. It is owned by IndexWriter, so if I want to add logging capabilities to 
 other classes I need to either expose an API or propagate infoStream to all 
 classes (see for example DocumentsWriter, which receives its infoStream 
 instance from IndexWriter).
 2. I can either turn debugging on or off, for the entire code.
 Introducing a logging framework can allow each class to control its logging 
 independently, and more importantly, allows the application to turn on 
 logging for only specific areas in the code (i.e., org.apache.lucene.index.*).
 I've investigated SLF4J (stands for Simple Logging Facade for Java) which is, 
 as its name states, a facade over different logging frameworks. As such, you 
 can include the slf4j.jar in your application, and it recognizes at deploy 
 time what is the actual logging framework you'd like to use. SLF4J comes with 
 several adapters for Java logging, Log4j and others. If you know your 
 application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in 
 your classpath, and your logging statements will use Java logging underneath 
 the covers.
 This makes the logging code very simple. For a class A the logger will be 
 instantiated like this:
 public class A {
   private static final Logger logger = LoggerFactory.getLogger(A.class);
 }
 And will later be used like this:
 public class A {
   private static final Logger logger = LoggerFactory.getLogger(A.class);
   public void foo() {
     if (logger.isDebugEnabled()) {
       logger.debug(message);
     }
   }
 }
 That's all !
 Checking for isDebugEnabled is very quick, at least using the JDK14 adapter 
 (but I assume it's fast also over other logging frameworks).
 The important thing is, every class controls its own logger. Not all classes 
 have to output logging messages, and we can improve Lucene's logging 
 gradually, w/o changing the API, by adding more logging messages to 
 interesting classes.
 I will submit a patch shortly
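
The per-class logger and the guard described above can be tried without any extra jars. The sketch below deliberately swaps SLF4J for java.util.logging (the framework the JDK14 adapter delegates to), so isLoggable(Level.FINE) plays the role of isDebugEnabled(). The class name GuardedLoggingSketch and the messagesBuilt counter are hypothetical and exist only to show that the message is not built while the level is off.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical class standing in for any class that owns its own logger. */
final class GuardedLoggingSketch {
  private static final Logger logger =
      Logger.getLogger(GuardedLoggingSketch.class.getName());

  // Counts how often the (pretend) expensive message was actually built.
  static int messagesBuilt = 0;

  private GuardedLoggingSketch() {}

  private static String expensiveMessage(int docCount) {
    messagesBuilt++;
    return "flushed " + docCount + " docs";
  }

  static void foo(int docCount) {
    // The guard skips string concatenation and boxing while FINE is off.
    if (logger.isLoggable(Level.FINE)) {
      logger.fine(expensiveMessage(docCount));
    }
  }

  /** Lets the application switch this one class's logger on or off. */
  static void setLevel(Level level) {
    logger.setLevel(level);
  }
}
```

Because the logger is named after the class, an application could equally set the level on a parent name such as a package prefix and leave the rest of the code untouched.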


[jira] Commented: (LUCENE-1709) Parallelize Tests

2010-04-08 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855020#action_12855020
]

Shai Erera commented on LUCENE-1709:

Robert, I will commit the patch, seems good to do anyway. We can handle the ant
jars separately later.

And this hang behavior is exactly what I experience, including the
FileInputStream thing. Only on my machine, when I took a thread dump, it showed
that Ant waits on FIS.read() ...

Robert - to remind you that even with the patch which forces junit to use a
separate temp folder per thread, it still hung ...

Parallelize Tests
-

Key: LUCENE-1709
URL: https://issues.apache.org/jira/browse/LUCENE-1709
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Robert Muir
Fix For: 3.1

Attachments: LUCENE-1709-2.patch, LUCENE-1709.patch,
LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch,
LUCENE-1709.patch, runLuceneTests.py

Original Estimate: 48h
Remaining Estimate: 48h

The Lucene tests can be parallelized to make for a faster testing system.
This task from ANT can be used:
http://ant.apache.org/manual/CoreTasks/parallel.html
Previous discussion:
http://www.gossamer-threads.com/lists/lucene/java-dev/69669
Notes from Mike M.:
{quote}
I'd love to see a clean solution here (the tests are embarrassingly
parallelizable, and we all have machines with good concurrency these
days)... I have a rather hacked up solution now, that uses
-Dtestpackage=XXX to split the tests up.
Ideally I would be able to say use N threads and it'd do the right
thing... like the -j flag to make.
{quote}


[jira] Created: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)

Move NoDeletionPolicy from benchmark to core


 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1


As the subject says, but I'll also make it a singleton + add some unit tests, 
as well as some documentation. I'll post a patch hopefully today.


[jira] Created: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)

IndexWriter commits unnecessarily on fresh Directory


 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1


I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
unnecessary, and kind of brings back an autoCommit mode, in a strange way ... 
why do we need that commit? Do we really expect people to open an IndexReader 
on an empty Directory which they just passed to an IW w/ create=true? If they 
want, they can simply call commit() right away on the IW they created.

I ran into this when writing a test which committed N times, then compared the 
number of commits (via IndexReader.listCommits) and was surprised to see N+1 
commits.

Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
jumping on me .. so the change might not be that simple. But I think it's 
manageable, so I'll try to attack it (and IFD specifically !) back :).


[jira] Updated: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2385:
---

Attachment: LUCENE-2385.patch

Move NoDeletionPolicy to core, adds javadocs + TestNoDeletionPolicy. Also 
includes the relevant changes to benchmark (algorithms + CreateIndexTask).
I've fixed a typo I had in NoMergeScheduler - not related to this issue, but 
since it was just a typo, thought it's no harm to do it here.

Tests pass. Planning to commit shortly.

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.


[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855131#action_12855131
 ] 

Shai Erera commented on LUCENE-2386:


Took a look at IndexFileDeleter, and located the offending code segment, which 
is responsible for the CorruptIndexException:
{code}
if (currentCommitPoint == null) {
  // We did not in fact see the segments_N file
  // corresponding to the segmentInfos that was passed
  // in.  Yet, it must exist, because our caller holds
  // the write lock.  This can happen when the directory
  // listing was stale (eg when index accessed via NFS
  // client with stale directory listing cache).  So we
  // try now to explicitly open this commit point:
  SegmentInfos sis = new SegmentInfos();
  try {
    sis.read(directory, segmentInfos.getCurrentSegmentFileName(), codecs);
  } catch (IOException e) {
    throw new CorruptIndexException("failed to locate current segments_N file");
  }
{code}

Looks like this code protects against a real problem, which was raised on the 
list a couple of times already - stale NFS cache. So I'm reluctant to remove 
that check ... though I still think we should differentiate between a newly 
created index on a fresh Directory and a stale NFS problem. Maybe we can pass a 
boolean isNew or something like that to the ctor, and if it's a new index and 
the last commit point is missing, IFD will not throw the exception, but 
silently ignore that? So the code would become something like this:
{code}
if (currentCommitPoint == null && !isNew) {
   
}
{code}

Does this make sense, or am I missing something?

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically !) back :).


[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855140#action_12855140
 ] 

Shai Erera commented on LUCENE-2385:


I did that first, but then remembered that when I did that in the past, people 
were unable to apply my patches, w/o doing the svn move themselves. Anyway, for 
this file it's not really important I think - a very simple and tiny file, w/ 
no history to preserve? Is that ok for this file (b/c I have no idea how to do 
the svn move now ... after I've made all the changes already) :)

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.


[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855148#action_12855148
]

Shai Erera commented on LUCENE-2386:

Looking at IFD again, I think a boolean ctor arg is not required. What I can do
is check if any Lucene file has been seen (in the for-loop iteration on the
Directory files), and if not, then deduce it's a new Directory, and skip that
'if' check. I'll give it a shot.
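
The idea of deducing a new Directory from its listing can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the file-name test (names starting with "segments" or "_") is an assumption made for this sketch, not necessarily the filter IndexFileDeleter really applies.

```java
/** Hypothetical helper; the real IndexFileDeleter logic may differ. */
final class FreshDirectorySketch {
  private FreshDirectorySketch() {}

  /** Assumption for this sketch: index files are named "segments*" or start with '_'. */
  static boolean isLuceneFile(String name) {
    return name.startsWith("segments") || name.startsWith("_");
  }

  /** True if the listing contains no file that looks like part of an index. */
  static boolean looksFresh(String[] files) {
    for (String name : files) {
      if (isLuceneFile(name)) {
        return false;
      }
    }
    return true;
  }

  /** The stale-listing check is only needed when the directory is not fresh. */
  static boolean mustVerifyCommitPoint(boolean commitPointSeen, String[] files) {
    return !commitPointSeen && !looksFresh(files);
  }
}
```

With this shape no extra ctor argument is needed: the same loop that lists the Directory files also decides whether the missing commit point is suspicious.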

IndexWriter commits unnecessarily on fresh Directory

Key: LUCENE-2386
URL: https://issues.apache.org/jira/browse/LUCENE-2386
Project: Lucene - Java
Issue Type: Bug
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.1


[jira] Updated: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2385:
---

Attachment: LUCENE-2385.patch

Is it better now?

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch, LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.


[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855155#action_12855155
 ] 

Shai Erera commented on LUCENE-2385:


Forgot to mention that the only move I made was of NoDeletionPolicy:

svn move 
contrib/benchmark/src/java/org/apache/lucene/benchmark/utils/NoDeletionPolicy.java
 src/java/org/apache/lucene/index/NoDeletionPolicy.java

I'll remember that in the future Uwe - thanks for the heads up !

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch, LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.


[jira] Resolved: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2385.


Resolution: Fixed

Committed revision 932129.

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch, LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.


[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

First stab at this. Patch still missing CHANGES entry, and I haven't run all
the tests, just TestIndexWriter. With those changes it passes. One thing that I
think should be fixed is testImmediateDiskFull - if I don't add
writer.commit(), the test fails, because dir.getRecomputeActualSizeInBytes
returns 0 (no RAMFiles yet), and then the test succeeds at adding one document.
So maybe just change the test to set maxSizeInBytes to '1', always?

TestNoDeletionPolicy is not covered by this patch (should be fixed as well,
because now the number of commits is exactly N and not N+1). Will fix it
tomorrow.

Anyway, it's really late now, so hopefully some fresh eyes will look at it
while I'm away, and comment on the proposed changes. I hope I got all the
changes to the tests right.

IndexWriter commits unnecessarily on fresh Directory

Key: LUCENE-2386
URL: https://issues.apache.org/jira/browse/LUCENE-2386
Project: Lucene - Java
Issue Type: Bug
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.1

Attachments: LUCENE-2386.patch


[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855265#action_12855265
]

Shai Erera commented on LUCENE-2386:

bq. Maybe change testImmediateDiskFull to set max allowed size to max(1,
current-usage)?

Good idea ! Did it and it works.

Now ... one thing I haven't mentioned is the bw break. This is a behavioral bw
break, which I'm not so sure we should care about, because I wonder how many
apps out there rely on being able to open a reader on a fresh new index before
they ever committed. So what do you think - do this change anyway, OR ... use
Version to our aid? I.e., if the Version that was passed to IWC is before
LUCENE_31, we keep the initial commit, otherwise we don't do it? The pro is that
I won't need to change many of the tests, because they still use the LUCENE_30
version (but that is a weak argument). The con is that IW will keep having that
doCommit handling in its ctor, only now w/ added comments on why this is being
kept around etc.

What do you think?
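
For reference, the Version-gated option could look roughly like this. VersionSketch is a hypothetical two-constant stand-in for Lucene's Version, and commitOnFreshDirectory is an invented name; the sketch only shows where the decision would be made.

```java
/** Hypothetical stand-in for Lucene's Version; only two constants are modelled. */
enum VersionSketch {
  LUCENE_30, LUCENE_31;

  /** True if this version is the same as, or newer than, the other one. */
  boolean onOrAfter(VersionSketch other) {
    return compareTo(other) >= 0;
  }
}

final class InitialCommitPolicySketch {
  private InitialCommitPolicySketch() {}

  /** Keeps the old empty first commit only for callers asking for pre-3.1 behavior. */
  static boolean commitOnFreshDirectory(VersionSketch matchVersion) {
    return !matchVersion.onOrAfter(VersionSketch.LUCENE_31);
  }
}
```

The cost mentioned above is visible here: the old branch stays in the code for as long as LUCENE_30 is a supported match version.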

IndexWriter commits unnecessarily on fresh Directory

Key: LUCENE-2386
URL: https://issues.apache.org/jira/browse/LUCENE-2386
Project: Lucene - Java
Issue Type: Bug
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.1

Attachments: LUCENE-2386.patch


[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855277#action_12855277
 ] 

Shai Erera commented on LUCENE-2386:


Apparently, there are more tests that fail ... lost count but easy fixing. I 
tried writing the following test:

{code}
  public void testNoCommits() throws Exception {
    // Tests that if we don't call commit(), the directory has 0 commits. This has
    // changed since LUCENE-2386, where before IW would always commit on a fresh
    // new index.
    Directory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
        TEST_VERSION_CURRENT, new WhitespaceAnalyzer(TEST_VERSION_CURRENT)));
    assertEquals("expected 0 commits!", 0, IndexReader.listCommits(dir).size());
    // No changes still should generate a commit, because it's a new index.
    writer.close();
    assertEquals("expected 1 commits!", 1, IndexReader.listCommits(dir).size());
  }
{code}

Simple test - validates that no commits are present following a freshly new 
index creation, w/o closing or committing. However, IndexReader.listCommits 
fails w/ the following exception:

{code}
java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.ramdirect...@2d262d26: files: []
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:652)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:535)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:323)
    at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1033)
    at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1023)
    at org.apache.lucene.index.IndexReader.listCommits(IndexReader.java:1341)
    at org.apache.lucene.index.TestIndexWriter.testNoCommits(TestIndexWriter.java:4966)
   
{code}

The failure occurs when SegmentInfos attempts to find segments.gen and fails. 
So I wonder if I should fix DirectoryReader to catch that exception and simply 
return an empty Collection .. or I should fix SegmentInfos at this point -- 
notice the files: [] at the end - I think that by adding a check to the 
following code (SegmentInfos, line 652) which validates that there were any 
files before throwing the exception, it'll still work properly and safely (i.e. 
to detect a problematic Directory). Will need probably to break away from the 
while loop and I guess fix some other things in upper layers ... therefore I'm 
not sure if I should not simply catch that exception in 
DirectoryReader.listCommits w/ proper documentation and be done w/ it. After 
all, it's not supposed to be called ... ever? or hardly ever?

{code}
  if (gen == -1) {
    // Neither approach found a generation
    throw new FileNotFoundException("no segments* file found in " + directory
        + ": files: " + Arrays.toString(files));
  }
{code}
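
The first option, catching the exception where commits are listed and returning an empty Collection, might be sketched like this. The CommitLister interface and listOrEmpty are hypothetical names introduced for the illustration; they stand in for whatever DirectoryReader.listCommits actually calls.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;

final class ListCommitsSketch {
  /** Hypothetical stand-in for the code that really reads the commit points. */
  interface CommitLister<T> {
    Collection<T> list() throws IOException;
  }

  private ListCommitsSketch() {}

  /** Treats "no segments* file found" as "no commits yet"; other I/O errors still propagate. */
  static <T> Collection<T> listOrEmpty(CommitLister<T> lister) throws IOException {
    try {
      return lister.list();
    } catch (FileNotFoundException e) {
      return Collections.emptyList();
    }
  }
}
```

One trade-off of this shape is that a genuinely broken Directory and a brand-new one become indistinguishable to the caller, which is the concern behind checking the files listing in SegmentInfos instead.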

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically !) back :).


[jira] Updated: (LUCENE-1709) Parallelize Tests

2010-04-07 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-1709:
---

Attachment: LUCENE-1709-2.patch

Since I had the changes on my local env. I thought it's best to generate a 
patch out of them, so they don't get lost. The patch doesn't cover the ant 
.jars, only the changes to common-build.xml as well as benchmark/build.xml

 Parallelize Tests
 -

 Key: LUCENE-1709
 URL: https://issues.apache.org/jira/browse/LUCENE-1709
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-1709-2.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, runLuceneTests.py

   Original Estimate: 48h
  Remaining Estimate: 48h

 The Lucene tests can be parallelized to make for a faster testing system.  
 This task from ANT can be used: 
 http://ant.apache.org/manual/CoreTasks/parallel.html
 Previous discussion: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/69669
 Notes from Mike M.:
 {quote}
 I'd love to see a clean solution here (the tests are embarrassingly
 parallelizable, and we all have machines with good concurrency these
 days)... I have a rather hacked up solution now, that uses
 -Dtestpackage=XXX to split the tests up.
 Ideally I would be able to say use N threads and it'd do the right
 thing... like the -j flag to make.
 {quote}


[jira] Resolved: (LUCENE-2377) Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark

2010-04-07 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2377.


Resolution: Fixed

Committed revision 931502.

 Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark
 -

 Key: LUCENE-2377
 URL: https://issues.apache.org/jira/browse/LUCENE-2377
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2377.patch


 Benchmark allows one to set the MP and MS to use, by defining the class name 
 and then use reflection to instantiate them. However NoMP and NoMS are 
 singletons and therefore reflection does not work for them. Easy fix in 
 CreateIndexTask. I'll post a patch soon.
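
Why reflection trips over a singleton, and one possible way around it, can be shown with a small self-contained sketch. All class names below are hypothetical, and the convention of looking for a public static INSTANCE field is an assumption made for this illustration; the committed fix in CreateIndexTask may handle it differently.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical singleton in the style of NoMergePolicy / NoMergeScheduler. */
final class NoOpPolicySketch {
  public static final NoOpPolicySketch INSTANCE = new NoOpPolicySketch();

  private NoOpPolicySketch() {} // no accessible constructor for callers to invoke
}

/** Ordinary class with a public no-arg constructor. */
final class PlainPolicySketch {
  public PlainPolicySketch() {}
}

final class SingletonAwareFactorySketch {
  private SingletonAwareFactorySketch() {}

  /** Returns the public static INSTANCE if the class has one, else calls its no-arg constructor. */
  static Object create(String className) throws ReflectiveOperationException {
    Class<?> clazz = Class.forName(className);
    try {
      Field instance = clazz.getField("INSTANCE"); // assumption: singletons expose this field
      if (Modifier.isStatic(instance.getModifiers())) {
        return instance.get(null);
      }
    } catch (NoSuchFieldException e) {
      // not a singleton in the assumed style; fall through to the constructor
    }
    return clazz.getDeclaredConstructor().newInstance();
  }
}
```

A simpler alternative is to compare the configured class name against the known singleton classes and return their instance directly, which avoids relying on a naming convention.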


[jira] Commented: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-04-07 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854588#action_12854588
 ] 

Shai Erera commented on LUCENE-2353:


Actually, we've reopened LUCENE-1709 to track that. This is not related to this 
issue's changes, but seems to be related to the benchmark tests specifically. 
Please have a look there at a patch I've posted which forces benchmark tests to 
run in sequential mode. Additionally, you can run 'ant test -Drunsequential=1' 
from the command line, in benchmark's root folder, to achieve the same.
And it'd be great if you post the above on LUCENE-1709 as well -- because now I 
know I'm not the only one running into this :).

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch, LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects : in the value and so it thinks it's a per-round property, thus 
 stripping d: from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.
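
The effect of the extra check can be verified in isolation with a small sketch. The class and method names are hypothetical, and the method only mirrors the snippet above (it drops the column name and returns what is left) rather than reproducing the whole of Config.get.

```java
final class RoundValueSketch {
  private RoundValueSketch() {}

  /**
   * Returns the value part of a "name:v1:v2"-style per-round property,
   * but leaves plain values and Windows absolute paths untouched.
   */
  static String stripColumnName(String sval) {
    if (sval.indexOf(":") < 0) {
      return sval; // no colon: not a per-round property
    }
    if (sval.indexOf(":\\") >= 0) {
      return sval; // e.g. d:\something: a drive letter, not a column name
    }
    int k = sval.indexOf(":");
    return sval.substring(k + 1); // drop "name:" and keep the round values
  }
}
```

Note that the check only recognizes backslash-style paths; a value written as d:/something would still be treated as a per-round property by this version.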


[jira] Commented: (LUCENE-1709) Parallelize Tests

2010-04-06 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854348#action_12854348
 ] 

Shai Erera commented on LUCENE-1709:


One more thing - change benchmark tests to run sequentially (by adding the 
property).
Robert, are you going to tackle that soon?

 Parallelize Tests
 -

 Key: LUCENE-1709
 URL: https://issues.apache.org/jira/browse/LUCENE-1709
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, runLuceneTests.py

   Original Estimate: 48h
  Remaining Estimate: 48h

 The Lucene tests can be parallelized to make for a faster testing system.  
 This task from ANT can be used: 
 http://ant.apache.org/manual/CoreTasks/parallel.html
 Previous discussion: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/69669
 Notes from Mike M.:
 {quote}
 I'd love to see a clean solution here (the tests are embarrassingly
 parallelizable, and we all have machines with good concurrency these
 days)... I have a rather hacked up solution now, that uses
 -Dtestpackage=XXX to split the tests up.
 Ideally I would be able to say use N threads and it'd do the right
 thing... like the -j flag to make.
 {quote}


[jira] Created: (LUCENE-2377) Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark

2010-04-06 Thread Shai Erera (JIRA)

Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark
-

 Key: LUCENE-2377
 URL: https://issues.apache.org/jira/browse/LUCENE-2377
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1


Benchmark allows one to set the MP and MS to use, by defining the class name 
and then using reflection to instantiate them. However, NoMP and NoMS are 
singletons and therefore reflection does not work for them. Easy fix in 
CreateIndexTask. I'll post a patch soon.
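
A minimal sketch of why reflection fails for a singleton and one way to work around it. This is not the actual patch: `NoOpPolicy`, `PlainPolicy`, `PolicyLoader` and the `INSTANCE` field name are hypothetical stand-ins, and the real fix in CreateIndexTask may resolve NoMP and NoMS differently.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for a singleton such as NoMP/NoMS: no public constructor.
final class NoOpPolicy {
  public static final NoOpPolicy INSTANCE = new NoOpPolicy();
  private NoOpPolicy() {}
}

// Hypothetical stand-in for an ordinary policy with a public no-arg constructor.
class PlainPolicy {
  public PlainPolicy() {}
}

final class PolicyLoader {
  private PolicyLoader() {}

  // Instantiates a class by name. When the class has no public no-arg
  // constructor, falls back to a public static field named INSTANCE
  // (an assumed convention for this sketch).
  static Object load(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      try {
        // getConstructor() only sees public constructors, so it throws
        // NoSuchMethodException for the singleton above.
        return clazz.getConstructor().newInstance();
      } catch (NoSuchMethodException e) {
        Field field = clazz.getField("INSTANCE");
        if (!Modifier.isStatic(field.getModifiers())) {
          throw new IllegalArgumentException(className + ".INSTANCE is not static");
        }
        return field.get(null);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot create " + className, e);
    }
  }
}
```

With this fallback, `PolicyLoader.load(NoOpPolicy.class.getName())` returns the shared instance instead of failing, while ordinary classes are still created through their constructor.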


[jira] Updated: (LUCENE-2377) Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark

2010-04-06 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2377:
---

Attachment: LUCENE-2377.patch

Patch includes both the fix to CreateIndexTask and relevant tests in 
CreateIndexTaskTest. I plan to commit later today if there are no objections.

 Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark
 -

 Key: LUCENE-2377
 URL: https://issues.apache.org/jira/browse/LUCENE-2377
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2377.patch


 Benchmark allows one to set the MP and MS to use, by defining the class name 
 and then using reflection to instantiate them. However, NoMP and NoMS are 
 singletons and therefore reflection does not work for them. Easy fix in 
 CreateIndexTask. I'll post a patch soon.


[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-31 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851829#action_12851829
 ] 

Shai Erera commented on LUCENE-2310:


+1 for this simplification. Can we just name it Indexable, and omit Document 
from it? That way, it's both shorter and there's less chance that users will 
directly link it w/ Document.

One thing I didn't understand though, is what will happen to the ir/is.doc() 
methods? Will those be deprecated in favor of some other class which receives an 
IR as parameter and knows how to re-construct Indexable(Document)?

 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch


 In order to move field type like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
 Currently AbstractField depends on Field, and does not provide much more 
 functionality than storing fields, most of which are being moved over to 
 FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
 possibly Fieldable), moving much of the functionality into Field and 
 FieldType.


[jira] Assigned: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-31 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera reassigned LUCENE-2353:
--

Assignee: Shai Erera

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch, LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.


[jira] Commented: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-31 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851836#action_12851836
 ] 

Shai Erera commented on LUCENE-2353:


Unless there are objections, I plan to commit this shortly

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch, LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.


[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-31 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851842#action_12851842
]

Shai Erera commented on LUCENE-2310:

Right Earwin - agreed.

I'd like to summarize a brief discussion we had on IRC around that:
The idea is not to provide another interface/class for search purposes, but
rather to expose the right API from IndexReader, even if it might be a bit
low-level. APIs like getIndexedFields(docId) and getStoredFields(docId), both
optionally taking a FieldSelector, should allow the application to re-construct
its Indexable however it wants. And IR/IS don't need to know anything about
that.
To complete the picture for current users, we can have a static reconstruct()
on Document which takes IR, docId and FieldSelector ...
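
A rough sketch of the shape this could take, purely for illustration. None of these types exist in Lucene: `StoredFieldsSource` stands in for the reader-side API, `Predicate<String>` stands in for FieldSelector, and `SimpleDoc` stands in for an application-side Indexable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical low-level reader view: returns the stored fields of one
// document, filtered by a selector. The reader knows nothing about SimpleDoc.
interface StoredFieldsSource {
  Map<String, String> getStoredFields(int docId, Predicate<String> fieldSelector);
}

// Hypothetical application-side document type.
final class SimpleDoc {
  private final Map<String, String> fields;

  private SimpleDoc(Map<String, String> fields) {
    this.fields = fields;
  }

  String get(String name) {
    return fields.get(name);
  }

  // Static helper in the spirit of reconstruct(IR, docId, FieldSelector):
  // the document type rebuilds itself from what the reader exposes.
  static SimpleDoc reconstruct(StoredFieldsSource reader, int docId,
                               Predicate<String> fieldSelector) {
    return new SimpleDoc(new LinkedHashMap<>(reader.getStoredFields(docId, fieldSelector)));
  }
}
```

The point of the arrangement is the direction of the dependency: the document type depends on the reader, not the other way around.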

BTW, I'm not even sure getIndexedFields can be efficiently supported today.
Just listing it here for completeness.

Reduce Fieldable, AbstractField and Field complexity

Key: LUCENE-2310
URL: https://issues.apache.org/jira/browse/LUCENE-2310
Project: Lucene - Java
Issue Type: Sub-task
Components: Index
Reporter: Chris Male
Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch,
LUCENE-2310-Deprecate-AbstractField.patch,
LUCENE-2310-Deprecate-AbstractField.patch,
LUCENE-2310-Deprecate-AbstractField.patch,
LUCENE-2310-Deprecate-DocumentGetFields-core.patch,
LUCENE-2310-Deprecate-DocumentGetFields.patch,
LUCENE-2310-Deprecate-DocumentGetFields.patch

In order to move field type like functionality into its own class, we really
need to try to tackle the hierarchy of Fieldable, AbstractField and Field.
Currently AbstractField depends on Field, and does not provide much more
functionality than storing fields, most of which are being moved over to
FieldType. Therefore it seems ideal to try to deprecate AbstractField (and
possibly Fieldable), moving much of the functionality into Field and
FieldType.


[jira] Resolved: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-31 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2353.


Resolution: Fixed

Committed revision 929520.

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch, LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.


[jira] Updated: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-29 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2353:
---

Attachment: LUCENE-2353.patch

Updated to also match 'c:/temp' like paths, which are also accepted on Windows
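
A small sketch of a check covering both path forms, assuming the guard stays a simple substring test as in the issue description. `WindowsPathCheck` and its method name are hypothetical and are not taken from the patch.

```java
final class WindowsPathCheck {
  private WindowsPathCheck() {}

  // True when the value contains ":\" or ":/", i.e. it looks like
  // d:\something or c:/temp rather than a per-round property.
  static boolean looksLikeWindowsAbsolutePath(String sval) {
    return sval.indexOf(":\\") >= 0 || sval.indexOf(":/") >= 0;
  }
}
```

The trade-off is the same one the original fix accepts: a per-round value whose first entry begins with a slash or backslash would be treated as a path and returned unchanged.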

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch, LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.


[jira] Commented: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-28 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850644#action_12850644
 ] 

Shai Erera commented on LUCENE-2353:


I don't have an account yet, so I cannot commit this on my own. Any volunteers?

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.


[jira] Created: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-27 Thread Shai Erera (JIRA)

Config incorrectly handles Windows absolute pathnames
-

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
 Fix For: 3.1


I have no idea how no one ran into this so far, but I tried to execute an .alg 
file which used ReutersContentSource and referenced both docs.dir and work.dir 
as Windows absolute pathnames (e.g. d:\something). Surprisingly, the run 
reported an error of missing content under benchmark\work\something.

I've traced the problem back to Config, where get(String, String) includes the 
following code:
{code}
if (sval.indexOf(":") < 0) {
  return sval;
}
// first time this prop is extracted by round
int k = sval.indexOf(":");
String colName = sval.substring(0, k);
sval = sval.substring(k + 1);
...
{code}

It detects ":" in the value and so it thinks it's a per-round property, thus 
stripping "d:" from the value ... fix is very simple:
{code}
if (sval.indexOf(":") < 0) {
  return sval;
} else if (sval.indexOf(":\\") >= 0) {
  // this previously messed up absolute path names on Windows. Assuming
  // there is no real value that starts with \\
  return sval;
}
// first time this prop is extracted by round
int k = sval.indexOf(":");
String colName = sval.substring(0, k);
sval = sval.substring(k + 1);
{code}

I'll post a patch w/ the above fix + test shortly.
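
To make the failure concrete, here is a small standalone sketch of the colon split from the description, applied to a Windows path. `PerRoundSplit` is a hypothetical class written for this illustration; it only mirrors the quoted snippet and is not the Config code itself.

```java
final class PerRoundSplit {
  private PerRoundSplit() {}

  // Mirrors the quoted snippet: everything before the first ':' is taken as
  // the column name and everything after it as the value. Returns
  // {colName, value}, with colName == null when there is no colon.
  static String[] split(String sval) {
    int k = sval.indexOf(":");
    if (k < 0) {
      return new String[] {null, sval};
    }
    return new String[] {sval.substring(0, k), sval.substring(k + 1)};
  }

  public static void main(String[] args) {
    String[] parts = split("d:\\something");
    // prints: colName=d, value=\something
    System.out.println("colName=" + parts[0] + ", value=" + parts[1]);
  }
}
```

The drive letter ends up as the column name and only `\something` survives as the value, which is consistent with the run then looking for content under benchmark\work\something.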


[jira] Updated: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-27 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2353:
---

Attachment: LUCENE-2353.patch

The fix is only relevant to get(String, String) and not to all other 
get(String, type) variants.

Benchmark test passed but after I svn up (to include the latest parallel test 
thing) the test just sits idle (after finishing), waiting for something. If I 
run the tests in eclipse they pass. So I'm guessing it's a problem w/ my env. 
or build.xml?

I also tried 'ant clean test' from within benchmark, but it didn't help. I then 
tried 'ant clean' from root, and 'ant test' from benchmark, but the test just 
keeps waiting on WriteLineDocTaskTest, on this line:
[junit]  config properties:
[junit] directory = RAMDirectory
[junit] doc.maker = org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTaskTest$JustDateDocMaker
[junit] line.file.out = D:\dev\lucene\lucene-trunk\build\contrib\benchmark\test\W\one-line
[junit] ---

I think this can go in (if it passes on someone else's machine), while I figure 
out what's wrong in my env. separately.

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.


[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader

2010-03-26 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850075#action_12850075
 ] 

Shai Erera commented on LUCENE-2345:


Earwin, w/o knowing too much about the details of your work, I wanted to 
comment on "get rid of init/reinit/moreinit methods, moving the code to 
constructors". I'm now working on Parallel Index and one of the things I do is 
extend IW. Currently, IW's ctor code performs the initialization, however I'm 
thinking of moving that code to an init method. The reason is to allow easy 
extensions of IW, such as LUCENE-2330. There I'm going to add a default ctor to 
IW, accompanied by an init method the extending class can call if needed. So 
what I'm trying to say is that init methods are not always bad, and sometimes 
ctors limit you. Perhaps it would make sense though in what you're trying to do 
...
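
The pattern described here can be sketched as follows. `BaseWriter` and `SlicedWriter` are hypothetical stand-ins for IW and an extension of it, and the sketch assumes nothing about how IW actually initializes itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class whose setup lives in init() rather than only in
// the constructor.
class BaseWriter {
  protected final List<String> log = new ArrayList<>();
  private boolean initialized;

  // Regular constructor: initializes right away.
  BaseWriter(String dir) {
    init(dir);
  }

  // Default constructor for subclasses that want to call init() themselves.
  protected BaseWriter() {}

  // final, so a subclass cannot change what the constructor ends up running.
  protected final void init(String dir) {
    if (initialized) {
      throw new IllegalStateException("already initialized");
    }
    initialized = true;
    log.add("opened " + dir);
  }

  boolean isInitialized() {
    return initialized;
  }

  List<String> events() {
    return log;
  }
}

// Hypothetical extension: does its own preparation first, then runs the
// base setup. With a constructor-only base class this ordering is not
// possible, because super(...) must be the first statement.
class SlicedWriter extends BaseWriter {
  SlicedWriter(String dir, int slices) {
    super();
    log.add("prepared " + slices + " slices");
    init(dir);
  }
}
```

The cost of this flexibility is that a subclass can forget to call `init()`, which is one reason constructors are usually preferred.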

 Make it possible to subclass SegmentReader
 --

 Key: LUCENE-2345
 URL: https://issues.apache.org/jira/browse/LUCENE-2345
 Project: Lucene - Java
  Issue Type: Wish
  Components: Index
Reporter: Tim Smith
 Fix For: 3.1

 Attachments: LUCENE-2345_3.0.patch


 I would like the ability to subclass SegmentReader for numerous reasons:
 * to capture initialization/close events
 * attach custom objects to an instance of a segment reader (caches, 
 statistics, so on and so forth)
 * override methods on segment reader as needed
 currently this isn't really possible
 I propose adding a SegmentReaderFactory that would allow creating custom 
 subclasses of SegmentReader
 default implementation would be something like:
 {code}
 public class SegmentReaderFactory {
   public SegmentReader get(boolean readOnly) {
     return readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
   }
   public SegmentReader reopen(SegmentReader reader, boolean readOnly) {
     return get(readOnly);
   }
 }
 {code}
 It would then be made possible to pass a SegmentReaderFactory to IndexWriter 
 (for pooled readers) as well as to SegmentReader.get() (DirectoryReader.open, 
 etc)
 I could prepare a patch if others think this has merit
 Obviously, this API would be experimental/advanced/will change in future


[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader

2010-03-26 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850083#action_12850083
 ] 

Shai Erera commented on LUCENE-2345:


Thanks Uwe, I know that ctor is the preferred way, and in the process of 
introducing IWC I deleted IW.init, which all ctors called, and pulled all the 
code into the IW ctor. I will make that init() on IW final. But sometimes putting 
code in init() is not bad (and it's used elsewhere in Lucene too, e.g. PQ and, 
up until recently, IW).

 Make it possible to subclass SegmentReader
 --

 Key: LUCENE-2345
 URL: https://issues.apache.org/jira/browse/LUCENE-2345
 Project: Lucene - Java
  Issue Type: Wish
  Components: Index
Reporter: Tim Smith
 Fix For: 3.1

 Attachments: LUCENE-2345_3.0.patch


 I would like the ability to subclass SegmentReader for numerous reasons:
 * to capture initialization/close events
 * attach custom objects to an instance of a segment reader (caches, 
 statistics, so on and so forth)
 * override methods on segment reader as needed
 currently this isn't really possible
 I propose adding a SegmentReaderFactory that would allow creating custom 
 subclasses of SegmentReader
 default implementation would be something like:
 {code}
 public class SegmentReaderFactory {
   public SegmentReader get(boolean readOnly) {
     return readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
   }
   public SegmentReader reopen(SegmentReader reader, boolean readOnly) {
     return get(readOnly);
   }
 }
 {code}
 It would then be made possible to pass a SegmentReaderFactory to IndexWriter 
 (for pooled readers) as well as to SegmentReader.get() (DirectoryReader.open, 
 etc)
 I could prepare a patch if others think this has merit
 Obviously, this API would be experimental/advanced/will change in future


[jira] Commented: (LUCENE-2215) paging collector

2010-03-26 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850086#action_12850086
 ] 

Shai Erera commented on LUCENE-2215:


Sure let's wait for the patch and some perf. results.

 paging collector
 

 Key: LUCENE-2215
 URL: https://issues.apache.org/jira/browse/LUCENE-2215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: IterablePaging.java, LUCENE-2215.patch, 
 PagingCollector.java, TestingPagingCollector.java


 http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
 Somebody assign this to Aaron McCurry and we'll see if we can get enough 
 votes on this issue to convince him to upload his patch.  :)


[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader

2010-03-26 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850094#action_12850094
 ] 

Shai Erera commented on LUCENE-2345:


Earwin, I wholeheartedly agree with what you wrote. If we could refactor IW and 
extract it to a set of interfaces, then I agree (and Michael B. has an issue 
open for that). I think though that IW's API is already that interface (give or 
take a few methods). So perhaps this can be an easy refactoring - introduce an 
Indexer (a la Searcher) class (or interface) w/ all of IW's public methods, and 
then let PW extend/impl that class/interface as well as IW. We can also 
consider making IW itself final this way (though bw police will prevent it :)).

Then when PW sets up the slices, it can create them as IW or any other IW-like 
implementation it needs them to be. If it sounds good enough to become its 
own issue, I can open one and we can continue discussing it there (and leave 
this issue focused on extending SR). Then I'll hold off w/ LUCENE-2330, or 
simply rename it to reflect that Indexer API.
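
A minimal sketch of that arrangement, with made-up types. `Indexer`, `SimpleIndexer` and `ParallelIndexer` are hypothetical; the two methods shown are placeholders for whatever subset of IW's public methods such an interface would carry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface carrying the writer's public methods.
interface Indexer {
  void addDocument(String doc);
  int numDocs();
}

// Stand-in for IW: a plain single-index implementation.
final class SimpleIndexer implements Indexer {
  private final List<String> docs = new ArrayList<>();

  @Override
  public void addDocument(String doc) {
    docs.add(doc);
  }

  @Override
  public int numDocs() {
    return docs.size();
  }
}

// Stand-in for PW: implements the same interface and forwards to its
// slices, each of which may be any Indexer implementation.
final class ParallelIndexer implements Indexer {
  private final List<Indexer> slices;

  ParallelIndexer(List<Indexer> slices) {
    this.slices = new ArrayList<>(slices);
  }

  @Override
  public void addDocument(String doc) {
    for (Indexer slice : slices) {
      slice.addDocument(doc);
    }
  }

  @Override
  public int numDocs() {
    // All slices hold the same number of documents in this sketch.
    return slices.isEmpty() ? 0 : slices.get(0).numDocs();
  }
}
```

Because the parallel writer composes slices instead of subclassing the single-index writer, the latter could be made final without blocking this kind of extension.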

 Make it possible to subclass SegmentReader
 --

 Key: LUCENE-2345
 URL: https://issues.apache.org/jira/browse/LUCENE-2345
 Project: Lucene - Java
  Issue Type: Wish
  Components: Index
Reporter: Tim Smith
 Fix For: 3.1

 Attachments: LUCENE-2345_3.0.patch


 I would like the ability to subclass SegmentReader for numerous reasons:
 * to capture initialization/close events
 * attach custom objects to an instance of a segment reader (caches, 
 statistics, so on and so forth)
 * override methods on segment reader as needed
 currently this isn't really possible
 I propose adding a SegmentReaderFactory that would allow creating custom 
 subclasses of SegmentReader
 default implementation would be something like:
 {code}
 public class SegmentReaderFactory {
   public SegmentReader get(boolean readOnly) {
     return readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
   }
   public SegmentReader reopen(SegmentReader reader, boolean readOnly) {
     return get(readOnly);
   }
 }
 {code}
 It would then be made possible to pass a SegmentReaderFactory to IndexWriter 
 (for pooled readers) as well as to SegmentReader.get() (DirectoryReader.open, 
 etc)
 I could prepare a patch if others think this has merit
 Obviously, this API would be experimental/advanced/will change in future


[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-26 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850313#action_12850313
]

Shai Erera commented on LUCENE-1879:

The way I planned to support multi-threaded indexing is to do a two-phase
addDocument. First, allocate a doc ID from DocumentsWriter (synchronized) and
then add the Document to each Slice with that doc ID. DocumentsWriter was not
supposed to know it is a parallel index ... something like the following.
{code}
int docId = obtainDocId();
for (IndexWriter slice : slices) {
  slice.addDocument(docId, document);
}
{code}

That allows ParallelWriter to be really an orchestrator/manager of all slices,
while each slice can be an IW on its own.
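
A self-contained sketch of the two-phase idea, under stated assumptions: `Slice` and `ParallelWriterSketch` are hypothetical, and an `AtomicInteger` is used here in place of the synchronized doc ID allocation in DocumentsWriter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical slice: stores documents under an externally assigned doc ID.
final class Slice {
  private final Map<Integer, String> docs = new ConcurrentHashMap<>();

  void addDocument(int docId, String doc) {
    docs.put(docId, doc);
  }

  String get(int docId) {
    return docs.get(docId);
  }
}

// Hypothetical orchestrator over all slices.
final class ParallelWriterSketch {
  private final AtomicInteger nextDocId = new AtomicInteger();
  private final List<Slice> slices;

  ParallelWriterSketch(List<Slice> slices) {
    this.slices = new ArrayList<>(slices);
  }

  // Phase 1: allocate the doc ID atomically, so concurrent callers never
  // share an ID. Phase 2: add the document to every slice under that ID.
  int addDocument(String doc) {
    int docId = nextDocId.getAndIncrement();
    for (Slice slice : slices) {
      slice.addDocument(docId, doc);
    }
    return docId;
  }
}
```

Only the ID allocation is serialized; the per-slice adds can proceed concurrently, and every slice ends up holding the document under the same ID.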

Now, when you say ParallelDocumentsWriter, I assume you mean that this
DocWriter will be aware of the slices? That I think is an interesting idea,
which is unrelated to LUCENE-2324. I.e., ParallelWriter will invoke its
addDocument code which will get down to ParallelDocumentsWriter, which will
allocate the doc ID itself and call each slice's DocWriter.addDocument? And
then LUCENE-2324 will just improve the performance of that process?

This might require a bigger change to IW then I had anticipated, but perhaps
it's worth it.

What do you think?

Parallel incremental indexing
-

Key: LUCENE-1879
URL: https://issues.apache.org/jira/browse/LUCENE-1879
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Fix For: 3.1

Attachments: parallel_incremental_indexing.tar

A new feature that allows building parallel indexes and keeping them in sync
on a docID level, independent of the choice of the MergePolicy/MergeScheduler.
Find details on the wiki page for this feature:
http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing
Discussion on java-dev:
http://markmail.org/thread/ql3oxzkob7aqf3jd


[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-26 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850336#action_12850336
 ] 

Shai Erera commented on LUCENE-1879:


Hi Grant - I believe what you describe is related to solving the incremental 
field updates problem, where someone might want to change the value of a 
specific document's field. But PI is not about that. Rather, PI is about 
updating a whole slice at once, i.e., changing a field's value across all docs, 
or adding a field to all docs (I believe such a question was asked on the user 
list a few days ago). I've listed above several scenarios where PI is useful, 
but unfortunately it is unrelated to incremental field updates.

If I misunderstood you, then please clarify.

Re incremental field updates, I think your direction is interesting and 
deserves discussion, but in a separate issue/thread?


[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader

2010-03-25 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849728#action_12849728
 ] 

Shai Erera commented on LUCENE-2345:


bq. The IndexWriter now has a getter and setter for setting this

If this is not expected to change during the lifetime of IW, I think it should 
be added to IWC when you upgrade the patch to 3.1.

 Make it possible to subclass SegmentReader
 --

 Key: LUCENE-2345
 URL: https://issues.apache.org/jira/browse/LUCENE-2345
 Project: Lucene - Java
  Issue Type: Wish
  Components: Index
Reporter: Tim Smith
 Fix For: 3.1

 Attachments: LUCENE-2345_3.0.patch


 I would like the ability to subclass SegmentReader for numerous reasons:
 * to capture initialization/close events
 * attach custom objects to an instance of a segment reader (caches, 
 statistics, so on and so forth)
 * override methods on segment reader as needed
 currently this isn't really possible
 I propose adding a SegmentReaderFactory that would allow creating custom 
 subclasses of SegmentReader
 default implementation would be something like:
 {code}
 public class SegmentReaderFactory {
   public SegmentReader get(boolean readOnly) {
     return readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
   }
   public SegmentReader reopen(SegmentReader reader, boolean readOnly) {
     return get(readOnly);
   }
 }
 {code}
 It would then be made possible to pass a SegmentReaderFactory to IndexWriter 
 (for pooled readers) as well as to SegmentReader.get() (DirectoryReader.open, 
 etc)
 I could prepare a patch if others think this has merit
 Obviously, this API would be experimental/advanced/will change in future
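 To illustrate how such a factory might be used, here is a minimal, self-contained sketch. All class names in it are hypothetical stand-ins, not Lucene's actual classes, and the read-only variant is collapsed into a boolean flag to keep the example short. It shows two of the motivations listed above: capturing creation events and attaching custom objects to a reader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal stand-in for a segment reader (hypothetical, for illustration only). */
class SketchSegmentReader {
  final boolean readOnly;

  SketchSegmentReader(boolean readOnly) {
    this.readOnly = readOnly;
  }
}

/** Default factory, following the shape proposed in the issue. */
class SketchSegmentReaderFactory {
  SketchSegmentReader get(boolean readOnly) {
    return new SketchSegmentReader(readOnly);
  }

  SketchSegmentReader reopen(SketchSegmentReader reader, boolean readOnly) {
    return get(readOnly);
  }
}

/** Custom reader that carries application state, e.g. a per-segment cache. */
class CachingSegmentReader extends SketchSegmentReader {
  final Map<String, Object> cache = new HashMap<>();

  CachingSegmentReader(boolean readOnly) {
    super(readOnly);
  }
}

/** Custom factory: counts creation events and returns the subclass. */
class CachingSegmentReaderFactory extends SketchSegmentReaderFactory {
  final AtomicInteger created = new AtomicInteger();

  @Override
  SketchSegmentReader get(boolean readOnly) {
    created.incrementAndGet();
    return new CachingSegmentReader(readOnly);
  }
}
```

 Because reopen() delegates to get() in this sketch, overriding get() alone is enough for reopened readers to be instances of the subclass as well.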


[jira] Commented: (LUCENE-2215) paging collector

2010-03-25 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850002#action_12850002
]

Shai Erera commented on LUCENE-2215:

bq. since I think it's safe to say most applications implement paging

Let's be careful about the semantics here, Grant. Most if not all applications
implement paging indeed, but I believe only a FEW actually store user contexts
between searches. PagingCollector relies on the application to store the lowest
ranking doc that was returned previously, which means storing context between
a user's searches.

I agree w/ Mike's statement that 99.9% of the searches would never run that
code, which is why I've proposed a delegation/wrapper approach from the
beginning. I also think that we should make some allowances here and there for
the non-common case, and prefer better software design over specialized
code. A Collector filter approach for some rare (or even less common) cases
seems very reasonable to me.

Also, I think that if we add to TSDC a create method which takes into account
the previously scored lowest doc, it will confuse people. Now they will need to
think "where do I get this low score from?" - but perhaps after I see the code,
it won't seem such a bad thing. I just have a feeling that TSDC and TFC should
be left on their own, and extreme paging stuff should either be its own
specialized collector, or a wrapper.

paging collector

Key: LUCENE-2215
URL: https://issues.apache.org/jira/browse/LUCENE-2215
Project: Lucene - Java
Issue Type: New Feature
Components: Search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
Attachments: IterablePaging.java, LUCENE-2215.patch,
PagingCollector.java, TestingPagingCollector.java

http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
Somebody assign this to Aaron McCurry and we'll see if we can get enough
votes on this issue to convince him to upload his patch. :)


[jira] Commented: (LUCENE-2215) paging collector

2010-03-24 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849200#action_12849200
 ] 

Shai Erera commented on LUCENE-2215:


So what's the motivation for declaring PagingCollector a TopDocsCollector? Do 
you envision someone asking for a TopDocsCollector without caring whether it's 
TSDC, TFC or PagingCollector? I would rather have it extend TDC directly, and 
then you won't need to throw UOE for the rest of the methods ...

What about renaming it to TopScorePagingCollector?


[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors

2010-03-24 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849384#action_12849384
 ] 

Shai Erera commented on LUCENE-2343:


In the patch you write: "topDocOrdered - Creates a TopDocCollector that 
requires in order docs" - did you mean TopScoreDocCollector? Because 
TopDocCollector is abstract ...

I think the following:
{code}
+  Class<? extends Collector> clazz = (Class<? extends Collector>) Class.forName(clnName);
+  collector = clazz.newInstance();
{code}
can be written as 
Class.forName(clnName).asSubclass(Collector.class).newInstance();
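
For reference, a small self-contained sketch of the asSubclass idiom, using only JDK classes. ReflectiveFactory and its newInstance helper are hypothetical names made up for this example. It uses getDeclaredConstructor().newInstance() because Class.newInstance() is deprecated in newer JDKs; on the Java version targeted by the patch the one-liner above is the equivalent.

```java
/** Hypothetical helper showing Class.asSubclass instead of an unchecked cast. */
final class ReflectiveFactory {
  private ReflectiveFactory() {}

  /**
   * Loads className, verifies it is a subtype of baseType, and instantiates it
   * through its no-arg constructor.
   */
  static <T> T newInstance(String className, Class<T> baseType) {
    try {
      // asSubclass throws ClassCastException if the loaded class is not a
      // baseType, so no unchecked cast is needed.
      Class<? extends T> clazz = Class.forName(className).asSubclass(baseType);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // Class not found, no usable no-arg constructor, or the constructor failed.
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }
}
```

A call such as ReflectiveFactory.newInstance("java.util.ArrayList", java.util.List.class) returns a List, while passing "java.lang.String" with the same base type fails with a ClassCastException before anything is instantiated.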

Also, and it's a style issue, can you remove the '== true/false' from ifs?

I'd change *if (clnName.equals("") == false)* to *if (clnName.length() > 0)*.

Why does benchmark/build.xml now rely on the compiled classes/test (of core)?

 Add support for benchmarking Collectors
 ---

 Key: LUCENE-2343
 URL: https://issues.apache.org/jira/browse/LUCENE-2343
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2343.patch


 As the title says.


[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors

2010-03-24 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849393#action_12849393
 ] 

Shai Erera commented on LUCENE-2343:


ok I won't argue about == true/false. It's a style thing and I'm not too 
fanatic about it :).


[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors

2010-03-24 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849403#action_12849403
 ] 

Shai Erera commented on LUCENE-2343:


I wasn't talking about the name of the parameter but about the comment in the 
javadoc. TopDocsCollector is a typo - should have been TopScoreDocCollector. If 
you also want to change the name of the parameter in the .alg file that's ok as 
well, though I'm fine w/ topDocOrdered/Unordered.


[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-24 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849404#action_12849404
 ] 

Shai Erera commented on LUCENE-2339:


Do we want to suppress only IOExceptions? What about any RuntimeExceptions - 
upon hitting any of them the code will fly away? Not saying it's a bad thing, 
but pointing it out.

Other than that, the patch looks good. closeSafely is not exactly what I had in 
mind about closeNoException because it forces you to catch the IOE if you don't 
declare you throw it, or you need to move on, discarding it. But I guess this 
is a matter for another issue. 

 Allow Directory.copy() to accept a collection of file names to be copied
 

 Key: LUCENE-2339
 URL: https://issues.apache.org/jira/browse/LUCENE-2339
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Assignee: Michael McCandless
 Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch, 
 LUCENE-2339.patch, LUCENE-2339.patch


 Par example, I want to copy files pertaining to a certain commit, and not 
 everything there is in a Directory.


[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors

2010-03-24 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849416#action_12849416
 ] 

Shai Erera commented on LUCENE-2343:


Looks good !

 Add support for benchmarking Collectors
 ---

 Key: LUCENE-2343
 URL: https://issues.apache.org/jira/browse/LUCENE-2343
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2343.patch, LUCENE-2343.patch


 As the title says.


[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors

2010-03-24 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849435#action_12849435
 ] 

Shai Erera commented on LUCENE-2343:


I've just realized you haven't added a CHANGES entry (and I missed that in my 
previous review, sorry).

 Add support for benchmarking Collectors
 ---

 Key: LUCENE-2343
 URL: https://issues.apache.org/jira/browse/LUCENE-2343
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2343.patch, LUCENE-2343.patch


 As the title says.


[jira] Commented: (LUCENE-2342) DisjunctionSumScorer explain

2010-03-23 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848560#action_12848560
 ] 

Shai Erera commented on LUCENE-2342:


Took me a while to spot the typo :). Can you reproduce a problem w/ a nice test 
case? So that we won't run into this issue in the future again.

 DisjunctionSumScorer explain
 

 Key: LUCENE-2342
 URL: https://issues.apache.org/jira/browse/LUCENE-2342
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Gary Yngve
Priority: Minor
   Original Estimate: 0.17h
  Remaining Estimate: 0.17h

 The bottom of the explain method in DisjunctionSumScorer says
 if (nrMatchers >= minimumNrMatchers) {
 This is incorrect; it should say
 if (nrMatches >= minimumNrMatchers) {
 nrMatchers is the instance variable used for advancing, whereas nrMatches is 
 explain's local variable.
 Minor, because I don't think DSS's explain is ever called by anything 
 (BooleanWeight has its own explain)?


[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848565#action_12848565
]

Shai Erera commented on LUCENE-2339:

I personally haven't seen problems using NIO on Windows, but that's perhaps just
because I haven't run into them yet :). I think your proposal makes sense -
let's start w/ NIO bulk-copy and then we can disable it if people complain or
report errors.

Consistency is important, I agree. So let's keep Collection there. I just
wanted to avoid converting arrays to a Collection just so that they can be
iterated on. Seems a waste to me, but not so much as to argue about :).

Re (7), I hate such libraries too. But I hate even more the ones that just hide
problems away from me :). The ideal would be for Lucene to use a logging
mechanism (I once started it on LUCENE-1482) so that you could include the
stacktrace print if logging is enabled. But currently the code just hides the
problem away ... and I'd hate to debug such a thing, not realizing an IO
exception is thrown from close().

So unless LUCENE-1482 springs back to life, what do you suggest we do?
Suppressing the exceptions seems wrong to me.

Allow Directory.copy() to accept a collection of file names to be copied

Key: LUCENE-2339
URL: https://issues.apache.org/jira/browse/LUCENE-2339
Project: Lucene - Java
Issue Type: Improvement
Reporter: Earwin Burrfoot
Assignee: Michael McCandless
Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch

Par example, I want to copy files pertaining to a certain commit, and not
everything there is in a Directory.


[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2010-03-23 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848571#action_12848571
 ] 

Shai Erera commented on LUCENE-1482:


Well ... since Mark hasn't closed it yet (thanks Mark :)), I thought to try 
once more. Perhaps w/ the merge of Lucene/Solr this will look more reasonable 
now? I personally feel that just setting InfoStream on IW is not enough. I 
don't think we need to control logging per level either. I think it's important 
to introduce this in at least one of the following modes:
# We add SLF4J and allow the application to control logging per package(s), but 
the logging level won't matter - as long as it's not OFF, we log.
# We add a static factory LuceneLogger or something, which turns logging 
on/off, in which case all components/packages either log or not.

I think (1) gives us greater flexibility (us as in the apps developers), but 
(2) is also acceptable. As long as we can introduce logging messages from more 
components w/o passing infoStream around ... On LUCENE-2339 for example, a 
closeSafely method was added which suppresses IOExceptions that may be caused 
by io.close(). You cannot print the stacktrace because that would be 
unacceptable w/ products that are not allowed to print anything unless logging 
has been enabled, but on the other hand suppressing the exception is not good 
either ... in this case, a LuceneLogger could have helped because you could 
print the stacktrace if logging was enabled.
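
To make option (2) concrete, here is a rough sketch of what a global on/off logger could look like. LuceneLogger does not exist in Lucene; the class, its method names, and the choice of a PrintStream sink are all assumptions made for illustration.

```java
import java.io.PrintStream;

/** Hypothetical global on/off logger; not an existing Lucene class. */
final class LuceneLogger {
  // null means logging is disabled; volatile so changes are visible to all threads.
  private static volatile PrintStream out = null;

  private LuceneLogger() {}

  static void enable(PrintStream stream) {
    out = stream;
  }

  static void disable() {
    out = null;
  }

  static boolean isEnabled() {
    return out != null;
  }

  /** Logs a message if logging is enabled; otherwise does nothing. */
  static void log(String component, String message) {
    PrintStream o = out; // read once to avoid racing with disable()
    if (o != null) {
      o.println(component + ": " + message);
    }
  }

  /** Logs a message and the stack trace if logging is enabled. */
  static void log(String component, String message, Throwable t) {
    PrintStream o = out;
    if (o != null) {
      o.println(component + ": " + message);
      t.printStackTrace(o);
    }
  }
}
```

With something like this, a method such as closeSafely could call LuceneLogger.log("IOUtils", "close failed", e) inside its catch block: nothing is printed unless the application has enabled logging, and the exception is no longer silently lost when it has.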

 Replace infoSteram by a logging framework (SLF4J)
 -

 Key: LUCENE-1482
 URL: https://issues.apache.org/jira/browse/LUCENE-1482
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-1482-2.patch, LUCENE-1482.patch, 
 slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar


 Lucene makes use of infoStream to output messages in its indexing code only. 
 For debugging purposes, when the search application is run on the customer 
 side, getting messages from other code flows, like search, query parsing, 
 analysis etc can be extremely useful.
 There are two main problems with infoStream today:
 1. It is owned by IndexWriter, so if I want to add logging capabilities to 
 other classes I need to either expose an API or propagate infoStream to all 
 classes (see for example DocumentsWriter, which receives its infoStream 
 instance from IndexWriter).
 2. I can either turn debugging on or off, for the entire code.
 Introducing a logging framework can allow each class to control its logging 
 independently, and more importantly, allows the application to turn on 
 logging for only specific areas in the code (i.e., org.apache.lucene.index.*).
 I've investigated SLF4J (stands for Simple Logging Facade for Java) which is, 
 as its name states, a facade over different logging frameworks. As such, you 
 can include the slf4j.jar in your application, and it recognizes at deploy 
 time what is the actual logging framework you'd like to use. SLF4J comes with 
 several adapters for Java logging, Log4j and others. If you know your 
 application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in 
 your classpath, and your logging statements will use Java logging underneath 
 the covers.
 This makes the logging code very simple. For a class A the logger will be 
 instantiated like this:
 public class A {
   private static final Logger logger = LoggerFactory.getLogger(A.class);
 }
 And will later be used like this:
 public class A {
   private static final Logger logger = LoggerFactory.getLogger(A.class);
   public void foo() {
     if (logger.isDebugEnabled()) {
       logger.debug(message);
     }
   }
 }
 That's all !
 Checking for isDebugEnabled is very quick, at least using the JDK14 adapter 
 (but I assume it's fast also over other logging frameworks).
 The important thing is, every class controls its own logger. Not all classes 
 have to output logging messages, and we can improve Lucene's logging 
 gradually, w/o changing the API, by adding more logging messages to 
 interesting classes.
 I will submit a patch shortly


[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848606#action_12848606
 ] 

Shai Erera commented on LUCENE-2339:


Sorry ... I was confused w/ the for loop of Java 5 :). Let's keep it Collection 
then. Sorry for the hassle.


[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848636#action_12848636
 ] 

Shai Erera commented on LUCENE-2339:


I don't want to block the issue. If LUCENE-1482 advances somewhere, we'll 
log a message in closeSafely. Otherwise, between suppressing and always 
printing, I agree we should suppress. If someone does not want to suppress, he 
should call close(). Which makes me think we should call this method 
closeNoException because closeSafely is not exactly what it does :).


[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848729#action_12848729
 ] 

Shai Erera commented on LUCENE-2339:


Mike, that's what I wrote above: "if someone does not want to suppress, he 
should call close()". I think that closeSafely (or as I prefer it - 
closeNoException) should be called only when you know you've hit an exception 
and you want to close the stream suppressing any exceptions. Otherwise call 
close().

bq. can we add a boolean arg (suppressExceptions) to control that?

That would defeat the purpose of the method, no? I mean, currently it does not 
throw any exception, not even declaring one, and if we add that boolean it will 
need to declare throws IOException, which will force the caller to try-catch 
that exception and ... suppress it or document "// cannot happen because I've 
passed false"?

So how about we call it closeNoException, document that it does not throw any 
exception and intentionally suppresses them, and if you don't want them to be 
suppressed, you can call io.close() yourself?
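
A minimal sketch of the closeNoException contract described above. IOUtilsSketch is a hypothetical holder class, and the varargs signature is an assumption; the patch under review may look different.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical holder class for the sketch; not the class from the patch. */
final class IOUtilsSketch {
  private IOUtilsSketch() {}

  /**
   * Closes every non-null argument and intentionally swallows any IOException.
   * Meant for cleanup after an earlier failure that is already being propagated.
   * RuntimeExceptions are not caught here and will still propagate.
   */
  static void closeNoException(Closeable... resources) {
    for (Closeable c : resources) {
      if (c == null) {
        continue;
      }
      try {
        c.close();
      } catch (IOException ignored) {
        // Suppressed on purpose; callers who care should call close() directly.
      }
    }
  }
}
```

Since the method declares no checked exception, it can be called from a catch or finally block without another try-catch, which is the point of keeping it free of a suppressExceptions flag.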


[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848753#action_12848753
 ] 

Shai Erera commented on LUCENE-2339:


bq. But there is still a need to close everything, but do throw the 1st 
exception you hit.

Ohh I see what you mean. My assumption is that when you call closeNoException 
you already know that you've hit an exception and just want to close the stream 
w/o getting more exceptions. If you don't know that, don't call 
closeNoException?


[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848777#action_12848777
 ] 

Shai Erera commented on LUCENE-2339:


Ok that's indeed different :). I guess we can introduce it now, in this issue 
(it's tiny and simple). A closeAll which documents it throws the first 
exception it hits.
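
One possible shape for such a closeAll, as a sketch only. CloseAllSketch is a hypothetical class name, and dropping the later exceptions (rather than chaining them) is an assumption.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical holder class for the sketch. */
final class CloseAllSketch {
  private CloseAllSketch() {}

  /**
   * Attempts to close every non-null argument, then rethrows the first
   * IOException encountered, if any. Later IOExceptions are dropped.
   */
  static void closeAll(Closeable... resources) throws IOException {
    IOException first = null;
    for (Closeable c : resources) {
      if (c == null) {
        continue;
      }
      try {
        c.close();
      } catch (IOException e) {
        if (first == null) {
          first = e; // remember only the first failure
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```

Unlike closeNoException, this variant is for the normal path: every resource gets a close attempt even if an earlier one fails, and the caller still learns that something went wrong.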


[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848896#action_12848896
]

Shai Erera commented on LUCENE-2215:

I've reviewed PagingCollector.java and the first thing I have to say about it
is that I really like it! :) Saves lots of unnecessary heapify code, if the
application can allow itself to store the lowest last SD.

I have a few comments/questions.

I don't understand what getLastScoreDoc is for? Is it just a utility method? Is
it something the app can compute by itself? Anyway, it lacks javadocs, so
perhaps if they existed I wouldn't need to ask ;).

In collect(), there's the following code:
{code}
} else if (score == previousPassLowest.score && doc <= previousPassLowest.doc) {
  // if the scores are the same and the doc is less than or equal to the
  // previous pass lowest hit doc then skip because this collector favors
  // lower number documents.
  return;
{code}

I think there's a typo in the comment "favors lower number documents" ... while
it seems to prefer higher doc IDs? The way I understand it, regardless of
whether docs are collected in/out of order, HitQueue ensures that when scores
are equal, the lowest IDs are favored. Thus the first round always keeps the
lowest IDs among the docs whose scores match. The next round will favor the
docs whose IDs come next, and so forth ... am I right? (just clarifying my
understanding).
If that's the case, I think it'll be good if it's spelled out in the comment,
and also mention that it means that document has already been returned
previously (like it's documented in the previous 'if').
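
To make that understanding concrete, here is a rough sketch under my own assumptions: Hit is a hypothetical stand-in for ScoreDoc, and the page is built with a plain sort rather than a priority queue, so this is not how the patch does it. It only shows which hits count as "already returned" on a later round.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of paging by "lowest hit of the previous pass".
final class PagingSketch {
  /** Stand-in for Lucene's ScoreDoc: a doc ID and its score. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** True if the hit ranks at or above previousPassLowest, i.e. was returned before. */
  static boolean alreadyReturned(Hit hit, Hit previousPassLowest) {
    if (hit.score > previousPassLowest.score) {
      return true; // better score: it was on an earlier page
    }
    // same score: lower (or equal) doc IDs were favored on the earlier page
    return hit.score == previousPassLowest.score && hit.doc <= previousPassLowest.doc;
  }

  /** Returns the next page; pass null as previousPassLowest for the first page. */
  static List<Hit> nextPage(List<Hit> allHits, Hit previousPassLowest, int pageSize) {
    List<Hit> candidates = new ArrayList<>();
    for (Hit h : allHits) {
      if (previousPassLowest == null || !alreadyReturned(h, previousPassLowest)) {
        candidates.add(h);
      }
    }
    // higher score first; on equal scores the lower doc ID wins
    candidates.sort((a, b) -> {
      int c = Float.compare(b.score, a.score);
      return c != 0 ? c : Integer.compare(a.doc, b.doc);
    });
    return new ArrayList<>(candidates.subList(0, Math.min(pageSize, candidates.size())));
  }
}
```

Each round feeds the last hit of the previous page back in as previousPassLowest, so ties on score are paged through in increasing doc ID order.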

The last 'else' really looks like TSDC's out-of-order version, which makes me
wonder whether PagingCollector can be viewed as a filter on top of TSDC (and
possibly even TopFieldCollector)? So if a hit should be collected, it just
calls super.collect? I realize though that a Collector is a hotspot and we want
to minimize 'if' statements, let alone method calls, as much as possible. But
it just feels so strongly that it should be a filter ... :). And you wouldn't
need to specifically handle in/out orderness ... and w/ the right design, it
can also wrap a TFC or any other TDC implementation ...
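
A sketch of the filter idea, assuming a hypothetical, simplified HitCollector interface (Lucene's real Collector has more methods, and the patch does not work this way): the filter only decides whether a hit was already returned and otherwise delegates.

```java
// Hypothetical illustration of "paging as a filter on top of another collector".
final class PagingFilterSketch {
  /** Simplified stand-in for a collector; not Lucene's Collector API. */
  interface HitCollector {
    void collect(int doc, float score);
  }

  /** Forwards only the hits that rank after the previous page's lowest hit. */
  static final class PagingFilter implements HitCollector {
    private final HitCollector delegate;
    private final int lastDoc;
    private final float lastScore;

    PagingFilter(HitCollector delegate, int lastDoc, float lastScore) {
      this.delegate = delegate;
      this.lastDoc = lastDoc;
      this.lastScore = lastScore;
    }

    @Override
    public void collect(int doc, float score) {
      if (score > lastScore) {
        return; // returned on an earlier page
      }
      if (score == lastScore && doc <= lastDoc) {
        return; // tie on score: lower doc IDs were returned earlier
      }
      delegate.collect(doc, score); // the wrapped collector handles ordering
    }
  }
}
```

The price is one extra method call per collected hit, which is the hotspot concern mentioned above.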

BTW, I've noticed that you don't track maxScore - is it assumed that the
application stores it from the first round? If so I'd document it, because the
application needs to know it should use TSDC the first round, and
PagingCollector the second round.

Also, PagingCollector offers a ctor which does not force the application to
pass in a ScoreDoc. See my comment from above - it might be misleading, because
if you use this collector right from the very first search, you lose the
maxScore tracking. I also don't see why it should be allowed - if a dummy
previousPassLowest ScoreDoc is used, collect() does a lot of unnecessary 'if's.
I think this collector should be used only from the second round, and a single
ctor which forces a ScoreDoc to be passed would make more sense. If the
application wishes to shoot itself in the foot (performance-wise), it can pass a
dummy SD itself.

paging collector


[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848908#action_12848908
]

Shai Erera commented on LUCENE-2215:

I must admit I don't like throwing UOE. I imagine the naive user calling one of
these and getting hit w/ a UOE out of nowhere, really :). Perhaps it's a sign
PagingCollector should not be a sub-class of TopDocsCollector? It does not
benefit from it in any way because it overrides all the main methods, impls
them or throws UOE for those it doesn't like. So perhaps it should just be a
TopScorePagingCollector which copies some of the functionality of TSDC, but is
not a TDC itself. It will have a topDocs() method, and only it (b/c I agree the
rest don't make any sense).

Notice the different name I propose - to make it clear it's a collector that
can be used for paging through a scored list of results.

I BTW liked that the if/else clauses were separated, b/c you could include
meaningful documentation for each. Right now those are just very long lines.

About in-order, I think the only thing you will save is the last 'else'. Read
my comment above about wrapping TSDC ... not sure about it, but it will make it
more elegant.

I'll review the rest of the patch. Didn't yet understand what's PagingIterable
for ...

paging collector


[jira] Updated: (LUCENE-2331) Add NoOpMergePolicy

2010-03-22 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shai Erera updated LUCENE-2331:
---

Attachment: LUCENE-2331.patch

Sorry - new eclipse and project settings :). Should be ok now.

Add NoOpMergePolicy
---

Key: LUCENE-2331
URL: https://issues.apache.org/jira/browse/LUCENE-2331
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
Fix For: 3.1

Attachments: LUCENE-2331.patch, LUCENE-2331.patch

I'd like to add a simple and useful MP implementation which does nothing!
:). I've come across many places where the following is either documented
or implemented: if you want to prevent merges, set mergeFactor to a high
enough value. I think a NoOpMergePolicy is just as good, and can REALLY
allow you to disable merges (except for maybe setting mergeFactor to
Integer.MAX_VALUE).
As such, NoOpMergePolicy will be introduced as a singleton, and can be used
for convenience purposes only. Also, for Parallel Index it's important,
because I'd like the slices to never do any merges, unless ParallelWriter
decides so. So they should be set w/ that MP.
I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need
to change it afterwards.
About the name - I like the name, but suggestions are welcome. I thought of a
NullMergePolicy, but I don't like 'Null' used for a NoOp.


[jira] Commented: (LUCENE-2331) Add NoOpMergePolicy

2010-03-22 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848113#action_12848113
]

Shai Erera commented on LUCENE-2331:

bq. do you think we should allow instantiation of NoMergePolicy, allowing you
to control if it uses CFS or not?

You ask because of the useCompound* methods? I wanted NMP to be a singleton
really, and I don't think those two really matter? Meaning, if you are using
it, I guess you don't really care if it uses a compound file or not?

But if you think it's important, I can create 3 singletons:
NO_COMPOUND_FILES_AND_STORE, COMPOUND_FILES, COMPOUND_FILES_AND_STORE (I really
hate the long names though). We can settle for just two - (NO)COMPOUND_FILES ...

Add NoOpMergePolicy
---

Key: LUCENE-2331
URL: https://issues.apache.org/jira/browse/LUCENE-2331
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
Fix For: 3.1

Attachments: LUCENE-2331.patch, LUCENE-2331.patch


[jira] Updated: (LUCENE-2331) Add NoOpMergePolicy

2010-03-22 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shai Erera updated LUCENE-2331:
---

Attachment: LUCENE-2331.patch

Patch includes NoMergePolicy.NO_COMPOUND_FILES and COMPOUND_FILES singletons.
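
For readers who have not opened the patch, a rough sketch of the shape of the two singletons. SimpleMergePolicy and NoMerges are hypothetical, pared-down names invented for this example; the real MergePolicy API has more methods and the patch may differ in detail.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of a "never merge" policy exposed as two shared instances.
final class NoMergePolicySketch {
  /** Pared-down policy interface, assumed for this example only. */
  interface SimpleMergePolicy {
    /** Returns the groups of segments to merge; an empty list means "no merges". */
    List<List<String>> findMerges(List<String> segments);

    boolean useCompoundFile();
  }

  /** Never selects a merge; cannot be instantiated from outside. */
  static final class NoMerges implements SimpleMergePolicy {
    static final SimpleMergePolicy NO_COMPOUND_FILES = new NoMerges(false);
    static final SimpleMergePolicy COMPOUND_FILES = new NoMerges(true);

    private final boolean useCompoundFile;

    private NoMerges(boolean useCompoundFile) {
      this.useCompoundFile = useCompoundFile;
    }

    @Override
    public List<List<String>> findMerges(List<String> segments) {
      return Collections.emptyList(); // never merge, whatever the segments are
    }

    @Override
    public boolean useCompoundFile() {
      return useCompoundFile;
    }
  }
}
```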

Add NoOpMergePolicy
---

Key: LUCENE-2331
URL: https://issues.apache.org/jira/browse/LUCENE-2331
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
Fix For: 3.1

Attachments: LUCENE-2331.patch, LUCENE-2331.patch, LUCENE-2331.patch


[jira] Commented: (LUCENE-2331) Add NoOpMergePolicy

2010-03-22 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848192#action_12848192
]

Shai Erera commented on LUCENE-2331:

I think it's correct. The idea is to say that even w/ NMP (NoMergePolicy), if
you use NMS (NoMergeScheduler) you ensure that no MS code is ever run (e.g. if
you use NMP only, then the [default] CMS code will always run but won't do
anything).

Add NoOpMergePolicy
---

Key: LUCENE-2331
URL: https://issues.apache.org/jira/browse/LUCENE-2331
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
Fix For: 3.1

Attachments: LUCENE-2331.patch, LUCENE-2331.patch, LUCENE-2331.patch


[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848341#action_12848341
]

Shai Erera commented on LUCENE-2328:

Earwin, can you add a deprecation message to sync(String)? When I upgraded from
2.9 to 3.0 some methods were deprecated w/o any explanation as to what I should
use instead. I think a message like "@deprecated use #sync(Collection) instead.
For easy migration you can change your code to call
sync(Collections.singleton(name))" ... or something along those lines.
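
Something along these lines, as a sketch only: SyncDeprecationSketch is a hypothetical class for illustration, not the Directory code from the patch, and the list it fills is there just to make the example observable.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of a deprecation message that tells users what to call instead.
class SyncDeprecationSketch {
  private final List<String> synced = new ArrayList<>();

  /**
   * @deprecated use {@link #sync(Collection)} instead. For easy migration you
   *             can change your code to call
   *             {@code sync(Collections.singleton(name))}.
   */
  @Deprecated
  public void sync(String name) {
    sync(Collections.singleton(name)); // the old method simply delegates
  }

  /** Records the given file names as synced (stand-in for the real work). */
  public void sync(Collection<String> names) {
    synced.addAll(names);
  }

  public List<String> syncedNames() {
    return Collections.unmodifiableList(synced);
  }
}
```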

Other than that, patch looks great! I really like the code cleanup from IW.

IndexWriter.synced field accumulates data leading to a Memory Leak
---

Key: LUCENE-2328
URL: https://issues.apache.org/jira/browse/LUCENE-2328
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
Environment: all
Reporter: Gregor Kaczor
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.1

Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch

Original Estimate: 1h
Remaining Estimate: 1h

I am running into a strange OutOfMemoryError. My small test application
indexes and deletes a few files. This is repeated 60k times. Optimization
is run every 2k times a file is indexed. Index size is 50KB. I analyzed
the HeapDumpFile and realized that the IndexWriter.synced field occupied more
than half of the heap. That field is a private HashSet without a getter. Its
task is to hold files which have already been synced.
There are two calls to addAll and one call to add on synced but no remove or
clear throughout the lifecycle of the IndexWriter instance.
According to the Eclipse Memory Analyzer, synced contains 32618 entries which
look like file names: _e065_1.del or _e067.cfs
The index directory contains 10 files only.
I guess synced is holding obsolete data.


[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-22 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848376#action_12848376
]

Shai Erera commented on LUCENE-2339:

Patch looks good! A few comments:

# Is it safe to use NIO for all FSDirs? I thought that on Windows NIO has some
bugs/limitations. In that case, would it be safer if just NIOFSDir used NIO?
# Can copyTo(Directory, Collection<String>) be changed to copyTo(Directory,
Iterable<String>)? Unless we think that someone would want to use size() or
something.
# I know it's a matter of style, but you import static Arrays.asList, and
then use asList directly in copyTo(Dir). It confuses me because I expect asList
to be a method declared on Dir, and so I prefer to see Arrays.asList. But it's
just style, don't know how others feel about that.
# On copyTo(Dir), perhaps instead of converting the listAll() result to a List
and then removing elements from it, you can just iterate over whatever
listAll() returns and add the files that pass the filter to a list? You can
even optimize: if all the files Dir returned pass the filter, you can just pass
the array to copyTo(Dir, Iterable), assuming we change the method to accept
Iterable. But that's a minor optimization.
# copy(src, dest, boolean) - can you add a message to @deprecated so users will
know more easily what to replace it with?
# I see that copy(src, dest) also accepts a boolean of whether to close the src
directory, but copyTo(Dir) doesn't. I personally think it's ok, as someone can
call close on src himself, but am wondering if it wouldn't be more convenient.
I.e. instead of just changing calls to Directory.copy(src, dest, true), I now
need to do src.copyTo(dest) followed by a src.close().
# closeSafely - perhaps print the stacktrace, even if you don't throw it?
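
To illustrate the Iterable and filter comments above, a sketch that assumes a hypothetical in-memory MemDir in place of Directory and a java.util.function.Predicate in place of whatever filter the patch uses:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration; MemDir is not Lucene's Directory.
final class CopyToSketch {
  /** In-memory stand-in for a directory: file name -> contents. */
  static final class MemDir {
    private final Map<String, byte[]> files = new LinkedHashMap<>();

    void put(String name, byte[] data) {
      files.put(name, data.clone());
    }

    String[] listAll() {
      return files.keySet().toArray(new String[0]);
    }

    /** Copies the named files; needs only iteration, hence Iterable rather than Collection. */
    void copyTo(MemDir dest, Iterable<String> names) {
      for (String name : names) {
        byte[] data = files.get(name);
        if (data == null) {
          throw new IllegalArgumentException("no such file: " + name);
        }
        dest.put(name, data);
      }
    }

    /** Iterates over listAll() once and keeps only the names that pass the filter. */
    void copyTo(MemDir dest, Predicate<String> filter) {
      List<String> accepted = new ArrayList<>();
      for (String name : listAll()) {
        if (filter.test(name)) {
          accepted.add(name);
        }
      }
      copyTo(dest, accepted);
    }
  }
}
```

Building the accepted list directly avoids the convert-then-remove step on the listAll() result.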

Allow Directory.copy() to accept a collection of file names to be copied

For example, I want to copy files pertaining to a certain commit, and not
everything there is in a Directory.


[jira] Commented: (LUCENE-2337) DisjunctionSumScorer and ScorerDocQueue javadocs and one method name out of date after move from skipTo() to advance()

2010-03-21 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847916#action_12847916
 ] 

Shai Erera commented on LUCENE-2337:


Note that -1 is a valid return value in case doc() is called before nextDoc(). 
However it is not valid for nextDoc() and advance().

 DisjunctionSumScorer and ScorerDocQueue javadocs and one method name out of 
 date after move from skipTo() to advance()
 --

 Key: LUCENE-2337
 URL: https://issues.apache.org/jira/browse/LUCENE-2337
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Javadocs, Search
Reporter: Paul Elschot
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2337.patch





[jira] Commented: (LUCENE-2333) Failures during contrib builds, when classes in core were changed without ant clean

2010-03-19 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847415#action_12847415
]

Shai Erera commented on LUCENE-2333:

This up-to-date thingy looks really cool and useful. So I guess you'd compare
the .jar date and the build/classes/java date? This is sort of what javac does
when it decides which classes to compile ... I guess.

Failures during contrib builds, when classes in core were changed without ant
clean
---

Key: LUCENE-2333
URL: https://issues.apache.org/jira/browse/LUCENE-2333
Project: Lucene - Java
Issue Type: Bug
Components: Build
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 3.1

Attachments: LUCENE-2333.patch, shai-compile-fix.patch,
shai-compile-fix2.patch

From java-dev by Shai Erera:
{quote}
I've noticed that sometimes, after I run test-core and test-contrib, and then
change core code, test-contrib fails with NoSuchMethodError and stuff like that.
I've noticed that core.jar exists under build, and I assumed it's used by
test-contrib, and probably is not recreated after core code has changed.
I verified it when looking in contrib-build.xml, which defines a property
lucene.jar.present which is set to true if the jar is ... well, present.
Which I believe is the reason for these failures. I've been thinking how to
resolve that, and I can think of two ways:
(1) have test-core always delete that file, but that has two issues:
(1.1) It's redundant if the code hasn't changed.
(1.2) It forces you to either jar-core or test-core before you test-contrib,
if you want to make sure you run w/ the latest jar.
or
(2) have test-contrib always call jar-core, which will first delete the file
and then re-create it by compiling first. Compiling should not do anything if
the code hasn't changed. So the only waste would be to create the .jar, but I
think that's quite fast?
Does anyone, with more Ant skills than me, know of a better way to detect
from test-contrib that core code has changed and only then rebuild the jar?
{quote}


[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847448#action_12847448
]

Shai Erera commented on LUCENE-2328:

Earwin, I agree that sub-classing FSDir is not that easy. So I guess you'll add
another piece of jdoc to createOutput, to notify Dir when it's closed? This
seems reasonable.

IndexWriter.synced field accumulates data leading to a Memory Leak
---

Original Estimate: 1h
Remaining Estimate: 1h


[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847585#action_12847585
]

Shai Erera commented on LUCENE-2328:

bq. Trying to sync a file that hasn't yet been closed will be undefined

Can we avoid 'undefined'? We have an issue open about SegmentInfos.fileLength()
not being clearly defined, and it causes confusion. If it's undefined, then
someone might attempt to call sync before he closes the file, and only then
close ... can we throw an exception in that case?

We can have close(), sync() and closeAndSync(). Would the latter make sense?

I prefer the API to be explicit, and I think that throwing an exception
(StillOpenException?) if sync() is called before close() is very explicit, and
reasonable if accompanied by a proper jdoc.
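
A sketch of what "explicit" could look like. Everything here is hypothetical: the class, the method signatures that take a file name, and StillOpenException itself, which is only the name floated above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of rejecting sync() on a file that is still open.
final class SyncAfterCloseSketch {
  /** Thrown when sync() is called on a file that has not been closed yet. */
  static final class StillOpenException extends IllegalStateException {
    private static final long serialVersionUID = 1L;

    StillOpenException(String name) {
      super("file is still open: " + name);
    }
  }

  private final Set<String> open = new HashSet<>();
  private final Set<String> synced = new HashSet<>();

  void createOutput(String name) {
    open.add(name);
  }

  void close(String name) {
    open.remove(name);
  }

  /** Syncing an open file is an error rather than undefined behaviour. */
  void sync(String name) {
    if (open.contains(name)) {
      throw new StillOpenException(name);
    }
    synced.add(name);
  }

  /** Convenience for the common case. */
  void closeAndSync(String name) {
    close(name);
    sync(name);
  }

  boolean isSynced(String name) {
    return synced.contains(name);
  }
}
```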

IndexWriter.synced field accumulates data leading to a Memory Leak
---

Original Estimate: 1h
Remaining Estimate: 1h


[jira] Updated: (LUCENE-2331) Add NoOpMergePolicy

2010-03-19 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shai Erera updated LUCENE-2331:
---

Attachment: LUCENE-2331.patch

Patch includes:
* NoMergePolicy + TestNoMergePolicy
* NoMergeScheduler + TestNoMergeScheduler
* MergeScheduler - methods changed to public
* CHANGES entry (New Features)

Add NoOpMergePolicy
---

Key: LUCENE-2331
URL: https://issues.apache.org/jira/browse/LUCENE-2331
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Reporter: Shai Erera
Fix For: 3.1

Attachments: LUCENE-2331.patch


[jira] Commented: (LUCENE-2336) off by one: DisjunctionSumScorer::advance

2010-03-19 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847716#action_12847716
 ] 

Shai Erera commented on LUCENE-2336:


Hi Gary

This has been discussed before (I'm not sure if about DisjunctionSumScorer 
specifically), and therefore there is also a NOTE in advance() of DISI:
{code}
   * <b>NOTE:</b> certain implementations may return a different value (each
   * time) if called several times in a row with the same target.
{code}
Note the *may return a different value...* part. I remember that while working 
on LUCENE-1614 this was discussed, and thus we ended up documenting 
that *may return* part. See here: 
https://issues.apache.org/jira/browse/LUCENE-1614?focusedCommentId=12710860&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12710860
 and read some above and below to see relevant discussion.

I'll need to refresh my memory though why DisjunctionSumScorer works like that 
... perhaps an oversight on my side from 1614, but perhaps there was a reason.

Anyway, about the code example you gave above, why would you want to call 
advance w/ the same value many times? What's the use case? If you're only 
dealing w/ one DISI, then unless you really want to skip to a certain document, 
I don't see any reason for calling advance. The usage is typically if you have 
2 or more DISIs, and one's nextDoc or advance returned a value that is greater 
than the other's doc() ...

Also, it's risky to write the code you wrote, because some scorers, upon init 
are already on a certain doc (I think the Disj. ones, but maybe also the Conj. 
one), and so by calling advance(1), you will actually *skip* over the first 
document and miss a hit.

Can you clarify the usage then?
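
To illustrate the typical usage I mean, here is a sketch with a hypothetical ArrayDocIterator over a sorted int array (not a Lucene class). Its advance() stays put when it is already on or past the target, which is only one of the behaviours the NOTE permits; other implementations may move on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of advance() being driven by another iterator's position.
final class AdvanceSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Iterates over a sorted array of doc IDs; positioned before the first doc initially. */
  static final class ArrayDocIterator {
    private final int[] docs;
    private int idx = -1;
    private int current = -1;

    ArrayDocIterator(int... sortedDocs) {
      this.docs = sortedDocs.clone();
    }

    int nextDoc() {
      idx++;
      current = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
      return current;
    }

    /** Moves to the first doc >= target; does not move if already there. */
    int advance(int target) {
      while (current < target) {
        nextDoc();
      }
      return current;
    }
  }

  /** Leapfrog intersection: advance() is only called with the other side's doc. */
  static List<Integer> intersect(ArrayDocIterator a, ArrayDocIterator b) {
    List<Integer> hits = new ArrayList<>();
    int docA = a.nextDoc();
    int docB = b.nextDoc();
    while (docA != NO_MORE_DOCS && docB != NO_MORE_DOCS) {
      if (docA == docB) {
        hits.add(docA);
        docA = a.nextDoc();
        docB = b.nextDoc();
      } else if (docA < docB) {
        docA = a.advance(docB); // a is behind: catch up to b
      } else {
        docB = b.advance(docA); // b is behind: catch up to a
      }
    }
    return hits;
  }
}
```

In this pattern the target always comes from the other iterator and is always greater than the current doc, so advance() is never called twice in a row with the same target.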

 off by one: DisjunctionSumScorer::advance
 -

 Key: LUCENE-2336
 URL: https://issues.apache.org/jira/browse/LUCENE-2336
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Gary Yngve
Priority: Minor
   Original Estimate: 4h
  Remaining Estimate: 4h

 The bug is:
 if (target <= currentDoc) {
 should be
 if (target < currentDoc) {
 based on the comments for the method as well as the contract for 
 DocIdSetIterator: "Advances to the first beyond the current"
 It can be demonstrated by:
   assertEquals("advance(1) first match failed", 1, scorer.advance(1));
   assertEquals("advance(1) second match failed", n, scorer.advance(1));
 if docId: 1 is a hit and n is the next hit.  (Tests all pass if this code 
 change is made.)
 I'm not labeling it as major because the class is package-protected and 
 currently passes spec.
 Relevant excerpt:
   /**
    * Advances to the first match beyond the current whose document number is
    * greater than or equal to a given target. <br>
    * When this method is used the {@link #explain(int)} method should not be
    * used. <br>
    * The implementation uses the skipTo() method on the subscorers.
    * 
    * @param target
    *          The target document number.
    * @return the document whose number is greater than or equal to the given
    *         target, or -1 if none exist.
    */
   public int advance(int target) throws IOException {
     if (scorerDocQueue.size() < minimumNrMatchers) {
       return currentDoc = NO_MORE_DOCS;
     }
     if (target <= currentDoc) {
       return currentDoc;
     }
     do {
       if (scorerDocQueue.topDoc() >= target) {
         boolean b = advanceAfterCurrent();
         return b ? currentDoc : (currentDoc = NO_MORE_DOCS);
       } else if (!scorerDocQueue.topSkipToAndAdjustElsePop(target)) {
         if (scorerDocQueue.size() < minimumNrMatchers) {
           return currentDoc = NO_MORE_DOCS;
         }
       }
     } while (true);
   }


[jira] Updated: (LUCENE-2320) Add MergePolicy to IndexWriterConfig

2010-03-18 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2320:
---

Attachment: LUCENE-2320.patch

Fixed a copy-paste comment error in IndexWriter (introduced in LUCENE-2294).

 Add MergePolicy to IndexWriterConfig
 

 Key: LUCENE-2320
 URL: https://issues.apache.org/jira/browse/LUCENE-2320
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2320.patch, LUCENE-2320.patch, LUCENE-2320.patch, 
 LUCENE-2320.patch, LUCENE-2320.patch


 Now that IndexWriterConfig is in place, I'd like to move MergePolicy to it as 
 well. The change is not straightforward and so I've kept it for a separate 
 issue. MergePolicy requires in its ctor an IndexWriter, however none can be 
 passed to it before an IndexWriter actually exists. And today IW may create 
 an MP just for it to be overridden by the application one line afterwards. I 
 don't want to make iw member of MP non-final, or settable by extending 
 classes, however it needs to remain protected so they can access it directly. 
 So the proposed changes are:
 * Add a SetOnce object (to o.a.l.util), or Immutable, which can only be set 
 once (hence its name). It'll have the signature SetOnce<T> w/ *synchronized 
 set(T)* and *T get()*. T will be declared volatile, so that get() won't be 
 synchronized.
 * MP will define a *protected final SetOnce<IndexWriter> writer* instead of 
 the current writer. *NOTE: this is a bw break*. Any suggestions are welcome.
 * MP will offer a public default ctor, together with a set(IndexWriter).
 * IndexWriter will set itself on MP using set(this). Note that if set is 
 called more than once, it will throw an exception (AlreadySetException - or 
 does someone have a better suggestion, preferably an already existing Java 
 exception?).
 That's the core idea. I'd like to post a patch soon, so I'd appreciate your 
 review and proposals.
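
 A minimal sketch of the SetOnce idea as described above, under the stated design (synchronized set, volatile value, unsynchronized get). SetOnceSketch is a hypothetical name, and the class that eventually gets committed may look different:

```java
// Hypothetical illustration of a holder whose value can be set exactly once.
final class SetOnceSketch<T> {
  /** Thrown on a second call to set(); the name follows the proposal above. */
  static final class AlreadySetException extends IllegalStateException {
    private static final long serialVersionUID = 1L;

    AlreadySetException() {
      super("The object cannot be set twice!");
    }
  }

  private volatile T value; // volatile so that get() needs no lock
  private boolean set;      // guarded by 'this'

  /** Sets the value; a second call throws, even if the first value was null. */
  synchronized void set(T newValue) {
    if (set) {
      throw new AlreadySetException();
    }
    value = newValue;
    set = true;
  }

  /** Returns the value, or null if set() has not been called yet. */
  T get() {
    return value;
  }
}
```

 With such a holder, MP could keep its writer field final while IndexWriter calls set(this) after construction.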

