[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044608#comment-13044608
 ] 

Paul Elschot edited comment on LUCENE-2454 at 6/7/11 7:37 AM:
--

I finally had some time to start taking a look at the grouping module and again 
at the patch here.
There is too much code there for me to come up with a test case soon.
So please don't wait for me to commit this.

An easy way to test this would be to have a boolean query with a required term 
and an optional term,
with the optional term occurring in a document group in a document before (i.e. 
with a lower docId than)
a document in the same group with the required term.

In case I run into this I'll open a separate issue.


  
 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-07 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker reassigned LUCENE-2793:
-

Assignee: Varun Thacker  (was: Simon Willnauer)

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
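
 A minimal sketch of the idea described above, not the class from any of the
 attached patches: a small value object that carries the read buffer size
 together with hint flags. The class name IOContextSketch, the factory methods
 and the buffer sizes are assumptions made up for illustration; only the flag
 names DIRECT and SEQUENTIAL come from the description.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; not the IOContext class from the LUCENE-2793 patch. */
final class IOContextSketch {
    /** Hint flags named after the ones mentioned in the issue description. */
    enum Flag { DIRECT, SEQUENTIAL }

    final int readBufferSize;
    final Set<Flag> flags;

    IOContextSketch(int readBufferSize, Set<Flag> flags) {
        this.readBufferSize = readBufferSize;
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so guard for it.
        this.flags = flags.isEmpty() ? EnumSet.noneOf(Flag.class) : EnumSet.copyOf(flags);
    }

    // The buffer sizes below are placeholders chosen for the example only.
    static IOContextSketch forSearch() {
        return new IOContextSketch(1024, EnumSet.noneOf(Flag.class));
    }

    static IOContextSketch forMerge() {
        return new IOContextSketch(4096, EnumSet.of(Flag.DIRECT, Flag.SEQUENTIAL));
    }

    /** True when the caller asked to bypass the OS buffer cache. */
    boolean bypassOsCache() {
        return flags.contains(Flag.DIRECT);
    }

    public static void main(String[] args) {
        IOContextSketch merge = forMerge();
        IOContextSketch search = forSearch();
        System.out.println("merge:  buffer=" + merge.readBufferSize + " direct=" + merge.bypassOsCache());
        System.out.println("search: buffer=" + search.readBufferSize + " direct=" + search.bypassOsCache());
    }
}
```

 With something like this, a Directory implementation could decide per call
 whether to use direct IO, instead of being handed only a buffer size.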




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045314#comment-13045314
 ] 

Paul Elschot commented on LUCENE-2454:
--

That is very nicely readable XML.

The problem might occur when a document with an optional term occurs before a 
document in the same group with a required term.
So the second question is the one for which the problem might occur.
The score value for Grant's resume should then be higher than the score value 
for Sean's.
Testing only for the set of expected results is not enough for this particular 
query.

The problem might occur in another guise when both terms are required. In that 
case checking the set of expected results is enough,
but this is not as easily tested because one does not know beforehand the order 
in which the terms are going to be advance()d.
The case with an optional term is simpler to test because the optional term is 
certain to be advance()d to compute the score value after the required term 
determines that there is a match (see ReqOptSumScorer.score()). To be certain 
of the correct advance() on the optional term, one therefore needs to test the 
score value.
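
To make the point about score values concrete, here is a small self-contained 
model of required/optional scoring. It is only a sketch: DocIterator and 
search() are hypothetical stand-ins written for this example, not Lucene's 
Scorer API, and each matching clause simply contributes 1.0 to the score. The 
second run simulates an optional scorer that has already been moved past the 
matching document.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of required + optional scoring; not Lucene's ReqOptSumScorer. */
final class ReqOptSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal iterator over a sorted array of docIds. */
    static final class DocIterator {
        private final int[] docs;
        private int pos = -1;
        private int current = -1;

        DocIterator(int... sortedDocs) { this.docs = sortedDocs; }

        int docId() { return current; }

        int nextDoc() { return advance(current + 1); }

        /** Moves forward to the first doc >= target; always advances at least one entry. */
        int advance(int target) {
            while (++pos < docs.length) {
                if (docs[pos] >= target) return current = docs[pos];
            }
            return current = NO_MORE_DOCS;
        }
    }

    /** Returns "doc:score" entries; the required clause decides which docs match. */
    static List<String> search(DocIterator req, DocIterator opt) {
        List<String> hits = new ArrayList<>();
        for (int doc = req.nextDoc(); doc != NO_MORE_DOCS; doc = req.nextDoc()) {
            float score = 1.0f;                        // required clause matched
            if (opt.docId() < doc) opt.advance(doc);   // optional is advanced only after a match
            if (opt.docId() == doc) score += 1.0f;     // optional clause matched the same doc
            hits.add(doc + ":" + score);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Both clauses report the same (parent) docId 3.
        System.out.println(search(new DocIterator(3), new DocIterator(3)));

        // Same index, but the optional iterator was already moved past doc 3.
        DocIterator movedTooFar = new DocIterator(3);
        movedTooFar.advance(4);
        System.out.println(search(new DocIterator(3), movedTooFar));
    }
}
```

Both runs return document 3 as the only hit, so an assertion on the result set 
alone cannot tell them apart; only the score differs.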





[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-07 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

I have made the required changes. In some places I have set Context=Other, 
and some of those might be wrong. Please suggest where I should make the 
necessary changes. 

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045319#comment-13045319
 ] 

Paul Elschot commented on LUCENE-2454:
--

Looking at the structure of the BooleanQuery, I would expect this to work 
correctly.  The ParentsFilter on the unfiltered scorer of required term 
(mahout) should return the docId of the parent (resume) when the unfiltered 
scorer is at the document containing the required term.
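
A rough illustration of that lookup, under two assumptions that are not 
confirmed here: that parents are marked in a bit set, and that each parent is 
indexed before its children, so the parent of a child is the nearest set bit 
at or before the child's docId. The class and method names are hypothetical, 
and java.util.BitSet stands in for whatever bit set the patch uses.

```java
import java.util.BitSet;

/** Hypothetical sketch of mapping a child docId to its parent docId. */
final class ParentLookupSketch {
    /**
     * Returns the docId of the nearest parent at or before docId,
     * or -1 if no parent precedes it. Assumes parents come before children.
     */
    static int parentOf(BitSet parents, int docId) {
        return parents.previousSetBit(docId);
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(0); // first resume; children are docs 1 and 2
        parents.set(3); // second resume; children are docs 4 and 5

        // A scorer positioned on child doc 5 would report parent doc 3.
        System.out.println(parentOf(parents, 5));
    }
}
```

Because every child in a group maps to the same parent docId, the relative 
order of the children inside the group no longer matters to the BooleanQuery.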






[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-07 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045322#comment-13045322
 ] 

Simon Willnauer commented on LUCENE-2793:
-

Very, very quick review:

* I think OneMerge should not be required for creating an IOContext; maybe we 
should add a default ctor.
* IOContext.Other is confusing, I think. If an IOContext doesn't make much 
sense somewhere, we should not need to pass it in. Can't we simply have 
overloaded methods? Maybe I just don't like the name; maybe use DEFAULT?
* IOContext seems pretty straightforward: you either read or write. But it 
seems to be mixed up with high-level operations like Merge and Flush. Either 
we go high level or we only have read and write here. Since read / write 
is implicit (you either pull an input or an output), we should make this 
high-level only. So maybe we have Query or Search instead of Read here? Maybe 
it makes sense to specify things like Consume or Sequential here too; some 
high-level APIs define sequential access, so I think it does not conflict?
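
The overload suggestion could look roughly like the sketch below. Everything 
in it is hypothetical: ToyDirectory is a stand-in that returns a description 
string instead of an input stream, the Context values are just the high-level 
names floated in this review, and the buffer sizes are placeholders.

```java
/** Hypothetical sketch of overloaded methods with a DEFAULT context. */
final class OverloadSketch {
    /** High-level operations only; read vs. write is implied by the method called. */
    enum Context { MERGE, FLUSH, SEARCH, DEFAULT }

    /** Stand-in for a directory; not Lucene's Directory class. */
    static class ToyDirectory {
        /** Callers with nothing special to say do not have to pass a context. */
        String openInput(String name) {
            return openInput(name, Context.DEFAULT);
        }

        String openInput(String name, Context context) {
            int bufferSize = (context == Context.MERGE) ? 4096 : 1024; // placeholder sizes
            return name + " [" + context + ", buffer=" + bufferSize + "]";
        }
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        System.out.println(dir.openInput("_0.frq"));
        System.out.println(dir.openInput("_0.frq", Context.MERGE));
    }
}
```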





[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045334#comment-13045334
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. Looking at the structure of the BooleanQuery, I would expect this to work 
correctly.

I've found it to be robust so far - you just need to be clear about whether 
criteria are directed at a single child or at potentially different children. 
The main challenge in using this functionality is allowing users to articulate 
the nuances of such queries, and LUCENE-3133 is a holding place for this.

Under the covers, using the same cached filter for parent filters certainly 
helps with performance, and I typically wrap the ParentFilter tag in the XML 
queries with a CachedFilter tag to achieve this.
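
The caching effect can be pictured with a plain memoizing wrapper. This is 
only an analogy in ordinary Java, not the CachedFilter implementation: the 
class name, the string key and the computations counter are all made up for 
the example, and a real filter cache would be keyed per index reader.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical memoizing wrapper; computes each parent bit set at most once. */
final class CachedFilterSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    int computations = 0; // exposed only so the example can show the reuse

    BitSet get(String key, Supplier<BitSet> compute) {
        return cache.computeIfAbsent(key, k -> {
            computations++;
            return compute.get();
        });
    }

    public static void main(String[] args) {
        CachedFilterSketch cached = new CachedFilterSketch();
        Supplier<BitSet> parents = () -> {
            BitSet bits = new BitSet();
            bits.set(0);
            bits.set(3);
            return bits;
        };
        // Two query clauses asking for the same parent filter share one bit set.
        BitSet first = cached.get("parents", parents);
        BitSet second = cached.get("parents", parents);
        System.out.println((first == second) + " after " + cached.computations + " computation(s)");
    }
}
```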





[jira] [Updated] (LUCENE-3177) Decouple indexer from Document/Field impls

2011-06-07 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3177:
---

Attachment: LUCENE-3177.patch

Patch.

Tests pass, but there are many nocommits.

I was *almost* able to create only IndexableField, so that
IW.addDocument took Iterable<IndexableField>, except for doc level
boost, so I had to create IndexableDocument.

I also cut over to a .binaryValue(BytesRef reuse) API here, replacing
getBinaryValue/Length/Offset.

I would actually like to take IndexableDocument/Field further, so that
eg responsibility for analysis lies under the tokenStreamValue()
method, but I think we should leave that for LUCENE-2309.  This is a
big enough change already...


 Decouple indexer from Document/Field impls
 --

 Key: LUCENE-3177
 URL: https://issues.apache.org/jira/browse/LUCENE-3177
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-3177.patch


 I think we should define minimal iterator interfaces,
 IndexableDocument/Field, that indexer requires to index documents.
 Indexer would consume only these bare minimum interfaces, not the
 concrete Document/Field/FieldType classes from oal.document package.
 Then, the Document/Field/FieldType hierarchy is one concrete impl of
 these interfaces. Apps are free to make their own impls as well.
 Maybe eventually we make another impl that enforces a global schema,
 eg factored out of Solr's impl.
 I think this frees design pressure on our Document/Field/FieldType
 hierarchy, ie, these classes are free to become concrete
 fully-featured user-space classes with all sorts of friendly sugar
 APIs for adding/removing fields, getting/setting values, types, etc.,
 but they don't need substantial extensibility/hierarchy. Ie, the
 extensibility point shifts to IndexableDocument/Field interface.
 I think this means we can collapse the three classes we now have for a
 Field (Fieldable/AbstractField/Field) down to a single concrete class
 (well, except for LUCENE-2308 where we want to break out dedicated
 classes for different field types...).




[jira] [Created] (LUCENE-3177) Decouple indexer from Document/Field impls

2011-06-07 Thread Michael McCandless (JIRA)
Decouple indexer from Document/Field impls
--

 Key: LUCENE-3177
 URL: https://issues.apache.org/jira/browse/LUCENE-3177
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0
 Attachments: LUCENE-3177.patch

I think we should define minimal iterator interfaces,
IndexableDocument/Field, that indexer requires to index documents.

Indexer would consume only these bare minimum interfaces, not the
concrete Document/Field/FieldType classes from oal.document package.

Then, the Document/Field/FieldType hierarchy is one concrete impl of
these interfaces. Apps are free to make their own impls as well.
Maybe eventually we make another impl that enforces a global schema,
eg factored out of Solr's impl.

I think this frees design pressure on our Document/Field/FieldType
hierarchy, ie, these classes are free to become concrete
fully-featured user-space classes with all sorts of friendly sugar
APIs for adding/removing fields, getting/setting values, types, etc.,
but they don't need substantial extensibility/hierarchy. Ie, the
extensibility point shifts to IndexableDocument/Field interface.

I think this means we can collapse the three classes we now have for a
Field (Fieldable/AbstractField/Field) down to a single concrete class
(well, except for LUCENE-2308 where we want to break out dedicated
classes for different field types...).
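
As a rough picture of the decoupling, the sketch below defines two tiny
interfaces and a toy "indexer" that depends only on them. The interface
names follow the ones proposed above, but their methods (name, stringValue,
boost), the PairDocument class and the token-counting indexer are
assumptions made up for illustration, not the API in the patch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of an indexer that consumes only minimal interfaces. */
final class IndexableSketch {
    interface IndexableField {
        String name();
        String stringValue();
    }

    interface IndexableDocument extends Iterable<IndexableField> {
        float boost();
    }

    /** An application-defined document that extends no library class. */
    static final class PairDocument implements IndexableDocument {
        private final List<IndexableField> fields;

        /** Takes alternating field names and values. */
        PairDocument(String... namesAndValues) {
            IndexableField[] fs = new IndexableField[namesAndValues.length / 2];
            for (int i = 0; i < fs.length; i++) {
                final String name = namesAndValues[2 * i];
                final String value = namesAndValues[2 * i + 1];
                fs[i] = new IndexableField() {
                    public String name() { return name; }
                    public String stringValue() { return value; }
                };
            }
            fields = Arrays.asList(fs);
        }

        public float boost() { return 1.0f; }

        public Iterator<IndexableField> iterator() { return fields.iterator(); }
    }

    /** Toy indexer: counts whitespace-separated tokens across all fields. */
    static int indexDocument(IndexableDocument doc) {
        int tokens = 0;
        for (IndexableField f : doc) {
            tokens += f.stringValue().trim().split("\\s+").length;
        }
        return tokens;
    }

    public static void main(String[] args) {
        IndexableDocument doc = new PairDocument("title", "nested docs", "id", "42");
        System.out.println(indexDocument(doc) + " tokens");
    }
}
```

The indexer never sees a concrete Document or Field class, which is the
property that frees those classes to evolve independently.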





[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-06-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045351#comment-13045351
 ] 

Michael McCandless commented on LUCENE-2308:


Thanks for the patch Nikola!

Note: when you submit patches that you intend to donate to Apache, you
should remember to check the box that says "Grant license to ASF...",
as long as you are the sole creator of that patch (and thus have the
right to grant this patch to the ASF).  Patches that incorporate someone
else's source code are more interesting because we have to ensure the
license is compatible with Apache's, update our LICENSE/NOTICE, etc.

Stepping back here... I think we should think a bit about the target
end goal here and then work out the baby steps to get there?

I think ideally once we are done here, it should be incredibly simple
to create a document, something like this:

{code}
Document d = new Document();
d.add(new TextField(title));
d.add(new StringField(id));
d.add(new BinaryField(bytes));
d.add(new NumericField(price));
{code}

These classes each use a default FieldType under the hood:

  * TextField indexes, tokenizes, with norms and TFAP

  * StringField indexes untokenized and no norms, no TFAP (maybe)

  * BinaryField only stores the byte[]

  * NumericField does what it does today

If an app wants to tweak the type, it can do so, something like this:

{code}
FieldType titleFieldType = new FieldType(TextField.DEFAULT_TYPE);
titleFieldType.setOmitNorms(true);
titleFieldType.setOmitTFAP(true);
d.add(new Field(titleFieldType, title));
{code}

Ie, the default *Field classes are sugar for binding to the common
default type, but you can easily go and customize the type if you want
to.
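
To check that this shape hangs together, here is a self-contained toy
version in plain Java. It is only a sketch of the proposal, not existing
Lucene API: the classes are local stand-ins, the (type, name, value)
constructor is an assumption, and only TextField is modelled.

```java
/** Toy model of "sugar classes bound to a default FieldType"; all names are local stand-ins. */
final class FieldTypeSketch {
    static final class FieldType {
        boolean indexed;
        boolean tokenized;
        boolean omitNorms;
        boolean omitTFAP;

        FieldType(boolean indexed, boolean tokenized) {
            this.indexed = indexed;
            this.tokenized = tokenized;
        }

        /** Copy constructor: customize a copy without touching the shared default. */
        FieldType(FieldType other) {
            this.indexed = other.indexed;
            this.tokenized = other.tokenized;
            this.omitNorms = other.omitNorms;
            this.omitTFAP = other.omitTFAP;
        }

        void setOmitNorms(boolean v) { omitNorms = v; }
        void setOmitTFAP(boolean v) { omitTFAP = v; }
    }

    static class Field {
        final FieldType type;
        final String name;
        final String value;

        Field(FieldType type, String name, String value) {
            this.type = type;
            this.name = name;
            this.value = value;
        }
    }

    /** Sugar: a Field bound to the common default type (indexed, tokenized, norms and TFAP kept). */
    static final class TextField extends Field {
        static final FieldType DEFAULT_TYPE = new FieldType(true, true);

        TextField(String name, String value) {
            super(DEFAULT_TYPE, name, value);
        }
    }

    public static void main(String[] args) {
        Field plain = new TextField("title", "Nested documents");

        FieldType titleFieldType = new FieldType(TextField.DEFAULT_TYPE);
        titleFieldType.setOmitNorms(true);
        titleFieldType.setOmitTFAP(true);
        Field tweaked = new Field(titleFieldType, "title", "Nested documents");

        System.out.println(plain.type.omitNorms + " " + tweaked.type.omitNorms);
    }
}
```

The copy constructor matters here: tweaking a copy leaves the shared
DEFAULT_TYPE, and every field already bound to it, unchanged.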

Does that sound roughly like the goal here?


 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 like indexed or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-06-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045350#comment-13045350
 ] 

Michael McCandless commented on LUCENE-2308:


I've opened LUCENE-3177 (and linked to this issue), to strongly
decouple what indexer needs when indexing documents/fields from what
we do in this issue.  Ie, LUCENE-3177 gives us more freedom here, I
think, to create a specific concrete FieldType hierarchy for creating
documents.






[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045354#comment-13045354
 ] 

Michael McCandless commented on LUCENE-2793:


Patch looks great!  Comments:

  * I think IOCtx should have ctor taking only OneMerge, which would
set the OneMerge and set the context as Merge?

  * Likewise, a ctor taking a SegmentInfo to mean context = Flush?

  * And finally a default ctor that maps to Other (or Default or
Unknown or Unspecified or something)

  * You don't need to create the IOCtx in IW.maybeMerge?

  * Maybe we want a readOnce boolean in IOCtx?  When we read del
docs, norms, terms index, doc values, segments file /
segments.gen, we would set this?  (And UnixDir would send eg
NO_REUSE down to the OS).

  * I think we'll need a NativeMMapDir as well as NativeDir (or
NativeUnix/WindowsDir), because mmap can also take flags giving
hints about access patterns.  I'll open a new issue...

  * Why does SegmentCoreReaders hang onto the IOCtx?  Seems like
classes shouldn't hang onto it... (also: PreFlexFields).

  * Hmm does createOutput even need an IOCtx...?  What would a dir
do with this?  I suppose if it's a merge and we had io
prioritization (someday) we could set lower prio... OK let's keep
it.

  * I think Codec.fieldsProducer/Consumer should take an IOCtx?

  * Still need to fix IW's ReaderPool to key off of IOCtx.Context plus
the info.  Maybe put a // nocommit in there so we remember...?
Eg, where you commented out the readBufferSize = ... inside
ReaderPool.get is a good place.
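
A possible shape for the ctor and pooling suggestions above, as a
self-contained sketch rather than the patch itself. OneMerge and SegmentInfo
are empty placeholders for the real classes, and PoolKey, ReaderPool and the
use of a plain Object as the pooled "reader" are assumptions made for the
example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of IOCtx constructors and a reader pool keyed by context. */
final class IOCtxCtorSketch {
    enum Context { MERGE, FLUSH, OTHER }

    /** Placeholders standing in for the real OneMerge and SegmentInfo classes. */
    static final class OneMerge { }
    static final class SegmentInfo {
        final String name;
        SegmentInfo(String name) { this.name = name; }
    }

    static final class IOCtx {
        final Context context;
        final boolean readOnce;

        IOCtx() { this(Context.OTHER, false); }                     // default ctor maps to Other
        IOCtx(OneMerge merge) { this(Context.MERGE, false); }       // a merge implies Merge
        IOCtx(SegmentInfo flushed) { this(Context.FLUSH, false); }  // a segment info implies Flush
        IOCtx(Context context, boolean readOnce) {
            this.context = context;
            this.readOnce = readOnce;
        }
    }

    /** Pool key: the segment plus the context, so merge readers are never reused for search. */
    static final class PoolKey {
        final String segmentName;
        final Context context;

        PoolKey(String segmentName, Context context) {
            this.segmentName = segmentName;
            this.context = context;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof PoolKey)) return false;
            PoolKey k = (PoolKey) o;
            return segmentName.equals(k.segmentName) && context == k.context;
        }

        @Override public int hashCode() { return Objects.hash(segmentName, context); }
    }

    static final class ReaderPool {
        private final Map<PoolKey, Object> readers = new HashMap<>();

        /** Returns the pooled "reader" for this segment and context, creating it if needed. */
        Object get(SegmentInfo info, IOCtx ctx) {
            return readers.computeIfAbsent(new PoolKey(info.name, ctx.context), k -> new Object());
        }
    }

    public static void main(String[] args) {
        ReaderPool pool = new ReaderPool();
        SegmentInfo seg = new SegmentInfo("_0");
        Object forMerge = pool.get(seg, new IOCtx(new OneMerge()));
        Object forOther = pool.get(seg, new IOCtx());
        System.out.println("distinct readers: " + (forMerge != forOther));
    }
}
```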






[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-06-07 Thread Nikola Tankovic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045374#comment-13045374
 ] 

Nikola Tankovic commented on LUCENE-2308:
-

Mike, that's exactly what I needed: a quick API goal summary. This looks very 
clean and nice to me. The next patch will continue in that direction.

Basically, like I said, we should remove AbstractField and keep only Field 
(with the Fieldable interface), then extend Field with Text, String, Binary 
and Numeric.





[jira] [Updated] (LUCENE-3175) speed up core tests

2011-06-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3175:


Attachment: LUCENE-3175_2.patch

I shaved another minute off on my Mac with this patch.

 speed up core tests
 ---

 Key: LUCENE-3175
 URL: https://issues.apache.org/jira/browse/LUCENE-3175
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3175.patch, LUCENE-3175.patch, LUCENE-3175_2.patch


 Our core tests have gotten slower and slower; if you don't have a really fast 
 computer it's probably frustrating.
 I think we should:
 1. still have random parameters, but make the 'obscene' settings like 
 SimpleText rarer... we can always make them happen more on NIGHTLY
 2. tests that make a lot of documents can conditionalize on NIGHTLY so that 
 they are still doing a reasonable test on ordinary runs e.g. numdocs = 
 (NIGHTLY ? 1 : 1000) * multiplier
 3. refactor some of the slow huge classes with lots of tests like 
 TestIW/TestIR, at least pull out really slow methods like TestIR.testDiskFull 
 into its own class. this gives better parallelization.
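
 Point 2 could be sketched as below. The helper name, the "tests.nightly"
 system property and both document counts are assumptions for the example
 (the counts are placeholders, not the values quoted above); the only idea
 taken from the list is scaling a base count by NIGHTLY and a multiplier.

```java
/** Hypothetical helper that scales test size for nightly runs. */
final class TestSizeSketch {
    /** Base counts are placeholders: larger on nightly runs, smaller otherwise. */
    static int numDocs(boolean nightly, int multiplier) {
        int base = nightly ? 10_000 : 1_000;
        return base * multiplier;
    }

    public static void main(String[] args) {
        // Assumed property name; false unless -Dtests.nightly=true is passed.
        boolean nightly = Boolean.getBoolean("tests.nightly");
        System.out.println("indexing " + numDocs(nightly, 1) + " docs");
    }
}
```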




[jira] [Updated] (LUCENE-3174) Similarity.Stats class for term collection statistics

2011-06-07 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3174:


Lucene Fields: [New, Patch Available]  (was: [New])

 Similarity.Stats class for term  collection statistics
 ---

 Key: LUCENE-3174
 URL: https://issues.apache.org/jira/browse/LUCENE-3174
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
Priority: Minor
 Fix For: flexscoring branch

 Attachments: LUCENE-3174.patch


 In order to support ranking methods besides TF-IDF, we need to make the 
 statistics they need available. These statistics could be computed in 
 computeWeight (soon to become computeStats) and stored in a separate object 
 for easy access. Since this object will be used solely by subclasses of 
 Similarity, it should be implemented as a static inner class, i.e. 
 Similarity.Stats.
 There are two ways this could be implemented:
 - as a single Similarity.Stats class, reused by all ranking algorithms. In 
 this case, this class would have a member field for all statistics;
 - as a hierarchy of Stats classes, one for each ranking algorithm. Each 
 subclass would define only the statistics needed for the ranking algorithm.
 In the second case, the Stats class in DefaultSimilarity would have a single 
 field, idf, while the one in e.g. BM25Similarity would have idf and average 
 field/document length.
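
The second option (a hierarchy of Stats classes) could look roughly like the sketch below. All class names here are hypothetical and this is not the content of LUCENE-3174.patch; the idf formula is only one common variant, used to produce sample values.

```java
// Illustrative only: these types are assumptions, not the contents of LUCENE-3174.patch.
abstract class StatsSketch {
    // Marker base type standing in for the proposed Similarity.Stats.
}

final class TfIdfStats extends StatsSketch {
    final float idf; // the only statistic a TF-IDF style ranking needs here

    TfIdfStats(float idf) {
        this.idf = idf;
    }
}

final class Bm25Stats extends StatsSketch {
    final float idf;
    final float avgFieldLength; // extra statistic needed by a BM25 style ranking

    Bm25Stats(float idf, float avgFieldLength) {
        this.idf = idf;
        this.avgFieldLength = avgFieldLength;
    }
}

final class StatsDemo {
    /** One common idf variant, used here only to produce sample data. */
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** Average field length from the total token count and the number of documents. */
    static float avgFieldLength(long totalTokens, long numDocs) {
        return numDocs == 0 ? 0f : (float) totalTokens / numDocs;
    }

    public static void main(String[] args) {
        TfIdfStats tfidf = new TfIdfStats(idf(9, 10));
        Bm25Stats bm25 = new Bm25Stats(idf(9, 10), avgFieldLength(500, 10));
        System.out.println("tf-idf stats: idf=" + tfidf.idf);
        System.out.println("bm25 stats: idf=" + bm25.idf + " avgLen=" + bm25.avgFieldLength);
    }
}
```

With the single-class option, both sets of fields would instead live in one type, and each ranking method would ignore the fields it does not use.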




[jira] [Updated] (LUCENE-3174) Similarity.Stats class for term collection statistics

2011-06-07 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3174:


Attachment: LUCENE-3174.patch

Patch v0.1




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-06-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045428#comment-13045428
 ] 

Michael McCandless commented on LUCENE-2308:


I think we can create only Field?  Ie, no Fieldable interface nor AbstractField?

I think IndexableField (LUCENE-3177) is the only interface we need?

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...
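
A minimal sketch of the idea, assuming a plain settings holder that several fields share. FieldTypeSketch and FieldSketch are hypothetical names made up for this example; they are not Lucene's Field class or the FieldType eventually proposed in a patch.

```java
// Hypothetical classes for illustration; they are not Lucene's Field or FieldType.
final class FieldTypeSketch {
    final boolean indexed;
    final boolean stored;
    final boolean analyzed;
    final boolean omitNorms;

    FieldTypeSketch(boolean indexed, boolean stored, boolean analyzed, boolean omitNorms) {
        this.indexed = indexed;
        this.stored = stored;
        this.analyzed = analyzed;
        this.omitNorms = omitNorms;
    }
}

final class FieldSketch {
    final String name;
    final String value;          // the field still holds the actual value
    final FieldTypeSketch type;  // shared, reusable settings

    FieldSketch(String name, String value, FieldTypeSketch type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

final class FieldTypeDemo {
    public static void main(String[] args) {
        // One type instance: indexed, stored, not analyzed, norms omitted.
        FieldTypeSketch idType = new FieldTypeSketch(true, true, false, true);
        FieldSketch id = new FieldSketch("id", "doc-1", idType);
        FieldSketch parent = new FieldSketch("parentId", "doc-0", idType);
        System.out.println("same type instance: " + (id.type == parent.type));
    }
}
```

Nothing here is written to the index, which matches the point that this is a refactoring and not a schema.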




[jira] [Commented] (SOLR-1967) New Native PHP Response Writer Class

2011-06-07 Thread Eric Pugh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045435#comment-13045435
 ] 

Eric Pugh commented on SOLR-1967:
-

Seems like this could be closed as a won't fix.   One of the most popular 
clients for PHP, solr-php-client (http://code.google.com/p/solr-php-client/), 
doesn't use the PHP writer at all! And isn't going to:  
http://code.google.com/p/solr-php-client/issues/detail?id=6#c1  

I'd echo Peter Wolanin's comment that having lots of writers that don't get 
tested/updated isn't a good thing.

 New Native PHP Response Writer Class
 

 Key: SOLR-1967
 URL: https://issues.apache.org/jira/browse/SOLR-1967
 Project: Solr
  Issue Type: New Feature
  Components: clients - php, Response Writers
Affects Versions: 1.4
Reporter: Israel Ekpo
  Labels: php, response, solrclient, writer
 Fix For: 3.3

 Attachments: phpnative.tar.gz, phpnativeresponsewriter.jar

   Original Estimate: 0h
  Remaining Estimate: 0h

 Hi Solr users,
 If you are using Apache Solr via PHP, I have some good news for you.
 There is a new response writer for the PHP native extension, currently 
 available as a plugin.
 This new feature adds a new response writer class to the 
 org.apache.solr.request package.
 This class is used by the PHP Native Solr Client driver to prepare the query 
 response from Solr.
 This response writer allows you to configure the way the data is serialized 
 for the PHP client.
 You can use your own class name and you can also control how the properties 
 are serialized as well.
 The formatting of the response data is very similar to the way it is 
 currently done by the PECL extension on the client side.
 The only difference now is that this serialization is happening on the server 
 side instead.
 You will find this new response writer particularly useful when dealing with 
 responses for 
 - highlighting
 - admin threads responses
 - more like this responses
 to mention just a few
 You can pass the objectClassName request parameter to specify the class 
 name to be used for serializing objects. 
 Please note that the class must be available on the client side to avoid a 
 PHP_Incomplete_Object error during the unserialization process.
 You can also pass in the objectPropertiesStorageMode request parameter with 
 either a 0 (independent properties) or a 1 (combined properties).
 These parameters can also be passed as a named list when loading the response 
 writer in the solrconfig.xml file
 Having this control allows you to create custom objects which gives the 
 flexibility of implementing custom __get methods, ArrayAccess, Traversable 
 and Iterator interfaces on the PHP client side.
 Until this class is incorporated into Solr, you simply have to copy the jar 
 file containing this plugin into your lib directory under $SOLR_HOME
 The jar file is available here and so is the source code.
 Then set up the configuration as shown below and then restart your servlet 
 container
 Below is an example configuration in solrconfig.xml
 <code>
 <queryResponseWriter name="phpnative" 
 class="org.apache.solr.request.PHPNativeResponseWriter">
 <!-- You can choose a different class for your objects. Just make sure the 
 class is available in the client -->
 <str name="objectClassName">SolrObject</str>
 <!--
 0 means OBJECT_PROPERTIES_STORAGE_MODE_INDEPENDENT
 1 means OBJECT_PROPERTIES_STORAGE_MODE_COMBINED
 In independent mode, each property is a separate property
 In combined mode, all the properties are merged into a _properties array.
 The combined mode allows you to create custom __getters and you could also 
 implement ArrayAccess, Iterator and Traversable
 -->
 <int name="objectPropertiesStorageMode">0</int>
 </queryResponseWriter>
 </code>
 Below is an example implementation on the PHP client side.
 Support for specifying custom response writers will be available starting 
 from the 0.9.11 version of the PECL extension for Solr currently available 
 here
 http://pecl.php.net/package/solr
 Here is an example of how to use the new response writer with the PHP client.
 <code>
 <?php
 class SolrClass
 {
 public $_properties = array();
 // Look up a real property first, then fall back to the combined _properties array.
 public function __get($property_name) {
 if (property_exists($this, $property_name)) { return $this->$property_name; } 
 else if (isset($this->_properties[$property_name])) { return 
 $this->_properties[$property_name]; }
 return null;
 }
 }
 $options = array
 (
 'hostname' => 'localhost',
 'port' => 8983,
 'path' => '/solr/'
 );
 $client = new SolrClient($options);
 $client->setResponseWriter("phpnative");
 $response = $client->ping();
 $query = new SolrQuery();
 $query->setQuery("*:*");
 $query->set("objectClassName", "SolrClass");
 $query->set("objectPropertiesStorageMode", 1);
 $response = $client->query($query);
 $resp = $response->getResponse();
 ?>
 </code>
 Documentation of the 

RE: svn commit: r1133021 - /lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java

2011-06-07 Thread Uwe Schindler
Wuh, what a leftover from earlier days! The generics policeman thanks for 
correcting that!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
 Sent: Tuesday, June 07, 2011 4:14 PM
 To: comm...@lucene.apache.org
 Subject: svn commit: r1133021 -
 /lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/
 search/highlight/TokenSources.java
 
 Author: mikemccand
 Date: Tue Jun  7 14:13:48 2011
 New Revision: 1133021
 
 URL: http://svn.apache.org/viewvc?rev=1133021&view=rev
 Log:
 fix redundant cast compilation warning
 
 Modified:
 
 lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/s
 earch/highlight/TokenSources.java
 
 Modified:
 lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/s
 earch/highlight/TokenSources.java
 URL:
 http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/highlighter
 /src/java/org/apache/lucene/search/highlight/TokenSources.java?rev=1133
 021&r1=1133020&r2=1133021&view=diff
 ==
 
 ---
 lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/s
 earch/highlight/TokenSources.java (original)
 +++
 lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/s
 earch/highlight/TokenSources.java Tue Jun  7 14:13:48 2011
 @@ -165,7 +165,7 @@ public class TokenSources {
  this.tokens = tokens;
  termAtt = addAttribute(CharTermAttribute.class);
  offsetAtt = addAttribute(OffsetAttribute.class);
 -posincAtt = (PositionIncrementAttribute)
 addAttribute(PositionIncrementAttribute.class);
 +posincAtt = addAttribute(PositionIncrementAttribute.class);
}
 
@Override
 






Re: [VOTE] Release PyLucene 3.2.0

2011-06-07 Thread Michael McCandless
+1

I built on OS X 10.6.6, passed all tests (I think?  No overall summary
in the end, but I didn't see any obvious problem), and ran my usual
smoke test indexing first 100K docs from a line file from Wikipedia,
and running a few searches.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jun 6, 2011 at 4:58 PM, Andi Vajda va...@apache.org wrote:

 The PyLucene 3.2.0-1 release closely tracking the recent release of Lucene
 Java 3.2 is ready.

 A release candidate is available from:
  http://people.apache.org/~vajda/staging_area/

 A list of changes in this release can be seen at:
  http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_2/CHANGES

 PyLucene 3.2.0 is built with JCC 2.9 included in these release artifacts.

 A list of Lucene Java changes can be seen at:
  http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_2/lucene/CHANGES.txt

 Please vote to release these artifacts as PyLucene 3.2.0-1.

 Thanks !

 Andi..

 ps: the KEYS file for PyLucene release signing is at:
  http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
  http://people.apache.org/~vajda/staging_area/KEYS

 pps: here is my +1




[jira] [Commented] (SOLR-2491) spellcheck.maxCollationTries breaks when using FieldCollapsing

2011-06-07 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045453#comment-13045453
 ] 

James Dyer commented on SOLR-2491:
--

I think this issue can go separately from SOLR-2564 and have it use ungrouped 
queries.  This little patch allows people to use both features in tandem now 
rather than waiting for later (for instance, I have an app in production using 
this patch now...).

As a follow-up to SOLR-2564, it would be nice to give the user the option to 
return # of grouped hits.  If the end-user is receiving groups as results and 
the app gives a message like "300 results (groups) returned", then in the case 
of a misspelled query, any did-you-mean message that includes # of hits would 
probably need to be consistent and give # groups rather than # documents.  So 
this would be useful additional functionality, whenever we indeed get grouping 
that can return # groups...

Maybe a separate issue should be opened just for this, and it can be worked 
after SOLR-2564 goes in?

 spellcheck.maxCollationTries breaks when using FieldCollapsing
 --

 Key: SOLR-2491
 URL: https://issues.apache.org/jira/browse/SOLR-2491
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 4.0
Reporter: James Dyer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2491.patch


 If specifying spellcheck.maxCollationTries and group=true on the same 
 query, you never get any Spell Check Collations back.  The problem is that 
 SpellCheckCollator relies on ResponseBuilder.getToLog().get("hits") to see 
 how many results each test query returns.  When group=true, the toLog 
 isn't populated so SpellCheckCollator is unable to find a collation that can 
 return results.
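
The failure mode can be illustrated with a plain map standing in for the toLog list. This is a hedged sketch, not Solr code: HitsLookup and its methods are hypothetical, and the real SpellCheckCollator may treat a missing entry differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of ResponseBuilder's toLog list.
final class HitsLookup {
    /** Reads the "hits" entry; a missing or non-numeric entry counts as zero hits. */
    static int hitsFromLog(Map<String, Object> toLog) {
        Object hits = toLog.get("hits");
        return (hits instanceof Number) ? ((Number) hits).intValue() : 0;
    }

    /** A collation is usable only if its test query reported at least one hit. */
    static boolean collationAccepted(Map<String, Object> toLog) {
        return hitsFromLog(toLog) > 0;
    }

    public static void main(String[] args) {
        Map<String, Object> ungrouped = new HashMap<>();
        ungrouped.put("hits", 42);                      // normal query records its hit count
        Map<String, Object> grouped = new HashMap<>();  // grouped query: "hits" never recorded
        System.out.println("ungrouped accepted: " + collationAccepted(ungrouped));
        System.out.println("grouped accepted: " + collationAccepted(grouped));
    }
}
```

Because the grouped case never records a count, every candidate collation looks like it returns nothing, which matches the symptom described above.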




[jira] [Created] (LUCENE-3178) Native MMapDir

2011-06-07 Thread Michael McCandless (JIRA)
Native MMapDir
--

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless


Spinoff from LUCENE-2793.

Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
level IO flags depending on the IOContext, we could in theory do something 
similar with MMapDir.

The problem is MMap is apparently quite hairy... and to pass the flags the 
native code would need to invoke mmap (I think?), unlike UnixDir where the code 
only has to open the file handle.




[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045455#comment-13045455
 ] 

Michael McCandless commented on LUCENE-2793:


{quote}
I think we'll need a NativeMMapDir as well as NativeDir (or
NativeUnix/WindowsDir), because mmap can also take flags giving
hints about access patterns. I'll open a new issue...
{quote}

I opened LUCENE-3178.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
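
A rough sketch of what such a class might hold, assuming a read buffer size plus a set of hint flags. The class name, the factory methods, and the buffer sizes (1024 and 4096) are assumptions for illustration only and do not come from the attached patches.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; names and buffer sizes are assumptions, not the LUCENE-2793 patch.
final class IOContextSketch {
    enum Flag { DIRECT, SEQUENTIAL }

    final int readBufferSize;
    final Set<Flag> flags;

    private IOContextSketch(int readBufferSize, EnumSet<Flag> flags) {
        this.readBufferSize = readBufferSize;
        this.flags = Collections.unmodifiableSet(flags);
    }

    /** Searching: small buffer, no special OS hints. */
    static IOContextSketch forSearch() {
        return new IOContextSketch(1024, EnumSet.noneOf(Flag.class));
    }

    /** Merging: larger buffer, bypass the OS cache and read sequentially. */
    static IOContextSketch forMerge() {
        return new IOContextSketch(4096, EnumSet.of(Flag.DIRECT, Flag.SEQUENTIAL));
    }

    public static void main(String[] args) {
        IOContextSketch merge = forMerge();
        System.out.println(merge.readBufferSize + " " + merge.flags);
    }
}
```

A Directory implementation could then inspect the context in openInput or createOutput and pick its buffer size and OS-level flags from it.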




[jira] [Commented] (SOLR-2491) spellcheck.maxCollationTries breaks when using FieldCollapsing

2011-06-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045458#comment-13045458
 ] 

Robert Muir commented on SOLR-2491:
---

James: sounds like a plan. 

Let's try to get this one resolved and we can follow up with the option (and 
maybe change default or whatever) when that makes sense.

I'll review the patch shortly.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-06-07 Thread Nikola Tankovic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045460#comment-13045460
 ] 

Nikola Tankovic commented on LUCENE-2308:
-

Yes, IndexableField looks sufficient.




[jira] [Updated] (SOLR-2542) dataimport global session putVal blank

2011-06-07 Thread Frank Wesemann (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank Wesemann updated SOLR-2542:
-

Attachment: TestContext.java

JUnitTest for this

 dataimport global session putVal blank
 --

 Key: SOLR-2542
 URL: https://issues.apache.org/jira/browse/SOLR-2542
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
Reporter: Linbin Chen
  Labels: dataimport
 Fix For: 3.3

 Attachments: TestContext.java, 
 dataimport-globalSession-bug-solr3.1.patch


 {code:title=ContextImpl.java}
   private void putVal(String name, Object val, Map map) {
 if(val == null) map.remove(name);
 else entitySession.put(name, val);
   }
 {code}
 change to 
 {code:title=ContextImpl.java}
   private void putVal(String name, Object val, Map map) {
 if(val == null) map.remove(name);
 else map.put(name, val);
   }
 {code}
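
To make the effect of the one-line change concrete, here is a small stand-alone sketch. PutValDemo is a hypothetical stand-in for ContextImpl that models only the two session maps; everything else about the real class is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for ContextImpl; only the map handling is modelled here.
final class PutValDemo {
    final Map<String, Object> entitySession = new HashMap<>();
    final Map<String, Object> globalSession = new HashMap<>();

    /** Original behaviour: non-null values always land in entitySession. */
    void putValBuggy(String name, Object val, Map<String, Object> map) {
        if (val == null) map.remove(name);
        else entitySession.put(name, val);
    }

    /** Proposed behaviour: non-null values go into the map that was passed in. */
    void putValFixed(String name, Object val, Map<String, Object> map) {
        if (val == null) map.remove(name);
        else map.put(name, val);
    }

    public static void main(String[] args) {
        PutValDemo buggy = new PutValDemo();
        buggy.putValBuggy("key", "value", buggy.globalSession);
        System.out.println("buggy, global has key: " + buggy.globalSession.containsKey("key"));

        PutValDemo fixed = new PutValDemo();
        fixed.putValFixed("key", "value", fixed.globalSession);
        System.out.println("fixed, global has key: " + fixed.globalSession.containsKey("key"));
    }
}
```

With the original code, a value stored for the global session ends up in the entity session, so a later global lookup comes back blank.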




[jira] [Commented] (LUCENE-3178) Native MMapDir

2011-06-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045461#comment-13045461
 ] 

Robert Muir commented on LUCENE-3178:
-

Can the flags you need all be set with madvise(), or are some only available as 
flags to mmap()?

If so, it might not be that bad.




Adding shard info to returned document

2011-06-07 Thread Ryan McKinley
With the DocTransformer stuff in place, we should be able to return
the shard info with the documents. (like SOLR-705)

I see two options:
1. Each server adds its own ID to the documents -- I like this
approach, but (as far as i can tell) the shards don't really know
their ID (or that they are in a distributed request).  To support
this, we could pass a parameter like shard.id=localhost:9877  along
with the request

2. The controlling server adds the ID to documents as they are
returned from the shards.  This is kinda messy, but avoids passing an
extra parameter.
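
For what it's worth, option 2 could be as simple as the sketch below, where the controlling server copies each returned document and stamps it with the shard it came from. This is only an illustration with plain maps: ShardTagger and the "[shard]" field name are made up here and are not existing Solr API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: documents are modelled as plain maps, not SolrDocument.
final class ShardTagger {
    static final String SHARD_FIELD = "[shard]"; // assumed field name

    /** Returns copies of the documents, each tagged with the shard that returned it. */
    static List<Map<String, Object>> tag(String shardId, List<Map<String, Object>> docs) {
        List<Map<String, Object>> tagged = new ArrayList<>(docs.size());
        for (Map<String, Object> doc : docs) {
            Map<String, Object> copy = new LinkedHashMap<>(doc);
            copy.put(SHARD_FIELD, shardId);
            tagged.add(copy);
        }
        return tagged;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        List<Map<String, Object>> fromShard = new ArrayList<>();
        fromShard.add(doc);
        System.out.println(tag("localhost:9877", fromShard));
    }
}
```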

thoughts?

ryan




Re: Adding shard info to returned document

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 10:57 AM, Ryan McKinley ryan...@gmail.com wrote:
 With the DocTransformer stuff in place, we should be able to return
 the shard info with the documents. (like SOLR-705)

 I see two options:
 1. Each server adds its own ID to the documents -- I like this
 approach, but (as far as i can tell) the shards don't really know
 their ID (or that they are in a distributed request).  To support
 this, we could pass a parameter like shard.id=localhost:9877  along
 with the request

Shards currently know that they are in a distrib request via isShard=true
I originally favored #2 (the controlling server adds the ID), but
thinking about it again, I'm starting to lean toward your #1.
If/when we move to micro-sharding (keeping multiple indexes around so
we can rebalance easily), a distrib request should state what parts of
the index it is requesting from the server.

-Yonik
http://www.lucidimagination.com


 2. The controlling server adds the ID to documents as they are
 returned from the shards.  This is kinda messy, but avoids passing an
 extra parameter.

 thoughts?

 ryan




[jira] [Resolved] (SOLR-2491) spellcheck.maxCollationTries breaks when using FieldCollapsing

2011-06-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2491.
---

Resolution: Fixed
  Assignee: Robert Muir

Committed revision 1133043.

Thanks James!




Is docs for PatternReplaceFilterFactory missing on wiki...?

2011-06-07 Thread Eric Pugh
Seems like the documentation for PatternReplaceFilterFactory should be added to 
this wiki page?  
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?#TokenFilterFactories

Is this page meant to be an exhaustive index of all the 
Analyzers etc. available?  I know it's explicitly called out that it isn't.

Eric

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from 
http://www.packtpub.com/solr-1-4-enterprise-search-server
This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.













[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup

2011-06-07 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045523#comment-13045523
 ] 

James Dyer commented on SOLR-2571:
--

I added thresholdTokenFrequency to the SpellCheckComponent wiki page.

 IndexBasedSpellChecker thresholdTokenFrequency fails with a 
 ClassCastException on startup
 ---

 Key: SOLR-2571
 URL: https://issues.apache.org/jira/browse/SOLR-2571
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: James Dyer
Assignee: Robert Muir
Priority: Minor
  Labels: whereIsHossManWhenYouNeedHim
 Fix For: 3.3, 4.0

 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, 
 SOLR-2571.patch, SOLR-2571.solr3.2.patch


 When parsing the configuration for thresholdTokenFrequency, the 
 IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derived 
 NamedList.  However, this comes through as a String.  Therefore, a 
 ClassCastException is always thrown whenever this parameter is specified.  
 The code ought to be doing Float.parseFloat(...) on the value.
 This looks like a nice feature to use in cases where the data contains misspelled 
 or rare words leading to spurious correct queries.  I would have liked to 
 have used this with a project we just completed however this bug prevented 
 that.  This issue came up recently in the User's mailing list so I am raising 
 an issue now.
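
The cast problem and the suggested Float.parseFloat(...) approach can be shown in isolation. In this sketch a plain Map stands in for the NamedList, and ThresholdParse.readThreshold is a hypothetical helper written for this example, not the code in the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; a Map stands in for the configuration NamedList.
final class ThresholdParse {
    /** Accepts a Number or a String value, falling back to defaultValue when absent. */
    static float readThreshold(Map<String, Object> config, String key, float defaultValue) {
        Object raw = config.get(key);
        if (raw == null) return defaultValue;
        if (raw instanceof Number) return ((Number) raw).floatValue();
        return Float.parseFloat(raw.toString().trim());
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        config.put("thresholdTokenFrequency", "0.01"); // the value arrives as a String

        try {
            Float broken = (Float) config.get("thresholdTokenFrequency"); // direct cast fails
            System.out.println(broken);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }

        System.out.println(readThreshold(config, "thresholdTokenFrequency", 0.0f));
    }
}
```

Handling both Number and String keeps the helper working whether or not the configuration declares the value with a typed element.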




[jira] [Commented] (LUCENE-3178) Native MMapDir

2011-06-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045526#comment-13045526
 ] 

Michael McCandless commented on LUCENE-3178:


I think we want to call madvise, and not change the flags passed to the 
original mmap invocation?  But I'm not sure...




RE: Is docs for PatternReplaceFilterFactory missing on wiki...?

2011-06-07 Thread Steven A Rowe
Hi Eric,

If you want to add it, you should.

Steve

 -Original Message-
 From: Eric Pugh [mailto:ep...@opensourceconnections.com]
 Sent: Tuesday, June 07, 2011 12:21 PM
 To: dev@lucene.apache.org
 Subject: Is docs for PatternReplaceFilterFactory missing on wiki...?
 
 Seems like the documentation for PatternReplaceFilterFactory should be
 added to this wiki page?
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?#TokenFilterFactories
 
 Is there a desire for this page meant to be an exhaustive index of all
 the Analyzers etc available?  I know it's explicitly called out that it
 isn't.
 
 Eric
 
 -
 Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
 http://www.opensourceconnections.com
 Co-Author: Solr 1.4 Enterprise Search Server available from
 http://www.packtpub.com/solr-1-4-enterprise-search-server
 This e-mail and all contents, including attachments, is considered to be
 Company Confidential unless explicitly stated otherwise, regardless of
 whether attachments are marked as such.
 
 
 
 
 
 
 
 
 
 



Re: Is docs for PatternReplaceFilterFactory missing on wiki...?

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 12:20 PM, Eric Pugh
ep...@opensourceconnections.com wrote:
 Seems like the documentation for PatternReplaceFilterFactory should be added 
 to this wiki page?  
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?#TokenFilterFactories

 Is there a desire for this page meant to be an exhaustive index of all the 
 Analyzers etc available?  I know it's explicitly called out that it isn't.

I'm not sure it's meant to be exhaustive, but it should include
anything generally useful enough (or at least a pointer to somewhere
else that lists some of the generally useful stuff).
PatternReplaceFilterFactory certainly seems to fit the bill of useful!

-Yonik
http://www.lucidimagination.com




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045561#comment-13045561
 ] 

Paul Elschot commented on LUCENE-2454:
--

So one concern that is left is performance for parent testing.
I'll open an issue for OpenBitSet.prevSetBit().
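
To illustrate what parent testing could look like, here is a small sketch that uses java.util.BitSet.previousSetBit as a stand-in for the proposed OpenBitSet method. It assumes, purely for illustration, that parent documents are marked in a bit set and that each parent is indexed before its children; the class name is hypothetical.

```java
import java.util.BitSet;

// Hypothetical helper, not part of the LUCENE-2454 patch.
class ParentLookupSketch {
  /**
   * Returns the docId of the nearest parent at or before docId, or -1 if none.
   * Assumes each parent is indexed before its children.
   */
  static int parentOf(BitSet parents, int docId) {
    // previousSetBit is inclusive, so a parent doc maps to itself.
    return parents.previousSetBit(docId);
  }

  public static void main(String[] args) {
    BitSet parents = new BitSet();
    parents.set(0); // parent of docs 1..3
    parents.set(4); // parent of docs 5..6
    System.out.println(parentOf(parents, 3)); // prints 0
    System.out.println(parentOf(parents, 6)); // prints 4
  }
}
```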

 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Created] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Paul Elschot (JIRA)
OpenBitSet.prevSetBit()
---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3


Find a previous set bit in an OpenBitSet.
Useful for parent testing in nested document query execution LUCENE-2454 .
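
A rough sketch of the idea, not the attached patch: scan backwards through a long[] of 64-bit words and use Long.numberOfLeadingZeros to locate the highest set bit. The inclusive semantics (a set bit at the start index is returned) and the handling of out-of-range indexes are assumptions made for this example.

```java
// Hypothetical sketch, not the LUCENE-3179 implementation.
class PrevSetBitSketch {
  /**
   * Returns the index of the highest set bit at or below index, or -1 if none.
   * bits holds 64 bits per word, lowest index in the least significant bit.
   */
  static int prevSetBit(long[] bits, int index) {
    if (index < 0) return -1;
    int word = index >> 6;
    if (word >= bits.length) { // start past the end: begin at the last bit
      word = bits.length - 1;
      index = (bits.length << 6) - 1;
      if (word < 0) return -1;
    }
    // Shift the bits above index out of the starting word.
    long w = bits[word] << (63 - (index & 63));
    if (w != 0) {
      return index - Long.numberOfLeadingZeros(w);
    }
    // Walk down through the lower words.
    while (--word >= 0) {
      w = bits[word];
      if (w != 0) {
        return (word << 6) + 63 - Long.numberOfLeadingZeros(w);
      }
    }
    return -1;
  }
}
```

With bits 3 and 64 set, starting at 70 would yield 64, starting at 63 would yield 3, and starting at 2 would yield -1.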




[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-3179:
-

Attachment: LUCENE-3197.patch

Add prevSetBit() and tests. Also moves some test code from TestOpenBitSet to 
TestBitUtil.

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3197.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045561#comment-13045561
 ] 

Paul Elschot edited comment on LUCENE-2454 at 6/7/11 6:21 PM:
--

So one concern that is left is performance for parent testing.
I'll open an issue for OpenBitSet.prevSetBit(), LUCENE-3197

  was (Author: paul.elsc...@xs4all.nl):
So one concern that is left is performance for parent testing.
I'll open an issue for OpenBitSet.prevSetBit().
  
 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045561#comment-13045561
 ] 

Paul Elschot edited comment on LUCENE-2454 at 6/7/11 6:22 PM:
--

So one concern that is left is performance for parent testing.
I'll open an issue for OpenBitSet.prevSetBit(), LUCENE-3179

  was (Author: paul.elsc...@xs4all.nl):
So one concern that is left is performance for parent testing.
I'll open an issue for OpenBitSet.prevSetBit(), LUCENE-3197
  
 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045576#comment-13045576
 ] 

Yonik Seeley commented on LUCENE-3179:
--

Hey Paul, did you try this implementation against Long.numberOfLeadingZeros?
The later Oracle Java6 implementations have intrinsified this method, so it 
might be faster: 
http://bugs.sun.com/view_bug.do?bug_id=6823354
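
For comparison purposes, here is a portable leading-zero count written as a binary search. It is only an illustration of the kind of hand-written routine that Long.numberOfLeadingZeros could replace, not the code from the patch or from BitUtil.

```java
// Hypothetical sketch for comparison with Long.numberOfLeadingZeros.
class NlzSketch {
  /** Counts leading zero bits by halving the search range each step. */
  static int nlz(long x) {
    if (x == 0) return 64;
    int n = 0;
    if ((x >>> 32) == 0) { n += 32; x <<= 32; }
    if ((x >>> 48) == 0) { n += 16; x <<= 16; }
    if ((x >>> 56) == 0) { n += 8; x <<= 8; }
    if ((x >>> 60) == 0) { n += 4; x <<= 4; }
    if ((x >>> 62) == 0) { n += 2; x <<= 2; }
    if ((x >>> 63) == 0) { n += 1; }
    return n;
  }
}
```

Both versions return the same values, so swapping one for the other is a pure performance question that a benchmark on the target JVM would have to settle.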

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3197.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-3179:
-

Attachment: LUCENE-3179.patch

Correct the issue number in the patch, and remove a superfluous javadoc comment.

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3179.patch, LUCENE-3197.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-3179:
-

Attachment: LUCENE-3179.patch

Correct the issue number in the patch, remove a superfluous javadoc comment, 
and grant licence ...

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3179.patch, LUCENE-3179.patch, LUCENE-3197.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-3179:
-

Attachment: (was: LUCENE-3197.patch)

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3179.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-3179:
-

Attachment: (was: LUCENE-3179.patch)

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3179.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045587#comment-13045587
 ] 

Paul Elschot commented on LUCENE-3179:
--

I did not try this against Long.numberOfLeadingZeros, but in case that is 
faster we should use that of course.

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3179.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Created] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information

2011-06-07 Thread James Dyer (JIRA)
DirectSolrSpellChecker is not returning frequency information
-

 Key: SOLR-2576
 URL: https://issues.apache.org/jira/browse/SOLR-2576
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 4.0
Reporter: James Dyer
Priority: Minor
 Fix For: 4.0


DirectSolrSpellChecker is not returning frequency information.  This also 
causes the correctlySpelled flag in extended results to sometimes be wrong.  




[jira] [Updated] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information

2011-06-07 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2576:
-

Attachment: SOLR-2576.patch

This patch fixes DirectSolrSpellChecker to correctly forward the frequency 
data.  Results are now consistent with IndexBasedSpellChecker.  An additional 
DSSC unit test is also added.

I also renamed SpellingResult.add(Token token, int docFreq) to 
SpellingResult.addFrequency(Token token, int docFreq).  This less ambiguous 
method name should help prevent this kind of error in the future.  Note, 
however, that if back-porting to 3.x, it might be wise to add back a 
deprecated SpellingResult.add(Token token, int docFreq) method.  This will 
prevent us from breaking anyone's custom Solr spellcheckers...
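
The back-compat idea for 3.x could look roughly like the sketch below: keep the old name as a deprecated method that simply delegates to the new one. This is a simplified stand-in, not the real SpellingResult; tokens are plain strings here and the class name and frequencyOf accessor are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for SpellingResult; tokens are plain strings here.
class SpellingResultSketch {
  private final Map<String, Integer> tokenFrequency = new LinkedHashMap<>();

  /** Records the document frequency of a token. */
  void addFrequency(String token, int docFreq) {
    tokenFrequency.put(token, docFreq);
  }

  /** Old name kept for source compatibility; delegates to addFrequency. */
  @Deprecated
  void add(String token, int docFreq) {
    addFrequency(token, docFreq);
  }

  /** Returns the recorded frequency, or null if the token is unknown. */
  Integer frequencyOf(String token) {
    return tokenFrequency.get(token);
  }
}
```

Existing custom spellcheckers that still call add(...) would keep compiling, with a deprecation warning pointing them at the new name.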

 DirectSolrSpellChecker is not returning frequency information
 -

 Key: SOLR-2576
 URL: https://issues.apache.org/jira/browse/SOLR-2576
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 4.0
Reporter: James Dyer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2576.patch


 DirectSolrSpellChecker is not returning frequency information.  This also 
 causes the correctlySpelled flag in extended results to sometimes be wrong. 
  




[jira] [Assigned] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information

2011-06-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned SOLR-2576:
-

Assignee: Robert Muir

 DirectSolrSpellChecker is not returning frequency information
 -

 Key: SOLR-2576
 URL: https://issues.apache.org/jira/browse/SOLR-2576
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 4.0
Reporter: James Dyer
Assignee: Robert Muir
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2576.patch


 DirectSolrSpellChecker is not returning frequency information.  This also 
 causes the correctlySpelled flag in extended results to sometimes be wrong. 
  




[jira] [Created] (SOLR-2577) Give option for spellcheck.collateExtendedResults w/Grouping to return #-of-Grouped-Hits

2011-06-07 Thread James Dyer (JIRA)
Give option for spellcheck.collateExtendedResults w/Grouping to return 
#-of-Grouped-Hits
--

 Key: SOLR-2577
 URL: https://issues.apache.org/jira/browse/SOLR-2577
 Project: Solr
  Issue Type: Improvement
  Components: spellchecker
Affects Versions: 4.0
Reporter: James Dyer
Priority: Minor
 Fix For: 4.0


Currently, if using spellcheck.collateExtendedResults in conjunction with 
group=true, the spelling collation report always gives # of hits as # of 
documents.  It would be useful to give users the option to get # of groups back 
instead (or possibly in addition).  This cannot happen, however, until Solr's 
group function can return the total # of groups.  This functionality is 
indicated in SOLR-2564.




[jira] [Commented] (LUCENE-3174) Similarity.Stats class for term & collection statistics

2011-06-07 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045598#comment-13045598
 ] 

David Mark Nemeskey commented on LUCENE-3174:
-

Here's what the patch does:
- it introduces the Similarity.Stats class and its subclasses
- renames computeWeight() to computeStats()
- fixes methods that call computeStats()

What remains to be done:
- rewrite the javadoc
- Stats will be used inside other Similarity methods: its availability should 
be ensured somehow. The current solution in MockBM25Similarity is not 
satisfactory because there is only one Similarity object at a time.
- MultiPhraseWeight, PhraseWeight, SpanWeight, TermWeight call computeStats and 
extract the IDFExplain object. This level of coupling is not desirable, and 
should be eliminated. All the more so, as not all Similarity subclasses will 
have an idf.
- It might not even make sense to expose computeStats()?

To consider:
- it might be better if Stats were static, because they could inherit fields 
from each other

 Similarity.Stats class for term & collection statistics
 ---

 Key: LUCENE-3174
 URL: https://issues.apache.org/jira/browse/LUCENE-3174
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
Priority: Minor
 Fix For: flexscoring branch

 Attachments: LUCENE-3174.patch


 In order to support ranking methods besides TF-IDF, we need to make the 
 statistics they need available. These statistics could be computed in 
 computeWeight (soon to become computeStats) and stored in a separate object 
 for easy access. Since this object will be used solely by subclasses of 
 Similarity, it should be implemented as a static inner class, i.e. 
 Similarity.Stats.
 There are two ways this could be implemented:
 - as a single Similarity.Stats class, reused by all ranking algorithms. In 
 this case, this class would have a member field for all statistics;
 - as a hierarchy of Stats classes, one for each ranking algorithm. Each 
 subclass would define only the statistics needed for the ranking algorithm.
 In the second case, the Stats class in DefaultSimilarity would have a single 
 field, idf, while the one in e.g. BM25Similarity would have idf and average 
 field/document length.
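
A minimal sketch of the second option (a hierarchy of Stats classes) might look as follows. All class names and the computeStats signature are hypothetical and chosen only to show how static nested Stats classes can inherit fields from each other; this is not the code in the attached patch.

```java
// Hypothetical sketch: one Stats subclass per ranking algorithm.
abstract class SimilaritySketch {
  /** Base type for per-query statistics; carries no fields of its own. */
  static class Stats { }

  /** Gathers whatever statistics the concrete ranking algorithm needs. */
  abstract Stats computeStats(float idf, float avgFieldLength);
}

class TfIdfSimilaritySketch extends SimilaritySketch {
  /** TF-IDF only needs the inverse document frequency. */
  static class TfIdfStats extends Stats {
    final float idf;
    TfIdfStats(float idf) { this.idf = idf; }
  }

  @Override
  Stats computeStats(float idf, float avgFieldLength) {
    return new TfIdfStats(idf); // avgFieldLength is ignored here
  }
}

class BM25SimilaritySketch extends SimilaritySketch {
  /** BM25 reuses idf by inheritance and adds the average field length. */
  static class BM25Stats extends TfIdfSimilaritySketch.TfIdfStats {
    final float avgFieldLength;
    BM25Stats(float idf, float avgFieldLength) {
      super(idf);
      this.avgFieldLength = avgFieldLength;
    }
  }

  @Override
  BM25Stats computeStats(float idf, float avgFieldLength) {
    return new BM25Stats(idf, avgFieldLength);
  }
}
```

Because the nested classes are static, each subclass defines only the extra fields it needs, which is the trade-off against a single Stats class holding every statistic.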




[jira] [Closed] (LUCENE-2229) SimpleSpanFragmenter fails to start a new fragment

2011-06-07 Thread Elmer Garduno (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elmer Garduno closed LUCENE-2229.
-

Resolution: Won't Fix

 SimpleSpanFragmenter fails to start a new fragment
 --

 Key: LUCENE-2229
 URL: https://issues.apache.org/jira/browse/LUCENE-2229
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/highlighter
Reporter: Elmer Garduno
Priority: Minor
 Attachments: LUCENE-2229.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 SimpleSpanFragmenter fails to identify a new fragment when there is more than 
 one stop word after a span is detected. This problem can be observed when the 
 Query contains a PhraseQuery.
 The problem is that the span extends toward the end of the TokenGroup. This 
 is because {{waitForPos = positionSpans.get(i).end + 1;}} and {{position += 
 posIncAtt.getPositionIncrement();}} together can produce a value of 
 {{position}} greater than the value of {{waitForPos}}, so {{(waitForPos == 
 position)}} never matches.
 {code:title=SimpleSpanFragmenter.java}
   public boolean isNewFragment() {
     position += posIncAtt.getPositionIncrement();
     if (waitForPos == position) {
       waitForPos = -1;
     } else if (waitForPos != -1) {
       return false;
     }
     WeightedSpanTerm wSpanTerm =
         queryScorer.getWeightedSpanTerm(termAtt.term());
     if (wSpanTerm != null) {
       List<PositionSpan> positionSpans = wSpanTerm.getPositionSpans();
       for (int i = 0; i < positionSpans.size(); i++) {
         if (positionSpans.get(i).start == position) {
           waitForPos = positionSpans.get(i).end + 1;
           break;
         }
       }
     }
 ...
 {code}
 An example is provided in the test case for the following Document and the 
 query *all tokens* followed by the words _of a_.
 {panel:title=Document}
 Attribute instances are reused for *all tokens* _of a_ document. Thus, a 
 TokenStream/-Filter needs to update the appropriate Attribute(s) in 
 incrementToken(). The consumer, commonly the Lucene indexer, consumes the 
 data in the Attributes and then calls incrementToken() again until it returns 
 false, which indicates that the end of the stream was reached. This means 
 that in each call of incrementToken() a TokenStream/-Filter can safely 
 overwrite the data in the Attribute instances.
 {panel}
 {code:title=HighlighterTest.java}
   public void testSimpleSpanFragmenter() throws Exception {
     ...
     doSearching("\"all tokens\"");
     maxNumFragmentsRequired = 2;
 
     scorer = new QueryScorer(query, FIELD_NAME);
     highlighter = new Highlighter(this, scorer);
     for (int i = 0; i < hits.totalHits; i++) {
       String text = searcher.doc(hits.scoreDocs[i].doc).get(FIELD_NAME);
       TokenStream tokenStream = analyzer.tokenStream(FIELD_NAME,
           new StringReader(text));
       highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 20));
       String result = highlighter.getBestFragments(tokenStream, text,
           maxNumFragmentsRequired, ...);
       System.out.println("\t" + result);
     }
   }
 {code}
 {panel:title=Result}
 are reused for <B>all</B> <B>tokens</B> of a document. Thus, a 
 TokenStream/-Filter needs to update the appropriate Attribute(s) in 
 incrementToken(). The consumer, commonly the Lucene indexer, consumes the 
 data in the Attributes and then calls incrementToken() again until it returns 
 false, which indicates that the end of the stream was reached. This means 
 that in each call of incrementToken() a TokenStream/-Filter can safely 
 overwrite the data in the Attribute instances.
 {panel}
 {panel:title=Expected Result}
 for <B>all</B> <B>tokens</B> of a document
 {panel}
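
The counter behaviour described in this issue can be replayed in isolation. The sketch below is a toy model of the position/waitForPos bookkeeping only, with made-up names, and does not use any Lucene classes; it shows that an exact-match check is never satisfied once a position increment jumps past waitForPos.

```java
// Hypothetical toy model of the waitForPos bookkeeping, not Lucene code.
class WaitForPosSketch {
  /**
   * Replays the position increments and reports whether the exact-match
   * check (waitForPos == position) ever fires after a span [spanStart, spanEnd].
   */
  static boolean waitIsReleased(int[] posIncrements, int spanStart, int spanEnd) {
    int position = -1;
    int waitForPos = -1;
    boolean released = false;
    for (int inc : posIncrements) {
      position += inc;
      if (waitForPos != -1 && waitForPos == position) {
        waitForPos = -1; // the fragmenter may start new fragments again
        released = true;
      }
      if (waitForPos == -1 && !released && position == spanStart) {
        waitForPos = spanEnd + 1; // wait for the first position after the span
      }
    }
    return released;
  }

  public static void main(String[] args) {
    // No gaps: positions 0,1,2,3 -> position 2 is seen, the wait is released.
    System.out.println(waitIsReleased(new int[] {1, 1, 1, 1}, 0, 1)); // prints true
    // A gap after the span: positions 0,1,4,5 -> position 2 is never seen.
    System.out.println(waitIsReleased(new int[] {1, 1, 3, 1}, 0, 1)); // prints false
  }
}
```

In the second run waitForPos stays set for the rest of the stream, which matches the reported symptom of the fragment running on to the end of the text.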




[jira] [Created] (SOLR-2578) ReplicationHandler Backups -- clean up old backups

2011-06-07 Thread James Dyer (JIRA)
ReplicationHandler Backups -- clean up old backups
--

 Key: SOLR-2578
 URL: https://issues.apache.org/jira/browse/SOLR-2578
 Project: Solr
  Issue Type: Improvement
  Components: replication (java)
Affects Versions: 3.2, 4.0
Reporter: James Dyer
Priority: Minor
 Fix For: 3.3, 4.0


It would be nice when performing backups if there was an easy way to tell 
ReplicationHandler to only keep so many and then delete the older ones.




[jira] [Updated] (SOLR-2578) ReplicationHandler Backups -- clean up old backups

2011-06-07 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2578:
-

Attachment: SOLR-2578.patch

This patch adds the functionality with a new parameter: numberToKeep.  The 
unit test has been enhanced to do 2 backups and then check whether the first 
one was automatically deleted (numberToKeep=1).
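
As a rough illustration of what a numberToKeep setting implies, the sketch below deletes all but the newest N entries in a backup directory. It is not the code in the patch: the class name is made up, and it assumes that backups are named with a "snapshot." prefix followed by a timestamp so that sorting by name is chronological.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the SOLR-2578 implementation.
class BackupPruneSketch {
  /**
   * Deletes all but the numberToKeep newest "snapshot.*" entries and returns
   * the deleted paths. Entries must be files or empty directories here; real
   * backup directories would need a recursive delete.
   */
  static List<Path> prune(Path backupRoot, int numberToKeep) throws IOException {
    List<Path> snapshots = new ArrayList<>();
    try (DirectoryStream<Path> ds = Files.newDirectoryStream(backupRoot, "snapshot.*")) {
      for (Path p : ds) {
        snapshots.add(p);
      }
    }
    Collections.sort(snapshots); // oldest first under the naming assumption
    int excess = Math.max(0, snapshots.size() - Math.max(0, numberToKeep));
    List<Path> deleted = new ArrayList<>();
    for (int i = 0; i < excess; i++) {
      Files.delete(snapshots.get(i));
      deleted.add(snapshots.get(i));
    }
    return deleted;
  }
}
```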

 ReplicationHandler Backups -- clean up old backups
 --

 Key: SOLR-2578
 URL: https://issues.apache.org/jira/browse/SOLR-2578
 Project: Solr
  Issue Type: Improvement
  Components: replication (java)
Affects Versions: 3.2, 4.0
Reporter: James Dyer
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2578.patch


 It would be nice when performing backups if there was an easy way to tell 
 ReplicationHandler to only keep so many and then delete the older ones.




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045655#comment-13045655
 ] 

Uwe Schindler commented on LUCENE-3179:
---

If it's faster, should we not replace it completely in Lucene? The impl in Java 
5 (Sun JDK) is identical to ours from BitUtil, so why replicate? If it gets 
intrinsified, it can only get faster. I assume it's a relic from pre-Java-1.5 
times like Lucene 2.9.

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3179.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045656#comment-13045656
 ] 

Uwe Schindler commented on LUCENE-3179:
---

With the previous comment I also refer to nextSetBit().

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3179.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045659#comment-13045659
 ] 

Dawid Weiss commented on LUCENE-3179:
-

I posted benchmarks of intrinsic vs. manual (OpenBitSet) performance of the nlz 
and pop (bitcount) methods a while ago -- they should still be around in JIRA 
somewhere. If I recall right, the difference was significant, although not 
an order of magnitude or anything... and on CPUs without intrinsic 
instructions the implementation handcrafted by Yonik was actually faster than 
the one in the standard library. Of course these days most CPUs will have 
popcnt/nlz instructions, so it makes sense to switch.
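
To make the "manual vs. intrinsic" comparison concrete, here is a textbook bit-parallel population count next to which Long.bitCount can be measured. It is a generic illustration, not the handcrafted implementation mentioned above, and the class name is made up.

```java
// Hypothetical sketch for comparison with Long.bitCount.
class PopSketch {
  /** Counts set bits by summing 2-, 4- and 8-bit groups in parallel. */
  static int pop(long x) {
    x = x - ((x >>> 1) & 0x5555555555555555L);
    x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L);
    x = (x + (x >>> 4)) & 0x0f0f0f0f0f0f0f0fL;
    // Multiplying sums all byte counts into the top byte.
    return (int) ((x * 0x0101010101010101L) >>> 56);
  }
}
```

On a JVM and CPU where Long.bitCount is compiled to a single popcnt instruction, the library call would be expected to win; elsewhere the outcome has to be benchmarked.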

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3179.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045662#comment-13045662
 ] 

Dawid Weiss commented on LUCENE-3179:
-

I think it's the 1.6 that adds these intrinsics -- I don't know if they've been 
backported to updates to 1.5, but this should be relatively easy to verify 
empirically.

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3179.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Created] (SOLR-2579) UIMAUpdateRequestProcessor ignore error fails if text.length() < 100

2011-06-07 Thread Elmer Garduno (JIRA)
UIMAUpdateRequestProcessor ignore error fails if text.length() < 100
--

 Key: SOLR-2579
 URL: https://issues.apache.org/jira/browse/SOLR-2579
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.2
Reporter: Elmer Garduno
Priority: Minor
 Fix For: 3.3


If UIMAUpdateRequestProcessor is configured to ignore errors, an exception is 
raised when logging the error and text.length() < 100.

  if (solrUIMAConfiguration.isIgnoreErrors())
    log.warn(new StringBuilder("skip the text processing due to ")
      .append(e.getLocalizedMessage()).append(optionalFieldInfo)
      .append(" text=\"").append(text.substring(0, 100)).append("...\"").toString());
  else {
    throw new SolrException(ErrorCode.SERVER_ERROR,
        new StringBuilder("processing error: ")
          .append(e.getLocalizedMessage()).append(optionalFieldInfo)
          .append(" text=\"").append(text.substring(0, 100)).append("...\"").toString(), e);
  }

I'm submitting a patch.
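
For illustration only, here is a minimal sketch of one way to guard the 
truncation. It is not the attached patch; the class SafeTruncate and the 
helper truncate() are hypothetical names introduced for this example.

```java
public class SafeTruncate {

    // Hypothetical helper: returns at most maxLen characters of text and
    // appends "..." only when something was actually cut off.
    static String truncate(String text, int maxLen) {
        if (text == null) {
            return "";
        }
        if (text.length() <= maxLen) {
            return text; // short input: substring(0, maxLen) would throw here
        }
        return text.substring(0, maxLen) + "...";
    }

    public static void main(String[] args) {
        // A text shorter than 100 characters no longer triggers
        // StringIndexOutOfBoundsException.
        System.out.println(truncate("short text", 100));
    }
}
```

Both the log.warn() branch and the SolrException branch could call such a 
helper instead of text.substring(0, 100).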




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045667#comment-13045667
 ] 

Dawid Weiss commented on LUCENE-3179:
-

Intrinsics are implemented at the HotSpot (JIT) level, so you won't see them 
in src.jar. Calls to specific methods in Long.* or Integer.* are replaced 
by handcrafted assembly (usually processor-specific instructions that do what a 
given method should do).

If you're interested, check out the OpenJDK HotSpot code and scan for 
intrinsics (or popcnt).




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045669#comment-13045669
 ] 

Uwe Schindler commented on LUCENE-3179:
---

You misunderstood me; I know what intrinsics are. My confusion was about 
this statement:

bq. and on CPUs without intrinsic instructions the implementation handcrafted 
by Yonik was actually faster than the one in the standard library

The so-called handcrafted method is identical in src.jar and in Yonik's code. 
So without intrinsics, the standard library and Yonik's code should perform 
identically, as it was the same code the last time I looked into it.




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045674#comment-13045674
 ] 

Yonik Seeley commented on LUCENE-3179:
--

bq. And the so called hand crafted method is identical in src.jar and Yonik's 
code.

For pop, yes. But not for ntz or pop_array and friends.

BitUtil.pop exists because this was originally written to work with Java 1.4, 
which didn't have Long.bitCount():
http://markmail.org/message/5ay4m2thsvsahk3c
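
As a rough illustration of what a handwritten population count looks like, 
the sketch below uses the well-known parallel (SWAR) bit-counting technique 
and checks it against Long.bitCount(). The class name PopSketch is 
hypothetical, and this is not claimed to be the exact body of BitUtil.pop.

```java
public class PopSketch {

    // Parallel bit count: sums bits in 2-, 4- and 8-bit groups, then folds
    // the per-byte counts together. Illustrative only.
    static int pop(long x) {
        x = x - ((x >>> 1) & 0x5555555555555555L);
        x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L);
        x = (x + (x >>> 4)) & 0x0F0F0F0F0F0F0F0FL;
        x = x + (x >>> 8);
        x = x + (x >>> 16);
        x = x + (x >>> 32);
        return (int) x & 0x7F;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 0xF0F0L, Long.MIN_VALUE};
        for (long s : samples) {
            // On Java 5+ the library method gives the same result and may be
            // replaced by a single instruction on JVMs that have the intrinsic.
            System.out.println(pop(s) == Long.bitCount(s));
        }
    }
}
```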




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045676#comment-13045676
 ] 

Paul Elschot commented on LUCENE-3179:
--

The micro benchmarks for ntz() and pop() are at LUCENE-2221.




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045679#comment-13045679
 ] 

Dawid Weiss commented on LUCENE-3179:
-

Oh, OK, that's clear. My comment was about the various methods of doing 
bit counts and other bit-fiddling on arrays of long values (for example 
pop_array); these are HD-derived implementations. I compared them to naive 
loops using intrinsics and to naive loops on CPUs (and JVMs) without 
intrinsics. In that case, simple loops with intrinsics were faster than 
Lucene's code, but Lucene's code was faster than simple loops without 
intrinsics (effectively using whatever was in the standard library).
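
For reference, the "naive loop" in this comparison might look like the sketch 
below. The class and method names are hypothetical; whether the loop is fast 
depends on whether the JVM turns Long.bitCount() into an intrinsic, which this 
code cannot show by itself.

```java
public class PopArraySketch {

    // Naive loop: one Long.bitCount() call per word. Not Lucene's pop_array.
    static long popArrayNaive(long[] words, int wordOffset, int numWords) {
        long count = 0;
        for (int i = wordOffset, end = wordOffset + numWords; i < end; i++) {
            count += Long.bitCount(words[i]);
        }
        return count;
    }

    public static void main(String[] args) {
        long[] words = {-1L, 0L, 1L};
        System.out.println(popArrayNaive(words, 0, words.length));
    }
}
```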




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045682#comment-13045682
 ] 

Uwe Schindler commented on LUCENE-3179:
---

OK, so we can safely remove BitUtil.pop and replace it with the Java 5 method 
(and maybe review the code in src.jar again for ntz as well). And if this one 
is an intrinsic in Java 6, it's even faster.

Now we speak the same language :-)




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045685#comment-13045685
 ] 

Paul Elschot commented on LUCENE-3179:
--

As to the performance, the current patch at LUCENE-2454 has a bitwise linear 
search to do this.
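
To make the difference concrete, here is a sketch that contrasts a bitwise 
linear search with a word-at-a-time search over a plain long[]. It is an 
assumption-laden illustration, not the code from either patch: the class 
PrevSetBitSketch and both method names are hypothetical, and the bit layout 
(bit i lives in word i >> 6) is assumed.

```java
public class PrevSetBitSketch {

    // Bitwise linear search: tests one bit per iteration.
    static int prevSetBitLinear(long[] words, int index) {
        int start = Math.min(index, words.length * 64 - 1);
        for (int b = start; b >= 0; b--) {
            if ((words[b >> 6] & (1L << (b & 63))) != 0) {
                return b;
            }
        }
        return -1;
    }

    // Word-at-a-time search: skips empty words and uses
    // Long.numberOfLeadingZeros() to locate the highest set bit.
    static int prevSetBit(long[] words, int index) {
        if (words.length == 0 || index < 0) {
            return -1;
        }
        index = Math.min(index, words.length * 64 - 1);
        int i = index >> 6;
        int sub = index & 63;
        // Shift so that bit 'sub' becomes bit 63; bits above 'sub' drop out.
        long word = words[i] << (63 - sub);
        if (word != 0) {
            return (i << 6) + sub - Long.numberOfLeadingZeros(word);
        }
        while (--i >= 0) {
            word = words[i];
            if (word != 0) {
                return (i << 6) + 63 - Long.numberOfLeadingZeros(word);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] words = {0b100L, 0L, 1L}; // bits 2 and 128 are set
        for (int idx : new int[] {0, 2, 5, 127, 128, 191}) {
            System.out.println(idx + " -> " + prevSetBit(words, idx));
        }
    }
}
```

In a nested-document setting the set bits would mark parent documents, so a 
call like this finds the closest parent at or before a given docId.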





[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045683#comment-13045683
 ] 

Shawn Heisey commented on SOLR-2399:


The xinclude stuff looks pretty cool!  A suggestion on it: shade the background 
of the entire expanded section, so it stands out better.  Thanks for including 
this!

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, 
 SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, 
 SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin




[jira] [Commented] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information

2011-06-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045690#comment-13045690
 ] 

Robert Muir commented on SOLR-2576:
---

Thanks James, patch looks good!

This is definitely the source of confusion, because there are several 
overloaded methods named add(), one of which does a completely different 
thing :)


 DirectSolrSpellChecker is not returning frequency information
 -

 Key: SOLR-2576
 URL: https://issues.apache.org/jira/browse/SOLR-2576
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 4.0
Reporter: James Dyer
Assignee: Robert Muir
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2576.patch


 DirectSolrSpellChecker is not returning frequency information.  This also 
 causes the correctlySpelled flag in extended results to sometimes be wrong. 
  




[jira] [Updated] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information

2011-06-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2576:
--

Fix Version/s: 3.3

adding fix version 3.3 to backport the API improvement.




[jira] [Resolved] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information

2011-06-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2576.
---

Resolution: Fixed

Committed revision 1133187 (trunk), 1133190 (branch_3x)




[jira] [Commented] (LUCENE-3174) Similarity.Stats class for term & collection statistics

2011-06-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045706#comment-13045706
 ] 

Robert Muir commented on LUCENE-3174:
-

Hi David, after reviewing the patch, I think we should do this:

* make Similarity.Stats static
* pass this, instead of Weight, to exactDocScorer() and sloppyDocScorer(). This 
should fix the MockBM25Sim issue, as it won't need to hold a Stats since it's 
passed here.



 Similarity.Stats class for term & collection statistics
 ---

 Key: LUCENE-3174
 URL: https://issues.apache.org/jira/browse/LUCENE-3174
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
Priority: Minor
 Fix For: flexscoring branch

 Attachments: LUCENE-3174.patch


 In order to support ranking methods besides TF-IDF, we need to make the 
 statistics they need available. These statistics could be computed in 
 computeWeight (soon to become computeStats) and stored in a separate object 
 for easy access. Since this object will be used solely by subclasses of 
 Similarity, it should be implemented as a static inner class, i.e. 
 Similarity.Stats.
 There are two ways this could be implemented:
 - as a single Similarity.Stats class, reused by all ranking algorithms. In 
 this case, this class would have a member field for all statistics;
 - as a hierarchy of Stats classes, one for each ranking algorithm. Each 
 subclass would define only the statistics needed for the ranking algorithm.
 In the second case, the Stats class in DefaultSimilarity would have a single 
 field, idf, while the one in e.g. BM25Similarity would have idf and average 
 field/document length.
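
 A rough sketch of the second option (a hierarchy of Stats classes) could look 
 like the following. All class and field names here are hypothetical and only 
 illustrate the shape of the idea; the actual API on the flexscoring branch 
 may differ.

```java
public class StatsSketch {

    // Hypothetical base type that a Similarity would receive at scoring time.
    static class Stats {
    }

    // Statistics for a TF-IDF style similarity: only idf is needed.
    static class TfIdfStats extends Stats {
        final float idf;

        TfIdfStats(float idf) {
            this.idf = idf;
        }
    }

    // Statistics for a BM25 style similarity: idf plus average field length.
    static class BM25Stats extends Stats {
        final float idf;
        final float avgFieldLength;

        BM25Stats(float idf, float avgFieldLength) {
            this.idf = idf;
            this.avgFieldLength = avgFieldLength;
        }
    }

    public static void main(String[] args) {
        Stats stats = new BM25Stats(1.5f, 42.0f);
        // Each similarity downcasts to the subclass it created itself.
        BM25Stats bm25 = (BM25Stats) stats;
        System.out.println(bm25.idf + " " + bm25.avgFieldLength);
    }
}
```

 The trade-off is a downcast in each similarity versus a single shared class 
 that carries fields most algorithms never use.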




[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8685 - Failure

2011-06-07 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8685/

All tests passed

Build Log (for compile errors):
[...truncated 14853 lines...]






[jira] [Commented] (LUCENE-91) IndexWriter ctor does not release lock on exception

2011-06-07 Thread Adam Ahmed (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045794#comment-13045794
 ] 

Adam Ahmed commented on LUCENE-91:
--

Please note that this also happens when Lock.obtain times out.  As far as I can 
tell, the only way to avoid that possibility is to use 
LOCK_OBTAIN_WAIT_FOREVER, and forever generally sounds like a bad idea.  I 
would say that is a bug, and more clearly so than exceptions due to the index 
not existing.

 IndexWriter ctor does not release lock on exception
 ---

 Key: LUCENE-91
 URL: https://issues.apache.org/jira/browse/LUCENE-91
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 1.2
 Environment: Operating System: All
 Platform: All
Reporter: Alex Staubo
Assignee: Lucene Developers

 If IndexWriter construction fails with an exception, the write.lock lock is not
 released.
 For example, this happens if one tries to open an IndexWriter on an FSDirectory
 which does not contain a Lucene index. FileNotFoundException will be thrown by
 org.apache.lucene.store.FSInputStream, after which the write lock will remain in
 the directory, and nobody can open the index.
 I have been using this pattern -- doing IndexWriter(..., false), catching
 FileNotFoundException and doing IndexWriter(..., true) -- in my code to
 initialize the index on demand, because the app never knows if the index already
 exists.
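
 The general remedy for this kind of leak is to release the lock when the 
 constructor fails after acquiring it. The sketch below shows that pattern 
 with a java.util.concurrent ReentrantLock standing in for write.lock; the 
 Writer class and its indexExists flag are hypothetical and do not reflect 
 IndexWriter's real constructor or Lucene's Lock API.

```java
import java.io.FileNotFoundException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOnFailureSketch {

    // Stand-in for the write.lock file in the index directory.
    static final ReentrantLock WRITE_LOCK = new ReentrantLock();

    static class Writer {
        Writer(boolean indexExists) throws Exception {
            if (!WRITE_LOCK.tryLock(1, TimeUnit.SECONDS)) {
                throw new IllegalStateException("lock obtain timed out");
            }
            boolean success = false;
            try {
                if (!indexExists) {
                    // Simulates opening a directory that holds no index.
                    throw new FileNotFoundException("no index found");
                }
                success = true;
            } finally {
                if (!success) {
                    WRITE_LOCK.unlock(); // do not leak the lock on failure
                }
            }
        }

        void close() {
            WRITE_LOCK.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            new Writer(false);
        } catch (FileNotFoundException expected) {
            // The lock was released, so the retry below can acquire it.
        }
        Writer w = new Writer(true);
        w.close();
        System.out.println("locked after close: " + WRITE_LOCK.isLocked());
    }
}
```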




[jira] [Commented] (LUCENE-91) IndexWriter ctor does not release lock on exception

2011-06-07 Thread Adam Ahmed (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045800#comment-13045800
 ] 

Adam Ahmed commented on LUCENE-91:
--

The better place for the timeout fix would probably be in Lock.obtain(), where 
it should attempt something similar to Lock.release() if a timeout occurs.
