[jira] [Assigned] (SOLR-2603) Encoding of alternate fields in highlighting

2011-06-18 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-2603:


Assignee: Koji Sekiguchi

 Encoding of alternate fields in highlighting
 

 Key: SOLR-2603
 URL: https://issues.apache.org/jira/browse/SOLR-2603
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 3.2
Reporter: Massimo Schiavon
Assignee: Koji Sekiguchi
Priority: Minor

 It would be useful if the method 
 DefaultSolrHighlighter.alternateField(NamedList, SolrParams, Document, 
 String) applied the configured encoding to alternate fields.
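
 As a rough illustration of the request (a sketch only, not the actual
 DefaultSolrHighlighter code): when the alternate field value is returned
 in place of a snippet, it would pass through the same encoder as regular
 snippets. The class AlternateFieldSketch and the helpers htmlEncode and
 alternateValue are hypothetical names introduced here for illustration.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the highlighter; these are not Solr's real classes.
class AlternateFieldSketch {

    // Minimal HTML escaping, assumed to resemble what an HTML encoder would do.
    static String htmlEncode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns the alternate field text, truncated to maxLen characters
    // (maxLen <= 0 means no limit), then passed through the given encoder.
    static String alternateValue(String raw, int maxLen, UnaryOperator<String> encoder) {
        String text = (maxLen > 0 && raw.length() > maxLen) ? raw.substring(0, maxLen) : raw;
        return encoder.apply(text);
    }
}
```

 Truncating before encoding is a deliberate choice in this sketch, so that
 an escaped entity such as &amp;lt; is never cut in half.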

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2603) Encoding of alternate fields in highlighting

2011-06-18 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-2603:
-

Affects Version/s: 4.0, 3.1
Fix Version/s: 4.0, 3.3

 Encoding of alternate fields in highlighting
 

 Key: SOLR-2603
 URL: https://issues.apache.org/jira/browse/SOLR-2603
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 3.1, 3.2, 4.0
Reporter: Massimo Schiavon
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.3, 4.0


 It would be useful if the method 
 DefaultSolrHighlighter.alternateField(NamedList, SolrParams, Document, 
 String) applied the configured encoding to alternate fields.




[jira] [Updated] (SOLR-2603) Encoding of alternate fields in highlighting

2011-06-18 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-2603:
-

Attachment: SOLR-2603.patch

Massimo, thank you for opening the issue. Can you try the attached patch?

 Encoding of alternate fields in highlighting
 

 Key: SOLR-2603
 URL: https://issues.apache.org/jira/browse/SOLR-2603
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 3.1, 3.2, 4.0
Reporter: Massimo Schiavon
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2603.patch


 It would be useful if the method 
 DefaultSolrHighlighter.alternateField(NamedList, SolrParams, Document, 
 String) applied the configured encoding to alternate fields.




[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-18 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

I made some more changes to the earlier patch. I tried putting a nocommit 
wherever I thought the code could lead to an assertion error or to an error 
caused by a bufferSize of 0. 

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
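
 One possible shape for such a class, as a sketch only: the name
 IOContextSketch, the factory methods forSearch/forMerge and the buffer
 sizes below are assumptions made for illustration and are not taken from
 any of the attached patches.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the proposed IOContext; names and defaults are assumptions.
final class IOContextSketch {

    // Flags mentioned in the issue description.
    enum Flag { DIRECT, SEQUENTIAL }

    final int bufferSize;
    final Set<Flag> flags;

    private IOContextSketch(int bufferSize, Set<Flag> flags) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be > 0");
        }
        this.bufferSize = bufferSize;
        this.flags = Collections.unmodifiableSet(flags);
    }

    // Searching: small buffer, no special flags (sizes are illustrative only).
    static IOContextSketch forSearch() {
        return new IOContextSketch(1024, EnumSet.noneOf(Flag.class));
    }

    // Merging: larger buffer, bypass the OS cache and read sequentially.
    static IOContextSketch forMerge() {
        return new IOContextSketch(4096, EnumSet.of(Flag.DIRECT, Flag.SEQUENTIAL));
    }
}
```

 A Directory method could then accept such an object in place of a bare
 buffer size, and rejecting a non-positive bufferSize in the constructor
 would surface a zero-size buffer early.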




[jira] [Updated] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2919:
--

Attachment: LUCENE-2919-filter.patch

New patch:
- simplified the Filter logic
- added an option to negate the filter in the IndexReader; this enables the use of 
only *one* TermRangeFilter, which is simply negated for the second pass.
- made the code close resources correctly using IOUtils.closeSafely

Tests are still ugly.
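
The two-pass idea with a single negatable filter can be sketched in plain 
Java. This is only a toy model over a list of keys, not the patch itself: 
KeySplitSketch and its pass method are hypothetical names, and a 
java.util.function.Predicate stands in for the Lucene Filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model: a Predicate plays the role of the Filter, a List the role of the index.
class KeySplitSketch {

    // One pass over the keys: keep those accepted by the filter,
    // or those rejected by it when negate is true.
    static List<String> pass(List<String> keys, Predicate<String> filter, boolean negate) {
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            if (filter.test(k) != negate) {
                out.add(k);
            }
        }
        return out;
    }
}
```

Running pass once with negate=false and once with negate=true yields two 
disjoint parts that together cover every key, which is why one range 
filter is enough.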

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Commented] (LUCENE-3206) FST package API refactoring

2011-06-18 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051475#comment-13051475
 ] 

Dawid Weiss commented on LUCENE-3206:
-

bq. this could be a non-negligible increase in FST size for the non-ascii case 
I think?

I don't know. If the non-ASCII is encoded as UTF8 for the BytesRef, then 
storing full Unicode code points on transitions shouldn't really account for much 
more (in fact it may create fewer states/transitions because multibyte UTF8 
sequences will require multiple transitions)? This we would need to check, of 
course. And I assume input sequences ARE text, which in general may not be the 
case... I think I'll leave BYTE1/BYTE4 an option for now and see if I can 
improve on it once I have a working test suite.

bq. I think SimpleText codec is a good example? Also 
VariableGapTermsIndexReader, and MemoryCodec? Each of these uses the 
BytesRefFSTEnum, I believe.

I wasn't clear -- I can find the places where they're used, but I wanted to 
clarify the nature of stored keys and values (are they UTF8 text, utf16, 
unicode, random bytes)? I can go through the code, but you're probably a faster 
source of information on this one. Robert, if you're reading this -- anything 
you envision could be stored as transition labels?


 FST package API refactoring
 ---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3206.patch


 The current API is still marked @experimental, so I think there's still time 
 to fiddle with it. I've been using the current API for some time and I do 
 have some ideas for improvement. This is a placeholder for these -- I'll post 
 a patch once I have a working proof of concept.




[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-18 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051480#comment-13051480
 ] 

Stefan Matheis (steffkes) commented on SOLR-2399:
-

Shawn:
Of course, it's just a quick prototype to demonstrate the functionality. The 
layout will change if it's integrated :)

Uwe:
Thanks for the changes! Yes, the Analysis page has a few things that need to 
be changed, mainly regarding layout/arrangement, but also functionality. I'll 
see if I can finish working on that next week. 

noah:
Thanks, it's integrated. Does {{id}} as a property work without problems? Does 
the layout on Safari look good compared to the provided screenshots?

Erick:
That was a [quick 
change|https://github.com/steffkes/solr-admin/commit/799da2e97889b7a576eaf1a516511bc126dcb1b4]: 
every entry now has its own URL, so after reloading the page, the view 
will be the same as before.

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, 
 SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, 
 SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051484#comment-13051484
 ] 

Paul Elschot commented on LUCENE-2454:
--

Tried the current patch here to make use of prevSetBit, but ran into a problem 
with the query weight that could be related to LUCENE-3208.

When fixing the patch here so that NestedDocumentQuery.java looks like this:
{code}
  public Weight createWeight(IndexSearcher searcher) throws IOException {
    return new NestedDocumentQueryWeight(childQuery.createWeight(searcher));
  }
{code}

the TestNestedDocumentQuery from the patch here fails with an 
UnsupportedOperationException.

After adding the class name to the exception constructed in Query.java, the test 
fails with:

UnsupportedOperationException: org.apache.lucene.search.NumericRangeQuery

That probably means the above fix to the patch is wrong.
Any comments on how to continue?



 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051486#comment-13051486
 ] 

Michael McCandless commented on LUCENE-2454:


I suspect the NestedDocumentQuery must impl rewrite, and rewrite the 
childQuery.  I hit this on LUCENE-3171, too.
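
The suggestion can be illustrated with a toy model. None of the types below 
are Lucene classes; ToyQuery, ToyTermQuery, ToyRangeQuery and 
ToyWrapperQuery are hypothetical names, and the model only assumes that a 
query which needs rewriting throws UnsupportedOperationException from 
createWeight until it has been rewritten.

```java
// Toy model only: these are NOT Lucene's Query/Weight classes.
interface ToyQuery {
    ToyQuery rewrite();     // returns a primitive query, or this if already primitive
    String createWeight();  // non-primitive queries cannot create a weight
}

// A primitive query: rewriting is a no-op and a weight can be created directly.
final class ToyTermQuery implements ToyQuery {
    final String term;
    ToyTermQuery(String term) { this.term = term; }
    public ToyQuery rewrite() { return this; }
    public String createWeight() { return "weight(" + term + ")"; }
}

// Stands in for a multi-term query such as a numeric range: it must be rewritten first.
final class ToyRangeQuery implements ToyQuery {
    public ToyQuery rewrite() { return new ToyTermQuery("expanded-range"); }
    public String createWeight() { throw new UnsupportedOperationException("ToyRangeQuery"); }
}

// Wrapper query: its rewrite() rewrites the child instead of just returning this.
final class ToyWrapperQuery implements ToyQuery {
    final ToyQuery child;
    ToyWrapperQuery(ToyQuery child) { this.child = child; }
    public ToyQuery rewrite() {
        ToyQuery rewritten = child.rewrite();
        return rewritten == child ? this : new ToyWrapperQuery(rewritten);
    }
    public String createWeight() { return "wrapper(" + child.createWeight() + ")"; }
}
```

In this model, calling createWeight on a wrapper around an unrewritten range 
query fails, while calling it on the result of rewrite() succeeds.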

 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051487#comment-13051487
 ] 

Michael McCandless commented on LUCENE-2919:


Patch looks great Uwe!  I love how generic it is now, that you can just provide 
any Filter.

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Updated] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2919:
---

Attachment: LUCENE-2919-3x.patch

Here's a patch for back-porting the original approach to 3.x.

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051490#comment-13051490
 ] 

Uwe Schindler commented on LUCENE-2919:
---

I will fix the test and commit this, then backport again, using your 
TermPositions.

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Assigned] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-2919:
-

Assignee: Uwe Schindler

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Commented] (LUCENE-3206) FST package API refactoring

2011-06-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051491#comment-13051491
 ] 

Michael McCandless commented on LUCENE-3206:


{quote}
bq. this could be a non-negligible increase in FST size for the non-ascii case 
I think?

I don't know. If the non-ASCII is encoded as UTF8 for the BytesRef, then 
storing full Unicode code points on transitions shouldn't really account for much 
more (in fact it may create fewer states/transitions because multibyte UTF8 
sequences will require multiple transitions)? This we would need to check, of 
course. And I assume input sequences ARE text, which in general may not be the 
case... I think I'll leave BYTE1/BYTE4 an option for now and see if I can 
improve on it once I have a working test suite.
{quote}

Ahh, yes I agree it'd be a more interesting comparison if you use
UTF32 instead of UTF8.

The case I was worried about is if you must use UTF8 (ie because
TermsEnum speaks only BytesRef), then writing those bytes as a vInt
instead of a fixed byte is a penalty to non-ascii.
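
A back-of-the-envelope way to see the penalty is to count label bytes under
each encoding. The sketch below is only that: LabelSizeSketch and its
methods are hypothetical names, it assumes the usual 7-bits-per-byte vInt
layout, and it counts label bytes for a single string rather than building
an FST, so it says nothing about the number of states or arcs.

```java
import java.nio.charset.StandardCharsets;

// Counts label bytes only; it does not build an FST.
class LabelSizeSketch {

    // Number of bytes a non-negative int needs as a vInt (7 payload bits per byte).
    static int vIntSize(int v) {
        int n = 1;
        while ((v & ~0x7F) != 0) {
            v >>>= 7;
            n++;
        }
        return n;
    }

    // UTF8 bytes, each label stored as one fixed byte.
    static int utf8FixedBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF8 bytes, each label stored as a vInt: bytes >= 0x80 cost two bytes.
    static int utf8AsVInts(String s) {
        int total = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            total += vIntSize(b & 0xFF);
        }
        return total;
    }

    // Unicode code points (UTF32 labels), each stored as a vInt.
    static int codePointsAsVInts(String s) {
        return s.codePoints().map(LabelSizeSketch::vIntSize).sum();
    }
}
```

For pure ASCII input the three counts are equal. For a word with three
two-byte characters and one ASCII character, the counts are 7 (fixed
bytes), 13 (UTF8 bytes as vInts) and 7 (code points as vInts).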

{quote}
bq. I think SimpleText codec is a good example? Also 
VariableGapTermsIndexReader, and MemoryCodec? Each of these uses the 
BytesRefFSTEnum, I believe.

I wasn't clear -- I can find the places where they're used, but I wanted to 
clarify the nature of stored keys and values (are they UTF8 text, utf16, 
unicode, random bytes)? I can go through the code, but you're probably a faster 
source of information on this one. Robert, if you're reading this -- anything 
you envision could be stored as transition labels?
{quote}

Ahh... I think all uses have BytesRef (UTF8 encoded term) as the key,
and various things as the values.

I don't think we've used FST during analysis yet but we should try;
then I suspect we'd use UTF16 labels?


 FST package API refactoring
 ---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3206.patch


 The current API is still marked @experimental, so I think there's still time 
 to fiddle with it. I've been using the current API for some time and I do 
 have some ideas for improvement. This is a placeholder for these -- I'll post 
 a patch once I have a working proof of concept.




[jira] [Updated] (LUCENE-2308) Separately specify a field's type

2011-06-18 Thread Nikola Tankovic (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikola Tankovic updated LUCENE-2308:


Attachment: LUCENE-2308-2.patch

A baby step towards dividing AbstractField and Field into Field, TextField, 
StringField, NumericField and BinaryField with default FieldTypes.

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-2.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things like
 indexed or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...
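
 The separation described above might look roughly like this. It is a
 sketch under assumptions: FieldTypeSketch and FieldSketch are hypothetical
 names, and the four boolean options are only a subset chosen for
 illustration, not the API of the attached patches.

```java
// Hypothetical sketch; class and member names are assumptions, not the committed API.
final class FieldTypeSketch {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;
    final boolean omitNorms;

    FieldTypeSketch(boolean indexed, boolean stored, boolean tokenized, boolean omitNorms) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
        this.omitNorms = omitNorms;
    }
}

// The field keeps only its name and value and points at a shared type.
final class FieldSketch {
    final String name;
    final String value;
    final FieldTypeSketch type;

    FieldSketch(String name, String value, FieldTypeSketch type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}
```

 One FieldTypeSketch instance can be passed to any number of FieldSketch
 constructors, so the per-field options are stated once and nothing about
 the type needs to be written into the index.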




[jira] [Issue Comment Edited] (LUCENE-2308) Separately specify a field's type

2011-06-18 Thread Nikola Tankovic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051492#comment-13051492
 ] 

Nikola Tankovic edited comment on LUCENE-2308 at 6/18/11 10:00 AM:
---

New patch: a baby step towards dividing AbstractField and Field into Field, 
TextField, StringField, NumericField and BinaryField with default FieldTypes.

  was (Author: ntankovic):
A baby step towards dividing AbstractField and Field towards Field, 
TextField, StringField, NumericField and BinaryField with default FieldType's.
  
 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-2.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things like
 indexed or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...




[jira] [Updated] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2919:
--

Attachment: LUCENE-2919-filter.patch

Final patch:
- improved tests
- changed api to be able to pass arbitrary filter

This is ready to commit; I will do so soon, as the current trunk is unfortunately 
broken (it splits incorrectly).

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Updated] (LUCENE-3171) BlockJoinQuery/Collector

2011-06-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3171:
---

Attachment: LUCENE-3171.patch

New patch, I think it's ready to commit!

 BlockJoinQuery/Collector
 

 Key: LUCENE-3171
 URL: https://issues.apache.org/jira/browse/LUCENE-3171
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/other
Reporter: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3171.patch, LUCENE-3171.patch


 I created a single-pass Query + Collector to implement nested docs.
 The approach is similar to LUCENE-2454, in that the app must index
 documents in join order, as a block (IW.add/updateDocuments), with
 the parent doc at the end of the block, except that this impl is one
 pass.
 Once you join at indexing time, you can take any query that matches
 child docs and join it up to the parent docID space, using
 BlockJoinQuery.  You then use BlockJoinCollector, which sorts parent
 docs by provided Sort, to gather results, grouped by parent; this
 collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
 retains the child docs corresponding to each collected parent doc.
 After searching is done, you retrieve the TopGroups from a provided
 BlockJoinQuery.
 Like LUCENE-2454, this is less general than the arbitrary joins in
 Solr (SOLR-2272) or parent/child from ElasticSearch
 (https://github.com/elasticsearch/elasticsearch/issues/553), since you
 must do the join at indexing time as a doc block, but it should be
 able to handle nested joins as well as joins to multiple tables,
 though I don't yet have test cases for these.
 I put this in a new Join module (modules/join); I think as we
 refactor join impls we should put them here.
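
 The core of the block layout can be sketched without any Lucene classes.
 Assuming, as described above, that each parent doc is the last doc of its
 block, the parent of a child is the first parent docID at or after the
 child. BlockJoinSketch, parentOf and groupByParent are hypothetical names,
 and a java.util.BitSet stands in for whatever structure marks parent docs.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of joining child hits up to parents in a block-indexed docID space.
class BlockJoinSketch {

    // parents has a bit set for every docID that is a parent (last doc of its block).
    // Returns the parent docID for a child docID, or -1 if none follows it.
    static int parentOf(BitSet parents, int childDoc) {
        return parents.nextSetBit(childDoc);
    }

    // Groups matching child docIDs by parent docID, preserving the order of the hits.
    static Map<Integer, List<Integer>> groupByParent(BitSet parents, int[] childHits) {
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int child : childHits) {
            int parent = parentOf(parents, child);
            if (parent < 0) {
                throw new IllegalArgumentException("child " + child + " has no parent after it");
            }
            groups.computeIfAbsent(parent, p -> new ArrayList<>()).add(child);
        }
        return groups;
    }
}
```

 With blocks 0..3 and 4..6 (parents at docIDs 3 and 6), child hits 0, 2 and
 5 group as 3 -> [0, 2] and 6 -> [5]. The sketch assumes the hits passed in
 are child docs; a parent docID passed as a hit would map to itself.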




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051495#comment-13051495
 ] 

Paul Elschot commented on LUCENE-2454:
--

NestedDocumentQuery already implements rewrite() by returning *this*, just as 
in LUCENE-3171.

Here is a more complete stack trace of the exception:

{noformat}
[junit] java.lang.UnsupportedOperationException: org.apache.lucene.search.NumericRangeQuery
[junit] at org.apache.lucene.search.Query.createWeight(Query.java:91)
[junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177)
[junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358)
[junit] at org.apache.lucene.search.nested.NestedDocumentQuery.createWeight(NestedDocumentQuery.java:65)
[junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177)
[junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358)
[junit] at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:676)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:292)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
[junit] at org.apache.lucene.search.TestNestedDocumentQuery.testSimple(TestNestedDocumentQuery.java:92)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1414)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1332)
{noformat}

Could BooleanWeight be the offendor?



 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (LUCENE-3206) FST package API refactoring

2011-06-18 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051496#comment-13051496
 ] 

Dawid Weiss commented on LUCENE-3206:
-

I think I know how to compare storing byte[] of UTF8 against vint-encoded 
codepoints in UTF32 -- I'll encode the wikipedia terms list in both ways and we 
will see what comes out. Theoretically they should be very, very similar (and 
full unicode codepoints should generate fewer arcs) because UTF8 uses an 
encoding scheme with similar overhead to vint encoding... so if something is a 
single-byte sequence in UTF8, it will remain a single-byte vint. A double-byte 
UTF8 character will remain a double-byte vint (the last double-byte codepoint 
is 0x7ff=2047, whereas the last double-byte vint is 2^14-1=16383). And so on. 
So for text, vint-encoded UTF32 should be more compact than UTF8... The gain 
for byte[] is of course when your labels are not text, but arbitrary bytes -- 
then the byte[] representation would be nicer.
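
The size argument above can be checked with a small sketch. It assumes the usual vint layout (7 payload bits per byte, high bit as continuation flag); the class and method names are made up for illustration and are not part of the FST package.

```java
/** Illustrative only: byte cost of UTF-8 vs. a 7-bits-per-byte vint for code points. */
final class VIntVsUtf8 {

    /** Number of bytes UTF-8 needs for one Unicode code point. */
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) {
            return 1;
        }
        if (codePoint < 0x800) {
            return 2;
        }
        if (codePoint < 0x10000) {
            return 3;
        }
        return 4;
    }

    /** Number of bytes a vint needs for a non-negative value (7 payload bits per byte). */
    static int vintLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    /** Total UTF-8 bytes for a string, counted per code point. */
    static int totalUtf8(String s) {
        return s.codePoints().map(VIntVsUtf8::utf8Length).sum();
    }

    /** Total vint bytes for a string, one vint per code point. */
    static int totalVInt(String s) {
        return s.codePoints().map(VIntVsUtf8::vintLength).sum();
    }

    public static void main(String[] args) {
        // Boundary code points where one of the two encodings grows by a byte.
        int[] samples = {0x7F, 0x80, 0x7FF, 0x800, 0x3FFF, 0x4000, 0xFFFF, 0x10000, 0x10FFFF};
        for (int cp : samples) {
            System.out.printf("U+%06X utf8=%d vint=%d%n", cp, utf8Length(cp), vintLength(cp));
        }
    }
}
```

Under these assumptions the two lengths agree up to 0x7ff, and the vint is one byte shorter for 0x800..0x3fff and for everything from 0x10000 up, so the vint form is never longer for a valid code point.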



 FST package API refactoring
 ---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3206.patch


 The current API is still marked @experimental, so I think there's still time 
 to fiddle with it. I've been using the current API for some time and I do 
 have some ideas for improvement. This is a placeholder for these -- I'll post 
 a patch once I have a working proof of concept.




[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051495#comment-13051495
 ] 

Paul Elschot edited comment on LUCENE-2454 at 6/18/11 10:45 AM:


NestedDocumentQuery already implements rewrite() by returning *this*, just as 
in 3171.

This is a more complete traceback of the exception:

{noformat}
[junit] java.lang.UnsupportedOperationException: org.apache.lucene.search.NumericRangeQuery
[junit] at org.apache.lucene.search.Query.createWeight(Query.java:91)
[junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177)
[junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358)
[junit] at org.apache.lucene.search.nested.NestedDocumentQuery.createWeight(NestedDocumentQuery.java:65)
[junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177)
[junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358)
[junit] at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:676)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:292)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
[junit] at org.apache.lucene.search.TestNestedDocumentQuery.testSimple(TestNestedDocumentQuery.java:92)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1414)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1332)
{noformat}

Could BooleanWeight be the offender?

Or should NestedDocumentQuery rewrite by rewriting its child, as you suggested?
But rewrite to what then?



  was (Author: paul.elsc...@xs4all.nl):
NestedDocumentQuery already implements rewrite() by returning *this*, just 
as in 3171.

This is a more complete traceback of exception:

{noformat}
[junit] java.lang.UnsupportedOperationException: org.apache.lucene.search.NumericRangeQuery
[junit] at org.apache.lucene.search.Query.createWeight(Query.java:91)
[junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177)
[junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358)
[junit] at org.apache.lucene.search.nested.NestedDocumentQuery.createWeight(NestedDocumentQuery.java:65)
[junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177)
[junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358)
[junit] at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:676)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:292)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
[junit] at org.apache.lucene.search.TestNestedDocumentQuery.testSimple(TestNestedDocumentQuery.java:92)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1414)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1332)
{noformat}

Could BooleanWeight be the offendor?


  
 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051497#comment-13051497
 ] 

Uwe Schindler commented on LUCENE-2919:
---

Committed trunk revision: 1137162

Backporting...

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Updated] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2919:
--

Attachment: (was: LUCENE-2919-3x.patch)

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Updated] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2919:
--

Attachment: LUCENE-2919-3x.patch

Patch for 3.x (not merged one).

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[JENKINS] Lucene-trunk - Build # 1598 - Still Failing

2011-06-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-trunk/1598/

No tests ran.

Build Log (for compile errors):
[...truncated 11605 lines...]






[jira] [Resolved] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2919.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.3

Committed 3.x revision: 1137166

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid; however, to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Created] (LUCENE-3212) Supply FilterIndexReader based on any o.a.l.search.Filter

2011-06-18 Thread Uwe Schindler (JIRA)
Supply FilterIndexReader based on any o.a.l.search.Filter
-

 Key: LUCENE-3212
 URL: https://issues.apache.org/jira/browse/LUCENE-3212
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


When coding LUCENE-2919 (PKIndexSplitter), Mike and I had an idea for how to 
apply filters efficiently at the lowest level (before query execution). This is 
very useful for e.g. security Filters that simply hide some documents. 
Currently, when you apply the filter after searching, lots of useless work is 
done, like scoring filtered documents, iterating term positions (for Phrases),...

This patch will provide a FilterIndexReader subclass (4.0 only, 3.x is too 
complicated to implement) that hides filtered documents by returning them in 
getDeletedDocs(). In contrast to LUCENE-2919, the filtering will work 
per-segment (without SlowMultiReaderWrapper), so per-segment search stays 
available and reopening can be done very efficiently, as the filter is only 
calculated on opening new or changed segments.

This filter should improve use-cases where the filter can be applied one time 
before all queries (like security filters) on (re-)opening the IndexReader.
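
As a rough illustration of the idea (not the patch itself), the sketch below treats one segment as a range of doc ids and computes a "hidden" bit set once, when the segment is opened: documents that are deleted or that the filter rejects. The class name and the use of java.util.BitSet are assumptions made for this example only; they do not mirror FilterIndexReader or getDeletedDocs().

```java
import java.util.BitSet;

/** Hypothetical per-segment view; computed once per opened segment, then reused by all queries. */
final class FilteredSegmentSketch {
    private final int maxDoc;
    private final BitSet hidden; // deleted docs plus docs the filter does not accept

    FilteredSegmentSketch(int maxDoc, BitSet deletedDocs, BitSet acceptedByFilter) {
        this.maxDoc = maxDoc;
        BitSet h = new BitSet(maxDoc);
        h.set(0, maxDoc);            // start with every doc hidden
        h.andNot(acceptedByFilter);  // un-hide what the filter accepts
        h.or(deletedDocs);           // really deleted docs stay hidden
        if (h.length() > maxDoc) {
            h.clear(maxDoc, h.length()); // ignore stray bits beyond the segment
        }
        this.hidden = h;
    }

    /** True if a query running against this segment should ever see the doc. */
    boolean isVisible(int doc) {
        return doc >= 0 && doc < maxDoc && !hidden.get(doc);
    }

    int numVisibleDocs() {
        return maxDoc - hidden.cardinality();
    }
}
```

Because the bit set is built per segment, unchanged segments could keep their view across a reopen, which is the efficiency argument made above.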




[jira] [Created] (LUCENE-3213) Use AtomicReaderContext also for CustomScoreProvider

2011-06-18 Thread Uwe Schindler (JIRA)
Use AtomicReaderContext also for CustomScoreProvider


 Key: LUCENE-3213
 URL: https://issues.apache.org/jira/browse/LUCENE-3213
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


When moving to AtomicReaderContext, one place was not changed to use it: 
CustomScoreQuery's CustomScoreProvider. It should also take AtomicReaderContext 
instead of IndexReader, as this may help users to effectively implement custom 
scoring where absolute DocIds are needed.
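
To make the "absolute DocIds" point concrete: a per-segment context lets a custom scorer turn a segment-local doc id into an index-wide one by adding the segment's starting offset. The sketch below uses a made-up SegmentContextSketch class with a docBase field; I believe the real AtomicReaderContext exposes a similar docBase, but treat that as an assumption rather than a statement about the API.

```java
/** Hypothetical stand-in for a per-segment reader context; not a Lucene class. */
final class SegmentContextSketch {
    final int docBase; // index-wide id of this segment's first document
    final int maxDoc;  // number of documents in this segment

    SegmentContextSketch(int docBase, int maxDoc) {
        this.docBase = docBase;
        this.maxDoc = maxDoc;
    }

    /** Converts a segment-local doc id to an absolute (index-wide) doc id. */
    int toAbsolute(int segmentDoc) {
        if (segmentDoc < 0 || segmentDoc >= maxDoc) {
            throw new IllegalArgumentException("doc " + segmentDoc + " not in segment");
        }
        return docBase + segmentDoc;
    }

    /** Builds contexts for consecutive segments of the given sizes. */
    static SegmentContextSketch[] forSegmentSizes(int... sizes) {
        SegmentContextSketch[] contexts = new SegmentContextSketch[sizes.length];
        int base = 0;
        for (int i = 0; i < sizes.length; i++) {
            contexts[i] = new SegmentContextSketch(base, sizes[i]);
            base += sizes[i];
        }
        return contexts;
    }
}
```

With only a bare per-segment reader, the scorer has no way to know the offset, which is why passing the context instead is helpful.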




[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051502#comment-13051502
 ] 

Paul Elschot edited comment on LUCENE-2454 at 6/18/11 11:39 AM:


One of the nocommits in the patch is about the use of a Filter for the parent 
filter.
NestedDocumentQuery uses an OpenBitSet from this Filter for next() and advance() 
just like a Filter and also as a parent filter.

So how about adding something like this:

{code}
public abstract class ParentFilter {
  public abstract ParentDISI getParentDISI(IndexReader reader);
}

public abstract class ParentDISI extends DocIdSetIterator {
  // to be used only after next() or advance() returned NO_MORE_DOCS
  public abstract int getParent();
}

{code}

together with another constructor for NestedDocumentQuery with a ParentFilter 
argument?


  was (Author: paul.elsc...@xs4all.nl):
One of the nocommits in the patch is about the use of an Filter for the 
parent filter.
NesteDocumentQuery uses an OpenBitSet from this Filter for next() and advance() 
just like a Filter and also as a parent filter.

So how about adding sth like this:

{code}
public abstract class ParentFilter {
  public abstract ParentDISI getParentDISI(IndexReader reader);
}

public class ParentDISI extends DocIdSetIterator {
  public int getParent(); // to be used only after next() or advance() returned 
 NO_MORE_DOCS
}

{code}

together with another constructor for NestedDocumentIterator with a 
ParentFilter argument?

  
 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051502#comment-13051502
 ] 

Paul Elschot commented on LUCENE-2454:
--

One of the nocommits in the patch is about the use of a Filter for the parent 
filter.
NestedDocumentQuery uses an OpenBitSet from this Filter for next() and advance() 
just like a Filter and also as a parent filter.

So how about adding something like this:

{code}
public abstract class ParentFilter {
  public abstract ParentDISI getParentDISI(IndexReader reader);
}

public abstract class ParentDISI extends DocIdSetIterator {
  // to be used only after next() or advance() returned NO_MORE_DOCS
  public abstract int getParent();
}

{code}

together with another constructor for NestedDocumentIterator with a 
ParentFilter argument?


 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051502#comment-13051502
 ] 

Paul Elschot edited comment on LUCENE-2454 at 6/18/11 11:41 AM:


One of the nocommits in the patch is about the use of a Filter for the parent 
filter.
NestedDocumentQuery uses an OpenBitSet from this Filter for next() and 
advance() just like a Filter and also as a parent filter.

So how about adding something like this:

{code}
public abstract class ParentFilter {
  public abstract ParentDISI getParentDISI(IndexReader reader);
}

public abstract class ParentDISI extends DocIdSetIterator {
  // to be used only after next() or advance() returned NO_MORE_DOCS
  public abstract int getParent();
}

{code}

together with another constructor for NestedDocumentQuery with a ParentFilter 
argument?


  was (Author: paul.elsc...@xs4all.nl):
One of the nocommits in the patch is about the use of an Filter for the 
parent filter.
NesteDocumentQuery uses an OpenBitSet from this Filter for next() and advance() 
just like a Filter and also as a parent filter.

So how about adding sth like this:

{code}
public abstract class ParentFilter {
  public abstract ParentDISI getParentDISI(IndexReader reader);
}

public class ParentDISI extends DocIdSetIterator {
  public int getParent(); // to be used only after next() or advance() returned 
 NO_MORE_DOCS
}

{code}

together with another constructor for NestedDocumentQuery with a ParentFilter 
argument?

  
 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Updated] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-2454:
-

Comment: was deleted

(was: One of the nocommits in the patch is about the use of a Filter for the 
parent filter.
NestedDocumentQuery uses an OpenBitSet from this Filter for next() and 
advance() just like a Filter and also as a parent filter.

So how about adding something like this:

{code}
public abstract class ParentFilter {
  public abstract ParentDISI getParentDISI(IndexReader reader);
}

public abstract class ParentDISI extends DocIdSetIterator {
  // to be used only after next() or advance() returned NO_MORE_DOCS
  public abstract int getParent();
}

{code}

together with another constructor for NestedDocumentQuery with a ParentFilter 
argument?
)

 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Updated] (LUCENE-3213) Use AtomicReaderContext also for CustomScoreProvider

2011-06-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3213:
--

Attachment: LUCENE-3213.patch

Easy patch, will commit soon!

 Use AtomicReaderContext also for CustomScoreProvider
 

 Key: LUCENE-3213
 URL: https://issues.apache.org/jira/browse/LUCENE-3213
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0

 Attachments: LUCENE-3213.patch


 When moving to AtomicReaderContext, one place was not changed to use it: 
 CustomScoreQuery's CustomScoreProvider. It should also take 
 AtomicReaderContext instead of IndexReader, as this may help users to 
 effectively implement custom scoring where absolute DocIds are needed.




[jira] [Resolved] (LUCENE-3213) Use AtomicReaderContext also for CustomScoreProvider

2011-06-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3213.
---

Resolution: Fixed

Committed trunk revision: 1137176

 Use AtomicReaderContext also for CustomScoreProvider
 

 Key: LUCENE-3213
 URL: https://issues.apache.org/jira/browse/LUCENE-3213
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0

 Attachments: LUCENE-3213.patch


 When moving to AtomicReaderContext, one place was not changed to use it: 
 CustomScoreQuery's CustomScoreProvider. It should also take 
 AtomicReaderContext instead of IndexReader, as this may help users to 
 effectively implement custom scoring where absolute DocIds are needed.




[jira] [Created] (SOLR-2607) Clean up /clients directory

2011-06-18 Thread Eric Pugh (JIRA)
Clean up /clients directory
---

 Key: SOLR-2607
 URL: https://issues.apache.org/jira/browse/SOLR-2607
 Project: Solr
  Issue Type: Task
  Components: clients - java, clients - ruby - flare
Affects Versions: 4.0
Reporter: Eric Pugh
Priority: Minor


The /clients directory is a bit of a mess.  The only actively maintained client, 
SolrJ, is actually in the /dist directory!   The other clients that used to be 
in here, /php and /javascript (I think!), have been moved.  The only one left is 
/ruby, and it isn't actively maintained the way other ruby clients are.

I'd recommend just removing the /clients directory since it's very confusing to 
a new user who would logically go here to find clients, and only find a ruby 
one!  It would also let us slim down the size of the download.

Alternatively, if we want the /clients directory, then let's copy over the solrj 
lib to this dir instead of /dist.

I am happy to submit a patch if this makes sense.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-06-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051512#comment-13051512
 ] 

Michael McCandless commented on LUCENE-2308:


Patch looks good, thanks Nikola!

When you make the patch, can you run svn diff from the top-level
dir?  Ie, so that file paths look
lucene/src/java/org/apache/lucene/document/Field.java

A couple minor code-formatting things:

  * Please add { } around one-line ifs, eg in FieldType.toString

  * import lines go after the copyright (FieldType.java)

  * If possible please try to avoid adding noise to the patch, for
example re-formatting javadocs (eg NumericField.java).  It's fine
to clean things up (add missing {}'s to existing code) as you go,
but if it's simply a reformat that just adds noise which makes it
harder to see real changes.

Other stuff:

  * The DEFAULT_TYPE for each field can be final right?

  * For FieldType, can we use direct members of the class, instead of
the EnumSet?  (Ie, boolean indexed, boolean stored, etc.).
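
A minimal sketch of what "direct members" could look like, and of sharing one type instance across fields. The names SketchFieldType and SketchField are invented for this example and are not the classes in the patch; the set of flags is also just a guess at a small subset.

```java
/** Hypothetical field type with plain boolean members instead of an EnumSet. */
final class SketchFieldType {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;

    SketchFieldType(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }

    private static void append(StringBuilder sb, boolean flag, String label) {
        if (flag) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(label);
        }
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        append(sb, indexed, "indexed");
        append(sb, stored, "stored");
        append(sb, tokenized, "tokenized");
        return sb.toString();
    }
}

/** Hypothetical field: holds the actual value and points at a shared type. */
final class SketchField {
    final String name;
    final String value;
    final SketchFieldType type;

    SketchField(String name, String value, SketchFieldType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}
```

Usage would be something like creating one SketchFieldType(true, true, false) for id-like fields and passing the same instance to every SketchField that needs it.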

The patch causes compilation errors when I run ant compile-core, but
that's expected right?

I think our immediate goal here should be to get a compilable patch
with tests passing, ie the dirt path.  Then we can go back and
iterate.

But, because so many tests rely on the current Document/Field API... I
think in order to stage this we should make a totally new package,
call it document2 for now, and create all these new classes inside
there.  Then, one by one we can cutover tests to use document2/*,
starting with TestDemo.  Eventually, once everything is cutover, we
can remove document and rename document2 to document.


 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-2.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 like indexed or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...




faceting on multivalued fields

2011-06-18 Thread Dennis de Boer
Hello all,

I'm new to this list, so if I don't use it correctly, please say so.

I have a question about faceting on multivalued fields. I have indexed some
data from a product feed. One of my fields, the category field, is a
multivalued field.
This field contains multiple categories related to the product. For example,
when I index a product feed composed of fashion items, the category field can
be filled with
the values women men boys when this piece of clothing is available for
women, men and boys.

Like I said, I indexed the category as a multivalued field. Now I want to
facet on it. Faceting works, but not as expected.
When I query Solr with the following URL
q=*:*&facet=true&facet.field=category&fq=category:women, I receive the
following response

<int name="women">71</int>
<int name="men">6</int>
<int name="babies">1</int>
<int name="baby">0</int>
<int name="boys">0</int>
<int name="girls">0</int>

It looks like Solr returns every document where 'women' is part of the
multivalued category field, but also returns the facets (counts) for all
keywords that were indexed as part of the multivalued field, along with
'women'. In this example I got back documents which had the category field
indexed like "women men", "women babies" and "women babies men".

What is worse, since it also calculates the facets for babies and men, when
I put another facet.field in the query (like brand), the response also
returns brands for categories men and babies.

Is this as designed? Is there a way to let Solr *only* return the documents
which have 'women' in their category field?


Thanks a lot for the help!

Regards,
Dennis


[jira] [Resolved] (LUCENE-3209) Memory codec

2011-06-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3209.


Resolution: Fixed

 Memory codec
 

 Key: LUCENE-3209
 URL: https://issues.apache.org/jira/browse/LUCENE-3209
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-3209.patch


 This codec stores all terms/postings in RAM.  It uses an
 FST<BytesRef>.  This is useful on a primary key field to ensure
 lookups don't need to hit disk, to keep NRT reopen time fast even
 under IO contention.




Re: faceting on multivalued fields

2011-06-18 Thread Gora Mohanty
On Sat, Jun 18, 2011 at 5:36 PM, Dennis de Boer datdeb...@gmail.com wrote:
 Hello all,

 I'm new to this list, so if I don't use it correctly, please say so.

 I have a question about faceting on multivalued fields. I have indexed some
 data from a product feed. One of my fields, the category field, is a
 multivalued field.
[...]

Please show us the definition of this field in the Solr schema,
including the tokenizers/analyzers on the field type. My guess
is that your facet field is getting analyzed. This might be of help:
http://wiki.apache.org/solr/SolrFacetingOverview

Regards,
Gora




Re: faceting on multivalued fields

2011-06-18 Thread Jayendra Patil
As Mohanty said, your facet field seems to be analysed with the white
space tokenizer (field type probably text) which would generate
individual tokens for category - women babies men and hence the
individual facets.
You should use string as the field type for category so that it is not
tokenized.
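
To make the difference concrete, here is a small plain-Java sketch (no Solr involved) that counts facet terms per document, once with whitespace tokenization and once with each value kept whole. The class name, the method name and the sample data are made up for illustration; it only mimics the counting, not how Solr computes facets internally.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FacetSketch {
    // Counts, for each term, the number of documents containing it.
    // tokenize=true splits every value on whitespace (like a text field);
    // tokenize=false keeps every value as one term (like a string field).
    static Map<String, Integer> facetCounts(List<List<String>> docs, boolean tokenize) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> values : docs) {
            // Collect the distinct terms of this document first,
            // so a term is counted at most once per document.
            Set<String> terms = new LinkedHashSet<>();
            for (String value : values) {
                if (tokenize) {
                    terms.addAll(Arrays.asList(value.trim().split("\\s+")));
                } else {
                    terms.add(value);
                }
            }
            for (String term : terms) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample: each document has a single category value.
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("women babies men"),
            Arrays.asList("women men"),
            Arrays.asList("women"));
        System.out.println("tokenized: " + facetCounts(docs, true));
        System.out.println("whole:     " + facetCounts(docs, false));
    }
}
```

With tokenization, "men" and "babies" show up as facet terms of their own; without it, each distinct value is a single facet term.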

Regards,
Jayendra

On Sat, Jun 18, 2011 at 11:50 AM, Gora Mohanty g...@mimirtech.com wrote:
 On Sat, Jun 18, 2011 at 5:36 PM, Dennis de Boer datdeb...@gmail.com wrote:
 Hello all,

 I'm new to this list, so if I don't use it correctly, please say so.

 I have a question about faceting on multivalued fields. I have indexed some
 data from a product feed. One of my fields, the category field, is a
 multivalued field.
 [...]

 Please show us the definition of this field in the Solr schema,
 including the tokenizers/analyzers on the field type. My guess
 is that your facet field is getting analyzed. This might be of help:
 http://wiki.apache.org/solr/SolrFacetingOverview

 Regards,
 Gora




[jira] [Created] (LUCENE-3214) Ability to mlock certain fields from the terms dict

2011-06-18 Thread Michael McCandless (JIRA)
Ability to mlock certain fields from the terms dict
---

 Key: LUCENE-3214
 URL: https://issues.apache.org/jira/browse/LUCENE-3214
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless


This is a hacked up prototype!  It works but I'm not sure how to get this to a 
committable point.

The patch invokes mlock() (tested only on Linux), locking pages from the terms 
dictionary file that hold terms for a specified field.  You can only do this 
with MMapDirectory.

I used this to lock pages for the id field in the NRT stress test; it's an 
alternative to MemoryCodec.  But, it requires you set up the OS to allow the 
app/user to lock pages in RAM.

It works very well in reducing the NRT reopen latency even when large merges 
are running...




[jira] [Updated] (LUCENE-3214) Ability to mlock certain fields from the terms dict

2011-06-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3214:
---

Attachment: LUCENE-3214.patch

Prototype hacked up but working patch.

 Ability to mlock certain fields from the terms dict
 ---

 Key: LUCENE-3214
 URL: https://issues.apache.org/jira/browse/LUCENE-3214
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3214.patch


 This is a hacked up prototype!  It works but I'm not sure how to get this to 
 a committable point.
 The patch invokes mlock() (tested only on Linux), locking pages from the 
 terms dictionary file that hold terms for a specified field.  You can only do 
 this with MMapDirectory.
 I used this to lock pages for the id field in the NRT stress test; it's an 
 alternative to MemoryCodec.  But, it requires you set up the OS to allow the 
 app/user to lock pages in RAM.
 It works very well in reducing the NRT reopen latency even when large merges 
 are running...




[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051552#comment-13051552
 ] 

Jason Rutherglen commented on LUCENE-2919:
--

Thanks, committing this means I can remove a custom GitHub branch with only 
this patch.  Also, it'd be great if we somehow published nightly versions to 
Maven repositories.  Though they'd accumulate over time.

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid, however to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-06-18 Thread Nikola Tankovic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051553#comment-13051553
 ] 

Nikola Tankovic commented on LUCENE-2308:
-

Thanks Mike,

everything sounds good. I'll correct the suggested things, then start on the document2 
package! :)

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-2.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...
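
The refactoring described above could look roughly like the plain-Java sketch below. All names here (FieldTypeSketch, the FieldType and Field classes, the constructor arguments) are hypothetical and only illustrate the idea of sharing one settings object across fields; this is not the API from the attached patches.

```java
public class FieldTypeSketch {
    // Hypothetical holder for the per-field settings that are set on
    // each Field today (indexed, stored, analyzed, omitNorms, ...).
    static final class FieldType {
        final boolean indexed;
        final boolean stored;
        final boolean analyzed;
        final boolean omitNorms;

        FieldType(boolean indexed, boolean stored, boolean analyzed, boolean omitNorms) {
            this.indexed = indexed;
            this.stored = stored;
            this.analyzed = analyzed;
            this.omitNorms = omitNorms;
        }
    }

    // The field keeps only its name, its value and a reference to the type.
    static final class Field {
        final String name;
        final String value;
        final FieldType type;

        Field(String name, String value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }

    // Builds two fields that share a single FieldType instance.
    static Field[] sampleFields() {
        FieldType keyType = new FieldType(true, true, false, true);
        return new Field[] {
            new Field("id", "42", keyType),
            new Field("sku", "A-17", keyType)
        };
    }

    public static void main(String[] args) {
        for (Field f : sampleFields()) {
            System.out.println(f.name + "=" + f.value
                + " indexed=" + f.type.indexed + " analyzed=" + f.type.analyzed);
        }
    }
}
```

Nothing here is written to an index; the type object is configuration only, in line with the "NOT a schema" remark above.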




[jira] [Resolved] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time

2011-06-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3197.


Resolution: Fixed

 Optimize runs forever if you keep deleting docs at the same time
 

 Key: LUCENE-3197
 URL: https://issues.apache.org/jira/browse/LUCENE-3197
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3197.patch


 Because we cascade merges for an optimize... if you also delete documents 
 while the merges are running, then the merge policy will see the resulting 
 single segment as still not optimized (since it has pending deletes) and do a 
 single-segment merge, and will repeat indefinitely (as long as your app keeps 
 deleting docs).
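
The loop described above can be modelled with a toy simulation. This is a deliberately simplified sketch, not Lucene code: it assumes every merge collapses the index to one segment, and that deletes arriving while a merge runs leave the new segment with pending deletes. The class and method names are invented for the example.

```java
public class OptimizeLoopSketch {
    // Returns how many merges run before the index counts as optimized.
    // deletesDuringMerge[i] is true if the app deletes docs while merge i runs;
    // merges beyond the end of the array see no deletes.
    static int mergesUntilOptimized(int segmentCount, boolean[] deletesDuringMerge) {
        int merges = 0;
        boolean pendingDeletes = false;
        // "Optimized" in this model: a single segment with no pending deletes.
        while (segmentCount > 1 || pendingDeletes) {
            boolean deleted = merges < deletesDuringMerge.length && deletesDuringMerge[merges];
            segmentCount = 1;          // the merge produces a single segment
            pendingDeletes = deleted;  // deletes that arrived mid-merge remain pending
            merges++;
        }
        return merges;
    }

    public static void main(String[] args) {
        System.out.println(mergesUntilOptimized(5, new boolean[0]));
        System.out.println(mergesUntilOptimized(5, new boolean[] {true, true, true}));
    }
}
```

In this model the loop only ends once a merge completes without new deletes, so an app that never stops deleting would keep it running.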




[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051575#comment-13051575
 ] 

Ryan McKinley commented on LUCENE-2919:
---

to get the current maven build, check:
https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts/


 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid, however to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.





[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051576#comment-13051576
 ] 

Jason Rutherglen commented on LUCENE-2919:
--

@Ryan Thanks!  What would one put into the pom.xml as the artifact info?

 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid, however to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term

2011-06-18 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051586#comment-13051586
 ] 

Ryan McKinley commented on LUCENE-2919:
---

Jason... not really sure what you are asking  4.0-SNAPSHOT?
https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts/org/apache/lucene/lucene-core/4.0-SNAPSHOT/maven-metadata.xml


 IndexSplitter that divides by primary key term
 --

 Key: LUCENE-2919
 URL: https://issues.apache.org/jira/browse/LUCENE-2919
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, 
 LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch


 Index splitter that divides by primary key term.  The contrib 
 MultiPassIndexSplitter we have divides by docid, however to guarantee 
 external constraints it's sometimes necessary to split by a primary key term 
 id.  I think this implementation is a fairly trivial change.




[jira] [Commented] (LUCENE-3206) FST package API refactoring

2011-06-18 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051597#comment-13051597
 ] 

Dawid Weiss commented on LUCENE-3206:
-

I encoded the Wikipedia terms list in UTF32 (int4) and UTF8 (int1). Interesting 
results:
{noformat}
271,461,850 utf32.fst
Arcs:  64.485.082
Nodes: 36.624.613

270,137,939 utf8.fst
Arcs:  66.478.193
Nodes: 38.687.637
{noformat}

So... the files are pretty much the same size... UTF32 is slightly bigger, but 
(as predicted) it has fewer arcs and fewer nodes. I checked and ALL input UTF8 
strings are the same or longer than vint-coded UTF32 sequences... So how come 
the UTF32 automaton is larger? I have no clue -- I assume it may be something with 
the size of v-coded pointers... but I have no clue. In any case, the size gain 
from using int1 to encode UTF8 is minimal over just using full unicode 
codepoints and v-coded int4. Performance-wise it may be a hit (because one 
would need to convert UTF8/UTF16 to full unicode codepoints), but size-wise it 
seems to be relatively the same.
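
The per-string length comparison mentioned above can be reproduced with a few lines of plain Java. This is only a sketch of the arithmetic: it assumes the usual 7-bits-per-byte variable-length int encoding for code points, and the class and method names are made up for the example. It says nothing about the resulting FST size.

```java
import java.nio.charset.StandardCharsets;

public class LabelSizeSketch {
    // Bytes needed to write one non-negative int as a variable-length int
    // (7 payload bits per byte, high bit used as a continuation flag).
    static int vIntLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    // Total bytes if every Unicode code point of the term is written as a vint.
    static int vIntCodePointBytes(String term) {
        return term.codePoints().map(LabelSizeSketch::vIntLength).sum();
    }

    // Total bytes of the term in UTF-8.
    static int utf8Bytes(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII, Latin-1 range, Thai (U+0E01), CJK (U+65E5), supplementary (U+1F600)
        String[] terms = {"a", "\u00E9", "\u0E01", "\u65E5", "\uD83D\uDE00"};
        for (String term : terms) {
            System.out.println(utf8Bytes(term) + " UTF-8 bytes, "
                + vIntCodePointBytes(term) + " vint bytes");
        }
    }
}
```

A vint code point never needs more bytes than its UTF-8 form (it needs one byte fewer for U+0800..U+3FFF and for supplementary code points), which matches the observation that the UTF8 strings are the same length or longer.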

 FST package API refactoring
 ---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3206.patch


 The current API is still marked @experimental, so I think there's still time 
 to fiddle with it. I've been using the current API for some time and I do 
 have some ideas for improvement. This is a placeholder for these -- I'll post 
 a patch once I have a working proof of concept.




[jira] [Commented] (LUCENE-3206) FST package API refactoring

2011-06-18 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051598#comment-13051598
 ] 

Dawid Weiss commented on LUCENE-3206:
-

Oh, a wild guess: with int4 more nodes will be expanded into bsearch arrays 
(fixed size arcs). This may account for the observed size difference. And it 
may matter for traversals too (because int4 nodes will have a higher fanout, 
especially at root and first levels... something to consider).

 FST package API refactoring
 ---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3206.patch


 The current API is still marked @experimental, so I think there's still time 
 to fiddle with it. I've been using the current API for some time and I do 
 have some ideas for improvement. This is a placeholder for these -- I'll post 
 a patch once I have a working proof of concept.




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8901 - Failure

2011-06-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8901/

1 tests failed.
REGRESSION:  org.apache.lucene.TestExternalCodecs.testPerFieldCodec

Error Message:
expected:<727> but was:<728>

Stack Trace:
junit.framework.AssertionFailedError: expected:<727> but was:<728>
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)
at 
org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:566)




Build Log (for compile errors):
[...truncated 3256 lines...]






Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8901 - Failure

2011-06-18 Thread Robert Muir
this one was triggered by LUCENE-3197

On Sat, Jun 18, 2011 at 5:59 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8901/

 1 tests failed.
 REGRESSION:  org.apache.lucene.TestExternalCodecs.testPerFieldCodec

 Error Message:
 expected:<727> but was:<728>

 Stack Trace:
 junit.framework.AssertionFailedError: expected:<727> but was:<728>
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)
        at 
 org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:566)




 Build Log (for compile errors):
 [...truncated 3256 lines...]






[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051611#comment-13051611
 ] 

Paul Elschot commented on LUCENE-2454:
--

At Query, the javadocs of both createWeight() and rewrite() start with a word 
of warning.
I'll probably need at least a few days to wrap my head around it, so in case 
anyone meanwhile can provide more help...

 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (SOLR-2589) Commenting out the <arr name="queries"> section in firstSearcher generates an NPE

2011-06-18 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051619#comment-13051619
 ] 

Erick Erickson commented on SOLR-2589:
--

Forgot to include the issue number in the comment, so it's not showing up here, 
the revision is r1137092. Here's the ViewVC link: 
http://svn.apache.org/viewvc?view=revision&revision=r1137092

Thanks Steve for pointing this out.

 Commenting out the <arr name="queries"> section in firstSearcher generates 
 an NPE
 --

 Key: SOLR-2589
 URL: https://issues.apache.org/jira/browse/SOLR-2589
 Project: Solr
  Issue Type: Bug
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Trivial
 Fix For: 3.3, 4.0

 Attachments: SOLR-2589-3x.patch, SOLR-2589.patch, SOLR-2589.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 This has been around from at least 1.4.1, it just clutters up the log, it's 
 pretty harmless but easy to fix. I'll get it done as soon as I get my account 
 set up.




Re: [jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-18 Thread Erick Erickson
Way cool!

On Sat, Jun 18, 2011 at 4:56 AM, Stefan Matheis (steffkes) (JIRA)
j...@apache.org wrote:

    [ 
 https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051480#comment-13051480
  ]

 Stefan Matheis (steffkes) commented on SOLR-2399:
 -

 Shawn:
 Of course, it's just a quick prototype to demonstrate the functionality. The 
 Layout will change if it's integrated :)

 Uwe:
 Thanks for the Changes! Yes, the Analysis-Page has a few things that need to 
 be changed - mainly regarding layout/arrangement, but also functionality. 
 Will see if i can finish working on that next Week.

 noah:
 Thanks, it's integrated .. {{id}} as property works w/o problems? The Layout 
 on Safari looks good, compared to the provided Screenshots?

 Erick:
 that was a [quick 
 change|https://github.com/steffkes/solr-admin/commit/799da2e97889b7a576eaf1a516511bc126dcb1b4]
  : every entry now has its own url .. so after reloading the page, the view 
 will be the same as before.

 Solr Admin Interface, reworked
 --

                 Key: SOLR-2399
                 URL: https://issues.apache.org/jira/browse/SOLR-2399
             Project: Solr
          Issue Type: Improvement
          Components: web gui
            Reporter: Stefan Matheis (steffkes)
            Assignee: Ryan McKinley
            Priority: Minor
             Fix For: 4.0

         Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, 
 SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, 
 SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin




[JENKINS] Lucene-3.x - Build # 412 - Still Failing

2011-06-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-3.x/412/

2 tests failed.
REGRESSION:  org.apache.lucene.search.TestPhraseQuery.testRandomPhrases

Error Message:
GC overhead limit exceeded

Stack Trace:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:89)
at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:62)
at 
org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132)
at 
org.apache.lucene.store.RAMOutputStream.writeBytes(RAMOutputStream.java:118)
at 
org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:119)
at 
org.apache.lucene.index.TermInfosWriter.writeTerm(TermInfosWriter.java:227)
at org.apache.lucene.index.TermInfosWriter.add(TermInfosWriter.java:191)
at 
org.apache.lucene.index.FormatPostingsDocsWriter.finish(FormatPostingsDocsWriter.java:122)
at 
org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:314)
at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:119)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:113)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:70)
at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at 
org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3542)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3507)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2063)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2030)
at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:108)
at 
org.apache.lucene.search.TestPhraseQuery.testRandomPhrases(TestPhraseQuery.java:659)


FAILED:  org.apache.lucene.util.fst.TestFSTs.testBigSet

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.IntsRef.copy(IntsRef.java:111)
at org.apache.lucene.util.IntsRef.<init>(IntsRef.java:44)
at 
org.apache.lucene.util.fst.TestFSTs$FSTTester.verifyPruned(TestFSTs.java:791)
at 
org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:499)
at 
org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:363)
at org.apache.lucene.util.fst.TestFSTs.doTest(TestFSTs.java:211)
at 
org.apache.lucene.util.fst.TestFSTs.testRandomWords(TestFSTs.java:944)
at org.apache.lucene.util.fst.TestFSTs.testBigSet(TestFSTs.java:964)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1272)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1190)




Build Log (for compile errors):
[...truncated 12536 lines...]






[jira] [Updated] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings

2011-06-18 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-2535:


Attachment: SOLR-2535.patch

Here's the patch I used.  As before, it's just David's with the extra changes 
omitted.

 In Solr 3.2 and trunk the admin/file handler fails to show directory listings
 -

 Key: SOLR-2535
 URL: https://issues.apache.org/jira/browse/SOLR-2535
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
 Environment: java 1.6, jetty
Reporter: Peter Wolanin
 Fix For: 3.3

 Attachments: SOLR-2535.patch, 
 SOLR-2535_fix_admin_file_handler_for_directory_listings.patch


 In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted 
 listing of the conf directory, like:
 {noformat}
 <response>
 <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
 <lst name="files">
   <lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
   ...
 </lst>
 </response>
 {noformat}
 I can list the xslt sub-dir using solr/admin/files?file=/xslt
 In Solr 3.1.0, both of these fail with a 500 error:
 {noformat}
 HTTP ERROR 500
 Problem accessing /solr/admin/file/. Reason:
 did not find a CONTENT object
 java.io.IOException: did not find a CONTENT object
 {noformat}
 Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 
 should still handle directory listings if no file name is given, or if the 
 file is a directory, so I am filing this as a bug.




[JENKINS] Lucene-trunk - Build # 1599 - Still Failing

2011-06-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-trunk/1599/

2 tests failed.
FAILED:  org.apache.lucene.search.TestPhraseQuery.testRandomPhrases

Error Message:
close() called in wrong state: INCREMENT

Stack Trace:
junit.framework.AssertionFailedError: close() called in wrong state: INCREMENT
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)
at org.apache.lucene.analysis.MockTokenizer.close(MockTokenizer.java:176)
at org.apache.lucene.analysis.TokenFilter.close(TokenFilter.java:48)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:187)
at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:293)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:229)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:372)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1474)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1234)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1215)
at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:163)
at org.apache.lucene.search.TestPhraseQuery.testRandomPhrases(TestPhraseQuery.java:659)


FAILED:  org.apache.lucene.util.fst.TestFSTs.testBigSet

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.IntsRef.copy(IntsRef.java:111)
at org.apache.lucene.util.IntsRef.<init>(IntsRef.java:44)
at org.apache.lucene.util.fst.TestFSTs$FSTTester.verifyPruned(TestFSTs.java:793)
at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:501)
at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:365)
at org.apache.lucene.util.fst.TestFSTs.doTest(TestFSTs.java:213)
at org.apache.lucene.util.fst.TestFSTs.testRandomWords(TestFSTs.java:946)
at org.apache.lucene.util.fst.TestFSTs.testBigSet(TestFSTs.java:966)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)




Build Log (for compile errors):
[...truncated 11546 lines...]






[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8907 - Failure

2011-06-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8907/

1 tests failed.
REGRESSION:  org.apache.lucene.TestExternalCodecs.testPerFieldCodec

Error Message:
expected:<720> but was:<721>

Stack Trace:
junit.framework.AssertionFailedError: expected:<720> but was:<721>
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)
at org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:566)




Build Log (for compile errors):
[...truncated 3266 lines...]


