[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094364#comment-13094364
 ] 

Simon Willnauer commented on LUCENE-2308:
-

Hey guys, why don't we put plain old immutable Java objects with a single ctor 
into core and add a builder API to modules / sandbox? This keeps things 
simple in core, and users who want the builder can grab it from a module.

bq. Can we avoid the builder API? I think we shouldnt invite accidental 
creation of lots of FieldType instances during indexing... why not just a 
single ctor in fieldtype that takes all the parameters the base class cares 
about? then it serves double-duty as the 'expert' fieldtype anyway, subclasses 
like TextField are just the sugar.

So I haven't seen a single technical argument against a builder here. I 
personally think that a builder has many advantages:

* simple to add new fields; no deprecation is needed if you add another field 
to a type
* simple to use, since lots of people are used to chaining
* provides immutability by design
* represents a small but clear DSL to build a field type. You could do things 
like providing setters for TV only if you chain it with a call to indexed(), 
like: {code} builder.indexed().storeTV(); {code} which would not be visible 
otherwise. 
* a ctor call requires many parameters that you don't want to set, but 
you're forced to pass a value for them anyway
* since most of the parameters are booleans, long sequences of identically typed 
parameters can cause subtle bugs. If the user accidentally reverses two such 
parameters, the compiler won't complain, but the program will misbehave at 
runtime. That sucks, especially if you spend hours indexing and then realize that 
your TV has not been stored because you forgot to set indexed = true
* builder code is easy to write and, more importantly, to read.
* a builder simulates named optional parameters, as in Python and other 
languages, which Java lacks.

I think the Builder pattern is a good choice when designing classes whose 
constructors would have more than a handful of parameters, especially if most 
of those parameters are optional. Client code is much easier to read and write 
with builders than with the traditional telescoping constructor pattern.
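
To make the indexed()/storeTV() point concrete, here is a minimal sketch of a 
staged builder. Every name in it (FieldTypeSketch, Builder, IndexedBuilder, 
storeTV) is hypothetical and chosen for illustration; it is not the API of any 
patch attached to this issue.

```java
// Hypothetical sketch only: FieldTypeSketch and its methods are made-up names,
// not Lucene's FieldType API.
final class FieldTypeSketch {
    private final boolean indexed;
    private final boolean stored;
    private final boolean tokenized;
    private final boolean termVectors;

    // Single private ctor: instances are immutable once built.
    private FieldTypeSketch(boolean indexed, boolean stored,
                            boolean tokenized, boolean termVectors) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
        this.termVectors = termVectors;
    }

    boolean isIndexed() { return indexed; }
    boolean isStored() { return stored; }
    boolean isTokenized() { return tokenized; }
    boolean storesTermVectors() { return termVectors; }

    static Builder builder() { return new Builder(); }

    /** Options that are valid whether or not the field is indexed. */
    static final class Builder {
        private boolean stored;

        Builder stored() { stored = true; return this; }

        // Switching to IndexedBuilder is what makes storeTV() reachable.
        IndexedBuilder indexed() { return new IndexedBuilder(stored); }

        FieldTypeSketch build() {
            return new FieldTypeSketch(false, stored, false, false);
        }
    }

    /** Options that only make sense for an indexed field. */
    static final class IndexedBuilder {
        private boolean stored;
        private boolean tokenized;
        private boolean termVectors;

        private IndexedBuilder(boolean stored) { this.stored = stored; }

        IndexedBuilder stored() { stored = true; return this; }
        IndexedBuilder tokenized() { tokenized = true; return this; }
        IndexedBuilder storeTV() { termVectors = true; return this; }

        FieldTypeSketch build() {
            return new FieldTypeSketch(true, stored, tokenized, termVectors);
        }
    }
}
```

With this shape, FieldTypeSketch.builder().indexed().storeTV().build() compiles, 
while FieldTypeSketch.builder().storeTV() does not, so the "TV without indexed" 
mistake is caught by the compiler instead of after hours of indexing.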



 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: 

new AutomatonQuery(RunAutomaton) ?

2011-08-31 Thread eks dev
At the moment it is not possible (?) to construct AutomatonQuery with
RunAutomaton.
Would it make sense to add this possibility? Is it doable at all?

I have to keep a collection of RunAutomaton instances for other purposes (after
search feature extraction) and it would be handy to feed them directly
to AutomatonQuery.
I could as well keep cached AutomatonQuery objects (Field name does
not change), but then I would need to get (Run)Automaton from the
Query...

Thanks,
eks.




[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-08-31 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094368#comment-13094368
 ] 

Simon Willnauer commented on LUCENE-2312:
-

jason, I will look at this patch soon I hope. Busy times here right now so 
gimme some time.

thanks

 Search on IndexWriter's RAM Buffer
 --

 Key: LUCENE-2312
 URL: https://issues.apache.org/jira/browse/LUCENE-2312
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Assignee: Michael Busch
 Fix For: Realtime Branch

 Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, 
 LUCENE-2312.patch, LUCENE-2312.patch


 In order to offer users near-real-time search without incurring
 an indexing performance penalty, we can implement search on
 IndexWriter's RAM buffer. This is the buffer that is filled in
 RAM as documents are indexed. Currently the RAM buffer is
 flushed to the underlying directory (usually disk) before being
 made searchable. 
 Today's Lucene-based NRT systems must incur the cost of merging
 segments, which can slow indexing. 
 Michael Busch has good suggestions regarding how to handle deletes using max 
 doc ids.  
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
 The area that isn't fully fleshed out is the terms dictionary,
 which needs to be sorted prior to queries executing. Currently
 IW implements a specialized hash table. Michael B has a
 suggestion here: 
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094371#comment-13094371
 ] 

Chris Male commented on LUCENE-2308:


+1

I couldn't have put it better myself.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094378#comment-13094378
 ] 

Uwe Schindler commented on LUCENE-2308:
---

+1

I agree, too! I am personally in favour of builder patterns when parameters get 
beyond 3 or 4, especially if they are simply booleans.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094379#comment-13094379
 ] 

Uwe Schindler commented on LUCENE-2308:
---

Somewhat related, and for the same reason (too many booleans in the ctor): 
WordDelimiterFilter would also be a candidate for a WordDelimiterFilterBuilder.




[jira] [Created] (LUCENE-3409) NRT reader/writer over RAMDirectory memory leak

2011-08-31 Thread tal steier (JIRA)
NRT reader/writer over RAMDirectory memory leak
---

 Key: LUCENE-3409
 URL: https://issues.apache.org/jira/browse/LUCENE-3409
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.3, 3.0.2
Reporter: tal steier


with NRT reader/writer, emptying an index using:
writer.deleteAll()
writer.commit()
doesn't release all allocated memory.

for example the following code will generate a memory leak:

/**
 * Reveals a memory leak in NRT reader/writer<br>
 * 
 * The following main() does 10K cycles of:
 * <ul>
 * <li>Add 10K empty documents to index writer</li>
 * <li>commit()</li>
 * <li>open NRT reader over the writer, and immediately close it</li>
 * <li>delete all documents from the writer</li>
 * <li>commit changes to the writer</li>
 * </ul>
 * 
 * Running with -Xmx256M results in an OOME after ~2600 cycles
 */
public static void main(String[] args) throws Exception {
    RAMDirectory d = new RAMDirectory();
    IndexWriter w = new IndexWriter(d,
        new IndexWriterConfig(Version.LUCENE_33, new KeywordAnalyzer()));
    Document doc = new Document();

    for (int i = 0; i < 10000; i++) {
        for (int j = 0; j < 10000; ++j) {
            w.addDocument(doc);
        }
        w.commit();
        // Open an NRT reader over the writer and close it immediately
        IndexReader.open(w, true).close();

        w.deleteAll();
        w.commit();
    }

    w.close();
    d.close();
}




[jira] [Commented] (LUCENE-3409) NRT reader/writer over RAMDirectory memory leak

2011-08-31 Thread Gilad Barkai (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094388#comment-13094388
 ] 

Gilad Barkai commented on LUCENE-3409:
--

This issue is relevant for trunk as well.
Please update the Affected versions accordingly.




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094397#comment-13094397
 ] 

Jan Høydahl commented on LUCENE-3130:
-

Let's get back to the original issue: we need some way to let the original 
form of a term have higher weight than the alternative forms generated by 
analysis (whether those are synonyms, stems, lowercase or what have you).

Is tagging the added tokens with a tokenType, and then enabling the QParsers to 
act on these tokenTypes a viable way forward?

 Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should 
 give lower boosts
 ---

 Key: LUCENE-3130
 URL: https://issues.apache.org/jira/browse/LUCENE-3130
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Hoss Man

 A recent thread asked if there was any way to use query-time synonyms such that 
 matches on the original term specified by the user would score higher than 
 matches on the synonym.  It occurred to me later that a float Attribute could 
 be set by the SynonymFilter in such situations, and QueryParser could use 
 that float as a boost in the resulting Query.  (This would be fairly 
 straightforward for the simple synonyms => BooleanQuery case, but we'd have 
 to decide how to handle the case of synonyms with multiple terms that produce 
 MTPQ, possibly just punt for now.)
 Likewise, there may be other TokenFilters that inject artificial tokens at 
 query time where it also might make sense to have a reduced boost factor...
 * SynonymFilter
 * CommonGramsFilter
 * WordDelimiterFilter
 * etc...
 In all of these cases, the amount of the boost could be configured, and for 
 back compat could default to 1.0 (or null to not set a boost at all).
 Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied 
 the boost attribute into the payload attribute, these same filters (when used 
 at index time) could give penalizing payloads to terms.




[jira] [Assigned] (LUCENE-3409) NRT reader/writer over RAMDirectory memory leak

2011-08-31 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-3409:
--

Assignee: Michael McCandless




[jira] [Commented] (LUCENE-3408) Remove unnecessary memory barriers in DWPT

2011-08-31 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094406#comment-13094406
 ] 

Michael McCandless commented on LUCENE-3408:


Looks good Simon!  Have you tested perf...?  Likely minor but you never know :)

 Remove unnecessary memory barriers in DWPT
 --

 Key: LUCENE-3408
 URL: https://issues.apache.org/jira/browse/LUCENE-3408
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3408.patch


 Currently DWPT still uses AtomicLong to count the bytesUsed. Each write 
 access issues an implicit memory barrier, which is totally unnecessary since 
 we are doing everything single-threaded at that level. This might be very minor, 
 but we shouldn't issue unnecessary memory barriers that cause processors to lock 
 their instruction pipeline for no reason.
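
 The trade-off described above can be sketched roughly as follows. The class 
 and method names are hypothetical and are not taken from the attached patch; 
 the sketch only contrasts an AtomicLong counter with a plain long that is 
 safe only when a single thread touches it.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: BytesUsedSketch is an illustrative name, not a Lucene class.
final class BytesUsedSketch {
    // Safe to update from many threads; every addAndGet is an atomic
    // read-modify-write with volatile (memory barrier) semantics.
    private final AtomicLong sharedBytesUsed = new AtomicLong();

    // Plain field: no barrier on update. Only correct if exactly one
    // thread reads and writes it (thread confinement).
    private long confinedBytesUsed;

    long addShared(long delta) {
        return sharedBytesUsed.addAndGet(delta);
    }

    long addConfined(long delta) {
        confinedBytesUsed += delta;
        return confinedBytesUsed;
    }

    long shared() { return sharedBytesUsed.get(); }

    long confined() { return confinedBytesUsed; }
}
```

 Both counters produce the same totals when used from one thread; the plain 
 field simply skips the synchronization cost, which is the point of the issue.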




[jira] [Commented] (SOLR-2723) If you don't choose a shard name for a SolrCore, the system should auto assign shard names.

2011-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094407#comment-13094407
 ] 

Jan Høydahl commented on SOLR-2723:
---

Hmm, should startup sequence determine role? Sceptical, but if this is on first 
boot only, and only if not choosing a shard name, perhaps...

 If you don't choose a shard name for a SolrCore, the system should auto 
 assign shard names.
 ---

 Key: SOLR-2723
 URL: https://issues.apache.org/jira/browse/SOLR-2723
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Mark Miller
 Fix For: 4.0


 When you first boot up a node with the collection files to use, you might 
 also pass how many slices you want - if you choose 3 slices, the first 3 
 nodes that come up would each go to a different slice and get a unique shard 
 name - further nodes that come up would be replicas in each slice and get one 
 of the 3 shard names.
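
 One possible reading of the assignment rule above, as a minimal sketch. The 
 class name, the "shardN" naming scheme, and the round-robin placement of 
 replicas are all assumptions made for illustration; the issue does not 
 specify how later nodes are distributed across slices.

```java
// Hypothetical sketch: ShardAssignerSketch is not a Solr class, and
// round-robin replica placement is an assumption.
final class ShardAssignerSketch {
    private final int numSlices;
    private int nodesSeen;

    ShardAssignerSketch(int numSlices) {
        if (numSlices < 1) {
            throw new IllegalArgumentException("numSlices must be >= 1");
        }
        this.numSlices = numSlices;
    }

    /** Returns the shard name for the next node that comes up. */
    synchronized String assignNext() {
        // The first numSlices nodes each get a new shard name;
        // later nodes wrap around and become replicas.
        int slice = nodesSeen % numSlices;
        nodesSeen++;
        return "shard" + (slice + 1);
    }
}
```

 With 3 slices, the first three calls return shard1, shard2, shard3, and the 
 fourth node becomes a replica in shard1.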




[jira] [Commented] (LUCENE-3408) Remove unnecessary memory barriers in DWPT

2011-08-31 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094409#comment-13094409
 ] 

Simon Willnauer commented on LUCENE-3408:
-

No, I haven't tested perf yet; I think I will just wait for the nightly 
benchmark here.

I plan to commit this soon.




[jira] [Updated] (LUCENE-3409) NRT reader/writer over RAMDirectory memory leak

2011-08-31 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3409:
---

Affects Version/s: 4.0
Fix Version/s: 4.0
   3.4

I found the issue: we are failing to drop pool'd readers in IW.deleteAll.  I'll 
commit fix shortly.




Re: svn commit: r1163568 - in /lucene/dev/trunk/lucene: CHANGES.txt src/java/org/apache/lucene/index/IndexFileDeleter.java src/java/org/apache/lucene/index/IndexWriter.java src/test/org/apache/lucene/

2011-08-31 Thread Simon Willnauer
On Wed, Aug 31, 2011 at 12:36 PM,  mikemcc...@apache.org wrote:
 Author: mikemccand
 Date: Wed Aug 31 10:36:36 2011
 New Revision: 1163568

 URL: http://svn.apache.org/viewvc?rev=1163568&view=rev
 Log:
 LUCENE-3409: drop reader pool from IW.deleteAll

 Modified:
    lucene/dev/trunk/lucene/CHANGES.txt
    
 lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexFileDeleter.java
    lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java
    
 lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriter.java

 Modified: lucene/dev/trunk/lucene/CHANGES.txt
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/CHANGES.txt?rev=1163568&r1=1163567&r2=1163568&view=diff
 ==
 --- lucene/dev/trunk/lucene/CHANGES.txt (original)
 +++ lucene/dev/trunk/lucene/CHANGES.txt Wed Aug 31 10:36:36 2011
 @@ -577,6 +577,10 @@ Bug fixes
   throw NoSuchDirectoryException when all files written so far have been
   written to one directory, but the other still has not yet been created on 
 the
   filesystem.  (Robert Muir)
 +
 +* LUCENE-3409: IndexWriter.deleteAll was failing to close pooled NRT
 +  SegmentReaders, leading to unused files accumulating in the
 +  Directory.  (tal steier via Mike McCandless)

  New Features


 Modified: 
 lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexFileDeleter.java
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexFileDeleter.java?rev=1163568&r1=1163567&r2=1163568&view=diff
 ==
 --- 
 lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexFileDeleter.java
  (original)
 +++ 
 lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexFileDeleter.java
  Wed Aug 31 10:36:36 2011
 @@ -374,6 +374,10 @@ final class IndexFileDeleter {
   }

   public void refresh() throws IOException {
 +    // Set to null so that we regenerate the list of pending
 +    // files; else we can accumulate same file more than
 +    // once
 +    deletable = null;
     refresh(null);
   }


 Modified: 
 lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java?rev=1163568&r1=1163567&r2=1163568&view=diff
 ==
 --- lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
 (original)
 +++ lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
 Wed Aug 31 10:36:36 2011
 @@ -600,6 +600,23 @@ public class IndexWriter implements Clos
       drop(info, IOContext.Context.MERGE);
     }

 +    public synchronized void dropAll() throws IOException {
 +      Iterator<Map.Entry<SegmentCacheKey,SegmentReader>> iter = 
 readerMap.entrySet().iterator();
 +      while (iter.hasNext()) {
 +
 +        final Map.Entry<SegmentCacheKey,SegmentReader> ent = iter.next();
 +
 +        SegmentReader sr = ent.getValue();
 +        sr.hasChanges = false;
 +        iter.remove();
 +
 +        // NOTE: it is allowed that this decRef does not
 +        // actually close the SR; this can happen when a
 +        // near real-time reader using this SR is still open
 +        sr.decRef();
 +      }
 +    }
 +

just being a little picky here, can't we simply iterate over
readerMap.values() and call readerMap.clear() afterwards, like
<snip>
for (SegmentReader sr : readerMap.values()) {
  sr.hasChanges = false;
  sr.decRef();
}
readerMap.clear();
</snip>

the iter.remove() call does a key lookup each time, which is totally
unnecessary (well, this is super minor!), but the loop above looks more
readable, it's less code, and slightly more efficient?
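
For readers following along, here is a minimal, self-contained sketch of the
values()/clear() variant suggested above. PooledReader and ReaderPoolSketch
are hypothetical stand-ins for SegmentReader and the reader pool inside
IndexWriter; they are assumptions made for illustration and are not the
committed Lucene code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a ref-counted SegmentReader; not Lucene's class.
final class PooledReader {
  boolean hasChanges = true;
  private int refCount = 1;

  // Release one reference; the reader counts as closed when none remain.
  void decRef() {
    if (refCount <= 0) {
      throw new IllegalStateException("reader already closed");
    }
    refCount--;
  }

  boolean isClosed() {
    return refCount == 0;
  }
}

// Hypothetical stand-in for the reader pool; keys are plain segment names here.
final class ReaderPoolSketch {
  private final Map<String, PooledReader> readerMap = new HashMap<>();

  synchronized void put(String segmentName, PooledReader reader) {
    readerMap.put(segmentName, reader);
  }

  synchronized int size() {
    return readerMap.size();
  }

  // Discard pending changes, release every pooled reader, then empty the map once.
  synchronized void dropAll() {
    for (PooledReader sr : readerMap.values()) {
      sr.hasChanges = false;
      sr.decRef();
    }
    readerMap.clear();
  }
}
```

One difference worth keeping in mind: if decRef() throws part-way through,
this variant leaves every entry in the map, while the iterator-based loop has
already removed the entries it visited. Whether that matters depends on how
the caller recovers from the exception.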

simon

     public synchronized void drop(SegmentInfo info, IOContext.Context 
 context) throws IOException {
       final SegmentReader sr;
       if ((sr = readerMap.remove(new SegmentCacheKey(info, context))) != 
 null) {
 @@ -2141,7 +2158,7 @@ public class IndexWriter implements Clos
       deleter.refresh();

       // Don't bother saving any changes in our segmentInfos
 -      readerPool.clear(null);
 +      readerPool.dropAll();

       // Mark that the index has changed
       ++changeCount;
 @@ -3698,7 +3715,6 @@ public class IndexWriter implements Clos

             synchronized(this) {
               deleter.deleteFile(compoundFileName);
 -
               deleter.deleteFile(IndexFileNames.segmentFileName(mergedName, 
 "", IndexFileNames.COMPOUND_FILE_ENTRIES_EXTENSION));
               deleter.deleteNewFiles(merge.info.files());
             }

 Modified: 
 lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriter.java
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriter.java?rev=1163568&r1=1163567&r2=1163568&view=diff
 

[jira] [Resolved] (LUCENE-3408) Remove unnecessary memory barriers in DWPT

2011-08-31 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3408.
-

Resolution: Fixed

 Remove unnecessary memory barriers in DWPT
 --

 Key: LUCENE-3408
 URL: https://issues.apache.org/jira/browse/LUCENE-3408
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3408.patch


 Currently DWPT still uses AtomicLong to count the bytesUsed. Each write 
 access issues an implicit memory barrier, which is totally unnecessary since 
 we are doing everything single-threaded on that level. This might be very minor, 
 but we shouldn't issue unnecessary memory barriers causing processors to lock 
 their instruction pipeline for no reason.
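
 The trade-off described above can be sketched as follows. This is an
 illustration only, assuming the counter is touched solely by the thread that
 owns it; BytesUsedSketch and its method names are hypothetical and are not
 taken from the attached patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters; the names are illustrative and not from the patch.
final class BytesUsedSketch {
  // Thread-safe counter: every update is an atomic read-modify-write.
  private final AtomicLong sharedBytesUsed = new AtomicLong();

  // Plain field: correct only while a single thread owns this object.
  private long localBytesUsed;

  void addShared(long delta) {
    sharedBytesUsed.addAndGet(delta);
  }

  void addLocal(long delta) {
    localBytesUsed += delta;
  }

  long shared() {
    return sharedBytesUsed.get();
  }

  long local() {
    return localBytesUsed;
  }
}
```

 Both counters end up with the same value when used from one thread. The plain
 field only stays correct if no other thread reads or writes it without some
 other form of synchronization.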




[jira] [Closed] (SOLR-2694) LogUpdateProcessor not thread safe

2011-08-31 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl closed SOLR-2694.
-

Resolution: Cannot Reproduce

Closing this for now - could reopen if necessary...

 LogUpdateProcessor not thread safe
 --

 Key: SOLR-2694
 URL: https://issues.apache.org/jira/browse/SOLR-2694
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.4.1, 3.1, 3.2, 3.3, 4.0
Reporter: Jan Høydahl

 Using the LogUpdateProcessor while feeding in multiple parallel threads does 
 not work, as LogUpdateProcessor is not thread-safe.




Re: svn commit: r1163568 - in /lucene/dev/trunk/lucene: CHANGES.txt src/java/org/apache/lucene/index/IndexFileDeleter.java src/java/org/apache/lucene/index/IndexWriter.java src/test/org/apache/lucene/

2011-08-31 Thread Michael McCandless
Good idea, I'll fix!

Mike McCandless

http://blog.mikemccandless.com


Re: new AutomatonQuery(RunAutomaton) ?

2011-08-31 Thread Robert Muir
On Wed, Aug 31, 2011 at 3:51 AM, eks dev eks...@yahoo.co.uk wrote:
 At the moment it is not possible (?) to construct AutomatonQuery with
 RunAutomaton.
 Would it make sense to add this possibility? Is it doable at all?

It's not doable: we need more information than the RunAutomaton provides; on its own it's not enough.

-- 
lucidimagination.com




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094454#comment-13094454
 ] 

Robert Muir commented on LUCENE-3130:
-

{quote}
Let's get back to the original issue: we need some way to let the original 
form of a term have higher weight than the alternative forms generated by 
analysis (whether those are synonyms, stems, lowercase or what have you).
{quote}

I'm not sure we do! see my last response. I think 2 fields is just fine.

As for things like synonyms, these already set TypeAttribute. So if your 
consumer wants to do something on synonyms, look for type = SYNONYM or 
whatever it already sets.
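
A rough sketch of what such a consumer could look like, written in plain Java
rather than against the analysis API. TypedToken, SynonymBoostSketch and the
"SYNONYM" label are assumptions for illustration; check which type string your
filter actually sets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token holder: a term plus the type string assigned during analysis.
final class TypedToken {
  final String term;
  final String type;

  TypedToken(String term, String type) {
    this.term = term;
    this.type = type;
  }
}

final class SynonymBoostSketch {
  // Assumed type label for injected synonyms; verify against your own filter.
  static final String SYNONYM_TYPE = "SYNONYM";

  // Builds "term^boost" clauses, down-weighting tokens whose type marks them as synonyms.
  static List<String> toClauses(List<TypedToken> tokens, float synonymBoost) {
    List<String> clauses = new ArrayList<>();
    for (TypedToken token : tokens) {
      float boost = SYNONYM_TYPE.equals(token.type) ? synonymBoost : 1.0f;
      clauses.add(token.term + "^" + boost);
    }
    return clauses;
  }
}
```

The point of the sketch is only that the decision can be driven by the type
already attached to each token, without introducing a separate boost
attribute.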

 Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should 
 give lower boosts
 ---

 Key: LUCENE-3130
 URL: https://issues.apache.org/jira/browse/LUCENE-3130
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Hoss Man

 A recent thread asked if there was any way to use QueryTime synonyms such that 
 matches on the original term specified by the user would score higher than 
 matches on the synonym.  It occurred to me later that a float Attribute could 
 be set by the SynonymFilter in such situations, and QueryParser could use 
 that float as a boost in the resulting Query.  (This would be fairly 
 straightforward for the simple synonyms => BooleanQuery case, but we'd have 
 to decide how to handle the case of synonyms with multiple terms that produce 
 MTPQ, possibly just punt for now.)
 Likewise, there may be other TokenFilters that inject artificial tokens at 
 query time where it also might make sense to have a reduced boost factor...
 * SynonymFilter
 * CommonGramsFilter
 * WordDelimiterFilter
 * etc...
 In all of these cases, the amount of the boost could be configured, and for 
 back compat could default to 1.0 (or null to not set a boost at all).
 Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied 
 the boost attribute into the payload attribute, these same filters (when 
 used at index time) could give penalizing payloads to terms.




[jira] [Resolved] (LUCENE-3409) NRT reader/writer over RAMDirectory memory leak

2011-08-31 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3409.


Resolution: Fixed

Thanks tal!

 NRT reader/writer over RAMDirectory memory leak
 ---

 Key: LUCENE-3409
 URL: https://issues.apache.org/jira/browse/LUCENE-3409
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.0.2, 3.3, 4.0
Reporter: tal steier
Assignee: Michael McCandless
 Fix For: 3.4, 4.0


 with NRT reader/writer, emptying an index using:
 writer.deleteAll()
 writer.commit()
 doesn't release all allocated memory.
 for example the following code will generate a memory leak:
 /**
  * Reveals a memory leak in NRT reader/writer<br>
  * 
  * The following main() does 10K cycles of:
  * <ul>
  * <li>Add 10K empty documents to index writer</li>
  * <li>commit()</li>
  * <li>open NRT reader over the writer, and immediately close it</li>
  * <li>delete all documents from the writer</li>
  * <li>commit changes to the writer</li>
  * </ul>
  * 
  * Running with -Xmx256M results in an OOME after ~2600 cycles
  */
 public static void main(String[] args) throws Exception {
   RAMDirectory d = new RAMDirectory();
   IndexWriter w = new IndexWriter(d, new 
 IndexWriterConfig(Version.LUCENE_33, new KeywordAnalyzer()));
   Document doc = new Document();

   for (int i = 0; i < 10000; i++) {
     for (int j = 0; j < 10000; ++j) {
       w.addDocument(doc);
     }
     w.commit();
     IndexReader.open(w, true).close();
     w.deleteAll();
     w.commit();
   }

   w.close();
   d.close();
 }




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094462#comment-13094462
 ] 

Robert Muir commented on LUCENE-2308:
-

{quote}
so I haven't seen a single technical argument against a builder here. I 
personally think that a builder has many advantages:
{quote}

I gave one already: it creates too many objects. It also adds complexity to the 
API.

Just because a constructor has a couple of parameters does *NOT* mean a builder 
fits. In situations like this one, it's a bad choice.


 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things like
 indexed or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094466#comment-13094466
 ] 

Chris Male commented on LUCENE-2308:


How does it create too many objects?




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094477#comment-13094477
 ] 

Robert Muir commented on LUCENE-2308:
-

we have to realize, most people indexing with lucene do it like this:

{noformat}
while(...) {
  Document doc = new Document(...);
  Field field1 = new Field(...);
  Field field2 = new Field(...);
}
{noformat}

So for MOST people FT is increasing the number of objects being created 
per-document (most people will create a new one for every field). I think we 
should keep that at a minimum.

Adding a builder on top will at minimum require an additional object for the 
builder itself *AND*:
* creation of a new intermediate throw-away FieldType with *each* .set() OR
* creation of an additional mutable object used internally by the builder which 
will require keeping in sync with the immutable form.
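
The two options listed above can be sketched side by side. FieldTypeSketch,
its three flags and the instancesCreated counter are hypothetical and exist
only to make the allocation counts visible; this is not the FieldType API
under discussion.

```java
// Hypothetical immutable field type; the names are illustrative, not Lucene's API.
final class FieldTypeSketch {
  // Allocation counter used only for this demonstration.
  static int instancesCreated = 0;

  final boolean indexed;
  final boolean stored;
  final boolean storeTermVectors;

  FieldTypeSketch(boolean indexed, boolean stored, boolean storeTermVectors) {
    this.indexed = indexed;
    this.stored = stored;
    this.storeTermVectors = storeTermVectors;
    instancesCreated++;
  }

  // Option 1: every "setter" returns a fresh immutable copy (one throw-away object per call).
  FieldTypeSketch withIndexed(boolean value) {
    return new FieldTypeSketch(value, stored, storeTermVectors);
  }

  FieldTypeSketch withStored(boolean value) {
    return new FieldTypeSketch(indexed, value, storeTermVectors);
  }

  FieldTypeSketch withTermVectors(boolean value) {
    return new FieldTypeSketch(indexed, stored, value);
  }

  // Option 2: a mutable builder mirrors the fields and allocates the immutable form once.
  static final class Builder {
    private boolean indexed;
    private boolean stored;
    private boolean storeTermVectors;

    Builder indexed(boolean value) {
      this.indexed = value;
      return this;
    }

    Builder stored(boolean value) {
      this.stored = value;
      return this;
    }

    Builder termVectors(boolean value) {
      this.storeTermVectors = value;
      return this;
    }

    FieldTypeSketch build() {
      return new FieldTypeSketch(indexed, stored, storeTermVectors);
    }
  }
}
```

With three settings, option 1 allocates four FieldTypeSketch instances and
option 2 allocates one plus the builder, while a plain constructor call
allocates exactly one object. If the type is created once outside the indexing
loop and reused, any of these costs is paid once; if it is created per
document, the difference is multiplied by the number of documents.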





[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094476#comment-13094476
 ] 

Uwe Schindler commented on LUCENE-2308:
---

bq. How does it create too many objects?

That's an implementation detail. If you want final unmodifiable objects, every 
builder call will produce a new one as its return value (see ScorerContext).

In general the builder pattern can also change existing objects, like 
StringBuilder does.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094478#comment-13094478
 ] 

Robert Muir commented on LUCENE-2308:
-

{quote}
In general the builder pattern can also change existing objects, like 
StringBuilder does
{quote}

And that's another bug in the visitor anti-pattern: if you want to have a 
resulting immutable form, that's going to require either an object-creation orgy 
or massive code duplication so that it can store an internal mutable form.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094491#comment-13094491
 ] 

Simon Willnauer commented on LUCENE-2308:
-

bq. Second time this morning you didn't even read what I said.
I did, but apparently we are talking about different things? The entire purpose 
of FT is that you don't have to create it multiple times, so folks can create a 
Field each time but they should reuse the FT, no? I personally am talking about 
creating FT using a builder, but what Uwe says is that we can also do that for 
Field. 

Again, how do you create way more objects when you use a builder than when you 
use the ctor?

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094492#comment-13094492
 ] 

Chris Male commented on LUCENE-2308:


I'm confused about what the reuse of Field objects has to do with this? That 
seems a corollary issue.  Aren't we talking about reducing the cost of creating 
FieldType instances? Which as Simon said, are then shared.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094496#comment-13094496
 ] 

Robert Muir commented on LUCENE-2308:
-

It's shared only if the person reuses them explicitly, but if they aren't 
reusing fields (as most people don't), then they aren't likely to reuse 
fieldtypes either.

In general, I think we shouldn't create so many objects or add so much 
complexity to the indexing loop. 

Personally I just don't think that in practice people are going to set things 
up so that they actually reuse fieldtype.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094499#comment-13094499
 ] 

Michael McCandless commented on LUCENE-2308:


bq. Change FieldType to an interface inside index.* and use it for the source 
of properties about an IndexableField. 

+1, I think we should have an oal.index.FieldType interface, that
exposes (get-only) methods.  Ie, we'd just move the getters out of
IndexableField into this new FT interface (likewise for
StorableField).

This interface should be marked as experimental, ie, we are free to
change it.

bq. Add a builder for FieldType to document.* which will create FieldType 
instances.

I don't think we should use a builder API here; I think either
big-ctor-takes-all-settings and so all fields are final, or what we
have today (.freeze()) is better.

There are two things I don't like about the builder pattern: setter
chaining and the object overhead of hard immutability.

On setter chaining:

  * It's two ways to do the same thing (chaining or not); generally an
API (and a PL) should offer one (obvious) way to do things.
Suddenly we'll see tutorials and articles etc. online, some with
chaining, some without, and some mixed.

  * Code is less readable w/ chaining: it makes it easy to sneak in
multiple statements per line, embed them into other statements,
etc., vs unchained where you always have one statement per line

  * I don't like .indexed() as a name; I prefer .setIndexed() so it's
clear you're setting something about the object.

  * It encourages inefficient code, because it's easy to inline new
X().this().that() when in fact the app really should create and
reuse FieldType up front.  This is trappy -- the app doesn't
realize they're creating N+1 objects.

I also don't like the hard immutability (every field is final so every
setter returns a new object) since this will mean the typical use is
creating tons of objects per field per doc.  Yes we can have a mutable
builder with a .build() in the end but that's making the API even more
cumbersome.
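
To make the object-count concern concrete, here is a minimal sketch of "hard immutability". It assumes a hypothetical ImmutableFieldType class (not Lucene's actual FieldType) whose chained methods each return a fresh copy; the instancesCreated counter exists only to show how many objects a single chain allocates.

```java
// Hypothetical sketch, not Lucene's FieldType: every field is final, so each
// chained "setter" has to return a new instance.
final class ImmutableFieldType {
  // Allocation counter, present only for illustration.
  static int instancesCreated = 0;

  final boolean indexed;
  final boolean stored;
  final boolean storeTermVectors;

  ImmutableFieldType() {
    this(false, false, false);
  }

  private ImmutableFieldType(boolean indexed, boolean stored, boolean storeTermVectors) {
    this.indexed = indexed;
    this.stored = stored;
    this.storeTermVectors = storeTermVectors;
    instancesCreated++;
  }

  // Each call copies the current state into a new object.
  ImmutableFieldType indexed() {
    return new ImmutableFieldType(true, stored, storeTermVectors);
  }

  ImmutableFieldType stored() {
    return new ImmutableFieldType(indexed, true, storeTermVectors);
  }

  ImmutableFieldType storeTermVectors() {
    return new ImmutableFieldType(indexed, stored, true);
  }
}

class ImmutableFieldTypeDemo {
  public static void main(String[] args) {
    // Three chained calls on top of the initial instance.
    ImmutableFieldType ft = new ImmutableFieldType().indexed().stored().storeTermVectors();
    System.out.println("indexed=" + ft.indexed + ", instances=" + ImmutableFieldType.instancesCreated);
  }
}
```

In this sketch a chain of N calls allocates N+1 instances. That cost is paid once if the type is built up front and reused, and once per field per document if the chain is inlined in the indexing loop.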

In contrast, the soft immutability we have now (freeze) is very
effective, and creates no additional objects: it will prevent you from
altering a FT instance once any Field uses it.  Really the
immutability is a minor detail of the implementation here; we only
need it to prevent this trap.
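
The setter-plus-freeze idea can be sketched as follows. This is only an illustration under assumptions: FreezableFieldType and SketchField are hypothetical names, and the choice to freeze inside the field constructor is one possible reading of "once any Field uses it", not a description of the actual patch.

```java
// Hypothetical sketch of "soft immutability": plain setters, plus a freeze()
// that makes every later setter call fail.
class FreezableFieldType {
  private boolean indexed;
  private boolean stored;
  private boolean frozen;

  void setIndexed(boolean value) {
    checkIfFrozen();
    indexed = value;
  }

  void setStored(boolean value) {
    checkIfFrozen();
    stored = value;
  }

  boolean indexed() {
    return indexed;
  }

  boolean stored() {
    return stored;
  }

  // No copy is made; the same instance simply becomes read-only.
  void freeze() {
    frozen = true;
  }

  private void checkIfFrozen() {
    if (frozen) {
      throw new IllegalStateException("this FieldType is frozen and cannot be changed");
    }
  }
}

// Hypothetical field class: using a type freezes it (an assumption of this sketch).
class SketchField {
  final String name;
  final String value;
  final FreezableFieldType type;

  SketchField(String name, String value, FreezableFieldType type) {
    type.freeze();
    this.name = name;
    this.value = value;
    this.type = type;
  }
}

class FreezeDemo {
  public static void main(String[] args) {
    FreezableFieldType type = new FreezableFieldType();
    type.setIndexed(true);
    new SketchField("title", "first doc", type);
    new SketchField("title", "second doc", type); // the same type instance is reused
    try {
      type.setStored(true);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Only one type object exists however many fields share it, and the late setStored call is rejected instead of silently changing fields that were already created.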

Generally we should try to keep Lucene's core APIs as
plain/simple/straightforward as possible.  Someone can always later
layer a builder API on top of the simpler setter+freeze or
all-properties-to-ctor API, but not vice versa (efficiently anyway).



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094502#comment-13094502
 ] 

Chris Male commented on LUCENE-2308:


Didn't Simon suggest we add the big-ctor version to core?

{quote}
why don't we put plain old immutable java objects with a single ctor into core 
and add a builder API into modules / sandbox? 
{quote}

So yes, Lucene's core can stay lean and mean, but we can have the builder in 
userland / module / sandbox




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094505#comment-13094505
 ] 

Jan Høydahl commented on LUCENE-3130:
-

Robert, two fields work great for supporting stuff like phonetic and 
stem/non-stem search, and also lower/exact-case search, although index size 
could be lower with a one-field approach. Let's let those use cases rest for now.

But for the synonym case, what remains is to modify the QueryParser to act on 
the already-present TypeAttribute, is that so? If so, let's open another issue 
for that.


 Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should 
 give lower boosts
 ---

 Key: LUCENE-3130
 URL: https://issues.apache.org/jira/browse/LUCENE-3130
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Hoss Man

 A recent thread asked if there was any way to use query-time synonyms such that 
 matches on the original term specified by the user would score higher than 
 matches on the synonym.  It occurred to me later that a float Attribute could 
 be set by the SynonymFilter in such situations, and QueryParser could use 
 that float as a boost in the resulting Query.  (This would be fairly 
 straightforward for the simple synonyms => BooleanQuery case, but we'd have 
 to decide how to handle the case of synonyms with multiple terms that produce 
 MTPQ, possibly just punt for now)
 Likewise, there may be other TokenFilters that inject artificial tokens at 
 query time where it also might make sense to have a reduced boost factor...
 * SynonymFilter
 * CommonGramsFilter
 * WordDelimiterFilter
 * etc...
 In all of these cases, the amount of the boost could be configured, and for 
 back compat could default to 1.0 (or null to not set a boost at all)
 Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied 
 the boost attribute into the payload attribute, these same filters (when 
 used at index time) could give penalizing payloads to terms.




[jira] [Issue Comment Edited] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094502#comment-13094502
 ] 

Chris Male edited comment on LUCENE-2308 at 8/31/11 1:11 PM:
-

Didn't Simon suggest we add the big-ctor version to core?

{quote}
why don't we put plain old immutable java objects with a single ctor into core 
and add a builder API into modules / sandbox? 
{quote}

So yes, Lucene's core can stay lean and mean, but we can have the builder in 
userland / module / sandbox




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094514#comment-13094514
 ] 

Uwe Schindler commented on LUCENE-2308:
---

I am on the opposite side:

In general the constructor of the immutable class is hidden (package-private or 
private depending on class hierarchy). So nobody can use it. The only API the 
user sees is the builder pattern. By that we only have *one* API and one usage 
type.

Builder patterns can be formatted very nicely, and it does not matter if people do:

{code:java}
Field.Builder b = new Field.Builder();
b.setFoo();
b.setBar();
Field f = b.build();
{code}

versus

{code:java}
Field f = new Field.Builder()
 .setFoo()
 .setBar()
 .build();
{code}

The last, chained one is even more readable, and that is why *I* prefer 
builders. A so-called telescoping constructor is the antipattern because it's 
completely unreadable, as Java lacks named parameters (the best example is 
WordDelimiterFilter, that one is horrible - a typical candidate for a 
WordDelimiterFilter.Builder subclass). For stack-based machines like the JVM 
and the x86 processors, the chaining code is also more natural than the first 
one. The return value of the previous call already resides on the stack after 
the method returns, but instead of popping it and pushing it again, it can stay 
there and you simply add the parameters of the next method call. This also leads 
to very elegant bytecode, for which hotspot has optimizations :-)

About code duplication: In the hidden ctor of the immutable class you can make 
a clone of the builder and keep it somewhere private final inside the instance. 
This one then holds the unmodifiable instance state.

About number of objects (yes, we have the builder object and possibly a clone 
of it as suggested before and finally the immutable object): The number of 
objects is really a non-issue here, as all of those will be created in the Eden 
space and disappear as soon as the loop/method exits. You can try autoboxing 
with a recent JavaVM - you would in most cases see no slowdown caused by 
autoboxing. These are problems from pre-2000 when we had Java 1.1.

Uwe
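
A minimal sketch of the hidden-constructor builder described above, under stated assumptions: BuiltFieldType, its two boolean properties, and the setter names are hypothetical and chosen only for illustration; they are not taken from any patch on this issue.

```java
// Hypothetical sketch: the immutable class hides its constructor, so the
// nested Builder is the only way to obtain an instance.
final class BuiltFieldType {
  private final boolean indexed;
  private final boolean stored;

  // Private: only the nested Builder can call this.
  private BuiltFieldType(Builder builder) {
    this.indexed = builder.indexed;
    this.stored = builder.stored;
  }

  boolean indexed() {
    return indexed;
  }

  boolean stored() {
    return stored;
  }

  static final class Builder {
    private boolean indexed;
    private boolean stored;

    // Setters return the builder itself, so chained and unchained use both work.
    Builder setIndexed(boolean value) {
      this.indexed = value;
      return this;
    }

    Builder setStored(boolean value) {
      this.stored = value;
      return this;
    }

    BuiltFieldType build() {
      return new BuiltFieldType(this);
    }
  }
}

class BuilderDemo {
  public static void main(String[] args) {
    // Chained style.
    BuiltFieldType chained = new BuiltFieldType.Builder()
        .setIndexed(true)
        .setStored(true)
        .build();

    // Unchained style; the resulting state is identical.
    BuiltFieldType.Builder b = new BuiltFieldType.Builder();
    b.setIndexed(true);
    b.setStored(true);
    BuiltFieldType unchained = b.build();

    System.out.println(chained.indexed() == unchained.indexed()
        && chained.stored() == unchained.stored());
  }
}
```

Here the mutable builder is one extra short-lived object per built type; the built instance itself never changes, so it can be shared across fields without any freeze step.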




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094517#comment-13094517
 ] 

Simon Willnauer commented on LUCENE-2308:
-

awesome writeup uwe! thank you!




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094522#comment-13094522
 ] 

Robert Muir commented on LUCENE-2308:
-

{quote}
In general the constructor of the immutable class is hidden (package-private or 
private depending on class hierarchy). So nobody can use it. The only API the 
user sees is the builder pattern.
{quote}

I am strongly against this: there is no reason to do this. We should instead 
expose the constructor of the immutable class so that people who want builders 
can use them, but if I don't want builders, I shouldn't have to use them. There 
is no reason for this.

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from dicussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalzyerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FIeldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...
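 A minimal sketch of the proposal above, assuming nothing about the eventual
 Lucene API: the class names SketchFieldType and SketchField and their members
 are hypothetical and only illustrate one type instance being shared by
 several fields while each field keeps its own value.

```java
// Hypothetical sketch: these classes are illustrative and are not Lucene's API.
final class SketchFieldType {
    final boolean indexed;
    final boolean stored;
    final boolean analyzed;
    final boolean omitNorms;

    SketchFieldType(boolean indexed, boolean stored, boolean analyzed, boolean omitNorms) {
        this.indexed = indexed;
        this.stored = stored;
        this.analyzed = analyzed;
        this.omitNorms = omitNorms;
    }
}

// The field holds the actual value; the per-field settings live in the shared type.
final class SketchField {
    final String name;
    final String value;
    final SketchFieldType type;

    SketchField(String name, String value, SketchFieldType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

final class FieldTypeReuseDemo {
    public static void main(String[] args) {
        // One type instance, configured once and re-used across fields.
        SketchFieldType text = new SketchFieldType(true, false, true, false);
        SketchField title = new SketchField("title", "first value", text);
        SketchField body = new SketchField("body", "second value", text);
        System.out.println(title.type == body.type); // prints "true"
    }
}
```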

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094526#comment-13094526
 ] 

Uwe Schindler commented on LUCENE-2308:
---

If we release the code with the builder pattern then there is only one 
possibility and one example code in the class description. If somebody does not 
like the builder pattern, who cares? If there is nothing else, you have to use 
it. PERIOD.




[jira] [Issue Comment Edited] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094526#comment-13094526
 ] 

Uwe Schindler edited comment on LUCENE-2308 at 8/31/11 1:41 PM:


If we release the code with the builder pattern then there is only one 
possibility and one example code in the class description. If somebody does not 
like the builder pattern, who cares? If there is nothing else, you have to use 
it. PERIOD.

bq. About code duplication: You can in the hidden ctor of the immutable class 
make a clone of the builder and keep it somewhere private final inside the 
instance. This one then holds the unmodifiable instance state.

I already explained: The code duplication comes from the two ways to do it.

Of course for lovers of unreadable telescoping methods we can still add some 
conventional factory methods, taking tons of parameters, but internally use the 
builder. The user would not see the builder.

  was (Author: thetaphi):
If we release the code with the builder pattern then there is only one 
possibility and one example code in the class description. If somebody does not 
like the builder pattern, who cares? If there is nothing else, you have to use 
it. PERIOD.
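
The builder idea discussed in this comment could look roughly like the
following sketch. Everything here is an assumption made for illustration:
ImmutableFieldType, its Builder, and the method names are hypothetical and are
not taken from any patch on this issue. The hidden constructor copies the
builder state into final fields (a simplification of keeping a private clone
of the builder), and the create(...) factory method shows a conventional
many-parameter entry point that uses the builder internally.

```java
// Hypothetical sketch: not the API from any LUCENE-2308 patch.
final class ImmutableFieldType {
    private final boolean indexed;
    private final boolean stored;
    private final boolean termVectors;

    // Hidden constructor: the only way in is through the builder.
    private ImmutableFieldType(Builder b) {
        this.indexed = b.indexed;
        this.stored = b.stored;
        this.termVectors = b.termVectors;
    }

    boolean indexed() { return indexed; }
    boolean stored() { return stored; }
    boolean termVectors() { return termVectors; }

    static Builder builder() { return new Builder(); }

    // Conventional factory method; the caller never sees the builder.
    static ImmutableFieldType create(boolean indexed, boolean stored, boolean termVectors) {
        Builder b = new Builder();
        if (indexed) b.indexed();
        if (stored) b.stored();
        if (termVectors) b.storeTermVectors();
        return b.build();
    }

    static final class Builder {
        private boolean indexed;
        private boolean stored;
        private boolean termVectors;

        Builder indexed() { indexed = true; return this; }
        Builder stored() { stored = true; return this; }
        Builder storeTermVectors() { termVectors = true; return this; }

        ImmutableFieldType build() { return new ImmutableFieldType(this); }
    }
}
```

With this shape, ImmutableFieldType.builder().indexed().storeTermVectors().build()
and ImmutableFieldType.create(true, false, true) produce equivalent instances,
so there is a single implementation behind both entry points.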
  



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094527#comment-13094527
 ] 

Robert Muir commented on LUCENE-2308:
-

I care, that's why I am -1 to the builder pattern. The pro-builders on this 
issue just silently argue that my concerns don't matter.

Mike gave his opinion on it too.

Stating that our concerns are meaningless is not the way to create consensus 
towards a good solution here.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094529#comment-13094529
 ] 

Chris Male commented on LUCENE-2308:


{quote}
The pro-builders on this issue just silently argue that my concerns don't 
matter.
{quote}

I resent that.  I've actively tried to understand your concerns and reach a 
compromise and consensus.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094531#comment-13094531
 ] 

Robert Muir commented on LUCENE-2308:
-

{quote}
So yes, Lucene's core can stay lean and mean, but we can have the builder in 
userland / module / sandbox
{quote}

Chris, personally I think this is a reasonable solution, but my arguments are 
instead against the other ridiculous statements on the issue implying that my 
concerns do not matter.

The original idea for using a simple java object was just this, so that people 
can do whatever they want (builders, whatever). But there is no reason to 
enforce any specific anti-pattern here, when we can just leave that to the 
application.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094532#comment-13094532
 ] 

Yonik Seeley commented on LUCENE-2308:
--

bq. I think we should have an oal.index.FieldType interface, that exposes 
(get-only) methods.

+1

I also don't see a lot of value in jumping through too many hoops trying to 
enforce immutability (vs just making it easy for people to avoid common 
mistakes).





[jira] [Assigned] (SOLR-2454) Would like link in site navigation to the ManifoldCF project

2011-08-31 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-2454:
-

Assignee: Jan Høydahl

 Would like link in site navigation to the ManifoldCF project
 

 Key: SOLR-2454
 URL: https://issues.apache.org/jira/browse/SOLR-2454
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Karl Wright
Assignee: Jan Høydahl
Priority: Minor
 Attachments: SOLR-2454.patch


 The Solr/Lucene site points to lots of other Apache projects.  It would be 
 nice if it also pointed to ManifoldCF.




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094535#comment-13094535
 ] 

Robert Muir commented on LUCENE-3130:
-

{quote}
But for the synonym case, what remains is to modify the QueryParser to act on 
the already-present TypeAttribute, is that so? If so, let's open another issue 
for that.
{quote}

I think so? Though it might be more useful not to modify the core queryparser 
for this? The reason is that such a feature is geared towards synonyms, and 
multi-word synonyms don't work well with it... So maybe instead add it to a 
simpler queryparser that *does* work well with multi-word synonyms by default?

 Use BoostAttribute in TokenFilters to denote Terms that QueryParser should 
 give lower boosts
 ---

 Key: LUCENE-3130
 URL: https://issues.apache.org/jira/browse/LUCENE-3130
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Hoss Man

 A recent thread asked if there was any way to use QueryTime synonyms such that 
 matches on the original term specified by the user would score higher than 
 matches on the synonym.  It occurred to me later that a float Attribute could 
 be set by the SynonymFilter in such situations, and QueryParser could use 
 that float as a boost in the resulting Query.  (This would be fairly 
 straightforward for the simple synonyms = BooleanQuery case, but we'd have 
 to decide how to handle the case of synonyms with multiple terms that produce 
 MTPQ, possibly just punt for now)
 Likewise, there may be other TokenFilters that inject artificial tokens at 
 query time where it also might make sense to have a reduced boost factor...
 * SynonymFilter
 * CommonGramsFilter
 * WordDelimiterFilter
 * etc...
 In all of these cases, the amount of the boost could be configured, and for 
 back compat could default to 1.0 (or null to not set a boost at all)
 Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied 
 the boost attribute into the payload attribute, these same filters (when 
 used at index time) could give penalizing payloads to terms.
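
 The idea in the description can be illustrated with a self-contained sketch
 that deliberately avoids Lucene's real classes. BoostedToken,
 SynonymBoostSketch, and both methods are hypothetical names invented for this
 example: expand(...) stands in for a synonym filter that tags injected terms
 with a reduced boost, and toQueryString(...) stands in for a query parser
 that turns those boosts into per-term boosts. The multi-term synonym case is
 not handled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Lucene.
final class BoostedToken {
    final String term;
    final float boost;

    BoostedToken(String term, float boost) {
        this.term = term;
        this.boost = boost;
    }
}

final class SynonymBoostSketch {
    // Stand-in for a synonym filter: the original term keeps boost 1.0,
    // injected synonyms carry the configured (usually lower) boost.
    static List<BoostedToken> expand(String term,
                                     Map<String, List<String>> synonyms,
                                     float synonymBoost) {
        List<BoostedToken> out = new ArrayList<>();
        out.add(new BoostedToken(term, 1.0f));
        List<String> alternatives = synonyms.getOrDefault(term, Collections.<String>emptyList());
        for (String synonym : alternatives) {
            out.add(new BoostedToken(synonym, synonymBoost));
        }
        return out;
    }

    // Stand-in for a query parser: emit "term^boost" only when the boost differs from 1.0.
    static String toQueryString(List<BoostedToken> tokens) {
        StringBuilder sb = new StringBuilder();
        for (BoostedToken t : tokens) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t.term);
            if (t.boost != 1.0f) sb.append('^').append(t.boost);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("fast", Collections.singletonList("quick"));
        System.out.println(toQueryString(expand("fast", synonyms, 0.5f))); // prints "fast quick^0.5"
    }
}
```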




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094539#comment-13094539
 ] 

Chris Male commented on LUCENE-2308:


Alright, so can we move towards a consensus on a solution?

So far I see people are okay with:

- Moving FieldType over to an interface which exposes get only methods
- Creating the core implementation which uses a ctor with final fields
- Builder API can be created and placed in a yet to be determined place.

Sweet?
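
The first two points of the list above could be sketched as follows. The names
ReadOnlyFieldType and CoreFieldType are hypothetical, as is the choice of
three flags; this only shows a get-only interface with an implementation whose
state is fixed by a single constructor.

```java
// Hypothetical sketch: interface and class names are illustrative only.
interface ReadOnlyFieldType {
    boolean indexed();
    boolean stored();
    boolean tokenized();
}

// Core implementation: one constructor, final fields, no setters.
final class CoreFieldType implements ReadOnlyFieldType {
    private final boolean indexed;
    private final boolean stored;
    private final boolean tokenized;

    CoreFieldType(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }

    @Override public boolean indexed() { return indexed; }
    @Override public boolean stored() { return stored; }
    @Override public boolean tokenized() { return tokenized; }
}
```

Code that consumes a field type would then depend only on ReadOnlyFieldType,
so a builder living elsewhere could return any implementation of it.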




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094541#comment-13094541
 ] 

Chris Male commented on LUCENE-2308:


Err, Yonik pointed out that we still have the option of continuing to use the 
freezable 'soft' immutability.  I didn't mean to ignore it.
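
For readers unfamiliar with the term, freezable 'soft' immutability can be
sketched like this. FreezableFieldType and its methods are hypothetical names
chosen for illustration, not the code in the existing patches: the object has
ordinary setters, but once freeze() has been called any further change is
rejected at runtime.

```java
// Hypothetical sketch of a freezable type; not taken from the LUCENE-2308 patches.
final class FreezableFieldType {
    private boolean indexed;
    private boolean stored;
    private boolean frozen;

    void setIndexed(boolean value) {
        ensureNotFrozen();
        indexed = value;
    }

    void setStored(boolean value) {
        ensureNotFrozen();
        stored = value;
    }

    // After this call every setter throws.
    void freeze() {
        frozen = true;
    }

    boolean indexed() { return indexed; }
    boolean stored() { return stored; }

    private void ensureNotFrozen() {
        if (frozen) {
            throw new IllegalStateException("FieldType is frozen and can no longer be changed");
        }
    }
}
```

The trade-off compared with final fields is that mistakes surface as a runtime
exception rather than being ruled out by the compiler.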




[jira] [Resolved] (SOLR-2454) Would like link in site navigation to the ManifoldCF project

2011-08-31 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-2454.
---

Resolution: Fixed

Checked in on trunk, not yet deployed to live site




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094543#comment-13094543
 ] 

Uwe Schindler commented on LUCENE-2308:
---

I disagree, because that would again create different usage patterns and more 
questions from users.

If we only have one way to do it (I favour the builder pattern) with a code 
example (like NumericRangeQuery does in its javadocs) this is all obvious to 
users.

I think telescoping ctors/methods are an antipattern because of readability, and 
I think Robert will also agree with me that e.g. WordDelimiterFilter is 
unusable.




[jira] [Updated] (LUCENE-3406) Add source packaging targets that make a tarball from a local working copy

2011-08-31 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-3406:


Affects Version/s: 3.3
Fix Version/s: 3.4
  Summary: Add source packaging targets that make a tarball from a 
local working copy  (was: Add source distribution packaging targets that make a 
tarball from a local working copy)

 Add source packaging targets that make a tarball from a local working copy
 --

 Key: LUCENE-3406
 URL: https://issues.apache.org/jira/browse/LUCENE-3406
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Affects Versions: 3.3, 4.0
Reporter: Seung-Yeoul Yang
Assignee: Steven Rowe
Priority: Minor
  Labels: patch
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3406.patch, LUCENE-3406.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 I am adding back targets that were removed in 
 https://issues.apache.org/jira/browse/LUCENE-2973 that are used to create 
 source distribution packaging from a local working copy as new Ant targets.
 2 things to note about the patch:
 1) For package-local-src-tgz in solr/build.xml, I had to specify additional 
 directories under solr/ that have been added since LUCENE-2973.
 2) I couldn't get the package-tgz-local-src in lucene/build.xml to generate 
 the docs folder, which does get added by package-tgz-src. 
 The patch is against the trunk.


[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094545#comment-13094545
 ] 

Jan Høydahl commented on LUCENE-3130:
-

The core of this issue is providing a mechanism for deboosting synonyms, and as 
long as it works with single-term synonyms, that at least covers what most 
people use today. I propose we handle that first. I agree that it would be nice 
to have a query parser that can handle multi-word synonyms, but that could be 
handled incrementally in a separate issue.

 Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should 
 give lower boosts
 ---

 Key: LUCENE-3130
 URL: https://issues.apache.org/jira/browse/LUCENE-3130
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Hoss Man

 A recent thread asked if there was any way to use query-time synonyms such that 
 matches on the original term specified by the user would score higher than 
 matches on the synonym.  It occurred to me later that a float Attribute could 
 be set by the SynonymFilter in such situations, and QueryParser could use 
 that float as a boost in the resulting Query.  (This would be fairly 
 straightforward for the simple synonyms = BooleanQuery case, but we'd have 
 to decide how to handle the case of synonyms with multiple terms that produce 
 MTPQ, possibly just punt for now.)
 Likewise, there may be other TokenFilters that inject artificial tokens at 
 query time where it also might make sense to have a reduced boost factor...
 * SynonymFilter
 * CommonGramsFilter
 * WordDelimiterFilter
 * etc...
 In all of these cases, the amount of the boost could be configured, and for 
 back compat could default to 1.0 (or null to not set a boost at all).
 Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied 
 the boost attribute into the payload attribute, these same filters (when 
 used at index time) could give penalizing payloads to terms.


[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094547#comment-13094547
 ] 

Robert Muir commented on LUCENE-2308:
-

But if FieldType is an interface with get-only methods, then someone could make 
a Freezable implementation, right?
Maybe the interface is good, because I dislike 'forcing' freezable too, just 
not as much as I dislike builder. 

So, I think the interface sounds good, and would still personally prefer if our 
'default' core implementation did not use freezable, and used the simpler ctor 
instead.

Also I think we should be gearing the API so that most people can use the 
simpler field types (StringField/TextField etc.) for 90% of Lucene uses instead: 
I think we want using FieldType directly to be more expert usage (e.g. I should 
be able to do typical body+title+metadata fields with these 
StringField/TextField etc. and never deal with this stuff).
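
As a rough illustration of that layering, here is a sketch in plain Java: a 
get-only interface, one immutable implementation with a single ctor, and a few 
predefined types as sugar. The names FieldTypeView, SimpleFieldType and 
CommonTypes, and the choice of three flags, are assumptions made for the 
example; they are not the interface or classes from the patch.

```java
// Hypothetical sketch of a get-only field type; not the interface from the patch.
interface FieldTypeView {
    boolean indexed();
    boolean stored();
    boolean tokenized();
}

// "Expert" implementation: a single ctor, all fields final, nothing to freeze.
final class SimpleFieldType implements FieldTypeView {
    private final boolean indexed;
    private final boolean stored;
    private final boolean tokenized;

    SimpleFieldType(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }

    public boolean indexed() { return indexed; }
    public boolean stored() { return stored; }
    public boolean tokenized() { return tokenized; }
}

// Sugar for the common cases, so most callers never build a type by hand.
final class CommonTypes {
    // Indexed as a single token, e.g. an id or other metadata value.
    static final FieldTypeView STRING = new SimpleFieldType(true, false, false);
    // Indexed and tokenized, e.g. a title or body.
    static final FieldTypeView TEXT = new SimpleFieldType(true, false, true);

    private CommonTypes() {}
}

final class FieldTypeDemo {
    public static void main(String[] args) {
        System.out.println(CommonTypes.TEXT.tokenized());
        System.out.println(CommonTypes.STRING.tokenized());
    }
}
```

Because consumers would only see the get-only interface, a freezable or 
builder-based implementation could live outside core without changing them.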


[jira] [Assigned] (SOLR-2687) Add new Solr book 'Apache Solr 3.1 Cookbook' to selection of Solr books and news.

2011-08-31 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-2687:
-

Assignee: Jan Høydahl

 Add new Solr book 'Apache Solr 3.1 Cookbook' to selection of Solr books and 
 news.
 -

 Key: SOLR-2687
 URL: https://issues.apache.org/jira/browse/SOLR-2687
 Project: Solr
  Issue Type: Task
Reporter: Julian Copes
Assignee: Jan Høydahl
 Attachments: solr-2687.patch, solr_31_cookbook.jpg


 Find below the news of the new Solr book. I can provide an image when 
 prompted. Below is a news item and I've included the URL for the new book. 
 The text is as follows:
 Rafał Kuć is proud to introduce a new book on Solr, Apache Solr 3.1 
 Cookbook from Packt Publishing.
 The Solr 3.1 Cookbook will make your everyday work easier by using real-life 
 examples that show you how to deal with the most common problems that can 
 arise while using the Apache Solr search engine. 
 This cookbook will show you how to get the most out of your search engine. 
 Each chapter covers a different aspect of working with Solr from analyzing 
 your text data through querying, performance improvement, and developing your 
 own modules. The practical recipes will help you to quickly solve common 
 problems with data analysis, show you how to use faceting to collect data and 
 to speed up the performance of Solr. You will learn about functionalities 
 that most newbies are unaware of, such as sorting results by a function 
 value, highlighting matched words, and computing statistics to make your work 
 with Solr easy and stress free.
 Click here to read more about the Apache Solr 3.1 Cookbook. 
 (http://www.packtpub.com/solr-3-1-enterprise-search-server-cookbook/book)


[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094548#comment-13094548
 ] 

Chris Male commented on LUCENE-2308:


I don't see anything wrong with providing options.


[jira] [Updated] (LUCENE-3406) Add source packaging targets that make a tarball from a local working copy

2011-08-31 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-3406:


Attachment: LUCENE-3406.patch

This version of the patch makes a couple of small changes to the Solr exclude 
pattern (adding {{\*\*/pom.xml}} and excluding {{\*\*/\*.iws}} and 
{{\*\*/\*.ipr}}; these two IntelliJ config files are not used by the setup 
provided by {{ant idea}}), and adds {{CHANGES.txt}} entries for Solr and Lucene.

I will commit shortly to trunk, then backport to branch_3x.



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094551#comment-13094551
 ] 

Chris Male commented on LUCENE-2308:


Okay so we seem to have consensus on moving to a get-only interface.

The question just remains how to implement the 'default' core implementation.


[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094554#comment-13094554
 ] 

Robert Muir commented on LUCENE-3130:
-

Jan, not sure that most people only use single-term synonyms... if this is the 
case maybe we should rethink our synonyms implementation because multi-word 
adds a ton of complexity!

Another reason I suggested avoiding adding this to the core queryparser is 
because it's going to be challenging to allow this optional boosting in a 
flexible way (just look at getFieldQuery... it's very hairy). I think in the 
ideal case, we somehow restructure all this code so that subclasses have more 
control over how the query is created... however I think this might be 
challenging just given how the code is structured now.

The reason I think it would be best exposed as a 'hook' to subclasses (versus 
adding a deboost synonyms option directly to the core QP), is that I think 
people are going to want to customize how this works, e.g. control it per-field 
and things like that.

At the end of the day, a queryparser subclass could always override 
getFieldQuery completely and do this now, but that's not great either because 
the code is so hairy :(

This kind of feature might be easier to implement with the new queryparser in 
contrib, but I'm not sure.
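
A rough sketch of what such a hook could look like, in plain Java. It uses 
stand-in classes only: AnalyzedToken, SketchQueryParser, boostFor() and 
fieldQuery() are hypothetical names, and the clause strings stand in for 
boosted query clauses. It is not Lucene's QueryParser or attribute API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; this is not Lucene's QueryParser or attribute API.
final class AnalyzedToken {
    final String text;
    final boolean synonym; // true if the token was injected by a synonym filter

    AnalyzedToken(String text, boolean synonym) {
        this.text = text;
        this.synonym = synonym;
    }
}

class SketchQueryParser {
    // Hook: subclasses decide the boost per token. The default changes nothing.
    protected float boostFor(String field, AnalyzedToken token) {
        return 1.0f;
    }

    // Builds "field:term^boost" strings, standing in for boosted query clauses.
    final List<String> fieldQuery(String field, List<AnalyzedToken> tokens) {
        List<String> clauses = new ArrayList<String>();
        for (AnalyzedToken token : tokens) {
            clauses.add(field + ":" + token.text + "^" + boostFor(field, token));
        }
        return clauses;
    }
}

// Example customization: deboost synonyms, but only in the "body" field.
final class SynonymDeboostingParser extends SketchQueryParser {
    @Override
    protected float boostFor(String field, AnalyzedToken token) {
        return token.synonym && "body".equals(field) ? 0.5f : 1.0f;
    }
}

final class DeboostDemo {
    public static void main(String[] args) {
        List<AnalyzedToken> tokens = Arrays.asList(
            new AnalyzedToken("car", false),
            new AnalyzedToken("automobile", true));
        System.out.println(new SynonymDeboostingParser().fieldQuery("body", tokens));
    }
}
```

The point of the hook is that the per-field policy lives in the subclass, so 
the core parser would not need a deboost option of its own.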


[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094557#comment-13094557
 ] 

Uwe Schindler commented on LUCENE-2308:
---

Freezable is an antipattern and produces messy code on the implementation 
side, just because someone is still stuck in the 1990s, when Java was not able 
to handle lots of small, short-lived objects. That has not been an issue since 
almost the beginning of this century.
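
For readers who have not seen the pattern, this is roughly what a freezable 
type looks like in plain Java. FreezableFieldType and its methods are 
hypothetical names chosen for the example, not code from the patch.

```java
// Hypothetical sketch of the "freezable" style under discussion; not code from the patch.
final class FreezableFieldType {
    private boolean indexed;
    private boolean stored;
    private boolean frozen;

    // Every setter has to repeat the frozen check.
    void setIndexed(boolean value) { checkNotFrozen(); indexed = value; }
    void setStored(boolean value) { checkNotFrozen(); stored = value; }

    // After this call the instance rejects further changes.
    void freeze() { frozen = true; }

    boolean indexed() { return indexed; }
    boolean stored() { return stored; }

    private void checkNotFrozen() {
        if (frozen) {
            throw new IllegalStateException("field type is frozen");
        }
    }
}

final class FreezableDemo {
    public static void main(String[] args) {
        FreezableFieldType type = new FreezableFieldType();
        type.setIndexed(true);
        type.freeze();
        try {
            type.setStored(true); // compiles fine, fails only at runtime
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The misuse is only caught at runtime, whereas a type with final fields cannot 
be modified after construction at all.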


[jira] [Updated] (SOLR-2726) NullPointerException when using spellcheck.q

2011-08-31 Thread Bernd Fehling (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Fehling updated SOLR-2726:


Attachment: SOLR-2726.patch

According to SOLR-572 the default analyzer should be WhitespaceAnalyzer. So I 
added the WhitespaceAnalyzer to the init of Suggester.java.
Now spellcheck.q works without an NPE.

Tip:
To get suggestions for multi-word input, such as "New Y" matching both 
"New York" and "New Year", you can use queryAnalyzerFieldType with an analyzer 
that has a PatternReplaceFilterFactory for e.g. _ (underscore).
If you then look up suggestions with New_Y you will get suggestions for New 
York, New Year, ...
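
A small sketch of why the underscore helps, written in plain Java rather than 
against Solr. It assumes the analyzer splits on whitespace and that the 
pattern replace step turns each underscore into a space; the class 
UnderscoreSuggestSketch, its methods and the sample dictionary are all 
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch; it imitates the analysis chain in plain Java, not Solr.
final class UnderscoreSuggestSketch {
    private final TreeSet<String> dictionary = new TreeSet<String>();

    UnderscoreSuggestSketch(List<String> entries) {
        dictionary.addAll(entries);
    }

    // Whitespace tokenization, then underscore-to-space replacement per token.
    static List<String> analyze(String query) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : query.trim().split("\\s+")) {
            tokens.add(raw.replace('_', ' '));
        }
        return tokens;
    }

    // Prefix lookup: matching entries are contiguous in the sorted set.
    List<String> suggest(String prefix) {
        List<String> hits = new ArrayList<String>();
        for (String entry : dictionary.tailSet(prefix, true)) {
            if (!entry.startsWith(prefix)) {
                break;
            }
            hits.add(entry);
        }
        return hits;
    }

    public static void main(String[] args) {
        UnderscoreSuggestSketch sketch = new UnderscoreSuggestSketch(
            Arrays.asList("New York", "New Year", "Newark", "Old Town"));
        // "New Y" is split into two tokens; "New_Y" survives as one prefix.
        System.out.println(analyze("New Y"));
        for (String token : analyze("New_Y")) {
            System.out.println(sketch.suggest(token));
        }
    }
}
```

With a whitespace tokenizer the input "New Y" becomes two separate tokens, 
while "New_Y" stays a single token and is looked up as the prefix "New Y".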


 NullPointerException when using spellcheck.q
 

 Key: SOLR-2726
 URL: https://issues.apache.org/jira/browse/SOLR-2726
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.3, 4.0
 Environment: ubuntu
Reporter: valentin
  Labels: nullpointerexception, spellcheck
 Attachments: SOLR-2726.patch


 When I use spellcheck.q in my query to define what will be spellchecked, I 
 always have this error, for every configuration I try :
 java.lang.NullPointerException
   at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476)
   at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:202)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
   at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 All my other functions work great; this is the only thing which doesn't work 
 at all, and only when I add spellcheck.q=my%20sentence to the query...
 Example of a query: 
 http://localhost:8983/solr/db/suggest_full?q=american%20israel&spellcheck.q=american%20israel
 In solrconfig.xml:
 <searchComponent name="suggest_full" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">suggestTextFull</str>
   <lst name="spellchecker">
     <str name="name">suggest_full</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
     <str name="field">text_suggest_full</str>
     <str name="fieldType">suggestTextFull</str>
   </lst>
 </searchComponent>
 <requestHandler name="/suggest_full" 
                 class="org.apache.solr.handler.component.SearchHandler">
   <lst name="defaults">
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">suggest_full</str>
     <str name="spellcheck.count">10</str>
     <str name="spellcheck.onlyMorePopular">true</str>
   </lst>
   <arr name="components">
     <str>suggest_full</str>
   </arr>
 </requestHandler>
 I'm using Solr 3.3, and I also tried it on Solr 4.0.


[jira] [Issue Comment Edited] (SOLR-2726) NullPointerException when using spellcheck.q

2011-08-31 Thread Bernd Fehling (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094558#comment-13094558
 ] 

Bernd Fehling edited comment on SOLR-2726 at 8/31/11 2:18 PM:
--

According to SOLR-572 the default analyzer should be WhitespaceAnalyzer. 
With this SOLR-2726.patch I added the WhitespaceAnalyzer to the init of 
Suggester.java.
Now spellcheck.q works without an NPE.

Tip:
To get suggestions for multi-word input, such as "New Y" matching both 
"New York" and "New Year", you can use queryAnalyzerFieldType with an analyzer 
that has a PatternReplaceFilterFactory for e.g. _ (underscore).
If you then look up suggestions with New_Y you will get suggestions for New 
York, New Year, ...


 NullPointerException when using spellcheck.q
 

 Key: SOLR-2726
 URL: https://issues.apache.org/jira/browse/SOLR-2726
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.3, 4.0
 Environment: ubuntu
Reporter: valentin
  Labels: nullpointerexception, spellcheck
 Attachments: SOLR-2726.patch


 When I use spellcheck.q in my query to define what will be spellchecked, I 
 always have this error, for every configuration I try :
 java.lang.NullPointerException
 at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476)
 at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:202)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 All my other functions work fine; this is the only thing that doesn't work 
 at all, and only when I add spellcheck.q=my%20sentence to the query...
 Example of a query:
 http://localhost:8983/solr/db/suggest_full?q=american%20israel&spellcheck.q=american%20israel
 In solrconfig.xml:
 <searchComponent name="suggest_full" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">suggestTextFull</str>
   <lst name="spellchecker">
     <str name="name">suggest_full</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
     <str name="field">text_suggest_full</str>
     <str name="fieldType">suggestTextFull</str>
   </lst>
 </searchComponent>
 <requestHandler name="/suggest_full" class="org.apache.solr.handler.component.SearchHandler">
   <lst name="defaults">
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">suggest_full</str>
     <str name="spellcheck.count">10</str>
     <str name="spellcheck.onlyMorePopular">true</str>
   </lst>
   <arr name="components">

[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094564#comment-13094564
 ] 

Uwe Schindler commented on LUCENE-2308:
---

In my opinion, we should vote on the following solutions:

[1] Old-style telescoping ctors on an immutable FieldType
[2] FieldType.Builder pattern with a hidden FieldType ctor and, optionally, static 
FieldType factory methods that produce commonly used types, maybe even 
telescoping (these factories use the builder internally / have a set of 
preconfigured builders available). The private ctor takes the Builder instance 
and clones it to keep state final (like IndexWriter)
[3] Modifiable FieldType with a freeze() method and inefficient code because of 
stupid checks on every method - this is somehow a builder, the only difference 
is that the builder and the final instance are the same class.
[4] Read-only interface with all three implementations

Here is my +1 for an easy-to-use Builder-only [2] implementation and nothing else. 
This has no additional object creation except the builder and an internal 
clone, but those are short-lived.
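
A minimal sketch of what option [2] could look like. The class name SketchFieldType, its four flags, and the storedText() factory are hypothetical and chosen only for illustration; this is not the API that was committed for LUCENE-2308. The hidden ctor copies the builder's state into final fields, which is one way to read "clones it to keep state final".

```java
// Hypothetical sketch of option [2]: immutable type, hidden ctor, nested builder.
final class SketchFieldType {
    private final boolean indexed;
    private final boolean stored;
    private final boolean tokenized;
    private final boolean omitNorms;

    // Hidden ctor: copies the builder state so later builder changes cannot leak in.
    private SketchFieldType(Builder b) {
        this.indexed = b.indexed;
        this.stored = b.stored;
        this.tokenized = b.tokenized;
        this.omitNorms = b.omitNorms;
    }

    boolean indexed() { return indexed; }
    boolean stored() { return stored; }
    boolean tokenized() { return tokenized; }
    boolean omitNorms() { return omitNorms; }

    // Optional static factory for a commonly used type; it uses the builder internally.
    static SketchFieldType storedText() {
        return new Builder().indexed(true).tokenized(true).stored(true).build();
    }

    static final class Builder {
        private boolean indexed;
        private boolean stored;
        private boolean tokenized;
        private boolean omitNorms;

        Builder indexed(boolean v) { this.indexed = v; return this; }
        Builder stored(boolean v) { this.stored = v; return this; }
        Builder tokenized(boolean v) { this.tokenized = v; return this; }
        Builder omitNorms(boolean v) { this.omitNorms = v; return this; }

        SketchFieldType build() { return new SketchFieldType(this); }
    }
}
```

The only extra allocation per type is the builder itself, and a built instance can be shared across many fields.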

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094567#comment-13094567
 ] 

Robert Muir commented on LUCENE-2308:
-

{quote}
Okay so we seem to have consensus on moving to a get-only interface.
{quote}

I'm not sure: we should see what Uwe thinks. It seems he might be against the 
idea that there are multiple ways to do this (I think it's a valid concern, I 
just disagree with him). I think the ideal situation is where 
StringField/TextField cover the majority of use cases and doing anything with 
FT is expert, e.g. intended for apps like Solr to implement the interface and 
probably not even use our 'default' FieldType implementation. I think the 
default impl is just for someone who wants something that's not out-of-the-box, 
e.g. a tokenized TextField that omits TF.
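
A rough sketch of the get-only interface idea, under the assumption that consumers only ever read the type. All names here (FieldTypeView, TextOmitTfType, SchemaBackedType, IndexerSketch) are hypothetical and do not come from the patch.

```java
import java.util.Map;

// Hypothetical get-only view of a field's type.
interface FieldTypeView {
    boolean indexed();
    boolean tokenized();
    boolean omitTermFreqs();
}

// Stand-in for a simple default impl: tokenized text that omits term frequencies.
final class TextOmitTfType implements FieldTypeView {
    public boolean indexed() { return true; }
    public boolean tokenized() { return true; }
    public boolean omitTermFreqs() { return true; }
}

// Stand-in for an application that answers from its own schema-like configuration.
final class SchemaBackedType implements FieldTypeView {
    private final Map<String, Boolean> props;

    SchemaBackedType(Map<String, Boolean> props) { this.props = props; }

    public boolean indexed() { return props.getOrDefault("indexed", false); }
    public boolean tokenized() { return props.getOrDefault("tokenized", false); }
    public boolean omitTermFreqs() { return props.getOrDefault("omitTermFreqs", false); }
}

// The consumer depends only on the getters, not on any concrete implementation.
final class IndexerSketch {
    static String describe(FieldTypeView t) {
        return "indexed=" + t.indexed()
            + " tokenized=" + t.tokenized()
            + " omitTermFreqs=" + t.omitTermFreqs();
    }
}
```

With this shape an application never has to touch the default implementation; it only has to answer the getters.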




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094572#comment-13094572
 ] 

Robert Muir commented on LUCENE-2308:
-

Well, looking at them, that's what they already are now? Or am I totally 
confused?




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094570#comment-13094570
 ] 

Chris Male commented on LUCENE-2308:


{quote}
I think the ideal situation is where StringField/TextField cover the majority 
of use cases and doing anything with FT is expert, e.g. intended for apps like 
Solr to implement the interface and probably not even use our 'default' 
FieldType implementation.
{quote}

Separate from the implementation issue, I don't think I've fully grasped what you 
want StringField/TextField to be. Do you see them as having pre-defined 
FieldType setups?




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094575#comment-13094575
 ] 

Chris Male commented on LUCENE-2308:


Ah sorry, when I last looked at them some had constructors which accepted 
FieldTypes.  Now those have been removed, so yes, that's what they are now.




[jira] [Resolved] (LUCENE-3406) Add source packaging targets that make a tarball from a local working copy

2011-08-31 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved LUCENE-3406.
-

Resolution: Fixed

Committed to trunk and branch_3x.

Thanks Seung-Yeoul!

 Add source packaging targets that make a tarball from a local working copy
 --

 Key: LUCENE-3406
 URL: https://issues.apache.org/jira/browse/LUCENE-3406
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Affects Versions: 3.3, 4.0
Reporter: Seung-Yeoul Yang
Assignee: Steven Rowe
Priority: Minor
  Labels: patch
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3406.patch, LUCENE-3406.patch, LUCENE-3406.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 I am adding back targets that were removed in 
 https://issues.apache.org/jira/browse/LUCENE-2973 that are used to create 
 source distribution packaging from a local working copy as new Ant targets.
 2 things to note about the patch:
 1) For package-local-src-tgz in solr/build.xml, I had to specify additional 
 directories under solr/ that have been added since LUCENE-2973.
 2) I couldn't get the package-tgz-local-src in lucene/build.xml to generate 
 the docs folder, which does get added by package-tgz-src. 
 The patch is against the trunk.




Re: new AutomatonQuery(RunAutomaton) ?

2011-08-31 Thread eks dev
Thanks Robert, this is what I expected after looking into CompiledAutomaton ..

On Wed, Aug 31, 2011 at 2:00 PM, Robert Muir rcm...@gmail.com wrote:
 On Wed, Aug 31, 2011 at 3:51 AM, eks dev eks...@yahoo.co.uk wrote:
 At the moment it is not possible (?) to construct AutomatonQuery with
 RunAutomaton.
 Would it make sense to add this possibility? Is it doable at all?

 Its not doable, we need more information than the runautomaton, its not 
 enough.

 --
 lucidimagination.com




Re: new AutomatonQuery(RunAutomaton) ?

2011-08-31 Thread Robert Muir
Can you provide more information about your automaton and why
'recompiling' it might be expensive?

E.g. #states/#transitions, is it finite or infinite, etc.

On Wed, Aug 31, 2011 at 10:56 AM, eks dev eks...@yahoo.co.uk wrote:
 Thanks Robert, this is what I expected after looking into CompiledAutomaton ..

 On Wed, Aug 31, 2011 at 2:00 PM, Robert Muir rcm...@gmail.com wrote:
 On Wed, Aug 31, 2011 at 3:51 AM, eks dev eks...@yahoo.co.uk wrote:
 At the moment it is not possible (?) to construct AutomatonQuery with
 RunAutomaton.
 Would it make sense to add this possibility? Is it doable at all?

 Its not doable, we need more information than the runautomaton, its not 
 enough.

 --
 lucidimagination.com






-- 
lucidimagination.com




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094606#comment-13094606
 ] 

Michael McCandless commented on LUCENE-2308:


bq. Builder patterns can be formatted very nice:

{noformat}
Field f = new Field.Builder()
 .setFoo()
 .setBar()
 .build();
{noformat}

This is nice in theory but in practice I often see massive compound
hard-to-read lines like this:

{noformat}
  IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig( 
TEST_VERSION_CURRENT, new 
MockAnalyzer(random)).setMaxBufferedDocs(2).setMergePolicy(newLogMergePolicy()));
{noformat}

I don't like that the chained setters make such code possible: it's
unreadable.





[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094615#comment-13094615
 ] 

Uwe Schindler commented on LUCENE-2308:
---

This is still more readable than 

{code:java}
new IndexWriter(dir, TEST_VERSION_CURRENT, new MockAnalyzer(random), 2, 
newLogMergePolicy());
{code}

What does 2 mean? Yes, it's more verbose, but with any recent UI, the syntax 
highlighting makes even a one-line chain easily readable.

Here are some quotes from Joshua Bloch (who is also the founder of Java) about 
his pattern: [http://www.goodreads.com/author/quotes/60805.Joshua_Bloch]




[jira] [Issue Comment Edited] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094615#comment-13094615
 ] 

Uwe Schindler edited comment on LUCENE-2308 at 8/31/11 3:21 PM:


This is still more readable than 

{code:java}
new IndexWriter(dir, TEST_VERSION_CURRENT, new MockAnalyzer(random), 2, 
newLogMergePolicy());
{code}

What does 2 mean? Yes, it's more verbose, but with any recent UI, the syntax 
highlighting makes even a one-line chain easily readable.

Here are some quotes from Joshua Bloch (who also designed the Java Collections 
Framework) about his pattern: 
[http://www.goodreads.com/author/quotes/60805.Joshua_Bloch]

  was (Author: thetaphi):
This is still more readable than 

{code:java}
new IndexWriter(dir, TEST_VERSION_CURRENT, new MockAnalyzer(random), 2, 
newLogMergePolicy());
{code}

What does 2 mean? Yes, it's more verbose, but with any recent UI, the syntax 
highlighting makes even a one-line chain easily readable.

Here are some quotes from Joshua Bloch (who is also the founder of Java) about 
his pattern: [http://www.goodreads.com/author/quotes/60805.Joshua_Bloch]
  



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094618#comment-13094618
 ] 

Uwe Schindler commented on LUCENE-2308:
---

bq. I don't like that the chained setters make such code possible: it's 
unreadable.

Even as a one-liner it's much more readable than anything else. Did you ever try 
to create a WordDelimiterFilter using its 15-argument ctor? Two minutes later you 
no longer know what the 3rd boolean is about.

The chained calls can be read left to right, and you can do that very fast. The 
syntax shown above is just extra sugar, but the one-line variant is perfectly 
readable. OK, not for people still using two spaces after the 
end-of-sentence period ([http://en.wikipedia.org/wiki/Sentence_spacing]) :-)



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094629#comment-13094629
 ] 

Robert Muir commented on LUCENE-2308:
-

{quote}
Uwe, let's open an issue to look at improving WordDelimiterFilter, yeah? I've 
seen that floating around in tests: new WordDelimiter(1, 1, 0, 1, 1, 0, 0..), 
yeah, it's tough to read. 
{quote}

I agree we should open an issue to improve WDF. These int parameters are 
actually all boolean flags, so we could just pass 'int flags' instead.

this way you could do new WordDelimiterFilter(GENERATE_WORD_PARTS | 
CATENATE_ALL | SPLIT_ON_CASE_CHANGE), much more readable.
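
As a rough illustration of the 'int flags' idea, here is a minimal sketch. The constant names follow the ones mentioned in the comment, but the holder class, the bit values and the has() helper are assumptions made for this example and are not taken from Lucene.

```java
// Hypothetical sketch: names mirror the flags mentioned in the comment, but
// this class and its bit values are assumptions, not Lucene's actual API.
final class WordDelimiterFlags {
  static final int GENERATE_WORD_PARTS = 1;
  static final int GENERATE_NUMBER_PARTS = 1 << 1;
  static final int CATENATE_WORDS = 1 << 2;
  static final int CATENATE_NUMBERS = 1 << 3;
  static final int CATENATE_ALL = 1 << 4;
  static final int SPLIT_ON_CASE_CHANGE = 1 << 5;

  private WordDelimiterFlags() {}

  /** Returns true if every bit of {@code flag} is set in {@code flags}. */
  static boolean has(int flags, int flag) {
    return (flags & flag) == flag;
  }
}

class FlagsDemo {
  public static void main(String[] args) {
    // Each option is named at the call site instead of being a positional 1 or 0.
    int flags = WordDelimiterFlags.GENERATE_WORD_PARTS
        | WordDelimiterFlags.CATENATE_ALL
        | WordDelimiterFlags.SPLIT_ON_CASE_CHANGE;
    System.out.println(WordDelimiterFlags.has(flags, WordDelimiterFlags.CATENATE_ALL));
    System.out.println(WordDelimiterFlags.has(flags, WordDelimiterFlags.CATENATE_WORDS));
  }
}
```

Options that are not OR-ed in simply stay off, so a caller never has to pass a value for a parameter it does not care about.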



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094632#comment-13094632
 ] 

Michael McCandless commented on LUCENE-2308:


bq. Even as a one-liner it's much more readable than anything else. Did you try 
to create a WordDelimiterFilter using its 15-argument ctor? Two minutes later 
you don't know anymore what the 3rd boolean is about.

In fact this is what I like about .freeze(): you invoke simple
setters, one per line (usually), one object.

The only reason we want immutability here is to prevent the trap of
changing the FT after binding it to a Field.  And freeze accomplishes
this well.

I agree massive single ctor isn't great; but maybe w/ EnumSet or int
flags for the boolean properties it's OK.  Or maybe we go back to
Field.Index.X, Field.Store.Y, etc.  Or stick with .freeze.

Builder API can still be built out (eg in contrib or modules or google
code or somewhere) on top; I just don't think it should be in Lucene's core.
In general Lucene's core should keep things as straightforward as
possible.
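
For readers unfamiliar with the .freeze() approach discussed here, a minimal sketch follows. The class name FreezableFieldType, its two properties and the exception type are assumptions for illustration only; the actual patch may look different.

```java
// Hypothetical sketch of the freeze() idea; the class and method names are
// assumptions for illustration and not the patch's actual FieldType.
final class FreezableFieldType {
  private boolean indexed;
  private boolean stored;
  private boolean frozen;

  void setIndexed(boolean value) {
    checkNotFrozen();
    indexed = value;
  }

  void setStored(boolean value) {
    checkNotFrozen();
    stored = value;
  }

  /** After this call every setter throws, so a type bound to a Field cannot change. */
  void freeze() {
    frozen = true;
  }

  boolean indexed() {
    return indexed;
  }

  boolean stored() {
    return stored;
  }

  private void checkNotFrozen() {
    if (frozen) {
      throw new IllegalStateException("this FieldType is frozen and cannot be changed");
    }
  }
}
```

The setters are called one per line on a single object, and the frozen check is what prevents changing the type after it has been bound.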




[jira] [Created] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-08-31 Thread Chris Male (JIRA)
Make WordDelimiterFilter's instantiation more readable
--

 Key: LUCENE-3410
 URL: https://issues.apache.org/jira/browse/LUCENE-3410
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Chris Male
Priority: Minor


Currently WordDelimiterFilter's constructor is:

{code}
public WordDelimiterFilter(TokenStream in,
 byte[] charTypeTable,
 int generateWordParts,
 int generateNumberParts,
 int catenateWords,
 int catenateNumbers,
 int catenateAll,
 int splitOnCaseChange,
 int preserveOriginal,
 int splitOnNumerics,
 int stemEnglishPossessive,
 CharArraySet protWords) {
{code}

which means its instantiation is an unreadable combination of 1s and 0s.  

We should improve this by either using a Builder, 'int flags' or an EnumSet.
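
Of the three options, the EnumSet variant could look roughly like the sketch below. WdfOption and WdfConfig are hypothetical names invented for this example (neither exists in Lucene); the enum constants are derived from the constructor parameter names above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names throughout: neither WdfOption nor WdfConfig exists in Lucene.
enum WdfOption {
  GENERATE_WORD_PARTS,
  GENERATE_NUMBER_PARTS,
  CATENATE_WORDS,
  CATENATE_NUMBERS,
  CATENATE_ALL,
  SPLIT_ON_CASE_CHANGE,
  PRESERVE_ORIGINAL,
  SPLIT_ON_NUMERICS,
  STEM_ENGLISH_POSSESSIVE
}

final class WdfConfig {
  private final Set<WdfOption> options;

  WdfConfig(Set<WdfOption> options) {
    // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case.
    this.options = options.isEmpty()
        ? EnumSet.noneOf(WdfOption.class)
        : EnumSet.copyOf(options);
  }

  boolean isEnabled(WdfOption option) {
    return options.contains(option);
  }
}
```

A call site would then read new WdfConfig(EnumSet.of(WdfOption.GENERATE_WORD_PARTS, WdfOption.CATENATE_ALL)) rather than a row of 1s and 0s.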




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094638#comment-13094638
 ] 

Chris Male commented on LUCENE-2308:


LUCENE-3410 for WDF improvements.



[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094641#comment-13094641
 ] 

Robert Muir commented on LUCENE-3410:
-

I think flags is a good solution here: it's very simple and will improve 
readability, and the backwards compat is obvious too.

I think it's a bit scary to use EnumSet: it will involve complicated generics, 
and the JDK itself does not seem to use EnumSet anywhere! It uses int flags 
instead, e.g. Pattern.compile(String regex, int flags)

I think a builder is overkill here; if someone wants a builder they can always 
make a builder on top of flags for their own use.




[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-08-31 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094642#comment-13094642
 ] 

Yonik Seeley commented on LUCENE-3410:
--

For historical context, the reason I used an int for stuff like 
generateWordParts was that I had the idea of using it as a minimum (i.e. only 
generate word parts that are over a certain size, etc).  This obviously never 
happened though ;-)



[jira] [Commented] (SOLR-2650) Empty docs array on response with grouping and result pagination

2011-08-31 Thread Des Lownds (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094643#comment-13094643
 ] 

Des Lownds commented on SOLR-2650:
--

I was able to duplicate this problem, and was also seeing the following stack 
trace in some circumstances:

{code}
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 35
at org.apache.solr.search.DocSlice$1.nextDoc(DocSlice.java:117)
at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:543)
at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
at org.apache.solr.response.XMLWriter.writeNamedList(XMLWriter.java:620)
at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:593)
at org.apache.solr.response.XMLWriter.writeNamedList(XMLWriter.java:620)
at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:593)
at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:589)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:291)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
{code}



 Empty docs array on response with grouping and result pagination
 

 Key: SOLR-2650
 URL: https://issues.apache.org/jira/browse/SOLR-2650
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.3
Reporter: Massimo Schiavon

 Requesting a certain number of rows and setting start parameter to a greater 
 value returns 0 results with grouping enabled.
 For example, requesting:
 http://localhost:8080/solr/web/select/?q=*:*&rows=1&start=2
 (grouping and highlighting are enabled by default)
 I get this response:
 [...]
   response: {
   numFound: 117852
   start: 2
   docs: [ ]
   }
   highlighting: {
 0938630598: {
   title: [ ... ]
   content: [ ... ]
 }
   }
 [...]
 docs array is empty while the highlighted values of the document are present
 Debugging the request in
 org.apache.solr.search.Grouping.Command.createSimpleResponse() at row 534
 [...]
  int len = Math.min(numGroups, docsGathered);
   if (offset > len) {
 len = 0;
   }
 [...]
 The initial vars values are:
 numGroups = 1
 docsGathered = 3
 offset = 2
 so after the execution len = 0
 I've tried commenting out the if statement, and this resolves the issue but could 
 introduce some other bugs.
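
The arithmetic in the report can be replayed in isolation. The sketch below is a hypothetical helper, not Solr code, and it assumes the comparison operator missing from the quoted snippet is '>', which is the reading consistent with the stated outcome len = 0.

```java
// Hypothetical helper that mirrors the quoted snippet; it is not Solr code.
// Assumption: the comparison lost from the quoted snippet is '>'.
final class GroupingLenSketch {
  private GroupingLenSketch() {}

  static int len(int numGroups, int docsGathered, int offset) {
    int len = Math.min(numGroups, docsGathered);
    if (offset > len) {
      len = 0; // the branch the reporter suspects
    }
    return len;
  }

  public static void main(String[] args) {
    // Values from the report: numGroups = 1, docsGathered = 3, offset = 2.
    System.out.println(len(1, 3, 2));
  }
}
```

With the reported values, min(1, 3) is 1 and the offset 2 exceeds it, so len is reset to 0 and no docs are written, matching the empty docs array.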




[jira] [Commented] (SOLR-2650) Empty docs array on response with grouping and result pagination

2011-08-31 Thread Des Lownds (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094647#comment-13094647
 ] 

Des Lownds commented on SOLR-2650:
--

It seems that using group.format=simple results in the ArrayIndexOutOfBounds 
exception, while using the standard format returns wrong results (no results).





[jira] [Commented] (SOLR-2650) Empty docs array on response with grouping and result pagination

2011-08-31 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094651#comment-13094651
 ] 

Yonik Seeley commented on SOLR-2650:


There are a few grouping+paging bugs fixed in 3x (which will be 3.4 when 
released).  Can you try a recent 3x nightly build and see if any of the 
problems remain?



[jira] [Commented] (LUCENE-3410) Make WordDelimiterFilter's instantiation more readable

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094653#comment-13094653
 ] 

Uwe Schindler commented on LUCENE-3410:
---

OK, if those integers are always used only as boolean flags, I would prefer a 
single (int flags) parameter. No builder pattern needed. I would maybe prefer a 
long to make it more extensible (but 31 flags should be enough, too).



Re: more granular updateRequestProcessorChain

2011-08-31 Thread Yonik Seeley
On Wed, Aug 31, 2011 at 7:52 AM, Jan Høydahl jan@cominvent.com wrote:
 Hi,
 Can you explain the wanted functional result of your copy operation? I've
 done copying fields in processors without trouble.
 What do you want to do with the Lucene Document?

Indeed - I've started going in the opposite direction and removed the
lucene Document from the AddUpdateCommand altogether (see SOLR-2700).

-Yonik
http://www.lucidimagination.com




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094656#comment-13094656
 ] 

Uwe Schindler commented on LUCENE-2308:
---

bq. The only reason we want immutability here is to prevent the trap of 
changing the FT after binding it to a Field. And freeze accomplishes this well.

What is the difference from a builder?

You can also call builder methods one per line if you like. I like builders 
especially for their readability: you can read the line and break it at any 
place, just like a sentence. This is why the method names should (ideally) look 
like sentence components and not like setXXX().

Freeze is an antipattern, as you use one object for changing and then for 
freezing, leading to if-checks everywhere. If you make freeze() return a new 
immutable object, it is a builder, just without the possibility to chain.

I dislike freeze, but if you want to do this, please add chaining, it costs you 
nothing as implementor.
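
To make the contrast with freeze concrete, here is a minimal sketch of a chaining builder that produces an immutable object. ImmutableFieldType, its three properties and the method names are hypothetical, chosen only to illustrate the pattern; they are not the API proposed in any of the attached patches.

```java
// Hypothetical sketch: an immutable value type plus a chaining builder.
// None of these names come from the LUCENE-2308 patches.
final class ImmutableFieldType {
  private final boolean indexed;
  private final boolean stored;
  private final boolean omitNorms;

  private ImmutableFieldType(Builder b) {
    this.indexed = b.indexed;
    this.stored = b.stored;
    this.omitNorms = b.omitNorms;
  }

  boolean indexed() { return indexed; }
  boolean stored() { return stored; }
  boolean omitNorms() { return omitNorms; }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private boolean indexed;
    private boolean stored;
    private boolean omitNorms;

    // Each method returns this, which is all that chaining requires.
    Builder indexed() { indexed = true; return this; }
    Builder stored() { stored = true; return this; }
    Builder omitNorms() { omitNorms = true; return this; }

    /** The built object has only final fields, so no frozen checks are needed. */
    ImmutableFieldType build() {
      return new ImmutableFieldType(this);
    }
  }
}
```

Usage reads left to right: ImmutableFieldType.builder().indexed().stored().build(). The mutable state lives only in the builder, which is the separation the comment argues freeze lacks.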



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094662#comment-13094662
 ] 

Yonik Seeley commented on LUCENE-2308:
--

Since it seems like there is no agreement on enforcing immutability, perhaps we 
shouldn't.
We don't do it in a lot of other places, for example all of our query classes 
(and I don't think we should start).

Rethinking the interface a bit... even that seems like a little overkill (and 
perhaps just a by-product of no one agreeing on the concrete implementation?)  
After all, if this is to just be a holder for parameters (like indexed, stored, 
etc) then allowing one to subclass doesn't add any power or even make much 
sense (they aren't going to change the behavior of anything, right?)  The 
other normal use cases for interfaces wouldn't seem to apply to this situation 
either.

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 like indexed or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094664#comment-13094664
 ] 

Robert Muir commented on LUCENE-2308:
-

{quote}
I agree massive single ctor isn't great; but maybe w/ EnumSet or int
flags for the boolean properties it's OK. 
{quote}

Maybe FieldType should really just be an 'int' (i.e. we don't have a class or 
anything)?
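
To make the 'int' idea concrete, here is a minimal sketch of field properties packed into a bit mask. The class name, flag names and the validation rule are hypothetical illustrations, not Lucene's API; the rule that term vectors require an indexed field is taken from the builder discussion earlier in this thread.

```java
// Hypothetical sketch: field properties as bits of a plain int.
// None of these names exist in Lucene; they only illustrate the idea.
final class FieldFlags {
    static final int INDEXED      = 1 << 0;
    static final int STORED       = 1 << 1;
    static final int TOKENIZED    = 1 << 2;
    static final int OMIT_NORMS   = 1 << 3;
    static final int TERM_VECTORS = 1 << 4;

    private FieldFlags() {}

    /** Returns true if every bit of flag is set in type. */
    static boolean has(int type, int flag) {
        return (type & flag) == flag;
    }

    /** Rejects a combination that makes no sense: term vectors on an unindexed field. */
    static int validate(int type) {
        if (has(type, TERM_VECTORS) && !has(type, INDEXED)) {
            throw new IllegalArgumentException("term vectors require an indexed field");
        }
        return type;
    }

    public static void main(String[] args) {
        int textType = validate(INDEXED | STORED | TOKENIZED);
        System.out.println(has(textType, STORED));     // true
        System.out.println(has(textType, OMIT_NORMS)); // false
    }
}
```

Named flags avoid the problem of reversing two adjacent boolean ctor arguments, but an int carries no type safety of its own, so a check like validate() would have to be called explicitly.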




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-31 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094665#comment-13094665
 ] 

Uwe Schindler commented on LUCENE-2308:
---

bq. However whether or not people agree with you on Builders and chain calls, 
at this stage there just isn't the support to make Builders mandatory. Yes we 
should create one and I'll look to you for help on that. But as a first step 
forward let's move FieldType over to being a get-only interface. That will leave 
the freezable API in there and we can then consider the next step forward. But 
again, I don't really see consensus on the Builder-only approach. Rather I see 
a lot of support for having a single ctor implementation and a builder using 
that.

I would like to have an on-list vote, please. Thanks.




Re: new AutomatonQuery(RunAutomaton) ?

2011-08-31 Thread eks dev
I do not think it will be expensive; it is just an attempt to keep the
code smaller, simpler and marginally faster :)

These are a lot (ca. 1000) of small prefix-based regexes with a limited
alphabet, compiled as RunAutomaton, which I load on startup and look up
from a RunAutomaton[] on request...

They look like Regex(((123)|(124)|(401)|(777)|(351))[0-9]{0,2})

By the way, which will AutomatonQuery prefer: (XXX)[0-9]{0,2},
(XXX)[0-9]* or (XXX).* ? Any performance difference?

Semantically they are the same for me, as I know that my content is only 5 digits.

I need them to
1. formulate a complex BooleanQuery, where AutomatonQuery gets one clause
2. post-process the hits of the query (a lot of hits), and
this has to be fast.

I guess I will switch to keeping only Automaton[] and build the
RunAutomaton on the fly (per request) for the fast query-vs-hits check. This is
done once per request only, but then I need to keep the state of the
RunAutomaton per query... which makes things slightly more verbose...








On Wed, Aug 31, 2011 at 5:06 PM, Robert Muir rcm...@gmail.com wrote:
 Can you provide more information about your automaton and why
 'recompiling' it might be expensive?

 E.g. #states/#transitions, is it finite or infinite, etc.

 On Wed, Aug 31, 2011 at 10:56 AM, eks dev eks...@yahoo.co.uk wrote:
 Thanks Robert, this is what I expected after looking into CompiledAutomaton 
 ..

 On Wed, Aug 31, 2011 at 2:00 PM, Robert Muir rcm...@gmail.com wrote:
 On Wed, Aug 31, 2011 at 3:51 AM, eks dev eks...@yahoo.co.uk wrote:
 At the moment it is not possible (?) to construct AutomatonQuery with
 RunAutomaton.
 Would it make sense to add this possibility? Is it doable at all?

 Its not doable, we need more information than the runautomaton, its not 
 enough.

 --
 lucidimagination.com




Re: new AutomatonQuery(RunAutomaton) ?

2011-08-31 Thread Robert Muir
On Wed, Aug 31, 2011 at 1:30 PM, eks dev eks...@googlemail.com wrote:
 I do not think it will be expensive, it is just an attempt to keep
 code smaller, simpler and marginally faster :)

I think you will find the compile is pretty fast, and it only happens
once per query too (it's not per-segment or anything)... see below

 those are a lot (Ca 1000) of small prefix based regex-es with limited
 alphabet compiled as RunAutomaton I load on startup and lookup from
 some RunAutomaton[] on request...

 they look like Regex(((123)|(124)|(401)|(777)|(351))[0-9]{0,2})

 By the way, what will AutomatonQuery prefer (XXX)[0-9]{0,2} or
 (XXX)[0-9]* or (XXX).* ? Any performance difference?

Well, you would have to benchmark, and it definitely depends on your content.
(XXX)[0-9]{0,2} is the 'simplest' automaton in that it's finite; if you
actually have (XXX)[0-9][0-9]junk it will seek past that.

The other two forms you listed are infinite, and when AutomatonQuery
finds a 'loop' in the automaton, it temporarily turns itself into a 'filtering
range query' with the upper bound being the end of the
transition.
This prevents it from doing a lot of useless disk seeks.

If you have (XXX)[0-9]* this is going to seek to (XXX) and then act as
a range query up to (XXX)a (exclusive; 'a' just indicates the first
valid term after the infinitely long pattern
(XXX)9..).
Then for each term in the range it's going to 'check' that it
matches the automaton.

(XXX).* will be similar to the above, except it's obviously going to
accept more terms, e.g. (XXX)m, and its 'range query' will be
something like (XXX)-(XXY).
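
The difference in which terms the three forms accept can be sketched with java.util.regex standing in for Lucene's automaton classes (an assumption made only to keep the example self-contained; it says nothing about seek behavior or performance). The prefix "123" stands in for (XXX), and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Sketch: compares which terms the three regex forms from the thread accept.
// java.util.regex is a stand-in here, not what AutomatonQuery uses internally.
final class PrefixRegexForms {
    static final Pattern BOUNDED  = Pattern.compile("(123)[0-9]{0,2}");
    static final Pattern DIGITS   = Pattern.compile("(123)[0-9]*");
    static final Pattern ANYTHING = Pattern.compile("(123).*");

    /** Returns acceptance by {bounded, digits, anything} for a whole term. */
    static boolean[] accepts(String term) {
        return new boolean[] {
            BOUNDED.matcher(term).matches(),
            DIGITS.matcher(term).matches(),
            ANYTHING.matcher(term).matches()
        };
    }

    public static void main(String[] args) {
        String[] terms = {"12345", "123456", "123m"};
        for (String term : terms) {
            boolean[] r = accepts(term);
            System.out.println(term + " -> " + r[0] + " " + r[1] + " " + r[2]);
        }
        // 12345  -> true true true
        // 123456 -> false true true
        // 123m   -> false false true
    }
}
```

On content that is always exactly 5 digits the three forms accept the same terms; they only diverge on longer or non-digit terms.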


 Semantically are they the same as I know that my content is only 5 digits

 I need them to
 1. formulate complex BooleanQuery, where AutomatonQuery gets one clause
 2. do post processing (a lot of hits) of the query against hits and
 this has to be fast.

 I guess, I will switch to keeping only Automaton[] and build
 RunAutomaton on the fly (per request) for fast query vs hits, this is
 done once per request only, but them I need to keep state of the
 RunAutomaton per query... makes things slightly more verbose...


AutomatonQuery computes this stuff a single time, up-front in its
constructor. Can you just reuse the AutomatonQuery(s) in your app?
This should work fine.


-- 
lucidimagination.com




Re: new AutomatonQuery(RunAutomaton) ?

2011-08-31 Thread Robert Muir
On Wed, Aug 31, 2011 at 1:44 PM, Robert Muir rcm...@gmail.com wrote:
 By the way, what will AutomatonQuery prefer (XXX)[0-9]{0,2} or
 (XXX)[0-9]* or (XXX).* ? Any performance difference?

Also, what I said only applies to old term dictionary
implementations... if you are using the latest trunk with
BlockTree (https://issues.apache.org/jira/browse/LUCENE-3030), the
rules don't apply; the automaton is intersected in a totally
different way.

-- 
lucidimagination.com




[jira] [Resolved] (SOLR-2687) Add new Solr book 'Apache Solr 3.1 Cookbook' to selection of Solr books and news.

2011-08-31 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-2687.
---

Resolution: Fixed

Ok, this is committed, thanks for the contrib!

 Add new Solr book 'Apache Solr 3.1 Cookbook' to selection of Solr books and 
 news.
 -

 Key: SOLR-2687
 URL: https://issues.apache.org/jira/browse/SOLR-2687
 Project: Solr
  Issue Type: Task
Reporter: Julian Copes
Assignee: Jan Høydahl
 Attachments: solr-2687.patch, solr_31_cookbook.jpg


 Find below the news of the new Solr book. I can provide an image when 
 prompted. Below is a news item and I've included the URL for the new book. 
 The text is as follows:
 Rafał Kuć is proud to introduce a new book on Solr, Apache Solr 3.1 
 Cookbook from Packt Publishing.
 The Solr 3.1 Cookbook will make your everyday work easier by using real-life 
 examples that show you how to deal with the most common problems that can 
 arise while using the Apache Solr search engine. 
 This cookbook will show you how to get the most out of your search engine. 
 Each chapter covers a different aspect of working with Solr from analyzing 
 your text data through querying, performance improvement, and developing your 
 own modules. The practical recipes will help you to quickly solve common 
 problems with data analysis, show you how to use faceting to collect data and 
 to speed up the performance of Solr. You will learn about functionalities 
 that most newbies are unaware of, such as sorting results by a function 
 value, highlighting matched words, and computing statistics to make your work 
 with Solr easy and stress free.
 Click here to read more about the Apache Solr 3.1 Cookbook. 
 (http://www.packtpub.com/solr-3-1-enterprise-search-server-cookbook/book)




Web site updated

2011-08-31 Thread Jan Høydahl
Hi,

After committing SOLR-2454 and SOLR-2687 I have now updated the solr site (from 
trunk). Should propagate soon. If any problems, old /www/lucene.apache.org/solr 
is backed up as solr.old.janhoy :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com





Re: new AutomatonQuery(RunAutomaton) ?

2011-08-31 Thread eks dev
Bytes are good, I am in byte range on this data, and even simpler is good :)

It is simple: I just need to know whether the automaton I used for
AutomatonQuery accepts one stored field. So yes, it is the same
information as in the Term, but I need to run over it once more because my
query is not filtering on the AutomatonQuery:

((AutomatonQuery(A)) OR (OtherQuery) )+

So I also get back documents not matched by this automaton, and I do not
know which ones are there due to the OtherQuery.

Running the search in 2 passes, with and without the automaton, is not practical.
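
The post-processing step described above can be sketched as a single pass over the stored values of the hits. This is only an illustration under assumptions: java.util.regex stands in for the run automaton, the hits are represented as plain strings, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: after an OR query, find which hits the automaton clause accounts for
// by re-checking one stored field value per hit.
final class HitPostFilter {
    /** Returns the stored values accepted by the pattern, preserving hit order. */
    static List<String> matchedByPattern(List<String> storedValues, Pattern pattern) {
        List<String> matched = new ArrayList<>();
        for (String value : storedValues) {
            if (pattern.matcher(value).matches()) {
                matched.add(value);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // The regex shape quoted earlier in the thread.
        Pattern p = Pattern.compile("((123)|(124)|(401)|(777)|(351))[0-9]{0,2}");
        List<String> hits = Arrays.asList("12345", "99999", "40112");
        System.out.println(matchedByPattern(hits, p)); // [12345, 40112]
    }
}
```

Hits left out of the returned list are the ones that can only have come from the other clause.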




On Wed, Aug 31, 2011 at 8:45 PM, Robert Muir rcm...@gmail.com wrote:
 On Wed, Aug 31, 2011 at 2:37 PM, eks dev eks...@yahoo.co.uk wrote:
 Keeping AutomatonQuery around came to me as an option, but do not
 forget, I need Automaton (RunAutomaton) for post processing... There
 is no way to get Automaton back from the AutomatonQuery?


 The compiled automaton is not always a RunAutomaton, sometimes its
 internal representation is something even simpler :)
 Additionally, when it is a RunAutomaton, its a UTF-8 one, for
 operating directly on bytes...

 Can you describe a little bit about what 'post processing' you need to
 do? I imagine its post processing on something other than the terms?

 --
 lucidimagination.com




[jira] [Updated] (SOLR-2650) Empty docs array on response with grouping and result pagination

2011-08-31 Thread Des Lownds (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Des Lownds updated SOLR-2650:
-

Attachment: grouping_patch.txt

patch file

 Empty docs array on response with grouping and result pagination
 

 Key: SOLR-2650
 URL: https://issues.apache.org/jira/browse/SOLR-2650
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.3
Reporter: Massimo Schiavon
 Attachments: grouping_patch.txt


 Requesting a certain number of rows and setting the start parameter to a greater 
 value returns 0 results with grouping enabled.
 For example, requesting:
 http://localhost:8080/solr/web/select/?q=*:*&rows=1&start=2
 (grouping and highlighting are enabled by default)
 I get this response:
 [...]
   response: {
     numFound: 117852
     start: 2
     docs: [ ]
   }
   highlighting: {
     0938630598: {
       title: [ ... ]
       content: [ ... ]
     }
   }
 [...]
 The docs array is empty while the highlighted values of the document are present.
 Debugging the request in
 org.apache.solr.search.Grouping.Command.createSimpleResponse() at row 534
 [...]
   int len = Math.min(numGroups, docsGathered);
   if (offset > len) {
     len = 0;
   }
 [...]
 The initial values of the variables are:
 numGroups = 1
 docsGathered = 3
 offset = 2
 so after the execution len = 0.
 I've tried commenting out the if statement and this resolves the issue but could 
 introduce some other bugs.
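
The arithmetic in the report can be replayed in isolation. The sketch below only mirrors the two quoted statements; it assumes the comparison is `offset > len` (the operator is missing in the archived text), and the class and method names are hypothetical rather than part of Solr.

```java
// Sketch: replays the len computation quoted from createSimpleResponse().
// Assumes the comparison is "offset > len"; this is not Solr's actual class.
final class GroupingLenRepro {
    static int lenAsReported(int numGroups, int docsGathered, int offset) {
        int len = Math.min(numGroups, docsGathered);
        if (offset > len) {
            len = 0;
        }
        return len;
    }

    public static void main(String[] args) {
        // Values from the report: min(1, 3) = 1, and 2 > 1, so len is reset.
        System.out.println(lenAsReported(1, 3, 2)); // 0
        // With start=0 the same request keeps its single row.
        System.out.println(lenAsReported(1, 3, 0)); // 1
    }
}
```

With the reported values the offset is compared against a length that was already capped by numGroups, which is why docs comes back empty even though documents were gathered.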




[jira] [Commented] (SOLR-2650) Empty docs array on response with grouping and result pagination

2011-08-31 Thread Des Lownds (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094896#comment-13094896
 ] 

Des Lownds commented on SOLR-2650:
--

I'd be happy to test a nightly, where do I download them from? or is it a svn 
co?




[jira] [Commented] (LUCENE-2906) Filter to process output of ICUTokenizer and create overlapping bigrams for CJK

2011-08-31 Thread Tom Burton-West (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094902#comment-13094902
 ] 

Tom Burton-West commented on LUCENE-2906:
-

Any chance this might get implemented for 3.4?


 Filter to process output of ICUTokenizer and create overlapping bigrams for 
 CJK 
 

 Key: LUCENE-2906
 URL: https://issues.apache.org/jira/browse/LUCENE-2906
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Tom Burton-West
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-2906.patch


 The ICUTokenizer produces unigrams for CJK. We would like to use the 
 ICUTokenizer but have overlapping bigrams created for CJK as in the CJK 
 Analyzer.  This filter would take the output of the ICUTokenizer, read the 
 ScriptAttribute and for selected scripts (Han, Kana), would produce 
 overlapping bigrams.
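
What "overlapping bigrams" means for a run of CJK unigrams can be sketched without any Lucene classes. This is a plain-Java illustration, not the proposed TokenFilter: the class name is hypothetical, script detection is left out, and emitting a lone character as a unigram is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: turns a run of single-character tokens into overlapping bigrams.
// A real filter would also read the script of each token; that is omitted here.
final class OverlappingBigrams {
    static List<String> bigrams(List<String> unigrams) {
        List<String> out = new ArrayList<>();
        if (unigrams.size() == 1) {
            // Assumption of this sketch: a lone character is kept as a unigram.
            out.add(unigrams.get(0));
            return out;
        }
        for (int i = 0; i + 1 < unigrams.size(); i++) {
            // Each token pairs with its successor, so neighbors share a character.
            out.add(unigrams.get(i) + unigrams.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Three Han characters, written as Unicode escapes to keep the source ASCII.
        List<String> run = Arrays.asList("\u6771", "\u4EAC", "\u90FD");
        // Prints two bigrams: chars 1+2 and chars 2+3.
        System.out.println(bigrams(run));
    }
}
```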




[jira] [Issue Comment Edited] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant

2011-08-31 Thread Luca Stancapiano (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094903#comment-13094903
 ] 

Luca Stancapiano edited comment on LUCENE-3167 at 8/31/11 9:13 PM:
---

Hi Steven,

I am sending a new, updated patch.

I added two new stamp properties in the build-manifest macro (start.touch.time 
and end.touch.time) that log the milliseconds taken by the process.

War files in OSGI are handled like jar files. If the OSGI repository has 
functionality to work with containers, it takes the information directly from 
the bundle. The MANIFEST.MF file doesn't include information about containers.

I added the bnd library from 
http://dl.dropbox.com/u/2590603/bnd/biz.aQute.bndlib.jar (currently the 
dropbox holds the only version for Ant. See: 
http://www.aqute.biz/Bnd/Download) and added it to the Ant classpath, as for 
the 'generate-maven-artifacts' target.

Here are the responses to the tasks:

1 - Checked the box to grant the Apache license.

2 - Renamed the patch according to the convention.

3 - Deleted the bnd configuration for Solr. Now only the build-manifest macro 
declared in the common-build.xml of the Lucene project is used. But I was forced to 
declare the attributes @{title} and @{implementation.title} as properties 
inside the build-manifest macro, otherwise they were not visible in the external file 
lucene.bnd.

4 - I see the correct value of ${bnd.project.description} because the property 
is created through the configuration: <xmlproperty file="${ant.file}" 
collapseAttributes="true" prefix="bnd"/> inside the build-manifest macro. Maybe 
I didn't include everything in the previous patch. Let me know if the problem persists.

5 - I excluded the DSTAMP, TSTAMP, and TODAY properties from the bnd 
configuration through the -removeheaders property. The main problem is that 
the bnd Ant task takes all the Ant properties starting with an uppercase 
letter and adds them without asking. There should be a bnd property -inherit (true/false) 
that controls whether the Ant properties are imported, but it doesn't work. This problem is 
reported in: https://github.com/bnd/bnd/issues/72. Another important thing is 
that the 'Name' Ant property declared in some build.xml files is not accepted by the 
bnd Ant task. The bnd Ant task code raises a hard error if the 'Name' 
property is found:

if (header.equalsIgnoreCase("Name")) {
    error("Your bnd file contains a header called 'Name'. This interferes with the manifest name section.");
    continue;
}

So I was forced to rename the 'Name' property and its references to 'LuceneName'.

6 - Added the ${user.name} property to the Implementation-Version manifest 
property.

7 - Renamed the Bundle-DocUR property to Bundle-DocURL.


[jira] [Updated] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant

2011-08-31 Thread Luca Stancapiano (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Stancapiano updated LUCENE-3167:
-

Attachment: LUCENE-3167.patch


 Make lucene/solr a OSGI bundle through Ant
 --

 Key: LUCENE-3167
 URL: https://issues.apache.org/jira/browse/LUCENE-3167
 Project: Lucene - Java
  Issue Type: New Feature
 Environment: bndtools
Reporter: Luca Stancapiano
 Attachments: LUCENE-3167.patch, lucene_trunk.patch, lucene_trunk.patch


 We need to make a bundle through Ant, so the binary can be published and 
 downloading the sources is no longer needed. Currently, to get an OSGI bundle we need 
 to use Maven tools and build the sources. Here is the reference for the creation 
 of the OSGI bundle through Maven:
 https://issues.apache.org/jira/browse/LUCENE-1344
 Bndtools could be used inside Ant.




[jira] [Issue Comment Edited] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant

2011-08-31 Thread Luca Stancapiano (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094903#comment-13094903
 ] 

Luca Stancapiano edited comment on LUCENE-3167 at 8/31/11 9:14 PM:
---

Hi Steven,

I send a new updated patch.

I added two new stamp properties in the build-manifest macro (start.touch.time 
and end.touch.time) that log the milliseconds of the process.

War files in OSGI are worked as the jar files. If the OSGI repository has 
functionalities to work with containers, it takes the informations directly by 
the bundle. The MANIFEST.MF file doesn't include informations about containers.

I added the bnd library from 
http://dl.dropbox.com/u/2590603/bnd/biz.aQute.bndlib.jar (currently the Dropbox 
link hosts the only version for Ant; see: 
http://www.aqute.biz/Bnd/Download) and added it to the Ant classpath, as is 
done for the 'generate-maven-artifacts' target.

Here are my responses to the tasks:

1 - Checked the box to grant the Apache license.

2 - Renamed the patch according to the convention.

3 - Deleted the bnd configuration for Solr. Now only the build-manifest macro 
declared in the common-build.xml of the Lucene project is used. But I was 
forced to declare the attributes @{title} and @{implementation.title} as 
properties inside the build-manifest macro; otherwise they were not visible in 
the external file lucene.bnd.

4 - I see the correct value of ${bnd.project.description} because the property 
is created through the configuration: <xmlproperty file="${ant.file}" 
collapseAttributes="true" prefix="bnd"/> inside the build-manifest macro. Maybe 
I didn't include everything in the previous patch. Let me know if the problem 
persists.

5 - I excluded the DSTAMP, TSTAMP, and TODAY properties from the bnd 
configuration through the -removeheaders property. The main problem is that 
the bnd Ant task takes all the Ant properties starting with an uppercase 
letter and adds them without asking. There should be a bnd property, -inherit 
(true/false), that tells whether to import the Ant properties, but it doesn't 
work. This problem is reported at: https://github.com/bnd/bnd/issues/72. 
Another important point is that the 'Name' Ant property declared in some 
build.xml files is not accepted by the bnd Ant task. The bnd Ant task code 
raises a hard error if the 'Name' property is found:

{code}
if (header.equalsIgnoreCase("Name")) {
    error("Your bnd file contains a header called "
        + "'Name'. This interferes with the manifest name section.");
    continue;
}
{code}

So I was forced to rename the 'Name' property and its references to 
'LuceneName'.

6 - Added the ${user.name} property to the Implementation-Version manifest 
header.

7 - Renamed the Bundle-DocUR property to Bundle-DocURL 
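
To make point 5 concrete, here is a minimal sketch of the selection rule as it 
is described in this comment: every Ant property whose name starts with an 
uppercase letter becomes a manifest header unless it is explicitly removed. 
This models the described behaviour only and is not bnd's actual code; the 
class, the method, and the sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class HeaderSelectionSketch {

    /**
     * Keeps the properties whose name starts with an uppercase letter,
     * minus the ones listed in removeHeaders (the role -removeheaders plays
     * in the comment above).
     */
    static Map<String, String> selectHeaders(Map<String, String> antProperties,
                                             Set<String> removeHeaders) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : antProperties.entrySet()) {
            String name = e.getKey();
            if (name.isEmpty() || !Character.isUpperCase(name.charAt(0))) {
                continue; // lowercase properties such as user.name are ignored
            }
            if (removeHeaders.contains(name)) {
                continue; // explicitly excluded
            }
            headers.put(name, e.getValue());
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("DSTAMP", "20110831");      // sample values, made up
        props.put("TSTAMP", "2114");
        props.put("TODAY", "August 31 2011");
        props.put("LuceneName", "Lucene");
        props.put("user.name", "luca");
        // Only LuceneName survives: the three stamps are removed and
        // user.name does not start with an uppercase letter.
        System.out.println(selectHeaders(props, Set.of("DSTAMP", "TSTAMP", "TODAY")));
    }
}
```

Under this rule the <tstamp> properties leak into the manifest unless they are 
listed for removal, which matches the workaround described above.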


[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant

2011-08-31 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094931#comment-13094931
 ] 

Steven Rowe commented on LUCENE-3167:
-

Hi Luca,

I'll take a look at your new patch today or tomorrow.  

Have you done any timings yet?

I don't understand a couple of things you wrote:

bq. War files in OSGI are worked as the jar files.

Do you mean that OSGI treats .war files the same as .jar files?

bq. If the OSGI repository has functionalities to work with containers, it 
takes the informations directly by the bundle.  The MANIFEST.MF file doesn't 
include informations about containers.

What is a container?  What is a bundle?  Why does it matter that MANIFEST.MF 
does not include information about containers?  How are these things related to 
the other topics under discussion on this issue?  (I wasn't kidding when I 
wrote that I know nothing about OSGi.)







Re: new AutomatonQuery(RunAutomaton) ?

2011-08-31 Thread Mike Sokolov
Can you clone the AutomatonQuery and combine it with a filter returning 
a single document?  There is code that does something like this in 
LUCENE-3318.  That way you can test if the automaton matches a document 
without the need to tease it apart.


-Mike


On 08/31/2011 04:32 PM, eks dev wrote:

bytes are good, I am in byte range on this data, and even simpler is good :)

It is simple, I just need to know if  this automaton I used for
AutomatonQuery accepts one stored field, so yes it is the same
information as in Term, but I need to run over it once more because my
query is not filtering on AutomatonQuery

((AutomatonQuery(A)) OR (OtherQuery) )+

So I get back documents not matched by this Automaton and I do not
know which ones are there due to the OtherQuery

running search in 2 passes, with and without automaton  is not practicable
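
The post-processing described here could be sketched roughly as follows: 
after the search returns, run the same matcher once over the stored field of 
each hit to find out which hits the automaton clause accepts. This sketch uses 
java.util.regex.Pattern as a stand-in for the automaton, which is an 
assumption made to keep the example self-contained; no Lucene classes are 
used, and the field values are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class PostFilterSketch {

    /** Returns the stored values accepted by the matcher, preserving hit order. */
    static List<String> acceptedBy(Predicate<String> matcher, List<String> storedValues) {
        List<String> accepted = new ArrayList<>();
        for (String value : storedValues) {
            if (matcher.test(value)) {
                accepted.add(value);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Stand-in for the automaton: a regular expression that must match
        // the whole stored value.
        Pattern pattern = Pattern.compile("luc.*");
        Predicate<String> automaton = s -> pattern.matcher(s).matches();

        // Stored field values of the hits returned by (A OR OtherQuery).
        List<String> hits = List.of("lucene", "solr", "lucid");

        // Hits that are present because the automaton clause matched.
        System.out.println(acceptedBy(automaton, hits));
    }
}
```

Hits that are not in the returned list are there only because of the other 
clause.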




On Wed, Aug 31, 2011 at 8:45 PM, Robert Muir <rcm...@gmail.com> wrote:
   

On Wed, Aug 31, 2011 at 2:37 PM, eks dev <eks...@yahoo.co.uk> wrote:
 

Keeping AutomatonQuery around came to me as an option, but do not
forget, I need Automaton (RunAutomaton) for post processing... There
is no way to get Automaton back from the AutomatonQuery?

   

The compiled automaton is not always a RunAutomaton, sometimes its
internal representation is something even simpler :)
Additionally, when it is a RunAutomaton, it's a UTF-8 one, for
operating directly on bytes...
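
To illustrate what operating directly on bytes implies for a caller, here is a 
toy byte-level DFA: a string has to be encoded to UTF-8 before it can be fed 
to the matcher. This is a hand-built sketch, not Lucene's implementation; the 
class and its prefix-only matching are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteDfaSketch {
    // transitions[state][byte & 0xFF] is the next state, or -1 for "reject".
    private final int[][] transitions;
    private final int acceptState;

    /** Builds a DFA accepting any byte sequence that starts with the UTF-8 bytes of prefix. */
    ByteDfaSketch(String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        transitions = new int[p.length + 1][256];
        for (int[] row : transitions) {
            Arrays.fill(row, -1);
        }
        for (int i = 0; i < p.length; i++) {
            transitions[i][p[i] & 0xFF] = i + 1; // one state per prefix byte
        }
        acceptState = p.length;
        // Once the whole prefix has been read, stay accepting on every byte.
        Arrays.fill(transitions[acceptState], acceptState);
    }

    /** Steps through the input one byte at a time. */
    boolean run(byte[] input) {
        int state = 0;
        for (byte b : input) {
            state = transitions[state][b & 0xFF];
            if (state < 0) {
                return false;
            }
        }
        return state == acceptState;
    }

    /** Convenience wrapper: encode to UTF-8 first, then run on the bytes. */
    boolean accepts(String s) {
        return run(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ByteDfaSketch dfa = new ByteDfaSketch("\u00e9c"); // two characters, three UTF-8 bytes
        System.out.println(dfa.accepts("\u00e9cole"));
        System.out.println(dfa.accepts("ecole"));
    }
}
```

A non-ASCII character spans several transitions here, which is why such an 
automaton cannot be stepped with Java chars directly.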

Can you describe a little bit about what 'post processing' you need to
do? I imagine it's post processing on something other than the terms?

--
lucidimagination.com
