[Lucene.Net] [jira] [Created] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net

2011-03-23 Thread Pasha Bizhan (JIRA)
Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
--

 Key: LUCENENET-406
 URL: https://issues.apache.org/jira/browse/LUCENENET-406
 Project: Lucene.Net
  Issue Type: New Feature
  Components: Lucene.Net Core
Reporter: Pasha Bizhan
Priority: Minor


Lucene.Net 1.4. 
nunit tests included. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[Lucene.Net] [jira] [Updated] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net

2011-03-23 Thread Pasha Bizhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pasha Bizhan updated LUCENENET-406:
---

Component/s: (was: Lucene.Net Core)

 Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
 --

 Key: LUCENENET-406
 URL: https://issues.apache.org/jira/browse/LUCENENET-406
 Project: Lucene.Net
  Issue Type: New Feature
Reporter: Pasha Bizhan
Priority: Minor

 Lucene.Net 1.4. 
 nunit tests included. 



[Lucene.Net] [jira] [Updated] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net

2011-03-23 Thread Pasha Bizhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pasha Bizhan updated LUCENENET-406:
---

Attachment: solr.net.zip

full source code with nInut tests

 Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
 --

 Key: LUCENENET-406
 URL: https://issues.apache.org/jira/browse/LUCENENET-406
 Project: Lucene.Net
  Issue Type: New Feature
Reporter: Pasha Bizhan
Priority: Minor
 Attachments: solr.net.zip


 Lucene.Net 1.4. 
 nunit tests included. 



[Lucene.Net] [jira] [Issue Comment Edited] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net

2011-03-23 Thread Pasha Bizhan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010037#comment-13010037
 ] 

Pasha Bizhan edited comment on LUCENENET-406 at 3/23/11 9:38 AM:
-

full source code with nUnit tests

  was (Author: pbizhan):
full source code with nInut tests
  
 Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
 --

 Key: LUCENENET-406
 URL: https://issues.apache.org/jira/browse/LUCENENET-406
 Project: Lucene.Net
  Issue Type: New Feature
Reporter: Pasha Bizhan
Priority: Minor
 Attachments: solr.net.zip


 Lucene.Net 1.4. 
 nunit tests included. 



[Lucene.Net] [jira] [Commented] (LUCENENET-380) Evaluate Sharpen as a port tool

2011-03-23 Thread Scott Lombard (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010051#comment-13010051
 ] 

Scott Lombard commented on LUCENENET-380:
-

Why fork outside of ASF if we can keep it inside? Is an independent project 
justified? It seems to me there is a lot of infrastructure that would need to 
be duplicated and maintained.

I agreed to starting a fork outside of ASF because I didn't think there was any 
possibility of bringing code into Lucene.Net. Now, I just don't understand 
licensing well enough to rule out the dOCL license from db4o.

 Evaluate Sharpen as a port tool
 ---

 Key: LUCENENET-380
 URL: https://issues.apache.org/jira/browse/LUCENENET-380
 Project: Lucene.Net
  Issue Type: Task
  Components: Build Automation, Lucene.Net Contrib, Lucene.Net Core, 
 Lucene.Net Demo, Lucene.Net Test
Reporter: George Aroush
Assignee: Alex Thompson
 Attachments: 3.0.2_JavaToCSharpConverter_AfterPostProcessing.zip, 
 3.0.2_JavaToCSharpConverter_NoPostProcessing.zip, IndexWriter.java, 
 Lucene.Net.3_0_3_Sharpen20110106.zip, Lucene.Net.Sharpen20101104.zip, 
 Lucene.Net.Sharpen20101114.zip, NIOFSDirectory.java, QueryParser.java, 
 TestBufferedIndexInput.java, TestDateFilter.java


 This task is to evaluate Sharpen as a port tool for Lucene.Net.
 The files to be evaluated are attached.  We need to run those files (which 
 are from Java Lucene 2.9.2) against Sharpen and compare the result against 
 the JLCA result.



[Lucene.Net] Wrong home link in lucene.net website

2011-03-23 Thread Simone Chiaretta
Just a quick bug report on the website:

http://incubator.apache.org/lucene.net/

the Lucene.Net logo links to the homepage of the Incubator and not to the
homepage of the project.
Simone

-- 
Simone Chiaretta
Microsoft MVP ASP.NET - ASPInsider
Blog: http://codeclimber.net.nz
RSS: http://feeds2.feedburner.com/codeclimber
twitter: @simonech

Any sufficiently advanced technology is indistinguishable from magic
Life is short, play hard


[Lucene.Net] [jira] [Commented] (LUCENENET-380) Evaluate Sharpen as a port tool

2011-03-23 Thread Alex Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010206#comment-13010206
 ] 

Alex Thompson commented on LUCENENET-380:
-

My thoughts on the fork have been to make something that would be useful beyond 
Lucene, and the scope of the problems seems to be beyond the scope of 
Lucene.Net, so I do think an independent project would be a more natural fit. 
And if we used, say, BitBucket, would the infrastructure really be that much of 
a barrier?

 Evaluate Sharpen as a port tool
 ---

 Key: LUCENENET-380
 URL: https://issues.apache.org/jira/browse/LUCENENET-380
 Project: Lucene.Net
  Issue Type: Task
  Components: Build Automation, Lucene.Net Contrib, Lucene.Net Core, 
 Lucene.Net Demo, Lucene.Net Test
Reporter: George Aroush
Assignee: Alex Thompson
 Attachments: 3.0.2_JavaToCSharpConverter_AfterPostProcessing.zip, 
 3.0.2_JavaToCSharpConverter_NoPostProcessing.zip, IndexWriter.java, 
 Lucene.Net.3_0_3_Sharpen20110106.zip, Lucene.Net.Sharpen20101104.zip, 
 Lucene.Net.Sharpen20101114.zip, NIOFSDirectory.java, QueryParser.java, 
 TestBufferedIndexInput.java, TestDateFilter.java


 This task is to evaluate Sharpen as a port tool for Lucene.Net.
 The files to be evaluated are attached.  We need to run those files (which 
 are from Java Lucene 2.9.2) against Sharpen and compare the result against 
 the JLCA result.



[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010022#comment-13010022
 ] 

Simon Willnauer commented on LUCENE-2310:
-

Hey Chris,

Good that you reactivated this issue! I was looking into similar stuff while 
working on docvalues, since it really needs to add stuff to Field / Fieldable. 
With a cleanup and eventually FieldType this would be way less painful, I 
guess. I have a couple of questions and comments on the current patch. 
Btw, I like the fact that the previous patch was uploaded March 21, 2010 and 
the latest took a year to come up, on March 23, 2011 :)

* Why do you reformat all the stuff in Field? Is that necessary here at all? I 
mean, it's needed eventually, but for the deprecation of things it only bloats 
the patch, doesn't it?

* When you deprecate AbstractField and Fieldable, Field should ideally be a 
standalone class. So I see that this still needs to subclass Fieldable / 
AbstractField but could it stand alone now so that we can simply remove the 
extends / implements on Field once we drop things in 4.0? I think it looks good 
from looking at the patch though

* I don't like the name getAllFields on Document since it implies that we have 
a getPartialFields or something. I see that you cannot use getFields since it 
only differs in return type, which doesn't belong to the signature. Maybe we 
should implement Iterable<Field> here and offer an additional method 
getFieldsAsList, or maybe getFields(List<Field> fields)

* once we have this in, what are the next steps towards FieldType? Will we 
have only one class Field that is backed by a FieldType but still offers the 
methods it has now? Or do we have two totally new classes, FieldType and 
FieldValue, something like this:
{code} 
class FieldValue {
  FieldType type;
  float boost;
  String name;
  Object value;
}
{code}

* I wonder if this patch raises tons of deprecation warnings all over Lucene 
where Fieldable was used? In IW we use it all over the place though. We must 
fix that in this issue too, otherwise Uwe will go mad I guess :)

thanks for bringing this up again!

 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch


 In order to move field type like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
 Currently AbstractField depends on Field, and does not provide much more 
 functionality that storing fields, most of which are being moved over to 
 FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
 possible Fieldable), moving much of the functionality into Field and 
 FieldType.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010027#comment-13010027
 ] 

Chris Male commented on LUCENE-2310:


Thanks for taking a look at this Simon.

bq. Why do you reformat all the stuff in Field? Is that necessary here at all? 
I mean, it's needed eventually, but for the deprecation of things it only 
bloats the patch, doesn't it?

Because for me this issue is about reducing the complexity of these classes, 
and Field is a mess.  Making it more readable reduces the complexity.  If need 
be I will do this in two patches, but I don't feel this issue is resolved until 
the code in Field is readable.

bq. When you deprecate AbstractField and Fieldable, Field should ideally be a 
standalone class. So I see that this still needs to subclass Fieldable / 
AbstractField but could it stand alone now so that we can simply remove the 
extends / implements on Field once we drop things in 4.0? I think it looks good 
from looking at the patch though

I don't really understand what you're suggesting here.  In 3x where the 
deprecations will be occurring Field has to continue to extend AbstractField.  
Yes in 4.0 we can drop that extension but addressing the deprecations is not in 
the scope of 3x.

bq. I don't like the name getAllFields on Document since it implies that we 
have a getPartialFields or something. I see that you cannot use getFields 
since it only differs in return type, which doesn't belong to the signature. 
Maybe we should implement Iterable<Field> here and offer an additional method 
getFieldsAsList, or maybe getFields(List<Field> fields)

Yeah, good call.  I think implementing Iterable<Field> is best, but it will 
also require adding a count() method to Document, since often people retrieve 
the List just to get the number of fields.

bq. once we have this in, what are the next steps towards FieldType? Will we 
have only one class Field that is backed by a FieldType but still offers the 
methods it has now? Or do we have two totally new classes, FieldType and 
FieldValue

Once FieldType is in, all the various metadata properties (isIndexed, isStored 
etc) will be moved to FieldType, leaving Field as what you suggest as 
FieldValue.  Field will contain its type, boost, name, value.  If we have 
Analyzers on FieldTypes, then we will be able to remove the TokenStream from 
Field.

bq. I wonder if this patch raises tons of deprecation warnings all over Lucene 
where Fieldable was used? In IW we use it all over the place though. We must 
fix that in this issue too, otherwise Uwe will go mad I guess

Yeah, but not in 3x unfortunately.  As it stands people can retrieve the List 
of Fieldables via getFields() and add whatever implementation of Fieldable they 
like.  Consequently we need to continue to support Fieldable in IW, for 
example.  Once this code has been committed I will create a new patch for trunk 
which moves all of Solr and Lucene over to Field.  I could do this in many 
places already of course, but the core classes like IW would have to remain as 
they are.

I will wait for your thoughts on the reformatting and then make a new patch.



 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch


 In order to move field type like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
 Currently AbstractField depends on Field, and does not provide much more 
 functionality that storing fields, most of which are being moved over to 
 FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
 possible Fieldable), moving much of the functionality into Field and 
 FieldType.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]

2011-03-23 Thread David Nemeskey
Hey Simon and all,

May we get an update on this? I understand that Google has published the list 
of accepted organizations, which -- not surprisingly -- includes the ASF. Is 
there any information on how many slots Apache got, and which issues will be 
selected?

The student application period opens on the 28th, so I'm just wondering if I 
should go ahead and apply or wait for the decision.

Thanks,
David

On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote:
 Hey folks,
 
 Google Summer of Code 2011 is very close and the Project Applications
 Period has started recently. Now it's time to get some excited students
 on board for this year's GSoC.
 
 I encourage students to submit an application to the Google Summer of Code
 web-application. Lucene & Solr are amazing projects and GSoC is an
 incredible opportunity to join the community and push the project
 forward.
 
 If you are a student and you are interested in spending some time on a
 great open source project while getting paid for it, you should submit
 your application from March 28 - April 8, 2011. There are only 3
 weeks until this process starts!
 
 Quote from the GSoC website: "We hear almost universally from our
 mentoring organizations that the best applications they receive are
 from students who took the time to interact and discuss their ideas
 before submitting an application," so make sure to check out each
 organization's Ideas list to get to know a particular open source
 organization better.
 
 So if you have any ideas what Lucene & Solr should have, or if you
 find any of the GSoC pre-selected projects [1] interesting, please
 join us on dev@lucene.apache.org [2].  Since you as a student must
 apply for a certain project via the GSoC website [3], it's a good idea
 to work on it ahead of time and include the community and possible
 mentors as soon as possible.
 
 Open source development here at the Apache Software
 Foundation happens almost exclusively in the public and I encourage you to
 follow this. Don't mail folks privately; please use the mailing list to
 get the best possible visibility and attract interested community
 members and push your idea forward. As always, it's the idea that
 counts not the person!
 
 That said, please do not underestimate the complexity of even small
 GSoC projects. Don't try to rewrite Lucene or Solr!  A project
 usually gains more from a smaller, well discussed and carefully
 crafted & tested feature than from a half-baked monster change that's
 too large to work with.
 
 Once your proposal has been accepted and you begin work, you should
 give the community the opportunity to iterate with you.  We prefer
 progress over perfection so don't hesitate to describe your overall
 vision, but when the rubber meets the road let's take it in small
 steps.  A code patch of 20 KB is likely to be reviewed very quickly so you
 get fast feedback, while a patch even 60 KB in size can take very
 long. So try to break up your vision and the community will work with
 you to get things done!
 
 On behalf of the Lucene & Solr community,
 
 Go! Join the mailing list and apply for GSoC 2011,
 
 Simon
 
 [1]
 https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+lucene-gsoc-11 [2]
 http://lucene.apache.org/java/docs/mailinglists.html
 [3] http://www.google-melange.com
 




[jira] [Created] (LUCENE-2983) FieldInfos should be read-only if loaded from disk

2011-03-23 Thread Simon Willnauer (JIRA)
FieldInfos should be read-only if loaded from disk
--

 Key: LUCENE-2983
 URL: https://issues.apache.org/jira/browse/LUCENE-2983
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0


Currently FieldInfos create a private FieldNumberBiMap when they are loaded 
from a directory, which is necessary due to some limitations we need to face 
with IW#addIndexes(Dir). If we add an index via a directory to an existing 
index, field numbers can conflict with the global field numbers in the IW 
receiving the directories. Those field number conflicts will remain until those 
segments are merged and we stabilize again based on the IW global field 
numbers. Yet, we are unnecessarily creating a BiMap here where we actually 
should enforce read-only semantics, since nobody should modify this FieldInfos 
instance we loaded from the directory. If somebody needs a modifiable copy they 
should simply create a new one and add all FieldInfo instances to it.
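The read-only semantics argued for here can be illustrated with a small sketch (hypothetical names only, not the actual FieldInfos API): an instance loaded from disk is frozen and rejects mutation, and callers who need changes build a fresh modifiable copy.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a frozen-on-load collection in the spirit of the
// proposal. Not the real FieldInfos class.
public class FrozenInfos {
    private final List<String> infos;
    private final boolean readOnly;

    public FrozenInfos(List<String> infos, boolean readOnly) {
        this.infos = new ArrayList<>(infos);
        this.readOnly = readOnly;
    }

    public void add(String info) {
        // An instance loaded from disk fails fast on any mutation.
        if (readOnly) {
            throw new IllegalStateException("loaded from disk: read-only");
        }
        infos.add(info);
    }

    // Callers who need to mutate create a new, modifiable instance
    // seeded with the existing entries.
    public FrozenInfos modifiableCopy() {
        return new FrozenInfos(infos, false);
    }

    public int size() { return infos.size(); }
}
```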







[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010030#comment-13010030
 ] 

Simon Willnauer commented on LUCENE-2310:
-

bq. I don't really understand what you're suggesting here. In 3x where the 
deprecations will be occurring Field has to continue to extend AbstractField. 
Yes in 4.0 we can drop that extension but addressing the deprecations is not in 
the scope of 3x.

What I mean here is: if I simply removed the extends AbstractField from Field, 
would it still compile, or are there any dependencies on AbstractField? IMO 
AbstractField should just be empty now, right?

 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch


 In order to move field type like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
 Currently AbstractField depends on Field, and does not provide much more 
 functionality that storing fields, most of which are being moved over to 
 FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
 possible Fieldable), moving much of the functionality into Field and 
 FieldType.





[jira] [Updated] (LUCENE-2983) FieldInfos should be read-only if loaded from disk

2011-03-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2983:


Attachment: LUCENE-2983.patch

Here is a patch with tests. All tests pass.

 FieldInfos should be read-only if loaded from disk
 --

 Key: LUCENE-2983
 URL: https://issues.apache.org/jira/browse/LUCENE-2983
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2983.patch


 Currently FieldInfos create a private FieldNumberBiMap when they are loaded 
 from a directory, which is necessary due to some limitations we need to face 
 with IW#addIndexes(Dir). If we add an index via a directory to an existing 
 index, field numbers can conflict with the global field numbers in the IW 
 receiving the directories. Those field number conflicts will remain until 
 those segments are merged and we stabilize again based on the IW global field 
 numbers. Yet, we are unnecessarily creating a BiMap here where we actually 
 should enforce read-only semantics, since nobody should modify this FieldInfos 
 instance we loaded from the directory. If somebody needs a modifiable copy 
 they should simply create a new one and add all FieldInfo instances to it.





[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010032#comment-13010032
 ] 

Chris Male commented on LUCENE-2310:


Yes, Field would still compile if you removed the extends.  However, if we 
empty AbstractField then any client code that also extends AbstractField would 
break.  That's why I deprecate the whole class but leave its code in.  We could 
empty it and change it to extend Field; I think that would still work.

 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch


 In order to move field type like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
 Currently AbstractField depends on Field, and does not provide much more 
 functionality that storing fields, most of which are being moved over to 
 FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
 possible Fieldable), moving much of the functionality into Field and 
 FieldType.





[jira] [Created] (LUCENE-2984) Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos

2011-03-23 Thread Simon Willnauer (JIRA)
Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos 
--

 Key: LUCENE-2984
 URL: https://issues.apache.org/jira/browse/LUCENE-2984
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


Spin-off from LUCENe-2881 which had this change already but due to some random 
failures related to this change I remove this part of the patch to make it more 
isolated and easier to test. 





[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010035#comment-13010035
 ] 

Simon Willnauer commented on LUCENE-2310:
-

{quote}
Yeah but not in 3x unfortunately. As it stands people can retrieve the List of 
Fieldables via getFields() and add whatever implementation of Fieldable they 
like. Consequently we need to continue to support Fieldable in IW for example. 
Once this code has been committed I will create a new patch for trunk which 
moves all of Solr and Lucene over to the Field. I could do this in many places 
already of course, but that core classes like IW would have to remain as they 
are.
{quote}

So, what is the reason for doing this in 3.x at all? Can't we simply drop stuff 
in 4.0 and leave 3.x alone?

Simon

 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch


 In order to move field type like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
 Currently AbstractField depends on Field, and does not provide much more 
 functionality that storing fields, most of which are being moved over to 
 FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
 possible Fieldable), moving much of the functionality into Field and 
 FieldType.





[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010036#comment-13010036
 ] 

Chris Male commented on LUCENE-2310:


bq. So, what is the reason for doing this in 3.x at all? Can't we simply drop 
stuff in 4.0 and leave 3.x alone?

Very good question.  Certainly we are simplifying the codebase, and I feel that 
Field is what most users use (not AbstractField).  However, I know some expert 
users do use AbstractField.  But maybe they can handle the hard change?

 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch


 In order to move field type like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
 Currently AbstractField depends on Field, and does not provide much more 
 functionality that storing fields, most of which are being moved over to 
 FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
 possible Fieldable), moving much of the functionality into Field and 
 FieldType.





[jira] [Updated] (LUCENE-2984) Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos

2011-03-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2984:


Description: Spin-off from LUCENE-2881 which had this change already but 
due to some random failures related to this change I remove this part of the 
patch to make it more isolated and easier to test.   (was: Spin-off from 
LUCENe-2881 which had this change already but due to some random failures 
related to this change I remove this part of the patch to make it more isolated 
and easier to test. )

 Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos 
 --

 Key: LUCENE-2984
 URL: https://issues.apache.org/jira/browse/LUCENE-2984
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


 Spin-off from LUCENE-2881, which already had this change, but due to some 
 random failures related to it I removed this part of the patch to 
 make it more isolated and easier to test. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010039#comment-13010039
 ] 

Doron Cohen commented on LUCENE-2980:
-

bq. Perhaps we should add a specific test in CSTest for this problem? I 
wouldn't use file.delete() as an indicator because on Linux it will pass

Agree, I'll add one.
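A test along those lines would pin the behavior down to one thing: the suffix lookup must ignore case. A minimal sketch of the logic under test (class and method names are illustrative, not the actual ContentSource code):

```java
import java.util.Locale;

class FileTypeDetector {
    /** Detect the content type from the file suffix, ignoring case. */
    static String detect(String fileName) {
        // Lowercase with a fixed locale so "file.GZ" matches ".gz"
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz") || lower.endsWith(".gzip")) {
            return "gzip";
        }
        if (lower.endsWith(".bz2") || lower.endsWith(".bzip2")) {
            return "bzip2";
        }
        return "text"; // anything else is treated as plain text
    }
}
```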

 Benchmark's ContentSource should not rely on file suffixes to be lower cased 
 when detecting file type (gzip/bzip2/text)
 ---

 Key: LUCENE-2980
 URL: https://issues.apache.org/jira/browse/LUCENE-2980
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2980.patch, LUCENE-2980.patch


 file.gz is correctly handled as gzip, but file.GZ is handled as text, which 
 is wrong.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010043#comment-13010043
 ] 

Doron Cohen commented on LUCENE-2980:
-

bq. Perhaps we should add a specific test in CSTest for this problem? I 
wouldn't use file.delete() as an indicator because on Linux it will pass

Changed my mind about adding this test to ContentSourceTest - I think such a 
test fits better in the Commons Compress project, because it should directly 
call CompressorStreamFactory.createCompressorInputStream(in). In our test we 
invoke ContentSource.getInputStream(File), and so we cannot pass such a 
close-sensing stream. 

But this is a valid point: in particular, the test case I provided in 
COMPRESS-127 will fail on Windows but will likely pass on Linux. I'll add a 
reference to your comment in COMPRESS-127.

 Benchmark's ContentSource should not rely on file suffixes to be lower cased 
 when detecting file type (gzip/bzip2/text)
 ---

 Key: LUCENE-2980
 URL: https://issues.apache.org/jira/browse/LUCENE-2980
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2980.patch, LUCENE-2980.patch


 file.gz is correctly handled as gzip, but file.GZ is handled as text, which 
 is wrong.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-23 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-2977:


Summary: WriteLineDocTask should write gzip/bzip2/txt according to the 
extension of specified output file name  (was: WriteLineDocTask should write 
gzip/bzip2/txt according to the extension of specifie output file name)

 WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
 specified output file name
 -

 Key: LUCENE-2977
 URL: https://issues.apache.org/jira/browse/LUCENE-2977
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0


 Since the readers behave this way, it would be nice and handy if this 
 line writer did too.
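The idea is symmetric to how the readers already work: pick the output wrapper from the file suffix. A hedged sketch (the real task would use Commons Compress and also handle bzip2; the class and method names here are illustrative):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

class LineDocWriter {
    /** True if the output file name asks for gzip compression. */
    static boolean shouldGzip(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    /** Open the output, wrapping it for compression based on the suffix. */
    static OutputStream open(File out) throws IOException {
        OutputStream os = new FileOutputStream(out);
        return shouldGzip(out.getName()) ? new GZIPOutputStream(os) : os;
    }
}
```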

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010064#comment-13010064
 ] 

Shai Erera commented on LUCENE-2980:


Agreed.

 Benchmark's ContentSource should not rely on file suffixes to be lower cased 
 when detecting file type (gzip/bzip2/text)
 ---

 Key: LUCENE-2980
 URL: https://issues.apache.org/jira/browse/LUCENE-2980
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2980.patch, LUCENE-2980.patch


 file.gz is correctly handled as gzip, but file.GZ is handled as text, which 
 is wrong.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-2980:


Attachment: LUCENE-2980.patch

The updated patch applies the workaround only for the GZIP format, as the 
other types do close their wrapped stream (COMPRESS-127).
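The workaround in question boils down to a stream wrapper that forwards close() to the underlying file stream, since the GZIP decompressor did not do that itself before COMPRESS-127 was fixed. A hedged sketch (names are illustrative, not the actual ContentSource code):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Ensure close() also closes the raw stream underneath a decompressor
 *  that does not forward close() itself (the COMPRESS-127 behavior). */
class CloseForwardingInputStream extends FilterInputStream {
    private final InputStream raw; // the file stream under the decompressor

    CloseForwardingInputStream(InputStream decompressor, InputStream raw) {
        super(decompressor);
        this.raw = raw;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();  // close the decompressor first
        } finally {
            raw.close();    // then the underlying file stream
        }
    }
}
```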

 Benchmark's ContentSource should not rely on file suffixes to be lower cased 
 when detecting file type (gzip/bzip2/text)
 ---

 Key: LUCENE-2980
 URL: https://issues.apache.org/jira/browse/LUCENE-2980
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2980.patch, LUCENE-2980.patch, LUCENE-2980.patch


 file.gz is correctly handled as gzip, but file.GZ is handled as text, which 
 is wrong.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-2980.
-

   Resolution: Fixed
Lucene Fields:   (was: [New])

Committed:
- trunk: r1084544, r1084549
- 3x: r1084552

 Benchmark's ContentSource should not rely on file suffixes to be lower cased 
 when detecting file type (gzip/bzip2/text)
 ---

 Key: LUCENE-2980
 URL: https://issues.apache.org/jira/browse/LUCENE-2980
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2980.patch, LUCENE-2980.patch, LUCENE-2980.patch


 file.gz is correctly handled as gzip, but file.GZ is handled as text, which 
 is wrong.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2982) Get rid of ContentSource's workaround for closing b/gzip input stream once this is fixed in Commons Compress

2011-03-23 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010086#comment-13010086
 ] 

Doron Cohen commented on LUCENE-2982:
-

COMPRESS-127 was fixed, so whenever a new Commons Compress release is available 
we should be able to complete this one.
I subscribed to annou...@apache.org to be notified when that happens...

 Get rid of ContentSource's workaround for closing b/gzip input stream once 
 this is fixed in Commons Compress
 -

 Key: LUCENE-2982
 URL: https://issues.apache.org/jira/browse/LUCENE-2982
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/benchmark
Reporter: Doron Cohen
Priority: Minor

 Once COMPRESS-127 is fixed, get rid of the entire workaround method 
 ContentSource.closableCompressorInputStream(). The code would be simpler and 
 would perform better without that delegation.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing

2011-03-23 Thread Simon Willnauer (JIRA)
Build SegmentCodecs incrementally for consistent codecIDs during indexing
-

 Key: LUCENE-2985
 URL: https://issues.apache.org/jira/browse/LUCENE-2985
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Codecs, Index
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: CSF branch, 4.0


Currently we build the SegmentCodecs during flush, which is fine as long as no 
codec needs to know which fields it should handle. This will change with 
DocValues or when we expose StoredFields / TermVectors via Codec (see 
LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a 
consistent view of which codec belongs to which field during indexing, and all 
FieldInfo instances are unassigned (set to -1). Instead we should build the 
SegmentCodecs incrementally as fields come in, so that no matter when a codec 
needs to be selected to process a document / field, we have the right codec ID 
assigned.
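The incremental assignment amounts to giving each field a stable codec ID the first time it is seen, rather than waiting for flush. A toy sketch of that bookkeeping (a deliberate simplification of the SegmentCodecs idea, not the actual patch):

```java
import java.util.HashMap;
import java.util.Map;

class IncrementalCodecIds {
    private final Map<String, Integer> codecIdByField = new HashMap<>();
    private int nextId = 0;

    /** Return the codec ID for a field, assigning one on first sight,
     *  so the ID is consistent no matter when the codec is consulted. */
    int codecIdFor(String fieldName) {
        return codecIdByField.computeIfAbsent(fieldName, f -> nextId++);
    }
}
```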



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]

2011-03-23 Thread Simon Willnauer
On Wed, Mar 23, 2011 at 9:37 AM, David Nemeskey
nemeskey.da...@sztaki.hu wrote:
 Hey Simon and all,

 May we get an update on this? I understand that Google has published the list
 of accepted organizations, which -- not surprisingly -- includes the ASF. Is
 there any information on how many slots Apache got, and which issues will be
 selected?

 The student application period opens on the 28th, so I'm just wondering if I
 should go ahead and apply or wait for the decision.

David,

you should go ahead and apply via the GSoC website and reference the
issue there; that is how I understand it works.
We will later rate the proposals on the GSoC website and decide
which ones we choose. That is also when slots get assigned.

simon

 Thanks,
 David

 On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote:
 Hey folks,

 Google Summer of Code 2011 is very close and the Project Applications
 Period has started recently. Now it's time to get some excited students
 on board for this year's GSoC.

 I encourage students to submit an application to the Google Summer of Code
 web-application. Lucene & Solr are amazing projects and GSoC is an
 incredible opportunity to join the community and push the project
 forward.

 If you are a student and you are interested spending some time on a
 great open source project while getting paid for it, you should submit
 your application from March 28 - April 8, 2011. There are only 3
 weeks until this process starts!

 Quote from the GSoC website: We hear almost universally from our
 mentoring organizations that the best applications they receive are
 from students who took the time to interact and discuss their ideas
 before submitting an application, so make sure to check out each
 organization's Ideas list to get to know a particular open source
 organization better.

 So if you have any ideas what Lucene & Solr should have, or if you
 find any of the GSoC pre-selected projects [1] interesting, please
 join us on dev@lucene.apache.org [2].  Since you as a student must
 apply for a certain project via the GSoC website [3], it's a good idea
 to work on it ahead of time and include the community and possible
 mentors as soon as possible.

 Open source development here at the Apache Software
 Foundation happens almost exclusively in the public and I encourage you to
 follow this. Don't mail folks privately; please use the mailing list to
 get the best possible visibility and attract interested community
 members and push your idea forward. As always, it's the idea that
 counts not the person!

 That said, please do not underestimate the complexity of even small
 GSoC projects. Don't try to rewrite Lucene or Solr!  A project
 usually gains more from a smaller, well-discussed and carefully
 crafted & tested feature than from a half-baked monster change that's
 too large to work with.

 Once your proposal has been accepted and you begin work, you should
 give the community the opportunity to iterate with you.  We prefer
 progress over perfection so don't hesitate to describe your overall
 vision, but when the rubber meets the road let's take it in small
 steps.  A code patch of 20 KB is likely to be reviewed very quickly, so
 you get fast feedback, while a patch even 60 KB in size can take very
 long. So try to break up your vision and the community will work with
 you to get things done!

 On behalf of the Lucene & Solr community,

 Go! join the mailing list and apply for GSoC 2011,

 Simon

 [1]
 https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=truejqlQu
 ery=labels+%3D+lucene-gsoc-11 [2]
 http://lucene.apache.org/java/docs/mailinglists.html
 [3] http://www.google-melange.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing

2011-03-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2985:


Attachment: LUCENE-2985.patch

here is an initial patch that uses a SegmentCodecBuilder to assign codec IDs 
during indexing in DocFieldProcessorPerThread.

 Build SegmentCodecs incrementally for consistent codecIDs during indexing
 -

 Key: LUCENE-2985
 URL: https://issues.apache.org/jira/browse/LUCENE-2985
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Codecs, Index
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: CSF branch, 4.0

 Attachments: LUCENE-2985.patch


 Currently we build the SegmentCodecs during flush, which is fine as long as 
 no codec needs to know which fields it should handle. This will change with 
 DocValues or when we expose StoredFields / TermVectors via Codec (see 
 LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a 
 consistent view of which codec belongs to which field during indexing, and all 
 FieldInfo instances are unassigned (set to -1). Instead we should build the 
 SegmentCodecs incrementally as fields come in, so that no matter when a codec 
 needs to be selected to process a document / field, we have the right codec ID 
 assigned.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #70: POMs out of sync

2011-03-23 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-3.x/70/

No tests ran.

Build Log (for compile errors):
[...truncated 22 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

2011-03-23 Thread Grant Ingersoll
Hey Dawid,

Thanks for doing this.  It would be good, too, if we no longer had to pass in 
-Dsolr.clustering.enabled=true as there is no reason why we can't just have it 
on like the other components.

-Grant

On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote:

 Author: dweiss
 Date: Tue Mar 22 20:44:21 2011
 New Revision: 1084345
 
 URL: http://svn.apache.org/viewvc?rev=1084345view=rev
 Log:
 Removing the note about excluded JARs (everything is included).
 
 Modified:
lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
 
 Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345r1=1084344r2=1084345view=diff
 ==
 --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original)
 +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 
 20:44:21 2011
 @@ -1183,12 +1183,10 @@
 
http://wiki.apache.org/solr/ClusteringComponent
 
 -   This relies on third party jars which are notincluded in the
 -   release.  To use this component (and the /clustering handler)
 -   Those jars will need to be downloaded, and you'll need to set
 -   the solr.cluster.enabled system property when running solr...
 +   You'll need to set the solr.cluster.enabled system property 
 +   when running solr to run with clustering enabled:
 
 -  java -Dsolr.clustering.enabled=true -jar start.jar
 +   java -Dsolr.clustering.enabled=true -jar start.jar
 --
   searchComponent name=clustering 
enable=${solr.clustering.enabled:false}
 
 



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.

2011-03-23 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-2967.
-

   Resolution: Won't Fix
Lucene Fields:   (was: [New])

I spent some time on this. It's quite fascinating: the number of collisions for 
the default probing is smaller than:

a) linear probing with murmurhash mix of the original hash
b) linear probing without murmurhash mix (start from raw hash only).

Curiously, the number of collisions for (b) is smaller than for (a) -- this 
could be explained if we assume bits are spread evenly throughout the entire 
32-bit range after murmurhash, so after masking to table size there should be 
more collisions on lower bits compared to a raw hash (this would have more 
collisions on upper bits and fewer on lower bits because it is 
multiplicative... or at least I think so).

Anyway, I tried many different versions and I don't see any significant 
difference in favor of linear probing here. Measured the GC overhead during my 
tests too, but it is not the primary factor contributing to the total cost of 
constructing the FST (about 3-5% of the total time, running in parallel, 
typically).
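For reference, the scheme compared here (linear probing over a power-of-two table, with a murmur-style avalanching finalizer applied to the raw hash) looks roughly like this. This is a sketch, not the NodeHash code: there is no resizing or load-factor handling, and 0 is reserved as the empty-slot marker.

```java
class LinearProbeSet {
    private final long[] slots = new long[16];  // 0 = empty; store non-zero keys
    private final int mask = slots.length - 1;  // power-of-two table

    /** MurmurHash3 finalizer: spreads entropy across all 32 bits. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    void add(long key) {
        int pos = mix(Long.hashCode(key)) & mask;
        while (slots[pos] != 0 && slots[pos] != key) {
            pos = (pos + 1) & mask;  // linear probe: try the next slot
        }
        slots[pos] = key;
    }

    boolean contains(long key) {
        int pos = mix(Long.hashCode(key)) & mask;
        while (slots[pos] != 0) {
            if (slots[pos] == key) return true;
            pos = (pos + 1) & mask;
        }
        return false;  // hit an empty slot: key is absent
    }
}
```

The finalizer matters because masking to the table size keeps only the low bits; without good avalanching, keys that differ only in high bits would collide.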

 Use linear probing with an additional good bit avalanching function in FST's 
 NodeHash.
 --

 Key: LUCENE-2967
 URL: https://issues.apache.org/jira/browse/LUCENE-2967
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0

 Attachments: LUCENE-2967.patch


 I recently had an interesting discussion with Sebastiano Vigna (fastutil), 
 who suggested that linear probing, given a hash mixing function with good 
 avalanche properties, is a way better method of constructing lookups in 
 associative arrays compared to quadratic probing. Indeed, with linear probing 
 you can implement removals from a hash map without removed slot markers and 
 linear probing has nice properties with respect to modern CPUs (caches). I've 
 reimplemented HPPC's hash maps to use linear probing and we observed a nice 
 speedup (the same applies for fastutil of course).
 This patch changes NodeHash's implementation to use linear probing. The code 
 is a bit simpler (I think :). I also moved the load factor to a constant -- 
 0.5 seems like a generous load factor, especially if we allow large FSTs to 
 be built. I don't see any significant speedup in constructing large automata, 
 but there is no slowdown either (I checked on one machine only for now, but 
 will verify on other machines too).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release Lucene/Solr 3.1

2011-03-23 Thread Erik Hatcher
+1

  * Ran Solr example
  * Perused entire structure of both binary and source distros

Noticed the minor issues others have reported, to echo Ryan, none seem like 
blockers to me.

And also to echo Ryan: huge thanks for everyone's hard work on the 
3.1 Lucene/Solr release(s).  This is a big milestone for the technology and 
community.

Erik

On Mar 22, 2011, at 23:42 , Ryan McKinley wrote:

 +1
 
 * Walked through the solr example
 * Tested a simple maven project, worked well
 
 I don't think the minor issues listed so far are blockers
 
 Thanks to everyone who worked on this!
 
 ryan
 
 
 On Tue, Mar 22, 2011 at 10:21 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 Please vote to release the artifacts at
 http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2
 as Lucene 3.1 and Solr 3.1
 
 Thanks for everyone's help pulling all this together!
 
 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-23 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010110#comment-13010110
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. I have not looked this patch so this comment may be off base.

The slideshare deck gives a good overview: 
http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

As a simple Lucene-focused addition I'd prefer not to explore all the possible 
implications for Solr adoption here. The affected areas in Solr are extensive 
and would include schema definitions, query syntax, facets/filter caching, 
result-fetching, DIH etc etc. Probably best discussed elsewhere.



 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-03-23 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated SOLR-2436:
--

Attachment: SOLR-2436_2.patch

Hello Koji,
I've tested your patch; I needed to align it with the latest applied patch (see 
SOLR-2387) to make the tests work (see attached patch). 

In my opinion this solution is better than the current one, as it reflects the 
Solr way of specifying parameters in Handlers.

However, I think it would also be good if it were possible to get rid of the 
uimaConfig file altogether by defining each parameter inside the Processor with 
Solr elements (str/lst/int etc.).



 move uimaConfig to under the uima's update processor in solrconfig.xml
 --

 Key: SOLR-2436
 URL: https://issues.apache.org/jira/browse/SOLR-2436
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch


 Solr contrib UIMA has its config just beneath config. I think it should 
 move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-03-23 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010112#comment-13010112
 ] 

Tommaso Teofili edited comment on SOLR-2436 at 3/23/11 1:26 PM:


Hello Koji,
I've tested your patch; I needed to align it with the latest applied patch (see 
SOLR-2387) to make the tests work (see attached patch). 

In my opinion the solution you're proposing is better than the current one, as 
it reflects the Solr way of specifying parameters in Handlers.

However, I think it would also be good if it were possible to get rid of the 
uimaConfig file altogether by defining each parameter inside the Processor with 
Solr elements (str/lst/int etc.).



  was (Author: teofili):
Hello Koji,
I've tested your patch, I needed to align it to latest patch applied (see 
SOLR-2387) to make tests work (see attached patch). 

In my opinion this solution is better than the current one as it reflects the 
Solr way of specifying parameters in Handlers.

However I think it should be good if it was possible to alternatively get rid 
of the uimaConfig file defining each parameter inside the Processor with Solr 
elements (str/lst/int etc.) as well.


  
 move uimaConfig to under the uima's update processor in solrconfig.xml
 --

 Key: SOLR-2436
 URL: https://issues.apache.org/jira/browse/SOLR-2436
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.1
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch


 Solr contrib UIMA has its config just beneath config. I think it should 
 move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

2011-03-23 Thread Grant Ingersoll


On Mar 23, 2011, at 9:20 AM, Dawid Weiss wrote:

 Sure, I'll change it. Can I alter branch_3x too?

It's fine to change 3_x; the 3.1 release is on lucene_solr_3_1 (or something 
similar).  This way the change will be in 3.2.

-Grant

 Don't know what the
 policy is after the RCs have been published.
 
 Dawid
 
 On Wed, Mar 23, 2011 at 2:07 PM, Grant Ingersoll gsing...@apache.org wrote:
 Hey Dawid,
 
 Thanks for doing this.  It would be good, too, if we no longer had to pass 
 in -Dsolr.clustering.enabled=true as there is no reason why we can't just 
 have it on like the other components.
 
 -Grant
 
 On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote:
 
 Author: dweiss
 Date: Tue Mar 22 20:44:21 2011
 New Revision: 1084345
 
 URL: http://svn.apache.org/viewvc?rev=1084345view=rev
 Log:
 Removing the note about excluded JARs (everything is included).
 
 Modified:
lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
 
 Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345r1=1084344r2=1084345view=diff
 ==
 --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original)
 +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 
 20:44:21 2011
 @@ -1183,12 +1183,10 @@
 
http://wiki.apache.org/solr/ClusteringComponent
 
 -   This relies on third party jars which are notincluded in the
 -   release.  To use this component (and the /clustering handler)
 -   Those jars will need to be downloaded, and you'll need to set
 -   the solr.cluster.enabled system property when running solr...
 +   You'll need to set the solr.cluster.enabled system property
 +   when running solr to run with clustering enabled:
 
 -  java -Dsolr.clustering.enabled=true -jar start.jar
 +   java -Dsolr.clustering.enabled=true -jar start.jar
 --
   searchComponent name=clustering
enable=${solr.clustering.enabled:false}
 
 
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2573:


Attachment: LUCENE-2573.patch

here is my current state on this issue. I didn't add all the JavaDocs needed 
(by far) and I will wait until we've settled on the API for FlushPolicy.

* I removed the complex TieredFlushPolicy entirely and added one 
DefaultFlushPolicy that flushes at IWC.getRAMBufferSizeMB() / sets the biggest 
DWPT pending.
* DW will stall threads if we reach 2 x maxNetRam, which is retrieved from 
FlushPolicy so folks can lower it depending on their environment.

* DWFlushControl checks if a single DWPT grows too large and sets it forcefully 
pending once its RAM consumption is > 1.9 GB. That should be enough buffer to 
not reach the 2048 MB limit. We should consider making this configurable.

* FlushPolicy now has three methods: onInsert, onUpdate and onDelete. 
DefaultFlushPolicy only implements onInsert and onDelete; the abstract base 
class just calls those on an update.

* I removed FlushControl from IW
* added documentation on IWC for FlushPolicy and removed the jdocs for the RAM 
limit. I think we should add some lines about how RAM is now used and that 
users should balance the RAM with the number of threads they are using. Will do 
that later on though.

* For testing I added a ThrottledIndexOutput that makes flushing slow so I can 
test if we are stalled and / or blocked. This is passed to 
MockDirectoryWrapper. It's currently under util but should rather go under 
store, no?

* byte consumption is now committed before FlushPolicy is called, since we 
don't have the multi-tier flush which required that to reliably proceed across 
tier boundaries (not required, but it was easier to test really). So FP doesn't 
need to take care of the delta.

* FlushPolicy now also flushes on maxBufferedDeleteTerms, while the buffered 
delete terms count is not yet connected to DW#getNumBufferedDeleteTerms(), which 
causes some failures though. I added // nocommit and @Ignore to those tests.

* this patch also contains a @Ignore on TestPersistentSnapshotDeletionPolicy 
which I couldn't figure out why it is failing but it could be due to an old 
version of LUCENE-2881 on this branch. I will see if it still fails once we 
merged.

* Healthiness now doesn't stall if we are not flushing on RAM consumption to 
ensure we don't lock in threads. 


Overall this seems much closer now. I will start writing jdocs. Flush on 
buffered delete terms might need some tests and I should also write a more 
reliable test for Healthiness... currently it relies on the 
ThrottledIndexOutput slowing down indexing enough to block, which might not 
be true all the time. It didn't fail yet. 
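The stall and force-pending checks described in this comment can be sketched roughly as follows. This is an illustration only, with hypothetical names and the thresholds mentioned above; the real DocumentsWriter/FlushControl APIs differ:

```java
// Rough sketch of the two safety valves described above: stall indexing
// threads at 2 x maxNetRam, and force a single DWPT pending before it can
// approach the 2048 MB per-DWPT limit. Names are illustrative, not Lucene's.
public class FlushControlSketch {
    // ~1.9 GB per-DWPT cap, leaving headroom below the 2048 MB hard limit
    static final long FORCE_PENDING_BYTES = (long) (1.9 * 1024L * 1024L * 1024L);

    // Stall incoming indexing threads once total active RAM exceeds 2 x maxNetRam.
    static boolean shouldStall(long activeBytes, long maxNetRamBytes) {
        return activeBytes > 2L * maxNetRamBytes;
    }

    // Mark a single DWPT as flush-pending once it grows past the cap.
    static boolean forcePending(long dwptBytes) {
        return dwptBytes > FORCE_PENDING_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(shouldStall(250L, 100L));            // true: 250 > 2 x 100
        System.out.println(forcePending(FORCE_PENDING_BYTES));  // false: not yet over the cap
    }
}
```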



 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
 LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
 LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?
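The linear steps between the two water marks can be illustrated with a small helper (hypothetical, not part of any proposed API):

```java
// Illustration of the tiered flushing idea: with n DWPTs, flush thresholds
// step linearly from the low water mark to the high water mark.
public class TieredWatermarks {
    // Threshold (percent of the configured RAM buffer) at which the
    // i-th of n DWPTs should be flushed.
    static double flushThresholdPercent(int i, int n, double low, double high) {
        if (n <= 1) return low;
        return low + i * (high - low) / (n - 1);
    }

    public static void main(String[] args) {
        // With 5 DWPTs and 90%/110% marks this prints 90, 95, 100, 105, 110.
        for (int i = 0; i < 5; i++) {
            System.out.println(flushThresholdPercent(i, 5, 90.0, 110.0));
        }
    }
}
```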

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity

2011-03-23 Thread Robert Muir (JIRA)
divorce defaultsimilarityprovider from defaultsimilarity


 Key: LUCENE-2986
 URL: https://issues.apache.org/jira/browse/LUCENE-2986
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: 4.0


In LUCENE-2236 as a start, we made DefaultSimilarity which implements the 
factory interface (SimilarityProvider), and also extends Similarity.

Its factory interface just returns itself always by default.

Doron mentioned it would be cleaner to split the two, and I thought it would be 
good to revisit it later.

Today as I was looking at SOLR-2338, it became pretty clear that we should do 
this; it makes things a lot cleaner. I think it's currently confusing to users 
to see the two APIs mixed if they are trying to subclass.





[jira] [Updated] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity

2011-03-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2986:


Attachment: LUCENE-2986.patch

Attached is a patch: adds DefaultSimilarityProvider, which has our default 
implementations of the non-field-specific methods (coord/queryNorm/etc), and 
always returns DefaultSimilarity.

 divorce defaultsimilarityprovider from defaultsimilarity
 

 Key: LUCENE-2986
 URL: https://issues.apache.org/jira/browse/LUCENE-2986
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2986.patch


 In LUCENE-2236 as a start, we made DefaultSimilarity which implements the 
 factory interface (SimilarityProvider), and also extends Similarity.
 Its factory interface just returns itself always by default.
 Doron mentioned it would be cleaner to split the two, and I thought it would 
 be good to revisit it later.
 Today as I was looking at SOLR-2338, it became pretty clear that we should do 
 this, it makes things a lot cleaner. I think currently its confusing to users 
 to see the two apis mixed if they are trying to subclass.




[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-23 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-2977:


Attachment: LUCENE-2977.patch

Patch for auto-detecting output compression mode of result line file:

- getInputStream() moved from ContentSource to a new class StreamUtils under 
util. It is now named inputStream(File).
- outputStream() method added to StreamUtils.

Before applying this patch *svn mv 
modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/ContentSourceTest.java
 
modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/utils/StreamUtilsTest.java*

I kept for now the force-bzip logic in WriteLineDocTask but I would like to 
remove it - it is strange, and in any case LineDocSource would only auto-detect 
bzip input format if WriteLineDocTask was able to auto-detect bzip output 
format. Removing it will also simplify StreamUtils. Any opinions on removing 
this force-bzip option?
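The extension-based detection can be sketched like this (illustrative only; the actual StreamUtils in the patch differs, and the JDK has no built-in bzip2 codec, so bzip2 is only detected here, not opened):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Pick the output compression by file extension, in the spirit of the
// patch's StreamUtils.outputStream(File). bzip2 needs an external codec
// (e.g. commons-compress), so it is detected but not opened here.
public class StreamUtilsSketch {
    enum Type { GZIP, BZIP2, PLAIN }

    static Type typeFor(String fileName) {
        String n = fileName.toLowerCase();
        if (n.endsWith(".gz") || n.endsWith(".gzip")) return Type.GZIP;
        if (n.endsWith(".bz2")) return Type.BZIP2;
        return Type.PLAIN;
    }

    static OutputStream outputStream(File f) throws IOException {
        OutputStream out = new FileOutputStream(f);
        switch (typeFor(f.getName())) {
            case GZIP:  return new GZIPOutputStream(out);
            case BZIP2: throw new UnsupportedOperationException("requires an external bzip2 codec");
            default:    return out;
        }
    }

    public static void main(String[] args) {
        System.out.println(typeFor("docs.txt.gz"));  // GZIP
        System.out.println(typeFor("docs.bz2"));     // BZIP2
        System.out.println(typeFor("docs.txt"));     // PLAIN
    }
}
```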


 WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
 specified output file name
 -

 Key: LUCENE-2977
 URL: https://issues.apache.org/jira/browse/LUCENE-2977
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2977.patch


 Since the readers behave this way it would be nice and handy if also this 
 line writer would.




[jira] [Updated] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-23 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-2945:
-

Attachment: LUCENE-2945d.patch

Also has the changes to SpanNearClauseFactory.

 Surround Query doesn't properly handle equals/hashcode
 --

 Key: LUCENE-2945
 URL: https://issues.apache.org/jira/browse/LUCENE-2945
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1.1, 4.0

 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
 LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, 
 LUCENE-2945d.patch


 In looking at using the surround queries with Solr, I am hitting issues 
 caused by collisions due to equals/hashcode not being implemented on the 
 anonymous inner classes that are created by things like DistanceQuery (branch 
 3.x, near line 76)




[jira] [Issue Comment Edited] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-23 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010218#comment-13010218
 ] 

Paul Elschot edited comment on LUCENE-2945 at 3/23/11 5:01 PM:
---

New -2945d patch that also has the changes to SpanNearClauseFactory.

  was (Author: paul.elsc...@xs4all.nl):
Also has the changes to SpanNearClauseFactory.
  
 Surround Query doesn't properly handle equals/hashcode
 --

 Key: LUCENE-2945
 URL: https://issues.apache.org/jira/browse/LUCENE-2945
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1.1, 4.0

 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
 LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, 
 LUCENE-2945d.patch


 In looking at using the surround queries with Solr, I am hitting issues 
 caused by collisions due to equals/hashcode not being implemented on the 
 anonymous inner classes that are created by things like DistanceQuery (branch 
 3.x, near line 76)




[jira] [Created] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-03-23 Thread Peter Sturge (JIRA)
Case Insensitive Search for Wildcard Queries


 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge


This patch adds support to allow case-insensitive queries on wildcard searches 
for configured TextField field types.

This patch extends the excellent work done by Yonik and Michael in SOLR-219.
The approach here is different enough (imho) to warrant a separate JIRA issue.






[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-03-23 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-2438:
---

Attachment: SOLR-2438.patch

Attached patch file

 Case Insensitive Search for Wildcard Queries
 

 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge
 Attachments: SOLR-2438.patch


 This patch adds support to allow case-insensitive queries on wildcard 
 searches for configured TextField field types.
 This patch extends the excellent work done Yonik and Michael in SOLR-219.
 The approach here is different enough (imho) to warrant a separate JIRA issue.




[jira] [Commented] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010263#comment-13010263
 ] 

Shai Erera commented on LUCENE-2977:


Patch looks good !

In StreamUtils you have *.bz* -- it should be *.bz2*

bq. Any opinions on removing this force-bzip option?

+1 (you mean the bzip.compression property in WLDT right?). I think that it's 
reasonable to request the user to specify an output file with .bz2 extension if 
he wants bzip compression. I don't see how it will simplify StreamUtils though, 
but I trust you :) (perhaps you meant it will simplify WLDT?)

 WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
 specified output file name
 -

 Key: LUCENE-2977
 URL: https://issues.apache.org/jira/browse/LUCENE-2977
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2977.patch


 Since the readers behave this way it would be nice and handy if also this 
 line writer would.




[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-03-23 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010268#comment-13010268
 ] 

Peter Sturge commented on SOLR-2438:


If you're like me, you may have often wondered why MyTerm, myterm, myter* and 
MyTer* can return different, and sometimes empty, results.
This patch addresses this for wildcard queries by adding an attribute to 
relevant solr.TextField entries in schema.xml.
The new attribute is called:  {{ignoreCaseForWildcards}}

Example entry in schema.xml:
{code:title=schema.xml [excerpt]|borderStyle=solid}
<fieldType name="text_lcws" class="solr.TextField" positionIncrementGap="100"
           ignoreCaseForWildcards="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
{code}

It's worth noting that this will lower-case text for ALL terms that match the 
field type - including synonyms and stemmers.

For backward compatibility, the default behaviour is as before - i.e. a case 
sensitive wildcard search ({{ignoreCaseForWildcards=false}}).

The patch was created against the lucene_solr_3_1 branch. I've not applied it 
yet on trunk.

[caveat emptor] I freely admit I'm no schema expert, so committers and community 
members may see use cases where this approach could pose problems. I'm all for 
feedback to enhance the functionality...

The hope here is to re-ignite enthusiasm for case-insensitive wildcard searches 
in Solr - in line with the 'it just works' Solr philosophy.

Enjoy!
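At query time, the effect of the new attribute boils down to lower-casing the raw wildcard term before it is parsed. The sketch below is illustrative only, not the patch's code; note that the '*' and '?' metacharacters pass through lower-casing unchanged:

```java
import java.util.Locale;

// Sketch of what ignoreCaseForWildcards implies at query time: lower-case
// the raw wildcard term so MyTer* hits the same lower-cased index terms as
// myter*. The wildcard metacharacters are unaffected by lower-casing.
public class WildcardCaseSketch {
    static String normalize(String rawTerm, boolean ignoreCaseForWildcards) {
        return ignoreCaseForWildcards ? rawTerm.toLowerCase(Locale.ROOT) : rawTerm;
    }

    public static void main(String[] args) {
        System.out.println(normalize("MyTer*", true));  // myter*
        System.out.println(normalize("MyTer*", false)); // MyTer*
    }
}
```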


 Case Insensitive Search for Wildcard Queries
 

 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge
 Attachments: SOLR-2438.patch


 This patch adds support to allow case-insensitive queries on wildcard 
 searches for configured TextField field types.
 This patch extends the excellent work done Yonik and Michael in SOLR-219.
 The approach here is different enough (imho) to warrant a separate JIRA issue.




Re: write byte[] directly to TokenStream

2011-03-23 Thread Ryan McKinley
works great - thanks!


On Wed, Mar 23, 2011 at 1:04 AM, Robert Muir rcm...@gmail.com wrote:

 On Mar 22, 2011 11:38 PM, Ryan McKinley ryan...@gmail.com wrote:

 I'm messing with putting binary data directly in the index.  I have a
 field class with:

  @Override
  public TokenStream tokenStreamValue() {
    byte[] value = (byte[])fieldsData;

    Token token = new Token( 0, value.length, "geo" );
    token.resizeBuffer( value.length );
    BytesRef ref = token.getBytesRef();
    ref.bytes = value;
    ref.length = value.length;
    ref.offset = 0;
    token.setLength( ref.length );
    return new SingleTokenTokenStream( token );
  }

 but that is just writing an empty token.  Is it possible to set the
 Token value without converting to char[]?


 check out Test2BTerms for an example...





[jira] [Commented] (SOLR-2415) Change XMLWriter version parameter to wt.xml.version

2011-03-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010308#comment-13010308
 ] 

Hoss Man commented on SOLR-2415:


bq. how should we handle the desire to change the faceting format (to make it 
easier to add metadata like total number of constraints, etc)? version would 
be one way. facet.format would be another way.

i don't think the *structure* of the response (ie: the facet response section) 
should be driven by the same param as the *format* of the response, which is 
what version currently is.  Something like facet.format seems more 
appropriate when dealing with a specific component like that ... but i don't 
think it should be a numeric version-esque property, i think it should be 
descriptive (ie: flat vs. nested or something)


bq. perhaps we should add a getVersion() parameter on SolrQueryRequest and have 
that used across all components.

when i suggested we have a common wt.version param that all of the response 
writers could use, i didn't mean to suggest that it should have a singular id 
space. my suggestion was that the specific values specified for version or 
wt.version or whatever would only be meaningful to the specific response 
writer used -- just as the current values of the version param that the 
XMLResponseWriter uses are meaningless to the JSONResponseWriter.  the overlap 
would only be in reusing the param name (in the same way that q is the common 
param name for the main query, regardless of what query parser is specified by 
defType)


bq. Look at how long the existing response writers have hung around in their 
current format, independent of the version # changes (1.2, 1.3, 1.4, and now 
3.1)

the version param of the XML response writer has never been in sync with the 
solr version, it was never intended to be.  it's always been the version number 
of the xml format.

 Change XMLWriter version parameter to wt.xml.version
 --

 Key: SOLR-2415
 URL: https://issues.apache.org/jira/browse/SOLR-2415
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Trivial
 Fix For: 4.0


 The XMLWriter has a parameter called 'version'.  This controls some specifics 
 about how the XMLWriter works.  Using the parameter name 'version' made sense 
 back when the XMLWriter was the only option, but with all the various writers 
 and different places where 'version' makes sense, I think we should change 
 this parameter name to wt.xml.version so that it specifically refers to the 
 XMLWriter.




[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-03-23 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010318#comment-13010318
 ] 

Stefan Matheis (steffkes) commented on SOLR-2399:
-

Ryan: ty, will take your points on my list - pretty sure, that it should be 
possible to integrate them
Mark: ty! :)

For today, it's about *Logging*. Talked about that with Hoss on #solr the last 
days, so already changed a few things .. on the way, but not finished: 
http://files.mathe.is/solr-admin/07_logging.png

Actually thinking about the following points:
* Is a tree structure a good way to solve it?
* Do we need the possibility to collapse/expand the tree/the children? The 
list could be longer (the screenshot is cropped, just for layout reasons), 
especially while using SolrCloud, which adds about 30 loggers
* In the current er .. interface you are able to see that the row you're 
looking at has a level set and, at the end (at the right), what the effective 
level is - for me, that does not matter. if a row/logger has level-x, that's 
enough to know. don't need to see if this level is set or inherited.
* just a quick idea: if you change e.g. {{org.apache.solr}}, then the interface 
will automatically update all children in realtime, affecting all nested/sub 
loggers w/o an assigned level.

Thoughts on these points? Anyone?

Short Note: i moved Logging to a global level, because it's not configurable on 
a per-core basis.

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Priority: Minor
 Fix For: 4.0


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin 
 [This commit shows the 
 differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d]
  between old/existing index.jsp and my new one (which is could 
 copy-cut/paste'd from the existing one).
 Main Action takes place in 
 [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js]
  which is actually neither clean nor pretty .. just work-in-progress.
 Actually it's Work in Progress, so ... give it a try. It's developed with 
 Firefox as Browser, so, for a first impression .. please don't use _things_ 
 like Internet Explorer or so ;o
 Jan already suggested a bunch of good things, i'm sure there are more ideas 
 over there :)




[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-23 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-2977:


Attachment: LUCENE-2977.patch

Thanks for reviewing Shai!

bq. In StreamUtils you have *.bz* -- it should be *.bz2*

Good catch!
Fixed.

bq. +1 (you mean the bzip.compression property in WLDT right?). 

Yes.

bq. I think that it's reasonable to request the user to specify an output file 
with .bz2 extension if he wants bzip compression. 

Great, I removed it.

bq. I don't see how it will simplify StreamUtils though, but I trust you :) 
(perhaps you meant it will simplify WLDT?)

It allowed keeping just one of the two variations of 
StreamUtils.outputStream(). WLDT and the tests became simpler as well.

Attaching updated patch.
(again first apply that svn mv...)

 WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
 specified output file name
 -

 Key: LUCENE-2977
 URL: https://issues.apache.org/jira/browse/LUCENE-2977
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2977.patch, LUCENE-2977.patch


 Since the readers behave this way it would be nice and handy if also this 
 line writer would.




[jira] [Issue Comment Edited] (SOLR-2399) Solr Admin Interface, reworked

2011-03-23 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010318#comment-13010318
 ] 

Stefan Matheis (steffkes) edited comment on SOLR-2399 at 3/23/11 8:15 PM:
--

Ryan: ty, will take your points on my list - pretty sure, that it should be 
possible to integrate them
Mark: ty! :)

For today, it's about *Logging*. Talked about that with Hoss on #solr the last 
days, so already changed a few things .. on the way, but not finished: 
http://files.mathe.is/solr-admin/07_logging.png

Actually thinking about the following points:
* Is a tree structure a good way to solve it?
* Do we need the possibility to collapse/expand the tree/the children? The 
list could be longer (the screenshot is cropped, just for layout reasons), 
especially while using SolrCloud, which adds about 30 loggers
* In the current er .. interface you are able to see that the row you're 
looking at has a level set and, at the end (at the right), what the effective 
level is - for me, that does not matter. if a row/logger has level-x, that's 
enough to know. don't need to see if this level is set or inherited.
* just a quick idea: if you change e.g. {{org.apache.solr}}, then the interface 
will automatically update all children in realtime, affecting all nested/sub 
loggers w/o an assigned level.

Thoughts on these points? Anyone?

Short Note: i moved Logging to a global level, because it's not configurable on 
a per-core basis.

# Edit

What I forgot to mention .. actually it's based on a [static 
logging.json-file|https://github.com/steffkes/solr-admin/blob/master/logging.json]
 but I will try to change the {{LogLevelSection}} servlet so that it outputs 
the needed JSON structure

  was (Author: steffkes):
Ryan: ty, will take your points on my list - pretty sure, that it should be 
possible to integrate them
Mark: ty! :)

For today, it's about *Logging*. Talked about that with Hoss on #solr the last 
days, so already changed a few things .. on the way, but not finished: 
http://files.mathe.is/solr-admin/07_logging.png

Actually thinking about the following points:
* Tree Structure good way to solve it?
* Do we need the possibitly to collapse/expand the three/the childrens? The 
List could be longer (the screenshot is cropped, just for layout reasons) 
especially while using SolrCloud which adds about 30 Loggers
* In the current er .. Interface you are able to see that the row you're 
looking at has a level set and in the end (at the right) which is the effective 
level - for me, that does not matter. if a row/logger, has level-x - that's 
enough to know. don't need to see if this level is set or inherited.
* just a quick idea: if you change f.e. {{org.apache.solr}} then the interface 
will automatically update all childrens in realtime, affects all nested/sub 
loggers w/o a assigned level.

Thoughts on these points? anyone? :

Short Note: i moved Logging to a global level, because it's not configurable on 
a per-core basis.
  
 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Priority: Minor
 Fix For: 4.0


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin 
 [This commit shows the 
 differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d]
  between old/existing index.jsp and my new one (which is could 
 copy-cut/paste'd from the existing one).
 Main Action takes place in 
 [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js]
  which is actually neither clean nor pretty .. just work-in-progress.
 Actually it's Work in Progress, so ... give it a try. It's developed with 
 Firefox as Browser, so, for a first impression .. please don't use _things_ 
 like Internet Explorer or so ;o
 Jan already suggested a bunch of good things, i'm sure there are more ideas 
 over there :)




[jira] [Created] (SOLR-2439) change solr javadocs to link to local lucene javadocs w/relative links

2011-03-23 Thread Hoss Man (JIRA)
change solr javadocs to link to local lucene javadocs w/relative links
--

 Key: SOLR-2439
 URL: https://issues.apache.org/jira/browse/SOLR-2439
 Project: Solr
  Issue Type: Task
  Components: documentation
Reporter: Hoss Man
 Fix For: 3.2


Now that solr/lucene are in lock step development, and solr releases include 
the entire lucene-java release, the solr ant targets for building javadocs 
should depend on the lucene (and module) targets for building javadocs and link 
directly to the local copies of those docs (using relative paths)

(currently, the links point to 
https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/)




Re: [VOTE] Release Lucene/Solr 3.1

2011-03-23 Thread Chris Hostetter

: Please vote to release the artifacts at
: http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2

-0

I can't in good conscience vote for these artifacts.

For the most part, there are only a few minor hiccups -- but the big 
blocker (in my opinion) is that since RC1, dev-tools has been removed from 
the solr src packages, and this causes the top level build.xml (and 
instructions for IDE users in the top level README.txt file) to be broken.

My detailed notes below...

##
### apache-solr-3.1.0-src.tgz

dev-tools isn't in here -- this totally boggles my mind, particularly 
since there was a deliberate and conscious switch to make the source 
releases match what you get when doing an svn export.

because dev-tools is missing, 3 of the top level ant targets advertised 
by 'ant -p' don't work, including 'ant idea' and 'ant eclipse', which 
are also explicitly mentioned in the top level README.txt as how people 
using those IDEs should get started developing the code.

This seems like a major issue to me.   

we're setting ourselves up to make the release look completely broken 
right out of the gate for anyone using one of those IDEs.

Asking about this on IRC, yonik & ryan indicated that a couple of folks had 
said they would veto any release with dev-tools in it, because that stuff 
is supposed to be unsupported ... this makes no sense to me, as we have 
lots of places in the code base where things are documented as being 
experimental, subject to change, and/or for developer use only.  I don't 
really see how dev-tools should be any different.

if there is really such violent opposition to including dev-tools in src 
releases, then the top level build.xml should not depend on it, and the 
top level README.txt should not refer to it (except maybe with something 
like "people interested in hacking on the src should use svn, which 
includes some unofficial 'dev-tools'").
---

Now that the src packages are driven by svn exports, more files exist than 
were in RC1, and some of the changes we made to the solr/README.txt based 
on the earlier release candidates are misleading.

In particular, a lot of things are listed as being in the "docs" directory 
of a binary distribution, but those files *do* exist in the src packages 
-- if you look in the "site" directory.  This seems silly, but at no point 
is the README.txt factually incorrect, so I guess it's not a big enough 
deal to worry about.

---

running all tests, running the example, and building the javadocs all 
worked fine.

##
### apache-solr-3.1.0.tgz

docs look good, basic example usage works fine.

##
### apache-solr-3.1.0.zip

Diffing the contents of apache-solr-3.1.0.tgz with apache-solr-3.1.0.zip 
(using diff --ignore-all-space --strip-trailing-cr -r) turned up quite 
a few instances where the CRLF fixing in build.xml seems to have 
corrupted some non-ASCII characters in a few files:

 contrib/dataimporthandler/lib/activation-LICENSE.txt 
 contrib/dataimporthandler/lib/mail-LICENSE.txt
 docs/skin/CommonMessages_de.xml
 docs/skin/CommonMessages_es.xml
 docs/skin/CommonMessages_fr.xml
 example/solr/conf/velocity/facet_dates.vm

...but these changes don't seem to have substantively harmed the files.

##
### lucene-3.1.0-src.tar.gz

tests and javadocs worked fine.

##
### lucene-3.1.0.tar.gz

docs look good, demo runs fine.

##
### lucene-3.1.0.zip

no differences found with lucene-3.1.0.tar.gz





-Hoss




[jira] [Commented] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010547#comment-13010547
 ] 

Shai Erera commented on LUCENE-2977:


Looks good to me.

 WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
 specified output file name
 -

 Key: LUCENE-2977
 URL: https://issues.apache.org/jira/browse/LUCENE-2977
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2977.patch, LUCENE-2977.patch


 Since the readers behave this way, it would be nice and handy if this 
 line writer did too.
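The extension-driven choice the issue describes can be sketched in plain Java. This is only an illustrative sketch, not the benchmark module's actual code: the class and the `open` helper are made-up names, and only `.gz` vs plain text is handled here, since bzip2 support would need an external library such as Apache Commons Compress.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ExtensionStreams {
    // Pick a (possibly compressing) stream based on the output file's extension.
    // A ".gz" name gets gzip compression; anything else is written as plain text.
    static OutputStream open(File f) throws IOException {
        OutputStream raw = new FileOutputStream(f);
        if (f.getName().endsWith(".gz")) {
            return new GZIPOutputStream(raw);
        }
        return raw;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("lines", ".gz");
        // Write one line through the extension-selected stream...
        try (Writer w = new OutputStreamWriter(open(f), "UTF-8")) {
            w.write("one document per line\n");
        }
        // ...and read it back through a gzip reader to show it round-trips.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(f)), "UTF-8"))) {
            System.out.println(r.readLine());
        }
        f.delete();
    }
}
```

The same dispatch would grow one more branch per supported extension (e.g. `.bz2`), which is why mirroring the readers' behavior keeps the task symmetric.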




Re: [VOTE] Release Lucene/Solr 3.1

2011-03-23 Thread Ryan McKinley

 : Please vote to release the artifacts at
 : http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2

 -0

 I can't in good conscience vote for these artifacts.


I don't want to suggest anything to slow down the release... but if
the problems are with the source release, what about just doing a
single source release for lucene+solr?

We currently have:

lucene-solr-3.1RC2/lucene/
lucene-solr-3.1RC2/lucene/lucene-3.1.0-src.tar.gz
lucene-solr-3.1RC2/lucene/...
lucene-solr-3.1RC2/solr/
lucene-solr-3.1RC2/solr/apache-solr-3.1.0-src.tgz
lucene-solr-3.1RC2/solr/...

Why not:
lucene-solr-3.1RC2/lucene-3.1.0-src.tar.gz
lucene-solr-3.1RC2/lucene/...
lucene-solr-3.1RC2/solr/...

and let the src release be as close to svn export as possible?  This
will make sure the result builds just as it does when we actually
build it!

With the maven artifacts, we have source for each jar:
http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/solr/maven/org/apache/solr/solr-core/3.1.0/solr-core-3.1.0-sources.jar

http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/lucene/maven/org/apache/lucene/lucene-queries/3.1.0/lucene-queries-3.1.0-sources.jar

I'm not sure the exact ASF source requirements, but maybe the maven
source.jar files are good enough?

Again, I don't think this should be a blocker, but it would be nice to
have things simplified for the next release -- gasp.

ryan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2338:
--

Attachment: SOLR-2338.patch

Here's a first stab: I included LUCENE-2986's cleanup work for easy testing 
(this issue depends upon it).

Here is the syntax:
{noformat}
  <!-- specify a Similarity classname directly -->
  <fieldType name="sim1" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
    <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
  </fieldType>

  <!-- specify a Similarity factory -->
  <fieldType name="sim2" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
    <similarity class="org.apache.solr.schema.CustomSimilarityFactory">
      <str name="echo">is there an echo?</str>
    </similarity>
  </fieldType>
{noformat}

Additionally, it's necessary to allow customization of the SimilarityProvider 
too, in order to customize the non-field-specific stuff like coord()... this is 
done via:
{noformat}
 <!-- expert: SimilarityProvider contains scoring routines that are not
      field-specific, such as coord() and queryNorm(). Most scoring
      customization happens in the fieldtype. A custom similarity provider
      may be specified here, but the default is fine for most applications.
 -->
 <similarityProvider class="org.apache.solr.schema.CustomSimilarityProviderFactory">
   <str name="echo">is there an echo?</str>
 </similarityProvider>
{noformat}


 improved per-field similarity integration into schema.xml
 -

 Key: SOLR-2338
 URL: https://issues.apache.org/jira/browse/SOLR-2338
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: SOLR-2338.patch


 Currently since LUCENE-2236, we can enable Similarity per-field, but in 
 schema.xml there is only a 'global' factory
 for the SimilarityProvider.
 In my opinion this is too low-level, because to customize Similarity on a 
 per-field basis you have to set your own
 CustomSimilarityProvider with <similarity class="..."/> and manage the 
 per-field mapping yourself in java code.
 Instead I think it would be better if you could just specify the Similarity in 
 the FieldType, like after <analyzer>.
 As far as the example, one idea from LUCENE-1360 was to make a "short_text" 
 or "metadata_text" fieldtype used by the
 various metadata fields in the example that has better norm quantization for 
 its shortness...
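The "manage the per-field mapping yourself" approach the description complains about can be sketched with stand-in types -- the `Similarity` class below is a placeholder for Lucene's, and `MappingProvider` is a made-up name, not the patch's actual API:

```java
import java.util.HashMap;
import java.util.Map;

public class PerFieldProvider {
    // Placeholder for org.apache.lucene.search.Similarity; real code
    // would extend the Lucene class instead of this stub.
    static class Similarity {
        final String name;
        Similarity(String name) { this.name = name; }
    }

    // What managing the per-field mapping "yourself in java code" looks
    // like: a provider routing each field name to a Similarity instance,
    // falling back to a default for unmapped fields.
    static class MappingProvider {
        private final Map<String, Similarity> perField = new HashMap<>();
        private final Similarity fallback = new Similarity("default");

        void register(String field, Similarity sim) {
            perField.put(field, sim);
        }

        Similarity get(String field) {
            return perField.getOrDefault(field, fallback);
        }
    }

    public static void main(String[] args) {
        MappingProvider provider = new MappingProvider();
        provider.register("title", new Similarity("sweetspot"));
        System.out.println(provider.get("title").name);
        System.out.println(provider.get("body").name);
    }
}
```

Moving the declaration into the `<fieldType>` element, as the patch proposes, makes this boilerplate unnecessary for schema users.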




Re: [VOTE] Release Lucene/Solr 3.1

2011-03-23 Thread Robert Muir
On Thu, Mar 24, 2011 at 12:18 AM, Ryan McKinley ryan...@gmail.com wrote:

 I don't want to suggest anything to slow down the release... but if
 the problems are with the source release, what about just doing a
 single source release for lucene+solr?

 We currently have:

 lucene-solr-3.1RC2/lucene/
 lucene-solr-3.1RC2/lucene/lucene-3.1.0-src.tar.gz
 lucene-solr-3.1RC2/lucene/...
 lucene-solr-3.1RC2/solr/
 lucene-solr-3.1RC2/solr/apache-solr-3.1.0-src.tgz
 lucene-solr-3.1RC2/solr/...

 Why not:
 lucene-solr-3.1RC2/lucene-3.1.0-src.tar.gz
 lucene-solr-3.1RC2/lucene/...
 lucene-solr-3.1RC2/solr/...

 and let the src release be as close to svn export as possible?  This
 will make sure the result builds just as it does when we actually
 build it!

 With the maven artifacts, we have source for each jar:
 http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/solr/maven/org/apache/solr/solr-core/3.1.0/solr-core-3.1.0-sources.jar

 http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/lucene/maven/org/apache/lucene/lucene-queries/3.1.0/lucene-queries-3.1.0-sources.jar

 I'm not sure the exact ASF source requirements, but maybe the maven
 source.jar files are good enough?


I don't think someone should have to deal with maven to get the lucene
source release... I think lucene should have its own artifacts as in
the past (the source code being the most important).




GSoC 2011

2011-03-23 Thread Phillipe Ramalho
Hello,

I am planning to submit a project proposal to GSoC 2011 and Lucene seems to
have a lot of GSoC projects this year. Last year I did a GSoC project using
Lucene for PhotArk project. This year, instead of just using Lucene, I am
planning to contribute code to it.

My experience with Lucene is just as a regular user; the only code I have
changed/extended so far is token streams/analyzers and the query parser, so I
have more knowledge of that part of the code. Based on that, I'm planning to
focus on query parser and analyzer/token stream projects. Does that sound
reasonable?

I will be studying the code and planning the proposal(s), so you should
start seeing more posts from me in the next few days.

--
Phillipe Ramalho


Re: [VOTE] Release Lucene/Solr 3.1

2011-03-23 Thread Ryan McKinley

 I don't think someone should have to deal with maven to get the lucene
 source release... I think lucene should have its own artifacts as in
 the past (the source code being the most important).


sorry, did not mean to muddy the water with maven discussion...
ignore my comment

when you say "lucene should have its own artifacts", do you mean lucene
w/o solr?  or could a single source artifact include everything?
(making the release process easier and apparently cleaner)
