[Lucene.Net] [jira] [Created] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net -- Key: LUCENENET-406 URL: https://issues.apache.org/jira/browse/LUCENENET-406 Project: Lucene.Net Issue Type: New Feature Components: Lucene.Net Core Reporter: Pasha Bizhan Priority: Minor

Lucene.Net 1.4. NUnit tests included.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
[ https://issues.apache.org/jira/browse/LUCENENET-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pasha Bizhan updated LUCENENET-406: --- Component/s: (was: Lucene.Net Core)
[Lucene.Net] [jira] [Updated] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
[ https://issues.apache.org/jira/browse/LUCENENET-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pasha Bizhan updated LUCENENET-406: --- Attachment: solr.net.zip

full source code with NUnit tests
[Lucene.Net] [jira] [Issue Comment Edited] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
[ https://issues.apache.org/jira/browse/LUCENENET-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010037#comment-13010037 ] Pasha Bizhan edited comment on LUCENENET-406 at 3/23/11 9:38 AM: - full source code with nUnit tests was (Author: pbizhan): full source code with nInut tests
[Lucene.Net] [jira] [Commented] (LUCENENET-380) Evaluate Sharpen as a port tool
[ https://issues.apache.org/jira/browse/LUCENENET-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010051#comment-13010051 ] Scott Lombard commented on LUCENENET-380: -

Why fork outside of ASF if we can keep it inside? Is an independent project justified? It seems to me there is a lot of infrastructure that needs to be duplicated and maintained. I agreed to starting a fork outside of ASF because I didn't think there was any possibility to bring the code into Lucene.Net. Now, I just don't understand licensing well enough to rule out a dOCL license from db4o.

Evaluate Sharpen as a port tool --- Key: LUCENENET-380 URL: https://issues.apache.org/jira/browse/LUCENENET-380 Project: Lucene.Net Issue Type: Task Components: Build Automation, Lucene.Net Contrib, Lucene.Net Core, Lucene.Net Demo, Lucene.Net Test Reporter: George Aroush Assignee: Alex Thompson Attachments: 3.0.2_JavaToCSharpConverter_AfterPostProcessing.zip, 3.0.2_JavaToCSharpConverter_NoPostProcessing.zip, IndexWriter.java, Lucene.Net.3_0_3_Sharpen20110106.zip, Lucene.Net.Sharpen20101104.zip, Lucene.Net.Sharpen20101114.zip, NIOFSDirectory.java, QueryParser.java, TestBufferedIndexInput.java, TestDateFilter.java

This task is to evaluate Sharpen as a port tool for Lucene.Net. The files to be evaluated are attached. We need to run those files (which are off Java Lucene 2.9.2) against Sharpen and compare the result against the JLCA result.
[Lucene.Net] Wrong home link in lucene.net website
Just a quick bug report on the website: on http://incubator.apache.org/lucene.net/ the Lucene.Net logo links to the homepage of the Incubator and not to the homepage of the project.

Simone

-- Simone Chiaretta Microsoft MVP ASP.NET - ASPInsider Blog: http://codeclimber.net.nz RSS: http://feeds2.feedburner.com/codeclimber twitter: @simonech Any sufficiently advanced technology is indistinguishable from magic Life is short, play hard
[Lucene.Net] [jira] [Commented] (LUCENENET-380) Evaluate Sharpen as a port tool
[ https://issues.apache.org/jira/browse/LUCENENET-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010206#comment-13010206 ] Alex Thompson commented on LUCENENET-380: -

My thoughts on the fork have been to make something that would be useful beyond Lucene, and the scope of the problems seems to be beyond the scope of Lucene.Net, so I do think an independent project would be a more natural fit. And if we used, say, BitBucket, would the infrastructure really be that much of a barrier?
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010022#comment-13010022 ] Simon Willnauer commented on LUCENE-2310: -

Hey Chris, good that you reactivated this issue! I was looking into similar stuff while working on docvalues, since it really needs to add stuff to Field / Fieldable. With a cleanup and eventually FieldType this would be way less painful, I guess. I have a couple of questions and comments on the current patch. Btw. I like the fact that the previous patch was uploaded March 21 2010 and the latest took one year to come up, on March 23 2011 :)

* Why do you reformat all the stuff in Field, is that necessary here at all? I mean it's needed eventually, but for the deprecation of things it only bloats the patch, doesn't it?
* When you deprecate AbstractField and Fieldable, Field should ideally be a standalone class. I see that it still needs to subclass Fieldable / AbstractField, but could it stand alone now, so that we can simply remove the extends / implements on Field once we drop things in 4.0? It looks good from looking at the patch, though.
* I don't like the name getAllFields on Document, since it implies that we have a getPartialFields or something. I see that you cannot use getFields, since it only differs in return type, which doesn't belong to the signature. Maybe we should implement Iterable<Field> here and offer an additional method getFieldsAsList, or maybe getFields(List<Field> fields).
* Once we have this in, what are the next steps towards FieldType? Will we have only one class Field that is backed by a FieldType but still offers the methods it has now? Or do we have two totally new classes, FieldType and FieldValue, something like this:
{code}
class FieldValue {
  FieldType type;
  float boost;
  String name;
  Object value;
}
{code}
* I wonder if this patch raises tons of deprecation warnings all over Lucene where Fieldable was used? In IW we use it all over the place. We must fix that in this issue too, otherwise Uwe will go mad, I guess :)

Thanks for bringing this up again!

Reduce Fieldable, AbstractField and Field complexity Key: LUCENE-2310 URL: https://issues.apache.org/jira/browse/LUCENE-2310 Project: Lucene - Java Issue Type: Sub-task Components: Index Reporter: Chris Male Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-DocumentGetFields-core.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch

In order to move field-type-like functionality into its own class, we really need to try to tackle the hierarchy of Fieldable, AbstractField and Field. Currently AbstractField depends on Field, and does not provide much more functionality than storing fields, most of which is being moved over to FieldType. Therefore it seems ideal to try to deprecate AbstractField (and possibly Fieldable), moving much of the functionality into Field and FieldType.

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
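Simon's Iterable<Field> suggestion can be sketched roughly as follows. This is a hypothetical, stand-alone illustration, not Lucene's actual API: the names SimpleField, SimpleDocument, count() and getFieldsAsList() are made up for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for Lucene's Field and Document, illustrating the
// Iterable<Field>-plus-count() idea from the discussion above.
class SimpleField {
    final String name;
    final String value;

    SimpleField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

class SimpleDocument implements Iterable<SimpleField> {
    private final List<SimpleField> fields = new ArrayList<SimpleField>();

    void add(SimpleField field) {
        fields.add(field);
    }

    // Iterating replaces getAllFields(): callers just use for-each.
    public Iterator<SimpleField> iterator() {
        return fields.iterator();
    }

    // count() covers the common case of fetching the List only for its size.
    int count() {
        return fields.size();
    }

    // Explicit variant for callers that really need a List (defensive copy).
    List<SimpleField> getFieldsAsList() {
        return new ArrayList<SimpleField>(fields);
    }
}
```

The return-type clash Simon mentions goes away because iteration is exposed through the interface rather than through a second getFields() overload.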
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010027#comment-13010027 ] Chris Male commented on LUCENE-2310:

Thanks for taking a look at this Simon.

bq. Why do you reformat all the stuff in Field, is that necessary here at all? I mean it's needed eventually, but for the deprecation of things it only bloats the patch, doesn't it?

Because for me this issue is about reducing the complexity of these classes, and Field is a mess. Making it more readable reduces the complexity. If need be I will do this in two patches, but I don't feel this issue is resolved till the code in Field is readable.

bq. When you deprecate AbstractField and Fieldable, Field should ideally be a standalone class. I see that it still needs to subclass Fieldable / AbstractField, but could it stand alone now, so that we can simply remove the extends / implements on Field once we drop things in 4.0?

I don't really understand what you're suggesting here. In 3.x, where the deprecations will be occurring, Field has to continue to extend AbstractField. Yes, in 4.0 we can drop that extension, but addressing the deprecations is not in the scope of 3.x.

bq. I don't like the name getAllFields on Document, since it implies that we have a getPartialFields or something. Maybe we should implement Iterable<Field> here and offer an additional method getFieldsAsList, or maybe getFields(List<Field> fields).

Yeah, good call. I think implementing Iterable<Field> is best, but it will also require adding a count() method to Document, since people often retrieve the List just to get the number of fields.

bq. Once we have this in, what are the next steps towards FieldType? Will we have only one class Field that is backed by a FieldType but still offers the methods it has now? Or do we have two totally new classes, FieldType and FieldValue?

Once FieldType is in, all the various metadata properties (isIndexed, isStored etc.) will be moved to FieldType, leaving Field as what you suggest as FieldValue. Field will contain its type, boost, name and value. If we have Analyzers on FieldTypes, then we will be able to remove the TokenStream from Field.

bq. I wonder if this patch raises tons of deprecation warnings all over Lucene where Fieldable was used? In IW we use it all over the place. We must fix that in this issue too, otherwise Uwe will go mad, I guess.

Yeah, but not in 3.x unfortunately. As it stands people can retrieve the List of Fieldables via getFields() and add whatever implementation of Fieldable they like. Consequently we need to continue to support Fieldable in IW, for example. Once this code has been committed I will create a new patch for trunk which moves all of Solr and Lucene over to Field. I could do this in many places already, of course, but core classes like IW would have to remain as they are.

I will wait for your thoughts on the reformatting and then make a new patch.
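Chris's description of the split — metadata on a shared FieldType, with Field reduced to type, boost, name and value — can be sketched like this. These are hypothetical stand-in classes (FieldTypeSketch, FieldSketch) to illustrate the shape of the proposal, not the API that eventually landed in Lucene.

```java
// Metadata lives on a reusable, immutable FieldType-like object...
class FieldTypeSketch {
    final boolean indexed;
    final boolean stored;

    FieldTypeSketch(boolean indexed, boolean stored) {
        this.indexed = indexed;
        this.stored = stored;
    }
}

// ...while the field itself carries only per-value state, as Simon's
// FieldValue snippet suggested: type, name, value, boost.
class FieldSketch {
    final FieldTypeSketch type;
    final String name;
    final Object value;
    float boost = 1.0f;

    FieldSketch(FieldTypeSketch type, String name, Object value) {
        this.type = type;
        this.name = name;
        this.value = value;
    }
}
```

One FieldTypeSketch instance can then be shared by every field of the same kind across many documents, instead of each Field carrying its own isIndexed/isStored flags.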
Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]
Hey Simon and all,

May we get an update on this? I understand that Google has published the list of accepted organizations, which -- not surprisingly -- includes the ASF. Is there any information on how many slots Apache got, and which issues will be selected? The student application period opens on the 28th, so I'm just wondering if I should go ahead and apply or wait for the decision.

Thanks, David

On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote: Hey folks, Google Summer of Code 2011 is very close and the Project Applications Period has started recently. Now it's time to get some excited students on board for this year's GSoC. I encourage students to submit an application to the Google Summer of Code web application. Lucene and Solr are amazing projects, and GSoC is an incredible opportunity to join the community and push the project forward. If you are a student and you are interested in spending some time on a great open source project while getting paid for it, you should submit your application from March 28 - April 8, 2011. There are only 3 weeks until this process starts! Quote from the GSoC website: "We hear almost universally from our mentoring organizations that the best applications they receive are from students who took the time to interact and discuss their ideas before submitting an application, so make sure to check out each organization's Ideas list to get to know a particular open source organization better." So if you have any ideas what Lucene and Solr should have, or if you find any of the GSoC pre-selected projects [1] interesting, please join us on dev@lucene.apache.org [2]. Since you as a student must apply for a certain project via the GSoC website [3], it's a good idea to work on it ahead of time and include the community and possible mentors as soon as possible. Open source development here at the Apache Software Foundation happens almost exclusively in public, and I encourage you to follow this.
Don't mail folks privately; please use the mailing list to get the best possible visibility, attract interested community members and push your idea forward. As always, it's the idea that counts, not the person! That said, please do not underestimate the complexity of even small GSoC projects. Don't try to rewrite Lucene or Solr! A project usually gains more from a smaller, well-discussed, carefully crafted and tested feature than from a half-baked monster change that's too large to work with. Once your proposal has been accepted and you begin work, you should give the community the opportunity to iterate with you. We prefer progress over perfection, so don't hesitate to describe your overall vision, but when the rubber meets the road let's take it in small steps. A code patch of 20 KB is likely to be reviewed very quickly so you get fast feedback, while a patch even 60 KB in size can take very long. So try to break up your vision and the community will work with you to get things done!

On behalf of the Lucene and Solr community: Go! Join the mailing list and apply for GSoC 2011,

Simon

[1] https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=truejqlQuery=labels+%3D+lucene-gsoc-11 [2] http://lucene.apache.org/java/docs/mailinglists.html [3] http://www.google-melange.com
[jira] [Created] (LUCENE-2983) FieldInfos should be read-only if loaded from disk
FieldInfos should be read-only if loaded from disk -- Key: LUCENE-2983 URL: https://issues.apache.org/jira/browse/LUCENE-2983 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Minor Fix For: 4.0

Currently FieldInfos creates a private FieldNumberBiMap when it is loaded from a directory, which is necessary due to a limitation we need to face with IW#addIndexes(Dir). If we add an index via a directory to an existing index, field numbers can conflict with the global field numbers in the IW receiving the directories. Those field number conflicts will remain until those segments are merged and we stabilize again based on the IW global field numbers. Yet, we are unnecessarily creating a BiMap here, where we actually should enforce read-only semantics, since nobody should modify a FieldInfos instance loaded from the directory. If somebody needs a modifiable copy, they should simply create a new one and add all FieldInfo instances to it.
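The read-only semantics the issue asks for can be sketched with a simple pattern: a loaded instance rejects mutation, and callers who need to modify it take a copy first. This is a hypothetical illustration (FieldInfosSketch, addOrGet, mutableCopy are invented names), not the actual Lucene FieldInfos API or the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "read-only if loaded from disk": the loaded instance refuses
// to assign new field numbers; callers copy it before modifying.
class FieldInfosSketch {
    private final Map<String, Integer> numbersByField = new HashMap<String, Integer>();
    private final boolean readOnly;

    FieldInfosSketch(boolean readOnly) {
        this.readOnly = readOnly;
    }

    boolean isReadOnly() {
        return readOnly;
    }

    // Returns the existing number for a field, or assigns the next free one.
    int addOrGet(String field) {
        Integer number = numbersByField.get(field);
        if (number != null) {
            return number;
        }
        if (readOnly) {
            throw new IllegalStateException("loaded from disk: read-only");
        }
        number = numbersByField.size();
        numbersByField.put(field, number);
        return number;
    }

    // The modifiable copy the issue description suggests: a new instance
    // seeded with all existing FieldInfo entries.
    FieldInfosSketch mutableCopy() {
        FieldInfosSketch copy = new FieldInfosSketch(false);
        copy.numbersByField.putAll(numbersByField);
        return copy;
    }
}
```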
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010030#comment-13010030 ] Simon Willnauer commented on LUCENE-2310: -

bq. I don't really understand what you're suggesting here. In 3.x, where the deprecations will be occurring, Field has to continue to extend AbstractField. Yes, in 4.0 we can drop that extension, but addressing the deprecations is not in the scope of 3.x.

What I mean here is: if I simply removed the extends AbstractField from Field, would it still compile, or are there any dependencies on AbstractField? IMO AbstractField should just be empty now, right?
[jira] [Updated] (LUCENE-2983) FieldInfos should be read-only if loaded from disk
[ https://issues.apache.org/jira/browse/LUCENE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2983: Attachment: LUCENE-2983.patch

here is a patch with tests. All tests pass
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010032#comment-13010032 ] Chris Male commented on LUCENE-2310:

Yes, Field would still compile if you removed the extends. However, if we empty AbstractField then any client code that also extends AbstractField would break. That's why I deprecate the whole class but leave its code in. We could empty it and change it to extend Field; I think that would still work.
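The option Chris mentions at the end — emptying AbstractField and flipping it to extend Field — would look roughly like this. The class names here (FieldImpl, AbstractFieldShell, LegacyClientField) are hypothetical stand-ins used only to show why existing subclasses keep compiling.

```java
// The concrete class absorbs all behavior...
class FieldImpl {
    private final String name;

    FieldImpl(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

// ...and the deprecated abstract class becomes an empty shell extending it,
// so client code written against the old hierarchy still compiles.
@Deprecated
class AbstractFieldShell extends FieldImpl {
    AbstractFieldShell(String name) {
        super(name);
    }
}

// A pre-existing client subclass: it now transitively is-a FieldImpl.
class LegacyClientField extends AbstractFieldShell {
    LegacyClientField() {
        super("legacy");
    }
}
```

The trade-off is exactly the one discussed above: the shell keeps old subclasses alive through the deprecation window, at the cost of an extra inheritance hop until 4.0 removes it.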
[jira] [Created] (LUCENE-2984) Move hasVectors() hasProx() responsibility out of SegmentInfo to FieldInfos
Move hasVectors() hasProx() responsibility out of SegmentInfo to FieldInfos -- Key: LUCENE-2984 URL: https://issues.apache.org/jira/browse/LUCENE-2984 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.0

Spin-off from LUCENE-2881, which had this change already; due to some random failures related to this change, I removed this part of the patch to make it more isolated and easier to test.
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010035#comment-13010035 ] Simon Willnauer commented on LUCENE-2310: -

{quote} Yeah, but not in 3.x unfortunately. As it stands people can retrieve the List of Fieldables via getFields() and add whatever implementation of Fieldable they like. Consequently we need to continue to support Fieldable in IW, for example. Once this code has been committed I will create a new patch for trunk which moves all of Solr and Lucene over to Field. I could do this in many places already, of course, but core classes like IW would have to remain as they are. {quote}

So, what is the reason for doing this in 3.x at all? Can't we simply drop stuff in 4.0 and leave 3.x alone?

Simon
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010036#comment-13010036 ] Chris Male commented on LUCENE-2310:

bq. So, what is the reason for doing this in 3.x at all? Can't we simply drop stuff in 4.0 and leave 3.x alone?

Very good question. Certainly we are simplifying the codebase, and I feel that Field is what most users use (not AbstractField). But I know some expert users do use AbstractField. Then again, maybe they can handle the hard change?
[jira] [Updated] (LUCENE-2984) Move hasVectors() hasProx() responsibility out of SegmentInfo to FieldInfos
[ https://issues.apache.org/jira/browse/LUCENE-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2984: Description: Spin-off from LUCENE-2881 which had this change already but due to some random failures related to this change I remove this part of the patch to make it more isolated and easier to test. (was: Spin-off from LUCENe-2881 which had this change already but due to some random failures related to this change I remove this part of the patch to make it more isolated and easier to test.)
[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010039#comment-13010039 ] Doron Cohen commented on LUCENE-2980: -

bq. Perhaps we should add a specific test in CSTest for this problem? I wouldn't use file.delete() as an indicator because on Linux it will pass

Agree, I'll add one.

Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text) --- Key: LUCENE-2980 URL: https://issues.apache.org/jira/browse/LUCENE-2980 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2980.patch, LUCENE-2980.patch

file.gz is correctly handled as gzip, but file.GZ is handled as text, which is wrong.
[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010043#comment-13010043 ] Doron Cohen commented on LUCENE-2980: - bq. Perhaps we should add a specific test in CSTest for this problem? I wouldn't use file.delete() as an indicator because on Linux it will pass Changed my mind about adding this test to ContentSourceTest - I think such a test fits better in the CommonCompress project, because it should directly call CompressorStreamFactory.createCompressorInputStream(in). In our test we invoke ContentSource.getInputStream(File) and so we cannot pass such a close-sensing stream. But this is a valid point; in particular, the test case I provided to COMPRESS-127 will fail on Windows but will likely pass on Linux. I'll add a reference to your comment in COMPRESS-127. Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text) --- Key: LUCENE-2980 URL: https://issues.apache.org/jira/browse/LUCENE-2980 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2980.patch, LUCENE-2980.patch file.gz is correctly handled as gzip, but file.GZ is handled as text, which is wrong.
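The fix being discussed boils down to normalizing the file name before matching its suffix, so that "file.GZ" resolves to the same type as "file.gz". A minimal sketch of that idea — the class and enum names here are made up for illustration, not taken from the actual benchmark ContentSource code:

```java
import java.util.Locale;

// Illustrative sketch of case-insensitive file-type detection by suffix.
public class FileTypeDetect {
    enum Type { GZIP, BZIP2, TEXT }

    static Type detect(String fileName) {
        // Normalize with a fixed locale before comparing suffixes,
        // so "file.GZ" matches ".gz" just like "file.gz" does.
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".gz") || name.endsWith(".gzip")) {
            return Type.GZIP;
        }
        if (name.endsWith(".bz2") || name.endsWith(".bzip")) {
            return Type.BZIP2;
        }
        return Type.TEXT; // anything else is treated as plain text
    }
}
```

Using Locale.ROOT (rather than the platform default) also avoids surprises in locales with unusual case mappings, such as the Turkish dotless i.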
[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2977: Summary: WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name (was: WriteLineDocTask should write gzip/bzip2/txt according to the extension of specifie output file name) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name - Key: LUCENE-2977 URL: https://issues.apache.org/jira/browse/LUCENE-2977 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Since the readers behave this way, it would be nice and handy if this line writer did too.
[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010064#comment-13010064 ] Shai Erera commented on LUCENE-2980: Agreed. Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text) --- Key: LUCENE-2980 URL: https://issues.apache.org/jira/browse/LUCENE-2980 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2980.patch, LUCENE-2980.patch file.gz is correctly handled as gzip, but file.GZ is handled as text, which is wrong.
[jira] [Updated] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2980: Attachment: LUCENE-2980.patch Updated patch applies the workaround only for the GZIP format, as the other types do close their wrapped stream (COMPRESS-127). Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text) --- Key: LUCENE-2980 URL: https://issues.apache.org/jira/browse/LUCENE-2980 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2980.patch, LUCENE-2980.patch, LUCENE-2980.patch file.gz is correctly handled as gzip, but file.GZ is handled as text, which is wrong.
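The GZIP workaround referenced above can be pictured as a small close-delegating wrapper: when a decompressor stream does not close the stream it wraps (the COMPRESS-127 behavior), close() has to be propagated by hand. This is a hedged sketch under assumed names, not the actual ContentSource code:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative wrapper: its close() closes both the decompressor stream
// and the raw stream underneath it, for decompressors that do not
// propagate close() themselves. The class name is made up for this sketch.
public class CloseDelegatingInputStream extends FilterInputStream {
    private final InputStream wrapped; // the raw (file) stream underneath

    public CloseDelegatingInputStream(InputStream decompressor, InputStream wrapped) {
        super(decompressor);
        this.wrapped = wrapped;
    }

    @Override
    public void close() throws IOException {
        try {
            in.close(); // close the decompressor first
        } finally {
            wrapped.close(); // then make sure the underlying stream is released
        }
    }
}
```

On Windows the leaked file handle keeps the file locked, which is why the missing close() shows up there but tends to pass silently on Linux.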
[jira] [Resolved] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-2980. - Resolution: Fixed Lucene Fields: (was: [New]) Committed: - trunk: r1084544, r1084549 - 3x: r1084552 Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text) --- Key: LUCENE-2980 URL: https://issues.apache.org/jira/browse/LUCENE-2980 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2980.patch, LUCENE-2980.patch, LUCENE-2980.patch file.gz is correctly handled as gzip, but file.GZ is handled as text, which is wrong.
[jira] [Commented] (LUCENE-2982) Get rid of ContentSource's workaround for closing b/gzip input stream once this is fixed in CommonCompress
[ https://issues.apache.org/jira/browse/LUCENE-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010086#comment-13010086 ] Doron Cohen commented on LUCENE-2982: - COMPRESS-127 was fixed, so whenever a new CommonsCompress release is available, we should be able to complete this one. I subscribed to annou...@apache.org to be notified when that happens... Get rid of ContentSource's workaround for closing b/gzip input stream once this is fixed in CommonCompress - Key: LUCENE-2982 URL: https://issues.apache.org/jira/browse/LUCENE-2982 Project: Lucene - Java Issue Type: Task Components: contrib/benchmark Reporter: Doron Cohen Priority: Minor Once COMPRESS-127 is fixed, get rid of the entire workaround method ContentSource.closableCompressorInputStream(). It would simplify the code and would perform better without that delegation.
[jira] [Created] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing
Build SegmentCodecs incrementally for consistent codecIDs during indexing - Key: LUCENE-2985 URL: https://issues.apache.org/jira/browse/LUCENE-2985 Project: Lucene - Java Issue Type: Improvement Components: Codecs, Index Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: CSF branch, 4.0 Currently we build the SegmentCodecs during flush, which is fine as long as no codec needs to know which fields it should handle. This will change with DocValues or when we expose StoredFields / TermVectors via Codec (see LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a consistent view of which codec belongs to which field during indexing and all FieldInfo instances are unassigned (set to -1). Instead we should build the SegmentCodecs incrementally as fields come in, so that no matter when a codec needs to be selected to process a document / field we have the right codec ID assigned.
Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]
On Wed, Mar 23, 2011 at 9:37 AM, David Nemeskey nemeskey.da...@sztaki.hu wrote: Hey Simon and all, May we get an update on this? I understand that Google has published the list of accepted organizations, which -- not surprisingly -- includes the ASF. Is there any information on how many slots Apache got, and which issues will be selected? The student application period opens on the 28th, so I'm just wondering if I should go ahead and apply or wait for the decision. David, you should go ahead and apply via the GSoC website and reference the issue there; this is how I understand it works. We will later rate the proposals from the GSoC website and decide which we choose. This is also when slots get assigned. simon Thanks, David On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote: Hey folks, Google Summer of Code 2011 is very close and the Project Applications Period has started recently. Now it's time to get some excited students on board for this year's GSoC. I encourage students to submit an application to the Google Summer of Code web-application. Lucene & Solr are amazing projects and GSoC is an incredible opportunity to join the community and push the project forward. If you are a student and you are interested in spending some time on a great open source project while getting paid for it, you should submit your application from March 28 - April 8, 2011. There are only 3 weeks until this process starts! Quote from the GSoC website: We hear almost universally from our mentoring organizations that the best applications they receive are from students who took the time to interact and discuss their ideas before submitting an application, so make sure to check out each organization's Ideas list to get to know a particular open source organization better. So if you have any ideas about what Lucene & Solr should have, or if you find any of the GSoC pre-selected projects [1] interesting, please join us on dev@lucene.apache.org [2]. 
Since you as a student must apply for a certain project via the GSoC website [3], it's a good idea to work on it ahead of time and include the community and possible mentors as soon as possible. Open source development here at the Apache Software Foundation happens almost exclusively in the public and I encourage you to follow this. Don't mail folks privately; please use the mailing list to get the best possible visibility and attract interested community members and push your idea forward. As always, it's the idea that counts, not the person! That said, please do not underestimate the complexity of even small GSoC projects. Don't try to rewrite Lucene or Solr! A project usually gains more from a smaller, well-discussed and carefully crafted, tested feature than from a half-baked monster change that's too large to work with. Once your proposal has been accepted and you begin work, you should give the community the opportunity to iterate with you. We prefer progress over perfection, so don't hesitate to describe your overall vision, but when the rubber meets the road let's take it in small steps. A code patch of 20 KB is likely to be reviewed very quickly, so you get fast feedback, while a patch even 60 KB in size can take very long. So try to break up your vision and the community will work with you to get things done! On behalf of the Lucene & Solr community, Go! join the mailing list and apply for GSoC 2011, Simon [1] https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+lucene-gsoc-11 [2] http://lucene.apache.org/java/docs/mailinglists.html [3] http://www.google-melange.com
[jira] [Updated] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing
[ https://issues.apache.org/jira/browse/LUCENE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2985: Attachment: LUCENE-2985.patch Here is an initial patch that uses a SegmentCodecBuilder to assign codec IDs during indexing in DocFieldProcessorPerThread. Build SegmentCodecs incrementally for consistent codecIDs during indexing - Key: LUCENE-2985 URL: https://issues.apache.org/jira/browse/LUCENE-2985 Project: Lucene - Java Issue Type: Improvement Components: Codecs, Index Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: CSF branch, 4.0 Attachments: LUCENE-2985.patch Currently we build the SegmentCodecs during flush, which is fine as long as no codec needs to know which fields it should handle. This will change with DocValues or when we expose StoredFields / TermVectors via Codec (see LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a consistent view of which codec belongs to which field during indexing and all FieldInfo instances are unassigned (set to -1). Instead we should build the SegmentCodecs incrementally as fields come in, so that no matter when a codec needs to be selected to process a document / field we have the right codec ID assigned.
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #70: POMs out of sync
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-3.x/70/ No tests ran. Build Log (for compile errors): [...truncated 22 lines...]
Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
Hey Dawid, Thanks for doing this. It would be good, too, if we no longer had to pass in -Dsolr.clustering.enabled=true as there is no reason why we can't just have it on like the other components. -Grant On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote: Author: dweiss Date: Tue Mar 22 20:44:21 2011 New Revision: 1084345 URL: http://svn.apache.org/viewvc?rev=1084345&view=rev Log: Removing the note about excluded JARs (everything is included). Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff == --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original) +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 20:44:21 2011 @@ -1183,12 +1183,10 @@ http://wiki.apache.org/solr/ClusteringComponent - This relies on third party jars which are not included in the - release. To use this component (and the /clustering handler) - Those jars will need to be downloaded, and you'll need to set - the solr.cluster.enabled system property when running solr... + You'll need to set the solr.cluster.enabled system property + when running solr to run with clustering enabled: - java -Dsolr.clustering.enabled=true -jar start.jar + java -Dsolr.clustering.enabled=true -jar start.jar -- searchComponent name=clustering enable=${solr.clustering.enabled:false}
[jira] [Resolved] (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.
[ https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-2967. - Resolution: Won't Fix Lucene Fields: (was: [New]) I spent some time on this. It's quite fascinating: the number of collisions for the default probing is smaller than: a) linear probing with murmurhash mix of the original hash b) linear probing without murmurhash mix (start from raw hash only). Curiously, the number of collisions for (b) is smaller than for (a) -- this could be explained if we assume bits are spread evenly throughout the entire 32-bit range after murmurhash, so after masking to table size there should be more collisions on lower bits compared to a raw hash (this would have more collisions on upper bits and fewer on lower bits because it is multiplicative... or at least I think so). Anyway, I tried many different versions and I don't see any significant difference in favor of linear probing here. I measured the GC overhead during my tests too, but it is not the primary factor contributing to the total cost of constructing the FST (about 3-5% of the total time, running in parallel, typically). Use linear probing with an additional good bit avalanching function in FST's NodeHash. -- Key: LUCENE-2967 URL: https://issues.apache.org/jira/browse/LUCENE-2967 Project: Lucene - Java Issue Type: Improvement Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Trivial Fix For: 4.0 Attachments: LUCENE-2967.patch I recently had an interesting discussion with Sebastiano Vigna (fastutil), who suggested that linear probing, given a hash mixing function with good avalanche properties, is a way better method of constructing lookups in associative arrays compared to quadratic probing. Indeed, with linear probing you can implement removals from a hash map without removed slot markers, and linear probing has nice properties with respect to modern CPUs (caches). 
I've reimplemented HPPC's hash maps to use linear probing and we observed a nice speedup (the same applies for fastutil, of course). This patch changes NodeHash's implementation to use linear probing. The code is a bit simpler (I think :). I also moved the load factor to a constant -- 0.5 seems like a generous load factor, especially if we allow large FSTs to be built. I don't see any significant speedup in constructing large automata, but there is no slowdown either (I checked on one machine only for now, but will verify on other machines too).
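For readers following the thread, the technique being compared can be sketched as a tiny open-addressed int set: linear probing steps one slot at a time, and a MurmurHash3-style finalizer supplies the bit avalanching so clustered keys spread across the table. This is only an illustration of the idea with made-up names, not the NodeHash code:

```java
import java.util.Arrays;

// Minimal open-addressed int set using linear probing plus a
// MurmurHash3-style finalizer for avalanching. Illustrative only.
public class LinearProbingIntSet {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel; assumes keys never equal it
    private int[] slots;
    private int size;

    public LinearProbingIntSet(int capacity) {
        // Round up to a power of two so masking can replace modulo.
        slots = new int[Integer.highestOneBit(capacity * 2 - 1) * 2];
        Arrays.fill(slots, EMPTY);
    }

    // MurmurHash3 finalizer: cheap, with good avalanche properties.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public boolean add(int key) {
        if (size * 2 >= slots.length) grow(); // keep load factor <= 0.5
        int mask = slots.length - 1;
        int i = mix(key) & mask;
        while (slots[i] != EMPTY) {          // linear probing: step by one
            if (slots[i] == key) return false;
            i = (i + 1) & mask;
        }
        slots[i] = key;
        size++;
        return true;
    }

    public boolean contains(int key) {
        int mask = slots.length - 1;
        int i = mix(key) & mask;
        while (slots[i] != EMPTY) {
            if (slots[i] == key) return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    private void grow() {
        int[] old = slots;
        slots = new int[old.length * 2];
        Arrays.fill(slots, EMPTY);
        size = 0;
        for (int k : old) if (k != EMPTY) add(k);
    }
}
```

Without the mix() step, keys that share low bits (e.g. multiples of a power of two) land on the same slots after masking and form long probe chains; the finalizer is what makes linear probing's cache-friendly sequential scan pay off.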
Re: [VOTE] Release Lucene/Solr 3.1
+1 * Ran Solr example * Perused entire structure of both binary and source distros Noticed the minor issues others have reported; to echo Ryan, none seem like blockers to me. And also to echo Ryan's thanks: huge thanks for everyone's hard work on the 3.1 Lucene/Solr release(s). This is a big milestone for the technology and community. Erik On Mar 22, 2011, at 23:42, Ryan McKinley wrote: +1 * Walked through the solr example * Tested a simple maven project, worked well I don't think the minor issues listed so far are blockers Thanks to everyone who worked on this! ryan On Tue, Mar 22, 2011 at 10:21 AM, Yonik Seeley yo...@lucidimagination.com wrote: Please vote to release the artifacts at http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2 as Lucene 3.1 and Solr 3.1 Thanks for everyone's help pulling all this together! -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010110#comment-13010110 ] Mark Harwood commented on LUCENE-2454: -- bq. I have not looked at this patch so this comment may be off base. The slideshare deck gives a good overview: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene As a simple Lucene-focused addition I'd prefer not to explore all the possible implications for Solr adoption here. The affected areas in Solr are extensive and would include schema definitions, query syntax, facets/filter caching, result-fetching, DIH, etc. Probably best discussed elsewhere. Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
[jira] [Updated] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili updated SOLR-2436: -- Attachment: SOLR-2436_2.patch Hello Koji, I've tested your patch; I needed to align it with the latest applied patch (see SOLR-2387) to make the tests work (see attached patch). In my opinion this solution is better than the current one as it reflects the Solr way of specifying parameters in Handlers. However, I think it would be good if it were also possible to get rid of the uimaConfig file, defining each parameter inside the Processor with Solr elements (str/lst/int, etc.) instead. move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag.
[jira] [Issue Comment Edited] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010112#comment-13010112 ] Tommaso Teofili edited comment on SOLR-2436 at 3/23/11 1:26 PM: Hello Koji, I've tested your patch; I needed to align it with the latest applied patch (see SOLR-2387) to make the tests work (see attached patch). In my opinion the solution you're proposing is better than the current one as it reflects the Solr way of specifying parameters in Handlers. However, I think it would be good if it were also possible to get rid of the uimaConfig file, defining each parameter inside the Processor with Solr elements (str/lst/int, etc.) instead. was (Author: teofili): Hello Koji, I've tested your patch, I needed to align it to latest patch applied (see SOLR-2387) to make tests work (see attached patch). In my opinion this solution is better than the current one as it reflects the Solr way of specifying parameters in Handlers. However I think it should be good if it was possible to alternatively get rid of the uimaConfig file defining each parameter inside the Processor with Solr elements (str/lst/int etc.) as well. move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag.
Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
On Mar 23, 2011, at 9:20 AM, Dawid Weiss wrote: Sure, I'll change it. Can I alter branch_3x too? That's fine to change 3_x, the 3.1 release is on lucene_solr_3_1 (or something similar). This way it will be on in 3.2. -Grant Don't know what the policy is after the RCs have been published. Dawid On Wed, Mar 23, 2011 at 2:07 PM, Grant Ingersoll gsing...@apache.org wrote: Hey Dawid, Thanks for doing this. It would be good, too, if we no longer had to pass in -Dsolr.clustering.enabled=true as there is no reason why we can't just have it on like the other components. -Grant On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote: Author: dweiss Date: Tue Mar 22 20:44:21 2011 New Revision: 1084345 URL: http://svn.apache.org/viewvc?rev=1084345&view=rev Log: Removing the note about excluded JARs (everything is included). Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff == --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original) +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 20:44:21 2011 @@ -1183,12 +1183,10 @@ http://wiki.apache.org/solr/ClusteringComponent - This relies on third party jars which are not included in the - release. To use this component (and the /clustering handler) - Those jars will need to be downloaded, and you'll need to set - the solr.cluster.enabled system property when running solr... 
+ You'll need to set the solr.cluster.enabled system property + when running solr to run with clustering enabled: - java -Dsolr.clustering.enabled=true -jar start.jar + java -Dsolr.clustering.enabled=true -jar start.jar -- searchComponent name=clustering enable=${solr.clustering.enabled:false} -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
[jira] [Updated] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2573: Attachment: LUCENE-2573.patch Here is my current state on this issue. I didn't add all JDocs needed (by far) and I will wait until we have settled on the API for FlushPolicy. * I removed the complex TieredFlushPolicy entirely and added one DefaultFlushPolicy that flushes at IWC.getRAMBufferSizeMB() / sets the biggest DWPT pending. * DW will stall threads if we reach 2 x maxNetRam, which is retrieved from FlushPolicy so folks can lower that depending on their env. * DWFlushControl checks if a single DWPT grows too large and sets it forcefully pending once its RAM consumption is 1.9 GB. That should be enough buffer to not reach the 2048MB limit. We should consider making this configurable. * FlushPolicy now has three methods, onInsert, onUpdate and onDelete, while DefaultFlushPolicy only implements onInsert and onDelete; the abstract base class just calls those on an update. * I removed FlushControl from IW * added documentation on IWC for FlushPolicy and removed the jdocs for the RAM limit. I think we should add some lines about how RAM is now used and that users should balance the RAM with the number of threads they are using. Will do that later on though. * For testing I added a ThrottledIndexOutput that makes flushing slow so I can test if we are stalled and / or blocked. This is passed to MockDirectoryWrapper. It's currently under util but should rather go under store, no? * byte consumption is now committed before FlushPolicy is called since we don't have the multitier flush which required that to reliably proceed across tier boundaries (not required, but it was easier to test really). So FP doesn't need to take care of the delta * FlushPolicy now also flushes on maxBufferedDeleteTerms, while the buffered delete terms are not yet connected to DW#getNumBufferedDeleteTerms(), which causes some failures though. 
I added //nocommit @Ignore to those tests. * this patch also contains an @Ignore on TestPersistentSnapshotDeletionPolicy; I couldn't figure out why it is failing, but it could be due to an old version of LUCENE-2881 on this branch. I will see if it still fails once we have merged. * Healthiness now doesn't stall if we are not flushing on RAM consumption, to ensure we don't lock in threads. Overall this seems much closer now. I will start writing jdocs. Flush on buffered delete terms might need some tests, and I should also write a more reliable test for Healthiness... currently it relies on the ThrottledIndexOutput slowing down indexing enough to block, which might not always be the case. It hasn't failed yet. Tiered flushing of DWPTs by RAM with low/high water marks - Key: LUCENE-2573 URL: https://issues.apache.org/jira/browse/LUCENE-2573 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach: - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM) - Flush all DWPTs at a high water mark (e.g. at 110%) - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%. Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?
[jira] [Created] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity
divorce defaultsimilarityprovider from defaultsimilarity Key: LUCENE-2986 URL: https://issues.apache.org/jira/browse/LUCENE-2986 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Priority: Minor Fix For: 4.0 In LUCENE-2236 as a start, we made DefaultSimilarity which implements the factory interface (SimilarityProvider), and also extends Similarity. Its factory interface just returns itself always by default. Doron mentioned it would be cleaner to split the two, and I thought it would be good to revisit it later. Today as I was looking at SOLR-2338, it became pretty clear that we should do this; it makes things a lot cleaner. I think it's currently confusing to users to see the two APIs mixed if they are trying to subclass. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity
[ https://issues.apache.org/jira/browse/LUCENE-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2986: Attachment: LUCENE-2986.patch Attached is a patch: adds DefaultSimilarityProvider, which has our default implementations of the non-field-specific methods (coord/queryNorm/etc), and always returns DefaultSimilarity. divorce defaultsimilarityprovider from defaultsimilarity Key: LUCENE-2986 URL: https://issues.apache.org/jira/browse/LUCENE-2986 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Priority: Minor Fix For: 4.0 Attachments: LUCENE-2986.patch In LUCENE-2236 as a start, we made DefaultSimilarity which implements the factory interface (SimilarityProvider), and also extends Similarity. Its factory interface just returns itself always by default. Doron mentioned it would be cleaner to split the two, and I thought it would be good to revisit it later. Today as I was looking at SOLR-2338, it became pretty clear that we should do this, it makes things a lot cleaner. I think currently its confusing to users to see the two apis mixed if they are trying to subclass. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2977: Attachment: LUCENE-2977.patch Patch for auto-detecting output compression mode of result line file: - getInputStream() moved from ContentSource to a new class StreamUtils under util. It is now named inputStream(File). - outputStream() method added to StreamUtils. Before applying this patch *svn mv modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/ContentSourceTest.java modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/utils/StreamUtilsTest.java* I kept for now the force-bzip logic in WriteLineDocTask but I would like to remove it - it is strange, and in any case LineDocSource would only auto-detect bzip input format if WriteLineDocTask was able to auto-detect bzip output format. Removing it will also simplify StreamUtils. Any opinions on removing this force-bzip option? WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name - Key: LUCENE-2977 URL: https://issues.apache.org/jira/browse/LUCENE-2977 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2977.patch Since the readers behave this way it would be nice and handy if also this line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
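The extension-based auto-detection described in this patch can be roughly illustrated as follows. This is a simplified stand-in for the idea behind StreamUtils, not the actual patch code; the class and method names are hypothetical:

```java
import java.util.Locale;

public class CompressionSniffer {
    enum Type { GZIP, BZIP2, PLAIN }

    // Illustrative: choose a compression codec from the output file's
    // extension, the way the patch has outputStream() decide how to wrap
    // the stream (and inputStream(File) decide how to unwrap it).
    static Type fromFileName(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz") || lower.endsWith(".gzip")) return Type.GZIP;
        if (lower.endsWith(".bz2") || lower.endsWith(".bzip")) return Type.BZIP2;
        return Type.PLAIN; // any other extension is treated as plain text
    }

    public static void main(String[] args) {
        System.out.println(fromFileName("docs.txt.gz"));   // GZIP
        System.out.println(fromFileName("docs.txt.bz2"));  // BZIP2
        System.out.println(fromFileName("docs.txt"));      // PLAIN
    }
}
```

With detection on both ends, WriteLineDocTask and LineDocSource agree on the format purely through the file name, which is what makes the separate force-bzip switch redundant.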
[jira] [Updated] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-2945: - Attachment: LUCENE-2945d.patch Also has the changes to SpanNearClauseFactory. Surround Query doesn't properly handle equals/hashcode -- Key: LUCENE-2945 URL: https://issues.apache.org/jira/browse/LUCENE-2945 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1.1, 4.0 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, LUCENE-2945d.patch In looking at using the surround queries with Solr, I am hitting issues caused by collisions due to equals/hashcode not being implemented on the anonymous inner classes that are created by things like DistanceQuery (branch 3.x, near line 76) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010218#comment-13010218 ] Paul Elschot edited comment on LUCENE-2945 at 3/23/11 5:01 PM: --- New -2945d patch that also has the changes to SpanNearClauseFactory. was (Author: paul.elsc...@xs4all.nl): Also has the changes to SpanNearClauseFactory. Surround Query doesn't properly handle equals/hashcode -- Key: LUCENE-2945 URL: https://issues.apache.org/jira/browse/LUCENE-2945 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1.1, 4.0 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, LUCENE-2945d.patch In looking at using the surround queries with Solr, I am hitting issues caused by collisions due to equals/hashcode not being implemented on the anonymous inner classes that are created by things like DistanceQuery (branch 3.x, near line 76) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2438) Case Insensitive Search for Wildcard Queries
Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done by Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-2438: --- Attachment: SOLR-2438.patch Attached patch file Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Attachments: SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010263#comment-13010263 ] Shai Erera commented on LUCENE-2977: Patch looks good! In StreamUtils you have *.bz* -- it should be *.bz2* bq. Any opinions on removing this force-bzip option? +1 (you mean the bzip.compression property in WLDT right?). I think that it's reasonable to request the user to specify an output file with .bz2 extension if he wants bzip compression. I don't see how it will simplify StreamUtils though, but I trust you :) (perhaps you meant it will simplify WLDT?) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name - Key: LUCENE-2977 URL: https://issues.apache.org/jira/browse/LUCENE-2977 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2977.patch Since the readers behave this way it would be nice and handy if also this line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010268#comment-13010268 ] Peter Sturge commented on SOLR-2438: If you're like me, you may have often wondered why MyTerm, myterm, myter* and MyTer* can return different, and sometimes empty, results. This patch addresses this for wildcard queries by adding an attribute to relevant solr.TextField entries in schema.xml. The new attribute is called: {{ignoreCaseForWildcards}} Example entry in schema.xml:
{code:title=schema.xml [excerpt]|borderStyle=solid}
<fieldType name="text_lcws" class="solr.TextField" positionIncrementGap="100" ignoreCaseForWildcards="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
{code}
It's worth noting that this will lower-case text for ALL terms that match the field type - including synonyms and stemmers. For backward compatibility, the default behaviour is as before - i.e. a case-sensitive wildcard search ({{ignoreCaseForWildcards=false}}). The patch was created against the lucene_solr_3_1 branch. I've not applied it yet on trunk. [caveat emptor] I freely admit I'm no schema expert, so committers and community members may see use cases where this approach could pose problems. I'm all for feedback to enhance the functionality... The hope here is to re-ignite enthusiasm for case-insensitive wildcard searches in Solr - in line with the 'it just works' Solr philosophy. Enjoy! 
Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Attachments: SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
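The behaviour SOLR-2438 is after can be shown with a toy matcher: fold case on both the wildcard pattern and the term before comparing, when the field opts in. This is a hypothetical sketch, not code from the patch - Solr itself would lower-case the wildcard term before constructing the WildcardQuery rather than regex-matching terms:

```java
import java.util.Locale;

public class WildcardCaseDemo {
    // Toy illustration of what an opt-in flag like the patch's
    // ignoreCaseForWildcards attribute means: lower-case both sides
    // before the wildcard comparison, so MyTer* and myter* agree.
    static boolean matches(String pattern, String term, boolean ignoreCase) {
        if (ignoreCase) {
            pattern = pattern.toLowerCase(Locale.ROOT);
            term = term.toLowerCase(Locale.ROOT);
        }
        // translate the simple wildcard syntax (* and ?) to a regex
        String regex = pattern.replace(".", "\\.")
                              .replace("*", ".*")
                              .replace("?", ".");
        return term.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("MyTer*", "myterm", false)); // false
        System.out.println(matches("MyTer*", "myterm", true));  // true
    }
}
```

The indexed term is already lower-cased by the LowerCaseFilterFactory in the analysis chain; the gap the patch closes is that wildcard query terms normally bypass that chain.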
Re: write byte[] directly to TokenStream
works great - thanks! On Wed, Mar 23, 2011 at 1:04 AM, Robert Muir rcm...@gmail.com wrote: On Mar 22, 2011 11:38 PM, Ryan McKinley ryan...@gmail.com wrote: I'm messing with putting binary data directly in the index. I have a field class with:

@Override
public TokenStream tokenStreamValue() {
  byte[] value = (byte[]) fieldsData;
  Token token = new Token(0, value.length, "geo");
  token.resizeBuffer(value.length);
  BytesRef ref = token.getBytesRef();
  ref.bytes = value;
  ref.length = value.length;
  ref.offset = 0;
  token.setLength(ref.length);
  return new SingleTokenTokenStream(token);
}

but that is just writing an empty token. Is it possible to set the Token value without converting to char[]? check out Test2BTerms for an example... - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2415) Change XMLWriter version parameter to wt.xml.version
[ https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010308#comment-13010308 ] Hoss Man commented on SOLR-2415: bq. how should we handle the desire to change the faceting format (to make it easier to add metadata like total number of constraints, etc)? version would be one way. facet.format would be another way. i don't think the *structure* of the response (ie: the facet response section) should be driven by the same param as the *format* of the response, which is what version currently is. Something like facet.format seems more appropriate when dealing with a specific component like that ... but i don't think it should be a numeric version-esque property, i think it should be descriptive (ie: flat vs nested, or something) bq. perhaps we should add a getVersion() parameter on SolrQueryRequest and have that used across all components. when i suggested we have a common wt.version param that all of the response writers could use, i didn't mean to suggest that it should have a singular id space. my suggestion was that the specific values specified for version or wt.version or whatever would only be meaningful to the specific response writer used -- just as the current values of the version param that the XMLResponseWriter uses are meaningless to the JSONResponseWriter. the overlap would only be in reusing the param name (in the same way that q is the common param name for the main query, regardless of what query parser is specified by defType) bq. Look at how long the existing response writers have hung around in their current format, independent of the version # changes (1.2, 1.3, 1.4, and now 3.1) the version param of the XML response writer has never been in sync with the solr version, it was never intended to be. it's always been the version number of the xml format. 
Change XMLWriter version parameter to wt.xml.version -- Key: SOLR-2415 URL: https://issues.apache.org/jira/browse/SOLR-2415 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Priority: Trivial Fix For: 4.0 The XMLWriter has a parameter called 'version'. This controls some specifics about how the XMLWriter works. Using the parameter name 'version' made sense back when the XMLWriter was the only option, but with all the various writers and different places where 'version' makes sense, I think we should change this parameter name to wt.xml.version so that it specifically refers to the XMLWriter. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010318#comment-13010318 ] Stefan Matheis (steffkes) commented on SOLR-2399: - Ryan: ty, will take your points on my list - pretty sure that it should be possible to integrate them Mark: ty! :) For today, it's about *Logging*. Talked about that with Hoss on #solr the last days, so already changed a few things .. on the way, but not finished: http://files.mathe.is/solr-admin/07_logging.png Actually thinking about the following points: * Tree structure - a good way to solve it? * Do we need the possibility to collapse/expand the tree/its children? The list could be longer (the screenshot is cropped, just for layout reasons), especially while using SolrCloud, which adds about 30 Loggers * In the current er .. interface you are able to see that the row you're looking at has a level set and at the end (at the right) which is the effective level - for me, that does not matter. If a row/logger has level-x - that's enough to know. No need to see if this level is set or inherited. * Just a quick idea: if you change e.g. {{org.apache.solr}}, then the interface will automatically update all children in realtime; affects all nested/sub loggers w/o an assigned level. Thoughts on these points? Anyone? : Short note: i moved Logging to a global level, because it's not configurable on a per-core basis. 
Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.0 *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin [This commit shows the differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d] between the old/existing index.jsp and my new one (which I copy/pasted from the existing one). Main Action takes place in [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js] which is actually neither clean nor pretty .. just work-in-progress. Actually it's Work in Progress, so ... give it a try. It's developed with Firefox as Browser, so, for a first impression .. please don't use _things_ like Internet Explorer or so ;o Jan already suggested a bunch of good things, i'm sure there are more ideas over there :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2977: Attachment: LUCENE-2977.patch Thanks for reviewing, Shai! bq. In StreamUtils you have *.bz* -- it should be *.bz2* Good catch! Fixed. bq. +1 (you mean the bzip.compression property in WLDT right?). Yes. bq. I think that it's reasonable to request the user to specify an output file with .bz2 extension if he wants bzip compression. Great, I removed it. bq. I don't see how it will simplify StreamUtils though, but I trust you :) (perhaps you meant it will simplify WLDT?) It allowed keeping just one of the two variations of StreamUtils.outputStream(). WLDT and the tests became simpler as well. Attaching updated patch. (Again, first apply that svn mv...) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name - Key: LUCENE-2977 URL: https://issues.apache.org/jira/browse/LUCENE-2977 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2977.patch, LUCENE-2977.patch Since the readers behave this way it would be nice and handy if also this line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010318#comment-13010318 ] Stefan Matheis (steffkes) edited comment on SOLR-2399 at 3/23/11 8:15 PM: -- Ryan: ty, will take your points on my list - pretty sure that it should be possible to integrate them Mark: ty! :) For today, it's about *Logging*. Talked about that with Hoss on #solr the last days, so already changed a few things .. on the way, but not finished: http://files.mathe.is/solr-admin/07_logging.png Actually thinking about the following points: * Tree structure - a good way to solve it? * Do we need the possibility to collapse/expand the tree/its children? The list could be longer (the screenshot is cropped, just for layout reasons), especially while using SolrCloud, which adds about 30 Loggers * In the current er .. interface you are able to see that the row you're looking at has a level set and at the end (at the right) which is the effective level - for me, that does not matter. If a row/logger has level-x - that's enough to know. No need to see if this level is set or inherited. * Just a quick idea: if you change e.g. {{org.apache.solr}}, then the interface will automatically update all children in realtime; affects all nested/sub loggers w/o an assigned level. Thoughts on these points? Anyone? : Short note: i moved Logging to a global level, because it's not configurable on a per-core basis. # Edit What i forgot to mention .. actually it's based on a [static logging.json-file|https://github.com/steffkes/solr-admin/blob/master/logging.json] but will try to change the {{LogLevelSection}} Servlet so that it outputs the needed json-structure was (Author: steffkes): Ryan: ty, will take your points on my list - pretty sure, that it should be possible to integrate them Mark: ty! :) For today, it's about *Logging*. Talked about that with Hoss on #solr the last days, so already changed a few things .. 
on the way, but not finished: http://files.mathe.is/solr-admin/07_logging.png Actually thinking about the following points: * Tree Structure good way to solve it? * Do we need the possibitly to collapse/expand the three/the childrens? The List could be longer (the screenshot is cropped, just for layout reasons) especially while using SolrCloud which adds about 30 Loggers * In the current er .. Interface you are able to see that the row you're looking at has a level set and in the end (at the right) which is the effective level - for me, that does not matter. if a row/logger, has level-x - that's enough to know. don't need to see if this level is set or inherited. * just a quick idea: if you change f.e. {{org.apache.solr}} then the interface will automatically update all childrens in realtime, affects all nested/sub loggers w/o a assigned level. Thoughts on these points? anyone? : Short Note: i moved Logging to a global level, because it's not configurable on a per-core basis. Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.0 *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin [This commit shows the differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d] between old/existing index.jsp and my new one (which is could copy-cut/paste'd from the existing one). Main Action takes place in [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js] which is actually neither clean nor pretty .. just work-in-progress. 
Actually it's Work in Progress, so ... give it a try. It's developed with Firefox as Browser, so, for a first impression .. please don't use _things_ like Internet Explorer or so ;o Jan already suggested a bunch of good things, i'm sure there are more ideas over there :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2439) change solr javadocs to link to local lucene javadocs w/relative links
change solr javadocs to link to local lucene javadocs w/relative links -- Key: SOLR-2439 URL: https://issues.apache.org/jira/browse/SOLR-2439 Project: Solr Issue Type: Task Components: documentation Reporter: Hoss Man Fix For: 3.2 Now that solr/lucene are in lock step development, and solr releases include the entire lucene-java release, the solr ant targets for building javadocs should depend on the lucene (and module) targets for building javadocs and link directly to the local copies of those docs (using relative paths) (currently, the links point to https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 3.1
: Please vote to release the artifacts at : http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2 -0 I can't in good conscience vote for these artifacts. For the most part, there are only a few minor hiccups -- but the big blocker (in my opinion) is that since RC1, dev-tools has been removed from the solr src packages and this causes the top level build.xml (and instructions for IDE users in the top level README.txt file) to be broken. My detailed notes below... ## ### apache-solr-3.1.0-src.tgz dev-tools isn't in here -- this totally boggles my mind, particularly since there was a deliberate and conscious switch to make the source releases match what you get when doing an svn export. Because dev-tools is missing, 3 of the top level ant targets advertised by ant -p don't work, including 'ant idea' and 'ant eclipse' which are also explicitly mentioned in the top level README.txt as how people using those IDEs should get started developing the code. This seems like a major issue to me. we're setting ourselves up to make the release look completely broken right out of the gate for anyone using one of those IDEs. Asking about this on IRC, yonik and ryan indicated that a couple of folks had said they would veto any release with dev-tools in it because that stuff is supposed to be unsupported ... this makes no sense to me as we have lots of places in the code base where things are documented as being experimental, subject to change, and/or for developer use only. i don't really see how dev-tools should be any different. 
if there is really such violent opposition to including dev-tools in src releases, then the top level build.xml should not depend on it, and the top level README.txt should not refer to it (except maybe with something like "people interested in hacking on the src should use svn, which includes some unofficial 'dev-tools'") --- Now that the src packages are driven by svn exports, more files exist than were in RC1 and some of the changes we made to the solr/README.txt based on the earlier release candidates are misleading. In particular a lot of things are listed as being in the docs directory of a binary distribution, but those files *do* exist in the src packages -- if you look in the site directory. This seems silly, but at no point is the README.txt factually incorrect, so I guess it's not a big enough deal to worry about. --- running all tests, running the example, and building the javadocs all worked fine. ## ### apache-solr-3.1.0.tgz docs look good, basic example usage works fine. ## ### apache-solr-3.1.0.zip Diffing the contents of apache-solr-3.1.0.tgz with apache-solr-3.1.0.zip (using diff --ignore-all-space --strip-trailing-cr -r) turned up quite a few instances where the CRLF fixing in build.xml seems to have corrupted some non-ascii characters in a few files contrib/dataimporthandler/lib/activation-LICENSE.txt contrib/dataimporthandler/lib/mail-LICENSE.txt docs/skin/CommonMessages_de.xml docs/skin/CommonMessages_es.xml docs/skin/CommonMessages_fr.xml example/solr/conf/velocity/facet_dates.vm ...but these changes don't seem to have substantively harmed the files. ## ### lucene-3.1.0-src.tar.gz tests and javadocs worked fine. ## ### lucene-3.1.0.tar.gz docs look good, demo runs fine. ## ### lucene-3.1.0.zip no differences found with lucene-3.1.0.tar.gz -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010547#comment-13010547 ] Shai Erera commented on LUCENE-2977: Looks good to me. WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name - Key: LUCENE-2977 URL: https://issues.apache.org/jira/browse/LUCENE-2977 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2977.patch, LUCENE-2977.patch Since the readers behave this way it would be nice and handy if also this line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 3.1
: Please vote to release the artifacts at : http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2 -0 I can't in good conscience vote for these artifacts. I don't want to suggest anything to slow down the release... but if the problems are with the source release, what about just doing a single source release for lucene+solr? We currently have: lucene-solr-3.1RC2/lucene/ lucene-solr-3.1RC2/lucene/lucene-3.1.0-src.tar.gz lucene-solr-3.1RC2/lucene/... lucene-solr-3.1RC2/solr/ lucene-solr-3.1RC2/solr/apache-solr-3.1.0-src.tgz lucene-solr-3.1RC2/solr/... Why not: lucene-solr-3.1RC2/lucene-3.1.0-src.tar.gz lucene-solr-3.1RC2/lucene/... lucene-solr-3.1RC2/solr/... and let the src release be as close to svn export as possible? This will make sure the result builds just as it does when we actually build it! With the maven artifacts, we have source for each jar: http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/solr/maven/org/apache/solr/solr-core/3.1.0/solr-core-3.1.0-sources.jar http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/lucene/maven/org/apache/lucene/lucene-queries/3.1.0/lucene-queries-3.1.0-sources.jar I'm not sure the exact ASF source requirements, but maybe the maven source.jar files are good enough? Again, I don't think this should be a blocker, but it would be nice to have things simplified for the next release -- gasp. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2338:
------------------------------

    Attachment: SOLR-2338.patch

Here's a first stab: I included LUCENE-2986's cleanup work for easy testing (this issue depends upon it). Here is the syntax:

{noformat}
<!-- specify a Similarity classname directly -->
<fieldType name="sim1" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
</fieldType>

<!-- specify a Similarity factory -->
<fieldType name="sim2" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <similarity class="org.apache.solr.schema.CustomSimilarityFactory">
    <str name="echo">is there an echo?</str>
  </similarity>
</fieldType>
{noformat}

Additionally, it's necessary to allow customization of the SimilarityProvider too, in order to customize the non-field-specific stuff like coord()... this is done via:

{noformat}
<!-- expert: SimilarityProvider contains scoring routines that are not
     field-specific, such as coord() and queryNorm(). Most scoring
     customization happens in the fieldtype. A custom similarity provider
     may be specified here, but the default is fine for most applications. -->
<similarityProvider class="org.apache.solr.schema.CustomSimilarityProviderFactory">
  <str name="echo">is there an echo?</str>
</similarityProvider>
{noformat}

> improved per-field similarity integration into schema.xml
> ---------------------------------------------------------
>
>                 Key: SOLR-2338
>                 URL: https://issues.apache.org/jira/browse/SOLR-2338
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: SOLR-2338.patch
>
> Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level: to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with <similarity class="..."/> and manage the per-field mapping yourself in Java code. Instead, I think it would be better if you could just specify the Similarity in the FieldType, e.g. right after the analyzer. As for the example, one idea from LUCENE-1360 was to make a short_text or metadata_text type, used by the various metadata fields in the example, that has better norm quantization for its shortness...
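[Editorial note] The issue description mentions that, without this patch, you must "manage the per-field mapping yourself in java code". As a rough, hypothetical sketch of that hand-written bookkeeping (plain Java for illustration only, not actual Solr or patch code; the class and all names below are invented):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the per-field similarity dispatch that a
// custom SimilarityProvider otherwise has to hard-code: a field-name ->
// similarity mapping with a fallback default. Here similarities are
// represented by their class names as plain strings to stay self-contained.
class PerFieldSimilarityMap {
    private final Map<String, String> similarityByField = new HashMap<>();
    private final String defaultSimilarity;

    PerFieldSimilarityMap(String defaultSimilarity) {
        this.defaultSimilarity = defaultSimilarity;
    }

    // Register a field-specific similarity (e.g. for a short metadata field).
    void register(String field, String similarityClass) {
        similarityByField.put(field, similarityClass);
    }

    // Look up the similarity for a field, falling back to the default.
    String get(String field) {
        return similarityByField.getOrDefault(field, defaultSimilarity);
    }
}
```

With the patch, this mapping instead lives declaratively in schema.xml, as a similarity element inside each fieldType.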
Re: [VOTE] Release Lucene/Solr 3.1
On Thu, Mar 24, 2011 at 12:18 AM, Ryan McKinley ryan...@gmail.com wrote:
> I don't want to suggest anything to slow down the release... but if the
> problems are with the source release, what about just doing a single
> source release for lucene+solr?
> [...]
> I'm not sure of the exact ASF source requirements, but maybe the maven
> source.jar files are good enough?

I don't think someone should have to deal with maven to get the lucene source release... I think lucene should have its own artifacts as in the past (the source code being the most important).
GSoC 2011
Hello,

I am planning to submit a project proposal to GSoC 2011, and Lucene seems to have a lot of GSoC projects this year. Last year I did a GSoC project using Lucene for the PhotArk project. This year, instead of just using Lucene, I am planning to contribute code to it.

My experience with Lucene is as a regular user; the only code I have changed/extended so far is token streams/analyzers and the query parser, so I know that part of the code best. Based on that, I'm planning to focus on query parser and analyzer/token stream projects. Does that sound reasonable?

I will be studying the code and planning the proposal(s), so you should start seeing more posts from me over the next few days.

--
Phillipe Ramalho
Re: [VOTE] Release Lucene/Solr 3.1
> I don't think someone should have to deal with maven to get the lucene
> source release... I think lucene should have its own artifacts as in the
> past (the source code being the most important).

Sorry, did not mean to muddy the water with the maven discussion... ignore my comment.

When you say lucene should have its own artifacts, do you mean lucene w/o solr? Or could a single source artifact include everything? (making the release process easier and apparently cleaner)