[jira] Commented: (SOLR-248) Capitalization Filter Factory

2007-05-24 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498492
 ] 

Ryan McKinley commented on SOLR-248:


 
 1) would it make sense for the keep option to refer to a file, using the same 
 format as StopFilter ... that way it's easy to reuse the same file (which 
 seems like it would be a common case)?
 

probably.  that is a good idea


 2) what is the point of forceFirstLetter=true ? ... if you want to force 
 capitalization, what's the point of making the keep list?
 

This is one that came of necessity!

with keep=the ...  and input:
 Grand army of the Republic, the arts

I want: Grand Army of the Republic and The Arts

forceFirstLetter only applies to the first character in the token, not to 
each word.
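
To make the keep / forceFirstLetter interaction concrete, here is a minimal sketch. It is not the code from the attached patch: the class name, the method signature, and the assumption that the keep list also contains "of" are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the CapitalizationFilter from the SOLR-248 patch.
final class CapitalizeSketch {

    /** Capitalizes each word of a token; words found in keep are left untouched. */
    static String capitalize(String token, Set<String> keep, boolean forceFirstLetter) {
        StringBuilder out = new StringBuilder();
        for (String word : token.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty token splits into a single empty string
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            if (keep.contains(word)) {
                out.append(word);
            } else {
                out.append(Character.toUpperCase(word.charAt(0)))
                   .append(word.substring(1).toLowerCase(Locale.ROOT));
            }
        }
        // forceFirstLetter only looks at the first character of the whole token.
        if (forceFirstLetter && out.length() > 0) {
            out.setCharAt(0, Character.toUpperCase(out.charAt(0)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed keep list; the comment above only spells out "the".
        Set<String> keep = new HashSet<>(Arrays.asList("the", "of"));
        System.out.println(capitalize("Grand army of the Republic", keep, true));
        System.out.println(capitalize(" the arts", keep, true));
        System.out.println(capitalize(" the arts", keep, false));
    }
}
```

With forceFirstLetter=true the two tokens come out as "Grand Army of the Republic" and "The Arts"; with it off, the second token stays "the Arts" because "the" is in the keep list.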


 3) is okPrefix going to force the case for things that have that prefix in an 
 alternate case, or only allow that casing to remain (ie: if i index McKeen, 
 Mckeen, mckeen and MCKEEN what tokens do i wind up with?)
 

As written, if the prefix matches, it assumes the word capitalization is 
correct.  For my input data, this is sufficient -- but it should probably do 
something smarter.

So, if you index McKeen, Mckeen, mckeen, MCKEEN and McKEEN, you would get:

 McKeen, Mckeen, Mckeen, Mckeen And McKEEN

If okPrefix was treated as *the* capitalization for input where the lowercase 
prefix matches mck, it would give:

 McKeen, McKeen, McKeen, McKeen And McKeen
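
The two behaviours can be compared side by side. This is only a sketch under assumptions: the class and method names are hypothetical, and the prefix is taken to be "McK" to line up with the lowercase "mck" mentioned above.

```java
import java.util.Locale;

// Hypothetical sketch of the two okPrefix behaviours; not the patch's code.
final class OkPrefixSketch {

    /** Uppercases the first character and lowercases the rest. */
    static String capitalizeWord(String word) {
        if (word.isEmpty()) {
            return word;
        }
        return Character.toUpperCase(word.charAt(0))
                + word.substring(1).toLowerCase(Locale.ROOT);
    }

    /** Current behaviour: an exact (case-sensitive) prefix match means "leave the word alone". */
    static String asWritten(String word, String okPrefix) {
        return word.startsWith(okPrefix) ? word : capitalizeWord(word);
    }

    /** Alternative: match the prefix ignoring case and impose the prefix's own casing. */
    static String canonical(String word, String okPrefix) {
        if (word.regionMatches(true, 0, okPrefix, 0, okPrefix.length())) {
            return okPrefix + word.substring(okPrefix.length()).toLowerCase(Locale.ROOT);
        }
        return capitalizeWord(word);
    }

    public static void main(String[] args) {
        String[] input = {"McKeen", "Mckeen", "mckeen", "MCKEEN", "McKEEN"};
        for (String word : input) {
            System.out.println(word + " -> " + asWritten(word, "McK")
                    + " | " + canonical(word, "McK"));
        }
    }
}
```

The first column reproduces the "as written" list above (McKEEN survives untouched), while the second collapses all five inputs to McKeen.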



 Capitalization Filter Factory
 -

 Key: SOLR-248
 URL: https://issues.apache.org/jira/browse/SOLR-248
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley
Priority: Minor
 Attachments: SOLR-248-CapitalizationFilter.patch


 For tokens that are used in faceting, it is nice to have standard 
 capitalization.  
 I want Aerial views and Aerial Views to both be: Aerial Views

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-248) Capitalization Filter Factory

2007-05-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498700
 ] 

Yonik Seeley commented on SOLR-248:
---

Hmmm, this feels slightly strange implementing at the indexing level.
What are the advantages/disadvantages vs just lowercasing for indexing and 
capitalizing at the presentation/application layer?





[jira] Commented: (SOLR-248) Capitalization Filter Factory

2007-05-24 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498711
 ] 

Ryan McKinley commented on SOLR-248:


It is a little strange, but (in my case anyway) i think it makes sense...  

I am indexing a bunch of metadata from a bunch of libraries (OAI-PMH) -- I want 
to display the data exactly as it came from the source, but for faceted 
browsing I need to normalize capitalization.

Implemented at the indexing level, I can have different values for the stored 
value and indexed terms.  Also, at the indexing level I can leverage existing 
Tokenizers and Filters to build the tokens that need capitalization.  It keeps 
all the configuration in schema.xml and lets the OAI -> solr xml conversion be 
a simple transformation.  This way, whoever takes care of this need only learn 
solr configuration, not ryan+solr configuration.

If it is not generally useful I can keep it elsewhere - that is why we have the 
nice plugin framework!






[jira] Commented: (SOLR-248) Capitalization Filter Factory

2007-05-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498717
 ] 

Yonik Seeley commented on SOLR-248:
---

 Implemented at the indexing level, I can have different values for the stored 
 value and indexed terms.
One downside is that it complicates certain things like wildcard or prefix 
queries (capitalizing the first letter and lowercasing the second is something 
that the QueryParser does not support).

You could still store the values verbatim, and index as all lowercase.
Then the application could capitalize the results it gets back as it sees fit.
I do see value pushing this type of logic back to the search engine though.

Of course, I think this might be a more general problem in faceting... what to 
actually use as a label for display purposes vs what the terms in the index 
were (think price formatting, labels for more complex facet queries, etc).





[jira] Commented: (SOLR-128) Include Newer version of Jetty

2007-05-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498729
 ] 

Yonik Seeley commented on SOLR-128:
---

Yeah, I saw that hadoop issue too, which is why I was planning on some quick
indexing & querying benchmarks.

 Include Newer version of Jetty
 --

 Key: SOLR-128
 URL: https://issues.apache.org/jira/browse/SOLR-128
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
 Assigned To: Ryan McKinley
Priority: Minor
 Attachments: jetty-6.3-example.zip, Jetty6.config.patch, lib.zip, 
 start.jar


 It would be good to include an up-to-date jetty version for the example.




[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-24 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-239:
--

Attachment: IndexSchemaStream2.patch

the attached patch (IndexSchemaStream2.patch) includes a cleaned up test case 
as well as making the IndexSchema constructors throw a SolrException since they 
are reading InputStreams (which they were before).  i think perhaps they should 
throw something a bit 'stronger' but that seemed to have more wide-reaching 
implications.


 Read IndexSchema from InputStream instead of Config file
 

 Key: SOLR-239
 URL: https://issues.apache.org/jira/browse/SOLR-239
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.2
 Environment: all
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.2

 Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch, 
 IndexSchemaStream2.patch


 Soon to follow patch adds a constructor to IndexSchema to allow them to be 
 created directly from InputStreams.  The overall logic for the Core's use of 
 the IndexSchema creation/use does not change however this allows java clients 
 like those in SOLR-20 to be able to parse an IndexSchema.  Once a schema is 
 parsed, the client can inspect an index's capabilities which is useful for 
 building generic search UI's.  ie provide a drop down list of fields to 
 search/sort by.  




RE: [jira] Commented: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)

2007-05-24 Thread Chris Hostetter

: Also, the reason other engines require you to mark the fields in the
: index definition is because they actually index the data differently if
: it is a facet vs a normal indexed field.  It's cool that solr doesn't
: have to do this but there may be a case where it would be a good idea
: someday.

right ... if down the road we find a way to improve faceting (or any other
feature) by storing more data on disk at indexing time, then configuration
to tell you that data was there and how to use it would live in the
schema.xml -- but options that don't matter once the data is already
written (or can be different for different people depending on how they
use the data) can/should live in solrconfig.xml (like the options in
<mainIndex> right now)

Alternately: if we add some custom facet caching that doesn't require
any new data on disk, but builds new in-memory structures, that should
live in the solrconfig.xml as well since it's the kind of thing that would
likely be configured differently for masters/slaves.


-Hoss



[jira] Commented: (SOLR-246) Be able to turn off TopTerm collecting in LukeRequestHandler

2007-05-24 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498767
 ] 

Erik Hatcher commented on SOLR-246:
---

Yes, that resolves the issue very nicely!

INFO: /admin/luke numTerms=0&wt=ruby&indent=on 0 2

versus 

INFO: /admin/luke numTerms=2&wt=ruby&indent=on 0 372795

*whew*  it was painful just trying numTerms > 0 to give those stats :)

 Be able to turn off TopTerm collecting in LukeRequestHandler
 

 Key: SOLR-246
 URL: https://issues.apache.org/jira/browse/SOLR-246
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
 Assigned To: Ryan McKinley
 Fix For: 1.2

 Attachments: SOLR-246-LukeTopTermStopper.patch


 See discussion:
 http://www.nabble.com/Luke-request-handler-issue-tf3762155.html




Re: [Solr Wiki] Update of MoreLikeThis by ryan

2007-05-24 Thread Chris Hostetter

:  (although an interesting question is what happens if i want to find
:  similar docs based on a field that is stored but not indexed so it *really*
:  has no analyzer)

: I think the MLT implementation would need some modification to support
: that -- what you are suggesting is to get the top tf/idf terms for a
: stored but not indexed field then query against a different field (that
: is indexed).  As is, it compares like fields to one another...

ah ... i didn't know that about MLT ... i thought you could tell it to
find words from a set of source fields and then query them against a
single target field.


-Hoss



SOLR-238 - some tweaks to the way site updates and releases happen

2007-05-24 Thread Chris Hostetter

just wanted to send out a little spam to raise visibility on SOLR-238
since we're probably going to try and do a release soon and this issue
would have some impact on the way both website updates and releases
happen.

I'd appreciate it if one or two other committers could read the
comments in the issue, try out the patch, play with forrest and setting
specversion, look at how it changes the tutorial and think about how this
impacts site updates and the release process.

The goal is to make sure every time the tutorial.html file is generated, it
correctly identifies what version of Solr it applies to; it's done in a
way that should make it easy to reuse for other future docs (and possibly
other future ant variables besides specversion)



-Hoss



[jira] Resolved: (SOLR-246) Be able to turn off TopTerm collecting in LukeRequestHandler

2007-05-24 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley resolved SOLR-246.


Resolution: Fixed




Re: [Solr Wiki] Update of Solr1.2 by ryan

2007-05-24 Thread Yonik Seeley

+  * link to the [http://people.apache.org/builds/lucene/solr/nightly/ nightly 
build] from http://lucene.apache.org/solr/


Nigel (a hadoop committer) was nice enough to set up hudson to do
nightly builds (along with nutch, lucene, and hadoop).  Perhaps we
should start pointing to that?

http://lucene.zones.apache.org:8080/hudson/job/Solr-Nightly/

-Yonik


Re: [Solr Wiki] Update of Solr1.2 by ryan

2007-05-24 Thread Chris Hostetter

: Nigel (a hadoop committer) was nice enough to set up hudson to do
: nightly builds (along with nutch, lucene, and hadoop).  Perhaps we
: should start pointing to that?
:
: http://lucene.zones.apache.org:8080/hudson/job/Solr-Nightly/

i still don't really know enough about Hudson to have an opinion on
relying on it for nightly builds, but either way i don't think we should
add a link to the nightlies from the site ... Doug's repeated reminders
have ingrained in me the importance of not making nightly builds too
accessible, they should be for developers only and we should do nothing
to imply that they are endorsed releases.

Lucene-Java does this by only linking to them from a special Developer
Resources page...

  http://lucene.apache.org/java/docs/developer-resources.html

...i think only linking to them from the wiki is a wise idea.

(Personally: I think we shouldn't save the artifacts at all, just test
that they build correctly and record the revision number -- people can use
svn checkout -r  if they want the build for a certain day.)


-Hoss



Re: [Solr Wiki] Update of Solr1.2 by ryan

2007-05-24 Thread Yonik Seeley

On 5/24/07, Chris Hostetter [EMAIL PROTECTED] wrote:

i don't think we should
add a link to the nightlies from the site ... Doug's repeated reminders
have ingrained in me the importance of not making nightly builds too
accessible, they should be for developers only and we should do nothing
to imply that they are endorsed releases.


Yeah, that's actually a general ASF policy.

-Yonik


Re: [Solr Wiki] Update of MoreLikeThis by ryan

2007-05-24 Thread Ryan McKinley

Chris Hostetter wrote:

:  (although an interesting question is what happens if i want to find
:  similar docs based on a field that is stored but not indexed so it *really*
:  has no analyzer)

: I think the MLT implementation would need some modification to support
: that -- what you are suggesting is to get the top tf/idf terms for a
: stored but not indexed field then query against a different field (that
: is indexed).  As is, it compares like fields to one another...

ah ... i didn't know that about MLT ... i thought you could tell it to
find words from a set of source fields and then query them against a
single target field.



That would be something we could add to the solr MoreLikeThisHelper... 
contrib MoreLikeThis can take text/reader as the input.  It is just when 
you use a Document as the input that you are locked into the same fields.




-Hoss






Re: [Solr Wiki] Update of Solr1.2 by ryan

2007-05-24 Thread Mike Klaas


On 24-May-07, at 12:23 PM, Chris Hostetter wrote:



i still don't really know enough about Hudson to have an opinion on
relying on it for nightly builds, but either way i don't think we should
add a link to the nightlies from the site ... Doug's repeated reminders
have ingrained in me the importance of not making nightly builds too
accessible, they should be for developers only and we should do nothing
to imply that they are endorsed releases.


The biggest advertisement for the nightlies is that 1.2 is 
plethorically-referenced in the wiki <g>.



(Personally: I think we shouldn't save the artifacts at all, just test
that they build correctly and record the revision number -- people can use
svn checkout -r  if they want the build for a certain day.)


Nightlies may be unofficial and unendorsed, but they are still quite 
useful.  It would be a shame to require someone to install and grok 
subversion to play with some upcoming features.


-Mike


[jira] Updated: (SOLR-103) SQL Upload Plugin

2007-05-24 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-103:
---

Affects Version/s: (was: 1.2)
   1.3

 SQL Upload Plugin
 -

 Key: SOLR-103
 URL: https://issues.apache.org/jira/browse/SOLR-103
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Ryan McKinley
 Fix For: 1.2

 Attachments: SOLR-103-SQLUpdateRequestHandler.patch, 
 SOLR-103-SQLUpdateRequestHandler.patch


 Solr needs an easy way to upload lots of files directly from SQL.
 See also: SOLR-66 (CSV uploader)




[jira] Updated: (SOLR-112) Hierarchical Handler Config

2007-05-24 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-112:
---

Affects Version/s: (was: 1.2)
   1.3

 Hierarchical Handler Config
 ---

 Key: SOLR-112
 URL: https://issues.apache.org/jira/browse/SOLR-112
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Ryan McKinley
Priority: Minor
 Fix For: 1.2

 Attachments: SOLR-112.patch


 From J.J. Larrea on SOLR-104
 2. What would make this even more powerful would be the ability to subclass 
 (meaning refine and/or extend) request handler configs: If the requestHandler 
 element allowed an attribute extends="another-requesthandler-name" and 
 chained the SolrParams, then one could do something like:
   <requestHandler name="search/products/all" 
       class="solr.DisMaxRequestHandler" >
     <lst name="defaults">
      <float name="tie">0.01</float>
      <str name="qf">
         text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      </str>
      ... much more, per the dismax example in the sample solrconfig.xml ...
   </requestHandler>
   ... and replacing the partitioned example ...
   <requestHandler name="search/products/instock" 
       extends="search/products/all" >
     <lst name="appends">
       <str name="fq">inStock:true</str>
     </lst>
   </requestHandler>
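
 One way to picture the "chained" params is a config object that falls back to its parent. The following is only a sketch of the idea: the class and field names are invented, plain maps and lists stand in for SolrParams, and nothing here is taken from SOLR-112.patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a handler config with an optional "extends" parent.
final class HandlerConfigSketch {
    final HandlerConfigSketch parent;                        // null for a root config
    final Map<String, String> defaults = new LinkedHashMap<String, String>();
    final List<String> appendedFilters = new ArrayList<String>(); // stands in for "appends" fq values

    HandlerConfigSketch(HandlerConfigSketch parent) {
        this.parent = parent;
    }

    /** Parent defaults first, then this config's own values override them. */
    Map<String, String> effectiveDefaults() {
        Map<String, String> merged = (parent == null)
                ? new LinkedHashMap<String, String>()
                : parent.effectiveDefaults();
        merged.putAll(defaults);
        return merged;
    }

    /** Appended values accumulate down the chain instead of overriding. */
    List<String> effectiveFilters() {
        List<String> merged = (parent == null)
                ? new ArrayList<String>()
                : parent.effectiveFilters();
        merged.addAll(appendedFilters);
        return merged;
    }

    public static void main(String[] args) {
        HandlerConfigSketch all = new HandlerConfigSketch(null);
        all.defaults.put("tie", "0.01");

        HandlerConfigSketch inStock = new HandlerConfigSketch(all);
        inStock.appendedFilters.add("inStock:true");

        System.out.println(inStock.effectiveDefaults()); // inherits tie from the parent
        System.out.println(inStock.effectiveFilters());  // adds its own filter
    }
}
```

 Here the child handler only declares what differs from its parent, which is the point of the proposed extends attribute.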




[jira] Updated: (SOLR-103) SQL Upload Plugin

2007-05-24 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-103:
---

Fix Version/s: (was: 1.2)
   1.3

 SQL Upload Plugin
 -

 Key: SOLR-103
 URL: https://issues.apache.org/jira/browse/SOLR-103
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Ryan McKinley
 Fix For: 1.3

 Attachments: SOLR-103-SQLUpdateRequestHandler.patch, 
 SOLR-103-SQLUpdateRequestHandler.patch


 Solr needs an easy way to upload lots of files directly from SQL.
 See also: SOLR-66 (CSV uploader)




[jira] Updated: (SOLR-112) Hierarchical Handler Config

2007-05-24 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-112:
---

Fix Version/s: (was: 1.2)
   1.3

 Hierarchical Handler Config
 ---

 Key: SOLR-112
 URL: https://issues.apache.org/jira/browse/SOLR-112
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Ryan McKinley
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-112.patch





[jira] Commented: (SOLR-248) Capitalization Filter Factory

2007-05-24 Thread J.J. Larrea (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498817
 ] 

J.J. Larrea commented on SOLR-248:
--

While I fully agree that faceting does raise some odd issues stemming from the 
display of normally-invisible indexed values to humans, and that it 
theoretically should be the responsibility of the front-end to translate index 
values into human-readable values, there are great practical advantages in both 
efficiency and convenience to making the indexed values pretty, and to 
centralizing as much of that as possible in the Analysis stage.

In particular, I will try this and am very likely to put this into use this 
weekend, so thank you Ryan!  So I'm +1 to adding it to the Solr distribution, 
though to avoid confusing people it should have a JavaDoc comment explaining 
that the main use is in faceting to avoid having to introduce such common logic 
into the presentation-layer.

Regarding the implementation,

1. For 'keep' and 'okPrefix' (and were it not for reverse-compatibility issues, 
for 'words' in StopFilter), it would be nice to have a means to specify either 
a direct list or a filename in the same parameter.  A simple approach might be 
something like keep=word word word... vs. keep=file, or even keep=file 
file word word (with the requirement for backslash-escaping spaces in 
either)...  Or alternately something like txt:filename (vs. xml:filename, 
json:filename, etc.) with an unescaped : being significant.

2. Why is so much of the logic in the Factory?  This drags Solr-specific stuff 
in when a user might want to use just the Analyzer in a non-Solr context. 
Wouldn't it be better in general for Solr Analyzers to be self-complete, with 
the Factory merely being an adaptor between SolrParams & external resources and 
the Analyzer's constructor?

Also, why is keep in a synchronized map, since there is no mutator?  (I know, 
picky picky...)

Good luck with the deadline!





[jira] Commented: (SOLR-248) Capitalization Filter Factory

2007-05-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498834
 ] 

Yonik Seeley commented on SOLR-248:
---

 Why is so much of the logic in the Factory?

I haven't looked at this specific code, but this is my preference in general.  
multiple TokenFilters are created per-field instance on the index side, and 
per-query-term on the search side, so it's better to pull all the setup you can 
out of the Filter for performance reasons.





[jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-24 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498840
 ] 

Hoss Man commented on SOLR-239:
---

a few notes from skimming the patch...

1) there is a public API change here by removing the getInputStream() method 
from IndexSearcher.  probably not a big deal but important that we consider it.

2) why did you remove testDynamicCopy() from IndexSchemaTest ?

3) raw-schema.jsp on the trunk appears to be completely broken (multiple 
<%@ page contentType=... %> declarations), and not linked to from the admin 
screen anyway ... we might want to just remove it completely and make a note 
in the CHANGES in case people have the old URL bookmarked.




Re: [jira] Commented: (SOLR-248) Capitalization Filter Factory

2007-05-24 Thread Chris Hostetter

: I haven't looked at this specific code, but this is my preference in
: general.  multiple TokenFilters are created per-field instance on the
: index side, and per-query-term on the search side, so it's better to
: pull all the setup you can out of the Filter for performance reasons.

computation can be done at factory instantiation, but it can make sense to
put the code for the computation in static methods within the Filter class
itself -- so it's more reusable outside of Solr.
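
Roughly, that split could look like the sketch below. The classes are stand-ins that do not extend any Lucene or Solr type, and every name in them is made up for illustration: the parsing code is a static method on the filter, and the factory just calls it once and shares the result.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-ins; neither class extends a real Lucene or Solr type.
final class FilterSetupSketch {

    /** Stand-in for the filter: cheap to construct, holds only shared state. */
    static final class CapFilter {
        private final Set<String> keep;

        CapFilter(Set<String> keep) {
            this.keep = keep;
        }

        /** Setup logic lives with the filter, so non-Solr code can reuse it. */
        static Set<String> parseKeep(String spaceSeparated) {
            Set<String> words = new HashSet<String>();
            for (String word : spaceSeparated.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word.toLowerCase(Locale.ROOT));
                }
            }
            return Collections.unmodifiableSet(words);
        }

        boolean keeps(String word) {
            return keep.contains(word.toLowerCase(Locale.ROOT));
        }
    }

    /** Stand-in for the factory: runs the setup once, then shares the result. */
    static final class CapFilterFactory {
        private final Set<String> keep;

        CapFilterFactory(String keepParam) {
            this.keep = CapFilter.parseKeep(keepParam); // computed once per factory
        }

        CapFilter create() {
            return new CapFilter(keep); // no parsing per field instance or query term
        }
    }

    public static void main(String[] args) {
        CapFilterFactory factory = new CapFilterFactory("the of");
        CapFilter first = factory.create();
        CapFilter second = factory.create();
        System.out.println(first.keeps("The") + " " + second.keeps("army"));
    }
}
```

Someone outside Solr could call CapFilter.parseKeep(...) and the constructor directly, while the factory still keeps the per-filter cost down to one object allocation.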



-Hoss



[jira] Commented: (SOLR-248) Capitalization Filter Factory

2007-05-24 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498841
 ] 

Ryan McKinley commented on SOLR-248:


 Why is so much of the logic in the Factory? 

It seemed silly to copy the same things over and over for each time the type is 
indexed or queried...  

 why is keep in a synchronized map,

I'm not sure it needs to be, but i was being cautious...   the map is only 
created once (and never edited) but could be accessed by many threads 
simultaneously.
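
For a collection that is built once and never modified, a plain set stored in a final field is generally enough: the Java memory model makes a final field's contents visible to other threads once the constructor has finished, provided the object does not leak out during construction. A small sketch, with a hypothetical class name and an assumed keep list:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for the keep words; not the class from the patch.
final class KeepWordsSketch {
    // final + never modified after construction: safe for concurrent readers.
    private final Set<String> keep;

    KeepWordsSketch(Collection<String> words) {
        this.keep = Collections.unmodifiableSet(new HashSet<String>(words));
    }

    boolean contains(String word) {
        return keep.contains(word);
    }

    public static void main(String[] args) throws InterruptedException {
        final KeepWordsSketch words = new KeepWordsSketch(Arrays.asList("the", "of"));
        final AtomicInteger hits = new AtomicInteger();
        Thread[] readers = new Thread[4];
        for (int i = 0; i < readers.length; i++) {
            readers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    if (words.contains("the")) {
                        hits.incrementAndGet();
                    }
                }
            });
            readers[i].start();
        }
        for (Thread reader : readers) {
            reader.join();
        }
        System.out.println(hits.get()); // every lookup succeeds without locking
    }
}
```

The unmodifiable wrapper is there to make the "never edited" part explicit; it also avoids taking a lock on every token.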







RE: [jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-24 Thread Will Johnson
1) there is a public API change here by removing the getInputStream() method 
from IndexSearcher.  probably not a big deal but important that we consider it.

true, that call wasn't used anywhere else in the solr trunk code.  also after 
a lot of thought i realized that it's in general a poor idea to rely on getting 
an input stream in any reliable fashion other than when it's first opened 
(many don't support reset).  i can put it back easily if people are that worried 
about breaking compatibility but in general it seems like it's asking for 
trouble without knowing the implementation.
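
to illustrate the reset problem with plain java.io (nothing solr-specific; the class and helper names below are made up): a stream that doesn't override markSupported() can only be consumed once, so a second read silently yields nothing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only java.io types are real APIs here.
final class StreamReuseSketch {

    /** Wraps data in a stream that, like many real streams, has no mark/reset support. */
    static InputStream oneShot(byte[] data) {
        final InputStream in = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() throws IOException {
                return in.read();
            }
        };
    }

    /** Reads everything that is left in the stream. */
    static byte[] drain(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] schema = "<schema/>".getBytes(StandardCharsets.UTF_8);
        InputStream stream = oneShot(schema);

        System.out.println(stream.markSupported()); // false: reset() is not an option
        System.out.println(drain(stream).length);   // 9: first read sees the content
        System.out.println(drain(stream).length);   // 0: second read sees nothing

        // Holding on to the bytes gives every later reader a fresh stream.
        System.out.println(drain(new ByteArrayInputStream(schema)).length); // 9
    }
}
```

so anything that needs the raw schema again later is probably better off keeping the bytes (or a way to reopen the source) than keeping the stream itself.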

2) why did you remove testDynamicCopy() from IndexSchemaTest ?

because it had nothing to do with testing the index schema.  as far as i could 
tell it was a ctrl-c / ctrl-v error.  that or i'm really blind and happy to put 
it back.

3) raw-schema.jsp on the trunk appears to be completely broken (multiple 
<%@ page contentType=... %> declarations), and not linked to from the admin 
screen anyway ... we might want to just remove it completely and make a note 
in the CHANGES in case people have the old URL bookmarked.

my patch worked but i also saw that it wasn't linked anywhere.
 
- will
 





RE: [jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-24 Thread Chris Hostetter

: 2) why did you remove testDynamicCopy() from IndexSchemaTest ?
:
: because it had nothing to do with testing the index schema.  as far as i
: could tell it was a ctrl-c / ctrl-v error.  that or i'm really blind and
: happy to put it back.

i don't see a test with that name defined anywhere.  it's testing that you
can declare dynamic fields and copy them using copyField ... that sounds
like an IndexSchemaTest to me  (lots of other schema related tests may be
in BasicFunctionalityTest or ConvertedLegacyTest, but we should try to use
the class specific test classes when the test is very narrow)

: 3) raw-schema.jsp on the trunk appears to be completely broken (multiple
: <%@ page contentType=... %> declarations), and not linked to from the

: my patch worked but i also saw that it wasn't linked anywhere.

i thought your patch left the multiple contentType declarations, but i
don't remember for certain now ... it's a trivial issue either way.




-Hoss



RE: [jira] Commented: (SOLR-239) Read IndexSchema from InputStream instead of Config file

2007-05-24 Thread Will Johnson
i'll have another go at the patch tomorrow morning; testing the raw-schema.jsp 
(even if it's not used) and put back the test.
 
- will



From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Thu 5/24/2007 6:02 PM
To: solr-dev@lucene.apache.org
Subject: RE: [jira] Commented: (SOLR-239) Read IndexSchema from InputStream 
instead of Config file




: 2) why did you remove testDynamicCopy() from IndexSchemaTest ?
:
: because it had nothing to do with testing the index schema.  as far as i
: could tell it was a ctrl-c / ctrl-v error.  that or i'm really blind and
: happy to put it back.

i don't see a test with that name defined anywhere.  it's testing that you
can declare dynamic fields and copy them using copyField ... that sounds
like an IndexSchemaTest to me  (lots of other schema related tests may be
in BasicFunctionalityTest or ConvertedLegacyTest, but we should try to use
the class specific test classes when the test is very narrow)

: 3) raw-schema.jsp on the trunk appears to be completely broken (multiple
: <%@ page contentType=... %> declarations), and not linked to from the

: my patch worked but i also saw that it wasn't linked anywhere.

i thought your patch left the multiple contentType declarations, but i
don't remember for certain now ... it's a trivial issue either way.




-Hoss