date:20070523

[jira] Updated: (SOLR-245) Coding Style

2007-05-23 Thread Otis Gospodnetic (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-245:
--

Attachment: lucene-eclipse-3.2.xml

And here is an Eclipse 3.2 code style for Lucene and its children.


 Coding Style
 

 Key: SOLR-245
 URL: https://issues.apache.org/jira/browse/SOLR-245
 Project: Solr
  Issue Type: New Feature
 Environment: Intellij IDEA 6.x
Reporter: Grant Ingersoll
Priority: Trivial
 Attachments: lucene-eclipse-3.2.xml, Lucene.xml


 Per discussion at 
 http://www.mail-archive.com/solr-dev@lucene.apache.org/msg04068.html, here is 
 my attempt at an IntelliJ coding style template that more or less fits the 
 Lucene/Solr style.   Please feel free to change as needed, it is just a 
 starting point.
 As per Doug's discussion way back when 
 (http://www.gossamer-threads.com/lists/lucene/java-dev/18320?search_string=code%20style;#18320),
  I don't think it is a big deal if submitted code isn't 100% formatted the 
 way we want.
 Attachment to follow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: Luke request handler issue

2007-05-23 Thread Erik Hatcher



On May 22, 2007, at 10:42 PM, Ryan McKinley wrote:

If that's the case, I think the .diff you posted is fine...


Not really, because I commented out a bit to get past things.  It was  
more than just setting the default to zero.


The only thing I would change is I think the default should be some  
positive number.  For the app where you want the default to be 0,  
you can initialize the request handler with:


  <requestHandler ... >
    <lst name="defaults">
      <int name="numTerms">0</int>
    </lst>
  </requestHandler>


I don't get why the default should be non-zero.  The most common use  
case would be field/type/size introspection, I presume.  I don't see  
getting top terms being as needed.  But, I'm fine with the default  
being non-zero if others feel it should be - setting it in the config  
file is no big deal for me :)
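To make the defaults mechanism concrete, here is a minimal Java sketch of the lookup order. It assumes a plain Map stands in for Solr's request parameters. An explicit numTerms on the request wins, then the value from the handler's defaults list, then a built-in fallback. The class name and the fallback value of 10 are hypothetical. What the built-in default should be is exactly the open question in this thread.

```java
import java.util.Map;

// Hypothetical sketch: resolve numTerms from the request, then from the
// handler's <lst name="defaults">, then from a built-in fallback.
public class NumTermsResolver {

  // Assumed fallback value; the real default is what is being debated here.
  public static final int BUILT_IN_DEFAULT = 10;

  public static int resolveNumTerms(Map<String, String> request,
                                    Map<String, String> configDefaults) {
    String v = request.get("numTerms");       // explicit request parameter wins
    if (v == null) {
      v = configDefaults.get("numTerms");     // then the configured defaults
    }
    return (v == null) ? BUILT_IN_DEFAULT : Integer.parseInt(v.trim());
  }

  public static void main(String[] args) {
    Map<String, String> config = Map.of("numTerms", "0");
    System.out.println(resolveNumTerms(Map.of(), config));                  // 0
    System.out.println(resolveNumTerms(Map.of("numTerms", "50"), config)); // 50
  }
}
```

With numTerms=0 in the defaults, the cheap introspection-only behavior applies unless a request explicitly asks for top terms.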


Erik

Re: Luke request handler issue

2007-05-23 Thread Yonik Seeley


On 5/22/07, Ryan McKinley [EMAIL PROTECTED] wrote:

How do you imagine the parameters would be aligned?


It just seemed like they were doing largely the same thing...
specify if you want terms enumerated in order, or sorted,
specify the number of top terms, etc.


It could use the same per/field specification:
  f.category.facet.limit=5

perhaps Luke should support:
  terms.top=10
   and
  f.category.terms.top=10

I'm reluctant to go this route because it makes asking whether we should
calculate top terms at all difficult (ok, awkward), and i'm not sure it
helps that much...


Then one could have topTerms=true like highlighting/faceting do, or
one could perhaps specify a field list
 topTerms=fooField,barField
or
 topTerms=*

If someone wants to retrieve *all* of the terms in a specific field,
it doesn't seem like they should have to get all of the terms in all
other fields too, right?

All this configurability doesn't need to be implemented now, but we
should plan for it and leave room in the interface if possible.
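Here is a minimal Java sketch of the two parameter styles floated here. It uses a plain Map in place of Solr's request parameters. The per-field lookup follows the f.<field>.<param> convention quoted above. The names terms.top and topTerms are only proposals from this thread. Treating topTerms=true the same as topTerms=* is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers; parameter names are proposals from the thread, not an API.
public class TopTermsParams {

  // Per-field value (f.<field>.<param>) if present, else the global <param>.
  public static String fieldParam(Map<String, String> params, String field, String param) {
    String perField = params.get("f." + field + "." + param);
    return (perField != null) ? perField : params.get(param);
  }

  // topTerms=true or topTerms=* selects every field; otherwise a comma-separated list.
  public static boolean wantsTopTerms(Map<String, String> params, String field) {
    String v = params.get("topTerms");
    if (v == null) {
      return false;
    }
    v = v.trim();
    if (v.equals("*") || v.equals("true")) {
      return true;
    }
    Set<String> fields = new HashSet<>(Arrays.asList(v.split("\\s*,\\s*")));
    return fields.contains(field);
  }
}
```

For example, with terms.top=10 and f.category.terms.top=5, the category field gets 5 while every other field falls back to 10. With topTerms=fooField,barField, only those two fields would have their terms enumerated.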

-Yonik

Re: Luke request handler issue

2007-05-23 Thread Ryan McKinley


Erik Hatcher wrote:


On May 22, 2007, at 10:42 PM, Ryan McKinley wrote:

If that's the case, I think the .diff you posted is fine...


Not really, because I commented out a bit to get past things.  It was 
more than just setting the default to zero.




the bit you commented out calculated numTerms across all fields (forcing it
to walk through all terms). Since this is not all that useful, and
configuring it seems like overkill, I don't mind throwing it out.


I'll take a look and make sure though.


The only thing I would change is I think the default should be some 
positive number.  For the app where you want the default to be 0, you 
can initialize the request handler with:


  <requestHandler ... >
    <lst name="defaults">
      <int name="numTerms">0</int>
    </lst>
  </requestHandler>


I don't get why the default should be non-zero.  The most common use 
case would be field/type/size introspection, I presume.  


I have been using it as a visual inspection of what is in the index. 
The default page that shows all information for all fields is good 
because (without figuring out what parameters do what) you can just see 
what is in the index...  for the indexes I have worked with (so far 
300K docs) that has been fine.


Luke (the app) opens showing top terms across all fields - then you 
click on individual fields to see the top terms for that field.


I would like the default (no params / no config) be the most useful to 
people who are just starting with lucene/solr and want to know what all 
this talk about terms is.


programmatic uses can easily send numTerms=0 in the request or 
configure it in the defaults.



I don't see 
getting top terms being as needed.  But, I'm fine with the default being 
non-zero if others feel it should be - setting it in the config file is 
no big deal for me :)


Erik

[jira] Commented: (SOLR-208) RSS feed XSL example

2007-05-23 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498280
 ] 

Yonik Seeley commented on SOLR-208:
---

+1 on including this in the 1.2 release

 RSS feed XSL example
 

 Key: SOLR-208
 URL: https://issues.apache.org/jira/browse/SOLR-208
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.2
Reporter: Brian Whitman
 Assigned To: Hoss Man
Priority: Trivial
 Attachments: atom.xsl, rss.xsl


 A quick .xsl file for transforming solr queries into RSS feeds. To get the 
 date and time in properly you'll need an XSL 2.0 processor, as in 
 http://wiki.apache.org/solr/XsltResponseWriter .  Tested to work with the 
 example solr distribution in the nightly.
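Here is a small client-side Java sketch for readers unfamiliar with applying a stylesheet to a Solr response. It uses the standard JAXP API (javax.xml.transform). The inline stylesheet is a made-up XSLT 1.0 example, not the attached rss.xsl. The response shape (response/result/doc with a str named "name") is assumed. The JDK's built-in processor only implements XSLT 1.0. The date handling mentioned above would still need an XSLT 2.0 processor such as Saxon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RssTransformSketch {

  // Made-up XSLT 1.0 stylesheet (NOT the attached rss.xsl): one RSS <item> per <doc>.
  static final String XSL =
      "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
    + "<xsl:template match='/'>"
    + "<rss version='2.0'><channel><title>Solr results</title>"
    + "<xsl:for-each select='response/result/doc'>"
    + "<item><title><xsl:value-of select=\"str[@name='name']\"/></title></item>"
    + "</xsl:for-each>"
    + "</channel></rss>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  // Applies the stylesheet to an XML response string and returns the RSS text.
  public static String toRss(String solrXml) throws Exception {
    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(XSL)));
    StringWriter out = new StringWriter();
    t.transform(new StreamSource(new StringReader(solrXml)), new StreamResult(out));
    return out.toString();
  }
}
```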


Re: Luke request handler issue

2007-05-23 Thread Ryan McKinley



If someone wants to retrieve *all* of the terms in a specific field,
it doesn't seem like they should have to get all of the terms in all
other fields too, right?



As implemented, you get the top terms for all the fields you ask for. 
By default this is all of them.  If you specify a field (with fl=xxx) 
you only get that field's top terms:

 http://localhost:8983/solr/admin/luke?fl=text&numTerms=1000

It may be useful to want 10 terms from field 'A' and 100 for field 'B', 
but for now, that should probably be done with faceting.


Faceting returns readable values (from the schema) while Luke deals with 
the raw lucene index.



All this configurability doesn't need to be implemented now, but we
should plan for it and leave room in the interface if possible.



that sounds good.  For now, making numTerms=0 skip walking through the terms
should be enough.  The rest should come as we see a specific need for it.

[jira] Created: (SOLR-246) Be able to turn off TopTerm collecting in LukeRequestHandler

2007-05-23 Thread Ryan McKinley (JIRA)

Be able to turn off TopTerm collecting in LukeRequestHandler


 Key: SOLR-246
 URL: https://issues.apache.org/jira/browse/SOLR-246
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
 Assigned To: Ryan McKinley
 Fix For: 1.2


See discussion:

http://www.nabble.com/Luke-request-handler-issue-tf3762155.html


[jira] Created: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)

2007-05-23 Thread Ryan McKinley (JIRA)

Allow facet.field=* to facet on all fields (without knowing what they are)
--

 Key: SOLR-247
 URL: https://issues.apache.org/jira/browse/SOLR-247
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor


I don't know if this is a good idea to include -- it is potentially a bad idea 
to use it, but that can be ok.

This came out of trying to use faceting for the LukeRequestHandler top term 
collecting.
http://www.nabble.com/Luke-request-handler-issue-tf3762155.html


[jira] Updated: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)

2007-05-23 Thread Ryan McKinley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-247:
---

Attachment: SOLR-247-FacetAllFields.patch

 Allow facet.field=* to facet on all fields (without knowing what they are)
 --

 Key: SOLR-247
 URL: https://issues.apache.org/jira/browse/SOLR-247
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor
 Attachments: SOLR-247-FacetAllFields.patch


 I don't know if this is a good idea to include -- it is potentially a bad 
 idea to use it, but that can be ok.
 This came out of trying to use faceting for the LukeRequestHandler top term 
 collecting.
 http://www.nabble.com/Luke-request-handler-issue-tf3762155.html


Re: Luke request handler issue

2007-05-23 Thread Yonik Seeley


On 5/23/07, Ryan McKinley [EMAIL PROTECTED] wrote:

 If someone wants to retrieve *all* of the terms in a specific field,
 it doesn't seem like they should have to get all of the terms in all
 other fields too, right?


As implemented, you get the top terms for all the fields you ask for.
By default this is all of them.  If you specify a field (with fl=xxx)
you only get that field's top terms:
  http://localhost:8983/solr/admin/luke?fl=text&numTerms=1000

It may be useful to want 10 terms from field 'A' and 100 for field 'B',
but for now, that should probably be done with faceting.

Faceting returns readable values (from the schema) while Luke deals with
the raw lucene index.


Ah, yes... I see both as being useful.
If solr does know about the fieldType, should the default be to use
the external (human readable) values?

-Yonik

[jira] Updated: (SOLR-246) Be able to turn off TopTerm collecting in LukeRequestHandler

2007-05-23 Thread Ryan McKinley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-246:
---

Attachment: SOLR-246-LukeTopTermStopper.patch

when topTerms=0, this will not walk through the reader.terms()

This is useful for large indexes.
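Here is a rough Java sketch of the idea behind the patch. It assumes an Iterator of hypothetical TermStat records stands in for walking reader.terms(). When numTerms is 0, the enumeration is never touched. Otherwise, a bounded min-heap per field keeps only the numTerms most frequent terms. The optional field set plays the role of fl=xxx. This is illustrative only, not the code in SOLR-246-LukeTopTermStopper.patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public class TopTermsSketch {

  // Hypothetical stand-in for a Lucene term plus its document frequency.
  public static final class TermStat {
    final String field;
    final String text;
    final int docFreq;

    public TermStat(String field, String text, int docFreq) {
      this.field = field;
      this.text = text;
      this.docFreq = docFreq;
    }
  }

  // Returns up to numTerms terms per field, most frequent first.
  // fields == null means "all fields" (no fl restriction).
  public static Map<String, List<String>> topTerms(Iterator<TermStat> terms,
                                                   Set<String> fields, int numTerms) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    if (numTerms <= 0) {
      return result; // never start the (potentially huge) walk over every term
    }
    Map<String, PriorityQueue<TermStat>> heaps = new LinkedHashMap<>();
    while (terms.hasNext()) {
      TermStat t = terms.next();
      if (fields != null && !fields.contains(t.field)) {
        continue;
      }
      PriorityQueue<TermStat> heap = heaps.computeIfAbsent(t.field,
          f -> new PriorityQueue<TermStat>((a, b) -> Integer.compare(a.docFreq, b.docFreq)));
      heap.offer(t);
      if (heap.size() > numTerms) {
        heap.poll(); // evict the least frequent term seen so far
      }
    }
    for (Map.Entry<String, PriorityQueue<TermStat>> e : heaps.entrySet()) {
      List<TermStat> sorted = new ArrayList<>(e.getValue());
      sorted.sort((a, b) -> Integer.compare(b.docFreq, a.docFreq));
      List<String> texts = new ArrayList<>();
      for (TermStat t : sorted) {
        texts.add(t.text);
      }
      result.put(e.getKey(), texts);
    }
    return result;
  }
}
```

The bounded heap keeps memory at roughly numTerms entries per field. However, the walk itself still visits every term, which is why skipping it entirely matters for large indexes.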

Eric - does this work for you?

 Be able to turn off TopTerm collecting in LukeRequestHandler
 

 Key: SOLR-246
 URL: https://issues.apache.org/jira/browse/SOLR-246
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
 Assigned To: Ryan McKinley
 Fix For: 1.2

 Attachments: SOLR-246-LukeTopTermStopper.patch


 See discussion:
 http://www.nabble.com/Luke-request-handler-issue-tf3762155.html


[jira] Commented: (SOLR-246) Be able to turn off TopTerm collecting in LukeRequestHandler

2007-05-23 Thread Ryan McKinley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498296
 ] 

Ryan McKinley commented on SOLR-246:


Oops, I mean:

Erik - does this work for you?

 Be able to turn off TopTerm collecting in LukeRequestHandler
 

 Key: SOLR-246
 URL: https://issues.apache.org/jira/browse/SOLR-246
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
 Assigned To: Ryan McKinley
 Fix For: 1.2

 Attachments: SOLR-246-LukeTopTermStopper.patch


 See discussion:
 http://www.nabble.com/Luke-request-handler-issue-tf3762155.html


Re: Luke request handler issue

2007-05-23 Thread Ryan McKinley


Yonik Seeley wrote:

On 5/23/07, Ryan McKinley [EMAIL PROTECTED] wrote:

 If someone wants to retrieve *all* of the terms in a specific field,
 it doesn't seem like they should have to get all of the terms in all
 other fields too, right?


As implemented, you get the top terms for all the fields you ask for.
By default this is all of them.  If you specify a field (with fl=xxx)
you only get that field's top terms:
  http://localhost:8983/solr/admin/luke?fl=text&numTerms=1000

It may be useful to want 10 terms from field 'A' and 100 for field 'B',
but for now, that should probably be done with faceting.

Faceting returns readable values (from the schema) while Luke deals with
the raw lucene index.


Ah, yes... I see both as being useful.
If solr does know about the fieldType, should the default be to use
the external (human readable) values?



That's how it currently works:

  // Map each raw (indexed) term back to its readable form when the schema knows the field
  NamedList<Integer> list = new NamedList<Integer>();
  for (TermInfo i : aslist) {
    String txt = i.term.text();
    SchemaField ft = schema.getFieldOrNull( i.term.field() );
    if( ft != null ) {
      txt = ft.getType().indexedToReadable( txt );
    }
    list.add( txt, i.docFreq );
  }
  return list;

When you inspect a single document, it returns both.

ryan

[jira] Commented: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)

2007-05-23 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498311
 ] 

Erik Hatcher commented on SOLR-247:
---

I can see value in supporting the dynamicField wildcard syntax, so *_facet 
would work.   In fact, maybe that'd be a good syntax to support for all fl-like 
parameters too. 

* scares me, and it'd certainly be discouraged for anything but small indexes!  
 But of course I don't have to use it.   :) 
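Here is a minimal Java sketch of how facet.field values could be expanded against the index's field names. It assumes a single * is allowed only at the start or end of a pattern, as with dynamicField names. The class and method names are hypothetical. This is not the code in SOLR-247-FacetAllFields.patch.

```java
import java.util.ArrayList;
import java.util.List;

public class FacetFieldExpander {

  // "*" matches everything; otherwise one '*' is allowed at the start or end only.
  public static boolean matches(String pattern, String field) {
    if (pattern.equals("*")) {
      return true;
    }
    if (pattern.startsWith("*")) {
      return field.endsWith(pattern.substring(1));
    }
    if (pattern.endsWith("*")) {
      return field.startsWith(pattern.substring(0, pattern.length() - 1));
    }
    return pattern.equals(field);
  }

  // Expands each facet.field value against the known field names, keeping
  // first-seen order and dropping duplicates.
  public static List<String> expand(List<String> patterns, List<String> fieldNames) {
    List<String> out = new ArrayList<>();
    for (String p : patterns) {
      for (String f : fieldNames) {
        if (matches(p, f) && !out.contains(f)) {
          out.add(f);
        }
      }
    }
    return out;
  }
}
```

For example, *_facet selects cat_facet and manu_facet from a field list of id, cat_facet, price and manu_facet, while a bare * selects everything.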

 Allow facet.field=* to facet on all fields (without knowing what they are)
 --

 Key: SOLR-247
 URL: https://issues.apache.org/jira/browse/SOLR-247
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor
 Attachments: SOLR-247-FacetAllFields.patch


 I don't know if this is a good idea to include -- it is potentially a bad 
 idea to use it, but that can be ok.
 This came out of trying to use faceting for the LukeRequestHandler top term 
 collecting.
 http://www.nabble.com/Luke-request-handler-issue-tf3762155.html


[jira] Commented: (SOLR-128) Include Newer version of Jetty

2007-05-23 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498335
 ] 

Yonik Seeley commented on SOLR-128:
---

OK, let's quickly go ahead then.  There have been some JSP issues with our 
current Jetty version anyway.

I'd like to have all the core changes done in the next few days so we can get a 
release out by the end of this month (1 week away).

 Include Newer version of Jetty
 --

 Key: SOLR-128
 URL: https://issues.apache.org/jira/browse/SOLR-128
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Priority: Minor
 Attachments: jetty-6.3-example.zip, Jetty6.config.patch, lib.zip, 
 start.jar


 It would be good to include an up-to-date jetty version for the example.


[jira] Assigned: (SOLR-128) Include Newer version of Jetty

2007-05-23 Thread Ryan McKinley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley reassigned SOLR-128:
--

Assignee: Ryan McKinley

 Include Newer version of Jetty
 --

 Key: SOLR-128
 URL: https://issues.apache.org/jira/browse/SOLR-128
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
 Assigned To: Ryan McKinley
Priority: Minor
 Attachments: jetty-6.3-example.zip, Jetty6.config.patch, lib.zip, 
 start.jar


 It would be good to include an up-to-date jetty version for the example.


[jira] Resolved: (SOLR-230) make post.jar support better args for using tutorial

2007-05-23 Thread Hoss Man (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Hoss Man resolved SOLR-230.
---

Resolution: Fixed
Fix Version/s: 1.2

Committed revision 541046.

make post.jar support better args for using tutorial

Key: SOLR-230
URL: https://issues.apache.org/jira/browse/SOLR-230
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Hoss Man
Assigned To: Hoss Man
Fix For: 1.2

Attachments: SOLR-230.patch

SOLR-86 created post.jar, which eliminated the need for post.sh ... but as
noticed in SOLR-164, there are still some cases in the tutorial that require
direct use of curl (deleting), and there are some nice things about post.sh
that post.jar doesn't support (defaulting the URL).
This issue is to tackle some of the ideas Bertrand and I posted as a comment
in SOLR-86 after it was resolved.
Bertrand Delacretaz [19/Feb/07 12:35 AM] ...
Considering the tutorial examples
(http://lucene.apache.org/solr/tutorial.html), it'd be useful to allow this
to POST its standard input, or the contents of a command-line parameter: ...
Hoss Man [19/Feb/07 11:50 AM]
yeah ... i think we should hardcode http://localhost:8983/solr/update with a
possible override by system prop, then add either a command line switch or
another system prop indicating whether to treat the command line as filenames
or as raw data, and another option for stdin.
java -jar -Ddata=files post.jar *.xml
java -jar post.jar *.xml ... data=files being the default
echo "<delete><query>name:DDR</query></delete>" | java -jar -Ddata=stdin post.jar
cat *.xml | java -jar -Ddata=stdin post.jar
java -jar -Ddata=args post.jar "<delete><query>name:DDR</query></delete>"
java -jar -Durl=http://localhost:8983/solr/update post.jar *.xml
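Here is a hypothetical Java sketch of the argument handling described above. -Durl overrides the hardcoded update URL. -Ddata chooses how input is read: as file names from the command line (the default), as raw data from the command line, or from stdin. The sketch only gathers the bodies to send and leaves the HTTP POST as a comment. It is not the code committed in revision 541046.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class PostArgsSketch {

  static final String DEFAULT_URL = "http://localhost:8983/solr/update";

  // -Durl=... overrides the hardcoded update URL.
  public static String resolveUrl() {
    return System.getProperty("url", DEFAULT_URL);
  }

  // Collects the bodies to POST according to -Ddata=files|args|stdin.
  public static List<String> payloads(String mode, String[] args, InputStream stdin)
      throws IOException {
    List<String> bodies = new ArrayList<>();
    switch (mode) {
      case "files":                       // each argument names a file to send
        for (String name : args) {
          bodies.add(new String(Files.readAllBytes(Paths.get(name)), StandardCharsets.UTF_8));
        }
        break;
      case "args":                        // each argument is itself the raw XML
        for (String raw : args) {
          bodies.add(raw);
        }
        break;
      case "stdin":                       // the whole of stdin is one body
        try (BufferedReader r = new BufferedReader(
            new InputStreamReader(stdin, StandardCharsets.UTF_8))) {
          bodies.add(r.lines().collect(Collectors.joining("\n")));
        }
        break;
      default:
        throw new IllegalArgumentException("unknown data mode: " + mode);
    }
    return bodies;
  }

  public static void main(String[] args) throws IOException {
    String mode = System.getProperty("data", "files");   // files is the default
    for (String body : payloads(mode, args, System.in)) {
      // A real tool would POST each body to resolveUrl() here.
      System.out.println("POST " + resolveUrl() + " <- " + body.length() + " chars");
    }
  }
}
```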


Re: [jira] Commented: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)

2007-05-23 Thread Chris Hostetter


: One issue is that fl=XXX is typically a field list separated with , or
: |, facet.field expects each field as a separate parameter.

personally, i've never really liked that splitting behavior of fl, and i'd
really rather not add it to facet.field.


-Hoss

[jira] Assigned: (SOLR-238) [Patch] The tutorial on our website is against trunk which causes confusion by user

2007-05-23 Thread Hoss Man (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reassigned SOLR-238:
-

Assignee: Hoss Man

 [Patch] The tutorial on our website is against trunk which causes confusion 
 by user
 ---

 Key: SOLR-238
 URL: https://issues.apache.org/jira/browse/SOLR-238
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Thorsten Scherler
 Assigned To: Hoss Man
 Attachments: SOLR-238.diff, SOLR-238.diff, SOLR-238.png


 The patch will add a note to the tutorial page with the following headsup:
 This is documentation for the development version (TRUNK). Some instructions 
 may only work if you are working against svn head.


[jira] Commented: (SOLR-238) [Patch] The tutorial on our website is against trunk which causes confusion by user

2007-05-23 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498346
 ] 

Hoss Man commented on SOLR-238:
---

i'm going to try and look into this today or tomorrow ... Thorsten, a couple of 
quick questions...

1) is the file name symbols-project-v10.ent significant in some way, or can 
we make it something a little easier for people to understand, like 
solr-specific-forrest-variables.ent ?

(in particular, the v10 jumps out at me as being confusing and odd .. version 
10 of what?)

2) is there any reason why forrest would care if the symbols file lives in the 
resources directory, or can it live anywhere as long as the relative URI in the 
<!ENTITY declaration points at the right spot?

3) what is the purpose of the catalog.xcat file your patch adds?

 [Patch] The tutorial on our website is against trunk which causes confusion 
 by user
 ---

 Key: SOLR-238
 URL: https://issues.apache.org/jira/browse/SOLR-238
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Thorsten Scherler
 Assigned To: Hoss Man
 Attachments: SOLR-238.diff, SOLR-238.diff, SOLR-238.png


 The patch will add a note to the tutorial page with the following headsup:
 This is documentation for the development version (TRUNK). Some instructions 
 may only work if you are working against svn head.


[jira] Commented: (SOLR-238) [Patch] The tutorial on our website is against trunk which causes confusion by user

2007-05-23 Thread Thorsten Scherler (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498364
]

Thorsten Scherler commented on SOLR-238:

cheers Hoss!

1) yes, you can change the name. I will add a new version.

2)
a) no, you can change it in the forrest.properties:
#project.schema-dir=${project.resources-dir}/schema
is the default.
You can change it to something like
project.schema-dir=src/schema
if you want, just uncomment the property.
b) I'm not sure about the path; better to use forrest.properties.

3) As I understand it (this contribution is the first time I've used it), it
links to the *.ent file, giving the benefit that you can import it into your
favorite XML editor:
http://forrest.apache.org/docs_0_70/catalog.html
Further (as I understand it), forrest uses it to look up the *.ent file.

[Patch] The tutorial on our website is against trunk which causes confusion
by user
---

Key: SOLR-238
URL: https://issues.apache.org/jira/browse/SOLR-238
Project: Solr
Issue Type: Improvement
Components: documentation
Reporter: Thorsten Scherler
Assigned To: Hoss Man
Attachments: SOLR-238.diff, SOLR-238.diff, SOLR-238.diff, SOLR-238.png

The patch will add a note to the tutorial page with the following headsup:
This is documentation for the development version (TRUNK). Some instructions
may only work if you are working against svn head.


[jira] Updated: (SOLR-238) [Patch] The tutorial on our website is against trunk which causes confusion by user

2007-05-23 Thread Hoss Man (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Hoss Man updated SOLR-238:
--

Attachment: SOLR-238.diff

Thorsten (et al) I'd appreciate your feedback on this patch revision...

1) moved the location of the variables file to ./build using a deep ../../ path
... i didn't change the forrest.properties to do this, because I wanted to
keep the catalog.xcat where it was since that seems to be standard.

2) from a clean checkout, ant init-forrest-entities is now a prereq for
forrest to run properly; otherwise the XML doesn't validate because the
entities can't resolve. Most of the core ant tasks take care of this via
dependencies.

a couple of notes about the specifics...

a) i used the specversion since it's the most precise of our version numbers,
it contains the datetime of dev builds, and is the number you would expect for
official builds
b) i tried to make the entity name consistent with the property name so that if
someone smarter than me knows a way to get ant to dump all properties using a
filterchain, we can refer to any ant property as an entity, not just
solr.specversion
c) if committed, Website_Update_HOWTO needs a note about ant
init-forrest-entities
d) if committed, HowToRelease needs to be updated to indicate that the docs on
the branch need to be regenerated/committed after building/testing the code
*with specversion set*, but before packaging.

[Patch] The tutorial on our website is against trunk which causes confusion
by user
---


[jira] Created: (SOLR-248) Capitalization Filter Factory

2007-05-23 Thread Ryan McKinley (JIRA)

Capitalization Filter Factory
-

 Key: SOLR-248
 URL: https://issues.apache.org/jira/browse/SOLR-248
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley
Priority: Minor


For tokens that are used in faceting, it is nice to have standard 
capitalization.  

I want "Aerial views" and "Aerial Views" to both be: "Aerial Views"


[jira] Updated: (SOLR-248) Capitalization Filter Factory

2007-05-23 Thread Ryan McKinley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-248:
---

Attachment: SOLR-248-CapitalizationFilter.patch

Implementation and test...

<filter class="solr.CapitalizationFilterFactory" onlyFirstWord="false"
        keep="and or the is my or de" maxTokenLength="40" maxWordCount="4"
        okPrefix="McK" forceFirstLetter="true" />

onlyFirstWord=false -- this capitalizes every word

keep=and or the is my or de -- don't change capitalization for these words

forceFirstLetter=true -- capitalize the first letter of the Token (not word) 
even if it is in the keep list

maxTokenLength=40 -- if the token is longer than 40 chars, don't even try to 
capitalize it

maxWordCount=4 -- if there are more than 4 words, don't try capitalizing
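Here is a minimal Java sketch of how these options could interact on a single token. It is written only to clarify the descriptions above and is not the code in SOLR-248-CapitalizationFilter.patch. The sketch makes several assumptions:

- Words are space-separated.
- Keep-list matching is case-insensitive.
- With onlyFirstWord=true, words other than the first are lower-cased.
- okPrefix is left out because its exact behavior is not spelled out here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CapitalizationSketch {

  // Option names mirror the attributes above; the logic is an illustrative guess.
  boolean onlyFirstWord = false;
  boolean forceFirstLetter = true;
  int maxTokenLength = 40;
  int maxWordCount = 4;
  Set<String> keep = new HashSet<>(Arrays.asList("and", "or", "the", "is", "my", "de"));

  // Upper-case the first letter, lower-case the rest.
  static String capitalizeWord(String w) {
    if (w.isEmpty()) {
      return w;
    }
    return Character.toUpperCase(w.charAt(0)) + w.substring(1).toLowerCase();
  }

  public String apply(String token) {
    if (token.length() > maxTokenLength) {
      return token;                     // too long: leave it alone
    }
    String[] words = token.split(" ");
    if (words.length > maxWordCount) {
      return token;                     // too many words: leave it alone
    }
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < words.length; i++) {
      String w = words[i];
      boolean first = (i == 0);
      String out;
      if (keep.contains(w.toLowerCase()) && !(first && forceFirstLetter)) {
        out = w;                        // keep-list words are not changed
      } else if (first || !onlyFirstWord) {
        out = capitalizeWord(w);
      } else {
        out = w.toLowerCase();          // assumed behavior for onlyFirstWord=true
      }
      if (i > 0) {
        sb.append(' ');
      }
      sb.append(out);
    }
    return sb.toString();
  }
}
```

With the option values from the example config, "Aerial views" and "aerial Views" both become "Aerial Views". The token "the cat and hat" becomes "The Cat and Hat", because forceFirstLetter overrides the keep list for the first word.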


 Capitalization Filter Factory
 -

 Key: SOLR-248
 URL: https://issues.apache.org/jira/browse/SOLR-248
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley
Priority: Minor
 Attachments: SOLR-248-CapitalizationFilter.patch


 For tokens that are used in faceting, it is nice to have standard 
 capitalization.  
 I want "Aerial views" and "Aerial Views" to both be: "Aerial Views"


[jira] Commented: (SOLR-248) Capitalization Filter Factory

2007-05-23 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498488
 ] 

Hoss Man commented on SOLR-248:
---

1) would it make sense for the keep option to refer to a file, using the same 
format as StopFilter ... that way it's easy to reuse the same file (which seems 
like it would be a common case).

2) what is the point of forceFirstLetter=true ? ... if you want to force 
capitalization, what's the point of making the keep list?

3) is okPrefix going to force the case for things that have that prefix in an 
alternate case, or only allow that casing to remain (ie: if i index McKeen, 
Mckeen, mckeen and MCKEEN what tokens do i wind up with?)
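Regarding 1), here is a minimal Java sketch of reading keep words from a word-list file. It assumes the StopFilter-style layout: one word per line, with blank lines and lines starting with # ignored. The class name is hypothetical. Lower-casing each word assumes the keep comparison is case-insensitive.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashSet;
import java.util.Set;

public class KeepWordsLoader {

  // One word per line; blank lines and '#' comment lines are skipped (assumed format).
  public static Set<String> load(Reader in) throws IOException {
    Set<String> words = new LinkedHashSet<>();
    try (BufferedReader r = new BufferedReader(in)) {
      String line;
      while ((line = r.readLine()) != null) {
        line = line.trim();
        if (line.isEmpty() || line.startsWith("#")) {
          continue;
        }
        words.add(line.toLowerCase());
      }
    }
    return words;
  }
}
```

Sharing one file between a StopFilter and the capitalization keep list would then just mean pointing both at the same path.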

 Capitalization Filter Factory
 -

 Key: SOLR-248
 URL: https://issues.apache.org/jira/browse/SOLR-248
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley
Priority: Minor
 Attachments: SOLR-248-CapitalizationFilter.patch


 For tokens that are used in faceting, it is nice to have standard 
 capitalization.  
 I want "Aerial views" and "Aerial Views" to both be: "Aerial Views"
