Re: dynamic copyFields

2007-05-04 Thread Chris Hostetter

: Syntax aside, the major implication is that DynamicCopy would need a
: virtual function:
:   SchemaField getTargetField()

I don't think i've ever looked at DynamicField before today ... but i see
what you're talking about, you mean that final SchemaField targetField
would need to be replaced with SchemaField getTargetField(String
sourceField) right?

yeah that seems simple enough, i'm not sure what Yonik meant by this
comment...

  // Instead of storing a type, this could be implemented as a hierarchy
  // with a virtual matches().
  // Given how often a search will be done, however, speed is the overriding
  // concern and I'm not sure which is faster.

... i don't see how this ever comes into play with search.

on the issue of syntax and regex vs glob, i would leave it as a glob for
now since that's already supported by the syntax and the impl ... if we
want to support regexes that should be done separately in
DynamicReplacement where it can be leveraged by both copyField and
dynamicField
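
To make the proposed change concrete, here is a minimal Java sketch of a copy rule that resolves its destination per source field instead of holding one fixed targetField. It is not Solr's IndexSchema code: the class DynamicCopySketch and the method getTargetFieldName are hypothetical, the method returns a field name String rather than a SchemaField, and only trailing-star globs (such as tag_* and text_*) are assumed.

```java
/**
 * Hypothetical sketch of a glob-based dynamic copy rule.
 * Names are illustrative only and do not come from Solr.
 */
final class DynamicCopySketch {
    private final String sourcePrefix; // "tag_" for the source glob "tag_*"
    private final String destPrefix;   // "text_" for the dest glob "text_*"

    DynamicCopySketch(String sourceGlob, String destGlob) {
        this.sourcePrefix = stripTrailingStar(sourceGlob);
        this.destPrefix = stripTrailingStar(destGlob);
    }

    // Assumption: the glob has exactly one '*', and it is the last character.
    private static String stripTrailingStar(String glob) {
        if (glob.isEmpty() || glob.indexOf('*') != glob.length() - 1) {
            throw new IllegalArgumentException("expected a glob of the form prefix*: " + glob);
        }
        return glob.substring(0, glob.length() - 1);
    }

    /** True if the source field name is covered by the source glob. */
    boolean matches(String sourceField) {
        return sourceField.startsWith(sourcePrefix);
    }

    /** Resolves the destination name for one source field (stands in for getTargetField(String)). */
    String getTargetFieldName(String sourceField) {
        if (!matches(sourceField)) {
            throw new IllegalArgumentException(sourceField + " does not match " + sourcePrefix + "*");
        }
        // Carry over whatever the '*' matched in the source name.
        return destPrefix + sourceField.substring(sourcePrefix.length());
    }
}
```

Under these assumptions, a rule built from tag_* and text_* would send a field named tag_color to text_color.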



-Hoss



[jira] Commented: (SOLR-69) PATCH:MoreLikeThis support

2007-05-04 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493770
 ] 

Hoss Man commented on SOLR-69:
--

looking back at the two main use cases Yonik described in his comment from 
06/Feb/07...

 At the most basic level, A request for MLT results for a single doc by 
uniqueKey (case#1) is just a simplistic example of asking for MLT results for 
an arbitrary query (case#2) ... that arbitrary query just happens to be on a 
uniqueKey field, and only returns one result.

Where things get more complicated is when you start returning other tier 2 
type information about the request -- which begs the question what is tier 1 
data?   If the MLT results are added as tier 2 data to 
StandardRequestHandler response, then all of the other tier 2 data blocks 
(highlighting, faceting, debugQuery score explanation, etc..) still refer to 
the main result from the original query ... this may be what you want in use 
case #2, but doesn't really make sense for use case #1, where the tier 1 main 
result only contains the single document you asked for by id ... the score 
explanation and facet count numbers aren't very interesting in that case.

for case #1, what you really want is for the MLT data to be treated as the 
primary (tier 1) result set, and all of the tier 2 data is about those 
results ... highlighting is done on the MLT docs, facet counts are for the MLT 
docs, debugQuery score explanation tells you *why* the MLT docs are like your 
original docs, etc..

Case #1 and case #2 are both useful, to address Brian's 02/May/07 comment..

  I've personally never understood the more documents 
  that don't match this query but are like the documents 
  in this query ... I'm confused as to how querying by 
  query would work -- if a query for 'apache' returned 10 
  docs, would MLT work on each one and generate n more 
  docs per doc? And would the original query results get 
  returned? What's the ordering? 

in your example, yes ... the user's main search on apache would return 10 
results sorted by whatever sort they specified.  for each of those 10 results, 
N similar results might be listed to the side (in a smaller font, or as a pop 
up widget) sorted most likely by how similar they are.  even if you don't want 
to surface those similar docs right there on the main result page, you still 
need to execute the MLT logic as part of the initial request to know if there 
are *any* similar docs (so you can surface the link/button for displaying 
them to the user).

I would even argue there is actually a third use case ... 

--
Case 3)
  The GUI queries the standard request handler to display a list of documents, 
with a single subsequent list of similar mlt documents that have things in 
common with all of the docs in the current page of results displayed elsewhere 
on the page.
--

...where case #2 is about having separate MLT lists for each of the matching 
results, this case is about having a single "if you are interested in *all* of 
these items, you might also be interested in these other items" list.

case#1 and case#3 can both easily be satisfied with a single 
MoreLikeThisHandler which takes as its input a generic query (ie: 
q=id:12345 for case#1, and q=apache for case#3) and then generates a single 
tier 1 result block of MLT results that relate to all of the docs matching 
that query (simple case of 1 doc for case#1) ... all other tier 2 data would 
be in regards to this main MLT result set.

case#2 would still easily be handled by having some new tier 2 MLT data added 
to the StandardRequestHandler.



 PATCH:MoreLikeThis support
 --

 Key: SOLR-69
 URL: https://issues.apache.org/jira/browse/SOLR-69
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Bertrand Delacretaz
Priority: Minor
 Attachments: lucene-queries-2.0.0.jar, lucene-queries-2.1.1-dev.jar, 
 SOLR-69-MoreLikeThisRequestHandler.patch, SOLR-69.patch, SOLR-69.patch, 
 SOLR-69.patch, SOLR-69.patch


 Here's a patch that implements simple support of Lucene's MoreLikeThis class.
 The MoreLikeThisHelper code is heavily based on (hmm...lifted from might be 
 more appropriate ;-) Erik Hatcher's example mentioned in 
 http://www.mail-archive.com/[EMAIL PROTECTED]/msg00878.html
 To use it, add at least the following parameters to a standard or dismax 
 query:
   mlt=true
   mlt.fl=list,of,fields,which,define,similarity
 See the MoreLikeThisHelper source code for more parameters.
 Here are two URLs that work with the example config, after loading all 
 documents found in exampledocs in the index (just to show that it seems to 
 work - of course you need a larger corpus to make it interesting):
 http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
 

Re: dynamic copyFields

2007-05-04 Thread Ryan McKinley

Chris Hostetter wrote:

> : Syntax aside, the major implication is that DynamicCopy would need a
> : virtual function:
> :   SchemaField getTargetField()
>
> I don't think i've ever looked at DynamicField before today ... but i see
> what you're talking about, you mean that final SchemaField targetField
> would need to be replaced with SchemaField getTargetField(String
> sourceField) right?

exactly.

> yeah that seems simple enough, i'm not sure what Yonik meant by this
> comment...
>
>   // Instead of storing a type, this could be implemented as a hierarchy
>   // with a virtual matches().
>   // Given how often a search will be done, however, speed is the overriding
>   // concern and I'm not sure which is faster.
>
> ... i don't see how this ever comes into play with search.

I don't either... I think it only happens at indexing.  ResponseWriters 
do not know (or care) if a field is from a copy field or not.

> on the issue of syntax and regex vs glob, i would leave it as a glob for
> now since that's already supported by the syntax and the impl ... 

agreed.

> if we want to support regexes that should be done separately in
> DynamicReplacement where it can be leveraged by both copyField and
> dynamicField

glob is fine for what i need.


Thanks for the feedback, i'll post something on JIRA soon.

ryan


[jira] Commented: (SOLR-86) [PATCH] standalone updater cli based on httpClient

2007-05-04 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493784
 ] 

Will Johnson commented on SOLR-86:
--

has anyone brought up the idea of creating post.bat and post.sh scripts that 
use this java class instead of the curl example that currently ships in 
example/exampledocs?  it would be one less thing for people to figure out and 
possibly screw up. 

 [PATCH]  standalone updater cli based on httpClient
 ---

 Key: SOLR-86
 URL: https://issues.apache.org/jira/browse/SOLR-86
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Thorsten Scherler
 Assigned To: Erik Hatcher
 Attachments: simple-post-tool-2007-02-15.patch, 
 simple-post-tool-2007-02-16.patch, 
 simple-post-using-urlconnection-approach.patch, solr-86.diff, solr-86.diff


 We need a cross platform replacement for the post.sh. 
 The attached code is a direct replacement of the post.sh since it is actually 
 doing the same exact thing.
 In the future one can extend the CLI with other features like auto commit, 
 etc.. 
 Right now the code assumes that SOLR-85 is applied since we are using the 
 servlet of this issue to actually do the update.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-69) PATCH:MoreLikeThis support

2007-05-04 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-69:
--

Attachment: SOLR-69-MoreLikeThisRequestHandler.patch

Refactored the MoreLikeThisRequestHandler so that it can support cases #1, #2, and #3
- added faceting to the MoreLikeThisHandler
- made it possible to remove the original match from the response.  This makes 
the response look the same as ones that come from /select
- Added documentation to: http://wiki.apache.org/solr/MoreLikeThis


 PATCH:MoreLikeThis support
 --

 Key: SOLR-69
 URL: https://issues.apache.org/jira/browse/SOLR-69
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Bertrand Delacretaz
Priority: Minor
 Attachments: lucene-queries-2.0.0.jar, lucene-queries-2.1.1-dev.jar, 
 SOLR-69-MoreLikeThisRequestHandler.patch, 
 SOLR-69-MoreLikeThisRequestHandler.patch, SOLR-69.patch, SOLR-69.patch, 
 SOLR-69.patch, SOLR-69.patch


 Here's a patch that implements simple support of Lucene's MoreLikeThis class.
 The MoreLikeThisHelper code is heavily based on (hmm...lifted from might be 
 more appropriate ;-) Erik Hatcher's example mentioned in 
 http://www.mail-archive.com/[EMAIL PROTECTED]/msg00878.html
 To use it, add at least the following parameters to a standard or dismax 
 query:
   mlt=true
   mlt.fl=list,of,fields,which,define,similarity
 See the MoreLikeThisHelper source code for more parameters.
 Here are two URLs that work with the example config, after loading all 
 documents found in exampledocs in the index (just to show that it seems to 
 work - of course you need a larger corpus to make it interesting):
 http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
 http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
 Results are added to the output like this:
 <response>
   ...
   <lst name="moreLikeThis">
     <result name="UTF8TEST" numFound="1" start="0" maxScore="1.5293242">
       <doc>
         <float name="score">1.5293242</float>
         <str name="id">SOLR1000</str>
       </doc>
     </result>
     <result name="SOLR1000" numFound="1" start="0" maxScore="1.5293242">
       <doc>
         <float name="score">1.5293242</float>
         <str name="id">UTF8TEST</str>
       </doc>
     </result>
   </lst>
 I haven't tested this extensively yet, will do in the next few days. But 
 comments are welcome of course.




[jira] Commented: (SOLR-86) [PATCH] standalone updater cli based on httpClient

2007-05-04 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493828
 ] 

Hoss Man commented on SOLR-86:
--

this will ship in the next release, and the tutorial that will ship with that 
release already refers to it.

creating a post.sh or post.bat that delegates to this tool seems like it can 
only complicate things ... file perms, line endings, shell conventions, shebang 
lines ... all things where portability is a concern, but java -jar post.jar 
*.xml works damn near anywhere.




Re: [Solr Wiki] Update of Solr1.2 by ryan

2007-05-04 Thread Mike Klaas

On 5/4/07, Apache Wiki [EMAIL PROTECTED] wrote:


--
  <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
  }}}

+  * audit schema.xml duplicate field definition behavior.  As is {{
+   <fieldType name="aaa"  ... />
+   <fieldType name="aaa"  ... />
+
+   <field name="aaa"  ... />
+   <field name="aaa"  ... />
+
+   <dynamicField name="aaa_*"  ... />
+   <dynamicField name="aaa_*"  ... />
+ }} quietly continues -- tossing out the first definition.  This should add a 
severe error and optionally abort (using SOLR-179)
+


Your description is clear, but not the example.  Is it a problem if a
fieldType and field have the same name, or just two fields?  Also, the
field/dyn field definition seems okay (because of the underscore).
Perhaps we should enforce * to match something (like .+)?
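
One way to read the ".+" suggestion, as a sketch only: translate the glob into a regex in which every * must consume at least one character, so that aaa_* matches aaa_x but not the bare name aaa_. GlobSketch and its methods are hypothetical names, not part of Solr's schema code.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: glob matching where '*' must match one or more characters. */
final class GlobSketch {
    static Pattern toPattern(String glob) {
        // Limit -1 keeps empty pieces, so a leading or trailing '*' is not lost.
        String[] literals = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".+"); // each '*' becomes "one or more of anything"
            }
            if (!literals[i].isEmpty()) {
                regex.append(Pattern.quote(literals[i])); // literal text matched verbatim
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String glob, String fieldName) {
        return toPattern(glob).matcher(fieldName).matches();
    }
}
```

With this reading, a plain field named aaa and a dynamic field aaa_* never collide, since the glob cannot match a name that stops at the underscore.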

-Mike


[jira] Created: (SOLR-226) support dynamic fields as copyField destination

2007-05-04 Thread Ryan McKinley (JIRA)
support dynamic fields as copyField destination
---

 Key: SOLR-226
 URL: https://issues.apache.org/jira/browse/SOLR-226
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor
 Fix For: 1.3


I'd like to use a dynamic field as the destination of a copyField:

Given:
  <field name="tag_*"   type="string" ... />
  <field name="text_*"  type="text"   ... />

I want:
  <copyField source="tag_*" dest="text_*" /> 


For background see:
http://www.nabble.com/copyField-to-a-dynamic-field-tf2300115.html#a6419101

http://www.nabble.com/dynamic-copyFields-tf3683816.html#a10296520




[jira] Commented: (SOLR-216) Improvements to solr.py

2007-05-04 Thread Brian Whitman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493835
 ] 

Brian Whitman commented on SOLR-216:


Hi Jason, this is really great. I had one small issue -- highlighting did not 
seem to work. I looked into your code and found you were using hi.fl and hi, 
not hl.fl and hl. Not sure if your solr expects hi, but mine expects hl. Once I 
changed lines 453 and 457 to hl instead of hi it works fine. 


 Improvements to solr.py
 ---

 Key: SOLR-216
 URL: https://issues.apache.org/jira/browse/SOLR-216
 Project: Solr
  Issue Type: Improvement
  Components: clients - python
Affects Versions: 1.2
Reporter: Jason Cater
Priority: Trivial
 Attachments: solr.py


 I've taken the original solr.py code and extended it to include higher-level 
 functions.
   * Requires python 2.3+
   * Supports SSL (https://) scheme
   * Conforms (mostly) to PEP 8 -- the Python Style Guide
   * Provides a high-level results object with implicit data type conversion
   * Supports batching of update commands




Re: [Solr Wiki] Update of Solr1.2 by ryan

2007-05-04 Thread Ryan McKinley

Mike Klaas wrote:

> On 5/4/07, Apache Wiki [EMAIL PROTECTED] wrote:
>
> --
>   <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
>   }}}
>
> +  * audit schema.xml duplicate field definition behavior.  As is {{
> +   <fieldType name="aaa"  ... />
> +   <fieldType name="aaa"  ... />
> +
> +   <field name="aaa"  ... />
> +   <field name="aaa"  ... />
> +
> +   <dynamicField name="aaa_*"  ... />
> +   <dynamicField name="aaa_*"  ... />
> + }} quietly continues -- tossing out the first definition.  This
> should add a severe error and optionally abort (using SOLR-179)
> +
>
> Your description is clear, but not the example.  Is it a problem if a
> fieldType and field have the same name, or just two fields?  Also, the
> field/dyn field definition seems okay (because of the underscore).
> Perhaps we should enforce * to match something (like .+)?



Sorry, the problem is not between the various types, it is within them. 
There is no problem with aaa as both a fieldType and field.  I have 
not done the audit yet, so i can't fully describe what happens in each 
case.

What I noticed was:

 <field name="aaa" type="text" ... />
 <field name="aaa" type="string" ... />

the first field (with type text) is quietly thrown away and it uses the 
second.

I looked quickly at the other cases and it looks like fieldType does the 
same thing.  dynamicFields are different in that the second one is 
ignored.

It is an easy fix: just check if anything comes out of the map when 
you put something in.
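
A minimal sketch of that check, assuming the definitions live in a java.util.Map keyed by name. UniqueNameRegistry is a hypothetical class for illustration, not Solr's schema code, and whether a duplicate should throw or only log a severe error is left open here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical registry that refuses a second definition under the same name. */
final class UniqueNameRegistry<V> {
    private final Map<String, V> byName = new HashMap<>();

    void register(String name, V definition) {
        Objects.requireNonNull(definition, "definition");
        // Map.put returns the value previously stored under the key, or null if none.
        V previous = byName.put(name, definition);
        if (previous != null) {
            byName.put(name, previous); // keep the first definition before failing
            throw new IllegalStateException("duplicate definition for '" + name + "'");
        }
    }

    V get(String name) {
        return byName.get(name);
    }
}
```

Registering aaa twice would then fail loudly on the second call instead of silently replacing the first definition.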


ryan





[jira] Updated: (SOLR-226) support dynamic fields as copyField destination

2007-05-04 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-226:
---

Attachment: SOLR-226-DynamicCopyField.patch




[jira] Created: (SOLR-227) Add errors if you define multiple fieldTypes, fields, dynamicFields, requestHandlers with the same name

2007-05-04 Thread Ryan McKinley (JIRA)
Add errors if you define multiple fieldTypes, fields, dynamicFields, 
requestHandlers with the same name
---

 Key: SOLR-227
 URL: https://issues.apache.org/jira/browse/SOLR-227
 Project: Solr
  Issue Type: Bug
Reporter: Ryan McKinley
 Fix For: 1.2


The current implementation quietly tosses out one definition in favor of the 
other...




[jira] Updated: (SOLR-227) Add errors if you define multiple fieldTypes, fields, dynamicFields, requestHandlers with the same name

2007-05-04 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-227:
---

Attachment: SOLR-227-DuplicateNameErrors.patch




SolrParams functions

2007-05-04 Thread Ryan McKinley
SolrParams seems to have most options for how to get whom from where, 
but it is missing:


 public float getFieldFloat(String field, String param, float def);
 public String getFieldParam(String field, String param, String def);

Any objections to adding these functions?
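
For discussion, a rough sketch of what the two methods might do, assuming they follow the per-field override convention of looking up f.<field>.<param> first and falling back to the plain <param>. FieldParamsSketch is a hypothetical map-backed stand-in, not the real SolrParams class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for SolrParams, backed by a plain map of strings. */
final class FieldParamsSketch {
    private final Map<String, String> params;

    FieldParamsSketch(Map<String, String> params) {
        this.params = new HashMap<>(params);
    }

    /** Returns "f.<field>.<param>" if set, else "<param>", else the supplied default. */
    public String getFieldParam(String field, String param, String def) {
        String val = params.get("f." + field + "." + param);
        if (val == null) {
            val = params.get(param);
        }
        return val != null ? val : def;
    }

    /** Same lookup order, parsed as a float; the default is used only when nothing is set. */
    public float getFieldFloat(String field, String param, float def) {
        String val = getFieldParam(field, param, null);
        return val != null ? Float.parseFloat(val) : def;
    }
}
```

A malformed number would surface as a NumberFormatException in this sketch; how the real class should report that is a separate question.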