[jira] Updated: (SOLR-1737) Add a FieldStreamDataSource

2010-01-27 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1737:
-

Attachment: SOLR-1737.patch

 Add a FieldStreamDataSource
 ---

 Key: SOLR-1737
 URL: https://issues.apache.org/jira/browse/SOLR-1737
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1737.patch


 TikaEntityProcessor needs a DataSource which returns a Stream instead of a 
 Reader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: how to sort facets?

2010-01-27 Thread David Rühr

hi,
thanx.

So Long,
David Rühr


Koji Sekiguchi wrote:

David Rühr wrote:

hi,

We are building a filter with the faceting feature. In our facet list the 
values are ordered by match count:

facet.sort=count

but we need to sort by manufacturer, i.e. facet.sort=manufacturer.
Changing the URL doesn't change anything. Why?

select?fl=*%2Cscore&fq=type%3Apage&spellcheck=true&facet=true&facet.mincount=1&facet.sort=manufacturer&bf=log(supplier_faktor)&facet.field=supplier&facet.field=manufacturer&version=1.2&q=kind&start=0&rows=10



so long,
David


Try facet.sort=index. facet.sort accepts only count or index.

http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort

Koji
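
Koji's suggestion can be sketched as a small request builder. This is only a minimal sketch: FacetQueryBuilder is a hypothetical helper, not part of Solr or SolrJ, and the parameter names (facet, facet.field, facet.mincount, facet.sort) are the ones used in this thread. It joins parameters with explicit '&' separators and asks for facet.sort=index, which orders the values of the faceted field by indexed term rather than by count.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Solr or SolrJ): builds a select query
// string with explicit '&' separators and URL-encoded values.
final class FacetQueryBuilder {
    private final List<String> params = new ArrayList<String>();

    // Adds one name=value pair; only the value is URL-encoded.
    FacetQueryBuilder add(String name, String value) {
        try {
            params.add(name + "=" + URLEncoder.encode(value, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return this;
    }

    // Joins all pairs with '&' after the "select?" prefix.
    String build() {
        StringBuilder sb = new StringBuilder("select?");
        for (int i = 0; i < params.size(); i++) {
            if (i > 0) {
                sb.append('&');
            }
            sb.append(params.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = new FacetQueryBuilder()
            .add("q", "kind")
            .add("facet", "true")
            .add("facet.field", "manufacturer")
            .add("facet.mincount", "1")
            // facet.sort accepts only "count" or "index"
            .add("facet.sort", "index")
            .build();
        System.out.println(url);
    }
}
```

The field to sort is chosen by facet.field; facet.sort only selects the ordering of that field's values.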




Kind regards,

David Rühr
PHP Programmer

--
Marketing Factory Consulting GmbH  *   mailto:d...@marketing-factory.de
Stephanienstraße 36   *  Tel.: +49 211-361176-58
D-40211 Düsseldorf, Germany   *  Fax:  +49 211-361176-99
Amtsgericht Düsseldorf HRB 53971  *  http://www.marketing-factory.de/

Managing Directors: Peter Faisst   |   Katja Faisst
Karoline Steinfatt   |   Christoph Allefeld   |   Markus M. Kimmel 



Hudson build is back to normal: Solr-trunk #1044

2010-01-27 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1044/changes




Using MoreLikeThisHandler

2010-01-27 Thread Nayan Gowda
Hi,
I am trying to work with the MoreLikeThisHandler in order to get
similar documents.

Here is my configuration in schema.xml:

<fields>
  <field name="id" type="sint" indexed="true" stored="true" required="true"
         termVectors="true"/>
  <field name="title" type="text" indexed="true" stored="false"
         termVectors="true"/>
  <field name="keywordGroup" type="string" indexed="true" stored="false"
         multiValued="true" termVectors="true"/>
  <field name="tagText" type="text" indexed="true" stored="true"
         multiValued="true" default="" termVectors="true"/>
</fields>

Configuration in solrconfig.xml:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">title,tagText,keywordGroup</str>
    <str name="mlt.qf">title^1.5 tagText keywordGroup^0.5</str>
    <str name="mlt.mintf">1</str>
    <str name="mlt.mindf">1</str>
    <str name="mlt.boost">true</str>
    <str name="mlt.match.include">true</str>
  </lst>
</requestHandler>


and I fire the query like this:
http://10.99.82.12:8080/Dev/mlt/?q=id:7735&mlt.mindf=1&mlt.mintf=1&mlt.boost=true&mlt.match.include=true&mlt.fl=title,tagText,keywordGroup
http://10.99.82.12:8080/Dev/mlt/?q=id:7735&mlt.mindf=1&mlt.mintf=1&mlt.boost=true&mlt.match.include=true&mlt.fl=title,tagText
http://localhost:8983/solr/mlt?q=id:100

I do get some results, but they are not accurate.
Now I have a couple of questions:
1. Is this configuration correct for getting similar documents?

2. Is it possible to support a different boost for each keywordGroup?
If so, please give me a hint on how I can achieve this.

Thanks,
Nayan K


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2010-01-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805500#action_12805500
 ] 

Jan Høydahl commented on SOLR-1725:
---

It looks logical and nice.

However, I'm leaning towards keeping it very simple. The simplest is one script 
per processor, since that will always work.

As more and more update processors are written, in Java, JS, Jython and more, 
it would be a clear benefit if administrators didn't need to care about the 
underlying implementation, but could configure each one in the same way. 
That's why I opt for the top-level param structure as default.

I have years of experience with FAST document processing, which is really a 
killer feature, mainly because it's so dead simple. Drop in a python script 
with a deployment descriptor and start using it in your pipelines. You don't 
care if the implementation is pure Python, a C library wrapper or whatever, you 
just care about what parameters to give it. I see this patch as one big step 
towards the same simplicity with Solr!

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
 Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script
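
The ordering and extension rules in the description can be illustrated with a short sketch. ScriptFiles and its methods are hypothetical names chosen for illustration and are not taken from the SOLR-1725 patch; the sketch also assumes that "lexicographical order" means the natural String ordering in Java, which places uppercase letters before lowercase ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating the rules in the description; it is not
// code from the SOLR-1725 patch.
final class ScriptFiles {

    // Splits the comma-separated "scripts" parameter and sorts the names
    // with the natural String ordering (assumed execution order).
    static List<String> executionOrder(String scriptsParam) {
        List<String> names = new ArrayList<String>();
        for (String name : scriptsParam.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        Collections.sort(names);
        return names;
    }

    // Returns the file extension that would select the script language.
    // The extension is mandatory, so a missing one is reported as an error.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException(
                "Script file needs an extension: " + fileName);
        }
        return fileName.substring(dot + 1);
    }

    public static void main(String[] args) {
        for (String name : executionOrder("scriptB.js, scriptA.js")) {
            System.out.println(name + " -> " + extension(name));
        }
    }
}
```

With JDK6 script engine support, an extension such as "js" could then be handed to javax.script.ScriptEngineManager.getEngineByExtension to look up an engine; whether the patch does exactly that is not stated here.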




[jira] Commented: (SOLR-1163) Solr Explorer - A generic GWT client for Solr

2010-01-27 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805511#action_12805511
 ] 

Grant Ingersoll commented on SOLR-1163:
---

Uri,

Is this patch still up to date?  Is it a contrib?

 Solr Explorer - A generic GWT client for Solr
 -

 Key: SOLR-1163
 URL: https://issues.apache.org/jira/browse/SOLR-1163
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Affects Versions: 1.3
Reporter: Uri Boness
 Attachments: graphics.zip, solr-explorer.patch, solr-explorer.patch


 The attached patch is a generic GWT client for Solr. It is currently 
 standalone, meaning that once built, one can open the generated HTML file in 
 a browser and communicate with any deployed Solr. It is configured with its 
 own configuration file, where one can configure the Solr instance/core to 
 connect to. Since it's currently standalone and completely client-side based, 
 it uses JSON with padding (cross-site scripting) to connect to remote Solr 
 servers. Some of the supported features:
 - Simple query search
 - Sorting - one can dynamically define new sort criteria
 - Search results are rendered very much like Google search results are 
 rendered. It is also possible to view all stored field values for every hit. 
 - Custom hit rendering - It is possible to show thumbnails (images) per hit 
 and also customize a view for a hit based on HTML templates
 - Faceting - one can dynamically define field and query facets via the UI. It 
 is also possible to pre-configure these facets in the configuration file.
 - Highlighting - you can dynamically configure highlighting. It can also be 
 pre-configured in the configuration file
 - Spellchecking - you can dynamically configure spell checking. Can also be 
 done in the configuration file. Supports collation. It is also possible to 
 send build and reload commands.
 - Data import handler - if used, it is possible to send a full-import and 
 status command (delta-import is not implemented yet, but it's easy to add)
 - Console - For development time, there's a small console which can help to 
 better understand what's going on behind the scenes. One can use it to:
 ** View the client logs
 ** Browse the Solr schema
 ** View a breakdown of the current search context
 ** View a breakdown of the query URL that is sent to Solr
 ** View the raw JSON response returning from Solr
 This client is actually a platform that can be greatly extended for more 
 things. The goal is to have a client where the explorer part is just one view 
 of it. Other future views include: Monitoring, Administration, Query Builder, 
 DataImportHandler configuration, and more...
 To get a better view of what's currently possible, we've set up a public 
 version of this client at: http://search.jteam.nl/explorer. This client is 
 configured with one Solr instance where crawled YouTube movies were indexed. 
 You can also check out a screencast for this deployed client: 
 http://search.jteam.nl/help
 The patch creates a new folder in the contrib directory. Since the patch 
 doesn't contain binaries, an additional zip file is provided that needs to be 
 extracted to add all the required graphics. This module is maven2 based and is 
 configured in such a way that all GWT related tools/libraries are 
 automatically downloaded when the module is compiled. One of the artifacts 
 of the build is a war file which can be deployed in any servlet container.
 NOTE: this client works best on WebKit based browsers (for performance 
 reasons) but also works on Firefox and IE 7+. That said, it should be taken 
 into account that it is still under development.




Re: CHANGES.txt updates for SOLR-1516 and SOLR-1592

2010-01-27 Thread Mattmann, Chris A (388J)
Thanks, Hoss, no problemo, appreciate it!


On 1/26/10 12:22 PM, Chris Hostetter hossman_luc...@fucit.org wrote:



: Not to be a pest, but there are no CHANGES.txt updates for SOLR-1516 and
: SOLR-1592. Could someone update them? A trivial patch is attached...

Sorry about that.

Every change (with the possible exception of fixing formatting or
documentation typos) *should* have a CHANGES.txt entry.

Every change that affects the public API *MUST* have a CHANGES.txt entry.

Committed revision 903398.


-Hoss




++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2010-01-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805532#action_12805532
 ] 

Yonik Seeley commented on SOLR-1725:


Cool feature!

Performance:
 - It looks like scripts are read from the resource loader and parsed again 
(eval) for every update request. This can be pretty expensive, esp for those 
scripting languages that generate java class files instead of using an 
interpreter. One way to combat this would be to cache and reuse them.

Interface:
- Should we have a way to specify a script in-line (in solrconfig.xml)?
- Or even cooler... allow passing of scripts as parameters in the update 
request! Think about the power of pointing Solr to a CSV file and also 
providing document transformers & field manipulators on the fly!
- This seems to raise the visibility of the UpdateCommand classes, directly 
exposing them to users w/o plugins. We should perhaps consider interface 
cleanups on these classes at the same time as this issue.
- Examples! Using javascript (since it's both fast and included in JDK6), let's 
see what the scripts are for some common use cases. This both helps improve the 
design and lets other people give feedback w/o having to read through 
code.
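
The "cache and reuse" idea from the performance point can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the patch: CompiledScriptCache is a hypothetical class, T stands for whatever a compiled script is (for example a javax.script.CompiledScript), the compile step is supplied by the caller, and the sketch assumes Java 8 or later for ConcurrentHashMap.computeIfAbsent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of compile-once-and-reuse; not code from the patch.
final class CompiledScriptCache<T> {
    private final ConcurrentHashMap<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;

    // The compiler function turns a script name into its compiled form.
    CompiledScriptCache(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    // Returns the cached compiled form, invoking the compiler only the
    // first time a given script name is requested.
    T get(String scriptName) {
        return cache.computeIfAbsent(scriptName, compiler);
    }

    public static void main(String[] args) {
        AtomicInteger compiles = new AtomicInteger();
        // Stand-in "compiler" that just counts how often it is called.
        CompiledScriptCache<String> cache = new CompiledScriptCache<>(name -> {
            compiles.incrementAndGet();
            return "compiled:" + name;
        });
        // Simulate three update requests that use the same script.
        for (int request = 0; request < 3; request++) {
            cache.get("scriptA.js");
        }
        System.out.println("compiles = " + compiles.get()); // prints "compiles = 1"
    }
}
```

The sketch leaves out invalidation when a script file changes, and it does not address how per-request variables such as req and rsp would be supplied to a cached script.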




[jira] Created: (SOLR-1738) Upgrade to Tika 0.6

2010-01-27 Thread Grant Ingersoll (JIRA)
Upgrade to Tika 0.6
---

 Key: SOLR-1738
 URL: https://issues.apache.org/jira/browse/SOLR-1738
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


See title.




[jira] Commented: (SOLR-1728) ResponseWriters should support byte[], ByteBuffer

2010-01-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805578#action_12805578
 ] 

Yonik Seeley commented on SOLR-1728:


Seems to make sense from a completeness point of view.  It also allows a closer 
semantic mapping (i.e. we could use the closest equivalent to byte arrays for 
python & ruby).

 ResponseWriters should support byte[], ByteBuffer
 -

 Key: SOLR-1728
 URL: https://issues.apache.org/jira/browse/SOLR-1728
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5


 Only BinaryResponseWriter supports byte[] and ByteBuffer. Other writers also 
 should support these




Re: configure FastVectorHihglighter in trunk

2010-01-27 Thread Marc Sturlese

I am having some trouble making it work. I am debugging the code and I see
that when the FastVectorHighlighter is constructed, the parameters it
receives are OK:

// get FastVectorHighlighter instance out of the processing loop
FastVectorHighlighter fvh = new FastVectorHighlighter(
    // FVH cannot process hl.usePhraseHighlighter parameter per-field basis
    params.getBool( HighlightParams.USE_PHRASE_HIGHLIGHTER, true ),
    // FVH cannot process hl.requireFieldMatch parameter per-field basis
    params.getBool( HighlightParams.FIELD_MATCH, false ),
    getFragListBuilder( params ),
    getFragmentsBuilder( params ) );

The query here is OK as well:
FieldQuery fieldQuery = fvh.getFieldQuery( query );

But I can't see what's in fieldQuery (just a memory address, and I don't know
how to get something similar to toString()).

The problem I see is in:

String[] snippets = highlighter.getBestFragments( fieldQuery,
    req.getSearcher().getReader(), docId, fieldName,
    params.getFieldInt( fieldName, HighlightParams.FRAGSIZE, 100 ),
    params.getFieldInt( fieldName, HighlightParams.SNIPPETS, 1 ) );

snippets ends up as an empty array, so it jumps to:
alternateField( docSummaries, params, doc, fieldName );

In solrconfig.xml I added:
   <fragListBuilder name="simple"
       class="org.apache.solr.highlight.SimpleFragListBuilder" default="false"/>
   <fragmentsBuilder name="colored"
       class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder"
       default="false"/>

Maybe I am missing something... any idea?
Highlighting with doHighlightingByHighlighter works perfectly.

**I have also noticed that setting the snippet fragment size to 0 (which in
normal highlighting returns the whole field highlighted) gives an error.



Koji Sekiguchi-2 wrote:
 
 Marc Sturlese wrote:
 How do I activate FastVectorHighlighter in trunk? Which of those params
 sets it up?
   <!-- Configure the standard fragListBuilder -->
   <fragListBuilder name="simple"
       class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>

   <!-- Configure the standard fragmentsBuilder -->
   <fragmentsBuilder name="colored"
       class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder"
       default="true"/>

   <fragmentsBuilder name="scoreOrder"
       class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder"
       default="true"/>

 Thanks in advance.
   
 You do not need to activate it. DefaultSolrHighlighter, which is the
 default SolrHighlighter impl, automatically uses FVH when the fields you
 specify through the hl.fl parameter have termVectors, termPositions and
 termOffsets set to true. If you want to use the multi-colored tag
 feature, you need to specify MultiColored*FragmentsBuilder in 
 solrconfig.xml.
 
 Koji
 
 -- 
 http://www.rondhuit.com/en/
 
 
 

-- 
View this message in context: 
http://old.nabble.com/configure-FastVectorHihglighter-in-trunk-tp27319976p27344139.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



[jira] Commented: (SOLR-1163) Solr Explorer - A generic GWT client for Solr

2010-01-27 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805601#action_12805601
 ] 

Uri Boness commented on SOLR-1163:
--

Actually I've been working on a new version for the explorer which I plan to 
put soon as a patch here.




[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2010-01-27 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805612#action_12805612
 ] 

Uri Boness commented on SOLR-1725:
--

{quote}
Performance:

It looks like scripts are read from the resource loader and parsed again (eval) 
for every update request. This can be pretty expensive, esp for those scripting 
languages that generate java class files instead of using an interpreter. One 
way to combat this would be to cache and reuse them.
{quote}
Yes, indeed the scripts are evaluated per request but for a reason. One of the 
goals here is to keep the scripts as close as possible to the update processor 
interface, so the functions in the scripts have the same signature as the 
methods in the processor. But in order for the scripts to be flexible I decided 
to introduce some globally scoped variables which are accessible in the 
functions (currently the Solr request, the response and a logger are 
there). The problem is that the API only defines 3 scopes where you can 
register variables and the lowest one is the engine itself. Since the 
evaluation of a script is done on the engine level as well, when using this API 
together with the global variables I don't think you can escape the need for 
creating an engine per request (thus, also evaluating the scripts).

But I agree with you that if there is a way around it, caching the 
evaluated/compiled scripts will definitely boost things up. I'll need to 
investigate this further and come up with alternatives (I already have some 
ideas using ThreadLocals).

bq. Should we have a way to specify a script in-line (in solrconfig.xml)?

Personally I prefer keeping the solrconfig.xml as clean as possible. I do 
however think that a standardization of Solr scripting support in general would 
be great (for example, have a scripts folder under _solr.solr.home_ where all 
the scripts are placed, or come up with a standard configuration structure for 
the scripts... perhaps something in the direction Hoss suggested above).

bq. This seems to raise the visibility of the UpdateCommand classes, directly 
exposing them to users w/o plugins. We should perhaps consider interface 
cleanups on these classes at the same time as this issue.
+1

bq. Examples! Using javascript (since it's both fast and included in JDK6), 
let's see what the scripts are for some common usecases. This both helps 
improve the design as well as lets other people give feedback w/o having to 
read through code.
Yep.. that would probably be very helpful. Basically I think anyone who's ever 
written an update processor can perhaps try to convert it to a script and see 
how it works. The usual use case for me is to just add a few fields which are 
derived from the other fields, but perhaps there are some other more 
interesting use cases out there. I guess these examples should be put in the 
Wiki, right?








[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2010-01-27 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805672#action_12805672
 ] 

Uri Boness commented on SOLR-1725:
--

Been looking more into it and I think there's a nice way in which we can cache 
the evaluated scripts. But... (and there's always a but) to make it work 
cleanly we need to be able to extend the scripting support, which means we need 
to be able to compile the code in Java 6.

And this brings us back to Mark's comment above on how do we want to do that.




[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2010-01-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805678#action_12805678
 ] 

Yonik Seeley commented on SOLR-1725:


As you pointed out, Java5 is EOL'd already and Sun/Oracle doesn't even let you 
download JDK5 anymore w/o registration.
Wouldn't hurt my feelings to move to Java6.  After all, the SolrCloud stuff 
we're working on uses zookeeper which requires 1.6.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
 Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in {{solr.solr.home}} directory. The functory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script
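
 As a rough illustration of the ordering and extension rules described above, 
 here is a minimal Java sketch. It is not taken from the attached patch: the 
 class name ScriptOrderSketch and both helper methods are hypothetical. It only 
 shows how a comma-separated {{scripts}} value could be sorted lexicographically 
 and how a mandatory file extension could be extracted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names are illustrative and not part of SOLR-1725.patch.
public class ScriptOrderSketch {

    /** Splits the comma-separated "scripts" value and sorts the names lexicographically. */
    static List<String> executionOrder(String scriptsParam) {
        List<String> names = new ArrayList<String>();
        for (String name : scriptsParam.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        Collections.sort(names); // natural String order, i.e. lexicographic
        return names;
    }

    /** Returns the file extension used to pick the script language; fails if it is missing. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("Script file needs an extension: " + fileName);
        }
        return fileName.substring(dot + 1);
    }

    public static void main(String[] args) {
        // Prints each script in execution order together with its extension.
        for (String script : executionOrder("scriptB.js, scriptA.js")) {
            System.out.println(script + " -> " + extensionOf(script));
        }
    }
}
```

 The extracted extension could then be passed to 
 javax.script.ScriptEngineManager.getEngineByExtension(String), which is part of 
 the JDK6 scripting API; whether the patch resolves engines exactly this way is 
 an assumption.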




[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2010-01-27 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805691#action_12805691
 ] 

Uri Boness commented on SOLR-1725:
--

Well then... I just hope others will not shed tears as well and we can make 
Solr 1.5 Java 6 compiled :-)

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
 Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script




Re: configure FastVectorHighlighter in trunk

2010-01-27 Thread Koji Sekiguchi

Can you give me the following info to reproduce the problem?

* field data
* query string
* field definition in schema.xml

 **I also have noticed that setting the snippet fragment size to 0 (which in
 normal highlighting returns the whole field highlighted) gives an error.

Hmm, I should check it. Can you open a JIRA issue?

Thank you,

Koji

--
http://www.rondhuit.com/en/


Marc Sturlese wrote:

I am having some trouble making it work. I am debugging the code and I see
that when the FastVectorHighlighter constructor is called, the parameters
it receives are OK:

// get FastVectorHighlighter instance out of the processing loop
FastVectorHighlighter fvh = new FastVectorHighlighter(
    // FVH cannot process hl.usePhraseHighlighter parameter per-field basis
    params.getBool( HighlightParams.USE_PHRASE_HIGHLIGHTER, true ),
    // FVH cannot process hl.requireFieldMatch parameter per-field basis
    params.getBool( HighlightParams.FIELD_MATCH, false ),
    getFragListBuilder( params ),
    getFragmentsBuilder( params ) );

The query here is OK as well:
FieldQuery fieldQuery = fvh.getFieldQuery( query );

But I can't see what's in fieldQuery (the debugger shows just a memory
reference, and I don't know how to get something similar to toString()).

The problem I see is in:

String[] snippets = highlighter.getBestFragments( fieldQuery,
    req.getSearcher().getReader(), docId, fieldName,
    params.getFieldInt( fieldName, HighlightParams.FRAGSIZE, 100 ),
    params.getFieldInt( fieldName, HighlightParams.SNIPPETS, 1 ) );

snippets ends up with an empty array so it jumps to:
alternateField( docSummaries, params, doc, fieldName );

In solrconfig.xml I added:
   <fragListBuilder name="simple"
       class="org.apache.solr.highlight.SimpleFragListBuilder" default="false"/>
   <fragmentsBuilder name="colored"
       class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder"
       default="false"/>

Maybe I am missing something... any idea?
Highlighting via doHighlightingByHighlighter works perfectly.

**I also have noticed that setting the snippet fragment size to 0 (which in
normal highlighting returns the whole field highlighted) gives an error.



Koji Sekiguchi-2 wrote:
  

Marc Sturlese wrote:


How do I activate FastVectorHighlighter in trunk? Which of these params
sets it up?
   <!-- Configure the standard fragListBuilder -->
   <fragListBuilder name="simple"
       class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>

   <!-- Configure the standard fragmentsBuilder -->
   <fragmentsBuilder name="colored"
       class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder"
       default="true"/>

   <fragmentsBuilder name="scoreOrder"
       class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder"
       default="true"/>

Thanks in advance.
  
  

You do not need to activate it. DefaultSolrHighlighter, which is the
default SolrHighlighter impl, automatically uses FVH when the field names
you specify through the hl.fl parameter have termVectors, termPositions and
termOffsets all set to true. If you want to use the multi-colored tag
feature, you need to specify MultiColored*FragmentsBuilder in
solrconfig.xml.
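
The rule above can be sketched in a few lines of Java. This is only an
illustration of the condition as described in this message, not the actual
DefaultSolrHighlighter code: the FvhEligibilitySketch class, the FieldOptions
holder and the method name are all hypothetical.

```java
// Hypothetical sketch of the selection rule; not copied from Solr sources.
public class FvhEligibilitySketch {

    /** Illustrative holder for the three schema.xml field attributes. */
    static final class FieldOptions {
        final boolean termVectors;
        final boolean termPositions;
        final boolean termOffsets;

        FieldOptions(boolean termVectors, boolean termPositions, boolean termOffsets) {
            this.termVectors = termVectors;
            this.termPositions = termPositions;
            this.termOffsets = termOffsets;
        }
    }

    /** FVH is used only when all three attributes are true for the hl.fl field. */
    static boolean usesFastVectorHighlighter(FieldOptions field) {
        return field.termVectors && field.termPositions && field.termOffsets;
    }

    public static void main(String[] args) {
        FieldOptions full = new FieldOptions(true, true, true);
        FieldOptions vectorsOnly = new FieldOptions(true, false, false);
        System.out.println("full: " + usesFastVectorHighlighter(full));
        System.out.println("vectorsOnly: " + usesFastVectorHighlighter(vectorsOnly));
    }
}
```

Under this reading, a field that stores term vectors without positions and
offsets would fall back to the regular highlighter.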


Koji

--
http://www.rondhuit.com/en/






  





[jira] Resolved: (SOLR-1737) Add a FieldStreamDataSource

2010-01-27 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-1737.
--

Resolution: Fixed

committed r:903966

 Add a FieldStreamDataSource
 ---

 Key: SOLR-1737
 URL: https://issues.apache.org/jira/browse/SOLR-1737
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1737.patch


 TikaEntityProcessor needs a DataSource which returns a Stream instead of a 
 Reader.
