[jira] Resolved: (SOLR-1697) PluginInfo should load plugins w/o class attribute also

2010-01-05 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-1697.
--

Resolution: Fixed

resolved Revision: 895909

 PluginInfo should load plugins w/o class attribute also
 ---

 Key: SOLR-1697
 URL: https://issues.apache.org/jira/browse/SOLR-1697
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1697.patch


 This should enable components to load plugins w/ a default classname too

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent

2010-01-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796167#action_12796167
 ] 

Noble Paul edited comment on SOLR-1696 at 1/5/10 8:40 AM:
--

The new syntax can be as follows
{code:xml}
<searchComponent class="solr.HighLightComponent" name="highlight">
  <highlighting class="DefaultSolrHighlighter">
   <!-- Configure the standard fragmenter -->
   <!-- This could most likely be commented out in the default case -->
   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter"
default="true">
    <lst name="defaults">
     <int name="hl.fragsize">100</int>
    </lst>
   </fragmenter>

   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
   </fragmenter>

   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
default="true">
    <lst name="defaults">
     <str name="hl.simple.pre"><![CDATA[<em>]]></str>
     <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
   </formatter>
  </highlighting>
</searchComponent>
{code}

This way SolrCore can be totally agnostic of the highlighter. 

  was (Author: noble.paul):
The new syntax can be as follows
{code:xml}
<searchComponent class="solr.HighLightComponent">
  <highlighting class="DefaultSolrHighlighter">
   <!-- Configure the standard fragmenter -->
   <!-- This could most likely be commented out in the default case -->
   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter"
default="true">
    <lst name="defaults">
     <int name="hl.fragsize">100</int>
    </lst>
   </fragmenter>

   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
   </fragmenter>

   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
default="true">
    <lst name="defaults">
     <str name="hl.simple.pre"><![CDATA[<em>]]></str>
     <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
   </formatter>
  </highlighting>
</searchComponent>
{code}

This way SolrCore can be totally agnostic of the highlighter. 
  
 Deprecate old highlighting syntax and move configuration to 
 HighlightComponent
 

 Key: SOLR-1696
 URL: https://issues.apache.org/jira/browse/SOLR-1696
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: Noble Paul
 Fix For: 1.5


 There is no reason why we should have a custom syntax for highlighter 
 configuration.
 It can be treated like any other SearchComponent and all the configuration 
 can go in there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent

2010-01-05 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1696:
-

Attachment: SOLR-1696.patch

The old syntax is deprecated and all the code moves into HighlightComponent. 
SolrCore is now agnostic of loading and managing HighlightComponent.

 Deprecate old highlighting syntax and move configuration to 
 HighlightComponent
 

 Key: SOLR-1696
 URL: https://issues.apache.org/jira/browse/SOLR-1696
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1696.patch


 There is no reason why we should have a custom syntax for highlighter 
 configuration.
 It can be treated like any other SearchComponent and all the configuration 
 can go in there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1699) deprecate the updateHandler configuration syntax

2010-01-05 Thread Noble Paul (JIRA)
deprecate the updateHandler configuration syntax


 Key: SOLR-1699
 URL: https://issues.apache.org/jira/browse/SOLR-1699
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
 Fix For: 1.5


For all practical purposes, an updateHandler is a requestHandler. We can do 
away with the custom syntax for updateHandler.

Example:
{code:xml}
<requestHandler class="solr.DirectUpdateHandler2">

  <lst name="autoCommit">
    <int name="maxDocs">1</int>
    <int name="maxTime">360</int>
  </lst>

  <!-- represents a lower bound on the frequency that commits may
  occur (in seconds). NOTE: not yet implemented

  <int name="commitIntervalLowerBound">0</int>
  -->
</requestHandler>
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent

2010-01-05 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796584#action_12796584
 ] 

Chris Male commented on SOLR-1696:
--

Are you planning on logging a warning if they continue to use the deprecated 
syntax?

 Deprecate old highlighting syntax and move configuration to 
 HighlightComponent
 

 Key: SOLR-1696
 URL: https://issues.apache.org/jira/browse/SOLR-1696
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1696.patch


 There is no reason why we should have a custom syntax for highlighter 
 configuration.
 It can be treated like any other SearchComponent and all the configuration 
 can go in there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1698) load balanced distributed search

2010-01-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796585#action_12796585
 ] 

Noble Paul commented on SOLR-1698:
--

Is this related to SOLR-1431? I thought we could have custom ShardComponents for 
these things.

 load balanced distributed search
 

 Key: SOLR-1698
 URL: https://issues.apache.org/jira/browse/SOLR-1698
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley

 Provide syntax and implementation of load-balancing across shard replicas.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1699) deprecate the updateHandler configuration syntax

2010-01-05 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796586#action_12796586
 ] 

Chris Male commented on SOLR-1699:
--

Hi,

I like the idea of standardising the syntax in the solrconfig.xml, but I think 
this is actually not categorising the UpdateHandler correctly.  It gives the 
impression that it can respond to requests, and is just an alternative to the 
other update request handlers (XmlUpdateRequestHandlers & co), which it isn't.

 deprecate the updateHandler configuration syntax
 

 Key: SOLR-1699
 URL: https://issues.apache.org/jira/browse/SOLR-1699
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
 Fix For: 1.5


 For all practical purposes, an updateHandler is a requestHandler. We can do 
 away with the custom syntax for updateHandler.
 Example:
 {code:xml}
 <requestHandler class="solr.DirectUpdateHandler2">
 
   <lst name="autoCommit">
     <int name="maxDocs">1</int>
     <int name="maxTime">360</int>
   </lst>
 
   <!-- represents a lower bound on the frequency that commits may
   occur (in seconds). NOTE: not yet implemented
 
   <int name="commitIntervalLowerBound">0</int>
   -->
 </requestHandler>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Deleted: (SOLR-1699) deprecate the updateHandler configuration syntax

2010-01-05 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul deleted SOLR-1699:
-


 deprecate the updateHandler configuration syntax
 

 Key: SOLR-1699
 URL: https://issues.apache.org/jira/browse/SOLR-1699
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul

 For all practical purposes, an updateHandler is a requestHandler. We can do 
 away with the custom syntax for updateHandler.
 Example:
 {code:xml}
 <requestHandler class="solr.DirectUpdateHandler2">
 
   <lst name="autoCommit">
     <int name="maxDocs">1</int>
     <int name="maxTime">360</int>
   </lst>
 
   <!-- represents a lower bound on the frequency that commits may
   occur (in seconds). NOTE: not yet implemented
 
   <int name="commitIntervalLowerBound">0</int>
   -->
 </requestHandler>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1699) deprecate the updateHandler configuration syntax

2010-01-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796589#action_12796589
 ] 

Noble Paul commented on SOLR-1699:
--

Hi Chris, you are right. I opened this issue hastily; I'm going to remove it.

 deprecate the updateHandler configuration syntax
 

 Key: SOLR-1699
 URL: https://issues.apache.org/jira/browse/SOLR-1699
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul

 For all practical purposes, an updateHandler is a requestHandler. We can do 
 away with the custom syntax for updateHandler.
 Example:
 {code:xml}
 <requestHandler class="solr.DirectUpdateHandler2">
 
   <lst name="autoCommit">
     <int name="maxDocs">1</int>
     <int name="maxTime">360</int>
   </lst>
 
   <!-- represents a lower bound on the frequency that commits may
   occur (in seconds). NOTE: not yet implemented
 
   <int name="commitIntervalLowerBound">0</int>
   -->
 </requestHandler>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Solr nightly build failure

2010-01-05 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
[javac] Compiling 88 source files to /tmp/apache-solr-nightly/build/solrj
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
[javac] Compiling 413 source files to /tmp/apache-solr-nightly/build/solr
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 208 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

dist-contrib:

init:
[mkdir] Created dir: 
/tmp/apache-solr-nightly/contrib/clustering/build/classes
[mkdir] Created dir: 
/tmp/apache-solr-nightly/contrib/clustering/lib/downloads
[mkdir] Created dir: /tmp/apache-solr-nightly/build/docs/api

init-forrest-entities:

compile-solrj:

compile:
[javac] Compiling 1 source file to /tmp/apache-solr-nightly/build/solr
[javac] Note: 
/tmp/apache-solr-nightly/src/java/org/apache/solr/search/DocSetHitCollector.java
 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

make-manifest:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/META-INF

proxy.setup:

check-files:

get-colt:
  [get] Getting: 
http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar
  [get] To: 
/tmp/apache-solr-nightly/contrib/clustering/lib/downloads/colt-1.2.0.jar

get-pcj:
  [get] Getting: http://repo1.maven.org/maven2/pcj/pcj/1.2/pcj-1.2.jar
  [get] To: 
/tmp/apache-solr-nightly/contrib/clustering/lib/downloads/pcj-1.2.jar

get-nni:
  [get] Getting: 
http://download.carrot2.org/maven2/org/carrot2/nni/1.0.0/nni-1.0.0.jar
  [get] To: 
/tmp/apache-solr-nightly/contrib/clustering/lib/downloads/nni-1.0.0.jar

get-simple-xml:
  [get] Getting: 
http://mirrors.ibiblio.org/pub/mirrors/maven2/org/simpleframework/simple-xml/1.7.3/simple-xml-1.7.3.jar
  [get] To: 
/tmp/apache-solr-nightly/contrib/clustering/lib/downloads/simple-xml-1.7.3.jar

get-libraries:

compile:
[javac] Compiling 7 source files to 
/tmp/apache-solr-nightly/contrib/clustering/build/classes

build:
  [jar] Building jar: 
/tmp/apache-solr-nightly/contrib/clustering/build/apache-solr-clustering-1.5-dev.jar

dist:
 [copy] Copying 1 file to /tmp/apache-solr-nightly/dist

init:
[mkdir] Created dir: 
/tmp/apache-solr-nightly/contrib/dataimporthandler/target/classes

init-forrest-entities:

compile-solrj:

compile:
[javac] Compiling 1 source file to /tmp/apache-solr-nightly/build/solr
[javac] Note: 
/tmp/apache-solr-nightly/src/java/org/apache/solr/search/DocSetHitCollector.java
 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

make-manifest:

compile:
[javac] Compiling 46 source files to 
/tmp/apache-solr-nightly/contrib/dataimporthandler/target/classes
[javac] Note: 
/tmp/apache-solr-nightly/contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/DocBuilder.java
 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compileExtras:
[mkdir] Created dir: 
/tmp/apache-solr-nightly/contrib/dataimporthandler/target/extras/classes
[javac] Compiling 2 source files to 
/tmp/apache-solr-nightly/contrib/dataimporthandler/target/extras/classes
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

build:
  [jar] Building jar: 
/tmp/apache-solr-nightly/contrib/dataimporthandler/target/apache-solr-dataimporthandler-1.5-dev.jar
  [jar] Building jar: 
/tmp/apache-solr-nightly/contrib/dataimporthandler/target/apache-solr-dataimporthandler-extras-1.5-dev.jar

dist:
 [copy] Copying 2 files to /tmp/apache-solr-nightly/build/web
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web/WEB-INF/lib
 [copy] Copying 1 file to /tmp/apache-solr-nightly/build/web/WEB-INF/lib
 [copy] 

[jira] Created: (SOLR-1700) LBHttpSolrServer - Connections management

2010-01-05 Thread Patrick Sauts (JIRA)
LBHttpSolrServer - Connections management


 Key: SOLR-1700
 URL: https://issues.apache.org/jira/browse/SOLR-1700
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 1.4
Reporter: Patrick Sauts
Priority: Minor
 Fix For: 1.5


As LBHttpSolrServer is a wrapper around CommonsHttpSolrServer instances: 

CommonsHttpSolrServer search1 = new CommonsHttpSolrServer("http://mysearch1");
search1.setConnectionTimeout(CONNECTION_TIMEOUT);
search1.setSoTimeout(READ_TIMEOUT);
search1.setConnectionManagerTimeout(CONNECTION_MANAGER_TIMEOUT);
search1.setDefaultMaxConnectionsPerHost(MAX_CONNECTIONS_PER_HOST1);
search1.setMaxTotalConnections(MAX_TOTAL_CONNECTIONS1);

CommonsHttpSolrServer search2 = new CommonsHttpSolrServer("http://mysearch2");
search2.setConnectionTimeout(CONNECTION_TIMEOUT);
search2.setSoTimeout(READ_TIMEOUT);
search2.setConnectionManagerTimeout(CONNECTION_MANAGER_TIMEOUT);
search2.setDefaultMaxConnectionsPerHost(MAX_CONNECTIONS_PER_HOST2);
search2.setMaxTotalConnections(MAX_TOTAL_CONNECTIONS2);

LBHttpSolrServer solrServers = new LBHttpSolrServer(search1, search2);

So we can manage the parameters per server.
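
A tiny sketch of the constructor this would imply (hypothetical: the current 
LBHttpSolrServer is constructed from URL strings, and an overload taking 
pre-configured CommonsHttpSolrServer instances does not exist today):

// Hypothetical addition to LBHttpSolrServer, for illustration only.
// Each wrapped server would keep its own HttpClient settings.
public LBHttpSolrServer(CommonsHttpSolrServer... solrServers) {
  for (CommonsHttpSolrServer server : solrServers) {
    addSolrServer(server);  // hypothetical overload accepting an instance
  }
}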

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Solr-trunk #1023

2010-01-05 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1023/changes

Changes:

[noble] SOLR-1697 PluginInfo should load plugins w/o class attribute also

[gsingers] some useful constants

[gsingers] SOLR-1302: some slight refactoring for more reusable distance 
calculations

[gsingers] javadoc

[gsingers] javadoc

[gsingers] SOLR-1692: fix produceSummary issue with Carrot2 clustering

--
[...truncated 4393 lines...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 17.263 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.LargeVolumeBinaryJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 13.015 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.458 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 15.351 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.275 sec
[junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.409 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.304 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 25.041 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 45.175 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 46.646 sec
[junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.386 sec
[junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.446 sec
[junit] Running 
org.apache.solr.client.solrj.response.AnlysisResponseBaseTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.409 sec
[junit] Running 
org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.431 sec
[junit] Running 
org.apache.solr.client.solrj.response.FieldAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.449 sec
[junit] Running org.apache.solr.client.solrj.response.QueryResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.587 sec
[junit] Running org.apache.solr.client.solrj.response.TermsResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.285 sec
[junit] Running org.apache.solr.client.solrj.response.TestSpellCheckResponse
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 10.837 sec
[junit] Running org.apache.solr.client.solrj.util.ClientUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.491 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.563 sec
[junit] Running org.apache.solr.common.params.ModifiableSolrParamsTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.465 sec
[junit] Running org.apache.solr.common.params.SolrParamTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.443 sec
[junit] Running org.apache.solr.common.util.ContentStreamTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.68 sec
[junit] Running org.apache.solr.common.util.DOMUtilTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.499 sec
[junit] Running org.apache.solr.common.util.FileUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.552 sec
[junit] Running org.apache.solr.common.util.IteratorChainTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.439 sec
[junit] Running org.apache.solr.common.util.NamedListTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.386 sec
[junit] Running org.apache.solr.common.util.TestFastInputStream
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.593 sec
[junit] Running org.apache.solr.common.util.TestHash
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.618 sec
[junit] Running org.apache.solr.common.util.TestNamedListCodec
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.866 sec
[junit] Running org.apache.solr.common.util.TestXMLEscaping
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.566 sec

Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()

2010-01-05 Thread Grant Ingersoll

On Jan 5, 2010, at 1:56 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 This looks like a hack. It currently only uses the highlighter for
 prefetching docs and fields. There is no standard way for other
 components to take part in this.

Possibly, but highlighting is one of the more expensive things to do and making 
sure the fields are there (and not lazily loaded) is important.  Of course, it 
doesn't help if you want to use Term Vectors w/ highlighter

 
 We should either remove this altogether

-1.  


 or have a standard way for all
 components to take part in this.

Perhaps a component could register what fields it needs?  However, do you have 
a use case in mind?  What component would you like to have leverage this?

-Grant

Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()

2010-01-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Jan 5, 2010 at 4:52 PM, Grant Ingersoll gsing...@apache.org wrote:

 On Jan 5, 2010, at 1:56 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 This looks like a hack. It currently only uses the highlighter for
 prefetching docs and fields. There is no standard way for other
 components to take part in this.

 Possibly, but highlighting is one of the more expensive things to do and 
 making sure the fields are there (and not lazily loaded) is important.  Of 
 course, it doesn't help if you want to use Term Vectors w/ highlighter


 We should either remove this altogether

 -1.


 or have a standard way for all
 components to take part in this.

 Perhaps a component could register what fields it needs?  However, do you 
 have a use case in mind?  What component would you like to have leverage this?

I don't know. But the point is: can we have an interface
PrefetchAware (or anything nicer), and components can choose to return
the list of fields they are interested in prefetching? I would like
to remove the strong coupling of QueryComponent on highlighting.



 -Grant



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()

2010-01-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
2010/1/5 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 On Tue, Jan 5, 2010 at 4:52 PM, Grant Ingersoll gsing...@apache.org wrote:

 On Jan 5, 2010, at 1:56 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 This looks like a hack. It currently only uses the highlighter for
 prefetching docs and fields. There is no standard way for other
 components to take part in this.

 Possibly, but highlighting is one of the more expensive things to do and 
 making sure the fields are there (and not lazily loaded) is important.  Of 
 course, it doesn't help if you want to use Term Vectors w/ highlighter


 We should either remove this altogether

 -1.


 or have a standard way for all
 components to take part in this.

 Perhaps a component could register what fields it needs?  However, do you 
 have a use case in mind?  What component would you like to have leverage 
 this?

 I don't know. But the point is: can we have an interface
 PrefetchAware (or anything nicer), and components can choose to return
 the list of fields they are interested in prefetching? I would like
 to remove the strong coupling of QueryComponent on highlighting.

Or we can add a method to ResponseBuilder.addPrefetchFields(String[]
fieldNames) and SearchComponents can use this in prepare()/process()
to express interest in prefetching.
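
A minimal sketch of what such a hook could look like (hypothetical: 
addPrefetchFields() is not an existing ResponseBuilder method, and every name 
below is an assumption from this thread, not committed API):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: ResponseBuilder collects the fields that components
// register, and QueryComponent reads them back when deciding what to prefetch.
class PrefetchFieldRegistry {
  private final Set<String> prefetchFields = new HashSet<String>();

  // Called by a SearchComponent from prepare()/process().
  void addPrefetchFields(String... fieldNames) {
    prefetchFields.addAll(Arrays.asList(fieldNames));
  }

  // Called by QueryComponent before it fetches documents.
  Set<String> getPrefetchFields() {
    return prefetchFields;
  }
}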



 -Grant



 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com




-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


[jira] Created: (SOLR-1701) Off-by-one error in calculating numFound in Distributed Search

2010-01-05 Thread Shalin Shekhar Mangar (JIRA)
Off-by-one error in calculating numFound in Distributed Search
--

 Key: SOLR-1701
 URL: https://issues.apache.org/jira/browse/SOLR-1701
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Shalin Shekhar Mangar
 Fix For: 1.5


{code}
// This passes
query("q", "*:*", "sort", "id asc", "fl", "id,text");

// This also passes (notice the rows param)
query("q", "*:*", "sort", "id desc", "rows", 12, "fl", "id,text");

// But this fails
query("q", "*:*", "sort", "id desc", "fl", "id,text");
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1701) Off-by-one error in calculating numFound in Distributed Search

2010-01-05 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1701:


Attachment: SOLR-1701.patch

Test to demonstrate the bug

 Off-by-one error in calculating numFound in Distributed Search
 --

 Key: SOLR-1701
 URL: https://issues.apache.org/jira/browse/SOLR-1701
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: SOLR-1701.patch


 {code}
 // This passes
 query("q", "*:*", "sort", "id asc", "fl", "id,text");
 // This also passes (notice the rows param)
 query("q", "*:*", "sort", "id desc", "rows", 12, "fl", "id,text");
 
 // But this fails
 query("q", "*:*", "sort", "id desc", "fl", "id,text");
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()

2010-01-05 Thread Grant Ingersoll

On Jan 5, 2010, at 6:52 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 On Tue, Jan 5, 2010 at 4:52 PM, Grant Ingersoll gsing...@apache.org wrote:
 
 On Jan 5, 2010, at 1:56 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 This looks like a hack. It currently only uses the highlighter for
 prefetching docs and fields. There is no standard way for other
 components to take part in this.
 
 Possibly, but highlighting is one of the more expensive things to do and 
 making sure the fields are there (and not lazily loaded) is important.  Of 
 course, it doesn't help if you want to use Term Vectors w/ highlighter
 
 
 We should either remove this altogether
 
 -1.
 
 
 or have a standard way for all
 components to take part in this.
 
 Perhaps a component could register what fields it needs?  However, do you 
 have a use case in mind?  What component would you like to have leverage 
 this?
 
 I don't know. But the point is: can we have an interface
 PrefetchAware (or anything nicer), and components can choose to return
 the list of fields they are interested in prefetching? I would like
 to remove the strong coupling of QueryComponent on highlighting.
 

Sounds reasonable to me.

Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()

2010-01-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
OK, I have opened a new issue: https://issues.apache.org/jira/browse/SOLR-1702

On Tue, Jan 5, 2010 at 5:50 PM, Grant Ingersoll gsing...@apache.org wrote:

 On Jan 5, 2010, at 6:52 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 On Tue, Jan 5, 2010 at 4:52 PM, Grant Ingersoll gsing...@apache.org wrote:

 On Jan 5, 2010, at 1:56 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 This looks like a hack. It currently only uses the highlighter for
 prefetching docs and fields. There is no standard way for other
 components to take part in this.

 Possibly, but highlighting is one of the more expensive things to do and 
 making sure the fields are there (and not lazily loaded) is important.  Of 
 course, it doesn't help if you want to use Term Vectors w/ highlighter


 We should either remove this altogether

 -1.


 or have a standard way for all
 components to take part in this.

 Perhaps a component could register what fields it needs?  However, do you 
 have a use case in mind?  What component would you like to have leverage 
 this?

 I don't know. But the point is: can we have an interface
 PrefetchAware (or anything nicer), and components can choose to return
 the list of fields they are interested in prefetching? I would like
 to remove the strong coupling of QueryComponent on highlighting.


 Sounds reasonable to me.



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


[jira] Created: (SOLR-1702) Standardize mechanism for components to prefetch fields

2010-01-05 Thread Noble Paul (JIRA)
Standardize mechanism for components to prefetch fields
---

 Key: SOLR-1702
 URL: https://issues.apache.org/jira/browse/SOLR-1702
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Noble Paul
Priority: Minor
 Fix For: 1.5


The only component that is consulted now for prefetching fields is 
SolrHighlighter. This introduces tight coupling of QueryComponent w/ 
SolrHighlighter. 


We should standardize how this is done by all components so that there is no 
coupling.

One way would be to register the prefetch fields with ResponseBuilder in the 
prepare phase, and QueryComponent can make use of that info to do prefetching.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1702) Standardize mechanism for components to prefetch fields

2010-01-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796648#action_12796648
 ] 

Noble Paul commented on SOLR-1702:
--

The mail thread http://markmail.org/thread/5c2f2qofz6xpg42c

 Standardize mechanism for components to prefetch fields
 ---

 Key: SOLR-1702
 URL: https://issues.apache.org/jira/browse/SOLR-1702
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Noble Paul
Priority: Minor
 Fix For: 1.5


 The only component that is consulted now for prefetching fields is 
 SolrHighlighter. This introduces tight coupling of QueryComponent w/ 
 SolrHighlighter. 
 We should standardize how this is done by all components so that there is no 
 coupling.
 One way would be to register the prefetch fields with ResponseBuilder in the 
 prepare phase, and QueryComponent can make use of that info to do prefetching.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1657) convert the rest of solr to use the new tokenstream API

2010-01-05 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1657:
--

Description: 
org.apache.solr.analysis:
BufferedTokenStream
 - -CommonGramsFilter-
 - -CommonGramsQueryFilter-
 - -RemoveDuplicatesTokenFilter-
-CapitalizationFilterFactory-
-HyphenatedWordsFilter-
-LengthFilter (deprecated, remove)-
SynonymFilter
SynonymFilterFactory
WordDelimiterFilter

org.apache.solr.handler:
AnalysisRequestHandler
AnalysisRequestHandlerBase

org.apache.solr.handler.component:
QueryElevationComponent
SpellCheckComponent

org.apache.solr.highlight:
DefaultSolrHighlighter

org.apache.solr.search:
FieldQParserPlugin

org.apache.solr.spelling:
SpellingQueryConverter


  was:
org.apache.solr.analysis:
BufferedTokenStream
 - CommonGramsFilter
 - CommonGramsQueryFilter
 - RemoveDuplicatesTokenFilter
CapitalizationFilterFactory
HyphenatedWordsFilter
LengthFilter (deprecated, remove)
PatternTokenizerFactory (remove deprecated methods)
SynonymFilter
SynonymFilterFactory
WordDelimiterFilter

org.apache.solr.handler:
AnalysisRequestHandler
AnalysisRequestHandlerBase

org.apache.solr.handler.component:
QueryElevationComponent
SpellCheckComponent

org.apache.solr.highlight:
DefaultSolrHighlighter

org.apache.solr.search:
FieldQParserPlugin

org.apache.solr.spelling:
SpellingQueryConverter



 convert the rest of solr to use the new tokenstream API
 ---

 Key: SOLR-1657
 URL: https://issues.apache.org/jira/browse/SOLR-1657
 Project: Solr
  Issue Type: Task
Reporter: Robert Muir
 Attachments: SOLR-1657.patch, SOLR-1657.patch


 org.apache.solr.analysis:
 BufferedTokenStream
  - -CommonGramsFilter-
  - -CommonGramsQueryFilter-
  - -RemoveDuplicatesTokenFilter-
 -CapitalizationFilterFactory-
 -HyphenatedWordsFilter-
 -LengthFilter (deprecated, remove)-
 SynonymFilter
 SynonymFilterFactory
 WordDelimiterFilter
 org.apache.solr.handler:
 AnalysisRequestHandler
 AnalysisRequestHandlerBase
 org.apache.solr.handler.component:
 QueryElevationComponent
 SpellCheckComponent
 org.apache.solr.highlight:
 DefaultSolrHighlighter
 org.apache.solr.search:
 FieldQParserPlugin
 org.apache.solr.spelling:
 SpellingQueryConverter

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr Cell revamped as an UpdateProcessor?

2010-01-05 Thread Zacarias
Hi, I'm developing a directory monitor to add to a Solr implementation.
Let me know if it could be interesting for you; we would be glad to share it
with the community. I would also like your opinion about the proposal: if it
looks OK to you, or if you would like to suggest any change or ask a question,
it would be very welcome.

Regards
Zacarias
www.linebee.com


2009/12/8 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 I was refering to SOLR-1358. Anyway , SolrCell as an updateprocessor
 is a good idea

 On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll gsing...@apache.org
 wrote:
 
  On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
  Integrating Extraction w/ DIH is a better option. DIH makes it easier
  to do the mapping of fields etc.
 
  Which comment is this directed at?  I'm lacking context here.
 
 
 
  On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll gsing...@apache.org
 wrote:
 
  On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
 
 
  ASs someone with very little knowledge of Solr Cell and/or Tika, I
 find myself wondering if ExtractingRequestHandler would make more sense as
 an extractingUpdateProcessor -- where it could be configured to take take
 either binary fields (or string fields containing URLs) out of the
 Documents, parse them with tika, and add the various XPath matching hunks of
 text back into the document as new fields.
 
  Then ExtractingRequestHandler just becomes a handler that slurps up
 it's ContentStreams and adds them as binary data fields and adds the other
 literal params as fields.
 
  Wouldn't that make things like SOLR-1358, and using Tika with
 URLs/filepaths in XML and CSV based updates fairly trivial?
 
  It probably could, but am not sure how it works in a processor chain.
  However, I'm not sure I understand how they work all that much either.  I
 also plan on adding, BTW, a SolrJ client for Tika that does the extraction
 on the client.  In many cases, the ExtrReqHandler is really only designed
 for lighter weight extraction cases, as one would simply not want to send
 that much rich content over the wire.
 
 
 
  --
  -
  Noble Paul | Systems Architect| AOL | http://aol.com
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
 Solr/Lucene:
  http://www.lucidimagination.com/search
 
 



 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com



Re: Solr Cell revamped as an UpdateProcessor?

2010-01-05 Thread Zacarias
Here is my proposal

Regards



On Tue, Jan 5, 2010 at 12:48 PM, Zacarias zacar...@linebee.com wrote:

 Hi, I'm developing a directory monitor to add to a Solr implementation.
 Let me know if it could be interesting for you; we would be glad to share it
 with the community. I would also like your opinion about the proposal: if it
 looks OK to you, or if you would like to suggest any change or ask a question,
 it would be very welcome.

 Regards
 Zacarias
 www.linebee.com


 2009/12/8 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 I was refering to SOLR-1358. Anyway , SolrCell as an updateprocessor
 is a good idea

 On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll gsing...@apache.org
 wrote:
 
  On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
  Integrating Extraction w/ DIH is a better option. DIH makes it easier
  to do the mapping of fields etc.
 
  Which comment is this directed at?  I'm lacking context here.
 
 
 
  On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll gsing...@apache.org
 wrote:
 
  On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
 
 
  ASs someone with very little knowledge of Solr Cell and/or Tika, I
 find myself wondering if ExtractingRequestHandler would make more sense as
 an extractingUpdateProcessor -- where it could be configured to take take
 either binary fields (or string fields containing URLs) out of the
 Documents, parse them with tika, and add the various XPath matching hunks of
 text back into the document as new fields.
 
  Then ExtractingRequestHandler just becomes a handler that slurps up
 it's ContentStreams and adds them as binary data fields and adds the other
 literal params as fields.
 
  Wouldn't that make things like SOLR-1358, and using Tika with
 URLs/filepaths in XML and CSV based updates fairly trivial?
 
  It probably could, but am not sure how it works in a processor chain.
  However, I'm not sure I understand how they work all that much either.  I
 also plan on adding, BTW, a SolrJ client for Tika that does the extraction
 on the client.  In many cases, the ExtrReqHandler is really only designed
 for lighter weight extraction cases, as one would simply not want to send
 that much rich content over the wire.
 
 
 
  --
  -
  Noble Paul | Systems Architect| AOL | http://aol.com
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
 Solr/Lucene:
  http://www.lucidimagination.com/search
 
 



 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com





[jira] Issue Comment Edited: (SOLR-1698) load balanced distributed search

2010-01-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796760#action_12796760
 ] 

Yonik Seeley edited comment on SOLR-1698 at 1/5/10 5:13 PM:


Another big question is: can we use LBHttpSolrServer for this, or are the needs 
too different?

Some of the issues:
 - need control over order (so same server will be used in a single request)
 - if we have a big cluster (100 shards), we don't want every node or every 
core to have 100 background threads checking liveness
 - one request may want to hit addresses [A,B] while another may want [A,B,C] - 
a single LBHttpSolrServer can't currently do both at once, and separate 
instances wouldn't share liveness info.

One way: have many LBHttpSolrServer instances (one per shard group) but have 
them share certain things like the liveness of a shard and the background 
cleaning threads

Another way: have a single static LBHttpSolrServer instance that's shared for 
all requests, with an extra method that allows passing of a list of addresses 
on a per-request basis.
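
A rough sketch of the per-request entry point the second option suggests 
(hypothetical interface, not existing SolrJ API; the method name and signature 
are assumptions):

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.util.NamedList;

// Hypothetical sketch: one shared balancer, with the candidate addresses
// passed per request, so requests for [A,B] and [A,B,C] can share the same
// liveness bookkeeping and background checking.
interface PerRequestLoadBalancer {
  NamedList<Object> request(SolrRequest req, List<String> addresses)
      throws SolrServerException, IOException;
}
{code}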



  was (Author: ysee...@gmail.com):
Another big question is: can we use LBHttpSolrServer for this, or are the 
needs too different?
  
 load balanced distributed search
 

 Key: SOLR-1698
 URL: https://issues.apache.org/jira/browse/SOLR-1698
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley

 Provide syntax and implementation of load-balancing across shard replicas.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr Cell revamped as an UpdateProcessor?

2010-01-05 Thread Zacarias
I'd attached a file to the previous mail. Is there a filter for PDF files,
or some other reason it didn't come through?

On Tue, Jan 5, 2010 at 12:49 PM, Zacarias zacar...@linebee.com wrote:

 Here is my proposal

 Regards




 On Tue, Jan 5, 2010 at 12:48 PM, Zacarias zacar...@linebee.com wrote:

 Hi, I'm developing a directory monitor to add to a Solr implementation.
 Let me know if it could be interesting for you; we would be glad to share it
 with the community. I would also like your opinion about the proposal: if it
 looks OK to you, or if you would like to suggest any change or ask a question,
 it would be very welcome.

 Regards
 Zacarias
 www.linebee.com


 2009/12/8 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 I was refering to SOLR-1358. Anyway , SolrCell as an updateprocessor
 is a good idea

 On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll gsing...@apache.org
 wrote:
 
  On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
  Integrating Extraction w/ DIH is a better option. DIH makes it easier
  to do the mapping of fields etc.
 
  Which comment is this directed at?  I'm lacking context here.
 
 
 
  On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll gsing...@apache.org
 wrote:
 
  On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
 
 
  ASs someone with very little knowledge of Solr Cell and/or Tika, I
 find myself wondering if ExtractingRequestHandler would make more sense as
 an extractingUpdateProcessor -- where it could be configured to take take
 either binary fields (or string fields containing URLs) out of the
 Documents, parse them with tika, and add the various XPath matching hunks of
 text back into the document as new fields.
 
  Then ExtractingRequestHandler just becomes a handler that slurps up
 it's ContentStreams and adds them as binary data fields and adds the other
 literal params as fields.
 
  Wouldn't that make things like SOLR-1358, and using Tika with
 URLs/filepaths in XML and CSV based updates fairly trivial?
 
  It probably could, but am not sure how it works in a processor chain.
  However, I'm not sure I understand how they work all that much either.  I
 also plan on adding, BTW, a SolrJ client for Tika that does the extraction
 on the client.  In many cases, the ExtrReqHandler is really only designed
 for lighter weight extraction cases, as one would simply not want to send
 that much rich content over the wire.
 
 
 
  --
  -
  Noble Paul | Systems Architect| AOL | http://aol.com
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
  http://www.lucidimagination.com/search
 
 



 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com






Re: Solr Cell revamped as an UpdateProcessor?

2010-01-05 Thread Grant Ingersoll

On Jan 5, 2010, at 1:53 PM, Zacarias wrote:

 I'd attached a file to the previous mail. Is there a filter for PDF files,
 or some other reason it didn't come through?

The mailer strips attachments, although you might be able to get a zip through. 
 Perhaps send a pointer to somewhere else or just describe it here.

 
 On Tue, Jan 5, 2010 at 12:49 PM, Zacarias zacar...@linebee.com wrote:
 
 Here is my proposal
 
 Regards
 
 
 
 
 On Tue, Jan 5, 2010 at 12:48 PM, Zacarias zacar...@linebee.com wrote:
 
 Hi, I'm developing a directory monitor to add to a Solr implementation.
 Let me know if it could be interesting for you; we would be glad to share it
 with the community. I would also like your opinion about the proposal: if it
 looks OK to you, or if you would like to suggest any change or ask a question,
 it would be very welcome.
 
 Regards
 Zacarias
 www.linebee.com
 
 
 2009/12/8 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
 I was refering to SOLR-1358. Anyway , SolrCell as an updateprocessor
 is a good idea
 
 On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll gsing...@apache.org
 wrote:
 
 On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 Integrating Extraction w/ DIH is a better option. DIH makes it easier
 to do the mapping of fields etc.
 
 Which comment is this directed at?  I'm lacking context here.
 
 
 
 On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll gsing...@apache.org
 wrote:
 
 On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
 
 
 ASs someone with very little knowledge of Solr Cell and/or Tika, I
 find myself wondering if ExtractingRequestHandler would make more sense as
 an extractingUpdateProcessor -- where it could be configured to take take
 either binary fields (or string fields containing URLs) out of the
 Documents, parse them with tika, and add the various XPath matching hunks 
 of
 text back into the document as new fields.
 
 Then ExtractingRequestHandler just becomes a handler that slurps up
 it's ContentStreams and adds them as binary data fields and adds the other
 literal params as fields.
 
 Wouldn't that make things like SOLR-1358, and using Tika with
 URLs/filepaths in XML and CSV based updates fairly trivial?
 
 It probably could, but am not sure how it works in a processor chain.
 However, I'm not sure I understand how they work all that much either.  I
 also plan on adding, BTW, a SolrJ client for Tika that does the extraction
 on the client.  In many cases, the ExtrReqHandler is really only designed
 for lighter weight extraction cases, as one would simply not want to send
 that much rich content over the wire.
 
 
 
 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 
 
 
 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com
 
 
 
 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Solr Cell revamped as an UpdateProcessor?

2010-01-05 Thread Chris Hostetter

: Subject: Re: Solr Cell revamped as an UpdateProcessor?
: 
: Hi, I'm developing a directory monitor to add to a Solr implementation.

Hmmm ... Is this really related to the Solr Cell thread you replied to? 

Please start a new thread if you want to discuss a new topic...

http://people.apache.org/~hossman/#threadhijack


-Hoss



[jira] Commented: (SOLR-1698) load balanced distributed search

2010-01-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796838#action_12796838
 ] 

Yonik Seeley commented on SOLR-1698:


Looking into LBHttpSolrServer more, it looks like we have some serious 
concurrency issues.  When a request does fail, a global lock is acquired to move 
from alive to zombie - but this same lock is used while making requests to 
check if zombies have come back (i.e. actual requests to zombies are being made 
with the lock held!).

For distributed search use (SearchHandler) I'm thinking of going with option #2 
from my previous message (have a single static LBHttpSolrServer instance that's 
shared for all requests, with an extra method that allows passing of a list of 
addresses on a per-request basis.).  I'll address the concurrency issues at the 
same time.
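
One common way to avoid holding a lock across network calls, sketched here as 
an assumption about the shape of a fix rather than the actual patch:

{code:java}
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not the committed SolrJ fix): keep zombies in a
// concurrent map so liveness probes never run while a shared lock is held.
class ZombieList {
  interface Probe { boolean isAlive(String url); }

  private final ConcurrentHashMap<String, String> zombies =
      new ConcurrentHashMap<String, String>();

  void markDead(String url) { zombies.put(url, url); }

  // Probe each zombie with no global lock held; revive the ones that answer.
  void checkZombies(Probe probe) {
    for (String url : zombies.keySet()) {
      if (probe.isAlive(url)) {  // network call happens lock-free
        zombies.remove(url);
        // move the server back to the alive list here
      }
    }
  }
}
{code}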

 load balanced distributed search
 

 Key: SOLR-1698
 URL: https://issues.apache.org/jira/browse/SOLR-1698
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley

 Provide syntax and implementation of load-balancing across shard replicas.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2010-01-05 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796836#action_12796836
 ] 

Patrick Hunt commented on SOLR-1277:


Yonik, others, you might find this interesting: 
http://github.com/phunt/zookeeper_dashboard
It's Apache licensed, based on Python/Django. I had thoughts of having 
plugins for things like HBase, Solr, etc., based on their usage patterns (also 
for standard recipes that might benefit from something similar).


 Implement a Solr specific naming service (using Zookeeper)
 --

 Key: SOLR-1277
 URL: https://issues.apache.org/jira/browse/SOLR-1277
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
 SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar

   Original Estimate: 672h
  Remaining Estimate: 672h

 The goal is to give Solr server clusters self-healing attributes
 where if a server fails, indexing and searching don't stop and
 all of the partitions remain searchable. For configuration, the
 ability to centrally deploy a new configuration without servers
 going offline.
 We can start with basic failover and start from there?
 Features:
 * Automatic failover (i.e. when a server fails, clients stop
 trying to index to or search it)
 * Centralized configuration management (i.e. new solrconfig.xml
 or schema.xml propagates to a live Solr cluster)
 * Optionally allow shards of a partition to be moved to another
 server (i.e. if a server gets hot, move the hot segments out to
 cooler servers). Ideally we'd have a way to detect hot segments
 and move them seamlessly. With NRT this becomes somewhat more
 difficult but not impossible?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-01-05 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796854#action_12796854
 ] 

Hoss Man commented on SOLR-1677:


bq. User Carl isn't helpful, user Carl is an idiot.

Oh come on now ... that's not really a fair criticism of the example: there are 
plenty of legitimate ways to use (some) TokenFilters only at search time and I 
specifically structured my example to point out potential problems in cases 
just like that -- Carl was very clear that if you used FooTokenFilterFactory 
in an index analyzer you'll need to reindex.


But fine, I'll amend my example to do it your way...


{panel}
...
Bob Asks his question (see previous example)

User Carl is on vacation and never sees Bob's email

User Dwight helpfully replies...

bq. That was identified as a bug with FooTokenFilter that was fixed in Lucene 
3.1, but the default behavior was left as is for backcompatibility. If you 
change your <luceneAnalyzerVersionDefault/> value to 3.1 (or 3.2) you'll get 
the newer/better behavior - but you _must_ reindex all of your data after you 
make this change.

Bob makes the change to 3.2 that Dwight recommended, reindexes all of his data, 
and is happy to see that now his queries work and everything seems fine.

What Bob doesn't realize (and what Dwight wasn't aware of) is that elsewhere in 
his schema.xml file, Bob is also using the YakTokenizerFactory on a different 
field (yakField), and the behavior of the YakTokenizer changed in Lucene 3.0.  
This change is generally considered better behavior than YakTokenizer had 
before, but in combination with another TokenFilter Bob is using on the 
yakField it causes behavior that is not what Bob wants.  Now some types of 
queries that use the yakField are failing, and *failing silently*.

{panel}

You could now argue that User Dwight is an idiot because he didn't warn Bob 
that other Analyzers/Tokenizers/TokenFilters might be affected.  But that just 
leads us to scenarios that reiterate my point that this type of global value 
is something that would be dangerous to ever change.

{panel}
...
Bob Asks his question (see previous examples)

User Carl has unsubscribed from the solr-user list (because a Bill Murray 
look-a-like hurt his feelings) and never sees Bob's email.

User Dwight is on vacation and never sees Bob's email.

User Ernest helpfully replies...

{quote}
That was identified as a bug with FooTokenFilter that was fixed in Lucene 3.1, 
but the default behavior was left as is for backcompatibility. If you change 
your <luceneAnalyzerVersionDefault/> value to 3.1 (or 3.2) you'll get the 
newer/better behavior -- *But this is Very VERY Dangerous*: It could potentially 
affect the behavior of other analyzers you are using.  You need to check the 
javadocs for each and every Analyzer, Tokenizer, and TokenFilter you use to see 
what their behavior is with various values of the Version property before you 
make a change like this.

Personally I never change the value of <luceneAnalyzerVersionDefault/> once I 
have an existing schema.xml file.  Instead I suggest you add 
{{luceneVersion="3.2"}} to your {{<filter class="solr.FooTokenFilterFactory" 
/>}} declaration so that you know you are only changing the behavior you want 
to change.

BTW: You _must_ reindex all of your data after doing either of these things in 
order for it to work.
{quote}

Bob follows Ernest's advice, and everything is fine ... but Bob is left 
wondering what the point is of a config option that's so dangerous to change, 
and wishes there was an easy way to know which of his Analyzers and Factories 
are depending on that scary global value.

{panel}

At the end of the day it just seems like a bigger risk than a feature ... I 
feel like I must still be misunderstanding the motivation you guys have for 
adding it, because it really seems like it boils down to "easier than having 
the property 2.9 set on every analyzer/factory".

I guess I ultimately have no stringent objection to a global schema.xml setting 
like this existing as an expert-level feature (for people who want really 
compact config files, I guess); I just don't want to see it used in the example 
schema.xml file(s) where it's likely to screw novice users over.



 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
 SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most 

[jira] Created: (SOLR-1703) Sorting by function problems on multicore (more than one core)

2010-01-05 Thread JIRA
Sorting by function problems on multicore (more than one core)
--

 Key: SOLR-1703
 URL: https://issues.apache.org/jira/browse/SOLR-1703
 Project: Solr
  Issue Type: Bug
  Components: multicore, search
Affects Versions: 1.5
 Environment: Linux (debian, ubuntu), 64bits
Reporter: Rafał Kuć


When using sort by function (for example the dist function) with multicore 
with more than one core (on multicore with one core, i.e. the example 
deployment, the problem doesn't exist) there is a problem with not using the 
right schema. I think the problem is in this portion of code:

QueryParsing.java:

public static FunctionQuery parseFunction(String func, IndexSchema schema) throws ParseException {
  SolrCore core = SolrCore.getSolrCore();
  return (FunctionQuery) (QParser.getParser(func, "func", new LocalSolrQueryRequest(core, new HashMap())).parse());
  // return new FunctionQuery(parseValSource(new StrParser(func), schema));
}

The code above uses a deprecated method to get the core, sometimes getting the 
wrong core, which makes it impossible to find the right fields in the index. 
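
A possible fix (a sketch only, not a tested patch) would be to let callers 
pass their own request, so that the right core's schema is always used:

{code:java}
// Sketch: thread the caller's live SolrQueryRequest through instead of
// relying on the deprecated SolrCore.getSolrCore() singleton.
public static FunctionQuery parseFunction(String func, SolrQueryRequest req)
    throws ParseException {
  return (FunctionQuery) QParser.getParser(func, "func", req).parse();
}
{code}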

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-01-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796862#action_12796862
 ] 

Robert Muir commented on SOLR-1677:
---

bq. Oh come on now ... that's not really a fair criticism of the example: there 
are plenty of legitimate ways to use (some) TokenFilters only at search time 
and I specifically structured my example to point out potential problems in 
cases just like that - Carl was very clear that if you used 
FooTokenFilterFactory in an index analyzer you'll need to reindex.

I disagree: Version applies to all of Lucene (even more than tokenstreams), so 
for Carl to imply that you don't need to reindex after bumping Version simply 
because you aren't using X or Y or Z -- for that he should be renamed Oscar.

bq. You could now argue that User Dwight is an idiot because he didn't warn 
Bob that other Analyzers/Tokenizers/TokenFilters might be affected. But that 
just leads us to scenarios that reiterate my point that this type of global 
value is something that would be dangerous to ever change

Yeah, I guess I don't think he is an idiot. I just think he is a moron for 
suggesting such a thing without warning of the consequences.

bq. Personally I never change the value of <luceneAnalyzerVersionDefault/> 
once I have an existing schema.xml file. Instead I suggest you add 
luceneVersion="3.2" to your <filter class="solr.FooTokenFilterFactory" /> 
declaration so that you know you are only changing the behavior you want to 
change.

Good for Ernest, I guess he is probably still using Windows 3.1 too because he 
doesn't want to upgrade ever. Unless Ernest also carefully reads Lucene's 
CHANGES and reads all the Solr source code and knows which Solr features are 
tied to which Lucene features, because it's not obvious at all: e.g. Solr's 
snowball factory doesn't use Lucene's snowball, etc.

bq. At the end of the day it just seems like a bigger risk than a feature ... 
I feel like I must still be misunderstanding the motivation you guys have for 
adding it, because it really seems like it boils down to "easier than having 
the property 2.9 set on every analyzer/factory"

Yes, you are right; personally I don't want all users to be stuck with 
Version.LUCENE_24 forever. 


 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
 SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 it can be 
 replaced by Version.valueOf(String), as Version is a subclass of Java5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-01-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796872#action_12796872
 ] 

Uwe Schindler commented on SOLR-1677:
-

In my opinion, it should be possible to set the default in solrconfig.xml, 
because there is currently no requirement to set a version for all TS 
components. In the shipped solrconfig.xml this default is the version of the 
shipped Lucene, so new users can use the default config and extend it as they 
learned in all the courses and books about Solr. They do not need to care 
about the version. 

If they upgrade their Lucene version, their config stays stuck on the previous 
setting and they are fine. If they want to change some of the components (like 
query parser, index writer, index reader -- flex!!!), they can do it locally. 
So Bob could change it like Ernest proposed.
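
For illustration, the proposal would look something like this (the global 
element is only a sketch following the patch's property name; it is not an 
existing config option):

{code:xml}
<!-- solrconfig.xml: hypothetical global default for all version-aware
     components -->
<luceneMatchVersion>LUCENE_29</luceneMatchVersion>
{code}

{code:xml}
<!-- schema.xml: expert-level per-component override, as Ernest proposed -->
<filter class="solr.FooTokenFilterFactory" luceneMatchVersion="LUCENE_31"/>
{code}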

If we do not have a default, all users will stay stuck with Lucene 2.4, 
because they do not care about the version (it is not required, because it 
defaults to 2.4 for BW compatibility). So lots of configs will never use the 
new Unicode features of Lucene 3.1. And suddenly Lucene 4.0 comes out and all 
support for Lucene < 3 is removed, and then all users cry. With a default 
version set to 2.4, they will then get a runtime error in Lucene 4.0, saying 
that Version.LUCENE_24 is no longer available as an enum constant.

If you really do not want to have a default version in the config (not the 
schema, because it applies to *all* Lucene components), then you should go the 
way of Lucene 3.0: require a matchVersion for all components. As there may be 
tokenstream components not from Lucene, make this attribute in the schema 
mandatory only for Lucene streams. (This can be done with my initial patch, 
too: if the matchVersion property is missing then matchVersion will be NULL 
and the factory should throw IAE if it is required. In my original patch, only 
the parsing code should be moved out of the factory into a util class in Solr. 
Maybe it is also possible to parse x.y-style versions.)

The problem here: users upgrading from Solr 1.4 will suddenly get errors, 
because their configs become invalid.

 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
 SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 it can be 
 replaced by Version.valueOf(String), as Version is a subclass of Java5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1703) Sorting by function problems on multicore (more than one core)

2010-01-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796874#action_12796874
 ] 

Yonik Seeley commented on SOLR-1703:


Indeed - that's a bug.  New solr code shouldn't be using parseFunction.

 Sorting by function problems on multicore (more than one core)
 --

 Key: SOLR-1703
 URL: https://issues.apache.org/jira/browse/SOLR-1703
 Project: Solr
  Issue Type: Bug
  Components: multicore, search
Affects Versions: 1.5
 Environment: Linux (debian, ubuntu), 64bits
Reporter: Rafał Kuć

 When using sort by function (for example the dist function) with multicore 
 with more than one core (on multicore with one core, i.e. the example 
 deployment, the problem doesn't exist) there is a problem with not using the 
 right schema. I think the problem is in this portion of code:
 QueryParsing.java:
 public static FunctionQuery parseFunction(String func, IndexSchema schema) throws ParseException {
   SolrCore core = SolrCore.getSolrCore();
   return (FunctionQuery) (QParser.getParser(func, "func", new LocalSolrQueryRequest(core, new HashMap())).parse());
   // return new FunctionQuery(parseValSource(new StrParser(func), schema));
 }
 The code above uses a deprecated method to get the core, sometimes getting 
 the wrong core, which makes it impossible to find the right fields in the 
 index. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-01-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796872#action_12796872
 ] 

Uwe Schindler edited comment on SOLR-1677 at 1/5/10 10:29 PM:
--

In my opinion, it should be possible to set the default in solrconfig.xml, 
because there is currently no requirement to set a version for all TS 
components. In the shipped solrconfig.xml this default is the version of the 
shipped Lucene, so new users can use the default config and extend it as they 
learned in all the courses and books about Solr. They do not need to care 
about the version. 

If they upgrade their Lucene version, their config stays stuck on the previous 
setting and they are fine. If they want to change some of the components (like 
query parser, index writer, index reader -- flex!!!), they can do it locally. 
So Bob could change it like Ernest proposed.

If we do not have a default, all users will stay stuck with Lucene 2.4, 
because they do not care about the version (it is not required, because it 
defaults to 2.4 for BW compatibility). So lots of configs will never use the 
new Unicode features of Lucene 3.1. And suddenly Lucene 4.0 comes out and all 
support for Lucene < 3 is removed, and then all users cry. With a default 
version set to 2.4, they will then get a runtime error in Lucene 4.0, saying 
that Version.LUCENE_24 is no longer available as an enum constant.

If you really do not want to have a default version in the config (not the 
schema, because it applies to *all* Lucene components), then you should go the 
way of Lucene 3.0: require a matchVersion for all components. As there may be 
tokenstream components not from Lucene, make this attribute in the schema 
mandatory only for Lucene streams. (This can be done with my initial patch, 
too: if the matchVersion property is missing then matchVersion will be NULL 
and the factory should throw IAE if it is required. In my original patch, only 
the parsing code should be moved out of the factory into a util class in Solr. 
Maybe it is also possible to parse x.y-style versions.)

The problem here: users upgrading from Solr 1.4 will suddenly get errors, 
because their configs become invalid. Ahh, and because they are stupid they 
add LUCENE_29 (how should they know that Solr 1.4 used Lucene 2.4 
compatibility?). And then the mailing list gets flooded with questions, 
because suddenly the configs fail to produce results with old indexes.

  was (Author: thetaphi):
In my opinion, it should be possible to set the default in solrconfig.xml, 
because there is currently no requirement to set a version for all TS 
components. In the shipped solrconfig.xml this default is the version of the 
shipped Lucene, so new users can use the default config and extend it as they 
learned in all the courses and books about Solr. They do not need to care 
about the version. 

If they upgrade their Lucene version, their config stays stuck on the previous 
setting and they are fine. If they want to change some of the components (like 
query parser, index writer, index reader -- flex!!!), they can do it locally. 
So Bob could change it like Ernest proposed.

If we do not have a default, all users will stay stuck with Lucene 2.4, 
because they do not care about the version (it is not required, because it 
defaults to 2.4 for BW compatibility). So lots of configs will never use the 
new Unicode features of Lucene 3.1. And suddenly Lucene 4.0 comes out and all 
support for Lucene < 3 is removed, and then all users cry. With a default 
version set to 2.4, they will then get a runtime error in Lucene 4.0, saying 
that Version.LUCENE_24 is no longer available as an enum constant.

If you really do not want to have a default version in the config (not the 
schema, because it applies to *all* Lucene components), then you should go the 
way of Lucene 3.0: require a matchVersion for all components. As there may be 
tokenstream components not from Lucene, make this attribute in the schema 
mandatory only for Lucene streams. (This can be done with my initial patch, 
too: if the matchVersion property is missing then matchVersion will be NULL 
and the factory should throw IAE if it is required. In my original patch, only 
the parsing code should be moved out of the factory into a util class in Solr. 
Maybe it is also possible to parse x.y-style versions.)

The problem here: users upgrading from Solr 1.4 will suddenly get errors, 
because their configs become invalid.
  
 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, 

Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()

2010-01-05 Thread Chris Hostetter

:  This looks like a hack. It currently only uses highlighter for
:  prefetching docs and fields . There is no standard way of other
:  components to take part in this.

It was an optimization that improved the common case when it was added, but it 
predates all of the search component stuff, so it's not much of a surprise 
that it's not easy for components to use.

: Or we can add a method to ResponseBuilder.addPrefetchFields(String[]
: fieldNames) and SearchComponents can use this in prepare()/process()
: to express interest in prefetching.

I would suggest using something like a FieldSelector instead of a 
String[], so that components wanting all fields that match a rule don't 
have to manifest a large array.
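
For instance, a rule-based selector against Lucene's FieldSelector API (2.9 
era) might look like the sketch below; note that nothing on ResponseBuilder 
consumes such a selector today:

{code:java}
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.FieldSelectorResult;

// Loads every stored field whose name matches a prefix rule, without
// the component having to enumerate a large String[] of field names.
public class PrefixFieldSelector implements FieldSelector {
  private final String prefix;

  public PrefixFieldSelector(String prefix) {
    this.prefix = prefix;
  }

  public FieldSelectorResult accept(String fieldName) {
    return fieldName.startsWith(prefix)
        ? FieldSelectorResult.LOAD
        : FieldSelectorResult.NO_LOAD;
  }
}
{code}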

I can't put my finger on it, but it feels like there is a larger issue 
here ... relating to SOLR-1298, and how/when a DocList might/should get 
manifested as a DocumentList ...

It would be fairly easy to modify optimizePreFetchDocs to check properties 
on the ResponseBuilder (via the SolrQueryRequest) to decide which fields 
to ask for, but ultimately all optimizePreFetchDocs does is ask the 
SolrIndexSearcher to load the doc and then throw it away (relying on the 
documentCache to have those fields handy for later use).  As long as we're 
changing that, we might as well make it do ... something ... better.


-Hoss



[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-01-05 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796937#action_12796937
 ] 

Hoss Man commented on SOLR-1677:




bq. Version applies to all of Lucene (even more than tokenstreams), so for 
Carl to imply that you don't need to reindex after bumping Version simply 
because you aren't using X or Y or Z -- for that he should be renamed Oscar.

Ok, fair enough ... I was supposing in that example that since I called it 
{{<luceneAnalyzerVersionDefault/>}} it was clearly specific to analysis 
objects in schema.xml and didn't affect any of the other things Version is 
used for (which would be specified in solrconfig.xml)

bq. I guess he is probably still using Windows 3.1 too because he doesn't want 
to upgrade ever.

No, he uses an OS where he can upgrade individual things independently, with 
clear implications -- he sets {{luceneMatchVersion="2.9"}} on each and every 
{{<analyzer/>}}, {{<tokenizer/>}} and {{<filter/>}} that he declares in his 
schema so that he knows exactly what behavior is changing when he modifies any 
of them.

bq. personally I don't want all users to be stuck with Version.LUCENE_24 
forever. 

I still must be missing something ... why would all users be stuck with 
Version.LUCENE_24 forever?

I'm not advocating that we don't allow a way to specify Version; I'm saying 
that having a global value for it that affects things opaquely sounds 
dangerous -- we should certainly have a way for people to specify the Version 
they want on each of the objects that care, but it shouldn't be global.  The 
luceneMatchVersion property that Uwe added to BaseTokenizerFactory and 
BaseTokenFilterFactory in his patch seems perfect to me; it's just the 
{{SolrCoreAware}} / {{core.getSolrConfig().luceneMatchVersion}} part that I 
think is a bad idea.

If we modify the {{<analyzer/>}} initialization to allow constructor args as 
Erik suggested (I'm pretty sure there's already code in Solr to do this, we 
just aren't using it for Analyzers) then we should be good to go for 
everything in schema.xml

If anything declared in solrconfig.xml starts caring about Version (QParser, 
SolrIndexWriter, etc...) then likewise it should get a "luceneMatchVersion" 
init property as well.  No one will ever be stuck with LUCENE_24, but they 
won't be surprised by behavior changes either.

bq. If we do not have a default, all users will stay stuck with Lucene 2.4, 
because they do not care about the version (it is not required, because it 
defaults to 2.4 for BW compatibility). So lots of configs will never use the 
new Unicode features of Lucene 3.1.

I don't believe that.  Almost every Solr user on the planet starts with the 
example configs.  If the example configs start specifying 
luceneMatchVersion="2.9" on every analyzer and factory, then people will care 
about Version just as much as they care about the stopwords.txt file that 
ships with Solr -- that may be not at all, or it may be a lot, but it will be 
up to them, and it will be obvious to them, because it's right there in the 
declaration where they can see it, and easy for them to reference and 
recognize that changing that value will affect things.
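
As a sketch of what that could look like in the example schema (the attribute 
placement is assumed, not committed syntax):

{code:xml}
<!-- hypothetical example-schema fieldType with the version declared
     explicitly on every analysis factory -->
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" luceneMatchVersion="2.9"/>
    <filter class="solr.StopFilterFactory" luceneMatchVersion="2.9"
            words="stopwords.txt"/>
  </analyzer>
</fieldType>
{code}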

bq. If you really do not want to have a default version in the config (not the 
schema, because it applies to all Lucene components), then you should go the 
way of Lucene 3.0: require a matchVersion for all components.

I'm totally on board with that idea in the long run -- but there are ways to 
get there gradually that are back compatible with existing configs.  
Individual factories that care about luceneMatchVersion should absolutely 
start warning on startup that newer/better behavior may be available if it is 
unset (or doesn't match the current value of Version.LUCENE_CURRENT), and 
provide a URL for a wiki page somewhere where more detail is available.  The 
Analyzer init code can do likewise if it sees an {{<analyzer class="..."/>}} 
being inited with a constructor that takes in a Version and is using an old 
value.
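
A minimal sketch of that startup warning, assuming an SLF4J logger and the 
luceneMatchVersion init arg from Uwe's patch (the class name and wiki URL are 
placeholders):

{code:java}
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Placeholder base class; in Solr this logic would live in
// BaseTokenizerFactory / BaseTokenFilterFactory.init().
public abstract class VersionWarningFactory {
  private static final Logger log =
      LoggerFactory.getLogger(VersionWarningFactory.class);

  protected String luceneMatchVersion;

  public void init(Map<String, String> args) {
    luceneMatchVersion = args.get("luceneMatchVersion");
    if (luceneMatchVersion == null) {
      // An unset Version means frozen LUCENE_24 behavior; say so loudly.
      log.warn(getClass().getSimpleName()
          + " has no luceneMatchVersion set; newer/better behavior may be"
          + " available -- see http://wiki.apache.org/solr/ for details");
    }
  }
}
{code}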


 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
 SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and 

[jira] Commented: (SOLR-1212) TestNG Test Case

2010-01-05 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796953#action_12796953
 ] 

Hoss Man commented on SOLR-1212:


I'm on the fence ... 

I agree it's (probably) useful to TestNG users, and I would like to do as much 
as possible to make it easy for people to use the TestHarness (regardless of 
how they write tests), but the idea of including it in the release does smell 
fishy if we're not actually using it anywhere in Solr -- it may not seem like 
much overhead to maintain it, but if it never gets used internally then it's 
not really clear if/when there are problems with it (even test code needs to 
be tested to be sure that it's not broken).

If it's not included in the Solr repository, then it may fall out of sync with 
Solr -- but that's true of any plugin someone writes and hosts on SourceForge, 
or GitHub, or Google Code -- we can advertise that it works with Solr 1.4, and 
if something changes in Solr 1.5, or Solr 1.6, or Solr 9.7 that breaks it, 
then interested parties are free to update it with a new version that does 
work.

...If I knew more about TestNG I might be able to form a stronger opinion like 
"this is awesome, it's super useful, we should include it" or "this doesn't 
really provide any value add to users", but I just don't know enough either 
way.


 TestNG Test Case 
 -

 Key: SOLR-1212
 URL: https://issues.apache.org/jira/browse/SOLR-1212
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.4
 Environment: Java 6
Reporter: Kay Kay
 Fix For: 1.5

 Attachments: SOLR-1212.patch, testng-5.9-jdk15.jar

   Original Estimate: 1h
  Remaining Estimate: 1h

 TestNG equivalent of AbstractSolrTestCase , without using JUnit altogether . 
 New Class created: AbstractSolrNGTest 
 LICENSE.txt , NOTICE.txt modified as appropriate. ( TestNG under Apache 
 License 2.0 ) 
 TestNG 5.9-jdk15 added to lib. 
 Justification:  In some workplaces - people are moving towards TestNG and 
 take out JUnit altogether from the classpath. Hence useful in those cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-01-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796965#action_12796965
 ] 

Robert Muir commented on SOLR-1677:
---

{quote}
No, he uses an OS where he can upgrade individual things independently with 
clear implications - he sets luceneMatchVersion="2.9" on each and every 
<analyzer/>, <tokenizer/> and <filter/> that he declares in his schema so that 
he knows exactly what behavior is changing when he modifies any of them.
{quote}

Yeah, but this isn't how Version works in Lucene either; please see below

{quote}
I'm not advocating that we don't allow a way to specify Version; I'm saying 
that having a global value for it that affects things opaquely sounds 
dangerous - we should certainly have a way for people to specify the Version 
they want on each of the objects that care, but it shouldn't be global. The 
luceneMatchVersion property that Uwe added to BaseTokenizerFactory and 
BaseTokenFilterFactory in his patch seems perfect to me; it's just the 
SolrCoreAware / core.getSolrConfig().luceneMatchVersion part that I think is a 
bad idea.
{quote}

And I disagree: I think the per-tokenfilter matchVersion should be the 
expert use, with the default global Version being the standard use. 

I don't think Version is intended so that you can use X.Y on this part and Y.Z 
on that part and have any chance of anything working. For example, it controls 
position increments in StopFilter but also in the QueryParser; if you use 
wacky combinations, things might not work.

And I personally don't see anyone putting effort into supporting this either, 
because it's enough to supply the back compat for previous versions, but not 
some cross product of all possible versions; this is too much. Sometimes 
things interact in ways we cannot detect automatically (such as the query 
parser phrasequery / stopfilter thing); it's my understanding that things like 
this are why Version was created in the first place.
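
A minimal illustration of the intended usage against the plain Lucene 2.9 
API: one constant drives both analysis and query parsing, so the two sides 
stay in agreement:

{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

public class ConsistentVersionExample {
  public static void main(String[] args) throws Exception {
    // The same Version constant controls stopword position increments in
    // the analyzer and how the query parser handles them.
    Version matchVersion = Version.LUCENE_29;
    StandardAnalyzer analyzer = new StandardAnalyzer(matchVersion);
    QueryParser parser = new QueryParser(matchVersion, "body", analyzer);
    System.out.println(parser.parse("a quick brown fox"));
  }
}
{code}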


 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
 SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 it can be 
 replaced by Version.valueOf(String), as Version is a subclass of Java5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: SolrException.ErrorCode

2010-01-05 Thread Chris Hostetter

: Any reason why ErrorCode.code isn't public final?  It seems weird that we have

no idea (to clarify: it was final, and Grant recently made it public final)

: public void assertQEx(String message, SolrQueryRequest req, int code ) {
: 
: instead of
: public void assertQEx(String message, SolrQueryRequest req, 
SolrException.ErrorCode ) {

I think it's because that assert code predates the ErrorCode class (once upon 
a time Solr was also using numeric codes like 0 and -1 for some internal 
errors) and when the ErrorCode class got added, no one updated the 
AbstractSolrTestCase to make it easy to use.

: Also, if it is public final, there really isn't any harm in exposing it 
: elsewhere.  However, it does seem weird that we have these codes, but 
: they aren't logged either, AFAICT.

They do get logged, but indirectly -- since ErrorCode is an inner class of 
SolrException, the SolrException class has always had access to it even though 
it wasn't public, so it inspects it when it's used to construct a 
SolrException, and then SolrException has a public int code() method for 
returning it.  That method is used by the SolrDispatchFilter to 
set the response code, which the servlet container uses for logging.

: Finally, do we have a plan for how they should be used?  Can we add new 
: values to them?  Do we document anywhere what they mean?

The enum names seem fairly self-documenting.  The ErrorCode enum came 
about in SOLR-249, after there had been a push to move away from
inconsistent values depending on where the SolrException was 
expected to wind up (0 and -1 were getting used by some internal exceptions 
and code that the UpdateServlet dealt with, while SolrExceptions that were 
getting thrown to the user used HTTP status codes).

I think we should try to stick with using a subset of the HTTP codes so we 
can always be safe leaking them to outside clients (via the servlet 
container's default error page mechanism).  If we feel like we need 
finer-grained control than that in some cases, we could consider adding a 
sub-code to the ErrorCode enum -- but that sounds like it would 
smell fishy.  If we find ourselves wanting more detail like that, we 
should probably subclass SolrException instead of adding more codes (we 
should probably be subclassing SolrException a lot more anyway)
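
A minimal sketch of that subclassing approach (the exception name is invented 
for illustration):

{code:java}
import org.apache.solr.common.SolrException;

// Hypothetical subclass: the finer-grained meaning lives in the type,
// while a standard HTTP code stays safe to leak to outside clients.
public class UnknownFieldException extends SolrException {
  public UnknownFieldException(String fieldName) {
    super(ErrorCode.BAD_REQUEST, "Unknown field: " + fieldName);
  }
}
{code}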




-Hoss



Re: NPE in MoreLikeThis referenced doc not found and debugQuery=True

2010-01-05 Thread Chris Hostetter

: When I do a specific MLT search on a document with debugQuery=True I am 
: getting a NullPointerException both on screen and in my catalina logs. The 
: query is as follows
: 
: 
http://localhost:8080/solr2/select/?mlt.minwl=3mlt.fl=bodymlt.mintf=1mlt.maxwl=15mlt.maxqt=20version=1.2rows=5mlt.mindf=1fl=nid,title,path,url,digest,teaserstart=0q=nid:16036qt=mltdebugQuery=true
: 
: Is this desired behavior? 

An NPE is never desired behavior ... can you elaborate on which version of 
Solr you are using?  And to clarify: you are saying you only get this when 
debugQuery=true ... correct?

: org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.java:399)
:     at org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:189)

The MLT Handler (not to be confused with the MLT Component) does things a 
bit differently than most other handlers, so it's not surprising that 
something like this might have gotten overlooked.  Skimming the code, I 
don't see any obvious reason why it should encounter an NPE, however.

Can you reproduce this using the example configs/data, or is it something 
special about your data?



-Hoss


[jira] Commented: (SOLR-534) Return all query results with parameter rows=-1

2010-01-05 Thread Lisa Carter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796982#action_12796982
 ] 

Lisa Carter commented on SOLR-534:
--

I would argue that REALLY_BIG_NUMBER is actually significantly MORE dangerous 
than a crash. 

Here's why: A crash at least lets the programmer know something went wrong. 
Missing data is a silent failure. 

1) If the result set is too large for the client, it will run out of memory and 
generate an exception. The programmer will immediately know they did something 
wrong.

2) If the result set is too large for the network (unlikely) this will 
disconnect and fail. The programmer will immediately know they did something 
wrong.

3) If the result set is too large for Solr, Solr should not crash but rather 
return a page with the standard error handler: "result set too large/out of 
memory". The programmer will immediately know they did something wrong. Solr 
sure as heck better be checking this already -- you never know when you'll run 
into bizarre low-memory conditions; allocations should ALWAYS be checked.

But if you use the REALLY_BIG_NUMBER approach, the same bad programmer who 
never thought he would get back more than 1000 records will never check 
whether the result set contains more than 1000 records either. If the 
programmer was expecting the complete result set and the database now contains 
1002 records instead of 999, they will not know there is a problem... the last 
records in the set are simply truncated. The programmer who wrote the code may 
not be the person maintaining the application, which is quite common in 
production environments. The maintenance person may not know for weeks or 
months that a problem even exists! 

The -1 approach ensures immediate, loud failure.

The REALLY_BIG_NUMBER ensures only silent failure.

While it's impossible to idiot-proof everything, loud failure is always 
preferable to silent failure. Barking loudly saves the poor soul who maintains 
the idiot's code a lot of heartache.
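
For what it's worth, the truncation check that the REALLY_BIG_NUMBER approach 
silently skips is tiny in SolrJ (a sketch; the URL and row count are made up):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TruncationCheck {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    int rows = 1000;
    QueryResponse rsp = server.query(new SolrQuery("*:*").setRows(rows));
    // numFound is the total match count; more matches than requested rows
    // means the result set was silently truncated.
    if (rsp.getResults().getNumFound() > rows) {
      throw new IllegalStateException("result set truncated");
    }
  }
}
{code}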


 Return all query results with parameter rows=-1
 ---

 Key: SOLR-534
 URL: https://issues.apache.org/jira/browse/SOLR-534
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Priority: Minor
 Attachments: solr-all-results.patch


 The searcher should return all results matching a query when the parameter 
 rows=-1 is given.
 I know that it is a bad idea to do this in general, but as it explicitly 
 requires a special parameter, people using this feature will be aware of what 
 they are doing. The main use case for this feature is probably debugging, but 
 in some cases one might actually need to retrieve all results because they 
 e.g. are to be merged with results from different sources.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1212) TestNG Test Case

2010-01-05 Thread Kay Kay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797004#action_12797004
 ] 

Kay Kay commented on SOLR-1212:
---

@Shalin, @HossMan - I understand the pain of maintaining this separately from 
JUnit - but I was concerned mostly about it getting out of date with the 
tree. 

As for the comparison between TestNG and JUnit - one big advantage is the 
ability to categorize tests into different groups in TestNG and run them 
separately, as sketched below (especially useful as the code base of Solr gets 
bigger, plus contrib). That is one of the primary reasons TestNG was chosen 
(after evaluating both). So - if you guys are not comfortable with the patch - 
then (as Shalin noted) just make an entry in the wiki and leave this one as 
such. The code is definitely not big enough to warrant an SF project / GitHub 
/ Google Code at this point. 
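
For readers unfamiliar with the groups feature, a small illustration (the 
class and group names are made up):

{code:java}
import org.testng.annotations.Test;

// Tests tagged with groups can be run in separate slices, e.g.
//   java org.testng.TestNG -groups fast testng.xml
public class ExampleGroupedTest {

  @Test(groups = { "fast" })
  public void parsesSimpleQuery() {
    // quick unit-level check
  }

  @Test(groups = { "slow", "contrib" })
  public void rebuildsLargeIndex() {
    // long-running integration-style check
  }
}
{code}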

A better patch would refactor the existing JUnit test case so that the TestNG 
version minimizes duplication as much as possible. 

 TestNG Test Case 
 -

 Key: SOLR-1212
 URL: https://issues.apache.org/jira/browse/SOLR-1212
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.4
 Environment: Java 6
Reporter: Kay Kay
 Fix For: 1.5

 Attachments: SOLR-1212.patch, testng-5.9-jdk15.jar

   Original Estimate: 1h
  Remaining Estimate: 1h

 TestNG equivalent of AbstractSolrTestCase , without using JUnit altogether . 
 New Class created: AbstractSolrNGTest 
 LICENSE.txt , NOTICE.txt modified as appropriate. ( TestNG under Apache 
 License 2.0 ) 
 TestNG 5.9-jdk15 added to lib. 
 Justification:  In some workplaces - people are moving towards TestNG and 
 take out JUnit altogether from the classpath. Hence useful in those cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1212) TestNG Test Case

2010-01-05 Thread Kay Kay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797005#action_12797005
 ] 

Kay Kay commented on SOLR-1212:
---

@HossMan - Interesting thread from Stack Overflow: 
http://stackoverflow.com/questions/153427/is-there-a-junit-testrunner-for-running-groups-of-tests
 . Not sure how much JUnit has kept up with TestNG recently, but TestNG is 
definitely a notch up (IMHO, of course). 

 TestNG Test Case 
 -

 Key: SOLR-1212
 URL: https://issues.apache.org/jira/browse/SOLR-1212
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.4
 Environment: Java 6
Reporter: Kay Kay
 Fix For: 1.5

 Attachments: SOLR-1212.patch, testng-5.9-jdk15.jar

   Original Estimate: 1h
  Remaining Estimate: 1h

 TestNG equivalent of AbstractSolrTestCase , without using JUnit altogether . 
 New Class created: AbstractSolrNGTest 
 LICENSE.txt , NOTICE.txt modified as appropriate. ( TestNG under Apache 
 License 2.0 ) 
 TestNG 5.9-jdk15 added to lib. 
 Justification:  In some workplaces - people are moving towards TestNG and 
 take out JUnit altogether from the classpath. Hence useful in those cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1698) load balanced distributed search

2010-01-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797011#action_12797011
 ] 

Noble Paul commented on SOLR-1698:
--

LBHttpSolrServer can have the concept of a sticky session, and the session 
object can be used for all shard requests made in a single Solr request.
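
A rough sketch of what that might look like from a caller's perspective; 
LBHttpSolrServer exists in SolrJ, but the session methods below are purely 
hypothetical, sketching the proposal:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

public class StickySessionSketch {
  public static void main(String[] args) throws Exception {
    LBHttpSolrServer lb = new LBHttpSolrServer(
        "http://shard1a:8983/solr", "http://shard1b:8983/solr");

    // Hypothetical API: pin one set of replica choices for the lifetime
    // of a single top-level request, so every shard sub-request made on
    // its behalf hits the same servers.
    LBHttpSolrServer.Session session = lb.openSession();
    try {
      lb.query(session, new SolrQuery("*:*"));
    } finally {
      session.close();
    }
  }
}
{code}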

 load balanced distributed search
 

 Key: SOLR-1698
 URL: https://issues.apache.org/jira/browse/SOLR-1698
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley

 Provide syntax and implementation of load-balancing across shard replicas.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.