Build failed in Hudson: Solr-trunk #1043

2010-01-26 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1043/

--
[...truncated 2370 lines...]
[junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.985 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 9.419 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 32.127 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 50.634 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 43.331 sec
[junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.016 sec
[junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.409 sec
[junit] Running 
org.apache.solr.client.solrj.response.AnlysisResponseBaseTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.41 sec
[junit] Running 
org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.391 sec
[junit] Running 
org.apache.solr.client.solrj.response.FieldAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.529 sec
[junit] Running org.apache.solr.client.solrj.response.QueryResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.799 sec
[junit] Running org.apache.solr.client.solrj.response.TermsResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.674 sec
[junit] Running org.apache.solr.client.solrj.response.TestSpellCheckResponse
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 11.43 sec
[junit] Running org.apache.solr.client.solrj.util.ClientUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.401 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.435 sec
[junit] Running org.apache.solr.common.params.ModifiableSolrParamsTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.389 sec
[junit] Running org.apache.solr.common.params.SolrParamTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.518 sec
[junit] Running org.apache.solr.common.util.ContentStreamTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.852 sec
[junit] Running org.apache.solr.common.util.DOMUtilTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.655 sec
[junit] Running org.apache.solr.common.util.FileUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.354 sec
[junit] Running org.apache.solr.common.util.IteratorChainTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.409 sec
[junit] Running org.apache.solr.common.util.NamedListTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.509 sec
[junit] Running org.apache.solr.common.util.TestFastInputStream
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.549 sec
[junit] Running org.apache.solr.common.util.TestHash
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.595 sec
[junit] Running org.apache.solr.common.util.TestNamedListCodec
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.577 sec
[junit] Running org.apache.solr.common.util.TestXMLEscaping
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.452 sec
[junit] Running org.apache.solr.core.AlternateDirectoryTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.485 sec
[junit] Running org.apache.solr.core.AlternateIndexReaderTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.295 sec
[junit] Running org.apache.solr.core.IndexReaderFactoryTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.562 sec
[junit] Running org.apache.solr.core.RequestHandlersTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.998 sec
[junit] Running org.apache.solr.core.ResourceLoaderTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.483 sec
[junit] Running org.apache.solr.core.SOLR749Test
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.898 sec
[junit] Running org.apache.solr.core.SolrCoreTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 14.443 sec
[junit] Running org.apache.solr.core.TestArbitraryIndexDir
[junit] Tests run: 1, 

configure FastVectorHighlighter in trunk

2010-01-26 Thread Marc Sturlese

How do I activate FastVectorHighlighter in trunk? Which of those params sets
it up?
   <!-- Configure the standard fragListBuilder -->
   <fragListBuilder name="simple"
class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>

   <!-- Configure the standard fragmentsBuilder -->
   <fragmentsBuilder name="colored"
class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder"
default="true"/>

   <fragmentsBuilder name="scoreOrder"
class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder"
default="true"/>

Thanks in advance.
-- 
View this message in context: 
http://old.nabble.com/configure-FastVectorHihglighter-in-trunk-tp27319976p27319976.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



Need hardware recommendation

2010-01-26 Thread Jayesh Wadhwani
I am trying to do the following:

Index 6 Million database records( SQL Server 2008). Full index daily.
Differential every 15 minutes

Index 2 Million rich documents. Full index weekly. Differential every 15
minutes

Search queries: 1 per minute

20 cores

I am looking for hardware recommendations.

Any advice/recommendation will be appreciated.

-Jayesh Wadhwani


Re: configure FastVectorHighlighter in trunk

2010-01-26 Thread Koji Sekiguchi

Marc Sturlese wrote:

How do I activate FastVectorHighlighter in trunk? Which of those params sets
it up?
   <!-- Configure the standard fragListBuilder -->
   <fragListBuilder name="simple"
class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>

   <!-- Configure the standard fragmentsBuilder -->
   <fragmentsBuilder name="colored"
class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder"
default="true"/>

   <fragmentsBuilder name="scoreOrder"
class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder"
default="true"/>

Thanks in advance.
  

You do not need to activate it. DefaultSolrHighlighter, which is the
default SolrHighlighter impl, automatically uses FVH when the field names
you specify through the hl.fl parameter have termVectors, termPositions and
termOffsets set to true. If you want to use the multi-colored tag feature,
you need to specify MultiColored*FragmentsBuilder in solrconfig.xml.
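For readers following along, the setup Koji describes can be sketched like
this; the field name and request parameters below are illustrative, not taken
from this thread:

   <!-- schema.xml: FVH needs term vectors with positions and offsets -->
   <field name="content" type="text" indexed="true" stored="true"
          termVectors="true" termPositions="true" termOffsets="true"/>

A highlighting request would then name that field via hl.fl, e.g.
...&hl=true&hl.fl=content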


Koji

--
http://www.rondhuit.com/en/



[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2010-01-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805051#action_12805051
 ] 

Jan Høydahl commented on SOLR-1725:
---

Lance, what do you mean by DIH language?

In my example xml that you quoted, the first two processors, 
FileReaderProcessorFactory and TikaProcessorFactory, were supposed to be 
(imagined) ordinary Java processors, not scripted ones.

Uri, I'd prefer it if the manner of configuration were as similar as possible, 
i.e. if we could get rid of the <lst name="params"> part and instead pass all 
top-level params directly to the script (except the scripts param itself).

Even better would be if the definition of a processor lived in a separate xml 
section and were then referred to by name in each chain, but that is a bigger 
change outside the scope of this patch.


 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
 Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (uses JDK 6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}}, which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script
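 For illustration only, the configuration described above might look roughly 
 like this in solrconfig.xml; the factory class name and script file names 
 here are assumptions based on the description, not taken from the patch:
{code:xml}
<updateRequestProcessorChain name="script">
  <!-- scripts run in lexicographical file-name order:
       scriptA.js before scriptB.js -->
  <processor class="org.apache.solr.update.processor.ScriptUpdateProcessorFactory">
    <str name="scripts">scriptA.js,scriptB.js</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
{code}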

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1733) Allow specification of date output format on date faceting

2010-01-26 Thread Chris A. Mattmann (JIRA)
Allow specification of date output format on date faceting
--

 Key: SOLR-1733
 URL: https://issues.apache.org/jira/browse/SOLR-1733
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
 Environment: my local MacBook Pro.
Reporter: Chris A. Mattmann
 Fix For: 1.5


It would be really great if the facet component allowed the specification of 
the date output format, so that e.g., if I wanted to facet by month, I could 
also specify what the resultant date facets look like. In other words, a facet 
query like this:

http://localhost:8993/solr/select/?q=*:*&version=2.2&start=0&rows=146&indent=on&facet=on&facet.date=startdate&facet.date.start=NOW/YEAR-50YEARS&facet.date.end=NOW&facet.date.gap=%2B1MONTH&facet.date.output=yy-MM-dd

would show output like:

{code:xml}
<lst name="facet_dates">
 <lst name="startdate">
  <int name="1960-01-01">0</int>
  <int name="1960-02-01">1</int>
  <int name="1960-03-01">0</int>
  <int name="1960-04-01">0</int>
  <int name="1960-05-01">2</int>
  ...
 </lst>
</lst>
{code}

Patch forthcoming that implements this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Folder monitoring for differential index

2010-01-26 Thread Zacarias
Hi, we are developing a folder monitor (watchdog) to add to a client's Solr
implementation.
Tell me if it could be interesting for any of you guys; we will be glad to
share it with the community and eventually integrate it into the trunk.
Also I would like the gurus' opinion about the design (I wrote a proposal).

Awaiting feedback

http://www.linebee.com/wp-content/uploads/2010/01/File_monitoring_for_diferencial_index_1.rtf

linebee labs http://www.linebee.com/?page_id=71

Zacarias
zacar...@linebee.com
www.linebee.com


Re: Folder monitoring for differential index

2010-01-26 Thread Zacarias
http://www.linebee.com/wp-content/uploads/2010/01/File_monitoring_for_diferencial_index_1.rtf


On Tue, Jan 26, 2010 at 12:06 PM, Zacarias zacar...@linebee.com wrote:

 Hi, we are developing a folder monitor (watchdog) to add to a client's Solr
 implementation.
 Tell me if it could be interesting for any of you guys; we will be glad to
 share it with the community and eventually integrate it into the trunk.
 Also I would like the gurus' opinion about the design (I wrote a proposal).

 Awaiting feedback


 http://www.linebee.com/wp-content/uploads/2010/01/File_monitoring_for_diferencial_index_1.rtf

 linebee labs http://www.linebee.com/?page_id=71

 Zacarias
 zacar...@linebee.com
 www.linebee.com



Problem with German word endings

2010-01-26 Thread David Rühr

Hi List.

We have built a suggest search and send this query with the facet.prefix 
kinderzim:


facet=on
facet.prefix=kinderzim
facet.mincount=1
facet.field=content
facet.limit=10
fl=content
omitHeader=true
bf=log%28supplier_faktor%29
version=1.2
wt=json
json.nl=map
q=
start=0
rows=0


Now we get:
<lst name="content">
 <int name="kinderzimm">7</int>
</lst>

Solr doesn't return the endings of the output words. It should be 
"kinderzimmer"; the same happens with "kindermode", where we get "kindermod".
We added the words to our protwords.txt and included it with this line in 
schema.xml:
<filter class="solr.SnowballPorterFilterFactory" language="German" 
protected="protwords.txt"/>
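For context, a protected-words filter normally sits inside the field type's 
analyzer chain in schema.xml; the sketch below is illustrative (the field type 
name and tokenizer are assumptions, not taken from this mail):

<fieldType name="text_de" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- tokens listed in protwords.txt are skipped by the stemmer -->
    <filter class="solr.SnowballPorterFilterFactory" language="German"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

Note that faceting on a stemmed field returns the indexed (stemmed) terms, 
which is why truncated forms show up in the facet output.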


Can anybody help us?


Thanks and sorry about my english.
So Long , David





how to sort facets?

2010-01-26 Thread David Rühr

hi,

we built a filter using the faceting feature. In our facet list the order 
is currently by match count:

facet.sort=count

but we need to sort by manufacturer (facet.sort=manufacturer).
Manipulating the URL doesn't change anything; why?

select?fl=*%2Cscore&fq=type%3Apage&spellcheck=true&facet=true&facet.mincount=1&facet.sort=manufacturer&bf=log(supplier_faktor)&facet.field=supplier&facet.field=manufacturer&version=1.2&q=kind&start=0&rows=10
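A note for reference: in Solr 1.4, facet.sort accepts only count (order by 
frequency) or lex (order by indexed term), so a field name like manufacturer 
is not a valid value. A lexicographic ordering of the manufacturer facet would 
be requested with, for example:

select?q=kind&facet=true&facet.field=manufacturer&facet.sort=lex&rows=10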

so long,
David


[jira] Updated: (SOLR-1283) Mark Invalid error on indexing

2010-01-26 Thread Julien Coloos (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Coloos updated SOLR-1283:


Attachment: SOLR-1283.patch

The issue is also happening in current trunk (revision 903234), with the class 
{{HTMLStripCharFilter}} (replacing deprecated {{HTMLStripReader}} it seems).
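For context, the code path in the trace below is typically exercised by an 
analyzer that declares the HTML-stripping char filter ahead of the tokenizer; 
this sketch is illustrative (the field type name is made up):
{code:xml}
<fieldType name="html_text" class="solr.TextField">
  <analyzer>
    <!-- strips HTML markup before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}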

Example of stacktrace:
{noformat}
26 janv. 2010 16:02:56 org.apache.solr.common.SolrException log
GRAVE: java.io.IOException: Mark invalid
at java.io.BufferedReader.reset(BufferedReader.java:485)
at org.apache.lucene.analysis.CharReader.reset(CharReader.java:63)
at 
org.apache.solr.analysis.HTMLStripCharFilter.restoreState(HTMLStripCharFilter.java:172)
at 
org.apache.solr.analysis.HTMLStripCharFilter.read(HTMLStripCharFilter.java:734)
at 
org.apache.solr.analysis.HTMLStripCharFilter.read(HTMLStripCharFilter.java:748)
at java.io.Reader.read(Reader.java:122)
at 
org.apache.lucene.analysis.CharTokenizer.incrementToken(CharTokenizer.java:77)
at 
org.apache.lucene.analysis.ISOLatin1AccentFilter.incrementToken(ISOLatin1AccentFilter.java:43)
at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:383)
at 
org.apache.lucene.analysis.ISOLatin1AccentFilter.next(ISOLatin1AccentFilter.java:64)
at 
org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:379)
at 
org.apache.lucene.analysis.TokenStream.incrementToken(TokenStream.java:318)
at 
org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:225)
at 
org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:38)
at 
org.apache.solr.analysis.SnowballPorterFilter.incrementToken(SnowballPorterFilterFactory.java:116)
at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:406)
at 
org.apache.solr.analysis.BufferedTokenStream.read(BufferedTokenStream.java:97)
at 
org.apache.solr.analysis.BufferedTokenStream.next(BufferedTokenStream.java:83)
at 
org.apache.lucene.analysis.TokenStream.incrementToken(TokenStream.java:321)
at 
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
at 
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:781)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:764)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2630)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2602)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1317)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:723)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 

[jira] Commented: (SOLR-1722) Allowing changing the special default core name, and as a default default core name, switch to using collection1 rather than DEFAULT_CORE

2010-01-26 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805093#action_12805093
 ] 

Mark Miller commented on SOLR-1722:
---

Right - the issue with cloud was that if you ask for the core name, it's known 
as "" rather than its other name.

So there are definite advantages to doing this in the dispatch filter - you can 
use the non-"" name and you can get rid of all the normalization going on in 
core container - 

the problem is back compat, I think - if we just normalize in the 
DispatchFilter, anyone counting on getting the default core with getCore() 
now will have their code broken.

 Allowing changing the special default core name, and as a default default 
 core name, switch to using collection1 rather than DEFAULT_CORE
 ---

 Key: SOLR-1722
 URL: https://issues.apache.org/jira/browse/SOLR-1722
 Project: Solr
  Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1722.patch


 see 
 http://search.lucidimagination.com/search/document/f5f2af7c5041a79e/default_core

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1711) Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java

2010-01-26 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1711.


Resolution: Fixed

Thanks Attila!  I just committed this.

 Race condition in 
 org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
 --

 Key: SOLR-1711
 URL: https://issues.apache.org/jira/browse/SOLR-1711
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4, 1.5
Reporter: Attila Babo
Priority: Critical
 Fix For: 1.5

 Attachments: StreamingUpdateSolrServer.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 While inserting a large pile of documents using StreamingUpdateSolrServer, 
 there is a race condition in which all Runner instances stop processing while 
 the blocking queue is full. With a high-performance client this can happen 
 quite often, and there is no way to recover from it on the client side.
 In StreamingUpdateSolrServer there is a BlockingQueue called queue that 
 stores UpdateRequests, and up to threadCount worker threads 
 (StreamingUpdateSolrServer.Runner) read that queue and push requests to a 
 Solr instance. If at some point the BlockingQueue is empty, all workers stop 
 processing it and push the collected content to Solr, which can be a 
 time-consuming process; sometimes all worker threads are waiting for Solr. If 
 at this time the client fills the BlockingQueue completely, all worker 
 threads will quit without processing any further and the main thread will 
 block forever.
 There is a simple, well-tested patch attached to handle this situation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-01-26 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805167#action_12805167
 ] 

Hoss Man commented on SOLR-1677:


bq. And here are the JIRA issues for stemming bugs, since you didnt take my 
hint to go and actually read them.

sigh.  I read both those issues when you filed them, and I agreed with your 
assessment that they are bugs we should fix -- if i had thought you were wrong 
i would have said so in the issue comments.

But that doesn't change the fact that sometimes people depend on buggy behavior 
-- and sometimes those people depend on the buggy behavior without even 
realizing it.  Bug fixes in a stemmer might make it more correct according to 
the stemmer algorithm specification, or the language semantics, but in some 
peculiar use cases an application might find the correct implementation less 
useful than the previous buggy version.

This is one reason why things like CHANGES.txt are important: to draw attention 
to what has changed between two versions of a piece of software, so people can 
make informed opinions about what they should test in their own applications 
when they upgrade things under the covers.  luceneMatchVersion should be no 
different.  We should try to find a simple way to inform people "when you 
switch from luceneMatchVersion=X to luceneMatchVersion=Y, here are the bug 
fixes you will get", so they know what to test to determine if they are 
adversely affected by that bug fix in some way (and can find their own 
workaround).

bq. Perhaps you should come up with a better example than stemming, as you 
don't know what you are talking about.

1) It's true, I frequently don't know what i'm talking about ... this issue was 
a prime example, and i thank you, Uwe, and Miller for helping me realize that i 
was completely wrong in my understanding about the intended purpose of 
o.a.l.Version, and that a global setting for it in Solr makes total sense -- 
But that doesn't make my concerns about documenting the effects of that global 
setting any less valid.

2) Perhaps you should read the StopFilter example i already posted in my last 
comment...

{quote}
bq. Robert mentioned in an earlier comment that StopFilter's position increment 
behavior changes depending on the luceneMatchVersion -- what if an existing 
Solr 1.3 user notices a bug in some Tokenizer, and adds 
{{<luceneMatchVersion>3.0</luceneMatchVersion>}} to his schema.xml to fix it.  
Without clear documentation on _everything_ that is affected when doing that, 
he may not realize that StopFilter changed at all -- and even though the 
position increment behavior may now be more correct, it might drastically 
change the results he gets when using dismax with a particular qs or ps value.  
Hence my point that this becomes a serious documentation concern: finding a way 
to make it clear to users what they need to consider when modifying 
luceneMatchVersion.
{quote}

 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
 SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 it can be 
 replaced by Version.valueOf(String), as Version is a subclass of Java 5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.
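 As a sketch of what this enables, an analysis factory could then be pinned to 
 a specific version in schema.xml; the exact attribute name and accepted 
 values below are assumptions based on the description above, not the patch 
 itself:
{code:xml}
<tokenizer class="solr.StandardTokenizerFactory" luceneMatchVersion="LUCENE_24"/>
{code}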

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: CHANGES.txt updates for SOLR-1516 and SOLR-1592

2010-01-26 Thread Chris Hostetter

: Not to be a pest, but there are no CHANGES.txt updates for SOLR-1516 and
: SOLR-1592. Could someone update them? A trivial patch is attached...

Sorry about that.

Every change (with the possible exception of fixing formatting or 
documentation typos) *should* have a CHANGES.txt entry.

Every change that affects the public API *MUST* have a CHANGES.txt entry.

Committed revision 903398.


-Hoss



[jira] Commented: (SOLR-1603) Perl Response Writer

2010-01-26 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805186#action_12805186
 ] 

Hoss Man commented on SOLR-1603:


I realize this is analogous to the python, php, and ruby writers, but while i 
can't speak much to how those (language) communities feel about evaling code 
from remote sources to generate data structures, i know that the majority of 
the Perl community considers that a bad practice ... it's the reason things 
like YAML were created: to allow simple serialization w/o needing to execute 
untrusted code.

So i'm a little leery about adding this (beyond my general leeriness of adding 
code w/o tests).

 Perl Response Writer
 

 Key: SOLR-1603
 URL: https://issues.apache.org/jira/browse/SOLR-1603
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Reporter: Claudio Valente
Priority: Minor
 Attachments: SOLR-1603.patch


 I've made a patch that implements a Perl response writer for Solr.
 It's NaN/Inf and Unicode aware.
 I don't know whether some fields can be binary, but if so I can probably 
 extend it to support that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-01-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805187#action_12805187
 ] 

Robert Muir commented on SOLR-1677:
---

bq. 2) Perhaps you should read the StopFilter example i already posted in my 
last comment...

https://issues.apache.org/jira/browse/LUCENE-2094?focusedCommentId=12783932&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12783932

as far as this one goes, i specifically commented before on this not being 
'hidden' by Version (with Solr users in mind) but instead being its own option 
that every user should consider, regardless of defaults.

For the stopfilter posInc, the user should think it through; it's pretty 
strange, as i mention in my comment, that a definite article like 'the' gets a 
posInc bump in one language but not another, simply because it happens to be 
separated by a space.

I guess I couldn't care less what the default is; if you care about such 
things you shouldn't be using the defaults and should instead specify this 
yourself in the schema, and then Version has no effect. I can't really defend 
the whole stopfilter posInc thing, as again i think it doesn't make a whole 
lot of sense; maybe it works well for English I guess, I won't argue about it.


 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
 SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 it can be 
 replaced by Version.valueOf(String), as Version is a subclass of Java 5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1718) Carriage return should submit query admin form

2010-01-26 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805190#action_12805190
 ] 

Hoss Man commented on SOLR-1718:


I don't understand what you mean.  Both forms use a {{textarea}} - why should 
the behavior of one textarea be different from the behavior of the other (and 
every other HTML textarea on the web)?

 Carriage return should submit query admin form
 --

 Key: SOLR-1718
 URL: https://issues.apache.org/jira/browse/SOLR-1718
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 1.4
Reporter: David Smiley
Priority: Minor

 Hitting the carriage return on the keyboard should submit the search query on 
 the admin front screen.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1718) Carriage return should submit query admin form

2010-01-26 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805219#action_12805219
 ] 

David Smiley commented on SOLR-1718:


Consider the JIRA interface we are using to comment on this issue.  At the 
top-right of the screen is a QUICK SEARCH box.  It doesn't even have a 
submit button; it just works by hitting the return key.

 Carriage return should submit query admin form
 --

 Key: SOLR-1718
 URL: https://issues.apache.org/jira/browse/SOLR-1718
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 1.4
Reporter: David Smiley
Priority: Minor

 Hitting the carriage return on the keyboard should submit the search query on 
 the admin front screen.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1734) Add pid file to snappuller to skip script overruns, and recover from failure

2010-01-26 Thread Bill Au (JIRA)
Add pid file to snappuller to skip script overruns, and recover from failure


 Key: SOLR-1734
 URL: https://issues.apache.org/jira/browse/SOLR-1734
 Project: Solr
  Issue Type: Improvement
  Components: replication (scripts)
Affects Versions: 1.4
Reporter: Bill Au
Assignee: Bill Au
Priority: Minor


The pid file will allow snappuller to be run as fast as possible without 
overruns. It will also recover from a previous failed run, should an older 
snappuller process no longer be running.  The same has already been done to 
snapinstaller in SOLR-990.  Overlapping snappuller could cause replication 
traffic to saturate the network if a large Solr index is being replicated to a 
large number of clients.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2010-01-26 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805235#action_12805235
 ] 

Uri Boness commented on SOLR-1725:
--

Lance, I lost you a bit as well.

bq. Uri, I'd prefer if the manner of configuration was as similar as possible, 
i.e. if we could get rid of the <lst name="params"> part, and instead pass all 
top-level params directly to the script (except the scripts param itself).

Hmm... personally I prefer configurations that clearly indicate their purpose. 
Leaving out the _params_ list will make things a bit confusing - some 
parameters would be available to the scripts, others not... it's not really 
clear.

bq. manner of configuration was as similar as possible

The configurations are similar. All elements in solrconfig.xml have one standard 
way of configuration, which can be anything from a _lst_ to a _bool_, _str_, etc. 
Tomorrow a new processor will pop up which will also require a _lst_ 
configuration... and that's fine. 

bq.Even better if the definition of a processor was in a separate xml section 
and then refer by name only in each chain, but that is a bigger change outside 
scope of this patch.

Well, indeed that's a bigger change. Like everything, this kind of 
configuration has its pros and cons.

I guess it's best if people just state their preferences regarding how 
they would like to see this processor configured, and based on that I'll adjust 
the patch.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
 Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script
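
Two behaviours the description specifies - lexicographical execution order and extension-based language resolution - can be sketched like this. The class and method names are hypothetical, not from the patch:

```java
import java.util.*;

// Sketch of the ordering and extension rules described above:
// scripts run in lexicographical order of file name, and the
// script language is resolved from the file extension.
class ScriptConfig {
    // "scripts" is the comma-separated parameter value from the config.
    static List<String> executionOrder(String scripts) {
        List<String> names = new ArrayList<>();
        for (String s : scripts.split(",")) names.add(s.trim());
        Collections.sort(names); // lexicographical execution order
        return names;
    }

    // An extension is mandatory; *.js is treated as JavaScript.
    // Other extensions are returned as-is in this sketch.
    static String language(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) throw new IllegalArgumentException("extension is mandatory: " + fileName);
        String ext = fileName.substring(dot + 1);
        return ext.equals("js") ? "JavaScript" : ext;
    }
}
```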

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1734) Add pid file to snappuller to skip script overruns, and recover from failure

2010-01-26 Thread Bill Au (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Au updated SOLR-1734:
--

Attachment: SOLR-1734.patch

I am reusing the code from SOLR-990 which adds the same feature to 
snapinstaller.  I have added a -f command line argument to force the 
snappuller to run even if one is already running.  That will be useful if 
network capacity is not an issue.

 Add pid file to snappuller to skip script overruns, and recover from failure
 

 Key: SOLR-1734
 URL: https://issues.apache.org/jira/browse/SOLR-1734
 Project: Solr
  Issue Type: Improvement
  Components: replication (scripts)
Affects Versions: 1.4
Reporter: Bill Au
Assignee: Bill Au
Priority: Minor
 Attachments: SOLR-1734.patch


 The pid file will allow snappuller to be run as fast as possible without 
 overruns. It will also recover from a previous failed run, should an older 
 snappuller process no longer be running.  The same has already been done to 
 snapinstaller in SOLR-990.  Overlapping snappuller could cause replication 
 traffic to saturate the network if a large Solr index is being replicated to 
 a large number of clients.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: how to sort facets?

2010-01-26 Thread Koji Sekiguchi

David Rühr wrote:

hi,

we make a Filter with Faceting feature. In our faceting list the order 
is by count by the matches:

facet.sort=count

but we need to sort by facet.sort=manufacturer.
URL manipulation doesn't change anything - why?

select?fl=*%2Cscore&fq=type%3Apage&spellcheck=true&facet=true&facet.mincount=1&facet.sort=manufacturer&bf=log(supplier_faktor)&facet.field=supplier&facet.field=manufacturer&version=1.2&q=kind&start=0&rows=10 



so long,
David


Try facet.sort=index. facet.sort accepts only count or index.

http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort

Koji

--
http://www.rondhuit.com/en/



[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2010-01-26 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805247#action_12805247
 ] 

Hoss Man commented on SOLR-1725:


Some random comments/questions from the peanut gallery...

1) what is the value add in making ScriptUpdateProcessorFactory support 
multiple scripts? ... wouldn't it be simpler to require that users declare 
multiple instances of ScriptUpdateProcessorFactory (which the processor chain 
already executes in sequence) than to add sequential processing to the 
ScriptUpdateProcessor?

2) The NamedList init args can be as deep of a data structure as you want, so 
something like this would be totally feasible (if desired) ...

{code}
<processor class="solr.ScriptUpdateProcessorFactory">
  <lst name="scripts">
    <lst name="updateProcessor1.js">
      <bool name="someParamName">true</bool>
      <int name="someOtherParamName">3</int>
    </lst>
    <lst name="updateProcessor2.js">
      <bool name="fooParam">true</bool>
      <str name="barParam">3</str>
    </lst>
  </lst>
  <lst name="otherProcessorOptionsIfNeeded">
    ...
  </lst>
</processor>
{code}

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
 Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Upgrading to the latest versions of wstx jars

2010-01-26 Thread Chris Hostetter

: It has been a long time since the Woodstox jars have been updated. Is
: this intentional, as in are there any issues if we use the latest jars
: with Solr?

I have no idea ... dependencies tend to get updated when people point out 
that newer releases have bug fixes or new features we want to take 
advantage of.

If you could test out the new version, and then open a Jira issue (to track 
upgrading the jar) with your comments letting us know whether it works better 
(or just plain works), that would be very helpful.


-Hoss



Re: Planned release date for 1.5 with SOLR-236 fixed?

2010-01-26 Thread Chris Hostetter

: Would anybody happen to know the planned release date for 1.5?

Release dates don't tend to be explicitly planned in advance.  Instead 
releases tend to coalesce when the community feels that new features 
warrant a new release, and that the APIs introduced by those new features 
are ready to be considered stable and supported.

: And if so, whether or not the final fix for SOLR-236 will be included.

I'm not really sure how to answer that ... SOLR-236 is not a bug 
report, it proposes a new feature -- so there is no "fix".  Many, MANY 
people are interested in various aspects of the feature proposed, and I 
know lots of people are looking to try and get some version(s) of that 
functionality committed, but whether or not it will be 
included in 1.5 will depend on when 1.5 happens and whether some version of 
the functionality is committed to the trunk prior to that release.



-Hoss



[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2010-01-26 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805254#action_12805254
 ] 

Uri Boness commented on SOLR-1725:
--

bq. 1) what is the value add in making ScriptUpdateProcessorFactory support 
multiple scripts? ... wouldn't it be simpler to require that users declare 
multiple instances of ScriptUpdateProcessorFactory (which the processor chain 
already executes in sequence) than to add sequential processing to the 
ScriptUpdateProcessor?

Well... to my taste it makes the configuration cleaner (no need to define 
several script processors). The thing is, you have the choice here - either 
specify several scripts (comma separated) or split them across several processors.

bq. 2) The NamedList init args can be as deep of a data structure as you want, 
so something like this would be totally feasible (if desired) ...

That's definitely another option.

The only thing is that you'd probably want some way to define shared parameters 
(shared between the scripts that is) and not be forced to specify them several 
times for each script. I guess you can do something like this:

{code}
<processor class="solr.ScriptUpdateProcessorFactory">
  <lst name="sharedParams">
    <bool name="paramName">true</bool>
  </lst>
  <lst name="scripts">
    <lst name="updateProcessor1.js">
      <bool name="someParamName">true</bool>
      <int name="someOtherParamName">3</int>
    </lst>
    <lst name="updateProcessor2.js">
      <bool name="fooParam">true</bool>
      <str name="barParam">3</str>
    </lst>
  </lst>
  <lst name="otherProcessorOptionsIfNeeded">
    ...
  </lst>
</processor>
{code}

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
 Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1735) shut down TimeLimitedCollection timer thread on application unload

2010-01-26 Thread Chris Darroch (JIRA)
shut down TimeLimitedCollection timer thread on application unload
--

 Key: SOLR-1735
 URL: https://issues.apache.org/jira/browse/SOLR-1735
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4, 1.3
Reporter: Chris Darroch


As described in https://issues.apache.org/jira/browse/LUCENE-2237, shutting 
down the timer thread created by Lucene's TimeLimitedCollector allows Tomcat or 
another application server to cleanly unload solr.war (or any application using 
Lucene, for that matter).

I'm attaching two patches for Solr 1.3 which use the patch provided in 
LUCENE-2237 to shut down the timer thread when a new servlet context listener 
for the solr.war application is informed the application is about to be 
unloaded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1735) shut down TimeLimitedCollection timer thread on application unload

2010-01-26 Thread Chris Darroch (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Darroch updated SOLR-1735:


Attachment: SOLR-1735-1_3.patch

 shut down TimeLimitedCollection timer thread on application unload
 --

 Key: SOLR-1735
 URL: https://issues.apache.org/jira/browse/SOLR-1735
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3, 1.4
Reporter: Chris Darroch
 Attachments: SOLR-1735-1_3.patch


 As described in https://issues.apache.org/jira/browse/LUCENE-2237, shutting 
 down the timer thread created by Lucene's TimeLimitedCollector allows Tomcat 
 or another application server to cleanly unload solr.war (or any application 
 using Lucene, for that matter).
 I'm attaching two patches for Solr 1.3 which use the patch provided in 
 LUCENE-2237 to shut down the timer thread when a new servlet context listener 
 for the solr.war application is informed the application is about to be 
 unloaded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Date Facet duplicate counts

2010-01-26 Thread Chris Hostetter

:  should we be inclusive of the lower or the upper? ... even if we make it 
:  an option, how should it apply to the first and last ranges computed? 
:  do the answers change if facet.date.other includes before and/or after 
:  should the between option be inclusive of both end points as well?

: I guess to be consistent, the 'inclusiveness' should be intrinsically handled 
in
: detection of '[', '{' and/or ']', '}' -- i.e. match the facet.date field with 
the corresponding field 
: in the query - e.g.: q=*:* AND timestamp:[then TO now}&date.facet=timestamp .

...that only works if the datefield being faceted on is included in the query -- 
which is frequently not the case, particularly on the first request of a 
session, where you want to facet on date, but the user has not yet made any 
attempt to restrict by any of those facets.

: If no such token exists in the query, perhaps the date.facet token parsing 
could process
: an option a la: date.facet=[timestamp} to explicitly set the edge behaviour, 
or to override a match 
: in the query parser tokenization.
: 
: This way, there's no new explicit option; it would work with existing queries 
(no extra []{}'s = default behaviour); 
: and people could easily add it if they need custom edge behaviour.

I suppose ... but it still doesn't address some of the outstanding 
questions I pointed out before (handling the first/last range in the block 
... i.e.: I want inclusive of the lower, exclusive of the upper, except for 
the last range which should be inclusive of both).  Personally I think 
adding a new option is just as clear as adding markup to the date.facet 
param parsing ... the less we make assumptions about what special 
characters people have in their fieldnames the better.


: Another way to deal with it is to add MILLISECOND logic to the 
DateMathParser. Then the '1ms' adjustment
: at one and/or the other could be done by the caller at query time, leaving 
the stored data intact, and leaving
: the server-side date faceting as it is. In fact, this can be done today using 
SECOND, but can be a problem if:
:  - You're using HOURS or DAYS and don't want to convert to SECONDS each time, 
or
:  - You need granularity to the SECOND

I don't follow you at all ... yes this can be done today, but i don't 
understand what you mean about needing to convert to seconds, or requiring 
second granularity.  

If you don't index with millisecond precision, then no matter what 
precision you index with, this example would let you get ranges including 
the lower bound, but not the upper bound of each range using a 1ms 
fudge ...

facet.date=timestamp
facet.date.start=NOW/DAY-5DAYS-1MILLI
facet.date.end=NOW/DAY+1DAY-1MILLI
facet.date.gap=+1DAY

Brainstorming a bit...

I think the semantics that might make the most sense is to add a 
multivalued facet.date.include param that supports the following 
options:  all, lower, upper, edge, outer
 - all is shorthand for lower,upper,edge,outer and is the default 
   (for back compat)
 - if lower is specified, then all ranges include their lower bound
 - if upper is specified, then all ranges include their upper bound
 - if edge is specified, then the first and last ranges include 
   their edge bounds (ie: lower for the first one, upper for the last 
   one) even if the corresponding upper/lower option is not 
   specified.
 - the between count is inclusive of each of the start and end 
   bounds iff the first and last range are inclusive of them
 - the before and after ranges are inclusive of their respective 
   bounds if:
- outer is specified ... OR ...
- the first and last ranges don't already include them


so assuming you started with something like...
  facet.date.start=1&facet.date.end=3&facet.date.gap=+1&facet.date.other=all
...your ranges would be...
  [1 TO 2], [2 TO 3] and [* TO 1], [1 TO 3], [3 TO *]

w/ facet.date.include=lower ...
  [1 TO 2}, [2 TO 3} and [* TO 1}, [1 TO 3}, [3 TO *]

w/ facet.date.include=upper ...
  {1 TO 2], {2 TO 3] and [* TO 1], {1 TO 3], {3 TO *]

w/ facet.date.include=lower&facet.date.include=edge ...
  [1 TO 2}, [2 TO 3] and [* TO 1}, [1 TO 3], {3 TO *]

w/ facet.date.include=upper&facet.date.include=edge ...
  [1 TO 2], {2 TO 3] and [* TO 1}, [1 TO 3], {3 TO *]

w/ facet.date.include=upper&facet.date.include=outer ...
  {1 TO 2], {2 TO 3] and [* TO 1], {1 TO 3], [3 TO *]

...etc.
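
To make the bracket rules concrete, here is a sketch of the proposed semantics in plain Java. This is purely illustrative (not Solr code, and the names are mine): integer bounds with a gap of 1, '[' / ']' for inclusive bounds and '{' / '}' for exclusive ones, matching the notation above:

```java
import java.util.*;

// Sketch of the proposed facet.date.include semantics: renders the
// main ranges plus before / between / after for bounds start..end.
class FacetInclude {

    static List<String> ranges(int start, int end, Set<String> include) {
        boolean all   = include.isEmpty() || include.contains("all");
        boolean lower = all || include.contains("lower");
        boolean upper = all || include.contains("upper");
        boolean edge  = all || include.contains("edge");
        boolean outer = all || include.contains("outer");

        List<String> out = new ArrayList<>();
        boolean firstHasStart = false, lastHasEnd = false;
        for (int lo = start; lo < end; lo++) {
            boolean incLo = lower || (edge && lo == start);
            boolean incHi = upper || (edge && lo + 1 == end);
            if (lo == start)   firstHasStart = incLo;
            if (lo + 1 == end) lastHasEnd = incHi;
            out.add(range(incLo, "" + lo, "" + (lo + 1), incHi));
        }
        // "before" includes the start bound if outer is set, or if the
        // first range does not already include it
        out.add(range(true, "*", "" + start, outer || !firstHasStart));
        // "between" includes each end bound iff the first/last range does
        out.add(range(firstHasStart, "" + start, "" + end, lastHasEnd));
        // "after" includes the end bound if outer is set, or if the
        // last range does not already include it
        out.add(range(outer || !lastHasEnd, "" + end, "*", true));
        return out;
    }

    static String range(boolean incLo, String lo, String hi, boolean incHi) {
        return (incLo ? "[" : "{") + lo + " TO " + hi + (incHi ? "]" : "}");
    }
}
```

Running it against the start=1, end=3, gap=+1 example reproduces the range lists shown above.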


what do you think?



-Hoss



[jira] Commented: (SOLR-1728) ResponseWriters should support byte[], ByteBuffer

2010-01-26 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805291#action_12805291
 ] 

Hoss Man commented on SOLR-1728:


Noble: your issue description is a bit terse, so I'm a little confused.

Are you suggesting an API change such that binary write methods are added to 
QueryResponseWriter (making it equivalent to BinaryQueryResponseWriter) ?  

Or are you suggesting that the existing classes which implement 
QueryResponseWriter ( JSONResponseWriter, PHPResponseWriter, 
PythonResponseWriter, XMLResponseWriter,  etc...) should start implementing 
BinaryQueryResponseWriter?

In either case: what's the motivation?

 ResponseWriters should support byte[], ByteBuffer
 -

 Key: SOLR-1728
 URL: https://issues.apache.org/jira/browse/SOLR-1728
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5


 Only BinaryResponseWriter supports byte[] and ByteBuffer. Other writers also 
 should support these

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1553) extended dismax query parser

2010-01-26 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805303#action_12805303
 ] 

Peter Wolanin commented on SOLR-1553:
-

Some commented-out debug code was left in the committed parser?

{code}
protected void addClause(List clauses, int conj, int mods, Query q) {
  //System.out.println("addClause:clauses="+clauses+" conj="+conj+" mods="+mods+" q="+q);
  super.addClause(clauses, conj, mods, q);
}
{code}

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
 Fix For: 1.5

 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch


 An improved user-facing query parser based on dismax

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-01-26 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805313#action_12805313
 ] 

Hoss Man commented on SOLR-1729:


bq. (e.g. they are in a different time-zone, not time-synced etc.).

time-zones should be irrelevant since all calculations are done in UTC ... lack 
of time-sync is a legitimate concern, but the more serious problem is 
distributed requests and network lag.  Even if all of the boxes have 
synchronized clocks, they might not all get queried at the exact same time, and 
multiple requests might be made to a single server for different phases of the 
distributed request that expect to get the same answers.

It should be noted that while adding support to date faceting for this type of 
"when is now?" is certainly _necessary_ to make distributed date faceting work 
sanely, it is not _sufficient_ ... unless filter queries that use date math 
also respect it, the counts returned from date faceting will still potentially 
be nonsensical.

 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java


 This PATCH introduces a new query parameter that tells a (typically, but not 
 necessarily) remote server what time to use as 'NOW' when calculating date 
 facets for a query (and, for the moment, date facets *only*) - overriding the 
 default behaviour of using the local server's current time.
 This gets 'round a problem whereby an explicit time range is specified in a 
 query (e.g. timestamp:[then0 TO then1]), and date facets are required for the 
 given time range (in fact, any explicit time range). 
 Because DateMathParser performs all its calculations from 'NOW', remote 
 callers have to work out how long ago 'then0' and 'then1' are from 'now', and 
 use the relative-to-now values in the facet.date.xxx parameters. If a remote 
 server has a different opinion of NOW compared to the caller, the results 
 will be skewed (e.g. they are in a different time-zone, not time-synced etc.).
 This becomes particularly salient when performing distributed date faceting 
 (see SOLR-1709), where multiple shards may all be running with different 
 times, and the faceting needs to be aligned.
 The new parameter is called 'facet.date.now', and takes as a parameter a 
 (stringified) long that is the number of milliseconds from the epoch (1 Jan 
 1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. 
 This was chosen over a formatted date to delineate it from a 'searchable' 
 time and to avoid superfluous date parsing. This makes the value generally a 
 programmatically-set value, but as that is where the use-case is for this type 
 of parameter, this should be ok.
 NOTE: This parameter affects date facet timing only. If there are other areas 
 of a query that rely on 'NOW', these will not interpret this value. This is a 
 broader issue about setting a 'query-global' NOW that all parts of query 
 analysis can share.
 Source files affected:
 FacetParams.java   (holds the new constant FACET_DATE_NOW)
 SimpleFacets.java  getFacetDateCounts() NOW parameter modified
 This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as 
 it's a general change for date faceting, it was deemed deserving of its own 
 patch. I will be updating SOLR-1709 in due course to include the use of this 
 new parameter, after some RFC acceptance.
 A possible enhancement to this is to detect facet.date fields, look for and 
 match these fields in queries (if they exist), and potentially determine 
 automatically the required time skew, if any. There are a whole host of 
 reasons why this could be problematic to implement, so an explicit 
 facet.date.now parameter is the safest route.
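
 The intended usage can be sketched in a few lines. Only the facet.date.now parameter itself comes from the patch; the helper below is a hypothetical illustration of how a coordinating caller would pin one "now" and reuse it for every shard request:

 ```java
 // Hypothetical helper: append a caller-chosen "now" (epoch milliseconds,
 // i.e. a System.currentTimeMillis() value) so every shard computes the
 // same date-facet ranges regardless of its local clock.
 class SharedNow {
     static String withSharedNow(String baseParams, long nowMillis) {
         return baseParams + "&facet.date.now=" + nowMillis;
     }
 }
 ```

 The caller would capture System.currentTimeMillis() once, then pass the same value to each shard.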

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [PMX:FAKE_SENDER] Re: large OR-boolean query

2010-01-26 Thread Chris Hostetter

: for reasons forthcoming.  The QParser then  just returns a
: ConstantScoreQuery wrapped around a Filter subclass that I wrote which uses
: these Terms.  The Filter subclass does most of the work.

If there is a lot of overlap in the Terms that are used from query to 
query, you might find it more efficient to construct individual 
TermFilters for each term, and utilize the filterCache to reuse them from 
request to request -- then your plugin (it would probably have to be 
a SearchComponent instead of a QParser) would only need to find the union 
of the individual DocSets

: Correct me if I'm wrong, but it seemed important to have my input terms in
: natural order of a TreeSet in order to take advantage of the seek() approach
: to TermDocs (presuming it is sort of like a database cursor?).

(I believe) You are correct .. seek can only move ahead.

: In any event, we're getting rougly 2-3 second query times, with an
: additional 1-2 seconds parsing input from the request.  so our local client
: app sees about a 6-8 second roundtrip on it's queries, with faceting turned
: on. For such a large query: not bad!

Unless the individual terms tend to be extremely unique, or you are 
opening a new searcher extremely frequently, I would suggest you try the 
filterCache and DocSet union based approach:

   DocSet main = new BitDocSet();
   for (Term t : myTerms) {
     main = searcher.getDocSet(new TermQuery(t)).union(main);
   }

-Hoss



[jira] Commented: (SOLR-1728) ResponseWriters should support byte[], ByteBuffer

2010-01-26 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805357#action_12805357
 ] 

Noble Paul commented on SOLR-1728:
--

Everything works now in non-distributed search because the BinaryField takes 
care of writing out the data as strings. In distributed search, when the 
writers have to emit a SolrDocument and it contains byte[], the XML, JSON and 
other response writers would do a toString() on the byte[]. 



 ResponseWriters should support byte[], ByteBuffer
 -

 Key: SOLR-1728
 URL: https://issues.apache.org/jira/browse/SOLR-1728
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5


 Only BinaryResponseWriter supports byte[] and ByteBuffer. Other writers also 
 should support these

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1736) In the slave , If 'mov'ing file does not succeed , copy the file

2010-01-26 Thread Noble Paul (JIRA)
In the slave , If 'mov'ing file does not succeed , copy the file


 Key: SOLR-1736
 URL: https://issues.apache.org/jira/browse/SOLR-1736
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5


A user has reported instances where File#renameTo fails and replication 
fails. If renameTo does not succeed, try doing a manual copy.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1737) Add a FieldStreamDataSource

2010-01-26 Thread Noble Paul (JIRA)
Add a FieldStreamDataSource
---

 Key: SOLR-1737
 URL: https://issues.apache.org/jira/browse/SOLR-1737
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5


TikaEntityProcessor needs a DataSource which returns a Stream instead of a 
Reader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.