Re: frange not working in query

2011-08-11 Thread Amit Sawhney
The default sort is on relevance. I want to give users an option to sort the
results by date (latest on top).
This works fine for queries with few results (up to 100). However, it returns
inaccurate results as soon as the figure reaches the thousands.
I am trying to limit the sorting to the top few results only. I am hoping that
frange will let me define a lower limit on the relevance score and get better
results on the date sort.

Is there any other way to do this?

Hope that's clear.
- Amit

On 10-Aug-2011, at 7:52 PM, simon wrote:

 I meant the frange query, of course
 
 On Wed, Aug 10, 2011 at 10:21 AM, simon mtnes...@gmail.com wrote:
 Could you tell us what you're trying to achieve with the range query ?
 It's not clear.
 
 -Simon
 
 On Wed, Aug 10, 2011 at 5:57 AM, Amit Sawhney sawhney.a...@gmail.com wrote:
 Hi All,
 
 I am trying to sort the results on a unix timestamp using this query.
 
 http://url.com:8983/solr/db/select/?indent=on&version=2.1&q={!frange%20l=0.25}query($qq)&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1
 
 When I run this query, it says 'no field name specified in query and no 
 defaultSearchField defined in schema.xml'
 
 As soon as I remove the frange query and run this, it starts working fine.
 
 http://url.com:8983/solr/db/select/?indent=on&version=2.1&q=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1
 
 Any pointers?
 
 
 Thanks,
 Amit
 



Re: Solr 3.3 crashes after ~18 hours?

2011-08-11 Thread Bernd Fehling

Hi, googling "hotspot server 19.1-b02" shows that you are not alone
with hanging threads and crashes, and not only with Solr.
Maybe try another Java version?

Bernd



Am 10.08.2011 17:00, schrieb alexander sulz:

Okay, with this command it hangs.
Also: I managed to get a Thread Dump (attached).

regards

Am 05.08.2011 15:08, schrieb Yonik Seeley:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulza.s...@digiconcept.net wrote:

Usually you get an XML response when doing commits or optimizes; in this case
I get nothing in return, but the page ( http://[...]/solr/update?optimize=true ) DOESN'T
load forever or anything.
It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser?
Can you try it from the command line? It should give back some sort
of response (or hang waiting for a response).

curl "http://localhost:8983/solr/update?commit=true"

-Yonik
http://www.lucidimagination.com



I use the stuff in the example folder; the only changes I made were enabling
logging and changing the port to 8985.
I'll try getting a thread dump if it happens again!
So far it's looking good after allocating more memory to it.

Am 04.08.2011 16:08, schrieb Yonik Seeley:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulza.s...@digiconcept.net
wrote:

Thank you for the many replies!

Like I said, I couldn't find anything in logs created by solr.
I just had a look at the /var/logs/messages and there wasn't anything
either.

What I mean by crash is that the process is still there and HTTP GET
pings would return 200,
but when I try visiting /solr/admin, I get a blank page! The server
ignores any incoming updates or commits,

ignores means what? The request hangs? If so, could you get a thread
dump?

Do queries work (like /solr/select?q=*:*) ?


thus throwing no errors, no 503s... It's like the server has a blackout
and stares blankly into space.

Are you using a different servlet container than what is shipped with
solr?
If you did start with the solr example server, what jetty
configuration changes have you made?

-Yonik
http://www.lucidimagination.com






--
*
Bernd Fehling                   Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)              Universitätsstr. 25
Tel. +49 521 106-4060   Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de  33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: How to start troubleshooting a content extraction issue

2011-08-11 Thread Jayendra Patil
You can test standalone content extraction with the tika-app jar.

Command to output in text format:
java -jar tika-app-0.8.jar --text file_path

For more options:
java -jar tika-app-0.8.jar --help

Use the tika-app version jar matching the Solr build.
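If your Solr has the ExtractingRequestHandler (Solr Cell) configured, you can
also push the file through Solr itself with extractOnly=true, which returns
the extraction result without indexing anything. A sketch; adjust the host,
port, core path, and file path to your setup:

  curl "http://localhost:8983/solr/update/extract?extractOnly=true" \
    -F "myfile=@/path/to/June30.xltm"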

Regards,
Jayendra

On Wed, Aug 10, 2011 at 1:53 PM, Tim AtLee timat...@gmail.com wrote:
 Hello

 So, I'm a newbie to Solr and Tika and whatnot, so please use simple words
 for me :P

 I am running Solr on Tomcat 7 on Windows Server 2008 r2, running as the
 search engine for a Drupal web site.

 Up until recently, everything has been fine - searching works, faceting
 works, etc.

 Recently a user uploaded a 5 MB xltm file, which seems to be causing Tomcat
 to spike in CPU usage, and eventually error out.  When the documents are
 submitted to be indexed, the Tomcat process spikes up to use 100% of 1
 available CPU, with the eventual error in Drupal of "Exception occurred
 sending sites/default/files/nodefiles/533/June 30, 2011.xltm to Solr 0
 Status: Communication Error."

 I am looking for some help in figuring out where to troubleshoot this.  I
 assume it's this file, but I guess I'd like to be sure - so how can I submit
 this file for content extraction manually to see what happens?

 Thanks,

 Tim



Need help indexing/querying a particular type of hierarchy

2011-08-11 Thread Michael B. Klein
Hi all,

I have a particular data structure I'm trying to index into a solr document
so that I can query and facet it in a particular way, and I can't quite
figure out the best way to go about it.

One sample object is here: https://gist.github.com/1139065

The part that's tripping me up is the workflows. Each workflow has a name
(in this case, digitizationWF and accessionWF). Each workflow is made up of
a number of processes, each of which has its own current status. Every time
the status of a process within a workflow changes, the object is reindexed.

What I'd like to be able to do is present several hierarchies of facets: In
one, the workflow name is the top-level facet, with the second level showing
each process, under which is listed each status (completed, waiting, or
error) and the number of documents with that status for that process (some
values omitted for brevity):

accessionWF (583)
  publish (583)
completed (574)
waiting (6)
error (3)
  shelve (583)
completed (583)

etc.

I'd also like to be able to invert that presentation:

accessionWF (583)
  completed (583)
publish (574)
shelve (583)
  waiting (6)
publish (6)
  error (3)
publish (3)

or even

completed (583)
  accessionWF (583)
publish (574)
shelve (583)
  digitizationWF (583)
initiate (583)
error (3)
  accessionWF (3)
shelve (3)

etc.

I don't think Solr 4.0's pivot/hierarchical facets are what I'm looking for,
because the status values are ambiguous when not qualified by the process
name -- the object itself has no "completed" status, only a
"publish:completed" and a "shelve:completed" that I want to be able to group
together into a count/list of objects with completed processes. I also
don't think PathHierarchyTokenizerFactory is quite the answer either.

What kind of Solr magic, if any, am I looking for here?

Thanks in advance for any help or advice.
Michael

---
Michael B. Klein
Digitization Workflow Engineer
Stanford University Libraries


Re: strip html from data

2011-08-11 Thread Merlin Morgenstern
I am sorry, but I do not really understand the difference between the indexed
and the returned result set.

I look at the returned dataset via this command:
solr/select/?q=id:533563&terms=true

which gives me html tags like these: </b><br />

I also tried to turn on TermsComponent, but it did not change anything:
solr/select/?q=id:533563&terms=true

The schema browser does not show any html tags inside the text field, just the
indexed words of the one dataset.

Is there a way to strip the html tags completely and not index them? If not,
how do I retrieve the results without html tags?

Thank you for your help.



2011/8/9 Erick Erickson erickerick...@gmail.com

 OK, what does "not working" mean? You never answered Markus' question:

 Are you looking at the returned result set or what you've actually
 indexed?
 Analyzers are not run on the stored data, only on indexed data.

 If "not working" means that your returned results contain the markup, then
 you're confusing indexing and storing. All the analysis chains operate
 on data sent into the indexing process. But the verbatim data is *stored*
 prior to (or separate from) indexing.

 So my assumption is that you see data returned in the document with
 markup, which is just as it should be, and there's no problem at all. And
 your
 actual indexed terms (try looking at the data with TermsComponent, or
 admin/schema browser) will NOT have any markup.

 Perhaps you can back up a bit and describe what's failing .vs. what you
 expect.

 Best
 Erick

 On Mon, Aug 8, 2011 at 6:50 AM, Merlin Morgenstern
 merlin.morgenst...@googlemail.com wrote:
  Unfortunately I still can't get it running. The code I am using is the
  following:

    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>

  I also tried this one:

    <types>
      <fieldType name="text" class="solr.TextField"
        positionIncrementGap="100" autoGeneratePhraseQueries="true">
        <analyzer>
          <charFilter class="solr.HTMLStripCharFilterFactory"/>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.StandardFilterFactory"/>
        </analyzer>
      </fieldType>
    </types>
    <field name="text" type="text" indexed="true" stored="true"
      required="false"/>

  Neither of those worked. I restarted Solr after the schema update and
  reindexed the data. No change, the html tags are still in there.

  Any other ideas? Maybe this is a bug in Solr? I am using Solr 3.3.0 on
  SUSE Linux.

  Thank you for any help on this.
 
 
 
  2011/7/25 Mike Sokolov soko...@ifactory.com
 
  Hmm that looks like it's working fine.  I stand corrected.
 
 
 
  On 07/25/2011 12:24 PM, Markus Jelsma wrote:
 
  I've seen that issue too and read comments on the list yet i've never
 had
  trouble with the order, don't know what's going on. Check this
 analyzer,
  i've
  moved the charFilter to the bottom:
 
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
       generateNumberParts="1" catenateWords="1" catenateNumbers="1"
       catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
       ignoreCase="false" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="false"
       words="stopwords.txt"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
       protected="protwords.txt" language="Dutch"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
   </analyzer>
 
   The analysis chain still does its job as I expect for the input:
   <span>bla bla</span>

   Index Analyzer
   org.apache.solr.analysis.HTMLStripCharFilterFactory
   {luceneMatchVersion=LUCENE_34}
   text: bla bla
   org.apache.solr.analysis.WhitespaceTokenizerFactory
  

Re: LockObtainFailedException

2011-08-11 Thread Peter Sturge
Hi,

When you get this exception with no other error or explanation in
the logs, it is almost always because the JVM has run out of memory.
Have you checked/profiled your memory usage/GC during the stream operation?
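Even the basic JDK tools can confirm this if you don't have a profiler
attached. For example (a sketch; <solr-pid> stands for the Solr JVM's
process id):

  jstat -gcutil <solr-pid> 1000

prints heap-generation occupancy and GC time once per second, so you can see
whether the old generation is pinned near 100% when the lock timeouts start.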



On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta nkgiit...@gmail.com wrote:
 Hi,

 We are doing streaming update to solr for multiple user,

 We are getting


 Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log

 SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
 out: NativeFSLock@/var/lib/solr/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
        at
 org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
        at
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
        at
 org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
        at
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
        at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at
 org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
        at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
        at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
        at org.apache.tomcat.util.net.JIoEndpoint

 Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
 out: NativeFSLock@/var/lib/solr/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
        at
 org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
        at
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
        at
 org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
        at
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
        at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at
 org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
        at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
        at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at
 

Re: strip html from data

2011-08-11 Thread Ahmet Arslan
 Is there a way to strip the html tags completely and not index them?
 If not, how do I retrieve the results without html tags?

How do you push documents to Solr? You need to strip the html tags before the
analysis chain. For example, if you are using the Data Import Handler, you can
use HTMLStripTransformer.

 http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer  
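A minimal DIH sketch (the entity name, SQL, and column name are hypothetical):

  <entity name="doc" transformer="HTMLStripTransformer"
          query="SELECT id, body FROM docs">
    <field column="body" stripHTML="true"/>
  </entity>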


Re: how to change default response fromat as json in solr configuration?

2011-08-11 Thread Erik Hatcher
You can set default="true" in solrconfig.xml on the JSON response writer, like this:

  <queryResponseWriter name="json"
                       default="true"
                       class="solr.JSONResponseWriter"/>

Or you can add <str name="wt">json</str> to any request handler definitions.
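For example, inside a handler's defaults section (a sketch of a stock search
handler; the handler name may differ in your solrconfig.xml):

  <requestHandler name="search" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="wt">json</str>
    </lst>
  </requestHandler>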

Erik

On Aug 11, 2011, at 07:36 , nagarjuna wrote:

 Hi everybody,
 
 Whenever I enter a search term in Solr I get the response in XML format (the
 default). I can change the response by adding wt=json to the URL, but instead
 of that I need to change the default format from XML to JSON. How can I do
 that? Please help me.
 
 
 Thanks in advance...
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-change-default-response-fromat-as-json-in-solr-configuration-tp3245629p3245629.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: LockObtainFailedException

2011-08-11 Thread Naveen Gupta
Yes, this was happening because of the JVM heap size.

But the real issue is that as our index size grows (very large), the indexing
time gets very long (using streaming).

Earlier, indexing 15,000 docs at a time (commit after 15,000 docs) was taking
3 mins 20 secs; after deleting the index data, it takes 9 secs.

What would be the approach to get better indexing performance while also
keeping the index size in check?

The index size was around 4.5 GB

Thanks
Naveen

On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge peter.stu...@gmail.comwrote:

 Hi,

 When you get this exception with no other error or explanation in
 the logs, this is almost always because the JVM has run out of memory.
 Have you checked/profiled your mem usage/GC during the stream operation?



 On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta nkgiit...@gmail.com wrote:
  Hi,
 
  We are doing streaming update to solr for multiple user,
 
  We are getting
 
 
  Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
 
  SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
 timed
  out: NativeFSLock@/var/lib/solr/data/index/write.lock
 at org.apache.lucene.store.Lock.obtain(Lock.java:84)
 at
 org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
 at
  org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
 at
 
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
 at
 
 org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
 at
 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
 at
 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
 at
  org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
 at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
 at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at
 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
 at
 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
 at org.apache.tomcat.util.net.JIoEndpoint
 
  Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
  SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
 timed
  out: NativeFSLock@/var/lib/solr/data/index/write.lock
 at org.apache.lucene.store.Lock.obtain(Lock.java:84)
 at
 org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
 at
  org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
 at
 
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
 at
 
 org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
 at
 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
 at
 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
 at
  org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
 at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
 at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
 
 

RE: Building a facet query in SolrJ

2011-08-11 Thread Simon, Richard T
Thanks! I actually found a page online that explained this.

-Rich

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, August 10, 2011 4:01 PM
To: solr-user@lucene.apache.org
Cc: Simon, Richard T
Subject: RE: Building a facet query in SolrJ


: query.addFacetQuery(MyField + ":" + "\"" + uri + "\"");
...
: But when I examine queryResponse.getFacetFields, it's an empty list, if 

facet.query constraints+counts do not come back in the facet.field
section of the response.  They come back in the facet.query section of
the response (look at the XML in your browser and you'll see what I
mean)...

https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/response/QueryResponse.html#getFacetQuery%28%29
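In SolrJ that section is exposed via QueryResponse.getFacetQuery(), keyed by
the exact facet.query string. A sketch (the field name, uri value, and the
initialized SolrServer are hypothetical):

  SolrQuery query = new SolrQuery("*:*");
  String facetQuery = "MyField:\"" + uri + "\"";
  query.addFacetQuery(facetQuery);
  QueryResponse rsp = server.query(query);
  // facet.query counts come back here, not in getFacetFields()
  Integer count = rsp.getFacetQuery().get(facetQuery);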


-Hoss


RE: Hudson build issues

2011-08-11 Thread Steven A Rowe
Hi arian487,

You apparently are not using the official Ant build?  (Maven is officially 
unsupported.)

The scripts used by the Lucene and Solr Jenkins builds at the ASF are available 
here:

http://svn.apache.org/repos/asf/lucene/dev/nightly/

The ASF Jenkins jobs checkout the above directory in addition to the 
Lucene/Solr branch/trunk to be tested, and then invoke the appropriate script 
from the above directory.

There are Maven build scripts there - the artifact you're looking for is 
installed in the local repository by calling the equivalent of:

mvn -N -Pbootstrap install

When the Maven jobs run under ASF Jenkins, the results are published nightly.  
More details here: 

http://wiki.apache.org/solr/NightlyBuilds

Steve

 -Original Message-
 From: arian487 [mailto:akarb...@tagged.com]
 Sent: Wednesday, August 10, 2011 9:54 PM
 To: solr-user@lucene.apache.org
 Subject: Hudson build issues
 
 Whenever I try to build this on our hudson server it says it can't find
 org.apache.lucene:lucene-xercesImpl:jar:4.0-SNAPSHOT.  Is the Apache repo
 lacking this artifact?
 
 --
 View this message in context: http://lucene.472066.n3.nabble.com/Hudson-
 build-issues-tp3244563p3244563.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: LockObtainFailedException

2011-08-11 Thread Peter Sturge
Optimizing indexing time is a very different question.
I'm guessing the 3+ minutes you refer to is the commit time.

There are a whole host of things to take into account regarding
indexing, like: number of segments, schema, how many fields, storing
fields, omitting norms, caching, autowarming, search activity etc. -
the list goes on...
The trouble is, you can look at 100 different Solr installations with
slow indexing, and find 200 different reasons why each is slow.

The best place to start is to get a full understanding of precisely
how your data is being stored in the index, starting with adding docs,
going through your schema, Lucene segments, solrconfig.xml etc,
looking at caches, commit triggers etc. - really getting to know how
each step is affecting performance.
Once you really have a handle on all the indexing steps, you'll be
able to spot the bottlenecks that relate to your particular
environment.

An index of 4.5GB isn't that big (but the number of documents tends to
have more of an effect than the physical size), so the bottleneck(s)
should be findable once you trace through the indexing operations.
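On the solrconfig.xml side, the knobs most often involved in slow adds and
commits are the RAM buffer and merge settings; a sketch of the Solr 3.x
defaults section (the values are illustrative, not recommendations):

  <indexDefaults>
    <ramBufferSizeMB>128</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
  </indexDefaults>

A larger ramBufferSizeMB means fewer segment flushes per batch; a higher
mergeFactor defers merge cost from indexing time to search time.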



On Thu, Aug 11, 2011 at 1:02 PM, Naveen Gupta nkgiit...@gmail.com wrote:
 Yes, this was happening because of the JVM heap size.

 But the real issue is that as our index size grows (very large), the indexing
 time gets very long (using streaming).

 Earlier, indexing 15,000 docs at a time (commit after 15,000 docs) was taking
 3 mins 20 secs; after deleting the index data, it takes 9 secs.

 What would be the approach to get better indexing performance while also
 keeping the index size in check?

 The index size was around 4.5 GB

 Thanks
 Naveen

 On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge peter.stu...@gmail.comwrote:

 Hi,

 When you get this exception with no other error or explanation in
 the logs, this is almost always because the JVM has run out of memory.
 Have you checked/profiled your mem usage/GC during the stream operation?



 On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta nkgiit...@gmail.com wrote:
  Hi,
 
  We are doing streaming update to solr for multiple user,
 
  We are getting
 
 
  Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
 
  SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
 timed
  out: NativeFSLock@/var/lib/solr/data/index/write.lock
         at org.apache.lucene.store.Lock.obtain(Lock.java:84)
         at
 org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
         at
  org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
         at
 
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
         at
 
 org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
         at
 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
         at
 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
         at
  org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
         at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
         at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
         at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
         at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
         at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
         at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
         at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
         at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
         at
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
         at
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
         at
 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
         at
 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
         at
 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
         at
 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
         at
 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
         at org.apache.tomcat.util.net.JIoEndpoint
 
  Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
  SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
 timed
  out: NativeFSLock@/var/lib/solr/data/index/write.lock
         at org.apache.lucene.store.Lock.obtain(Lock.java:84)
         at
 

Re: frange not working in query

2011-08-11 Thread Yonik Seeley
On Wed, Aug 10, 2011 at 5:57 AM, Amit Sawhney sawhney.a...@gmail.com wrote:
 Hi All,

 I am trying to sort the results on a unix timestamp using this query.

 http://url.com:8983/solr/db/select/?indent=on&version=2.1&q={!frange%20l=0.25}query($qq)&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1

 When I run this query, it says 'no field name specified in query and no 
 defaultSearchField defined in schema.xml'


The default query type for embedded queries is "lucene", so your
qq=nokia is equivalent to qq={!lucene}nokia.

So one way is to explicitly make it dismax:
   qq={!dismax}nokia
Another way is to declare the sub-query to be of type dismax:
  q={!frange l=0.25}query({!dismax v=$qq})&qq=nokia
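Putting it together with the original request, the full URL would look
roughly like this (a sketch reusing the hypothetical host and field names
from the question):

  http://url.com:8983/solr/db/select/?q={!frange%20l=0.25}query({!dismax%20v=$qq})&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&fl=*,score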

-Yonik
http://www.lucidimagination.com


 As soon as I remove the frange query and run this, it starts working fine.

 http://url.com:8983/solr/db/select/?indent=on&version=2.1&q=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1

 Any pointers?


 Thanks,
 Amit


Re: Need help indexing/querying a particular type of hierarchy

2011-08-11 Thread Dmitry Kan
Hi,

Can you keep your hierarchy flat in Solr and then use filter queries
(fq=wf:accessionWF) inside your facet queries (facet.field=status)?

Or is the requirement to have one single facet query producing the
hierarchical facet counts?
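That is, one request per workflow, along the lines of (a sketch; the wf and
status field names assume a flattened schema):

  /select?q=*:*&fq=wf:accessionWF&facet=true&facet.field=status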

On Thu, Aug 11, 2011 at 10:43 AM, Michael B. Klein mbkl...@gmail.comwrote:

 Hi all,

 I have a particular data structure I'm trying to index into a solr document
 so that I can query and facet it in a particular way, and I can't quite
 figure out the best way to go about it.

 One sample object is here: https://gist.github.com/1139065

 The part that's tripping me up is the workflows. Each workflow has a name
 (in this case, digitizationWF and accessionWF). Each workflow is made up of
 a number of processes, each of which has its own current status. Every time
 the status of a process within a workflow changes, the object is reindexed.

 What I'd like to be able to do is present several hierarchies of facets: In
 one, the workflow name is the top-level facet, with the second level
 showing
 each process, under which is listed each status (completed, waiting, or
 error) and the number of documents with that status for that process (some
 values omitted for brevity):

 accessionWF (583)
  publish (583)
completed (574)
waiting (6)
error (3)
  shelve (583)
completed (583)

 etc.

 I'd also like to be able to invert that presentation:

 accessionWF (583)
  completed (583)
publish (574)
shelve (583)
  waiting (6)
publish (6)
  error (3)
publish (3)

 or even

 completed (583)
  accessionWF (583)
publish (574)
shelve (583)
  digitizationWF (583)
initiate (583)
 error (3)
  accessionWF (3)
shelve (3)

 etc.

 I don't think Solr 4.0's pivot/hierarchical facets are what I'm looking
 for,
 because the status values are ambiguous when not qualified by the process
  name -- the object itself has no "completed" status, only a
  "publish:completed" and a "shelve:completed" that I want to be able to
 group
 together into a count/list of objects with completed processes. I also
 don't think PathHierarchyTokenizerFactory is quite the answer either.

 What kind of Solr magic, if any, am I looking for here?

 Thanks in advance for any help or advice.
 Michael

 ---
 Michael B. Klein
 Digitization Workflow Engineer
 Stanford University Libraries




-- 
Regards,

Dmitry Kan


Re: Solr 3.3 crashes after ~18 hours?

2011-08-11 Thread Stephen Duncan Jr
I know it seems like my problem may not be the same as the original
poster, but in investigating this, I did find this Jetty issue that
may be related: http://jira.codehaus.org/browse/JETTY-1377

Stephen Duncan Jr
www.stephenduncanjr.com



On Thu, Aug 4, 2011 at 1:54 PM, Stephen Duncan Jr
stephen.dun...@gmail.com wrote:
 On Thu, Aug 4, 2011 at 10:08 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:

 ignores means what?  The request hangs?  If so, could you get a thread 
 dump?

 Do queries work (like /solr/select?q=*:*) ?

 thus throwing no errors, no 503s... It's like the server has a blackout and
 stares blankly into space.

 Are you using a different servlet container than what is shipped with solr?
 If you did start with the solr example server, what jetty
 configuration changes have you made?

 -Yonik
 http://www.lucidimagination.com


 We're seeing something similar here.  Not sure exactly what the
 circumstances are, but occasionally our Solr 3.3 test instance is
 hanging, nothing seems to be happening for several minutes.  It does
 seem to be happening while data is being added and continuous queries
 are being sent.  It also may be related to an optimize happening (we
 attempt to optimize after adding all the new data from our database).
 The last log message is:

 2011-08-04 13:46:56,418 [qtp30604342-451] INFO
 org.apache.solr.core.SolrCore - [report] webapp= path=/update
 params={optimize=truewaitSearcher=truemaxSegments=1waitFlush=truewt=javabinversion=2}
 status=0 QTime=109109

 Here is our thread dump:


 2011-08-04 13:47:16
 Full thread dump Java HotSpot(TM) Client VM (20.1-b02 mixed mode):

 RMI TCP Connection(13)-172.16.10.102 daemon prio=6 tid=0x47a4a400
 nid=0x1384 runnable [0x4861f000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        - locked 0x183a55a0 (a java.io.BufferedInputStream)
        at java.io.FilterInputStream.read(FilterInputStream.java:66)
        at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:517)
        at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
        at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
        - 0x183a7c68 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

 qtp30604342-451 prio=6 tid=0x475c4800 nid=0x1a58 waiting on
 condition [0x4897f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  0x18214c08 (a
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at 
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
        at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
        at 
 org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:320)
        at 
 org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:512)
        at 
 org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:38)
        at 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:558)
        at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
        - None

 qtp30604342-450 prio=6 tid=0x47ad1c00 nid=0x1ca4 waiting on
 condition [0x49d2f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  0x18214c08 (a
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at 
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
        at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
        at 
 org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:320)
        at 
 org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:512)
        at 
 org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:38)
        at 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:558)
        at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
        - None

 qtp30604342-449 prio=6 tid=0x47a57c00 nid=0xb2c waiting on condition
 [0x49c2f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - 

NRT in Master- Slave setup, crazy?

2011-08-11 Thread eks dev
Thinking aloud, and grateful for any time you can spare ...

I need to support a high commit rate (low update latency) in a master-slave
setup, and I have a bad feeling about it, even after disabling warmup and
stripping out everything that slows down a refresh.

I will try it anyway, but I have started thinking about a backup plan, like
NRT on the slaves.

An idea is to have the master working on disk, doing commits in a
throughput-friendly manner (e.g. every 5-10 minutes), but let the slaves
apply the same updates with softCommit.

I am basically going to let the slaves possibly run out of sync with the
master, by issuing the same updates on all slaves with softCommit ...
syncing with the master every now and then.

Could this work? The trick is, the index is big (fits in ca. 16-20 GB of
RAM), but the update rate is small and unevenly distributed in time (every
couple of seconds a few documents); one hard commit on master + slave
update would probably cost much more than add(document) with softCommit on
every slave (2-5 of them).

So all in all, the master remains the real master and is there to ensure:
 a) seeding if a slave restarts
 b) an authoritative master index, if the slaves run out of sync (a small
diff is ok if it gets corrected once a day)

In general, do you find such an idea wrong for some reason? Should I be
doing something else/better to achieve low update latency in a master-slave
setup (for low update throughput)?

Is there anything I can do to improve standard master-slave latency apart
from disabling warmup? Would loading an OS ramdisk (tmpfs, forced in RAM)
on the slaves bring much?

I am talking about a ca. 1 second (plus/minus) update latency target from
update to search on a slave, but not more than 0.5-2 updates per second.
And from what I have understood so far about how Solr works, this is going
to be possible only with NRT on the slaves (analysis in my case is fast, so
not an issue)...
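(For reference: on trunk/4.0 a soft commit can be requested per update
request, so the per-slave update path in this scheme stays a plain HTTP
POST. A sketch, assuming the stock example port:

  curl "http://localhost:8983/solr/update?softCommit=true" \
    -H "Content-Type: text/xml" --data-binary @docs.xml)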


SolR : Spellchecking Autocomplete

2011-08-11 Thread vsham
Hello,

I posted on the Lucene forums, and someone told me to e-mail it here.

Instead of writing my question out again, I take the liberty of linking to my
post. It's about Solr, autocompletion, spellchecking and case-sensitivity (?).

http://lucene.472066.n3.nabble.com/SolR-Spellchecking-amp-Autocomplete-td3243107.html

Thanks for all,

Valentin


Re: Solr 3.3: DIH configuration for Oracle

2011-08-11 Thread Shawn Heisey

On 8/10/2011 2:52 PM, Eugeny Balakhonov wrote:

java.lang.IllegalArgumentException: deltaQuery has no column to resolve to
declared primary key pk='T1_ID_RECORD, T2_ID_RECORD'

I have analyzed the source code of DIH. I found that in the DocBuilder class
the collectDelta() method treats the value of the entity attribute pk as a
simple string. But in my case it is an array with two values: T1_ID_RECORD,
T2_ID_RECORD


Whatever you declare as the DIH primary key must exist as a field name 
in the result set, or Solr will complain.  I had a perfectly working 
config in 1.4.1, with identical text in query and deltaImportQuery.  It 
didn't work when I tried to upgrade to 3.1.  The problem was that I was 
using a deltaQuery that just returned MAX(did), to tell Solr that 
something needed to be done.  I had to add "AS did" to the deltaQuery so 
that it matched my primary key.  I am controlling the delta-import from 
outside Solr, so I do not need to use the result set from deltaQuery.


The point is to pick something that will exist in all of your result 
sets.  You might need to include an "AS xxx" (with something you choose 
for xxx) in your queries and use the xxx value as your pk.  Because you 
have only provided a simple example, I can't really tell you what you 
should use.
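For instance, something along these lines (the table and column names are
hypothetical; note the &gt; escape, since the query sits in an XML
attribute):

  deltaQuery="SELECT MAX(id_record) AS did FROM t1
              WHERE last_modified &gt; '${dataimporter.last_index_time}'"

paired with pk="did" on the entity, so the name in the result set and the
declared pk line up.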


The pk value is only used to coordinate your queries.  It only has 
meaning in the DIH, not the Solr index.  Uniqueness in the Solr index is 
controlled by the uniqueKey value in schema.xml.  In my case, pk and 
uniqueKey are not the same field.


Side note: I'm not much of an expert, so I can't guarantee I can help 
further.  I will give it a try, though.


Thanks,
Shawn



copyfields in schema.xml

2011-08-11 Thread Rode González
Hi all.

If in schema.xml we put something like:

  <field name="title" type="string" indexed="false" stored="false"
    multiValued="true"/>
  <field name="titulo" type="string" indexed="true" stored="true"
    multiValued="true"/>
  <field name="text" type="text_general" indexed="true" stored="false"
    multiValued="true"/>

  <copyField source="title" dest="titulo"/>
  <copyField source="titulo" dest="text"/>

can I expect the 'text' field to contain both the 'title' and the 'titulo'
contents?

thanks ;)

Note: in our app, the titles refer to books that can be named in several
different ways.

 

---

Rode González

 




RE: copyfields in schema.xml

2011-08-11 Thread Michael Ryan
Nope. The 'text' field will just have the 'titulo' contents. To have both, you 
would have to do something like this:

<copyField source="title" dest="titulo"/>
<copyField source="title" dest="text"/>
<copyField source="titulo" dest="text"/>

-Michael


RE: Hudson build issues

2011-08-11 Thread arian487
I downloaded the official build (4.0) and I've been customizing it for my
needs.  I'm not really sure how to use these scripts.  Is there somewhere in
Hudson where I can apply these scripts or something?  

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hudson-build-issues-tp3244563p3246645.html
Sent from the Solr - User mailing list archive at Nabble.com.


need some guidance about how to configure a specific solr solution.

2011-08-11 Thread Roman, Pablo
Hi There,

I work in IT on a project based on Liferay 6.0.5, with Solr 3.2 as the
indexer/search engine.

I presently have only one server that does both indexing and searching, but
the Liferay support suggestions point to the need for:
- 2 to n Solr read-servers, for searching from any member of the Liferay cluster
- 1 Solr write-server, where all Liferay cluster members write.

However, going down to the detail of implementing that on the Liferay side, I
think I know how to do my part, which is editing solr-spring.xml in the Solr
plugin's WEB-INF/classes/META-INF folder. Open this file in a text editor and
you will see that there are two entries which define where the Solr server
can be found by Liferay:

<bean id="indexSearcher"
      class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
  <property name="serverURL" value="http://localhost:8080/solr/select" />
</bean>
<bean id="indexWriter"
      class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
  <property name="serverURL" value="http://localhost:8080/solr/update" />
</bean>

However, I don't know how to replicate the write server's content to the
readers. Can you please provide advice about that?
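(For context, the usual answer is Solr's built-in HTTP replication: the write
server is configured as a replication master and each read server polls it as
a slave. A sketch, with a hypothetical master host name:

  <!-- solrconfig.xml on the write server -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
  </requestHandler>

  <!-- solrconfig.xml on each read server -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://writehost:8080/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>)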

Thanks,
Pablo



Searching For Term 'OR'

2011-08-11 Thread John Brewer
Hello,

  I am looking for some advice on how to index and search a field that contains
a two-character state name, without the query parser dying on OR and also not
treating it as the 'OR' Boolean operator.

  For example:

  The following query with a filter query key/value pair causes an exception:

  q=*:*&fq=(state:OR)

Caused by: org.apache.lucene.queryParser.ParseException: Encountered " <OR> "OR "" at line 1, column 7.
Was expecting one of:
    "(" ...
    "*" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
  Note: we had the same issue with Indiana (IN), but removing that stop word 
fixed it. Removing the stopword 'or' has not helped.

  The field itself is indexed and stored as a string field during indexing.
   <field name="state" type="string" indexed="true" stored="true"/>


Thanks in advance,
John Brewer



Re: Searching For Term 'OR'

2011-08-11 Thread Tomás Fernández Löbbe
I guess this is because the Lucene QP is interpreting the 'OR' operator.
You can either:
 use lowercase, or
 use another query parser, like the term query parser. See
http://lucene.apache.org/solr/api/org/apache/solr/search/TermQParserPlugin.html

Also, if you just removed the "or" term from the stopwords, you'll probably
have to reindex if you want it in the index.

Regards,

Tomás

On Thu, Aug 11, 2011 at 2:38 PM, John Brewer
john.bre...@atozdatabases.comwrote:

 Hello,

  I am looking for some advice on how to index and search a field that
 contains a two character state name without the query parser dying on the OR
 and also not treating it as an 'OR' Boolean operator.

  For example:

  The following query with a filter query key/value pair causes an
 exception:

   q=*:*&fq=(state:OR)

  Caused by: org.apache.lucene.queryParser.ParseException: Encountered " <OR> "OR "" at line 1, column 7.
  Was expecting one of:
     "(" ...
     "*" ...
     <QUOTED> ...
     <TERM> ...
     <PREFIXTERM> ...
     <WILDTERM> ...
     "[" ...
     "{" ...
     <NUMBER> ...

  Note: we had the same issue with Indiana (IN), but removing that stop word
 fixed it. Removing the stopword 'or', has not helped.

  The field itself is indexed and stored as string field during indexing.
    <field name="state" type="string" indexed="true" stored="true"/>


 Thanks in advance,
 John Brewer




Re: Searching For Term 'OR'

2011-08-11 Thread John Brewer
Thanks for the feedback. I'll give these a try.

Tomás Fernández Löbbe tomasflo...@gmail.com wrote:

I guess this is because the Lucene QP is interpreting the 'OR' operator.
You can either:
 use lowercase, or
 use another query parser, like the term query parser. See
http://lucene.apache.org/solr/api/org/apache/solr/search/TermQParserPlugin.html

Also, if you just removed the "or" term from the stopwords, you'll probably
have to reindex if you want it in the index.

Regards,

Tomás

On Thu, Aug 11, 2011 at 2:38 PM, John Brewer
john.bre...@atozdatabases.comwrote:

 Hello,

  I am looking for some advice on how to index and search a field that
 contains a two character state name without the query parser dying on the OR
 and also not treating it as an 'OR' Boolean operator.

  For example:

  The following query with a filter query key/value pair causes an
 exception:

   q=*:*&fq=(state:OR)

  Caused by: org.apache.lucene.queryParser.ParseException: Encountered " <OR> "OR "" at line 1, column 7.
  Was expecting one of:
     "(" ...
     "*" ...
     <QUOTED> ...
     <TERM> ...
     <PREFIXTERM> ...
     <WILDTERM> ...
     "[" ...
     "{" ...
     <NUMBER> ...

  Note: we had the same issue with Indiana (IN), but removing that stop word
 fixed it. Removing the stopword 'or', has not helped.

  The field itself is indexed and stored as string field during indexing.
    <field name="state" type="string" indexed="true" stored="true"/>


 Thanks in advance,
 John Brewer




RE: Searching For Term 'OR'

2011-08-11 Thread Rode González
hi, 

use the LowerCaseFilterFactory filter (it doesn't work with the string type;
you must create a new fieldtype of text type),

or use escaped forms:

\OR
\AND

I tried it a moment ago and it works.

saludos
---
Rode González


 -Mensaje original-
 De: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
 Enviado el: jueves, 11 de agosto de 2011 19:58
 Para: solr-user@lucene.apache.org
 Asunto: Re: Searching For Term 'OR'
 
  I guess this is because the Lucene QP is interpreting the 'OR' operator.
  You can either:
   use lowercase, or
   use another query parser, like the term query parser. See
  http://lucene.apache.org/solr/api/org/apache/solr/search/TermQParserPlugin.html
  
  Also, if you just removed the "or" term from the stopwords, you'll
  probably
  have to reindex if you want it in the index.
 
 Regards,
 
 Tomás
 
 On Thu, Aug 11, 2011 at 2:38 PM, John Brewer
 john.bre...@atozdatabases.comwrote:
 
  Hello,
 
   I am looking for some advice on how to index and search a field that
  contains a two character state name without the query parser dying on
 the OR
  and also not treating it as an 'OR' Boolean operator.
 
   For example:
 
   The following query with a filter query key/value pair causes an
  exception:
 
    q=*:*&fq=(state:OR)
 
   Caused by: org.apache.lucene.queryParser.ParseException: Encountered " <OR> "OR "" at line 1, column 7.
   Was expecting one of:
      "(" ...
      "*" ...
      <QUOTED> ...
      <TERM> ...
      <PREFIXTERM> ...
      <WILDTERM> ...
      "[" ...
      "{" ...
      <NUMBER> ...
 
   Note: we had the same issue with Indiana (IN), but removing that
 stop word
  fixed it. Removing the stopword 'or', has not helped.
 
    The field itself is indexed and stored as a string field during indexing.
     <field name="state" type="string" indexed="true" stored="true"/>
 
 
  Thanks in advance,
  John Brewer
 
 
 




Re: Searching For Term 'OR'

2011-08-11 Thread Chris Hostetter

:   I am looking for some advice on how to index and search a field that 
: contains a two character state name without the query parser dying on 
: the OR and also not treating it as an 'OR' Boolean operator.

fq={!term f=state}OR

...for this kind of filter you don't want a query parser that has any 
metacharacters.
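So the full request from the example becomes simply:

  q=*:*&fq={!term f=state}OR

and the literal two-letter value is matched against the string field with no
query-parser metacharacters in play.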


-Hoss


RE: Searching For Term 'OR'

2011-08-11 Thread John Brewer
Thanks for the advice everyone. I am rebuilding the index with a lowercase 
field instead of string.

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, August 11, 2011 1:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Searching For Term 'OR'


:   I am looking for some advice on how to index and search a field that 
: contains a two character state name without the query parser dying on
: the OR and also not treating it as an 'OR' Boolean operator.

fq={!term f=state}OR

...for this kind of filter you don't want a query parser that has any 
metacharacters.


-Hoss


Re: strip html from data

2011-08-11 Thread Alexei Martchenko
You can use <charFilter class="solr.HTMLStripCharFilterFactory"/> as in the
example here. Check the docs for your specific Solr version, because the
htmlstrip syntax changed between 1.4 and 3.x:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
  </analyzer>
</fieldType>

2011/8/11 Merlin Morgenstern merlin.morgenst...@googlemail.com

  I am sorry, but I do not really understand the difference between the
  indexed and the returned result set.

  I look at the returned dataset via this command:
  solr/select/?q=id:533563&terms=true

  which gives me html tags like these: </b><br />

  I also tried to turn on TermsComponent, but it did not change anything:
  solr/select/?q=id:533563&terms=true

  The schema browser does not show any html tags inside the text field, just
  the indexed words of the one dataset.

  Is there a way to strip the html tags completely and not index them? If not,
  how do I retrieve the results without html tags?

  Thank you for your help.



 2011/8/9 Erick Erickson erickerick...@gmail.com

  OK, what does "not working" mean? You never answered Markus' question:
 
  Are you looking at the returned result set or what you've actually
  indexed?
  Analyzers are not run on the stored data, only on indexed data.
 
  If "not working" means that your returned results contain the markup,
 then
  you're confusing indexing and storing. All the analysis chains operate
  on data sent into the indexing process. But the verbatim data is *stored*
  prior to (or separate from) indexing.
 
  So my assumption is that you see data returned in the document with
  markup, which is just as it should be, and there's no problem at all. And
  your
  actual indexed terms (try looking at the data with TermsComponent, or
  admin/schema browser) will NOT have any markup.
 
  Perhaps you can back up a bit and describe what's failing .vs. what you
  expect.
 
  Best
  Erick
 
  On Mon, Aug 8, 2011 at 6:50 AM, Merlin Morgenstern
  merlin.morgenst...@googlemail.com wrote:
   Unfortunately I still can't get it running. The code I am using is the
   following:

     <analyzer type="index">
       <charFilter class="solr.HTMLStripCharFilterFactory"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.KeywordMarkerFilterFactory"/>
       <filter class="solr.PorterStemFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <charFilter class="solr.HTMLStripCharFilterFactory"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="0"
         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.KeywordMarkerFilterFactory"/>
       <filter class="solr.PorterStemFilterFactory"/>
     </analyzer>

   I also tried this one:

     <types>
       <fieldType name="text" class="solr.TextField"
         positionIncrementGap="100" autoGeneratePhraseQueries="true">
         <analyzer>
           <charFilter class="solr.HTMLStripCharFilterFactory"/>
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.StandardFilterFactory"/>
         </analyzer>
       </fieldType>
     </types>
     <field name="text" type="text" indexed="true" stored="true"
       required="false"/>

   Neither of those worked. I restarted Solr after the schema update and
   reindexed the data. No change, the html tags are still in there.

   Any other ideas? Maybe this is a bug in Solr? I am using Solr 3.3.0 on
   SUSE Linux.

   Thank you for any help on this.
  
  
  
   2011/7/25 Mike Sokolov soko...@ifactory.com
  
   Hmm that looks like it's working fine.  I stand corrected.
  
  
  
   On 07/25/2011 12:24 PM, Markus Jelsma wrote:
  
    I've seen that issue too and read comments on the list, yet I've never had
    trouble with the order; I don't know what's going on. Check this analyzer,
    I've moved the charFilter to the bottom:
  
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="1" catenateNumbers="1"
              catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="false" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false"
              words="stopwords.txt"/>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
    </analyzer>

Re: Need help indexing/querying a particular type of hierarchy

2011-08-11 Thread Michael B. Klein
I've been experimenting with that, but that fq wouldn't limit my facet
counts adequately. Since the document has both an accessionWF and a
digitizationWF, the fq would match (and count) the document no matter what
the status of each process is.

I suppose I could do something like this:

<field name="status_wps">accessionWF:start-accession:completed</field>
<field name="status_wps">accessionWF:cleanup:waiting</field>
<field name="status_wps">accessionWF:descriptive-metadata:completed</field>
<field name="status_wps">accessionWF:content-metadata:completed</field>
<field name="status_wps">accessionWF:rights-metadata:completed</field>
<field name="status_wps">accessionWF:publish:completed</field>
<field name="status_wps">accessionWF:shelve:error</field>
<field name="status_wsp">accessionWF:completed:start-accession</field>
<field name="status_wsp">accessionWF:waiting:cleanup</field>
<field name="status_wsp">accessionWF:completed:descriptive-metadata</field>
<field name="status_wsp">accessionWF:completed:content-metadata</field>
<field name="status_wsp">accessionWF:completed:rights-metadata</field>
<field name="status_wsp">accessionWF:completed:publish</field>
<field name="status_wsp">accessionWF:error:shelve</field>
<field name="status_swp">completed:accessionWF:start-accession</field>
<field name="status_swp">waiting:accessionWF:cleanup</field>
<field name="status_swp">completed:accessionWF:descriptive-metadata</field>
<field name="status_swp">completed:accessionWF:content-metadata</field>
<field name="status_swp">completed:accessionWF:rights-metadata</field>
<field name="status_swp">completed:accessionWF:publish</field>
<field name="status_swp">error:accessionWF:shelve</field>

and use a PathHierarchyTokenizerFactory with ":" as the delimiter. Then I
could use facet.field=status_wps&f.status_wps.facet.prefix=accessionWF: to
get the counts for all the accessionWF processes and statuses, then repeat
using status_wsp and status_swp for the various inversions. I was hoping for
something easier. :)
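
For the record, a minimal sketch of the fieldType that approach would need; PathHierarchyTokenizerFactory does take a delimiter attribute, but the type and field wiring here is made up:

<fieldType name="workflow_path" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- splits accessionWF:publish:completed into the tokens
         accessionWF, accessionWF:publish, accessionWF:publish:completed -->
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter=":"/>
  </analyzer>
</fieldType>

<field name="status_wps" type="workflow_path" indexed="true" stored="false"
       multiValued="true"/>

Those prefix tokens are what make the facet.prefix trick above work.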

On Thu, Aug 11, 2011 at 6:40 AM, Dmitry Kan dmitry@gmail.com wrote:

 Hi,

 Can you keep your hierarchy flat in Solr and then use filter queries
 (fq=wf:accessionWF) inside your facet queries (facet.field=status)?

 Or is the requirement to have one single facet query producing the
 hierarchical facet counts?
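
 For the flat variant, a hedged sketch with placeholder host and field names,
 assuming one field holds the workflow name and another the bare status:

 http://localhost:8983/solr/select?q=*:*&fq=wf:accessionWF&facet=true&facet.field=status

 The hierarchy would then be reassembled client-side, one filtered request
 per workflow.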

 On Thu, Aug 11, 2011 at 10:43 AM, Michael B. Klein mbkl...@gmail.com
 wrote:

  Hi all,
 
  I have a particular data structure I'm trying to index into a Solr document
  so that I can query and facet it in a particular way, and I can't quite
  figure out the best way to go about it.
 
  One sample object is here: https://gist.github.com/1139065
 
  The part that's tripping me up is the workflows. Each workflow has a name
  (in this case, digitizationWF and accessionWF). Each workflow is made up of
  a number of processes, each of which has its own current status. Every time
  the status of a process within a workflow changes, the object is reindexed.
 
  What I'd like to be able to do is present several hierarchies of facets: In
  one, the workflow name is the top-level facet, with the second level showing
  each process, under which is listed each status (completed, waiting, or
  error) and the number of documents with that status for that process (some
  values omitted for brevity):
 
  accessionWF (583)
    publish (583)
      completed (574)
      waiting (6)
      error (3)
    shelve (583)
      completed (583)
 
  etc.
 
  I'd also like to be able to invert that presentation:
 
  accessionWF (583)
    completed (583)
      publish (574)
      shelve (583)
    waiting (6)
      publish (6)
    error (3)
      publish (3)
 
  or even
 
  completed (583)
    accessionWF (583)
      publish (574)
      shelve (583)
    digitizationWF (583)
      initiate (583)
  error (3)
    accessionWF (3)
      shelve (3)
 
  etc.
 
  I don't think Solr 4.0's pivot/hierarchical facets are what I'm looking for,
  because the status values are ambiguous when not qualified by the process
  name -- the object itself has no "completed" status, only a
  "publish:completed" and a "shelve:completed" that I want to be able to group
  together into a count/list of objects with completed processes. I also
  don't think PathHierarchyTokenizerFactory is quite the answer either.
 
  What kind of Solr magic, if any, am I looking for here?
 
  Thanks in advance for any help or advice.
  Michael
 
  ---
  Michael B. Klein
  Digitization Workflow Engineer
  Stanford University Libraries
 



 --
 Regards,

 Dmitry Kan



Re: how to ignore case in solr search field?

2011-08-11 Thread Alexei Martchenko
Here's an example. Since I only query this field for spelling, I can
lowercase both at index and query time.

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="10"
           stored="false" multiValued="true">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
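
If all that's needed is case-insensitive matching, without the spellcheck extras, a stripped-down sketch like this should be enough; the type name text_ci is made up:

<fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- lowercasing the tokens in this single analyzer applies at both
         index and query time, which makes matching case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Since the same analyzer runs at index and query time, abc, ABC, aBc, and AbC all normalize to abc and match each other.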

2011/8/10 nagarjuna nagarjuna.avul...@gmail.com

 Hi, please help me:
 how do I ignore case while searching in Solr?


 e.g. I need the same results for the keywords abc, ABC, aBc, AbC, and all
 other casings.




 Thank you in advance





-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-11 Thread Alexei Martchenko
Are you boosting your docs?

2011/8/8 Jason Toy jason...@gmail.com

 I am trying to test out and compare different sorts and scoring.

 When I use dismax to search for indie music with:
 qf=all_lists_text&q=indie+music&defType=dismax&rows=100
 I see some stuff that seems irrelevant, meaning in the top results I see only
 1 or 2 mentions of "indie music", but when I look further down the list I do
 see other docs that have more occurrences of "indie music".
 So I want to test by comparing the different queries against a list of docs
 ranked specifically by the count of occurrences of the phrase "indie music".

 On Mon, Aug 8, 2011 at 2:19 PM, Markus Jelsma markus.jel...@openindex.io
 wrote:

 
   Dismax queries can. But
  
   sort=termfreq(all_lists_text,'indie+music')
  
   is not using dismax. Apparently the termfreq function can not? I am not
   familiar with the termfreq function.
 
  It simply returns the TF of the given _term_ as it is indexed for the
  current document.
 
  Sorting on TF like this seems strange as by default queries are already
  sorted that way, since TF plays a big role in the final score.
 
  
   To understand why you'd need to reindex, you might want to read up on how
   Lucene actually works, to get a basic understanding of how different
   indexing choices affect what is possible at query time. Lucene In Action
   is a pretty good book.
  
   On 8/8/2011 5:02 PM, Jason Toy wrote:
    Are not Dismax queries able to search for phrases using the default
    index (which is what I am using)? If I can already do phrase searches,
    I don't understand why I would need to reindex to be able to access
    phrases from a function.
   
    On Mon, Aug 8, 2011 at 1:49 PM, Markus Jelsma
    markus.jel...@openindex.io wrote:
    Alexei, thank you, that does seem to work.
   
    My sort results seem to be totally wrong though; I'm not sure if it's
    because of my sort function or something else.
   
    My query consists of:
    sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100
    And I get back 4571232 hits.
   
    That's normal, you issue a catch-all query. Sorting should work, but...
   
    All the results don't have the phrase "indie music" anywhere in their
    data.
   
  Does termfreq not support phrases?
   
    No, it is TERM frequency, and "indie music" is not one term. I don't know
    how this function parses your input, but it might not understand your +
    escape and think it's one term consisting of exactly that.
   
If not, how can I sort specifically by termfreq of a phrase?
   
    You cannot. What you can do is index multiple terms as one term using
    the shingle filter (see the sketch at the end of this thread). Take care:
    it can significantly increase your index size and number of unique terms.
   
    On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko
    ale...@superdownloads.com.br wrote:
    You can use the standard query parser and pass q=*:*
   
    2011/8/8 Jason Toy jason...@gmail.com
   
    I am trying to list some data based on a function I run, specifically
    termfreq(post_text,'indie music'), and I am unable to do it without
    passing in data to the q parameter. Is it possible to get a sorted list
    without searching for any terms?
   
--
   
*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533
 



 --
 - sent from my mobile
 6176064373




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533
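
A sketch of the shingle approach Markus describes above, assuming two-word phrases are enough; the type name is made up and the analysis chain is deliberately minimal:

<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emits word pairs as single terms, e.g. "indie music",
         alongside the original single words -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true"/>
  </analyzer>
</fieldType>

After reindexing into such a field, indie music becomes a single indexed term, so a sort like termfreq(your_shingled_field,'indie music') has a real term to count (the field name here is a placeholder).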


Re: unique terms and multi-valued fields

2011-08-11 Thread Kevin Osborn
That makes sense. They are actually stored fields. I was mostly just trying
to figure out how much my index size might grow. These fields I am dealing
with are large and repetitive (but mixed).



From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org; Kevin Osborn osbo...@yahoo.com
Sent: Wednesday, August 10, 2011 7:08 AM
Subject: Re: unique terms and multi-valued fields

Well, it depends (tm).

If you're talking about *indexed* terms, then the value is stored only
once in both the cases you mentioned below. There's really very little
difference between a non-multi-valued field and a multi-valued field
in terms of how it's stored in the searchable portion of the index,
except for some position information.

So, having an XML doc with a single-valued field

<field name="category">computers laptops</field>

is almost identical (except for position info added by positionIncrementGap) to

<field name="category">computers</field>
<field name="category">laptops</field>

multiValued refers to the *input*, not whether more than one word is
allowed in that field.


Now, about *stored* fields. If you store the data, verbatim copies are kept
in the storage-specific files in each segment, and the values will be on disk
for each document.

But you probably don't care much, because this data is only referenced when
you assemble a document for return to the client; it's irrelevant for
searching.

Best
Erick
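
One practical consequence of that position information, sketched with made-up values: the gap keeps phrase queries from silently matching across two values of a multiValued field.

<!-- with positionIncrementGap="100" on the field type, the two values
     below are indexed 100 positions apart, so the phrase query
     "laptops tablets" will NOT match this document -->
<field name="category">computers laptops</field>
<field name="category">tablets phones</field>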

On Tue, Aug 9, 2011 at 8:02 PM, Kevin Osborn osbo...@yahoo.com wrote:
 Please verify my understanding. I have a field called "category" and it has
 a value "computers". If I use this same field and value for all of my
 documents, it is really only stored on disk once because category:computers
 is a unique term. Is this correct?

 But what about multi-valued fields? So, I have a field called "category".
 For 100 documents, it has the values "computers" and "laptops". For 100
 other documents, it has the values "computers" and "tablets". Is this stored
 as category:computers, category:laptops, category:tablets, meaning 3 unique
 terms? Or is it stored as category:computers,laptops and
 category:computers,tablets? I believe it is the first case (hopefully), but
 I am not sure.

 Thanks.

Re: Unbuffered entity enclosing request can not be repeated Invalid chunk header

2011-08-11 Thread Markus Jelsma
Hi,

We see these errors too once in a while, but there is no real answer on the
mailing list here, except one user suspecting Tomcat is responsible
(connection time-outs).

Another user proposed to limit the number of documents per batch, but that, of
course, increases the number of connections made. We do only 250 docs/batch to
limit RAM usage on the client and start to see these errors very occasionally.
There may be a coincidence... or not.

Anyway, it's really hard to reproduce, if not impossible. It happens when
connecting directly as well as when connecting through a proxy.

What you can do is simply retry the batch, and it usually works out fine. At
least you don't lose a batch in the process. We retry all failures at least a
couple of times before giving up an indexing job (a sketch follows at the end
of this thread).

Cheers,

 Hello folks,
 
 I use Solr 1.4.1, and every 2 to 6 hours I get indexing errors in my log
 files.
 
 on the client side:
 2011-08-04 12:01:18,966 ERROR [Worker-242] IndexServiceImpl - Indexing
 failed with SolrServerException.
 Details: org.apache.commons.httpclient.ProtocolException: Unbuffered entity
 enclosing request can not be repeated.:
 Stacktrace:
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHtt
 pSolrServer.java:469) .
 .
 on the server side:
 INFO: [] webapp=/solr path=/update params={wt=javabinversion=1} status=0
 QTime=3
 04.08.2011 12:01:18 org.apache.solr.update.processor.LogUpdateProcessor
 finish
 INFO: {} 0 0
 04.08.2011 12:01:18 org.apache.solr.common.SolrException log
 SCHWERWIEGEND: org.apache.solr.common.SolrException: java.io.IOException:
 Invalid chunk header
 .
 .
 .
 I'm indexing ONE document per call, 15-20 documents per second, 24/7.
 What may be the problem?
 
 best regards
 vadim
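
A minimal sketch of the retry-the-batch idea in SolrJ; the server, batch, and retry count are placeholders, and real code should probably distinguish transient I/O failures from permanent ones:

import java.io.IOException;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public class RetryingIndexer {

    /** Resend the whole batch on failure, backing off between attempts. */
    public static void addWithRetry(SolrServer server,
                                    Collection<SolrInputDocument> batch,
                                    int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                server.add(batch);
                return;                       // batch accepted, done
            } catch (SolrServerException e) { // e.g. unbuffered entity errors
                backoffOrRethrow(attempt, maxAttempts, e);
            } catch (IOException e) {         // e.g. invalid chunk header
                backoffOrRethrow(attempt, maxAttempts, e);
            }
        }
    }

    private static void backoffOrRethrow(int attempt, int maxAttempts,
                                         Exception e) throws Exception {
        if (attempt >= maxAttempts) {
            throw e;                          // give up, surface the error
        }
        Thread.sleep(1000L * attempt);        // simple linear backoff
    }
}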


Why is boost not always listed in explain when debug is on?

2011-08-11 Thread Jonathan Acheson
Using Solr Specification Version: 4.0.0.2011.08.09.11.02.13.

While trying to understand scoring, I noticed that boost is intermittently
displayed in the explain. For example, using edismax and the query string
q=Starbucks&qf=name.search name^2, my first result has the boost explicitly
listed in the explain as "2.0 = boost". When I change the boost to 20,
however, I no longer see the boost listed. Should the boost be displayed in
both cases? Any help understanding this behavior would be greatly
appreciated. Thanks!

Boost of 2 

f278968e-b2c6-4bbd-8e69-85ab938fa554: 
8.609146 = (MATCH) max of:
  8.609146 = (MATCH) weight(name:starbucks^2.0 in 163) [DefaultSimilarity],
result of:
8.609146 = score(doc=163,freq=1.0 = termFreq=1
), product of:
  0.99999994 = queryWeight, product of:
2.0 = boost
8.609147 = idf(docFreq=8644, maxDocs=17433139)
0.05807776 = queryNorm
  8.609147 = fieldWeight in 163, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1
8.609147 = idf(docFreq=8644, maxDocs=17433139)
1.0 = fieldNorm(doc=163)
  4.278918 = (MATCH) weight(name.search:starbuck in 163)
[DefaultSimilarity], result of:
4.278918 = score(doc=163,freq=1.0 = termFreq=1
), product of:
  0.49850774 = queryWeight, product of:
8.583453 = idf(docFreq=8869, maxDocs=17433139)
0.05807776 = queryNorm
  8.583453 = fieldWeight in 163, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1
8.583453 = idf(docFreq=8869, maxDocs=17433139)
1.0 = fieldNorm(doc=163)


Boost of 20 

f278968e-b2c6-4bbd-8e69-85ab938fa554: 
8.609147 = (MATCH) max of:
  8.609147 = (MATCH) weight(name:starbucks^20.0 in 163) [DefaultSimilarity],
result of:
8.609147 = fieldWeight in 163, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1
  8.609147 = idf(docFreq=8644, maxDocs=17433139)
  1.0 = fieldNorm(doc=163)
  0.42789182 = (MATCH) weight(name.search:starbuck in 163)
[DefaultSimilarity], result of:
0.42789182 = score(doc=163,freq=1.0 = termFreq=1
), product of:
  0.049850777 = queryWeight, product of:
8.583453 = idf(docFreq=8869, maxDocs=17433139)
0.0058077765 = queryNorm
  8.583453 = fieldWeight in 163, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1
8.583453 = idf(docFreq=8869, maxDocs=17433139)
1.0 = fieldNorm(doc=163)
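
One thing the two dumps do show, as a worked check: in the first, queryWeight = 2.0 x 8.609147 x 0.05807776, which is approximately 1.0; in the second, queryNorm has shrunk tenfold (0.0058077765), so 20.0 x 8.609147 x 0.0058077765 is approximately 1.0 as well. When the queryWeight factor works out to 1.0, Lucene's explain collapses the weight to the bare fieldWeight, which is presumably why the boost line disappears from the second dump.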




Re: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.ICUTokenizerFactory'

2011-08-11 Thread Chris Hostetter

: I copied the file apache-solr-analysis-extras-3.3.0.jar into solr's lib
: folder. Now the error is different -
...
:   I also added the following files to my apache-solr-3.3.0\example\lib
:  folder:

Deja-Vu...

http://www.lucidimagination.com/search/document/5967b87c6fa56fd1/error_loading_a_custom_request_handler_in_solr_4_0

And another blast from the past (all the details still accurate)...

http://www.lucidimagination.com/search/document/ef9f4bd49f8b3576/fw_customanalyzer_class_not_loaded_error

-Hoss


Re: Dates off by 1 day?

2011-08-11 Thread Chris Hostetter

: In Solr the date is stored as Zulu time zone and Solrj is returning date in
: CDT timezone (jvm is picking system time zone.)

Strictly speaking, Solrj is not returning the date in CDT timezone ... 
Date objects in java are absolute moments in time, that know nothing about 
timezones.

Where the system time zone of your client comes into play is when you do 
an implicit conversion to a String because of the + operator...

:  System.out.println("-->" + resultDoc.getFieldValue(FILE_DATE));

http://download.oracle.com/javase/6/docs/api/java/util/Date.html#toString%28%29


-Hoss
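
So, to render the stored value in Zulu time rather than the JVM's default zone, format it explicitly. A sketch, assuming FILE_DATE is the field-name constant from the snippet above (its value here is a placeholder):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

import org.apache.solr.common.SolrDocument;

public class UtcDatePrinter {
    static final String FILE_DATE = "file_date"; // placeholder field name

    static String formatUtc(SolrDocument resultDoc) {
        Date d = (Date) resultDoc.getFieldValue(FILE_DATE);
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // not the system default
        return fmt.format(d);                         // e.g. 2011-08-11T00:00:00Z
    }
}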


Timeout trying to index from nutch

2011-08-11 Thread Phil Scadden
I am a new user and I have Solr installed. I can use the admin page and
query the example data.
However, when I was using Nutch to load the index with intranet web pages, I
got this message:

SolrIndexer: starting at 2011-08-12 16:52:44
org.apache.solr.client.solrj.SolrServerException:
java.net.ConnectException: Connection timed out

The timeout happened after about 12 minutes. I can't seem to find this
message in an archive search. Can anyone give me some clues?
