solr admin result page error

2011-02-11 Thread Bernd Fehling
Dear list,

after loading some documents via DIH which also include URLs,
I get this yellow XML error page as the search result from the Solr admin GUI
after a search.
It says XML processing error: not well-formed.
The code it complains about is:

<arr name="dcurls">
<str>http://eprints.soton.ac.uk/43350/</str>
<str>http://dx.doi.org/doi:10.1112/S0024610706023143</str>
<str>Martinez-Perez, Conchita and Nucinkis, Brita E.A. (2006) Cohomological
dimension of Mackey functors for infinite groups. Journal of the
London Mathematical Society, 74, (2), 379-396. (doi:10.1112/S0024610706023143
&lt;http://dx.doi.org/10.1112/S002461070602314\u&gt;)</str></arr>

See the \u utf8-code in the last line.

1. The loaded data is valid and well-formed, checked with xmllint. No errors.
2. There is no \u utf8-code in the source data.
3. The data is loaded via DIH without any errors.
4. If I open the source view of the result page in Firefox there is also no
\u utf8-code.

The only idea I have left is Solr itself or the result page generation.

How to proceed, what else to check?

Regards,
Bernd


Re: Turn off caching

2011-02-11 Thread Li Li
Besides the fieldCache, there is also a cache for termInfo.
I don't know how to turn it off in either Lucene or Solr.

Code in TermInfosReader:
  /** Returns the TermInfo for a Term in the set, or null. */
  TermInfo get(Term term) throws IOException {
return get(term, true);
  }

  /** Returns the TermInfo for a Term in the set, or null. */
  private TermInfo get(Term term, boolean useCache) throws IOException

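For reference, the Solr-level caches are declared in the <query> section of
solrconfig.xml and can be disabled by commenting them out. A minimal sketch,
using the element names from the stock 1.4 example config (this does not touch
the Lucene FieldCache or the TermInfosReader cache mentioned above):

  <query>
    <!--
    <filterCache      class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache"     size="512" initialSize="512" autowarmCount="0"/>
    <documentCache    class="solr.LRUCache"     size="512" initialSize="512" autowarmCount="0"/>
    -->
  </query>

With those commented out, repeated queries still benefit from the OS page cache
and the Lucene-internal caches, so timings will not be fully cold.
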
2011/2/11 Stijn Vanhoorelbeke stijn.vanhoorelb...@gmail.com:
 Hi,

 You can comment out all sections in solrconfig.xml pointing to a cache.
 However, there is a cache deep in Lucene - the fieldCache - that can't be
 commented out. This cache will always jump into the picture.

 If I need to do such things, I restart the whole tomcat6 server to flush ALL
 caches.

 2011/2/11 Li Li fancye...@gmail.com

 Do you mean queryResultCache? You can comment out the related section in
 solrconfig.xml;
 see http://wiki.apache.org/solr/SolrCaching

 2011/2/8 Isan Fulia isan.fu...@germinait.com:
  Hi,
  My solrConfig file looks like
 
  <config>
    <updateHandler class="solr.DirectUpdateHandler2" />

    <requestDispatcher handleSelect="true">
      <requestParsers enableRemoteStreaming="false"
        multipartUploadLimitInKB="2048" />
    </requestDispatcher>

    <requestHandler name="standard" class="solr.StandardRequestHandler"
      default="true" />
    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
    <requestHandler name="/admin/"
      class="org.apache.solr.handler.admin.AdminHandlers" />

    <queryResponseWriter name="xslt"
      class="org.apache.solr.request.XSLTResponseWriter">
    </queryResponseWriter>
    <!-- config for the admin interface -->
    <admin>
      <defaultQuery>*:*</defaultQuery>
    </admin>
  </config>
 
 
  Every time I fire the same query so as to compare the results for different
  configurations, the query result time is getting reduced because of caching.
  So I want to turn off the caching or clear the cache before I fire the same
  query.
  Does anyone know how to do it?
 
 
  --
  Thanks & Regards,
  Isan Fulia.
 




Question about QueryParser and StandardAnalyzer

2011-02-11 Thread Ahsan Iqbal
Hi All

I am writing a custom QueryParserPlugin for Solr to fulfill a
specific requirement. When I build the query object, I need to feed it
with terms; for that I get the analyzer from the request as

Analyzer analyzer =
    req.getSchema().getField("TextField").getType().getAnalyzer();

In the schema, TextField has the type text, which is configured as:

<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"
      luceneMatchVersion="LUCENE_29"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

Now when I get the TokenStream

TokenStream ts = analyzer.tokenStream("TextField", new StringReader("ri*"));

it simply removes wildcards from the term. The analyzer shows the
same behavior even if I escape the wildcard as

TokenStream ts = analyzer.tokenStream("TextField", new StringReader("ri\\*"));

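A quick way to see what is happening is to print the tokens the analyzer
actually emits. The sketch below is illustrative only (Lucene 2.9-era API,
StandardAnalyzer standing in for the schema analyzer, field name "TextField"
as above): StandardTokenizer treats '*' as punctuation and drops it, because
wildcard syntax is interpreted by the query parser, not by the analysis chain,
which is why escaping it makes no difference here.

  import java.io.StringReader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;
  import org.apache.lucene.util.Version;

  public class InspectTokens {
    public static void main(String[] args) throws Exception {
      // StandardAnalyzer stands in for the analyzer fetched from the schema above
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
      TokenStream ts = analyzer.tokenStream("TextField", new StringReader("ri*"));
      TermAttribute term = ts.addAttribute(TermAttribute.class);
      while (ts.incrementToken()) {
        System.out.println(term.term());  // prints "ri" -- the '*' never reaches the filters
      }
      ts.close();
    }
  }

The usual approach for wildcard terms is therefore not to run them through the
analyzer at all (or to analyze only the constant prefix), which is what the
standard query parsers do.
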
Please suggest

Regards
Ahsan Iqbal


Re: solr admin result page error

2011-02-11 Thread Bernd Fehling

Results so far.
I could locate and isolate the document causing trouble.
I've checked the document with xmllint again. It is valid, well-formed utf8.
I've loaded the single document and get the XML error when displaying the
search result.
This happens through the Solr admin search and also the JSON interface,
probably other interfaces as well.
Next step is to use debugger and see what goes wrong.

One thing I can already say is that it is the utf8 sequence F0 9D 94 90
(U+1D510, Mathematical Fraktur Capital M) that causes the problem.
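
U+1D510 lies outside the Basic Multilingual Plane, which is typically where
things go wrong: in Java it is stored as a surrogate pair, so any code that
processes the string char by char can split it. A small illustration (plain
Java, not Solr code):

  String m = new String(Character.toChars(0x1D510));           // U+1D510
  System.out.println(m.length());                              // 2 UTF-16 code units
  System.out.println(m.codePointCount(0, m.length()));         // 1 code point
  System.out.println(Character.isHighSurrogate(m.charAt(0)));  // true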

Are there any known issues about that?

Regards,
Bernd


Am 11.02.2011 08:59, schrieb Bernd Fehling:
 Dear list,
 
 after loading some documents via DIH which also include urls
 I get this yellow XML error page as search result from solr admin GUI
 after a search.
 It says XML processing error not well-formed.
 The code it argues about is:
 
 <arr name="dcurls">
 <str>http://eprints.soton.ac.uk/43350/</str>
 <str>http://dx.doi.org/doi:10.1112/S0024610706023143</str>
 <str>Martinez-Perez, Conchita and Nucinkis, Brita E.A. (2006) Cohomological
 dimension of Mackey functors for infinite groups. Journal of the
 London Mathematical Society, 74, (2), 379-396. (doi:10.1112/S0024610706023143
 &lt;http://dx.doi.org/10.1112/S002461070602314\u&gt;)</str></arr>
 
 See the \u utf8-code in the last line.
 
 1. the loaded data is valid, well-formed and checked with xmllint. No errors.
 2. there is no \u utf8-code in the source data.
 3. the data is loaded via DIH without any errors.
 4. if opening the source-view of the result page with firefox there is also 
 no \u utf8-code.
 
 Only idea I have is solr itself or the result page generation.
 
 How to proceed, what else to check?
 
 Regards,
 Bernd


Re: solr admin result page error

2011-02-11 Thread Markus Jelsma
It looks like you hit the same issue as I did a while ago:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg46510.html


On Friday 11 February 2011 08:59:27 Bernd Fehling wrote:
 Dear list,
 
 after loading some documents via DIH which also include urls
 I get this yellow XML error page as search result from solr admin GUI
 after a search.
 It says XML processing error not well-formed.
 The code it argues about is:
 
 <arr name="dcurls">
 <str>http://eprints.soton.ac.uk/43350/</str>
 <str>http://dx.doi.org/doi:10.1112/S0024610706023143</str>
 <str>Martinez-Perez, Conchita and Nucinkis, Brita E.A. (2006) Cohomological
 dimension of Mackey functors for infinite groups. Journal of the London
 Mathematical Society, 74, (2), 379-396. (doi:10.1112/S0024610706023143
 &lt;http://dx.doi.org/10.1112/S002461070602314\u&gt;)</str></arr>
 
 See the \u utf8-code in the last line.
 
 1. the loaded data is valid, well-formed and checked with xmllint. No
 errors. 2. there is no \u utf8-code in the source data.
 3. the data is loaded via DIH without any errors.
 4. if opening the source-view of the result page with firefox there is also
 no \u utf8-code.
 
 Only idea I have is solr itself or the result page generation.
 
 How to proceed, what else to check?
 
 Regards,
 Bernd

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


RE: Turn off caching

2011-02-11 Thread Jagdish Vasani IN
I don't think there is an option to disable the cache in solrconfig.xml in Solr
1.4. You need to modify/change the code at the time of creating the
SolrIndexSearcher instance in class SolrCore.

Thanks,
Jagdish

-Original Message-
From: Isan Fulia [mailto:isan.fu...@germinait.com] 
Sent: Tuesday, February 08, 2011 5:02 PM
To: solr-user@lucene.apache.org
Subject: Turn off caching

Hi,
My solrConfig file looks like

<config>
  <updateHandler class="solr.DirectUpdateHandler2" />

  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false"
      multipartUploadLimitInKB="2048" />
  </requestDispatcher>

  <requestHandler name="standard" class="solr.StandardRequestHandler"
    default="true" />
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
  <requestHandler name="/admin/"
    class="org.apache.solr.handler.admin.AdminHandlers" />

  <queryResponseWriter name="xslt"
    class="org.apache.solr.request.XSLTResponseWriter">
  </queryResponseWriter>
  <!-- config for the admin interface -->
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
</config>


Every time I fire the same query so as to compare the results for different
configurations, the query result time is getting reduced because of caching.
So I want to turn off the caching or clear the cache before I fire the same
query.
Does anyone know how to do it?


-- 
Thanks & Regards,
Isan Fulia.


Any contribs available for Range field type?

2011-02-11 Thread kenf_nc

I have a huge need for a new field type. It would be a Poly field, similar to
Point or Payload. It would take 2 data elements and a search would return a
hit if the search term fell within the range of the elements. For example
let's say I have a document representing an Employment record. I may want to
create a field for years_of_service where it would take values 1999,2004. 
Then in a query q=years_of_service:2001 would be a hit,
q=years_of_service:2010 would not. The field would need to take a data type
attribute as a parameter. I may need to do integer ranges, float/double
ranges, date ranges. I don't see the need now, but heck maybe even a string
range. This would be useful for things like Event dates. An event often
occurs between several days (or hours) but the query is something like what
events are happening today. If I did q=event_date:NOW (or similar) it
should hit all documents where event_date has a range that is inclusive of
today. Another example would be product category document. A specific
automobile may have a fixed price, but a category of auto (2010 BMW 3-series
for example) would have a price range. 

I hope you get the point. My question (finally) is, does anyone know of an
existing contribution to the public domain that already does this? I'm more
of a .Net/C# developer than a Java developer. I know my way around Java, but
don't really have the right tools to build/test/etc. So was hoping to borrow
rather than build if I could.

Thanks,
Ken


Solr suggestions

2011-02-11 Thread Thumuluri, Sai
Good Morning, 
I have implemented Solr 1.4.1 in our UAT environment and I get weird
suggestions for any misspellings. For instance, when I search for
"cabinet award winders" as opposed to "cabinet award winners", I get a
suggestion of "cabinet abarc pindeks"
(http://nextgen-uat.sdc.vzwcorp.com/search/apachesolr_search/cabinet%20abarc%20pindeks).
How can I get more meaningful suggestions? Any help
is greatly appreciated.

Thanks,
Sai Thumuluri




Re: Alternative to Solrj

2011-02-11 Thread Erik Hatcher
Sounds like you just described the VelocityResponseWriter.  On trunk (or 3.x I 
believe), try out http://localhost:8983/solr/browse and look at what makes that 
tick.

Erik

On Feb 11, 2011, at 08:40 , McGibbney, Lewis John wrote:

 Hi list,
 
 I have been looking at an alternative UI config displaying retrieved results 
 from Solr after a query has been passed. At this point, I am not interested 
 in Solrj as all I wish to change is the default responseWriter (line 1007 of 
 Solrconfig). I've also noticed a snippet of default CSS code included in 
 /conf/xslt/example.xsl and understand that all response writers are located 
 in $SOLR_HOME/src/java/org/apache/solr/request and that the default is 
 XSLTResponseWriter.java.
 Basically I wish to keep code for the search UI as simple as possible 
 (ideally write a simple JSP and CSS ), however I now find that this 
 configuration is proving slightly more confusing in practice. My thinking is 
 as follows, write own responseWriter, include within it my CSS template then 
 specify the responseWriter in solrconfig along with the java class. Can 
 anyone advise me on this from their own experiences.
 
 Thank you
 
 Lewis
 



Re: solr admin result page error

2011-02-11 Thread Bernd Fehling
Hi Markus,

yes it looks like the same issue. There is also a \u utf8-code in your dump.
So far I have followed it into the XMLResponseWriter.
A few steps earlier, the result in a buffer looks good and the utf8-code is
correct.
Really hard to debug this freaky problem.
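
One thing worth checking in the writer is whether escaping walks the string
char by char; a surrogate pair such as U+1D510 must be handled as one code
point. A sketch of the difference (plain Java, not the actual Solr code):

  import java.io.IOException;
  import java.io.Writer;

  final class CodePointEscaper {
    // Walk the string by code point, not by char, so surrogate pairs
    // are never split in half while escaping/writing.
    static void writeEscaped(String s, Writer out) throws IOException {
      for (int i = 0; i < s.length(); ) {
        int cp = s.codePointAt(i);
        if (cp == '<')      out.write("&lt;");
        else if (cp == '>') out.write("&gt;");
        else if (cp == '&') out.write("&amp;");
        else                out.write(Character.toChars(cp));
        i += Character.charCount(cp);
      }
    }
  }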

Have you looked deeper into this and located the bug?

It is definitely a bug and has nothing to do with Firefox.

Regards,
Bernd


Am 11.02.2011 13:48, schrieb Markus Jelsma:
 It looks like you hit the same issue as i did a while ago:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg46510.html
 
 
 On Friday 11 February 2011 08:59:27 Bernd Fehling wrote:
 Dear list,

 after loading some documents via DIH which also include urls
 I get this yellow XML error page as search result from solr admin GUI
 after a search.
 It says XML processing error not well-formed.
 The code it argues about is:

 <arr name="dcurls">
 <str>http://eprints.soton.ac.uk/43350/</str>
 <str>http://dx.doi.org/doi:10.1112/S0024610706023143</str>
 <str>Martinez-Perez, Conchita and Nucinkis, Brita E.A. (2006) Cohomological
 dimension of Mackey functors for infinite groups. Journal of the London
 Mathematical Society, 74, (2), 379-396. (doi:10.1112/S0024610706023143
 &lt;http://dx.doi.org/10.1112/S002461070602314\u&gt;)</str></arr>

 See the \u utf8-code in the last line.

 1. the loaded data is valid, well-formed and checked with xmllint. No
 errors. 2. there is no \u utf8-code in the source data.
 3. the data is loaded via DIH without any errors.
 4. if opening the source-view of the result page with firefox there is also
 no \u utf8-code.

 Only idea I have is solr itself or the result page generation.

 How to proceed, what else to check?

 Regards,
 Bernd
 

-- 
*
Bernd Fehling                    Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)               Universitätsstr. 25
Tel. +49 521 106-4060            Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de   33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Any contribs available for Range field type?

2011-02-11 Thread Erick Erickson
You can solve this as is by having a start and end field,
rather than a range field. Then add a clause like

+(+start:[* TO target] +end:[target TO *])
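
Concretely, for the years_of_service example (field names hypothetical, "tint"
as in the example schema), something like:

  <field name="service_start" type="tint" indexed="true" stored="true"/>
  <field name="service_end"   type="tint" indexed="true" stored="true"/>

and a query for 2001:

  q=+service_start:[* TO 2001] +service_end:[2001 TO *]

(remember to URL-encode the '+' signs as %2B when sending this over HTTP).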

Best
Erick

On Fri, Feb 11, 2011 at 8:49 AM, kenf_nc ken.fos...@realestate.com wrote:

 I have a huge need for a new field type. It would be a Poly field, similar to
 Point or Payload. It would take 2 data elements and a search would return a
 hit if the search term fell within the range of the elements. For example
 let's say I have a document representing an Employment record. I may want to
 create a field for years_of_service where it would take values 1999,2004.
 Then in a query q=years_of_service:2001 would be a hit,
 q=years_of_service:2010 would not. The field would need to take a data type
 attribute as a parameter. I may need to do integer ranges, float/double
 ranges, date ranges. I don't see the need now, but heck maybe even a string
 range. This would be useful for things like Event dates. An event often
 occurs between several days (or hours) but the query is something like what
 events are happening today. If I did q=event_date:NOW (or similar) it
 should hit all documents where event_date has a range that is inclusive of
 today. Another example would be product category document. A specific
 automobile may have a fixed price, but a category of auto (2010 BMW 3-series
 for example) would have a price range.

 I hope you get the point. My question (finally) is, does anyone know of an
 existing contribution to the public domain that already does this? I'm more
 of a .Net/C# developer than a Java developer. I know my way around Java, but
 don't really have the right tools to build/test/etc. So was hoping to borrow
 rather than build if I could.

 Thanks,
 Ken



Re: Solr suggestions

2011-02-11 Thread Erick Erickson
Well, you have to tell us how you're accessing the info and what's
in your index.

Please include the relevant schema file definitions and the calls you're
making to get spelling suggestions.

Best
Erick

On Fri, Feb 11, 2011 at 8:55 AM, Thumuluri, Sai
sai.thumul...@verizonwireless.com wrote:
 Good Morning,
 I have implemented Solr 1.4.1 in our UAT environment and I get weird
 suggestions for any misspellings. For instance when I search for
 cabinet award winders as opposed to cabinet award winners, I get a
 suggestion of cabinet abarc pindeks
 http://nextgen-uat.sdc.vzwcorp.com/search/apachesolr_search/cabinet%20a
 barc%20pindeks . How can I get more meaningful suggestions? Any help
 is greatly appreciated.

 Thanks,
 Sai Thumuluri





Re: Any contribs available for Range field type?

2011-02-11 Thread kenf_nc

True. And that's my temporary solution. But it's ugly code, even uglier
queries. I may have several such fields in a single query. A PolyField
solution would be so much more elegant and useful. I'm actually shocked more
people don't need/want something like it. 


RE: Solr suggestions

2011-02-11 Thread Thumuluri, Sai
Please let me know if there is any other information that could help. 

My request handler config is
<requestHandler name="edismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>

<!-- Note how you can register the same handler multiple times with
     different names (and different init parameters) -->

<requestHandler name="partitioned" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">body^1.0 title^20.0 ts_vid_9_names^10.0
      ts_vid_10_names^10.0 name^3.0 taxonomy_names^2.0 tags_h1^5.0
      tags_h2_h3^3.0 tags_h4_h5_h6^2.0 tags_inline^1.0</str>
    <str name="pf">body</str>
    <int name="ps">2</int>
    <str name="mm">3</str>
    <str name="q.alt">*:*</str>
    <!-- example highlighter config, enable per-query with hl=true -->
    <str name="hl">true</str>
    <str name="hl.fl">body</str>
    <int name="hl.snippets">3</int>
    <str name="hl.mergeContiguous">true</str>
    <!-- instructs Solr to return the field itself if no query terms are
         found -->
    <str name="f.body.hl.alternateField">body</str>
    <str name="f.body.hl.maxAlternateFieldLength">256</str>
    <!-- JS: I wasn't getting good results here... I'm turning off for now
         because I was getting periods (.) by themselves at the begining of
         snippets and don't feel like deubgging anymore.  Without the regex is
         faster too -->
    <str name="spellcheck">false</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
    <str>elevator</str>
  </arr>
</requestHandler>

My field definitions are

<!-- The document id is derived from a site-spcific key (hash) and
     the node ID like: $document->id = $hash . '/node/' . $node->nid; -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="site" type="string" indexed="false" stored="true" />
<field name="hash" type="string" indexed="true" stored="true" />
<field name="url" type="string" indexed="false" stored="true" />
<field name="title" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true" />
<field name="body" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="comments" type="text" indexed="false" stored="true" />
<field name="type" type="string" indexed="true" stored="true" />
<field name="type_name" type="string" indexed="true" stored="true" />
<field name="path" type="string" indexed="false" stored="true" multiValued="false" />
<field name="path_alias" type="text" indexed="true" stored="true" termVectors="true" />
<field name="uid" type="integer" indexed="false" stored="true" />
<field name="name" type="text" indexed="true" stored="true" termVectors="true" />
<field name="sname" type="string" indexed="true" stored="false" />
<field name="sort_name" type="sortString" indexed="true" stored="false" />
<field name="created" type="date" indexed="true" stored="true" />
<field name="changed" type="date" indexed="true" stored="true" />
<field name="comment_count" type="integer" indexed="true" stored="true" />
<field name="tid" type="integer" indexed="true" stored="true" multiValued="true" />
<field name="vid" type="integer" indexed="true" stored="true" multiValued="true" />
<field name="taxonomy_names" type="text" indexed="true" stored="false" termVectors="true" multiValued="true" omitNorms="true" />
<field name="app" type="string" indexed="true" stored="true" multiValued="true" />
<field name="cat" type="string" indexed="true" stored="true" multiValued="true" />
<field name="area" type="string" indexed="true" stored="true" multiValued="true" />
<field name="region" type="string" indexed="true" stored="true" multiValued="true" />
<field name="permalink" type="string" indexed="true" stored="true" />
<field name="categories" type="string" indexed="true" stored="true" multiValued="true" />
<field name="categoriessrch" type="text_lws" indexed="true" stored="false" multiValued="true" />
<field name="tags" type="string" indexed="true" stored="true" multiValued="true" />
<field name="tagssrch" type="text_lws" indexed="true" stored="false" multiValued="true" />
<field name="author" type="string" indexed="true" stored="true" />
<field name="text" type="text" indexed="true" stored="false" multiValued="true" />
<field name="numcomments" type="integer" indexed="true" stored="true" />
<field name="tags_h1" type="text" indexed="true" stored="false" omitNorms="true" />
<field name="tags_h2_h3" type="text" indexed="true" stored="false" omitNorms="true" />
<field name="tags_h4_h5_h6" type="text" indexed="true" stored="false" omitNorms="true" />
<field name="tags_a" type="text" indexed="true" stored="false" omitNorms="true" />
<field name="ts_vid_9_names" type="text" indexed="true" stored="true" OmitNorms="true" multiValued="true" />
<field name="ts_vid_10_names" type="text" indexed="true" stored="true" OmitNorms="true" multiValued="true" />
<field name="tags_inline" type="text" indexed="true" stored="false" omitNorms="true" />
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false" />
<field name="prefix1" type="prefix_full"

Solr design decisions

2011-02-11 Thread Greg Georges
Hello all,

I have just finished the book Solr 1.4 Enterprise Search Server. I now
understand most of the basics of Solr and also how we can scale the solution. 
Our goal is to have a centralized search service for a multitude of apps.

Our first application which we want to index, is a system in which we must 
index documents through Solr Cell. These documents are associated to certain 
clients (companies). Each client can have a multitude of users, and each user 
can be part of a group of users. We have permissions on each physical document 
in the system, and we want this to also be present in our enterprise search for 
the system.

I read that we can associate roles and ids to solr documents in order to show 
only a subset of search results for a particular user. The question I am asking 
is this. A best practice in Solr is to batch commit changes. The problem in my 
case is that if we change a document's permissions (role) and we batch
commit, there can be a period where the document in the search results is still
associated with the old role. What should I do in this case? Should I just commit
the change right away? What if this action is done many times by many clients, 
will the performance still scale even if I do not batch commit my changes? 
Thanks

Greg


Re: Solr design decisions

2011-02-11 Thread Erick Erickson
Your users will have to accept some latency between changed permissions and
those permissions being reflected in the results. The length of that latency is
determined by two things:
1> the interval between when you send the change to Solr (i.e.
re-index the doc) and issue a commit
AND
2> the time it takes the Solr instance to propagate that change.

Now, for 2>, if you have a master/slave setup, the slave's polling interval must
pass before it pulls the changes down. Then there's the warmup time that
passes between the time the changes are made (master and/or slave) and the
time searches actually use the newly-warmed searcher.

Here's the problem: when a change is committed to an index (we're skipping the
master/slave issue for now), any autowarming takes, say, time T. If you commit
too frequently (at some interval less than T), then the *first* autowarm
process isn't yet done when the *second* starts. And if you keep committing
pathologically quickly then you start a death spiral.

So the batching/not batching is less of a problem than the death
spiral. Batch changes
are more efficient, but that speedup is probably less noticeable than
the propagation delays.

All that said, it's not unreasonable to expect, say, a 5 minute delay
between the changes
and when they're reflected in new searches, so I'd start with some
reasonable number,
monitor the warmup times and reduce the commit interval as appropriate

NOTE: if you have a master/slave setup, and your master isn't used to
search, you can
control this by the polling interval on the slave and commit more
frequently on the
master since it doesn't need to warm searchers.

Finally, there is work being done for NRT (Near Real Time) searching that may
be of interest to you, search for NRT in JIRA if you're interested.

Best
Erick


On Fri, Feb 11, 2011 at 10:22 AM, Greg Georges greg.geor...@biztree.com wrote:
 Hello all,

 I have just finished to book Solr 1.4 Enterprise Search Server. I now 
 understand most of the basics of Solr and also how we can scale the solution. 
 Our goal is to have a centralized search service for a multitude of apps.

 Our first application which we want to index, is a system in which we must 
 index documents through Solr Cell. These documents are associated to certain 
 clients (companies). Each client can have a multitude of users, and each user 
 can be part of a group of users. We have permissions on each physical 
 document in the system, and we want this to also be present in our enterprise 
 search for the system.

 I read that we can associate roles and ids to solr documents in order to show 
 only a subset of search results for a particular user. The question I am 
 asking is this. A best practice in Solr is to batch commit changes. The 
 problem in my case is that if we change a documents permissions (role), and 
 if we batch commit there can be a period where the document in the search 
 results can be associated to the old role. What should I do in this case? 
 Should I just commit the change right away? What if this action is done many 
 times by many clients, will the performance still scale even if I do not 
 batch commit my changes? Thanks

 Greg



Title index to wiki

2011-02-11 Thread Dennis Gearon
I think it would be an improvement to the wikis if the link to the title
index were at the top of the index page of the wikis :-) I looked on that index
page & did not see that link on that page.

Who's got write access to the wiki pages?

Sent from Yahoo! Mail on Android


Re: Title index to wiki

2011-02-11 Thread Markus Jelsma
What do you mean? There are two links to the FrontPage on each page.

On Friday 11 February 2011 16:56:41 Dennis Gearon wrote:
 I think it would be an improvement to the wikis if the link to the title
 index were at the top of the index page of the wikis :-) I looked on that
 index page & did not see that link on that page.

 Who's got write access to the wiki pages?
 Sent from Yahoo! Mail on Android

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr design decisions

2011-02-11 Thread Bill Bell
You could commit on a time schedule. Like every 5 mins. If there is nothing to 
commit it doesn't do anything anyway.

Bill Bell
Sent from mobile


On Feb 11, 2011, at 8:22 AM, Greg Georges greg.geor...@biztree.com wrote:

 Hello all,
 
 I have just finished to book Solr 1.4 Enterprise Search Server. I now 
 understand most of the basics of Solr and also how we can scale the solution. 
 Our goal is to have a centralized search service for a multitude of apps.
 
 Our first application which we want to index, is a system in which we must 
 index documents through Solr Cell. These documents are associated to certain 
 clients (companies). Each client can have a multitude of users, and each user 
 can be part of a group of users. We have permissions on each physical 
 document in the system, and we want this to also be present in our enterprise 
 search for the system.
 
 I read that we can associate roles and ids to solr documents in order to show 
 only a subset of search results for a particular user. The question I am 
 asking is this. A best practice in Solr is to batch commit changes. The 
 problem in my case is that if we change a documents permissions (role), and 
 if we batch commit there can be a period where the document in the search 
 results can be associated to the old role. What should I do in this case? 
 Should I just commit the change right away? What if this action is done many 
 times by many clients, will the performance still scale even if I do not 
 batch commit my changes? Thanks
 
 Greg


Difference between Solr and Lucidworks distribution

2011-02-11 Thread Greg Georges
Hello all,

I just started watching the webinars from Lucidworks, and they mention their
distribution which has an installer, etc. Are there any other differences? Is
it a good idea to use this free distribution?

Greg


Re: Difference between Solr and Lucidworks distribution

2011-02-11 Thread Markus Jelsma
It is not free for production environments.
http://www.lucidimagination.com/lwe/subscriptions-and-pricing

On Friday 11 February 2011 17:31:22 Greg Georges wrote:
 Hello all,
 
 I just started watching the webinars from Lucidworks, and they mention
 their distribution which has an installer, etc.. Is there any other
 differences? Is it a good idea to use this free distribution?
 
 Greg

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


RE: Alternative to Solrj

2011-02-11 Thread McGibbney, Lewis John
Hi Erik,

This sounds much more like it. I have had a look at the wiki and it sounds like 
a logical approach to UI customisation. Thank you for this
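
For reference, response writers -- the stock XSLT one or a custom class -- are
registered in solrconfig.xml and selected per request with the wt parameter
(the custom class name below is hypothetical):

  <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter"/>
  <queryResponseWriter name="myhtml" class="com.example.MyHtmlResponseWriter"/>

A request such as /select?q=*:*&wt=xslt&tr=example.xsl renders the results
through conf/xslt/example.xsl, which may already be enough for a simple
HTML+CSS view without writing any Java.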

From: Erik Hatcher [erik.hatc...@gmail.com]
Sent: 11 February 2011 14:12
To: solr-user@lucene.apache.org
Subject: Re: Alternative to Solrj

Sounds like you just described the VelocityResponseWriter.  On trunk (or 3.x I 
believe), try out http://localhost:8983/solr/browse and look at what makes that 
tick.

Erik

On Feb 11, 2011, at 08:40 , McGibbney, Lewis John wrote:

 Hi list,

 I have been looking at an alternative UI config displaying retrieved results 
 from Solr after a query has been passed. At this point, I am not interested 
 in Solrj as all I wish to change is the default responseWriter (line 1007 of 
 Solrconfig). I've also noticed a snippet of default CSS code included in 
 /conf/xslt/example.xsl and understand that all response writers are located 
 in $SOLR_HOME/src/java/org/apache/solr/request and that the default is 
 XSLTResponseWriter.java.
 Basically I wish to keep code for the search UI as simple as possible 
 (ideally write a simple JSP and CSS ), however I now find that this 
 configuration is proving slightly more confusing in practice. My thinking is 
 as follows, write own responseWriter, include within it my CSS template then 
 specify the responseWriter in solrconfig along with the java class. Can 
 anyone advise me on this from their own experiences.

 Thank you

 Lewis




Re: Solr design decisions

2011-02-11 Thread Yonik Seeley
On Fri, Feb 11, 2011 at 10:47 AM, Bill Bell billnb...@gmail.com wrote:
 You could commit on a time schedule. Like every 5 mins. If there is nothing 
 to commit it doesn't do anything anyway.

It does do something!  A new searcher is opened and caches are invalidated, etc.
I'd recommend normally using commitWithin instead of explicitly
committing or using autocommit.
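
For example, with the XML update format the deadline rides along on the add
command itself (field names here are just illustrative):

  <add commitWithin="300000">
    <doc>
      <field name="id">doc-123</field>
      <field name="role">editor</field>
    </doc>
  </add>

Solr will then commit within that many milliseconds, batching whatever arrives
in between, so clients never need to issue explicit commits.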

-Yonik
http://lucidimagination.com


Fatal error when posting to Solr

2011-02-11 Thread McGibbney, Lewis John
Hi list,

I was attempting to check out the VelocityResponseWriter before I progress with
customising it for my own usage, but I seem to have opened a can of worms when
posting documents to Solr. Using the simple post command I get the following output.

lewis@lewis-01:~/Downloads/apache-solr-1.4.1/example/exampledocs$ java -jar 
post.jar *.pdf
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, 
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file 
technical_handbook_2010_domestic_section_0_general.pdf
SimplePostTool: FATAL: Solr returned an error: 
Unexpected_character__code_37_in_prolog_expected___at_rowcol_unknownsource_11

In some projects (e.g. Nutch) I am aware that the distribution does not come
with all jars and these need to be downloaded separately, but I know this
is not the case with Solr. I have also successfully committed a host of
PDFs to Solr recently, so I know that this is working fine. Checking my Solr
logs, nothing seems to be out of place!

Has anyone seen anything similar?

Thanks Lewis





Re: Faceting Query

2011-02-11 Thread Gora Mohanty
On Thu, Feb 10, 2011 at 12:00 PM, Isha Garg isha.g...@orkash.com wrote:
 Hi,
      What is the significance of copy field  when used in faceting . plz
 explain with example.

Not sure what you mean here. Could you provide details?
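
If the question is about the common pattern of faceting on an untokenized copy
of an analyzed field, a sketch (field names hypothetical) would be:

  <field name="category"       type="text"   indexed="true" stored="true"/>
  <field name="category_facet" type="string" indexed="true" stored="false"/>
  <copyField source="category" dest="category_facet"/>

and then facet on the copy, e.g. q=*:*&facet=true&facet.field=category_facet,
so the facet values are whole strings rather than individual tokens.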

Regards,
Gora


Re: Faceting Query

2011-02-11 Thread Gora Mohanty
On Thu, Feb 10, 2011 at 12:21 PM, Isha Garg isha.g...@orkash.com wrote:
 What is facet.pivot field? PLz explain with example

Does http://wiki.apache.org/solr/SimpleFacetParameters#facet.pivot not help?

Regards,
Gora


Re: Fatal error when posting to Solr

2011-02-11 Thread Erik Hatcher
That's an incorrect way to POST PDF files (though maybe the latest work on 
post.jar makes it possible, but would require additional parameters).

In order to index PDF files, you'll need to script an iteration over all files 
and POST them in (or stream them however is most reasonable for your 
environment) using the techniques described here: 
http://wiki.apache.org/solr/ExtractingRequestHandler
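
Assuming the extracting handler is mapped at /update/extract as in the example
solrconfig.xml (and the Solr Cell contrib jars are in lib), a single file can
be sent along these lines; the literal.id value is arbitrary:

  curl "http://localhost:8983/solr/update/extract?literal.id=handbook-1&commit=true" \
       -F "myfile=@technical_handbook_2010_domestic_section_0_general.pdf"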

Erik

On Feb 11, 2011, at 12:26 , McGibbney, Lewis John wrote:

 Hi list,
 
 Was attempting to check out the VelocityResponseWriter before I progress with 
 customising it for my own usage, I seem to have opened a can of worms when 
 posting documents to Solr. Using simple post command I get the following 
 output.
 
 lewis@lewis-01:~/Downloads/apache-solr-1.4.1/example/exampledocs$ java -jar 
 post.jar *.pdf
 SimplePostTool: version 1.2
 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, 
 other encodings are not currently supported
 SimplePostTool: POSTing files to http://localhost:8983/solr/update..
 SimplePostTool: POSTing file 
 technical_handbook_2010_domestic_section_0_general.pdf
 SimplePostTool: FATAL: Solr returned an error: 
 Unexpected_character__code_37_in_prolog_expected___at_rowcol_unknownsource_11
 
 In some projects (E.g. Nutch) I am aware that the distribution does not come 
 with alll jar's and these are required to be downloaded separately, I know 
 this is not the case with Solr though. I have also successfully committed a 
 host of .pdf to Solr recently so I know that this is working fine. Checking 
 my Solr logs nothing seems to be out of place!
 
 Has anyone seen anything similar?
 
 Thanks Lewis
 
 
 



Re: Monitor the QTime.

2011-02-11 Thread Gora Mohanty
On Fri, Feb 11, 2011 at 3:40 AM, Stijn Vanhoorelbeke
stijn.vanhoorelb...@gmail.com wrote:
 Hi,

 Is it possible to monitor the QTime of the queries.
 I know I could enable logging - but then all of my requests are logged,
 making big, nasty logs.

 I just want to log the QTime periodically, lets say once every minute.
 Is this possible using Solr or can this be set up in tomcat anyway?

QTime is, of course, specific to the query, but it is returned in the
response XML, so one could run occasional queries to figure it out.
Please see http://wiki.apache.org/solr/SearchHandler

Regards,
Gora


solr1.4 replication question

2011-02-11 Thread Mike Franon
Hi,

I am fairly new to solr, and have setup two servers, one with master,
other as a slave.

I have a load balancer in front with two different VIPs, one to do
gets/reads distributed evenly across the master and slave, and another VIP
to do posts/updates just to the master.  If the master fails I have
the second VIP automatically update the slave.  But if that happens,
is there a way to automatically switch which is master and which is
slave instead of going into solrconfig.xml and then restarting the
instances?

Any recommendations for the best way to set it up?
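
For reference, the basic 1.4 wiring is the ReplicationHandler in
solrconfig.xml, with the master section on one box and the slave section
(pointing at the master) on the other -- the host name below is hypothetical:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

The SolrReplication wiki also describes keeping both sections in one config and
switching them via system properties, which can avoid editing solrconfig.xml
when promoting a slave -- worth checking whether your release supports that.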

Thanks


help with dismax query

2011-02-11 Thread Tanner Postert
I'm having a problem using the dismax query for the term "obsessed with
winning":

http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&defType=dismax&qf=title^10+description^4+text^1&debugQuery=true

that query yields zero results, but removing the dismax stuff it works
fine:

http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&debugQuery=true

or even adding the mm=2 yields results, but mm=3 does not.

There is at least 1 record which contains the exact phrase 'obsessed with
winning' in the title as well as the description multiple times, yet when
the defType=dismax option is added, the query yields no results.  Am I
missing something?

Thanks in advance.


Re: help with dismax query

2011-02-11 Thread Erik Hatcher
Might "with" be a stop word removed by one of those qf fields?  That'd explain
why mm=3 doesn't work, I think.

Erik

On Feb 11, 2011, at 15:43 , Tanner Postert wrote:

 I'm having a problem using the dismax query for the term obsessed with
 winning
 
 http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&defType=dismax&qf=title^10+description^4+text^1&debugQuery=true
 
 that query yields zero results, but removing the dismax stuff it works
 fine:
 
 http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&debugQuery=true
 
 or even adding the mm=2 yields results, but mm=3 does not.
 
 There is at least 1 record which contains the exact phrase 'obsessed with
 winning' in the title as well as the description multiple times, yet when
 the defType=dismax option is added, the query yields no results.  Am I
 missing something?
 
 Thanks in advance.



help with dismax query

2011-02-11 Thread Tanner Postert
I'm having a problem using the dismax query. For example, for the term
"obsessed with winning" I use:

http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&defType=dismax&qf=title^10+description^4+text^1&debugQuery=true

that query yields zero results, but removing the dismax stuff it works
fine:

http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&debugQuery=true

or even adding the mm=2 yields results, but mm=3 does not.

Looking at the discussion here:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg29682.html I see
that possibly sending the qf fields as separate values, rather than one may
yield better results but searching for:

http://localhost:8983/solr/core1/select?q=obsessed+with+winning&fq=code:xyz&shards=localhost:8983/solr/core1,localhost:8983/solr/core2&rows=10&start=0&defType=dismax&qf=title^10&qf=description^4&qf=text^1&debugQuery=true

yields no results either

There is at least 1 record which contains the exact phrase 'obsessed with
winning' in the title as well as the description and text (text is just a
copied field of title, description and a couple of other fields) multiple
times, yet when the defType=dismax option is added, the query yields no
results.  Am I missing something?

Thanks in advance.


boosting results by a query?

2011-02-11 Thread Ryan McKinley
I have an odd need, and want to make sure I am not reinventing a wheel...

Similar to the QueryElevationComponent, I need to be able to move
documents to the top of a list that match a given query.

If there were no sort, then this could be implemented easily with
BooleanQuery (i think) but with sort it gets more complicated.  Seems
like I need:

  sortSpec.setSort( new Sort( new SortField[] {
new SortField( something that only sorts results in the boost query ),
new SortField( the regular sort )
  }));

Is there an existing FieldComparator I should look at?  Any other
pointers/ideas?

Thanks
ryan


Re: Solr design decisions

2011-02-11 Thread Bill Bell
Thanks. If you do 2 commits should it do anything? Are people using it to clear 
caches?



Bill Bell
Sent from mobile


On Feb 11, 2011, at 9:55 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Fri, Feb 11, 2011 at 10:47 AM, Bill Bell billnb...@gmail.com wrote:
 You could commit on a time schedule. Like every 5 mins. If there is nothing 
 to commit it doesn't do anything anyway.
 
 It does do something!  A new searcher is opened and caches are invalidated, 
 etc.
 I'd recommend normally using commitWithin instead of explicitly
 committing or using autocommit.
 
 -Yonik
 http://lucidimagination.com


Re: boosting results by a query?

2011-02-11 Thread Sujit Pal
We are currently a Lucene shop; the way we do it is to have
these results come from a database table (where they are available in rank
order). We want to move to Solr, so what I plan on doing to replicate
this functionality is to write a custom request handler that will do the
database query and put the results on the top of the search results
before the SolrIndexSearcher is invoked.

-sujit

On Fri, 2011-02-11 at 16:31 -0500, Ryan McKinley wrote:
 I have an odd need, and want to make sure I am not reinventing a wheel...
 
 Similar to the QueryElevationComponent, I need to be able to move
 documents to the top of a list that match a given query.
 
 If there were no sort, then this could be implemented easily with
 BooleanQuery (i think) but with sort it gets more complicated.  Seems
 like I need:
 
   sortSpec.setSort( new Sort( new SortField[] {
 new SortField( something that only sorts results in the boost query ),
 new SortField( the regular sort )
   }));
 
 Is there an existing FieldComparator I should look at?  Any other
 pointers/ideas?
 
 Thanks
 ryan



Re: help with dismax query

2011-02-11 Thread Tanner Postert
Looks like that might be the case. If I just do a search for "with",
including the dismax parameters, it returns no results, whereas a
search for 'obsessed' does return results. Is there any way I can get around
this behavior? Or do I have something configured wrong?


 Might with be a stop word removed by one of those qf fields?  That'd explain
 why mm=3 doesn't work, I think.

 Erik




Re: Monitor the QTime.

2011-02-11 Thread Stijn Vanhoorelbeke
 QTime is, of course, specific to the query, but it is returned in the
 response XML, so one could run occasional queries to figure it out.
 Please see http://wiki.apache.org/solr/SearchHandler

 Regards,
 Gora


Yes, this could be a possibility. But then the Solr cache jumps back into
the picture.
I cannot simply query the system each minute with the same query - that way
the result would be completely satisfied by the internal caches. I could
build a list of heavy queries to do so - but I'd loved to use a more
straight forward method.


Re: help with dismax query

2011-02-11 Thread Tanner Postert
I think I found the answer here:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg04433.html

I think the title and description fields did not have the stopword filter
applied to them, so it was causing an error. When I took off the qf=title and
qf=description fields the results work. I am rebuilding my indexes now.
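
For reference, the usual fix is to give every field referenced in qf the same
stop-word handling; a minimal analyzer sketch for schema.xml (standard factory
names, stopwords.txt as shipped with the example config):

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With dismax, a stop word that survives in some qf fields but not in others
still produces a query clause, and mm then requires documents to match it --
the usual source of this kind of mismatch.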

On Fri, Feb 11, 2011 at 3:20 PM, Tanner Postert tanner.post...@gmail.comwrote:

 looks like that might be the case, if I just do a search for with
 including the dismax parameters, it returns no results, as opposed to a
 search for 'obsessed' does return results. Is there any way I can get around
 this behavior? or do I have something configured wrong?

 Might with be a stop word removed by one of those qf fields?  That'd 
 explain
 why mm=3 doesn't work, I think.

 Erik




Re: Monitor the QTime.

2011-02-11 Thread Stijn Vanhoorelbeke
2011/2/11 Ryan McKinley ryan...@gmail.com

 You may want to check the stats via JMX.  For example,


 http://localhost:8983/solr/core/admin/mbeans?stats=true&key=org.apache.solr.handler.StandardRequestHandler

 shows some basic stats info for the handler.
 ryan


Can you access this URL from a web browser (I tried but it doesn't work)? Or
must this be used from jConsole / a custom-made Java program?

Could you please point me to a good guide to implementing this JMX stuff,
because I'm a newbie with JMX.
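
In case it helps, the JMX hookup is mostly two pieces (see the SolrJmx wiki
page): enable Solr's MBean registration in solrconfig.xml,

  <!-- in solrconfig.xml -->
  <jmx/>

and start the JVM with remote JMX enabled, for example:

  java -Dcom.sun.management.jmxremote -jar start.jar

Then point jconsole (or any JMX client) at the JVM and browse the solr MBeans;
the request handlers expose running statistics such as avgTimePerRequest over
the real query traffic, so there is no need to fire synthetic queries at all.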


more like this

2011-02-11 Thread lee carroll
Hi, an MLT query with a q parameter which returns multiple matches, such as

q=id:45 id:34 id:54&mlt.fl=field1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name

seems to return the results of three separate MLT queries, i.e.

q=id:45&mlt.fl=field1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name
+
q=id:34&mlt.fl=field1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name
+
q=id:54&mlt.fl=field1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name

rather than a combined similarity of all three

Is this because field1 is not storing term vectors?

How best to achieve a combined-similarity MLT?


Detailed Steps for Scaling Solr

2011-02-11 Thread Bing Li
Dear all,

I need to construct a site which supports searching a large index. I
think scaling Solr is required. However, I haven't found a tutorial which helps
me do that step by step. I only have two resources as references, but neither
of them tells me the exact operations.

1)
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

2) David Smiley, Eric Pugh; Solr 1.4 Enterprise Search Server

If you have experience scaling Solr, could you point me to such tutorials?

Thanks so much!
LB