Re: Query foreign language synonyms / words of equivalent meaning?

2012-10-10 Thread Bernd Fehling

As far as I know, there is no built-in functionality for language translation.
I would propose writing one, but there are many, many pitfalls.
If you want to translate from one language to another you have to
know the source language, otherwise you run into translation problems:

Not (german) -> distress (english), affliction (english)

- you might have words in one language which are stopwords in the other language
  (the German noun "Not" vs. the English stopword "not")
- you don't have a one-to-one mapping, it's more like 1 to n+x
  toilette (french) -> bathroom, rest room / restroom, powder room

These are just two points that jump to mind, but there are tons of pitfalls.

We use the solution of a multilingual thesaurus as synonym dictionary.
http://en.wikipedia.org/wiki/Eurovoc
It holds translations of 22 official languages of the European Union.

So a search for "europäischer währungsfonds" also returns results containing
"european monetary fund", "fonds monétaire européen", ...
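
For illustration, one such thesaurus entry flattened into a plain Solr
synonyms file could look like this (the file name and analyzer placement
below are just an example):

europäischer währungsfonds, european monetary fund, fonds monétaire européen

wired in with something like:

<filter class="solr.SynonymFilterFactory" synonyms="eurovoc.txt"
        ignoreCase="true" expand="true"/>

Multi-word synonyms like these are usually safer to expand at index time
than at query time.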

Regards
Bernd



On 10.10.2012 04:54, onlinespend...@gmail.com wrote:
 Hi,
 
 English is going to be the predominant language used in my documents, but
 there may be a spattering of words in other languages (such as Spanish or
 French). What I'd like is to initiate a query for something like bathroom
 for example and for Solr to return documents that not only contain
 bathroom but also baño (Spanish). And the same goes when searching for 
 baño. I'd like Solr to return documents that contain either bathroom or 
 baño.
 
 One possibility is to pre-translate all indexed documents to a common
 language, in this case English. And if someone were to search using a
 foreign word, I'd need to translate that to English before issuing a query
 to Solr. This appears to be problematic, since I'd have to know whether the
 indexed words and the query are even in a foreign language, which is not
 trivial.
 
 Another possibility is to pre-build a list of foreign word synonyms. So baño
 would be listed as a synonym for bathroom. But I'd need to include other
 languages (such as toilette in French) and other words. This requires that
 I know in advance all possible words I'd need to include foreign language
 versions of (not to mention needing to know which languages to include).
 This isn't trivial either.
 
 I'm assuming there's no built-in functionality that supports the foreign
 language translation on the fly, so what do people propose?
 
 Thanks!
 

-- 
*
Bernd Fehling                        Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)                   LibTec - Bibliothekstechnologie
Universitätsstr. 25                  und Wissensmanagement
33615 Bielefeld
Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Solrcloud dataimport failed at first time after restart

2012-10-10 Thread jun Wang
I have found the reason: I am using a JBoss JNDI
datasource, and the Oracle driver was placed in WEB-INF/lib. This is a very
common error; the driver should be placed in %JBOSS_HOME%\server\default\lib.
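
For reference, the corresponding DIH datasource that goes through the
container's JNDI pool would look something like this, assuming your
JdbcDataSource version supports the jndiName attribute (the JNDI name is
only an example):

<dataSource type="JdbcDataSource" jndiName="java:comp/env/jdbc/myOracleDS" readOnly="true"/>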

2012/10/10 jun Wang wangjun...@gmail.com

 Hi, all
 I found that dataimport fails the first time after a restart, and the
 log is here. It seems like a bug.

 2012-10-09 20:00:08,848 ERROR dataimport.DataImporter - Full Import
 failed:java.lang.RuntimeException: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
 execute query: select a.id, a.subject, a.keywords, a.category_id,
 to_number((a.gmt_modified - to_date('1970-01-01','-mm-dd'))*24*60*60)
 as gmt_modified,a.member_seq,b.standard_attr_desc,
 b.custom_attr_desc, decode(a.product_min_price, null, 0,
 a.product_min_price)/100 as min_price, sign(a.ws_offline_date - sysdate) +
 1 as is_offlinefrom ws_product_draft a,
 ws_product_attribute_draft bwhere a.id =
 b.product_id(+) Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
 at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
 at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
 at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
 Caused by: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
 execute query: select a.id, a.subject, a.keywords, a.category_id,
 to_number((a.gmt_modified - to_date('1970-01-01','-mm-dd'))*24*60*60)
 as gmt_modified,a.member_seq,b.standard_attr_desc,
 b.custom_attr_desc, decode(a.product_min_price, null, 0,
 a.product_min_price)/100 as min_price, sign(a.ws_offline_date - sysdate) +
 1 as is_offlinefrom ws_product_draft a,
 ws_product_attribute_draft bwhere a.id =
 b.product_id(+) Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
 at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
 ... 3 more
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
 Unable to execute query: select a.id, a.subject, a.keywords,
 a.category_id, to_number((a.gmt_modified -
 to_date('1970-01-01','-mm-dd'))*24*60*60) as gmt_modified,a.member_seq,
b.standard_attr_desc, b.custom_attr_desc,
 decode(a.product_min_price, null, 0, a.product_min_price)/100 as min_price,
 sign(a.ws_offline_date - sysdate) + 1 as is_offline
  from ws_product_draft a, ws_product_attribute_draft b
where a.id = b.product_id(+) Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:252)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
 at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
 ... 5 more
 Caused by: java.lang.ClassNotFoundException: Unable to load null or
 org.apache.solr.handler.dataimport.null
 at
 org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:899)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:159)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:362)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:239)
 ... 12 more
 Caused by: java.lang.NullPointerException
 at
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
 at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:387)
 at
 org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:889)
 ... 17 more



 --
 from Jun 

Re: search by multiple 'LIKE' operator connected with 'AND' operator

2012-10-10 Thread gremlin
I'm also unable to configure that type of search through schema.xml. As I use
Solr in Drupal, I've implemented it in hook_search_api_solr_query_alter by
exploding my search string into two (or more) chunks, and now search works
well.

Strange that I couldn't do it through Solr configuration alone.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-by-multiple-LIKE-operator-connected-with-AND-operator-tp4012536p4012861.html
Sent from the Solr - User mailing list archive at Nabble.com.


Form too large error in SOLR4.0

2012-10-10 Thread ravicv
Hi,

Recently we upgraded from Solr 1.4 to 4.0. After upgrading
we are experiencing unusual behavior in Solr 4.0.
The same query works properly in Solr 1.4, but in Solr 4.0 it throws a SEVERE:
null:java.lang.IllegalStateException: Form too large161138720 error.

I have increased the maxFormContentSize value in jetty.xml:
<Call name="setAttribute">
  <Arg>org.eclipse.jetty.server.Request.maxFormContentSize</Arg>
  <Arg>500000</Arg>
</Call>

But I am still facing the same issue.

Can someone please help me resolve this issue?

Full Stack trace:

Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.IllegalStateException: Form too large161138720
at
org.eclipse.jetty.server.Request.extractParameters(Request.java:279)
at
org.eclipse.jetty.server.Request.getParameterMap(Request.java:705)
at
org.apache.solr.request.ServletSolrParams.init(ServletSolrParams.java:29)
at
org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:394)
at
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
at java.lang.Thread.run(Thread.java:662)

Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Server at
http://localhost:8983/solr/core0 returned non ok status:500, message:Server
Error
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
at org.apache.solr.handler.component.HttpShardHandler$1.call(Htt

Thanks,
Ravi




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Form-too-large-error-in-SOLR4-0-tp4012868.html
Sent from the Solr - User mailing list archive at Nabble.com.


Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread O. Klein
Is there some way to supplement the DirectSolrSpellChecker with a dictionary?

(In some cases terms are not used because of threshold, but should be
offered as spellcheck suggestion)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-additional-dictionary-with-DirectSolrSpellChecker-tp4012873.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr - Make Exact Search on Field with Fuzzy Query

2012-10-10 Thread meghana


We are using solr 3.6.

We have field named Description. We want searching feature with stemming and
also without stemming (exact word/phrase search), with highlighting in both
.

For that , we had made lot of research and come to conclusion, to use the
copy field with data type which doesn't have stemming factory. it is working
fine at now.

(main field has stemming and copy field has not.)

The data for that field is very large and we are having millions of
documents; and as we want, both searching and highlighting on them; we need
to keep this copy field stored and indexed both. which will increase index
size a lot.

we need to eliminate this duplication if possible any how.

From the recent research, we read that combining fuzzy search with dismax
will fulfill our requirement. (we have tried a bit but not getting success.)

Please let me know , if this is possible, or any other solutions to make
this happen.

Thanks in Advance




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Installing Solr on a shared hosting server?

2012-10-10 Thread simon
Some time back I used Dreamhost for a Solr-based project. It looks as though
all their offerings, including shared hosting, have Java support - see
http://wiki.dreamhost.com/What_We_Support. I was very happy with their
service and support.

-Simon

On Tue, Oct 9, 2012 at 10:44 AM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Bluehost doesn't seem to support Java processes, so unfortunately the
 answer seems to be no.

 You might want to look into getting a Linode or some other similar VPS
 hosting. Solr needs RAM to function well, though, so you're not going
 to be able to go with the cheapest option.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Tue, Oct 9, 2012 at 9:27 AM, caiod ca...@me.com wrote:
  I was wondering if I can install Solr on bluehost's shared hosting to
 use as
  a website search, and also how do I do so? Thank you...
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Installing-Solr-on-a-shared-hosting-server-tp4012708.html
  Sent from the Solr - User mailing list archive at Nabble.com.



RE: Wild card searching - well sort of

2012-10-10 Thread Markus Jelsma
Hi - The WordDelimiterFilter can help you get *-BAAN-* for A100-BAAN-C20 but 
only because BAAN is surrounded with characters the filter splits and combines 
upon.
 
-Original message-
 From:Kissue Kissue kissue...@gmail.com
 Sent: Wed 10-Oct-2012 14:20
 To: solr-user@lucene.apache.org
 Subject: Wild card searching - well sort of
 
 Hi,
 
 I am wondering if there is a way i can get Solr to do this:
 
 I have added the string: *-BAAN-* to the index to a field called pattern
 which is a string type. Now i want to be able to search for A100-BAAN-C20
 or ZA20-BAAN-300 and have Solr return *-BAAN-*.
 
 Any ideas how i can accomplish something like this? I am currently using
 Solr 3.5 with solrJ.
 
 Thanks.
 


Re: Form too large error in SOLR4.0

2012-10-10 Thread Otis Gospodnetic
Hi,

Check jetty configs,  this looks like an error from the container.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 10, 2012 4:50 AM, ravicv ravichandra...@gmail.com wrote:

 Hi,

 Recently we have upgraded solr 1.4 version to 4.0 version. After upgrading
 we are experiencing unusual behavior in SOLR4.0.
 The same query is working properly in solr 1.4 and it is throwing SEVERE:
 null:java.lang.IllegalStateException: Form too large161138720 error in
 solr4.0.

 I have increased maxFormContentSize value in jetty.xml
 Call name=setAttribute
   Argorg.eclipse.jetty.server.Request.maxFormContentSize/Arg
   Arg50/Arg
 /Call

 But still i am facing same issue.

 Can some one please help me to resolve this issue.

 Full Stack trace:

 Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.IllegalStateException: Form too large161138720
 at
 org.eclipse.jetty.server.Request.extractParameters(Request.java:279)
 at
 org.eclipse.jetty.server.Request.getParameterMap(Request.java:705)
 at
 org.apache.solr.request.ServletSolrParams.init(ServletSolrParams.java:29)
 at

 org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:394)
 at

 org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
 at

 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
 at

 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
 at

 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
 at

 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
 at

 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
 at

 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
 at

 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
 at

 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
 at

 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
 at

 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
 at org.eclipse.jetty.server.Server.handle(Server.java:351)
 at

 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
 at

 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
 at

 org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
 at

 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
 at
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at

 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
 at

 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
 at

 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
 at

 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
 at java.lang.Thread.run(Thread.java:662)

 Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Server at
 http://localhost:8983/solr/core0 returned non ok status:500,
 message:Server
 Error
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
 at org.apache.solr.handler.component.HttpShardHandler$1.call(Htt

 Thanks,
 Ravi




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Form-too-large-error-in-SOLR4-0-tp4012868.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Form too large error in SOLR4.0

2012-10-10 Thread Jack Krupansky
1611387 is 1,611,387, which is clearly greater than your revised limit of
500000 = 500,000.

Try setting the limit to 2000000 = 2,000,000. Or maybe even 5000000 =
5,000,000.
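
With the same jetty.xml stanza as in the original message, that would be,
for example:

<Call name="setAttribute">
  <Arg>org.eclipse.jetty.server.Request.maxFormContentSize</Arg>
  <Arg>2000000</Arg>
</Call>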


-- Jack Krupansky

-Original Message- 
From: ravicv

Sent: Wednesday, October 10, 2012 4:49 AM
To: solr-user@lucene.apache.org
Subject: Form too large error in SOLR4.0

Hi,

Recently we have upgraded solr 1.4 version to 4.0 version. After upgrading
we are experiencing unusual behavior in SOLR4.0.
The same query is working properly in solr 1.4 and it is throwing SEVERE:
null:java.lang.IllegalStateException: Form too large161138720 error in
solr4.0.

I have increased maxFormContentSize value in jetty.xml
   Call name=setAttribute
 Argorg.eclipse.jetty.server.Request.maxFormContentSize/Arg
 Arg50/Arg
   /Call

But still i am facing same issue.

Can some one please help me to resolve this issue.

Full Stack trace:

Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.IllegalStateException: Form too large161138720
   at
org.eclipse.jetty.server.Request.extractParameters(Request.java:279)
   at
org.eclipse.jetty.server.Request.getParameterMap(Request.java:705)
   at
org.apache.solr.request.ServletSolrParams.init(ServletSolrParams.java:29)
   at
org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:394)
   at
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
   at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
   at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
   at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
   at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
   at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
   at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
   at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
   at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
   at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
   at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
   at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
   at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
   at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
   at org.eclipse.jetty.server.Server.handle(Server.java:351)
   at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
   at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
   at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
   at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
   at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
   at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
   at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
   at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
   at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
   at java.lang.Thread.run(Thread.java:662)

Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Server at
http://localhost:8983/solr/core0 returned non ok status:500, message:Server
Error
   at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
   at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
   at org.apache.solr.handler.component.HttpShardHandler$1.call(Htt

Thanks,
Ravi




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Form-too-large-error-in-SOLR4-0-tp4012868.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Questions about query times

2012-10-10 Thread Yuval Dotan
OK, so I solved the question about the query that returns no results and
still takes time - I needed to add the facet.mincount=1 parameter, and this
reduced the time to 200-300 ms instead of seconds.

I still couldn't figure out why a query that returns very few results (like
query number 2) still takes seconds to return, even with
the facet.mincount=1 parameter.
I couldn't understand why the facet pivot takes so much time on 299 docs.

Does anyone have any idea?

Example Query:

(2)
q=*:*&fq=(trimTime:[2012-09-04T15:23:48Z TO *])&fq=(Severity:(High
Critical))&fq=(trimTime:[2012-09-04T15:23:48Z TO
*])&fq=(Confidence_Level:(N/A)) OR (Confidence_Level:(Medium-High)) OR
(Confidence_Level:(High))&f.product.facet.sort=index&f.product.facet.limit=-1&f.Severity.facet.sort=index&f.Severity.facet.limit=-1&f.trimTime.facet.sort=index&f.trimTime.facet.limit=-1&facet=true&f.product.facet.method=enum&facet.pivot=product,Severity,trimTime

NumFound: 299

Times(ms):
Qtime: 2,756 Query: 307 Facet: 2,449

On Thu, Sep 20, 2012 at 5:24 PM, Yuval Dotan yuvaldo...@gmail.com wrote:

 Hi,

 We have a system that inserts logs continuously (real-time).
 We have been using the Solr facet pivot feature for querying and have been
 experiencing slow query times and we were hoping to gain some insights with
 your help.
 schema and solrconfig are attached

 Here are our questions (data below):

1. Why is facet time so long in (3) and (5) - in cases where there are
0 or very few results?
2. We ran two queries that are only differ in the time limit (for the
second query - time range is very small) - we got the same time for both
queries although the second one returned very few results - again why is
that?
3. Is there a way to improve pivot facet time?

 System Data:

 Index size: 63 GB
 RAM:4Gb
 CPU: 2 x Xeon E5410 2.33GHz
 Num of Documents: 109,278,476


 query examples:

 -
 (1)
 Query:
 q=*:*fq=(trimTime:[2012-09-04T14:29:24Z TO
 *])fq=(trimTime:[2012-09-04T14:29:24Z TO
 *])f.product.facet.sort=indexf.product.facet.limit=-1f.Severity.facet.sort=indexf.Severity.facet.limit=-1f.trimTime.facet.sort=indexf.trimTime.facet.limit=-1facet=truef.product.facet.method=enumfacet.pivot=product,Severity,trimTime

 NumFound:
 11,407,889

 Times (ms):
 Qtime: 3,239 Query: 353 Facet: 2,885
 -

 (2)
 Query:
 q=*:*fq=(trimTime:[2012-09-04T15:23:48Z TO *])fq=(Severity:(High
 Critical))fq=(trimTime:[2012-09-04T15:23:48Z TO
 *])fq=(Confidence_Level:(N/A)) OR (Confidence_Level:(Medium-High)) OR
 (Confidence_Level:(High))f.product.facet.sort=indexf.product.facet.limit=-1f.Severity.facet.sort=indexf.Severity.facet.limit=-1f.trimTime.facet.sort=indexf.trimTime.facet.limit=-1facet=truef.product.facet.method=enumfacet.pivot=product,Severity,trimTime

 NumFound: 299

 Times(ms):
 Qtime: 2,756 Query: 307 Facet: 2,449

 -
 (3)
 Query:
 q=*:*fq=(trimTime:[2012-09-11T12:55:00Z TO *])fq=(Severity:(High
 Critical))fq=(trimTime:[2012-09-04T15:23:48Z TO
 *])fq=(Confidence_Level:(N/A)) OR (Confidence_Level:(Medium-High)) OR
 (Confidence_Level:(High))f.product.facet.sort=indexf.product.facet.limit=-1f.Severity.facet.sort=indexf.Severity.facet.limit=-1f.trimTime.facet.sort=indexf.trimTime.facet.limit=-1facet=truef.product.facet.method=enumfacet.pivot=product,Severity,trimTime

 NumFound: 7

 Times(ms):
 Qtime: 2,798 Query: 312 Facet: 2,485

 -
 (4)
 Query:
 q=*:*fq=(trimTime:[2012-09-04T15:43:16Z TO
 *])fq=(trimTime:[2012-09-04T15:43:16Z TO *])fq=(product:(Application
 Control)) OR (product:(URL
 Filtering))f.appi_name.facet.sort=indexf.appi_name.facet.limit=-1f.app_risk.facet.sort=indexf.app_risk.facet.limit=-1f.matched_category.facet.sort=indexf.matched_category.facet.limit=-1f.trimTime.facet.sort=indexf.trimTime.facet.limit=-1facet=truef.appi_name.facet.method=enumfacet.pivot=appi_name,app_risk,matched_category,trimTimeexf.trimTime.facet.limit=-1facet=truef.product.facet.method=enumfacet.pivot=product,Severity,trimTime

 NumFound: more than 30M

 Times(ms): Qtime: 23,288
 -

 (5)
 Query:
 q=*:*fq=(trimTime:[2012-09-05T06:03:55Z TO *])fq=(Severity:(High
 Critical))fq=(trimTime:[2012-09-05T06:03:55Z TO *])fq=(product:(IPS))
 OR (product:(SmartDefense))fq=(action:(Detect)) OR
 

Re: Wild card searching - well sort of

2012-10-10 Thread Jack Krupansky
1. What is your specific motivation for wanting to do this? (Sounds like yet 
another XY problem!)
2. What specific rules are you expecting to use for synthesis of patterns 
from the raw data?


For the latter, do you expect to index hand-coded specific patterns to be 
returned or do you have some sort of machine learning method in mind that 
will generate the patterns by examining all of the values?


-- Jack Krupansky

-Original Message- 
From: Kissue Kissue

Sent: Wednesday, October 10, 2012 8:15 AM
To: solr-user@lucene.apache.org
Subject: Wild card searching - well sort of

Hi,

I am wondering if there is a way i can get Solr to do this:

I have added the string: *-BAAN-* to the index to a field called pattern
which is a string type. Now i want to be able to search for A100-BAAN-C20
or ZA20-BAAN-300 and have Solr return *-BAAN-*.

Any ideas how i can accomplish something like this? I am currently using
Solr 3.5 with solrJ.

Thanks. 



Re: Solr - Make Exact Search on Field with Fuzzy Query

2012-10-10 Thread Erick Erickson
There's nothing really built in to Solr to allow this. Are you
absolutely sure you can't just use the copyfield? Have you
actually tried it?

But I don't think you need to store the contents twice. Just
store it once and always highlight on that field whether you
search it or not. Since it's the raw text, you should be fine.
You'll have two versions of the field tokenized of course, but
that should take less space than you might think. You
probably want to store the version with the stemming turned on...

That said, storing twice only uses up some disk space, it
doesn't require additional memory for searching. So unless
you're running out of disk space you can just keep two stored
versions around.

But

If none of that works you might write a custom filter that
emits two tokens for each input token at indexing
time, similar to what synonyms do. The original should
have some special character appended, say $ and the
second should be the results of stemming (note, there
will be two tokens even if there is no stemming done).
So, indexing "running" would index "running$" and "run".
Now, when you need to search for an exact match on
"running", you search for "running$".

This works for the reverse too. Since the rule is to append
"$" to all original tokens, "run" gets indexed as "run$" and "run".
Now, searching for "run" matches, as does "run$". But
"run$" does not match the doc that had "running", since the two
tokens emitted in that case are "run" and "running$".
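
For what it's worth, here is a rough, untested sketch of that two-token idea
(the class name, the '$' marker and the Snowball English stemmer are only
illustrative assumptions; the stemmed twin gets a position increment of 0 so
both tokens land on the same position):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.tartarus.snowball.ext.EnglishStemmer;

public final class ExactPlusStemFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
  private final EnglishStemmer stemmer = new EnglishStemmer();
  private State exactState;   // raw token waiting for its stemmed twin

  public ExactPlusStemFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (exactState != null) {
      // second pass for the current word: emit the stemmed form
      restoreState(exactState);
      exactState = null;
      stemmer.setCurrent(termAtt.toString());
      stemmer.stem();
      termAtt.setEmpty().append(stemmer.getCurrent());
      posIncAtt.setPositionIncrement(0);
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    exactState = captureState();   // remember the raw token
    termAtt.append('$');           // "running" goes out as "running$" first
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    exactState = null;
  }
}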

But look at what's happened here. You're indexing two tokens
for every one token in the input. Furthermore, you're adding
a bunch of unique tokens to the index. It's hard to see how this
results in any savings over just using copyField. You have
to index the two tokens since you have to distinguish between
the stemmed and un-stemmed version.

You might be able to do something really exotic with payloads.
This is _really_ out of left field, but it just occurred to me. You'd
have to define a transformation from the original word into the
stemmed word that created a unique value. Something like
no stemming -> 0
removing "ing" -> 1
removing "s" -> 2

etc. Actually, this would have to be some kind of function on the
letters removed so that removing "ing" mapped to, say,
the ordinal position of the letter in the alphabet * 100^position. So
"ing" would map to 'i' - 'a' + ('n' - 'a') * 100 + ('g' - 'a') * 10000, etc...
(you'd have to take considerable care to get this right for any
code sets that had more than 100 possible code points)...
Now, you've included the information about what the original
word was and could use the payload to fail to match in the
exact-match case. Of course the other issue would be to figure
out the syntax to get the fact that you wanted an exact match
down into your custom scorer.

But as you can see, any scheme is harder than just flipping a switch,
so I'd _really_ verify that you can't just use copyField

Best
Erick

On Wed, Oct 10, 2012 at 7:38 AM, meghana meghana.rav...@amultek.com wrote:


 We are using solr 3.6.

 We have field named Description. We want searching feature with stemming and
 also without stemming (exact word/phrase search), with highlighting in both
 .

 For that , we had made lot of research and come to conclusion, to use the
 copy field with data type which doesn't have stemming factory. it is working
 fine at now.

 (main field has stemming and copy field has not.)

 The data for that field is very large and we are having millions of
 documents; and as we want, both searching and highlighting on them; we need
 to keep this copy field stored and indexed both. which will increase index
 size a lot.

 we need to eliminate this duplication if possible any how.

 From the recent research, we read that combining fuzzy search with dismax
 will fulfill our requirement. (we have tried a bit but not getting success.)

 Please let me know , if this is possible, or any other solutions to make
 this happen.

 Thanks in Advance




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: segment number during optimize of index

2012-10-10 Thread jame vaalet
Guys,
thanks for all the inputs. I was continuing my research to learn more about
segments in Lucene. Below are my conclusions, please correct me if I am wrong.

   1. Segments are independent sub-indexes in separate files. While indexing,
   it's better to create a new segment, as it doesn't have to modify an existing
   file; whereas while searching, the fewer the segments the better, since
   you open x (not exactly x, but a value proportional to x) physical files
   to search if you have got x segments in the index.
   2. Since Lucene has the memory-map concept, for each file/segment in the index
   a new m-map file is created and mapped to the physical file on disk. Can
   someone explain or correct this in detail? I am sure there are a lot of
   people wondering how m-map works while you merge or optimize index segments.



On 6 October 2012 07:41, Otis Gospodnetic otis.gospodne...@gmail.comwrote:

 If I were you and not knowing all your details...

 I would optimize indices that are static (not being modified) and
 would optimize down to 1 segment.
 I would do it when search traffic is low.

 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html


 On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet jamevaa...@gmail.com wrote:
  Hi Eric,
  I  am in a major dilemma with my index now. I have got 8 cores each
 around
  300 GB in size and half of them are deleted documents in it and above
 that
  each has got around 100 segments as well. Do i issue a expungeDelete and
  allow the merge policy to take care of the segments or optimize them into
  single segment. Search performance is not at par compared to usual solr
  speed.
  If i have to optimize what segment number should i choose? my RAM size
  around 120 GB and JVM heap is around 45 GB (oldGen being 30 GB). Pleas
  advice !
 
  thanks.
 
 
  On 6 October 2012 00:00, Erick Erickson erickerick...@gmail.com wrote:
 
  because eventually you'd run out of file handles. Imagine a
  long-running server with 100,000 segments. Totally
  unmanageable.
 
  I think shawn was emphasizing that RAM requirements don't
  depend on the number of segments. There are other
  resources that file consume however.
 
  Best
  Erick
 
  On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet jamevaa...@gmail.com
 wrote:
   hi Shawn,
   thanks for the detailed explanation.
   I have got one doubt, you said it doesn matter how many segments index
  have
   but then why does solr has this merge policy which merges segments
   frequently?  why can it leave the segments as it is rather than
 merging
   smaller one's into bigger one?
  
   thanks
   .
  
   On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org wrote:
  
   On 10/4/2012 3:22 PM, jame vaalet wrote:
  
   so imagine i have merged the 150 Gb index into single segment, this
  would
   make a single segment of 150 GB in memory. When new docs are
 indexed it
   wouldn't alter this 150 Gb index unless i update or delete the older
  docs,
   right? will 150 Gb single segment have problem with memory swapping
 at
  OS
   level?
  
  
   Supplement to my previous reply:  the real memory mentioned in the
 last
   paragraph does not include the memory that the OS uses to cache disk
   access.  If more memory is needed and all the free memory is being
 used
  by
   the disk cache, the OS will throw away part of the disk cache (a
   near-instantaneous operation that should never involve disk I/O) and
  give
   that memory to the application that requests it.
  
   Here's a very good breakdown of how memory gets used with
 MMapDirectory
  in
   Solr.  It's applicable to any program that uses memory mapping, not
 just
   Solr:
  
  
 http://java.dzone.com/**articles/use-lucene%E2%80%99s-**mmapdirectory
  http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
  
   Thanks,
   Shawn
  
  
  
  
   --
  
   -JAME
 
 
 
 
  --
 
  -JAME




-- 

-JAME


Re: Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread O. Klein
I don't want to tweak the threshold. For the majority of cases it works fine.

It's for cases where a term has a low frequency but is spelled correctly.

If you lowered the threshold you would also get incorrectly spelled terms as
suggestions.


Robert Muir wrote
 These thresholds are adjustable: read the javadocs and tweak them.
 
 On Wed, Oct 10, 2012 at 5:59 AM, O. Klein lt;

 klein@

 gt; wrote:
 Is there some way to supplement the DirectSolrSpellChecker with a
 dictionary?

 (In some cases terms are not used because of threshold, but should be
 offered as spellcheck suggestion)



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Using-additional-dictionary-with-DirectSolrSpellChecker-tp4012873.html
 Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-additional-dictionary-with-DirectSolrSpellChecker-tp4012873p4012908.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Wild card searching - well sort of

2012-10-10 Thread Toke Eskildsen
On Wed, 2012-10-10 at 14:15 +0200, Kissue Kissue wrote:
 I have added the string: *-BAAN-* to the index to a field called pattern
 which is a string type. Now i want to be able to search for A100-BAAN-C20
 or ZA20-BAAN-300 and have Solr return *-BAAN-*.

That sounds a lot like the problem presented in the thread 
Indexing wildcard patterns:
http://web.archiveorange.com/archive/v/AAfXfcuIJY9BQJL3mjty

The short answer is no, Solr does not support this in the general form.
But maybe you can make it work anyway. In your example, the two queries
A100-BAAN-C20 and ZA20-BAAN-300 share the form 
[4 random characters]-[4 significant characters]-[3 random characters]
so a little bit of pre-processing would rewrite that to 
*-[4 significant characters]-*
which would match *-BAAN-*
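
For example, if the queries really do follow that one shape, the rewrite
could be as simple as this (a sketch in Java, with the regex tailored to the
two example values only):

String query = "A100-BAAN-C20";
String rewritten = query.replaceAll("^[A-Z0-9]{4}-([A-Z0-9]{4})-[A-Z0-9]{3}$", "*-$1-*");
// rewritten is now "*-BAAN-*", which can be matched exactly against the
// string field "pattern"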

If you describe the patterns and common elements to your indexed terms
and to your queries, we might come up with something.



Re: Synonym Filter: Removing all original tokens, retain matched synonyms

2012-10-10 Thread Jack Krupansky
The synonym filter does set the type attribute to TYPE_SYNONYM for 
synonyms, so you could write your own token filter that keeps only tokens 
with that type.
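
Something along these lines might work as a starting point (an untested
sketch; the class name is made up and position-increment bookkeeping is
ignored):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.synonym.SynonymFilter;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public final class KeepSynonymsOnlyFilter extends TokenFilter {
  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

  public KeepSynonymsOnlyFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    while (input.incrementToken()) {
      // keep only the tokens the SynonymFilter injected, drop everything else
      if (SynonymFilter.TYPE_SYNONYM.equals(typeAtt.type())) {
        return true;
      }
    }
    return false;
  }
}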


Try the Solr Admin analysis page to see how various terms are analyzed by 
the synonym filter. It will show TYPE_SYNONYM.


-- Jack Krupansky

-Original Message- 
From: Daniel Rosher

Sent: Wednesday, October 10, 2012 8:34 AM
To: solr-user@lucene.apache.org
Subject: Synonym Filter: Removing all original tokens, retain matched 
synonyms


Hi,

Is there a way to do this?

Token_Input:
the fox jumped over the lazy dog

Synonym_Map:
fox => vulpes
dog => canine

Token_Output:
vulpes canine

So remove all tokens, but retain those matched against the synonym map

Cheers,
Dan 



Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Sami Siren
On Wed, Oct 10, 2012 at 12:02 AM, Briggs Thompson
w.briggs.thomp...@gmail.com wrote:
 *Sami*
 The client IS
 instantiated only once and not for every request. I was curious if this was
 part of the problem. Do I need to re-instantiate the object for each
 request made?

No, it is expensive if you instantiate the client every time.

When the client seems to be hanging, can you still access the Solr
instance normally and execute updates/searches from other clients?

--
 Sami Siren


Re: Synonym Filter: Removing all original tokens, retain matched synonyms

2012-10-10 Thread Ahmet Arslan

 Token_Input:
 the fox jumped over the lazy dog
 
 Synonym_Map:
 fox = vulpes
 dog = canine
 
 Token_Output:
 vulpes canine
 
 So remove all tokens, but retain those matched against the
 synonym map

Maybe you can make use of
http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/KeepWordFilterFactory.html.

You need to copy the entries (vulpes, canine) from synonym.txt into the
keepwords.txt file.
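
For example, an index-time analyzer chain along these lines (file names are
illustrative) would leave only the injected synonyms in the token stream:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
<filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>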


Re: Synonym Filter: Removing all original tokens, retain matched synonyms

2012-10-10 Thread Daniel Rosher
Ah ha .. good thinking ... thanks!

Dan

On Wed, Oct 10, 2012 at 2:39 PM, Ahmet Arslan iori...@yahoo.com wrote:


  Token_Input:
  the fox jumped over the lazy dog
 
  Synonym_Map:
  fox = vulpes
  dog = canine
 
  Token_Output:
  vulpes canine
 
  So remove all tokens, but retain those matched against the
  synonym map

 May be you can make use of
 http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/KeepWordFilterFactory.html
 .

 You need to copy entries (vulpes, canine) from synonym.txt into
 keepwords.txt file.



Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Briggs Thompson
There are other updates that happen on the server that do not fail, so the
answer to your question is yes.

On Wed, Oct 10, 2012 at 8:12 AM, Sami Siren ssi...@gmail.com wrote:

 On Wed, Oct 10, 2012 at 12:02 AM, Briggs Thompson
 w.briggs.thomp...@gmail.com wrote:
  *Sami*
  The client IS
  instantiated only once and not for every request. I was curious if this
 was
  part of the problem. Do I need to re-instantiate the object for each
  request made?

 No, it is expensive if you instantiate the client every time.

 When the client seems to be hanging, can you still access the Solr
 instance normally and execute updates/searches from other clients?

 --
  Sami Siren



Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Sami Siren
On Wed, Oct 10, 2012 at 5:36 PM, Briggs Thompson
w.briggs.thomp...@gmail.com wrote:
 There are other updates that happen on the server that do not fail, so the
 answer to your question is yes.

The other updates are using solrj or something else?

It would be helpful if you could prepare a simple java program that
uses solrj to demonstrate the problem. Based on the available
information it is really difficult try to guess what's happening.

--
 Sami Siren


Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Briggs Thompson
They are both SolrJ.

What is happening is I have a batch indexer application that does a full
re-index once per day. I also have an incremental indexer that takes
items off a queue when they are updated.

The problem only happens when both are running at the same time - they also
run from the same machine. I am going to dig into this today and see what I
find - I didn't get around to it yesterday.

Question: I don't seem to see a StreamingUpdateSolrServer object in the 4.0
beta. I did see ConcurrentUpdateSolrServer - this seems like a similar
choice. Is this correct?
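
For reference, ConcurrentUpdateSolrServer is indeed the 4.0 replacement for
StreamingUpdateSolrServer; a minimal construction looks roughly like this,
with the URL, queue size and thread count as placeholder values:

ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/mycore", 10, 4);
// 10 = buffered update requests, 4 = background threads draining the queue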

On Wed, Oct 10, 2012 at 9:43 AM, Sami Siren ssi...@gmail.com wrote:

 On Wed, Oct 10, 2012 at 5:36 PM, Briggs Thompson
 w.briggs.thomp...@gmail.com wrote:
  There are other updates that happen on the server that do not fail, so
 the
  answer to your question is yes.

 The other updates are using solrj or something else?

 It would be helpful if you could prepare a simple java program that
 uses solrj to demonstrate the problem. Based on the available
 information it is really difficult try to guess what's happening.

 --
  Sami Siren



Unique terms without faceting

2012-10-10 Thread Phil Hoy
Hi,

I know that you can use a facet query to get the unique terms for a field,
taking into account any q or fq parameters, but for our use case the counts are
not needed. So is there a more efficient way of finding just the unique terms for
a field?

Phil



Re: Unique terms without faceting

2012-10-10 Thread Jack Krupansky

The Solr TermsComponent:

http://wiki.apache.org/solr/TermsComponent
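
For example, assuming the stock /terms handler from the example solrconfig
(core and field names are placeholders), this returns the raw indexed terms:

http://localhost:8983/solr/mycore/terms?terms.fl=myfield&terms.limit=-1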

-- Jack Krupansky

-Original Message- 
From: Phil Hoy

Sent: Wednesday, October 10, 2012 11:45 AM
To: solr-user@lucene.apache.org
Subject: Unique terms without faceting

Hi,

I know that you can use a facet query to get the unique terms for a field 
taking account of any q or fq parameters but for our use case the counts are 
not needed. So is there a more efficient way of finding  just unique terms 
for a field?


Phil



Re: Faceted search question (Tokenizing)

2012-10-10 Thread Grapes
Here is another simpler example of what I am trying to achieve:

Multi-Valued Field 1:
Data 1
Data 2
Data 3
Data 4

Multi-Valued Field 2:
Data 11
Data 12
Data 13
Data 14

Multi-Valued Field 3:
Data 21
Data 22
Data 23
Data 24


How can I specify that Data 1, Data 11 and Data 21 are all related? And if I
facet on Data 1 + Data 11, I only want to see Data 21.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceted-search-question-Tokenizing-tp4012948p4012956.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with delete by query in Solr 4.0 beta

2012-10-10 Thread Andrew Groh
I cannot seem to get delete by query working in my simple setup in Solr 4.0 
beta.

I have a single collection and I want to delete old documents from it.  There 
is a single solr node in the config (no replication, not distributed). This is 
something that I previously did in Solr 3.x

My collection is called dine, so I do:

curl "http://localhost:8080/solr/dine/update" -s -H 'Content-type:text/xml;
charset=utf-8' -d '<delete><query>timestamp_dt:[2012-09-01T00:00:00Z TO
2012-09-27T00:00:00Z]</query></delete>'

and then a commit.

The problem is that the documents are not deleted. When I run the same query in
the admin page, it still returns documents.

I walked through the code and found the code in
DistributedUpdateProcessor::doDeleteByQuery to be suspicious.

Specifically, vinfo is not null, but I have no version field, so versionsStored 
is false.

So it gets to line 786, which looks like:
if (versionsStored) {

That then skips to line 813 (the finally clause) skipping all calls to 
doLocalDelete

Now, I do confess I don't understand exactly how this code should work.  
However, in the add code, the check for versionsStored does not skip the call 
to doLocalAdd.

Any suggestions would be welcome.

Andrew





Faceted search question (Tokenizing)

2012-10-10 Thread Grapes
Hey There, 

We have the following data structure: 


- Person 
-- Interest 1 
--- Subinterest 1 
--- Subinterest 1 Description 
--- Subinterest 1 ID 
-- Interest 2 
--- Subinterest 2 
--- Subinterest 2 Description 
--- Subinterest 2 ID 
. 
-- Interest 99 
--- Subinterest 99 
--- Subinterest 99 Description 
--- Subinterest 99 ID 

Interest, Subinterest, Subinterest Description and Subinterest ID are all
multivalued fields. A person can have any number of
subinterests, descriptions and IDs.

How could we facet/search this based on this data structure? Right now we
tokenized everything in a separate multivalued column in the following
fashion:


|Interest='Interest 1',Subinterest='Subinterest 1',Subinterest='Another
Subinterest 1',Description='Interest 1 Description',ID='Interest 1 ID'| 
|Interest='Interest 2',Description='Interest 2 Description',ID='Interest 2
ID'| 

I have a feeling like this is a wrong approach to this problem.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceted-search-question-Tokenizing-tp4012948.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Unique terms without faceting

2012-10-10 Thread Phil Hoy
Hi,

I don't think you can use that component whilst taking into account any fq or q 
parameters.

Phil

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 10 October 2012 16:51
To: solr-user@lucene.apache.org
Subject: Re: Unique terms without faceting

The Solr TermsComponent:

http://wiki.apache.org/solr/TermsComponent

-- Jack Krupansky

-Original Message-
From: Phil Hoy
Sent: Wednesday, October 10, 2012 11:45 AM
To: solr-user@lucene.apache.org
Subject: Unique terms without faceting

Hi,

I know that you can use a facet query to get the unique terms for a field 
taking account of any q or fq parameters but for our use case the counts are 
not needed. So is there a more efficient way of finding  just unique terms for 
a field?

Phil




PointType doc reindex issue

2012-10-10 Thread Ravi Solr
Hello,
I have a weird problem: whenever I read a doc from Solr and
then index the same doc that already exists in the index (aka
reindexing), I get the following error. Can somebody tell me what I am
doing wrong? I use Solr 3.6 and the definition of the field is given
below.

<fieldType name="latlong" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"/>

Exception in thread main
org.apache.solr.client.solrj.SolrServerException: Server at
http://testsolr:8080/solr/mycore returned non ok status:400,
message:ERROR: [doc=1182684] multiple values encountered for non
multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
at com.wpost.search.indexing.MyTest.main(MyTest.java:31)


The data in the index looks as follows

<str name="geolocation">39.017608,-77.375239</str>
<arr name="geolocation_0_coordinate">
  <double>39.017608</double>
  <double>39.017608</double>
</arr>
<arr name="geolocation_1_coordinate">
  <double>-77.375239</double>
  <double>-77.375239</double>
</arr>

Thanks

Ravi Kiran Bhaskar


Re: PointType doc reindex issue

2012-10-10 Thread Gopal Patwa
You need to remove the field after reading the Solr doc. When you add the field
again it is appended to the existing values, so when you try to commit the
updated doc the field becomes multi-valued, while in your schema it is single-valued.
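
For example, with the field names from the original post, inside the loop over
the returned docs a minimal sketch of the cleanup before re-posting would be:

SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
// drop the derived sub-fields so LatLonType can regenerate them from "geolocation"
iDoc.removeField("geolocation_0_coordinate");
iDoc.removeField("geolocation_1_coordinate");
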
On Oct 10, 2012 9:26 AM, Ravi Solr ravis...@gmail.com wrote:

 Hello,
 I have a weird problem, Whenever I read the doc from solr and
 then index the same doc that already exists in the index (aka
 reindexing) I get the following error. Can somebody tell me what I am
 doing wrong. I use solr 3.6 and the definition of the field is given
 below

 fieldType name=latlong class=solr.LatLonType
 subFieldSuffix=_coordinate/
 dynamicField name=*_coordinate type=tdouble indexed=true
 stored=true/

 Exception in thread main
 org.apache.solr.client.solrj.SolrServerException: Server at
 http://testsolr:8080/solr/mycore returned non ok status:400,
 message:ERROR: [doc=1182684] multiple values encountered for non
 multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
 at com.wpost.search.indexing.MyTest.main(MyTest.java:31)


 The data in the index looks as follows

 str name=geolocation39.017608,-77.375239/str
 arr name=geolocation_0_coordinate
  double39.017608/double
  double39.017608/double
 /arr
 arr name=geolocation_1_coordinate
 double-77.375239/double
 double-77.375239/double
 /arr

 Thanks

 Ravi Kiran Bhaskar



Memory Cost of group.cache.percent parameter

2012-10-10 Thread Mike Schultz
Does anyone have a clear understanding of how group caching achieves its
performance improvements, memory-wise?  Percent means percent of maxDoc, so
it's a function of that, but is it a function of that *per* item in the
cache (like filterCache) or altogether?  The speed improvement looks pretty
dramatic for our maxDoc=25M index, but it would be helpful to understand what
the costs are.

Mike



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Memory-Cost-of-group-cache-percent-parameter-tp4012967.html
Sent from the Solr - User mailing list archive at Nabble.com.


Filter results based on custom scoring and _val_

2012-10-10 Thread jimtronic
I'm using solr function queries to generate my own custom score. I achieve
this using something along these lines:

q=_val_:my_custom_function()
This populates the score field as expected, but it also includes documents
that score 0. I need a way to filter the results so that scores below zero
are not included.

I realize that I'm using score in a non-standard way and that normally the
score that lucene/solr produce is not absolute. However, producing my own
score works really well for my needs.

I've tried using {!frange l=0} but this causes the score for all documents
to be 1.0.

I've found that I can do the following:

q=*:*&fl=foo:my_custom_function()&fq={!frange l=1}my_custom_function()

This puts my custom score into foo, but it requires me to list all the logic
twice. Sometimes my logic is very long.









--
View this message in context: 
http://lucene.472066.n3.nabble.com/Filter-results-based-on-custom-scoring-and-val-tp4012968.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: PointType doc reindex issue

2012-10-10 Thread Ravi Solr
Gopal, I did in fact test the same, and it worked when I deleted the
geolocation_0_coordinate and geolocation_1_coordinate. But that seems
weird, so I was wondering if there is something else I need to do to
avoid this awkward workaround.

Ravi Kiran Bhaskar

On Wed, Oct 10, 2012 at 12:36 PM, Gopal Patwa gopalpa...@gmail.com wrote:
 You need remove field after read solr doc,  when u add new field it will
 add to list,  so when u try to commit the update field,  it will be multi
 value and in your schema it is single value
 On Oct 10, 2012 9:26 AM, Ravi Solr ravis...@gmail.com wrote:

 Hello,
 I have a weird problem, Whenever I read the doc from solr and
 then index the same doc that already exists in the index (aka
 reindexing) I get the following error. Can somebody tell me what I am
 doing wrong. I use solr 3.6 and the definition of the field is given
 below

 fieldType name=latlong class=solr.LatLonType
 subFieldSuffix=_coordinate/
 dynamicField name=*_coordinate type=tdouble indexed=true
 stored=true/

 Exception in thread main
 org.apache.solr.client.solrj.SolrServerException: Server at
 http://testsolr:8080/solr/mycore returned non ok status:400,
 message:ERROR: [doc=1182684] multiple values encountered for non
 multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
 at com.wpost.search.indexing.MyTest.main(MyTest.java:31)


 The data in the index looks as follows

 str name=geolocation39.017608,-77.375239/str
 arr name=geolocation_0_coordinate
  double39.017608/double
  double39.017608/double
 /arr
 arr name=geolocation_1_coordinate
 double-77.375239/double
 double-77.375239/double
 /arr

 Thanks

 Ravi Kiran Bhaskar



Re: PointType doc reindex issue

2012-10-10 Thread Gopal Patwa
Instead of the addField method, use setField.
On Oct 10, 2012 9:54 AM, Ravi Solr ravis...@gmail.com wrote:

 Gopal I did in fact test the same and it worked when I delete ted the
 geolocation_0_coordinate and geolocation_1_coordinate. But that seems
 weird, so I was thinking if there is something else I need to do to
 avoid doing this awkward workaround.

 Ravi Kiran Bhaskar

 On Wed, Oct 10, 2012 at 12:36 PM, Gopal Patwa gopalpa...@gmail.com
 wrote:
  You need remove field after read solr doc,  when u add new field it will
  add to list,  so when u try to commit the update field,  it will be multi
  value and in your schema it is single value
  On Oct 10, 2012 9:26 AM, Ravi Solr ravis...@gmail.com wrote:
 
  Hello,
  I have a weird problem, Whenever I read the doc from solr and
  then index the same doc that already exists in the index (aka
  reindexing) I get the following error. Can somebody tell me what I am
  doing wrong. I use solr 3.6 and the definition of the field is given
  below
 
  <fieldType name="latlong" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"/>
 
  Exception in thread main
  org.apache.solr.client.solrj.SolrServerException: Server at
  http://testsolr:8080/solr/mycore returned non ok status:400,
  message:ERROR: [doc=1182684] multiple values encountered for non
  multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
  at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
  at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
  at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
 
 
  The data in the index looks as follows
 
  <str name="geolocation">39.017608,-77.375239</str>
  <arr name="geolocation_0_coordinate">
   <double>39.017608</double>
   <double>39.017608</double>
  </arr>
  <arr name="geolocation_1_coordinate">
  <double>-77.375239</double>
  <double>-77.375239</double>
  </arr>
 
  Thanks
 
  Ravi Kiran Bhaskar
 



Re: PointType doc reindex issue

2012-10-10 Thread Ravi Solr
I am using DirectXmlRequest to index XML. This is just a test case, as
my client would be sending me Solr-compliant XML, so I was trying to
simulate it by reading a doc from an existing core and reindexing it.

HttpSolrServer server = new HttpSolrServer("http://testsolr:8080/solr/mycore");
QueryResponse resp = server.query(new SolrQuery("contentid:(1184911 OR 1182684)"));
SolrDocumentList list = resp.getResults();
if (list != null && !list.isEmpty()) {
    for (SolrDocument doc : list) {
        SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
        String contentid = (String) iDoc.getFieldValue("egcontentid");
        String name = (String) iDoc.getFieldValue("name");
        iDoc.setField("name", DigestUtils.md5Hex(name));

        String xml = ClientUtils.toXML(iDoc);

        DirectXmlRequest up = new DirectXmlRequest("/update", "<add>" + xml + "</add>");
        server.request(up);
        server.commit();

        System.out.println("Updated name in contentid - " + contentid);
    }
}

Ravi Kiran

On Wed, Oct 10, 2012 at 1:02 PM, Gopal Patwa gopalpa...@gmail.com wrote:
 Instead addfield method use setfield
 On Oct 10, 2012 9:54 AM, Ravi Solr ravis...@gmail.com wrote:

 Gopal I did in fact test the same and it worked when I delete ted the
 geolocation_0_coordinate and geolocation_1_coordinate. But that seems
 weird, so I was thinking if there is something else I need to do to
 avoid doing this awkward workaround.

 Ravi Kiran Bhaskar

 On Wed, Oct 10, 2012 at 12:36 PM, Gopal Patwa gopalpa...@gmail.com
 wrote:
  You need remove field after read solr doc,  when u add new field it will
  add to list,  so when u try to commit the update field,  it will be multi
  value and in your schema it is single value
  On Oct 10, 2012 9:26 AM, Ravi Solr ravis...@gmail.com wrote:
 
  Hello,
  I have a weird problem, Whenever I read the doc from solr and
  then index the same doc that already exists in the index (aka
  reindexing) I get the following error. Can somebody tell me what I am
  doing wrong. I use solr 3.6 and the definition of the field is given
  below
 
  <fieldType name="latlong" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"/>
 
  Exception in thread main
  org.apache.solr.client.solrj.SolrServerException: Server at
  http://testsolr:8080/solr/mycore returned non ok status:400,
  message:ERROR: [doc=1182684] multiple values encountered for non
  multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
  at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
  at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
  at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
 
 
  The data in the index looks as follows
 
  <str name="geolocation">39.017608,-77.375239</str>
  <arr name="geolocation_0_coordinate">
   <double>39.017608</double>
   <double>39.017608</double>
  </arr>
  <arr name="geolocation_1_coordinate">
  <double>-77.375239</double>
  <double>-77.375239</double>
  </arr>
 
  Thanks
 
  Ravi Kiran Bhaskar
 



Re: Problem with delete by query in Solr 4.0 beta

2012-10-10 Thread Ahmet Arslan

 Do you have a _version_ field in
 your schema. I believe SOLR 4.0
 Beta requires that field.

Probably he is hitting this https://issues.apache.org/jira/browse/SOLR-3432


Creating a new Collection through API

2012-10-10 Thread Markus Mirsberger

Hi,

what is the best way to create a new Collection through the API so that I get 
its own config folder with schema.xml and solrconfig.xml inside the 
created Core?


When I just create a Collection, only the data folder will be created 
but the config folder with schema.xml and solrconfig.xml will be used 
from another Collection. Even when I add the config folder later, I have 
to reload the core on every server to get the changes :(


Do I have to create a default Core somewhere, copy it inside my solr 
folder, rename it and then add this as a Collection or is there a better 
way to do this?



Thanks,
Markus




Re: anyone have any clues about this exception

2012-10-10 Thread Alexandre Rafalovitch
Something timed out and the other end closed the connection. This end
tried to write to the closed pipe and died, something tried to catch that
exception and write its own error and died even worse? Just making it up
really, but it sounds plausible (plus a 3-year Java tech-support hunch).

If it happens often enough, see if you can run WireShark on that
machine's network interface and catch the whole network conversation
in action. Often, there are enough clues there by looking at tcp
packets and/or stuff transmitted. WireShark is a power-tool, so takes
a little while the first time, but the learning will pay for itself
over and over again.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert rober...@buy.com wrote:
 Tomcat localhost log (not the catalina log) for my  solr 3.6.1 (master) 
 instance contains lots of these exceptions but solr itself seems to be doing 
 fine... any ideas?  I'm not seeing these exceptions being logged on my slave 
 servers btw, just the master where we do our indexing only.



 Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve invoke
 SEVERE: Servlet.service() for servlet default threw exception
 java.lang.IllegalStateException
 at 
 org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at 
 com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
 at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
 at java.lang.Thread.run(Unknown Source)


Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Shawn Heisey

On 10/9/2012 3:02 PM, Briggs Thompson wrote:

*Otis* - jstack is a great suggestion, thanks! The problem didn't happen
this morning but next time it does I will certainly get the dump to see
exactly where the app is swimming around. I haven't used
StreamingUpdateSolrServer
but I will see if that makes a difference. Are there any major drawbacks of
going this route?


One caveat -- when using the Streaming/Concurrent object, your 
application will not be notified when there is a problem indexing. I've 
been told there is a way to override a method in the object to allow 
trapping errors, but I have not seen sample code and haven't figured out 
how to do it.  I've filed an issue and a patch to fix this.  It's 
received some comments, but so far nobody has decided to commit it.


https://issues.apache.org/jira/browse/SOLR-3284

Thanks,
Shawn



Re: PriorityQueue:initialize consistently showing up as hot spot while profiling

2012-10-10 Thread Aaron Daubman
Hi Mikhail,

On Fri, Oct 5, 2012 at 7:15 AM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 okay. huge rows value is no.1 way to kill Lucene. It's not possible,
 absolutely. You need to rethink logic of your component. Check Solr's
 FieldCollapsing code, IIRC it makes second search to achieve similar goal.
 Also check PostFilter and DelegatingCollector classes, their approach can
 also be handy for your task.

This sounds like it could be a much saner way to handle the task; however,
I'm not sure what I should be looking at for the 'FieldCollapsing code' you
mention - can you point me to a class?

Also, is there anything more you can say about the PostFilter and
DelegatingCollector classes? I reviewed them, but it was not obvious what
they do that would let me shrink the large rows param we currently use to
make sure all relevant docs reach the grouping step, so that limiting
happens at the group level rather than pre-grouping...

Thanks again,
  Aaron


Re: PointType doc reindex issue

2012-10-10 Thread Chris Hostetter
: I have a weird problem, Whenever I read the doc from solr and
: then index the same doc that already exists in the index (aka
: reindexing) I get the following error. Can somebody tell me what I am
: doing wrong. I use solr 3.6 and the definition of the field is given
: below

When you use the LatLonType field type you get synthetic *_coordinate 
fields automatically constructed under the covers from each of your fields 
that use a latlon fieldType.  Because you have configured the 
*_coordinate fields to be stored, they are included in the response 
when you request the doc.

This means that unless you explicitly remove those synthetically 
constructed values before reindexing, they will still be there in 
addition to the new (possibly redundant) synthetic values created while 
indexing.

This is why the *_coordinate dynamicField in the solr example schema.xml 
is marked 'stored=false' so that this field doesn't come back in the 
response -- it's not meant for end users.
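
If you do want to keep them stored, the client-side workaround is simply to 
drop the synthetic sub-fields from the SolrInputDocument before resubmitting, 
something like this (rough sketch, using the field names from your doc):

SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
// remove the synthetic sub-fields so LatLonType can regenerate them on reindex
iDoc.removeField("geolocation_0_coordinate");
iDoc.removeField("geolocation_1_coordinate");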


: <fieldType name="latlong" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
: <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"/>
: 
: Exception in thread main
: org.apache.solr.client.solrj.SolrServerException: Server at
: http://testsolr:8080/solr/mycore returned non ok status:400,
: message:ERROR: [doc=1182684] multiple values encountered for non
: multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
:   at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
:   at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
:   at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
: 
: 
: The data in the index looks as follows
: 
: <str name="geolocation">39.017608,-77.375239</str>
: <arr name="geolocation_0_coordinate">
:  <double>39.017608</double>
:  <double>39.017608</double>
: </arr>
: <arr name="geolocation_1_coordinate">
: <double>-77.375239</double>
: <double>-77.375239</double>
: </arr>
: 
: Thanks
: 
: Ravi Kiran Bhaskar
: 

-Hoss


RE: anyone have any clues about this exception

2012-10-10 Thread Petersen, Robert
You could be right.  Going back in the logs, I noticed it used to happen less 
frequently and always towards the end of an optimize operation.  It is probably 
my indexer timing out waiting for updates to occur during optimizes.  The 
errors grew recently due to my upping the indexer threadcount to 22 threads, so 
there are a lot more timeouts occurring now.  Also our index has grown to double 
the old size so the optimize operation has started taking a lot longer, also 
contributing to what I'm seeing.   I have just changed my optimize frequency 
from three times a day to one time a day after reading the following:

Here they are talking about completely deprecating the optimize command in the 
next version of solr…
https://issues.apache.org/jira/browse/SOLR-3141


-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Wednesday, October 10, 2012 11:10 AM
To: solr-user@lucene.apache.org
Subject: Re: anyone have any clues about this exception

Something timed out, the other end closed the connection. This end tried to 
write to closed pipe and died, something tried to catch that exception and 
write its own and died even worse? Just making it up really, but sounds good 
(plus a 3-year Java tech-support hunch).

If it happens often enough, see if you can run WireShark on that machine's 
network interface and catch the whole network conversation in action. Often, 
there is enough clues there by looking at tcp packets and/or stuff transmitted. 
WireShark is a power-tool, so takes a little while the first time, but the 
learning will pay for itself over and over again.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert rober...@buy.com wrote:
 Tomcat localhost log (not the catalina log) for my  solr 3.6.1 (master) 
 instance contains lots of these exceptions but solr itself seems to be doing 
 fine... any ideas?  I'm not seeing these exceptions being logged on my slave 
 servers btw, just the master where we do our indexing only.



 Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve 
 invoke
 SEVERE: Servlet.service() for servlet default threw exception 
 java.lang.IllegalStateException
 at 
 org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at 
 com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
 at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
 at java.lang.Thread.run(Unknown Source)



Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Briggs Thompson
Thanks for the heads up. I just tested this and you are right. I am making
a call to addBeans and it succeeds without any issue even when the server
is down. That sucks.

A big part of this process relies on knowing exactly what has made it
into the index and what has not, so this is a difficult problem to solve when
you can't catch exceptions. I was thinking I could execute a ping request
first to determine if the Solr server is still operational, but that
doesn't help if the updateRequestHandler fails.
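
One thing I may try is subclassing the concurrent server and overriding its
error callback, roughly like this (untested sketch; updateFailed is a made-up
AtomicBoolean my indexing loop would check after each batch):

ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/core1", 20, 4) {
      @Override
      public void handleError(Throwable ex) {
        // remember that something in the queued batch failed so it can be retried
        updateFailed.set(true);
        ex.printStackTrace();
      }
    };

Not sure yet whether that catches every failure mode.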

On Wed, Oct 10, 2012 at 1:48 PM, Shawn Heisey s...@elyograg.org wrote:

 On 10/9/2012 3:02 PM, Briggs Thompson wrote:

 *Otis* - jstack is a great suggestion, thanks! The problem didn't happen

 this morning but next time it does I will certainly get the dump to see
 exactly where the app is swimming around. I haven't used
 StreamingUpdateSolrServer
 but I will see if that makes a difference. Are there any major drawbacks
 of
 going this route?


 One caveat -- when using the Streaming/Concurrent object, your application
 will not be notified when there is a problem indexing. I've been told there
 is a way to override a method in the object to allow trapping errors, but I
 have not seen sample code and haven't figured out how to do it.  I've filed
 an issue and a patch to fix this.  It's received some comments, but so far
 nobody has decided to commit it.

 https://issues.apache.org/jira/browse/SOLR-3284

 Thanks,
 Shawn




Re: Why is SolrDispatchFilter using 90% of the Time?

2012-10-10 Thread Yonik Seeley
 When I look at the distribution of the Response-time I notice
 'SolrDispatchFilter.doFilter()' is taking up 90% of the time.

That's pretty much the top-level entry point to Solr (from the servlet
container), so it's normal.

-Yonik
http://lucidworks.com


RE: Faceted search question (Tokenizing)

2012-10-10 Thread Petersen, Robert
What do you want the results to be, persons?  And should the facets be 
interests or subinterests?  Why are there two layers of interests anyway?  Can 
there be many subinterests under one interest?  Is one of those two the name of 
the interest, which would look nice as a facet?

Anyway, have you read these pages yet?  These should get you started in the 
right direction.
http://wiki.apache.org/solr/SolrFacetingOverview
http://wiki.apache.org/solr/HierarchicalFaceting
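
For what it's worth, one common pattern (just a sketch -- the interest_path field 
name is made up) is to index a multivalued string field holding path values like 
Interest 1 and Interest 1/Subinterest 1, then facet on it and drill down with a 
prefix, e.g.:

facet=true&facet.field=interest_path&facet.prefix=Interest 1/

(URL-encoded in a real request).  That gives you the top-level interests first and 
then only the subinterests of whichever interest the user picks.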

Hope that helps,
Robi

-Original Message-
From: Grapes [mailto:mkloub...@gmail.com] 
Sent: Wednesday, October 10, 2012 8:52 AM
To: solr-user@lucene.apache.org
Subject: Faceted search question (Tokenizing)

Hey There, 

We have the following data structure: 


- Person
-- Interest 1
--- Subinterest 1
--- Subinterest 1 Description
--- Subinterest 1 ID
-- Interest 2
--- Subinterest 2
--- Subinterest 2 Description
--- Subinterest 2 ID
. 
-- Interest 99
--- Subinterest 99
--- Subinterest 99 Description
--- Subinterest 99 ID 

Interest, Subinterest, Subinterest Description and Subinterest IDs are all 
multiavlued fields. A person can have any number of subinterests,descriptions 
and IDS. 

How could we facet/search this based on this data structure? Right now we 
tokenize everything into a separate multivalued column in the following fashion: 

|Interest='Interest 1',Subinterest='Subinterest 1',Subinterest='Another Subinterest 1',Description='Interest 1 Description',ID='Interest 1 ID'|
|Interest='Interest 2',Description='Interest 2 Description',ID='Interest 2 ID'|

I have a feeling this is the wrong approach to this problem.








Re: Query foreign language synonyms / words of equivalent meaning?

2012-10-10 Thread SUJIT PAL
Hi,

We are using google translate to do something like what you (onlinespending) 
want to do, so maybe it will help.

During indexing, we store the searchable fields from documents into fields 
suffixed _en, _fr, _es, etc. So assuming we capture title and body from each 
document, the fields are (title_en, body_en), (title_fr, body_fr), etc., each with 
their own analyzer chains. These documents come from a controlled source (i.e. 
not the web), so we know the language they are authored in.

During searching, a custom component intercepts the client language and the 
query. The query is sent to google translate for language detection. The 
largest share of docs in the corpus is English, so if the detected language is 
either English or the client language, then we call google translate again to 
get the translated query in the other (English or client) language. Another 
custom component then constructs an OR query between the two languages, one 
clause of which is aimed at the _en field set and the other at the _xx (client 
language) field set.
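
The query construction itself is nothing fancy; stripped down it is roughly 
this (simplified sketch -- the real code escapes the user input):

String userLang = "fr";            // detected / client language
String userQ = "toilette";         // original query
String translatedQ = "bathroom";   // from google translate
String q = "(title_en:(" + translatedQ + ") OR body_en:(" + translatedQ + "))"
         + " OR (title_" + userLang + ":(" + userQ + ")"
         + " OR body_" + userLang + ":(" + userQ + "))";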

-sujit

On Oct 9, 2012, at 11:24 PM, Bernd Fehling wrote:

 
 As far as I know, there is no built-in functionality for language translation.
 I would propose to write one, but there are many many pitfalls.
 If you want to translate from one language to another you might have to
 know the starting language. Otherwise you get problems with translation.
 
 Not (german) - distress (english), affliction (english)
 
 - you might have words in one language which are stopwords in other language 
 not
 - you don't have a one to one mapping, it's more like 1 to n+x
  toilette (french) - bathroom, rest room / restroom, powder room
 
 This are just two points which jump into my mind but there are tons of 
 pitfalls.
 
 We use the solution of a multilingual thesaurus as synonym dictionary.
 http://en.wikipedia.org/wiki/Eurovoc
 It holds translations of 22 official languages of the European Union.
 
 So a search for europäischer währungsfonds gives also results with
 european monetary fund, fonds monétaire européen, ...
 
 Regards
 Bernd
 
 
 
 Am 10.10.2012 04:54, schrieb onlinespend...@gmail.com:
 Hi,
 
 English is going to be the predominant language used in my documents, but
 there may be a spattering of words in other languages (such as Spanish or
 French). What I'd like is to initiate a query for something like bathroom
 for example and for Solr to return documents that not only contain
 bathroom but also baño (Spanish). And the same goes when searching for 
 baño. I'd like Solr to return documents that contain either bathroom or 
 baño.
 
 One possibility is to pre-translate all indexed documents to a common
 language, in this case English. And if someone were to search using a
 foreign word, I'd need to translate that to English before issuing a query
 to Solr. This appears to be problematic, since I'd have to know whether the
 indexed words and the query are even in a foreign language, which is not
 trivial.
 
 Another possibility is to pre-build a list of foreign word synonyms. So baño
 would be listed as a synonym for bathroom. But I'd need to include other
 languages (such as toilette in French) and other words. This requires that
 I know in advance all possible words I'd need to include foreign language
 versions of (not to mention needing to know which languages to include).
 This isn't trivial either.
 
 I'm assuming there's no built-in functionality that supports the foreign
 language translation on the fly, so what do people propose?
 
 Thanks!
 
 
 -- 
 *
 Bernd FehlingUniversitätsbibliothek Bielefeld
 Dipl.-Inform. (FH)LibTec - Bibliothekstechnologie
 Universitätsstr. 25 und Wissensmanagement
 33615 Bielefeld
 Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de
 
 BASE - Bielefeld Academic Search Engine - www.base-search.net
 *



unsuscribe

2012-10-10 Thread zMk Bnc

unsuscribe

Re: add shard to index

2012-10-10 Thread Upayavira
That is what is being discussed already. The thing is, at present, Solr
requires an even distribution of documents across shards, so you can't
just add another shard, assign it to a hash range, and be done with it.

The reason is down to the scoring mechanism used - TF/IDF (term
frequency/inverse document frequency). The IDF portion asks: how many
documents in the whole index contain this term? If there are only two
documents in the index, then the IDF will be very different from when
there are 2 million docs, resulting in different scores for equivalent
documents based upon which shard they are in.
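
(For reference, Lucene's default idf is roughly idf(t) = 1 + log(numDocs /
(docFreq + 1)), and in a plain distributed setup both numDocs and docFreq are
computed per shard.)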

Currently, the only solution to this is to distribute your documents
evenly, which would mean, if you have four shards and you create a
fifth, you'd need to move roughly a fifth of the documents on each
existing shard over to the new one, which is not really a trivial task.

I believe the JIRA ticket covering this was mentioned earlier in this
thread.

Upayavira

On Mon, Oct 8, 2012, at 04:33 PM, Radim Kolar wrote:
 Do it as it is done in cassandra database. Adding new node and 
 redistributing data can be done in live system without problem it looks 
 like this:
 
 every cassandra node has a key range assigned. instead of assigning keys 
 to nodes like hash(key) mod nodes, every node has its own portion of the 
 hash keyspace. The portions do not need to be the same size; some nodes can 
 have a larger slice of the keyspace than others.
 
 hash function max possible value is 12.
 
 shard1 - 1-4
 shard2 - 5-8
 shard3 - 9-12
 
 now let's add a new shard. In cassandra adding a new shard by default cuts 
 an existing one in half, so you will have
 shard1 - 1-2
 shard2 - 3-4
 shard3 - 5-8
 shard4 - 9-12
 
 see? You needed to move only documents from the old shard1. Usually you are 
 adding more than 1 shard during a reorganization, so you do not need to 
 rebalance the cluster by moving every node to a different position in the 
 hash keyspace all that much.


Re: Wild card searching - well sort of

2012-10-10 Thread Erick Erickson
Have you looked at WordDelimiterFilterFactory that was mentioned
earlier? Try a fieldType in the admin/analysis page that has
WDFF as part of the analysis chain. It would do exactly what you've
described so far.

WDFF splits the input up as tokens on non-alphanum characters,
alpha/num transitions and case transitions (you can configure these).
Then searching will match these split-out tokens.
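
For example (roughly, with typical WDFF settings): an indexed value like
*-BAAN-* reduces to the single token BAAN, and a query like A100-BAAN-C20
gets split into pieces such as A, 100, BAAN, C and 20 (exactly which pieces
depends on the generate/catenate/preserveOriginal options), so the two meet
on BAAN.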

Best
Erick

On Wed, Oct 10, 2012 at 10:28 AM, Kissue Kissue kissue...@gmail.com wrote:
 It is really not fixed. It could also be *-*-BAAN or BAAN-CAN20-*. In each
 i just want only the fixed character(s) to match then the * can match any
 character.


 On Wed, Oct 10, 2012 at 2:05 PM, Toke Eskildsen 
 t...@statsbiblioteket.dkwrote:

 On Wed, 2012-10-10 at 14:15 +0200, Kissue Kissue wrote:
  I have added the string: *-BAAN-* to the index to a field called pattern
  which is a string type. Now i want to be able to search for A100-BAAN-C20
  or ZA20-BAAN-300 and have Solr return *-BAAN-*.

 That sounds a lot like the problem presented in the thread
 Indexing wildcard patterns:
 http://web.archiveorange.com/archive/v/AAfXfcuIJY9BQJL3mjty

 The short answer is no, Solr does not support this in the general form.
 But maybe you can make it work anyway. In your example, the two queries
 A100-BAAN-C20 and ZA20-BAAN-300 share the form
 [4 random characters]-[4 significant characters]-[3 random characters]
 so a little bit of pre-processing would rewrite that to
 *-[4 significant characters]-*
 which would match *-BAAN-*

 If you describe the patterns and common elements to your indexed terms
 and to your queries, we might come up with something.




Re: Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread Robert Muir
On Wed, Oct 10, 2012 at 9:02 AM, O. Klein kl...@octoweb.nl wrote:
 I don't want to tweak the threshold. For majority of cases it works fine.

 It's for cases where term has low frequency but is spelled correctly.

 If you lower the threshold you would also get incorrect spelled terms as
 suggestions.


Yeah there is no real magic here when the corpus contains typos. this
existing docFreq heuristic was just borrowed from the old index-based
spellchecker.

I do wonder if using # of occurrences (totalTermFreq) instead of # of
documents with the term (docFreq) would improve the heuristic.

In all cases I think if you want to also integrate a dictionary or
something, it seems like this could somehow be done with the
File-based spellchecker?


Re: segment number during optimize of index

2012-10-10 Thread jun Wang
I have another question: does the number of segments affect the speed of
index updates?

2012/10/10 jame vaalet jamevaa...@gmail.com

 Guys,
 thanks for all the inputs, I was continuing my research to know more about
 segments in Lucene. Below are my conclusions, please correct me if I am wrong.

    1. Segments are independent sub-indexes in separate files. While indexing
    it's better to create a new segment, as it doesn't have to modify an existing
    file; whereas while searching, the smaller the segment the better it is, since
    you open x (not exactly x, but a number proportional to x) physical files
    to search if you have got x segments in the index.
    2. Since lucene has a memory map concept, for each file/segment in the index a
    new m-map file is created and mapped to the physical file on disk. Can
    someone explain or correct this in detail; I am sure there are lots of
    people wondering how m-map works while you merge or optimize index
    segments.



 On 6 October 2012 07:41, Otis Gospodnetic otis.gospodne...@gmail.com
 wrote:

  If I were you and not knowing all your details...
 
  I would optimize indices that are static (not being modified) and
  would optimize down to 1 segment.
  I would do it when search traffic is low.
 
  Otis
  --
  Search Analytics - http://sematext.com/search-analytics/index.html
  Performance Monitoring - http://sematext.com/spm/index.html
 
 
  On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet jamevaa...@gmail.com
 wrote:
   Hi Eric,
   I  am in a major dilemma with my index now. I have got 8 cores each
  around
   300 GB in size and half of them are deleted documents in it and above
  that
   each has got around 100 segments as well. Do i issue a expungeDelete
 and
   allow the merge policy to take care of the segments or optimize them
 into
   single segment. Search performance is not at par compared to usual solr
   speed.
   If i have to optimize what segment number should i choose? my RAM size
   around 120 GB and JVM heap is around 45 GB (oldGen being 30 GB). Pleas
   advice !
  
   thanks.
  
  
   On 6 October 2012 00:00, Erick Erickson erickerick...@gmail.com
 wrote:
  
   because eventually you'd run out of file handles. Imagine a
   long-running server with 100,000 segments. Totally
   unmanageable.
  
   I think shawn was emphasizing that RAM requirements don't
   depend on the number of segments. There are other
   resources that file consume however.
  
   Best
   Erick
  
   On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet jamevaa...@gmail.com
  wrote:
hi Shawn,
thanks for the detailed explanation.
I have got one doubt, you said it doesn matter how many segments
 index
   have
but then why does solr has this merge policy which merges segments
frequently?  why can it leave the segments as it is rather than
  merging
smaller one's into bigger one?
   
thanks
.
   
On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org wrote:
   
On 10/4/2012 3:22 PM, jame vaalet wrote:
   
so imagine i have merged the 150 Gb index into single segment,
 this
   would
make a single segment of 150 GB in memory. When new docs are
  indexed it
wouldn't alter this 150 Gb index unless i update or delete the
 older
   docs,
right? will 150 Gb single segment have problem with memory
 swapping
  at
   OS
level?
   
   
Supplement to my previous reply:  the real memory mentioned in the
  last
paragraph does not include the memory that the OS uses to cache
 disk
access.  If more memory is needed and all the free memory is being
  used
   by
the disk cache, the OS will throw away part of the disk cache (a
near-instantaneous operation that should never involve disk I/O)
 and
   give
that memory to the application that requests it.
   
Here's a very good breakdown of how memory gets used with
  MMapDirectory
   in
Solr.  It's applicable to any program that uses memory mapping, not
  just
Solr:
   
   
   http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
   
Thanks,
Shawn
   
   
   
   
--
   
-JAME
  
  
  
  
   --
  
   -JAME
 



 --

 -JAME




-- 
from Jun Wang


Re: Query foreign language synonyms / words of equivalent meaning?

2012-10-10 Thread Lance Norskog
I want an update processor that runs Translation Party.

http://translationparty.com/

http://downloadsquad.switched.com/2009/08/14/translation-party-achieves-hilarious-results-using-google-transl/

- Original Message -
| From: SUJIT PAL sujit@comcast.net
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 2:51:37 PM
| Subject: Re: Query foreign language synonyms / words of equivalent meaning?
| 
| Hi,
| 
| We are using google translate to do something like what you
| (onlinespending) want to do, so maybe it will help.
| 
| During indexing, we store the searchable fields from documents into a
| fields named _en, _fr, _es, etc. So assuming we capture title and
| body from each document, the fields are (title_en, body_en),
| (title_fr, body_fr), etc, with their own analyzer chains. These
| documents come from a controlled source (ie not the web), so we know
| the language they are authored in.
| 
| During searching, a custom component intercepts the client language
| and the query. The query is sent to google translate for language
| detection. The largest amount of docs in the corpus is english, so
| if the detected language is either english or the client language,
| then we call google translate again to find the translated query in
| the other (english or client) language. Another custom component
| constructs an OR query between the two languages one component of
| which is aimed at the _en field set and the other aimed at the _xx
| (client language) field set.
| 
| -sujit
| 
| On Oct 9, 2012, at 11:24 PM, Bernd Fehling wrote:
| 
|  
|  As far as I know, there is no built-in functionality for language
|  translation.
|  I would propose to write one, but there are many many pitfalls.
|  If you want to translate from one language to another you might
|  have to
|  know the starting language. Otherwise you get problems with
|  translation.
|  
|  Not (german) - distress (english), affliction (english)
|  
|  - you might have words in one language which are stopwords in other
|  language not
|  - you don't have a one to one mapping, it's more like 1 to n+x
|   toilette (french) - bathroom, rest room / restroom, powder room
|  
|  This are just two points which jump into my mind but there are tons
|  of pitfalls.
|  
|  We use the solution of a multilingual thesaurus as synonym
|  dictionary.
|  http://en.wikipedia.org/wiki/Eurovoc
|  It holds translations of 22 official languages of the European
|  Union.
|  
|  So a search for europäischer währungsfonds gives also results
|  with
|  european monetary fund, fonds monétaire européen, ...
|  
|  Regards
|  Bernd
|  
|  
|  
|  Am 10.10.2012 04:54, schrieb onlinespend...@gmail.com:
|  Hi,
|  
|  English is going to be the predominant language used in my
|  documents, but
|  there may be a spattering of words in other languages (such as
|  Spanish or
|  French). What I'd like is to initiate a query for something like
|  bathroom
|  for example and for Solr to return documents that not only contain
|  bathroom but also baño (Spanish). And the same goes when
|  searching for 
|  baño. I'd like Solr to return documents that contain either
|  bathroom or 
|  baño.
|  
|  One possibility is to pre-translate all indexed documents to a
|  common
|  language, in this case English. And if someone were to search
|  using a
|  foreign word, I'd need to translate that to English before issuing
|  a query
|  to Solr. This appears to be problematic, since I'd have to know
|  whether the
|  indexed words and the query are even in a foreign language, which
|  is not
|  trivial.
|  
|  Another possibility is to pre-build a list of foreign word
|  synonyms. So baño
|  would be listed as a synonym for bathroom. But I'd need to include
|  other
|  languages (such as toilette in French) and other words. This
|  requires that
|  I know in advance all possible words I'd need to include foreign
|  language
|  versions of (not to mention needing to know which languages to
|  include).
|  This isn't trivial either.
|  
|  I'm assuming there's no built-in functionality that supports the
|  foreign
|  language translation on the fly, so what do people propose?
|  
|  Thanks!
|  
|  
|  --
|  *
|  Bernd FehlingUniversitätsbibliothek Bielefeld
|  Dipl.-Inform. (FH)LibTec - Bibliothekstechnologie
|  Universitätsstr. 25 und Wissensmanagement
|  33615 Bielefeld
|  Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de
|  
|  BASE - Bielefeld Academic Search Engine - www.base-search.net
|  *
| 
| 


Re: Auto Correction?

2012-10-10 Thread deniz
so other than commercial solutions, it seems like i need to have a plugin,
right? i couldn't find any open source solutions yet...



-
Zeki ama calismiyor... Calissa yapar...


Re: Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread Lance Norskog
Hapax legomena (terms with a DF of 1) are very often typos. You can automatically 
build a stopword file from these. If you want to be picky, you can keep only 
words that are a very small edit distance away from words with a much larger DF.
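
A rough sketch of the extraction (Lucene 3.x API; the index path and field name 
are made up):

import java.io.File;
import java.io.PrintWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

public class HapaxDump {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
    PrintWriter out = new PrintWriter("hapax-stopwords.txt", "UTF-8");
    TermEnum terms = reader.terms();
    while (terms.next()) {
      Term t = terms.term();
      // terms that occur in exactly one document are typo candidates
      if ("body".equals(t.field()) && terms.docFreq() == 1) {
        out.println(t.text());
      }
    }
    terms.close();
    out.close();
    reader.close();
  }
}

Eyeball the output before using it -- legitimate rare terms (names, part numbers, 
foreign words) will show up too.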

- Original Message -
| From: Robert Muir rcm...@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 5:40:23 PM
| Subject: Re: Using additional dictionary with DirectSolrSpellChecker
| 
| On Wed, Oct 10, 2012 at 9:02 AM, O. Klein kl...@octoweb.nl wrote:
|  I don't want to tweak the threshold. For majority of cases it works
|  fine.
| 
|  It's for cases where term has low frequency but is spelled
|  correctly.
| 
|  If you lower the threshold you would also get incorrect spelled
|  terms as
|  suggestions.
| 
| 
| Yeah there is no real magic here when the corpus contains typos. this
| existing docFreq heuristic was just borrowed from the old index-based
| spellchecker.
| 
| I do wonder if using # of occurrences (totalTermFreq) instead of # of
| documents with the term (docFreq) would improve the heuristic.
| 
| In all cases I think if you want to also integrate a dictionary or
| something, it seems like this could somehow be done with the
| File-based spellchecker?
| 


Re: segment number during optimize of index

2012-10-10 Thread Lance Norskog
Study index merging. This is awesome.
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Jame- opening lots of segments is not a problem. A major performance problem 
you will find is 'Large Pages'. This is an operating-system strategy for 
managing servers with 10s of gigabytes of memory. Without it, all large 
programs run much more slowly than they could. It is not a Solr or JVM problem.


- Original Message -
| From: jun Wang wangjun...@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 6:36:09 PM
| Subject: Re: segment number during optimize of index
| 
| I have an other question, does the number of segment affect speed for
| update index?
| 
| 2012/10/10 jame vaalet jamevaa...@gmail.com
| 
|  Guys,
|  thanks for all the inputs, I was continuing my research to know
|  more about
|  segments in Lucene. Below are my conclusion, please correct me if
|  am wrong.
| 
| 1. Segments are independent sub-indexes in seperate file, while
| indexing
| its better to create new segment as it doesnt have to modify an
| existing
| file. where as while searching, smaller the segment the better
| it is
|  since
| you open x (not exactly x but xn a value proportional to x)
| physical
|  files
| to search if you have got x segments in the index.
| 2. since lucene has memory map concept, for each file/segment in
| index a
| new m-map file is created and mapped to the physcial file in
| disk. Can
| someone explain or correct this in detail, i am sure there are
| lot many
| people wondering how m-map works while you merge or optimze
| index
|  segments.
| 
| 
| 
|  On 6 October 2012 07:41, Otis Gospodnetic
|  otis.gospodne...@gmail.com
|  wrote:
| 
|   If I were you and not knowing all your details...
|  
|   I would optimize indices that are static (not being modified) and
|   would optimize down to 1 segment.
|   I would do it when search traffic is low.
|  
|   Otis
|   --
|   Search Analytics -
|   http://sematext.com/search-analytics/index.html
|   Performance Monitoring - http://sematext.com/spm/index.html
|  
|  
|   On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet
|   jamevaa...@gmail.com
|  wrote:
|Hi Eric,
|I  am in a major dilemma with my index now. I have got 8 cores
|each
|   around
|300 GB in size and half of them are deleted documents in it and
|above
|   that
|each has got around 100 segments as well. Do i issue a
|expungeDelete
|  and
|allow the merge policy to take care of the segments or optimize
|them
|  into
|single segment. Search performance is not at par compared to
|usual solr
|speed.
|If i have to optimize what segment number should i choose? my
|RAM size
|around 120 GB and JVM heap is around 45 GB (oldGen being 30
|GB). Pleas
|advice !
|   
|thanks.
|   
|   
|On 6 October 2012 00:00, Erick Erickson
|erickerick...@gmail.com
|  wrote:
|   
|because eventually you'd run out of file handles. Imagine a
|long-running server with 100,000 segments. Totally
|unmanageable.
|   
|I think shawn was emphasizing that RAM requirements don't
|depend on the number of segments. There are other
|resources that file consume however.
|   
|Best
|Erick
|   
|On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet
|jamevaa...@gmail.com
|   wrote:
| hi Shawn,
| thanks for the detailed explanation.
| I have got one doubt, you said it doesn matter how many
| segments
|  index
|have
| but then why does solr has this merge policy which merges
| segments
| frequently?  why can it leave the segments as it is rather
| than
|   merging
| smaller one's into bigger one?
|
| thanks
| .
|
| On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org
| wrote:
|
| On 10/4/2012 3:22 PM, jame vaalet wrote:
|
| so imagine i have merged the 150 Gb index into single
| segment,
|  this
|would
| make a single segment of 150 GB in memory. When new docs
| are
|   indexed it
| wouldn't alter this 150 Gb index unless i update or delete
| the
|  older
|docs,
| right? will 150 Gb single segment have problem with memory
|  swapping
|   at
|OS
| level?
|
|
| Supplement to my previous reply:  the real memory mentioned
| in the
|   last
| paragraph does not include the memory that the OS uses to
| cache
|  disk
| access.  If more memory is needed and all the free memory
| is being
|   used
|by
| the disk cache, the OS will throw away part of the disk
| cache (a
| near-instantaneous operation that should never involve disk
| I/O)
|  and
|give
| that memory to the application that requests it.
|
| Here's a very good breakdown of how memory gets used with
|   MMapDirectory
|in
| Solr.  It's applicable to any program that uses