Re: How to patch Solr4.2 for SolrEnityProcessor Sub-Enity issue

2013-08-26 Thread Shalin Shekhar Mangar
You are right. The fix committed to source was not complete. I've
reopened SOLR-3336 and I will put up a test and fix.

https://issues.apache.org/jira/browse/SOLR-3336

On Mon, Aug 26, 2013 at 9:41 AM, harshchawla ha...@livecareer.com wrote:
 This is discussed in the second reply at this link, and moreover I am facing
 the same issue here:
 http://stackoverflow.com/questions/15734308/solrentityprocessor-is-called-only-once-for-sub-entities?lq=1.

 See attached my data-config.xml of the new core (let's say) test:
 <entity name="can" dataSource="dsms" query="select candidateid from Candidate c">
   <field column="CandidateID" name="candidateid" />
   <entity name="dt"
           processor="SolrEntityProcessor"
           url="http://localhost:8983/solr/csearch" query="candidateid:${can.CandidateID}" fl="*">
   </entity>
   <entity name="psu" dataSource="dsms"
           query="select Value from [CandidateData] where candidateid=${can.CandidateID}">
     <field column="Value" name="psu"/>
   </entity>
 </entity>

 Here only the first record gets parsed properly; for all the
 remaining records only two fields come through into the new core test, even
 though core csearch contains all the field values for all the records.

 I hope this clarifies my situation.






-- 
Regards,
Shalin Shekhar Mangar.


Re: How to patch Solr4.2 for SolrEnityProcessor Sub-Enity issue

2013-08-26 Thread harshchawla
Thanks a lot in advance. I am eagerly waiting for your response.





Re: custom names for replicas in solrcloud

2013-08-26 Thread YouPeng Yang
Hi  smanad

   If I am not mistaken, you can append the coreNodeName parameter to
your creation command:


http://10.7.23.125:8080/solr/admin/cores?action=CREATE&name=dfscore8_3&shard=shard3_3&collection.configName=myconf&schema=schema.xml&config=solrconfig3.xml&collection=collection1&dataDir=/soData/&coreNodeName=heihei

   Hope this helps.


Regards



2013/8/23 smanad sma...@gmail.com

 Hi,

 I am using Solr 4.3 with 3 Solr hosts and an external ZooKeeper
 ensemble of 3 servers. And just 1 shard currently.

 When I create collections using collections api it creates collections with
 names,
 collection1_shard1_replica1, collection1_shard1_replica2,
 collection1_shard1_replica3.
 Is there any way to pass a custom name? Or can I have all the replicas with
 the same name?

 Any pointers will be much appreciated.
 Thanks,
 -Manasi






SimpleFacet feature combinations..

2013-08-26 Thread Bram Van Dam

Hi folks,

Some of the features of SimpleFacet can't be combined -- the most
notable missing combination being range + pivot. Another combination
we'd find very useful is integration with the StatsComponent
(pivot/ranged stats).


Is anyone working on this? Or willing to work on this? This is a rather
important feature for us, one which we currently implement by launching
N+1 queries (or worse); a sketch of that workaround follows below. Given its
importance, I would be willing and able to donate some of my time to work on
this. However, not being very familiar with the Solr internals, it would
probably be easier to team up with someone else on this?
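For the record, a minimal SolrJ sketch of that N+1 workaround (one stats request per range bucket); the field names (price, quantity), bucket bounds, and server URL are assumptions for illustration only:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FieldStatsInfo;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RangeStatsWorkaround {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    int[] bounds = {0, 100, 200, 300};
    // One stats query per bucket: N buckets means N requests on top of the main query.
    for (int i = 0; i < bounds.length - 1; i++) {
      SolrQuery q = new SolrQuery("*:*");
      // Half-open range per bucket; use inclusive bounds if your Solr version
      // does not accept mixed brackets.
      q.addFilterQuery("price:[" + bounds[i] + " TO " + bounds[i + 1] + "}");
      q.set("stats", true);
      q.set("stats.field", "quantity");
      q.setRows(0); // we only want the stats, not documents
      QueryResponse rsp = server.query(q);
      FieldStatsInfo stats = rsp.getFieldStatsInfo().get("quantity");
      System.out.println(bounds[i] + " to " + bounds[i + 1] + ": mean = " + stats.getMean());
    }
  }
}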


If anyone is interested, feel free to get in touch.

 - Bram


Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Furkan KAMACI
Hi Walter;

You are right about performance. However, when I index documents on a
machine that has a high percentage of physical memory usage I get EOF
errors.


2013/8/26 Walter Underwood wun...@wunderwood.org

 On Aug 25, 2013, at 1:41 PM, Furkan KAMACI wrote:

  Sometimes physical memory usage of Solr is over 99% and this may cause
  problems. Do you run this kind of command periodically:
 
  sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
 
  to force dropping the caches of the machine that Solr runs on and avoid problems?


 This is a terrible idea. The OS automatically manages the file buffers.
 When they are all used, that is a good thing, because it reduces disk IO.

 After this, no files will be cached in RAM. Every single read from a file
 will have to go to disk. This will cause very slow performance until the
 files are recached.

 Recently, I did exactly the opposite to improve performance in our Solr
 installation. Before starting the Solr process, a script reads every file
 in the index so that it will already be in file buffers. This avoids
 several minutes of high disk IO and slow performance after startup.
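 A minimal sketch of that warming idea, assuming a small Java helper run before startup; the index path is illustrative only (a one-line shell read of the index files does the same job):

 import java.io.File;
 import java.io.FileInputStream;

 public class WarmIndex {
   public static void main(String[] args) throws Exception {
     // Read every file in the index directory once so the OS file buffers
     // are populated before Solr starts serving queries.
     File dir = new File("/var/solr/data/index");
     byte[] buf = new byte[1 << 20];
     for (File f : dir.listFiles()) {
       if (!f.isFile()) continue;
       FileInputStream in = new FileInputStream(f);
       try {
         while (in.read(buf) != -1) { /* discard; we only want the pages cached */ }
       } finally {
         in.close();
       }
     }
   }
 }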

 wunder
 Search Guy, Chegg.com





Re: Tokenization at query time

2013-08-26 Thread Andrea Gazzarini

Hi Erick,
escaping spaces doesn't work...

Briefly,

- In a document I have an ISBN field whose stored value is
978-90-04-23560-1

- In the index I have this value: 9789004235601

Now, I want to be able to search the document by using:

1) q=978-90-04-23560-1
2) q=978 90 04 23560 1
3) q=9789004235601

1 and 3 work perfectly, 2 doesn't.

My code is:

SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));


isbn is declared in this way:

<fieldtype name="isbn_issn" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateWords="0"
            catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
  </analyzer>
</fieldtype>
<field name="isbn_issn_search" type="issn_isbn" indexed="true"/>

The search handler is:

<requestHandler name="any_bc" class="solr.SearchHandler"
                default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="mm">100%</str>
    <str name="qf">isbn_issn_search^1</str>
    <str name="pf">isbn_issn_search^10</str>
    <int name="ps">0</int>
    <float name="tie">0.1</float>
    ...
</requestHandler>

This is what I get:

1) 978-90-04-23560-1
path=/select params={start=0&q=978\-90\-04\-23560\-1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

2) 9789004235601
webapp=/solr path=/select params={start=0&q=9789004235601&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

3) 978 90 04 23560 1
path=/select params={start=0&q=978\+90\+04\+23560\+1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=0 status=0 QTime=2


Extract from debugQuery=true:

<str name="q">978\ 90\ 04\ 23560\ 1</str>
...
<str name="rawquerystring">978\ 90\ 04\ 23560\ 1</str>
<str name="querystring">978\ 90\ 04\ 23560\ 1</str>
...
<str name="parsedquery">
+((DisjunctionMaxQuery((isbn_issn_search:978^1.0)~0.1)
DisjunctionMaxQuery((isbn_issn_search:90^1.0)~0.1)
DisjunctionMaxQuery((isbn_issn_search:04^1.0)~0.1)
DisjunctionMaxQuery((isbn_issn_search:23560^1.0)~0.1)
DisjunctionMaxQuery((isbn_issn_search:1^1.0)~0.1))~5)
DisjunctionMaxQuery((isbn_issn_search:9789004235601^10.0)~0.1)
</str>


Probably this is a very stupid question but I'm going crazy. On this page

http://wiki.apache.org/solr/DisMaxQParserPlugin

under "Query Structure":

"For each word in the query string, dismax builds a
DisjunctionMaxQuery object for that word across all of the fields in the
qf param..."

And that seems exactly what it is doing... but what is a "word"? How can I
force (without using double quotes) spaces to be considered part of the
word?


/Many many many thanks
Andrea

On 08/13/2013 04:18 PM, Erick Erickson wrote:

I think you can get what you want by escaping the space with a backslash

YMMV of course.
Erick


On Tue, Aug 13, 2013 at 9:11 AM, Andrea Gazzarini 
andrea.gazzar...@gmail.com wrote:


Hi Erick,
sorry if that wasn't clear: this is what I'm actually observing in my
application.

I wrote the first post after looking at the explain (debugQuery=true): the
query

q=mag 778 G 69

is translated as follows:

+((DisjunctionMaxQuery((myfield:mag^3000.0)~0.1)
   DisjunctionMaxQuery((myfield:778^3000.0)~0.1)
   DisjunctionMaxQuery((myfield:g^3000.0)~0.1)
   DisjunctionMaxQuery((myfield:69^3000.0)~0.1))~4)
DisjunctionMaxQuery((myfield:mag778g69^3.0)~0.1)

It seems that although I declare myfield with this type

<fieldtype name="type1" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0"
            catenateAll="1" splitOnCaseChange="0" />
  </analyzer>
</fieldtype>

Solr is tokenizing it, therefore producing several tokens
(mag, 778, g, 69).
/

And I can't put double quotes around the query (q="mag 778 G 69") because the
request handler searches also in other fields (with different configuration
chains).

As I understand it, the query parser (i.e., at query time) does whitespace
tokenization on its own before invoking my (query-time) chain. The same
doesn't happen at index time... this is my problem, because at index time
the field is analyzed exactly as I want, but unfortunately I cannot say the
same at query time.

Sorry for my wonderful English, did you get the point?


On 08/13/2013 02:18 PM, Erick Erickson wrote:


On a quick scan I don't see a problem here. Attach
debug=query to your URL and that'll show you the
parsed query, which will in turn show you what's been
pushed ...

Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-26 Thread skorrapa
Hello All,

I am still facing the same issue. Case-insensitive search is not working on
Solr 4.3.
I am using the below configuration in schema.xml:

<fieldType name="string_lower_case" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="select">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Basically I want my string, which could have spaces or characters like '-' or
'\', to be searched upon case-insensitively.
Please help.






RE: Caused by: java.net.SocketException: Connection reset by peer: socket write error solr querying

2013-08-26 Thread aniljayanti
Hi Greg,

thanks for the reply,

I tried setting maxIdleTime to 30 milliseconds, but I am still getting the
same error.

WARN  - 2013-08-26 09:44:29.058; org.eclipse.jetty.server.Response;
Committed before 500 {msg=Connection reset by peer: socket write
error,trace=org.eclipse.jetty.io.EofException
at 
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
at
org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:507)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:170)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:137)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:648)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:375)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Connection reset by peer: socket write
error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at 
org.eclipse.jetty.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:375)
at 
org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:164)
at 
org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:182)
at 
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:841)
... 37 more
,code=500}
WARN  - 2013-08-26 09:44:29.060; org.eclipse.jetty.servlet.ServletHandler;
/solr/324/select
java.lang.IllegalStateException: Committed
at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1144)
at org.eclipse.jetty.server.Response.sendError(Response.java:314)
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:695)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at

ERROR org.apache.solr.update.CommitTracker – auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher

2013-08-26 Thread zhaoxin
470665 [commitScheduler-14-thread-1] ERROR
org.apache.solr.update.CommitTracker  – auto commit
error...:org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1522)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1634)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:574)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ClassCastException





Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-26 Thread skorrapa
I have also re-indexed the data and tried. And also tried with the below:

<fieldType name="string_lower_case" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="select">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This didn't work as well...



On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] 
ml-node+s472066n4086601...@n3.nabble.com wrote:

 Hello All,

 I am still facing the same issue. Case-insensitive search is not working on
 Solr 4.3.
 I am using the below configuration in schema.xml:

 <fieldType name="string_lower_case" class="solr.TextField"
            sortMissingLast="true" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="select">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Basically I want my string, which could have spaces or characters like '-'
 or '\', to be searched upon case-insensitively.
 Please help.








Re: Tokenization at query time

2013-08-26 Thread Erick Erickson
Andrea:

Works for me, admittedly through the browser

I suspect the problem is here: ClientUtils.escapeQueryChars

That doesn't do anything about escaping the spaces, it just handles
characters that have special meaning to the query syntax, things like +, -,
etc.

Using your field definition, this:

http://localhost:8983/solr/select?wt=json&q=ab\ cd\ ef&debug=query&defType=edismax&qf=name eoe

produced this output:

parsedquery_toString: +(eoe:abcdef | (name:ab name:cd name:ef)),

where the field eoe is your isbn_issn type.

Best,
Erick


On Mon, Aug 26, 2013 at 4:55 AM, Andrea Gazzarini 
andrea.gazzar...@gmail.com wrote:

 Hi Erick,
 escaping spaces doesn't work...

 Briefly,

 - In a document I have an ISBN field whose stored value is
 978-90-04-23560-1
 - In the index I have this value: 9789004235601

 Now, I want to be able to search the document by using:

 1) q=978-90-04-23560-1
 2) q=978 90 04 23560 1
 3) q=9789004235601

 1 and 3 work perfectly, 2 doesn't.

 My code is:

 SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));

 isbn is declared in this way:

 <fieldtype name="isbn_issn" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="0" generateNumberParts="0" catenateWords="0"
             catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
   </analyzer>
 </fieldtype>
 <field name="isbn_issn_search" type="issn_isbn" indexed="true"/>

 The search handler is:

 <requestHandler name="any_bc" class="solr.SearchHandler"
                 default="true">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="mm">100%</str>
     <str name="qf">isbn_issn_search^1</str>
     <str name="pf">isbn_issn_search^10</str>
     <int name="ps">0</int>
     <float name="tie">0.1</float>
     ...
 </requestHandler>

 This is what I get:

 1) 978-90-04-23560-1
 path=/select params={start=0&q=978\-90\-04\-23560\-1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

 2) 9789004235601
 webapp=/solr path=/select params={start=0&q=9789004235601&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

 3) 978 90 04 23560 1
 path=/select params={start=0&q=978\+90\+04\+23560\+1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=0 status=0 QTime=2

 Extract from debugQuery=true:

 <str name="q">978\ 90\ 04\ 23560\ 1</str>
 ...
 <str name="rawquerystring">978\ 90\ 04\ 23560\ 1</str>
 <str name="querystring">978\ 90\ 04\ 23560\ 1</str>
 ...
 <str name="parsedquery">
 +((DisjunctionMaxQuery((isbn_issn_search:978^1.0)~0.1)
 DisjunctionMaxQuery((isbn_issn_search:90^1.0)~0.1)
 DisjunctionMaxQuery((isbn_issn_search:04^1.0)~0.1)
 DisjunctionMaxQuery((isbn_issn_search:23560^1.0)~0.1)
 DisjunctionMaxQuery((isbn_issn_search:1^1.0)~0.1))~5)
 DisjunctionMaxQuery((isbn_issn_search:9789004235601^10.0)~0.1)
 </str>

 ----
 Probably this is a very stupid question but I'm going crazy. On this page

 http://wiki.apache.org/solr/DisMaxQParserPlugin

 under "Query Structure":

 "For each word in the query string, dismax builds a DisjunctionMaxQuery
 object for that word across all of the fields in the qf param..."

 And that seems exactly what it is doing... but what is a "word"? How can I
 force (without using double quotes) spaces to be considered part of the
 word?

 Many many many thanks
 Andrea


 On 08/13/2013 04:18 PM, Erick Erickson wrote:

 I think you can get what you want by escaping the space with a
 backslash

 YMMV of course.
 Erick


 On Tue, Aug 13, 2013 at 9:11 AM, Andrea Gazzarini 
 andrea.gazzar...@gmail.com wrote:

  Hi Erick,
 sorry if that wasn't clear: this is what I'm actually observing in my
 application.

 I wrote the first post after looking at the explain (debugQuery=true):
 the query

 q=mag 778 G 69

 is translated as follows:

 +((DisjunctionMaxQuery((myfield:mag^3000.0)~0.1)
    DisjunctionMaxQuery((myfield:778^3000.0)~0.1)
    DisjunctionMaxQuery((myfield:g^3000.0)~0.1)
    DisjunctionMaxQuery((myfield:69^3000.0)~0.1))~4)
 DisjunctionMaxQuery((myfield:mag778g69^3.0)~0.1)

 It seems that although I declare myfield with this type

 <fieldtype name="type1" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="0" generateNumberParts="0"
             catenateWords="0" catenateNumbers="0" catenateAll="1"

Re: Tokenization at query time

2013-08-26 Thread Andrea Gazzarini

Hi Erick,
sorry, I forgot the Solr version... it's 3.6.0.

ClientUtils in that version does whitespace escaping:

  public static String escapeQueryChars(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      // These characters are part of the query syntax and must be escaped
      if (c == '\\' || c == '+' || c == '-' || c == '!'  || c == '(' || c == ')' || c == ':'
        || c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || c == '}' || c == '~'
        || c == '*' || c == '?' || c == '|' || c == '&'  || c == ';'
        || Character.isWhitespace(c)) {
        sb.append('\\');
      }
      sb.append(c);
    }
    return sb.toString();
  }

Now, I solved the issue but I'm not really sure about the fix.

Debugging the code I saw that the query string (on the SearchHandler)

978\ 90\ 04\ 23560\ 1

once passed through DismaxQueryParser (specifically through
SolrPluginUtils.partialEscape(CharSequence))

becomes

978\\ 90\\ 04\\ 23560\\ 1

because that method escapes the backslashes.
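A minimal sketch of that double escaping, assuming the Solr 3.6 solrj and core jars on the classpath (the input string is the one from this thread):

import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.util.SolrPluginUtils;

public class DoubleEscapeDemo {
  public static void main(String[] args) {
    // ClientUtils escapes the spaces once...
    String escaped = ClientUtils.escapeQueryChars("978 90 04 23560 1");
    System.out.println(escaped);                                // 978\ 90\ 04\ 23560\ 1
    // ...then dismax's partialEscape escapes the backslashes themselves.
    System.out.println(SolrPluginUtils.partialEscape(escaped)); // 978\\ 90\\ 04\\ 23560\\ 1
  }
}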

So, using the Eclipse debugger I removed the additional backslash at runtime
and it works perfectly, but of course... I can't do that in production for
every search :D

So, just to try, I changed dismax to edismax which, I saw, doesn't call
SolrPluginUtils... and it works perfectly!

I saw in your query string that you used edismax too... maybe that is the point?

Many thanks
Andrea

On 08/26/2013 02:47 PM, Erick Erickson wrote:

Andrea:

Works for me, admittedly through the browser

I suspect the problem is here: ClientUtils.escapeQueryChars

That doesn't do anything about escaping the spaces, it just handles
characters that have special meaning to the query syntax, things like +, -,
etc.

Using your field definition, this:

http://localhost:8983/solr/select?wt=json&q=ab\ cd\ ef&debug=query&defType=edismax&qf=name eoe

produced this output:

parsedquery_toString: +(eoe:abcdef | (name:ab name:cd name:ef)),


where the field eoe is your isbn_issn type.

Best,
Erick


On Mon, Aug 26, 2013 at 4:55 AM, Andrea Gazzarini 
andrea.gazzar...@gmail.com wrote:


Hi Erick,
escaping spaces doesn't work...

Briefly,

 - In a document I have an ISBN field whose stored value is
 978-90-04-23560-1
 - In the index I have this value: 9789004235601

 Now, I want to be able to search the document by using:

 1) q=978-90-04-23560-1
 2) q=978 90 04 23560 1
 3) q=9789004235601

 1 and 3 work perfectly, 2 doesn't.

 My code is:

 SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));

 isbn is declared in this way:

 <fieldtype name="isbn_issn" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="0" generateNumberParts="0" catenateWords="0"
             catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
   </analyzer>
 </fieldtype>
 <field name="isbn_issn_search" type="issn_isbn" indexed="true"/>

 The search handler is:

 <requestHandler name="any_bc" class="solr.SearchHandler"
                 default="true">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="mm">100%</str>
     <str name="qf">isbn_issn_search^1</str>
     <str name="pf">isbn_issn_search^10</str>
     <int name="ps">0</int>
     <float name="tie">0.1</float>
     ...
 </requestHandler>

 This is what I get:

 1) 978-90-04-23560-1
 path=/select params={start=0&q=978\-90\-04\-23560\-1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

 2) 9789004235601
 webapp=/solr path=/select params={start=0&q=9789004235601&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

 3) 978 90 04 23560 1
 path=/select params={start=0&q=978\+90\+04\+23560\+1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=0 status=0 QTime=2

 Extract from debugQuery=true:

 <str name="q">978\ 90\ 04\ 23560\ 1</str>
 ...
 <str name="rawquerystring">978\ 90\ 04\ 23560\ 1</str>
 <str name="querystring">978\ 90\ 04\ 23560\ 1</str>
 ...
 <str name="parsedquery">
 +((DisjunctionMaxQuery((isbn_issn_search:978^1.0)~0.1)
 DisjunctionMaxQuery((isbn_issn_search:90^1.0)~0.1)
 DisjunctionMaxQuery((isbn_issn_search:04^1.0)~0.1)
 DisjunctionMaxQuery((isbn_issn_search:23560^1.0)~0.1)
 DisjunctionMaxQuery((isbn_issn_search:1^1.0)~0.1))~5)
 DisjunctionMaxQuery((isbn_issn_search:9789004235601^10.0)~0.1)
 </str>

 ----
 Probably this is a very stupid question but I'm going crazy. On this page

 http://wiki.apache.org/solr/DisMaxQParserPlugin

 under "Query Structure":

 "For each word in the query string, dismax builds a DisjunctionMaxQuery
 object for that word across all of the fields in the

adding support for deleteInstanceDir from solrj

2013-08-26 Thread Lyuba Romanchuk
Hi all,

Did anyone have a chance to look at the code?

It's attached here: https://issues.apache.org/jira/browse/SOLR-5023.



Thank you very much.

Lyuba


Re: Adding one core to an existing core?

2013-08-26 Thread Bruno Mannina

Dear Solr User,

now I have 2 cores: collection1 and collection2.

The default collection is collection1.

I have two questions:

- Does a parameter exist that I can add to my HTML link to indicate the
selected core?

http://xxx.xxx.xxx.xxx/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

I mean, by default it is collection1; if I want collection2 I use the
link:

http://xxx.xxx.xxx.xxx/solr/collection2/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

Does a param like core=collection2 exist, instead of using a different link?


- My second question concerns updating.
Currently, with one core, I do:
java -jar post.jar foo.xml

I suppose now I must specify the desired core, no?
i.e.: -Dcore=collection2

What is the param to add in my command line?
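If this is the stock example post.jar, a hedged sketch of what usually works is pointing it at the core's update URL via the url system property (host and port here are assumptions):

java -Durl=http://localhost:8983/solr/collection2/update -jar post.jar foo.xml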

Thanks a lot !

Bruno





On 22/08/2013 16:23, Andrea Gazzarini wrote:
First, a core is a separate index, so it is completely independent from
the already existing core(s). So basically you don't need to reindex.


In order to have two cores (the same applies for n cores): you
must have in your solr.home the file (solr.xml) described here


http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29

then you must obviously have one or two directories (corresponding to
the instanceDir attribute). I said one or two because if the indexes'
configuration is basically the same (or something changes but is
dynamically configured - i.e. core name) you can create two instances
starting from the same configuration. I mean:


<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="conf.dir" />
    <core name="core1" instanceDir="conf.dir" />
  </cores>
</solr>

Otherwise you must have two different conf directories that contain the
indexes' configuration. You should already have a first one (the
current core); you just need another conf dir with
solrconfig.xml, schema.xml and the other required files. In this case each
core will have its own instanceDir.


<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="conf.dir.core0" />
    <core name="core1" instanceDir="conf.dir.core1" />
  </cores>
</solr>

Best,
Andrea



On 08/22/2013 04:04 PM, Bruno Mannina wrote:

Little precision, I'm on Ubuntu 12.04LTS

On 22/08/2013 15:56, Bruno Mannina wrote:

Dear Users,

(Solr3.6 + Tomcat7)

I have been using Solr with one core for two years; I would now like to
add another core (a new database).


Can I do this without re-indexing my core1?
Could you point me to a good tutorial for doing that?

(My current database is around 200 GB for 86,000,000 docs.)
My new database will be small, around 1000 documents of 5 KB each.

thanks a lot,
Bruno












Re: Different Responses for 4.4 and 3.5 solr index

2013-08-26 Thread Stefan Matheis
Did you check the scoring? (Use fl=*,score to retrieve it.) Additionally,
debugQuery=true might provide more information about how the score was
calculated.
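For example, with the query from this thread (host, port, and core are assumptions):

http://localhost:8983/solr/select?q=Apple&fl=*,score&debugQuery=true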

- Stefan 


On Monday, August 26, 2013 at 12:46 AM, Kuchekar wrote:

 Hi,
 The response from 4.4 and 3.5 in the current scenario differs in the
 sequence in which results are given back to us.
 
 For example :
 
 Response from 3.5 solr is : id:A, id:B, id:C, id:D ...
 Response from 4.4 solr is : id C, id:A, id:D, id:B...
 
 Looking forward your reply.
 
 Thanks.
 Kuchekar, Nilesh
 
 
 On Sun, Aug 25, 2013 at 11:32 AM, Stefan Matheis
 matheis.ste...@gmail.com (mailto:matheis.ste...@gmail.com)wrote:
 
  Kuchekar (hope that's your first name?)
  
   you didn't tell us how they differ: do you get an actual error? or does
   the result contain documents you didn't expect? or the other way round,
   that some are missing you'd expect to be there?
  
  - Stefan
  
  
  On Sunday, August 25, 2013 at 4:43 PM, Kuchekar wrote:
  
   Hi,
   
   We get different response when we query 4.4 and 3.5 solr using same
   query params.
   
   My query param are as following :
   
   facet=true
   facet.mincount=1
   facet.limit=25
   
  
  qf=content^0.0+p_last_name^500.0+p_first_name^50.0+strong_topic^0.0+first_author_topic^0.0+last_author_topic^0.0+title_topic^0.0
   wt=javabin
   version=2
   rows=10
   f.affiliation_org.facet.limit=150
   fl=p_id,p_first_name,p_last_name
   start=0
   q=Apple
   facet.field=affiliation_org
   fq=table:profile
   fq=num_content:[*+TO+1500]
   fq=name:Apple
   
    The content in both (Solr 4.4 and Solr 3.5) is the same.
    
    The solrconfig.xml from 3.5 and 4.4 are similarly constructed.
    
    Is there something I am missing that might have changed in 4.4,
    which might be causing this issue? The qf params look the same.
   
   Looking forward for your reply.
   
   Thanks.
   Kuchekar, Nilesh
   
  
  
 
 
 




Default query operator OR wont work in some cases

2013-08-26 Thread smanad
Hi, 

I have some documents with keyword "egg", some with "salad", and some
with "egg salad".
When I search for egg salad, I expect to see egg results + salad results. I
don't see them.
The egg and salad queries individually work fine.
I am using WhitespaceTokenizer.

Not sure if I am missing something.
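One way to see what is happening is to inspect the parsed query with debugQuery; a sketch, with host and core assumed:

http://localhost:8983/solr/select?q=egg salad&q.op=OR&debugQuery=true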
Thanks, 
-Manasi 
 





Re: custom names for replicas in solrcloud

2013-08-26 Thread smanad
Is coreNodeName exposed via collections api?





Re: Tokenization at query time

2013-08-26 Thread Erick Erickson
right, edismax is much preferred; dismax hasn't been formally deprecated,
but almost nobody uses it...

I'd be really careful about adding whitespace to the list of escape chars
because it changes the semantics of the search. While it'll work for this
specific case, if you use it in other cases it will change the sense of the
query. This may be OK, but be careful, it might be better to do this
specifically on an as-needed basis...
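A sketch of such an as-needed variant, reusing the SolrJ line from this thread: leave the escaping as-is and override defType only on the requests that need the ISBN behaviour (edismax per the discussion above; everything else is Andrea's code):

SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));
query.set("defType", "edismax"); // per-request override instead of editing solrconfig.xml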

But you know your problem space best

Best,
Erick


On Mon, Aug 26, 2013 at 9:04 AM, Andrea Gazzarini 
andrea.gazzar...@gmail.com wrote:

 Hi Erick,
 sorry I forgot the SOLR version...is the 3.6.0

 ClientUtils in that version does whitespace escaping:

   public static String escapeQueryChars(String s) {
     StringBuilder sb = new StringBuilder();
     for (int i = 0; i < s.length(); i++) {
       char c = s.charAt(i);
       // These characters are part of the query syntax and must be escaped
       if (c == '\\' || c == '+' || c == '-' || c == '!'  || c == '(' || c == ')' || c == ':'
         || c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || c == '}' || c == '~'
         || c == '*' || c == '?' || c == '|' || c == '&'  || c == ';'
         || Character.isWhitespace(c)) {
         sb.append('\\');
       }
       sb.append(c);
     }
     return sb.toString();
   }

 Now, I solved the issue but I'm not really sure about the fix.

 Debugging the code I saw that the query string (on the SearchHandler)

 978\ 90\ 04\ 23560\ 1

 once passed through DismaxQueryParser (specifically through
 SolrPluginUtils.partialEscape(CharSequence))

 becomes

 978\\ 90\\ 04\\ 23560\\ 1

 because that method escapes the backslashes.

 So, using the Eclipse debugger I removed the additional backslash at
 runtime and it works perfectly, but of course... I can't do that in
 production for every search :D

 So, just to try, I changed dismax to edismax which, I saw, doesn't call
 SolrPluginUtils... and it works perfectly!

 I saw in your query string that you used edismax too... maybe that is the
 point?

 Many thanks
 Andrea


 On 08/26/2013 02:47 PM, Erick Erickson wrote:

 Andrea:

 Works for me, admittedly through the browser

 I suspect the problem is here: ClientUtils.escapeQueryChars

 That doesn't do anything about escaping the spaces, it just handles
 characters that have special meaning to the query syntax, things like +,
 -, etc.

 Using your field definition, this:

 http://localhost:8983/solr/select?wt=json&q=ab\ cd\ ef&debug=query&defType=edismax&qf=name eoe

 produced this output:

 parsedquery_toString: +(eoe:abcdef | (name:ab name:cd name:ef)),



 where the field eoe is your isbn_issn type.

 Best,
 Erick


 On Mon, Aug 26, 2013 at 4:55 AM, Andrea Gazzarini 
 andrea.gazzar...@gmail.com wrote:

  Hi Erick,
 escaping spaces doesn't work...

 Briefly,

 - In a document I have an ISBN field whose stored value is
 978-90-04-23560-1
 - In the index I have this value: 9789004235601

 Now, I want to be able to search the document by using:

 1) q=978-90-04-23560-1
 2) q=978 90 04 23560 1
 3) q=9789004235601

 1 and 3 work perfectly, 2 doesn't.

 My code is:

 SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));

 isbn is declared in this way:

 <fieldtype name="isbn_issn" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="0" generateNumberParts="0" catenateWords="0"
             catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
   </analyzer>
 </fieldtype>
 <field name="isbn_issn_search" type="issn_isbn" indexed="true"/>

 The search handler is:

 <requestHandler name="any_bc" class="solr.SearchHandler"
                 default="true">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="mm">100%</str>
     <str name="qf">isbn_issn_search^1</str>
     <str name="pf">isbn_issn_search^10</str>
     <int name="ps">0</int>
     <float name="tie">0.1</float>
     ...
   </requestHandler>

 This is what I get:

 1) 978-90-04-23560-1
 path=/select params={start=0&q=978\-90\-04\-23560\-1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

 2) 9789004235601
 webapp=/solr path=/select params={start=0&q=9789004235601&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

 3) 978 90 04 23560 1
 path=/select params={start=0&q=978\+90\+04\+23560\+1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=0 status=0 QTime=2


 Extract from debugQuery=true:

 <str name="q">978\ 90\ 04\ 23560\ 1</str>
 ...
 <str name="rawquerystring">978\ 90\ 04\ 23560\ 1</str>
 <str name="querystring">978\ 90\ 04\ 23560\ 1</str>
 ...
 <str name="parsedquery">

Re: Tokenization at query time

2013-08-26 Thread Andrea Gazzarini

On 08/26/2013 04:09 PM, Erick Erickson wrote:

right, edismax is much preferred; dismax hasn't been formally deprecated,
but almost nobody uses it...

Good to know... I basically use dismax in ALL my Solr instances :D

I'd be really careful about adding whitespace to the list of escape chars
because it changes the semantics of the search. While it'll work for this
specific case, if you use it in other cases it will change the sense of the
query. This may be OK, but be careful, it might be better to do this
specifically on an as-needed basis...
Yes, that's the reason why I'm not really sure about what I did... I'm
running my regression tests... all seems green... let's see.

But you know your problem space best

Best,
Erick

Thank you very much

Best,
Gazza



On Mon, Aug 26, 2013 at 9:04 AM, Andrea Gazzarini 
andrea.gazzar...@gmail.com wrote:


Hi Erick,
sorry I forgot the SOLR version...is the 3.6.0

ClientUtils in that version does whitespace escaping:

  public static String escapeQueryChars(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      // These characters are part of the query syntax and must be escaped
      if (c == '\\' || c == '+' || c == '-' || c == '!'  || c == '(' || c == ')' || c == ':'
        || c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || c == '}' || c == '~'
        || c == '*' || c == '?' || c == '|' || c == '&'  || c == ';'
        || Character.isWhitespace(c)) {
        sb.append('\\');
      }
      sb.append(c);
    }
    return sb.toString();
  }

Now, I solved the issue but I'm not really sure about the fix.

Debugging the code I saw that the query string (on the SearchHandler)

978\ 90\ 04\ 23560\ 1

once passed through DismaxQueryParser (specifically through
SolrPluginUtils.partialEscape(CharSequence))

becomes

978\\ 90\\ 04\\ 23560\\ 1

because that method escapes the backslashes.

So, using the Eclipse debugger I removed the additional backslash at
runtime and it works perfectly, but of course... I can't do that in
production for every search :D

So, just to try, I changed dismax to edismax which, I saw, doesn't call
SolrPluginUtils... and it works perfectly!

I saw in your query string that you used edismax too... maybe that is the
point?

Many thanks
Andrea


On 08/26/2013 02:47 PM, Erick Erickson wrote:


Andrea:

Works for me, admittedly through the browser

I suspect the problem is here: ClientUtils.escapeQueryChars

That doesn't do anything about escaping the spaces, it just handles
characters that have special meaning to the query syntax, things like +,
-, etc.

Using your field definition, this:

http://localhost:8983/solr/select?wt=json&q=ab\ cd\ ef&debug=query&defType=edismax&qf=name eoe

produced this output:

parsedquery_toString: +(eoe:abcdef | (name:ab name:cd name:ef)),



where the field eoe is your isbn_issn type.

Best,
Erick


On Mon, Aug 26, 2013 at 4:55 AM, Andrea Gazzarini 
andrea.gazzar...@gmail.com wrote:

  Hi Erick,

escaping spaces doesn't work...

Briefly,

 - In a document I have an ISBN field whose stored value is
 978-90-04-23560-1
 - In the index I have this value: 9789004235601

 Now, I want to be able to search the document by using:

 1) q=978-90-04-23560-1
 2) q=978 90 04 23560 1
 3) q=9789004235601

 1 and 3 work perfectly, 2 doesn't.

 My code is:

 SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));

 isbn is declared in this way:

 <fieldtype name="isbn_issn" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="0" generateNumberParts="0" catenateWords="0"
             catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
   </analyzer>
 </fieldtype>
 <field name="isbn_issn_search" type="issn_isbn" indexed="true"/>

 The search handler is:

 <requestHandler name="any_bc" class="solr.SearchHandler"
                 default="true">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="mm">100%</str>
     <str name="qf">isbn_issn_search^1</str>
     <str name="pf">isbn_issn_search^10</str>
     <int name="ps">0</int>
     <float name="tie">0.1</float>
     ...
   </requestHandler>

 This is what I get:

 1) 978-90-04-23560-1
 path=/select params={start=0&q=978\-90\-04\-23560\-1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

 2) 9789004235601
 webapp=/solr path=/select params={start=0&q=9789004235601&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

 3) 978 90 04 23560 1
 path=/select params={start=0&q=978\+90\+04\+23560\+1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=0 status=0 QTime=2


 Extract from

autoCommit and autoSoftCommit

2013-08-26 Thread Bryan Bende
I'm running Solr 4.3 with:

<autoCommit>
  <maxTime>6</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>

When I start Solr and send in a couple of hundred documents, I am able to
retrieve documents after 5 seconds using SolrJ. However, from the Solr
admin console if I query for *:* it will show that there are docs in the
numFound attribute, but none of the results have the stored fields present.

As a test I also tried modifying the autoCommit to add maxDocs like this:

<autoCommit>
  <maxDocs>100</maxDocs>
  <maxTime>6</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

It seems like with this configuration something different happens... if I
send in 150 docs then the first 100 will show up correctly through the Solr
admin, but the last 50 that didn't hit the maxDocs threshold still don't
show the stored fields.

Is it expected that maxDocs and maxTime do something different when
committing?

If using autoCommit with openSearcher=false and autoSoftCommit, does the
client ever have to send a hard commit with openSearcher=true ?
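For reference, a sketch of what an explicit hard commit that opens a searcher looks like from SolrJ 4.x, where the signature is commit(waitFlush, waitSearcher, softCommit); the URL is an assumption:

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ExplicitCommit {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    server.commit(true, true, false); // hard commit; openSearcher defaults to true
  }
}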

- Bryan


Can a data import handler grab all pages of an RSS feed?

2013-08-26 Thread eShard
Good morning,
I have an IBM Portal atom feed that spans multiple pages.
Is there a way to instruct the DIH to grab all available pages?
I can put a huge range in but that can be extremely slow with large amounts
of XML data.
I'm currently using Solr 4.0 final.

Thanks,





Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Walter Underwood
What is the precise error? What kind of machine?

File buffers are a robust part of the OS. Unix has had file buffer caching for 
decades.

wunder

On Aug 26, 2013, at 1:37 AM, Furkan KAMACI wrote:

 Hi Walter;
 
 You are right about performance. However, when I index documents on a
 machine that has a high percentage of physical memory usage I get EOF
 errors.
 
 
 2013/8/26 Walter Underwood wun...@wunderwood.org
 
 On Aug 25, 2013, at 1:41 PM, Furkan KAMACI wrote:
 
 Sometimes physical memory usage of Solr is over 99% and this may cause
 problems. Do you run this kind of command periodically:
 
 sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
 
 to force dropping the caches of the machine that Solr runs on and avoid problems?
 
 
 This is a terrible idea. The OS automatically manages the file buffers.
 When they are all used, that is a good thing, because it reduces disk IO.
 
 After this, no files will be cached in RAM. Every single read from a file
 will have to go to disk. This will cause very slow performance until the
 files are recached.
 
 Recently, I did exactly the opposite to improve performance in our Solr
 installation. Before starting the Solr process, a script reads every file
 in the index so that it will already be in file buffers. This avoids
 several minutes of high disk IO and slow performance after startup.
 
 wunder
 Search Guy, Chegg.com
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: custom names for replicas in solrcloud

2013-08-26 Thread Jack Krupansky

No, it is part of the core admin API.


-- Jack Krupansky

-----Original Message-----
From: smanad

Sent: Monday, August 26, 2013 10:02 AM
To: solr-user@lucene.apache.org
Subject: Re: custom names for replicas in solrcloud

Is coreNodeName exposed via collections api?






RE: Caused by: java.net.SocketException: Connection reset by peer: socket write error solr querying

2013-08-26 Thread Greg Walters
AnilJayanti,

Have you checked your entire stack, from the client all the way to Solr, along
with anything between them? Your timeout values should match everywhere, and if
there's something between the client and server that will time out before either
the client or server does, it'll cause that error as well.

A quick google search shows similar causes:
http://stackoverflow.com/questions/13719645/comitted-before-500-null-error-in-solr-3-6-1
http://lucene.472066.n3.nabble.com/jetty-error-broken-pipe-td3522120.html

How long after the client sends a request does it take for that error to show 
up in the logs and what happens client side when you see the error?


-----Original Message-----
From: aniljayanti [mailto:aniljaya...@yahoo.co.in] 
Sent: Sunday, August 25, 2013 11:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Caused by: java.net.SocketException: Connection reset by peer: 
socket write error solr querying

Hi Greg,

thanks for the reply,

I tried setting maxIdleTime to 30 milliseconds, but I am still getting the same
error.

WARN  - 2013-08-26 09:44:29.058; org.eclipse.jetty.server.Response;
Committed before 500 {msg=Connection reset by peer: socket write 
error,trace=org.eclipse.jetty.io.EofException
at 
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
at
org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:507)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:170)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:137)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:648)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:375)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Connection reset by peer: socket write 
error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at 
org.eclipse.jetty.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:375)
at 

Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Furkan KAMACI
Each node has 48 GB of RAM and the index size is nearly 100 GB. I have
CentOS 6.4. While indexing I got the error below, and I am suspicious that
it is because of the high percentage of physical memory usage.

ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException;
java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException]
early EOF
at
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.eclipse.jetty.io.EofException: early EOF
at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
at java.io.InputStream.read(InputStream.java:101)
at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at
com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
at
com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
at
com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at
com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
... 36 more



2013/8/26 Walter Underwood wun...@wunderwood.org

 What is the precise error? What kind of machine?

 File buffers are a robust part of the OS. Unix has had file buffer caching
 for decades.

 wunder

 On Aug 26, 2013, at 1:37 AM, Furkan KAMACI wrote:

  Hi Walter;
 
  You are right about performance. However, when I index documents on a
  machine that has a high percentage of physical memory usage, I get EOF
  errors.
 
 
  2013/8/26 Walter Underwood wun...@wunderwood.org
 
  On Aug 25, 2013, at 1:41 PM, Furkan KAMACI wrote:

Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Walter Underwood
It looks like that error happens when reading XML from an HTTP request. The XML 
ends too soon. This should be unrelated to file buffers.

wunder

On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote:

 It has 48 GB of RAM and the index size is nearly 100 GB at each node. I have
 CentOS 6.4. While indexing I got that error, and I suspect it is because of
 the high percentage of physical memory usage.
 
 [stack trace snipped]
 
 
 
 2013/8/26 Walter Underwood wun...@wunderwood.org
 
 What is the precise error? What kind of machine?
 
 File buffers are a robust part of the OS. Unix has had file buffer caching
 for decades.
 
 wunder
 
 On Aug 26, 2013, at 1:37 AM, 

Master / Slave Set Up Documentation

2013-08-26 Thread Jared Griffith
Hello,
I'm new to this Solr thing, and I was wondering if there is any good /
solid documentation on setting up and running replication.  I'm going
through the Wiki but I am not seeing anything that is obvious there.

-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814

http://www.picsauditing.com

17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!

https://twitter.com/PICSAuditingLLC


Re: ERROR org.apache.solr.update.CommitTracker – auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher

2013-08-26 Thread Shawn Heisey
On 8/26/2013 1:54 AM, zhaoxin wrote:
 Caused by: java.lang.ClassCastException

Generally when you get this kind of error with Solr, it means you have a
mix of old and new jars.  This can happen after an upgrade where the old
war expansion doesn't get removed, or when extra jars are unnecessarily
included on your classpath.  If you are using custom code or a code patch,
it probably needs to be updated for the new Solr version.
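
A quick way to spot leftovers is to list every Solr/Lucene jar your servlet
container can see (the path here is just an example; point it at your own
webapps and lib directories):

find /opt/tomcat \( -name 'solr-*.jar' -o -name 'lucene-*.jar' \) | sort

Two different version numbers in that listing usually mean an old war
expansion or a stray jar is still on the classpath.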

Thanks,
Shawn



Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Furkan KAMACI
Hi Walter;

You said you are caching your documents. What is the average physical memory
usage of your Solr nodes?


2013/8/26 Walter Underwood wun...@wunderwood.org

 It looks like that error happens when reading XML from an HTTP request. The
 XML ends too soon. This should be unrelated to file buffers.

 wunder

 On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote:

  It has a 48 GB of RAM and index size is nearly 100 GB at each node. I
 have
  CentOS 6.4. While indexing I got that error and I am suspicious about
 that
  it is because of high percentage of Physical Memory usage.
 
  [stack trace snipped]
 
 

Re: Adding one core to an existing core?

2013-08-26 Thread Jack Krupansky

Unfortunately, there is no -Dcore property, so you have to use -Durl:

java -Durl=http://localhost:8983/solr/collection2/update ... -jar post.jar 
...


You have the proper /select syntax.
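
For example, a complete post of your foo.xml to the second core would look
like this (localhost and port as in the stock examples; adjust to your
install):

java -Durl=http://localhost:8983/solr/collection2/update -jar post.jar foo.xml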

-- Jack Krupansky

-Original Message- 
From: Bruno Mannina

Sent: Monday, August 26, 2013 9:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Adding one core to an existing core?

Dear Solr User,

now I have 2 cores collection1 collection2

Default collection is the Collection1

I have two questions:

- Is there a parameter to add to my HTML link to indicate the selected
core?
http://xxx.xxx.xxx.xxx/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

I mean, by default it is collection1; if I want collection2 I use the
link:
http://xxx.xxx.xxx.xxx/solr/collection2/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

Is there a param core=collection2 instead of using a different link?


- My second question concerns updating.
Actually with one core, I do:
java -jar post.jar foo.xml

I suppose now I must add the desired core, no?
i.e.: -Dcore=collection2

What is the param to add in my command line?

Thanks a lot !

Bruno





On 22/08/2013 16:23, Andrea Gazzarini wrote:
First, a core is a separate index, so it is completely independent from the 
already existing core(s). So basically you don't need to reindex.


In order to have two cores (but the same applies for n cores): you must 
have in your solr.home the file (solr.xml) described here


http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29

then, you must obviously have one or two directories (corresponding to the 
instanceDir attribute). I said one or two because if the indexes 
configuration is basically the same (or something changes but is 
dynamically configured - i.e. core name) you can create two instances 
starting from the same configuration. I mean


<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="conf.dir" />
    <core name="core1" instanceDir="conf.dir" />
  </cores>
</solr>

Otherwise you must have two different conf directories that contain 
indexes configuration. You should already have a first one (the current 
core), you just need to have another conf dir with solrconfig.xml, 
schema.xml and other required files. In this case each core will have its 
own instanceDir.


<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="conf.dir.core0" />
    <core name="core1" instanceDir="conf.dir.core1" />
  </cores>
</solr>

Best,
Andrea



On 08/22/2013 04:04 PM, Bruno Mannina wrote:

A small detail: I'm on Ubuntu 12.04 LTS.

On 22/08/2013 15:56, Bruno Mannina wrote:

Dear Users,

(Solr3.6 + Tomcat7)

I have been using Solr with one core for two years; I would now like to add 
another core (a new database).


Can I do this without re-indexing my core1?
Could you point me to a good tutorial for doing that?

(my current database is around 200 GB for 86 000 000 docs)
My new database will be small, around 1000 documents of 5 KB each.

thanks a lot,
Bruno












Re: Master / Slave Set Up Documentation

2013-08-26 Thread Andrea Gazzarini
You mean this

http://wiki.apache.org/solr/SolrReplication

?

What's wrong with this page? It seems clear.
I'm widely using replication, and the first time I set it up (1 master + 2
slaves) I simply followed that page.
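
For reference, the heart of that setup is just one requestHandler on each
side. A minimal sketch along the lines of the wiki page (the master URL,
confFiles list, and poll interval below are placeholders to adapt):

On the master, in solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

On each slave, in solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master.example.com:8983/solr/collection1</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>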
On 26 Aug 2013 18:54, Jared Griffith jgriff...@picsauditing.com wrote:

 Hello,
 I'm new to this Solr thing, and I was wondering if there is any good /
 solid documentation on setting up and running replication.  I'm going
 through the Wiki but I am not seeing anything that is obvious there.

 --

 Jared Griffith
 Linux Administrator, PICS Auditing, LLC
 P: (949) 936-4574
 C: (909) 653-7814

 http://www.picsauditing.com

 17701 Cowan #140 | Irvine, CA | 92614

 Join PICS on LinkedIn and Twitter!

 https://twitter.com/PICSAuditingLLC



Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Walter Underwood
We use Amazon EC2 machines with 34GB of memory (m2.2xlarge). The Solr heap is 
8GB. We have several cores, totaling about 14GB on disk. This configuration 
allows 100% of the indexes to be in file buffers.

wunder

On Aug 26, 2013, at 9:57 AM, Furkan KAMACI wrote:

 Hi Walter;
 
 You said you are caching your documents. What is average Physical Memory
 usage of your Solr Nodes?
 
 
 2013/8/26 Walter Underwood wun...@wunderwood.org
 
 It looks like that error happens when reading XML from an HTTP request. The
 XML ends too soon. This should be unrelated to file buffers.
 
 wunder
 
 On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote:
 
 It has a 48 GB of RAM and index size is nearly 100 GB at each node. I
 have
 CentOS 6.4. While indexing I got that error and I am suspicious about
 that
 it is because of high percentage of Physical Memory usage.
 
  [stack trace snipped]
 
 

Re: Master / Slave Set Up Documentation

2013-08-26 Thread Jared Griffith
Ha, I guess I didn't see that page listed in the table of contents...
it's definitely Monday.  Thanks.


On Mon, Aug 26, 2013 at 10:36 AM, Andrea Gazzarini 
andrea.gazzar...@gmail.com wrote:

 You mean this

 http://wiki.apache.org/solr/SolrReplication

 ?

  What's wrong with this page? It seems clear.
  I'm widely using replication, and the first time I set it up (1 master + 2
  slaves) I simply followed that page.
 On 26 Aug 2013 18:54, Jared Griffith jgriff...@picsauditing.com wrote:

  Hello,
  I'm new to this Solr thing, and I was wondering if there is any good /
  solid documentation on setting up and running replication.  I'm going
  through the Wiki but I am not seeing anything that is obvious there.
 
  --
 
  Jared Griffith
  Linux Administrator, PICS Auditing, LLC
  P: (949) 936-4574
  C: (909) 653-7814
 
  http://www.picsauditing.com
 
  17701 Cowan #140 | Irvine, CA | 92614
 
  Join PICS on LinkedIn and Twitter!
 
  https://twitter.com/PICSAuditingLLC
 




-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814

http://www.picsauditing.com

17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!

https://twitter.com/PICSAuditingLLC


Re: Grouping

2013-08-26 Thread tvellore
I'm getting the same error...Is there any workaround to this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-tp2820116p4086674.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-26 Thread Erick Erickson
What is a select analyzer type? Never seen one of those before...
or I'm just blanking.

Either of those types should work for case-insensitive search, did
you re-index?
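
FWIW, Solr defines index and query analyzer types (and, in 4.x, multiterm),
so a select analyzer is not something it knows about. A sketch of a
case-insensitive type for whole strings; KeywordTokenizerFactory keeps the
entire value, spaces, '-' and \ included, as one token (a suggestion to
adapt, not your exact schema):

<fieldType name="string_lower_case" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>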

And please don't hijack threads, start a new subject with new
questions.

Best
Erick



On Mon, Aug 26, 2013 at 7:42 AM, skorrapa korrapati.sus...@gmail.comwrote:

 I have also re-indexed the data and tried. And also tried with the below:
   <fieldType name="string_lower_case" class="solr.TextField"
 sortMissingLast="true" omitNorms="true">
   <analyzer type="index">
 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 <analyzer type="query">
 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 <analyzer type="select">
 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 This didn't work as well...



 On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] 
 ml-node+s472066n4086601...@n3.nabble.com wrote:

  Hello All,
 
  I am still facing the same issue. Case-insensitive search is not working
 on
  Solr 4.3.
  I am using the below configuration in schema.xml:
  <fieldType name="string_lower_case" class="solr.TextField"
  sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  <analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  <analyzer type="select">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  Basically I want my string which could have spaces or characters like '-'
  or \ to be searched upon case insensitively.
  Please help.
 
 




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086606.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Default query operator OR wont work in some cases

2013-08-26 Thread Erick Erickson
Try adding debug=query to your URL, that'll show you
how the parsing actually happened and should give you
you some pointers.
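
For instance (collection name is from the stock examples; substitute yours):

http://localhost:8983/solr/collection1/select?q=egg+salad&debug=query&wt=xml

The parsed query shows up in the debug section of the response.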

Best,
Erick


On Mon, Aug 26, 2013 at 9:55 AM, smanad sma...@gmail.com wrote:

 Hi,

 I have some documents with keywords "egg", some with "salad", and some
 with "egg salad".
 When I search for "egg salad", I expect to see egg results + salad results. I
 don't see them.
 "egg" and "salad" queries individually work fine.
 I am using WhitespaceTokenizer.

 Not sure if I am missing something.
 Thanks,
 -Manasi




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adding one core to an existing core?

2013-08-26 Thread Bruno Mannina

ok thanks !

On 26/08/2013 17:52, Jack Krupansky wrote:

Unfortunately, there is no -Dcore property, so you have to use -Durl:

java -Durl=http://localhost:8983/solr/collection2/update ... -jar 
post.jar ...


You have the proper /select syntax.

-- Jack Krupansky

[earlier quoted messages snipped]















Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Dan Davis
I have now come to the task of estimating man-days to add Blended Search
Results to Apache Solr.   The argument has been made that this is not
desirable (see Jonathan Rochkind's blog entries on Bento search with
blacklight).   But the estimate remains.No estimate is worth much
without a design.   So, I am come to the difficult of estimating this
without having an in-depth knowledge of the Apache core.   Here is my
design, likely imperfect, as it stands.

   - Configure a core specific to each search source (local or remote)
   - On cores that index remote content, implement a periodic delete query
   that deletes documents whose timestamp is too old (a sketch follows after
   this list)
   - Implement a custom requestHandler for the remote cores that goes out
   and queries the remote source.   For each result in the top N
   (configurable), it computes an id that is stable (e.g. it is based on the
   remote resource URL, doi, or hash of data returned).   It uses that id to
   look up the document in the Lucene index.   If the data is not there, it
   updates the lucene core and sets a flag that commit is required.   Once it
   is done, it commits if needed.
   - Configure a core that uses a custom SearchComponent to call the
   requestHandler that goes and gets new documents and commits them.   Since
   the cores for remote content are different cores, they can restart their
   searcher at this point if any commit is needed.   The custom
   SearchComponent will wait for commit and reload to be completed.   Then,
   search continues using the other cores as shards.
   - Auto-warming on this will assure that the most recently requested data
   is present.

It will, of course, be very slow a good part of the time.
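
For the periodic cleanup in the second bullet, an ordinary delete-by-query
posted to each remote-content core should do; a sketch (the field name and
the 30-day window are placeholders):

<delete><query>timestamp:[* TO NOW-30DAYS]</query></delete>

followed by a commit on that core.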

Erik and others, I need to know whether this design has legs and what other
alternatives I might consider.



On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson erickerick...@gmail.comwrote:

 The lack of global TF/IDF has been answered in the past,
 in the sharded case, by usually you have similar enough
 stats that it doesn't matter. This pre-supposes a fairly
 evenly distributed set of documents.

 But if you're talking about federated search across different
 types of documents, then what would you rescore with?
 How would you even consider scoring docs that are somewhat/
 totally different? Think magazine articles an meta-data associated
 with pictures.

 What I've usually found is that one can use grouping to show
 the top N of a variety of results. Or show tabs with different
 types. Or have the app intelligently combine the different types
 of documents in a way that makes sense. But I don't know
 how you'd just get the right thing to happen with some kind
 of scoring magic.

 Best
 Erick


 On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis dansm...@gmail.com wrote:

 I've thought about it, and I have no time to really do a meta-search
 during
 evaluation.  What I need to do is to create a single core that contains
 both of my data sets, and then describe the architecture that would be
 required to do blended results, with liberal estimates.

 From the perspective of evaluation, I need to understand whether any of
 the
 solutions to better ranking in the absence of global IDF have been
 explored?I suspect that one could retrieve a much larger than N set of
 results from a set of shards, re-score in some way that doesn't require
 IDF, e.g. storing both results in the same priority queue and *re-scoring*
 before *re-ranking*.

 The other way to do this would be to have a custom SearchHandler that
 works
 differently - it performs the query, retries all results deemed relevant
 by
 another engine, adds them to the Lucene index, and then performs the query
 again in the standard way.   This would be quite slow, but perhaps useful
 as a way to evaluate my method.

 I still welcome any suggestions on how such a SearchHandler could be
 implemented.





Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Paul Libbrecht

Why not simply create a meta search engine that indexes everything from each of
the nodes?
(I think one calls this harvesting.)

I believe that this is the way to avoid all sorts of performance bottlenecks.
As far as I could analyze, the performance of a federated search is the
performance of the least speedy node, which can turn out to be quite bad if you
do not have guarantees from the remote sources.

Or are the remote cores below actually things that you manage on your side?
If yes, guarantees are easy to manage.

Paul


On 26 Aug 2013, at 22:38, Dan Davis wrote:

 [earlier messages snipped]
 
 
 



Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Furkan KAMACI
The EOF exception seems like a generic exception to me. I should find the
underlying problem within my infrastructure.

On Monday, 26 August 2013, Walter Underwood wun...@wunderwood.org
wrote:
 We use Amazon EC2 machines with 34GB of memory (m2.2xlarge). The Solr
heap is 8GB. We have several cores, totaling about 14GB on disk. This
configuration allows 100% of the indexes to be in file buffers.

 wunder

 On Aug 26, 2013, at 9:57 AM, Furkan KAMACI wrote:

 Hi Walter;

 You said you are caching your documents. What is average Physical Memory
 usage of your Solr Nodes?


 2013/8/26 Walter Underwood wun...@wunderwood.org

 It looks like that error happens when reading XML from an HTTP request.
The
 XML ends too soon. This should be unrelated to file buffers.

 wunder

 On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote:

 It has a 48 GB of RAM and index size is nearly 100 GB at each node. I
 have
 CentOS 6.4. While indexing I got that error and I am suspicious about
 that
 it is because of high percentage of Physical Memory usage.

 [stack trace snipped]

 --
 Walter Underwood
 wun...@wunderwood.org






Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Dan Davis
First answer:

My employer is a library and does not have the license to harvest everything
indexed by a web-scale discovery service such as PRIMO or Summon.   If
our design automatically relays searches entered by users, and then
periodically purges results, I think it is reasonable from a licensing
perspective.

Second answer:

What if you wanted your Apache Solr powered search to include all results
from Google Scholar for any query?   Do you think you could easily or
cheaply configure a ZooKeeper cluster large enough to harvest and index all
of Google Scholar?   Would that violate robot rules?   Is it even possible
to do this from an API perspective?   Wouldn't Google notice?

Third answer:

On Gartner's 2013 Enterprise Search Magic Quadrant, LucidWorks and the
other Enterprise Search firm based on Apache Solr were dinged on the lack
of federated search.  I do not have the hubris to think I can fix that, and
it is not really my role to try, but something that works without
harvesting and local indexing is obviously desirable to Enterprise Search
users.



On Mon, Aug 26, 2013 at 4:46 PM, Paul Libbrecht p...@hoplahup.net wrote:


 Why not simply create a meta search engine that indexes everything of each
 of the nodes.?
 (I think one calls this harvesting)

 I believe that this the way to avoid all sorts of performance bottleneck.
 As far as I could analyze, the performance of a federated search is the
 performance of the least speedy node; which can turn to be quite bad if you
 do not exercise guarantees of remote sources.

 Or are the remote cores below actually things that you manage on your
 side? If yes guarantees are easy to manage..

 Paul


  On 26 Aug 2013, at 22:38, Dan Davis wrote:

   [earlier messages snipped]

Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Dan Davis
One more question here - is this topic more appropriate to a different list?


On Mon, Aug 26, 2013 at 4:38 PM, Dan Davis dansm...@gmail.com wrote:

 [earlier message snipped]






No documents found for some queries with special chars like m&m

2013-08-26 Thread Utkarsh Sengar
Some of the queries (not all) with special chars return no documents.

Example: queries returning no documents
q=m&m (this can be explained: when I search for "m m", no documents are
returned)
q=o'reilly (when I search for "o reilly", I get documents back)


Queries returning documents:
q=helloworld (document matched is "Hello World: A Life in Ham Radio")


My questions are:
1. What's wrong with o'reilly? What changes do I need in my field type?
2. How can I make the query m&m work?
My index has a bunch of M&M's docs like: "M & M's Milk Chocolate Candy
Coated Peanuts - 19.2 oz" and "M and Ms Chocolate Candies - Peanut - 1 Bag
(42 oz)"


Field type:
<fieldType name="text_general" class="solr.TextField"
positionIncrementGap="100">
 <analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
 <analyzer type="query">
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1"
          catenateNumbers="1"
          catenateAll="0"
          preserveOriginal="1"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
</fieldType>


-- 
Thanks,
-Utkarsh


Re: No documents found for some queries with special chars like m&m

2013-08-26 Thread Erick Erickson
First thing to do is attach debug=query to your queries and look at the
parsed output.

Second thing to do is look at the admin/analysis page and see what happens
at index and query time to things like o'reilly. You have
WordDelimiterFilterFactory
configured in your query but not your index analysis chain. My bet is
that
you're getting different tokens at query and index time...

Third thing is that you need to escape the & character. It's probably being
interpreted as a delimiter on the URL, and Solr ignores params it doesn't
understand.
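
On the second point: WordDelimiterFilterFactory is usually paired with
WhitespaceTokenizerFactory, because StandardTokenizer already strips the
punctuation WDF needs to see. A sketch of a symmetric pair of analyzers
(an illustration to adapt, not a drop-in replacement for your type):

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

With that chain, m&m keeps the original token and also produces m and m
(and mm via catenateWords), so both forms of the query can match.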

Best
Erick


On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar utkarsh2...@gmail.comwrote:

 Some of the queries (not all) with special chars return no documents.

 Example: queries returning no documents
 q=m&m (this can be explained: when I search for "m m", no documents are
 returned)
 q=o'reilly (when I search for o reilly, I get documents back)


 Queries returning documents:
 q=helloworld (document matched is Hello World: A Life in Ham Radio)


 My questions are:
 1. What's wrong with o'reilly? What changes do I need in my field type?
 2. How can I make the query m&m work?
 My index has a bunch of M&M's docs like: "M & M's Milk Chocolate Candy
 Coated Peanuts - 19.2 oz" and "M and Ms Chocolate Candies - Peanut - 1 Bag
 (42 oz)"


 [field type definition snipped]


 --
 Thanks,
 -Utkarsh



Re: Default query operator OR wont work in some cases

2013-08-26 Thread smanad
Here is the keywords field for 3 docs:

Simply Asia products,Simply Asia,Sesame Chicken Egg Drop Soup,Soy Ginger
Shrimp and Noodle Salad,Sesame Teriyaki Noodle Bowl

Eggs,AllWhites,Better'n Eggs,Foods,AllWhites or Better'n Eggs

DOLE Salad Blend Salad Kit,Salad Kit,Salad,DOLE,produce

Here is my debug query:
<str name="parsedquery">(+((DisjunctionMaxQuery((keywords:egg^2.0)~0.1)
DisjunctionMaxQuery((keywords:salad^2.0)~0.1))~2)
DisjunctionMaxQuery((keywords:"egg salad")~0.1))/no_coord</str>

Here is my fieldtype definition for keywords,
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            types="word-delim-types.txt" />
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            types="word-delim-types.txt" />
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
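
For what it's worth, the ~2 in the parsed query above is a minimum-should-match
of 2, which forces both terms to match. A minimal sketch, assuming an edismax
handler in solrconfig.xml (the handler name and qf below are illustrative), of
relaxing it so a single matching term is enough:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <!-- illustrative: only one of the query terms has to match -->
      <str name="mm">1</str>
      <str name="qf">keywords^2.0</str>
    </lst>
  </requestHandler>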




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624p4086723.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Amit Jha
Hi,

I would suggest the following.

1. Create a custom search connector for each individual source.
2. The connector will be responsible for querying the source of any type (web,
gateways, etc.), getting the results & writing the top N results to Solr (a
minimal posting sketch follows the list).
3. Query the same keyword against Solr and display the results.
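
A minimal sketch of step 2, assuming the standard XML update handler (the
URL, id scheme and fields below are illustrative):

  # push one fetched remote result into the local index
  curl "http://localhost:8983/solr/update?commit=true" \
       -H "Content-Type: text/xml" \
       --data-binary '<add><doc>
         <field name="id">remote-source-1-result-42</field>
         <field name="title">Result title from the remote source</field>
       </doc></add>'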

Would you like to create something like
http://knimbus.com


Rgds
AJ

On 27-Aug-2013, at 2:28, Dan Davis dansm...@gmail.com wrote:

 One more question here - is this topic more appropriate to a different list?
 
 
 On Mon, Aug 26, 2013 at 4:38 PM, Dan Davis dansm...@gmail.com wrote:
 
 I have now come to the task of estimating man-days to add Blended Search
 Results to Apache Solr.   The argument has been made that this is not
 desirable (see Jonathan Rochkind's blog entries on Bento search with
 blacklight).   But the estimate remains.   No estimate is worth much
 without a design.   So, I have come to the difficulty of estimating this
 without having in-depth knowledge of the Apache Solr core.   Here is my
 design, likely imperfect, as it stands.
 
   - Configure a core specific to each search source (local or remote)
   - On cores that index remote content, implement a periodic delete
   query that deletes documents whose timestamp is too old (a
   delete-by-query sketch follows this list)
   - Implement a custom requestHandler for the remote cores that goes
   out and queries the remote source.   For each result in the top N
   (configurable), it computes an id that is stable (e.g. it is based on the
   remote resource URL, doi, or hash of data returned).   It uses that id to
   look-up the document in the lucene database.   If the data is not there, it
   updates the lucene core and sets a flag that commit is required.   Once it
   is done, it commits if needed.
   - Configure a core that uses a custom SearchComponent to call the
   requestHandler that goes and gets new documents and commits them.   Since
   the cores for remote content are different cores, they can restart their
   searcher at this point if any commit is needed.   The custom
   SearchComponent will wait for commit and reload to be completed.   Then,
   search continues using the other cores as shards.
   - Auto-warming on this will assure that the most recently requested
   data is present.
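
 For the periodic delete in the second bullet, a sketch using
 delete-by-query (core and field names are hypothetical):

   # purge cached remote documents older than one day
   curl "http://localhost:8983/solr/remote_core/update?commit=true" \
        -H "Content-Type: text/xml" \
        --data-binary '<delete><query>timestamp:[* TO NOW-1DAY]</query></delete>'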
 
 It will, of course, be very slow a good part of the time.
 
 Erick and others, I need to know whether this design has legs and what
 other alternatives I might consider.
 
 
 
 On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson 
 erickerick...@gmail.comwrote:
 
 The lack of global TF/IDF has been answered in the past,
 in the sharded case, by "usually you have similar enough
 stats that it doesn't matter". This pre-supposes a fairly
 evenly distributed set of documents.
 
 But if you're talking about federated search across different
 types of documents, then what would you rescore with?
 How would you even consider scoring docs that are somewhat/
 totally different? Think magazine articles and meta-data associated
 with pictures.
 
 What I've usually found is that one can use grouping to show
 the top N of a variety of results. Or show tabs with different
 types. Or have the app intelligently combine the different types
 of documents in a way that makes sense. But I don't know
 how you'd just get the right thing to happen with some kind
 of scoring magic.
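
 As an illustration of the grouping approach (the doc_type field is
 hypothetical), result grouping can return the top N hits per document
 type in one response:

   # top 3 hits for each value of doc_type
   curl "http://localhost:8983/solr/select?q=magazine&group=true&group.field=doc_type&group.limit=3"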
 
 Best
 Erick
 
 
 On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis dansm...@gmail.com wrote:
 
 I've thought about it, and I have no time to really do a meta-search
 during
 evaluation.  What I need to do is to create a single core that contains
 both of my data sets, and then describe the architecture that would be
 required to do blended results, with liberal estimates.
 
 From the perspective of evaluation, I need to understand whether any of
 the
 solutions to better ranking in the absence of global IDF have been
 explored?   I suspect that one could retrieve a much larger than N set
 of
 results from a set of shards, re-score in some way that doesn't require
 IDF, e.g. storing both results in the same priority queue and
 *re-scoring*
 before *re-ranking*.
 
 The other way to do this would be to have a custom SearchHandler that
 works
 differently - it performs the query, retrieves all results deemed relevant
 by
 another engine, adds them to the Lucene index, and then performs the
 query
 again in the standard way.   This would be quite slow, but perhaps useful
 as a way to evaluate my method.
 
 I still welcome any suggestions on how such a SearchHandler could be
 implemented.
 


Re: Default query operator OR wont work in some cases

2013-08-26 Thread Jack Krupansky
The phrase "egg salad" does not occur in your input. And, quoted phrases are
an implicit AND, not an OR. Either you wanted egg and salad but not
as a phrase, or as a very loose sloppy phrase, such as "egg salad"~10.
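
For example (illustrative URL; %22 and %20 encode the quotes and the space),
the sloppy-phrase form would be:

  curl "http://localhost:8983/solr/select?q=%22egg%20salad%22~10&defType=edismax&qf=keywords"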


Or, who knows what you really want - your requirements are expressed too 
imprecisely.


-- Jack Krupansky

-Original Message- 
From: smanad

Sent: Monday, August 26, 2013 8:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Default query operator OR wont work in some cases

here is keywords field for 3 docs,

Simply Asia products,Simply Asia,Sesame Chicken Egg Drop Soup,Soy Ginger
Shrimp and Noodle Salad,Sesame Teriyaki Noodle Bowl

Eggs,AllWhites,Better'n Eggs,Foods,AllWhites or Better'n Eggs

DOLE Salad Blend Salad Kit,Salad Kit,Salad,DOLE,produce

Here is my debug query:
<str name="parsedquery">(+((DisjunctionMaxQuery((keywords:egg^2.0)~0.1)
DisjunctionMaxQuery((keywords:salad^2.0)~0.1))~2)
DisjunctionMaxQuery((keywords:"egg salad")~0.1))/no_coord</str>

Here is my fieldtype definition for keywords,
   <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt" enablePositionIncrements="true" />
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
               ignoreCase="true" expand="true"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="1"
               catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
               types="word-delim-types.txt" />
       <filter class="solr.EnglishMinimalStemFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt" enablePositionIncrements="true" />
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
               ignoreCase="true" expand="true"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="1"
               catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
               types="word-delim-types.txt" />
       <filter class="solr.EnglishMinimalStemFilterFactory"/>
     </analyzer>
   </fieldType>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624p4086723.html
Sent from the Solr - User mailing list archive at Nabble.com. 



ANNOUNCE: Lucene/Solr Revolution EU 2013: Registration Community Voting

2013-08-26 Thread Chris Hostetter


(NOTE: cross-posted to various lists, please reply only to general@lucene 
w/ any questions or follow ups)



2 Announcements folks should be aware of regarding the upcoming 
Lucene/Solr Revolution EU 2013 in Dublin...



# 1) Registration Now Open

Registration is now open for Lucene/Solr Revolution EU 2013, the biggest 
open source conference dedicated to Apache Lucene/Solr.  Two-day training 
workshops will precede the conference.  You can benefit from discounted 
conference rates if you register early.


http://lucenerevolution.org/registration

More info...
http://searchhub.org/2013/08/15/lucenesolr-revolution-eu-registration-is-open/


# 2) Community Voting on Agenda (Until September 9th)

The Lucene/Solr Revolution free voting system allows you to vote on your 
favorite topics. The sessions that receive the highest number of votes 
will be automatically added to the Lucene/Solr Revolution EU 2013 agenda. 
The remaining sessions will be selected by a committee of industry experts 
who will take into account the community’s votes as well as their own 
expertise in the area.


http://lucenerevolution.org/2013/call-for-papers-survey

More info...
http://searchhub.org/2013/08/23/help-us-set-the-agenda-for-lucenesolr-revolution-eu/

-Hoss

Re: Default query operator OR wont work in some cases

2013-08-26 Thread smanad
I am not searching for a phrase query; I am not sure why it shows up in the
parsed query.
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="indent">true</str>
    <str name="q">egg salad
</str>
    <str name="_">1377569284170</str>
    <str name="wt">xml</str>
  </lst>
</lst>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624p4086732.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filter cache pollution during sharded edismax queries

2013-08-26 Thread Ken Krugler
Hi Otis,

Sorry I missed your reply, and thanks for trying to find a similar report.

Wondering if I should file a Jira issue? That might get more attention :)

-- Ken

On Jul 5, 2013, at 1:05pm, Otis Gospodnetic wrote:

 Hi Ken,
 
 Uh, I left this email until now hoping I could find you a reference to
 similar reports, but I can't find them now.  I am quite sure I saw
 somebody with a similar report within the last month.  Plus, several
 people have reported issues with performance dropping when they went
 from 3.x to 4.x and maybe this is why.
 
 Otis
 --
 Solr & ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm
 
 
 
 On Tue, Jul 2, 2013 at 3:01 PM, Ken Krugler kkrugler_li...@transpac.com 
 wrote:
 Hi all,
 
 After upgrading from Solr 3.5 to 4.2.1, I noticed our filterCache hit ratio 
 had dropped significantly.
 
 Previously it was at 95+%, but now it's < 50%.
 
 I enabled recording 100 entries for debugging, and in looking at them it 
 seems that edismax (and faceting) is creating entries for me.
 
 This is in a sharded setup, so it's a distributed search.
 
 If I do a search for the string "bogus text" using edismax on two fields, I
 get an entry in each of the shard's filter caches that looks like:
 
 item_+(((field1:bogus | field2:bogu) (field1:text | field2:text))~2):
 
 Is this expected?
 
 I have a similar situation happening during faceted search, even though my 
 fields are single-value/untokenized strings, and I'm not using the enum 
 facet method.
 
 But I'll get many, many entries in the filterCache for facet values, and 
 they all look like item_<facet field>:<facet value>:
 
 The net result of the above is that even with a very big filterCache size of 
 2K, the hit ratio is still only 60%.
 
 Thanks for any insights,
 
 -- Ken

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







Re: Default query operator OR wont work in some cases

2013-08-26 Thread Jack Krupansky
Yeah, sorry, I read the parsed query too quickly - the phrase is the 
optional relevancy boost due to the pf2 parameter.
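
For reference, a sketch of where that phrase comes from, assuming pf2 is set
in the handler defaults (the exact values below are illustrative): pf2 builds
phrase queries from adjacent word pairs purely as a relevancy boost, so the
phrase never has to match for a document to be returned:

  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">keywords^2.0</str>
    <!-- illustrative: word-pair phrase boost producing keywords:"egg salad" -->
    <str name="pf2">keywords</str>
  </lst>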


-- Jack Krupansky

-Original Message- 
From: smanad

Sent: Monday, August 26, 2013 10:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Default query operator OR wont work in some cases

I am not searching for a phrase query; I am not sure why it shows up in the
parsed query.
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="indent">true</str>
    <str name="q">egg salad
</str>
    <str name="_">1377569284170</str>
    <str name="wt">xml</str>
  </lst>
</lst>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624p4086732.html
Sent from the Solr - User mailing list archive at Nabble.com.