Selective field query

2015-10-09 Thread Colin Hunter
Hi

I am working on a complex search utility with an index created via data
import from an extensive MySQL database.
There are many ways in which the index is searched. One of the utility
input fields searches only on a Service Name. However, if I target the
query as q=ServiceName:"Searched service", this only returns an exact
string match. If q=Searched Service, the query still returns results from
all indexed data.

Is there a way to construct a query to only return results from one field
of a doc ?
I have tried setting indexed=false, stored=true on unwanted fields, but these
appear to have still been returned in results.

Any advice on this would be very welcome.
Thank You
Colin Hunter

-- 
www.gfc.uk.net


Re: Selective field query

2015-10-09 Thread Upayavira


On Fri, Oct 9, 2015, at 09:54 AM, Colin Hunter wrote:
> Hi
> 
> I am working on a complex search utility with an index created via data
> import from an extensive MySQL database.
> There are many ways in which the index is searched. One of the utility
> input fields searches only on a Service Name. However, if I target the
> query as q=ServiceName:"Searched service", this only returns an exact
> string match. If q=Searched Service, the query still returns results from
> all indexed data.
> 
> Is there a way to construct a query to only return results from one field
> of a doc ?
> I have tried setting indexed=false, stored=true on unwanted fields, but
> these
> appear to have still been returned in results.

q=ServiceName:(Searched Service)

That'll look in just one field.

Remember changing indexed to false doesn't impact the stuff already in
your index. And the reason you are likely getting all that stuff is
because you have a copyField that copies it over into the 'text' field.
If you'll never want to search on some fields, switch them to
indexed=false, make sure you aren't doing a copyField on them, and then
reindex.

Upayavira
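
A minimal SolrJ equivalent of that fielded query, as a sketch (the core URL
is an assumption, and this targets SolrJ 5.x):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ServiceNameSearch {
      public static void main(String[] args) throws Exception {
        // Hypothetical core URL; adjust to your deployment.
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/services");
        // Restrict the search to the ServiceName field only; the parentheses
        // keep both terms inside the field instead of letting the second term
        // fall back to the default search field.
        SolrQuery q = new SolrQuery("ServiceName:(Searched Service)");
        QueryResponse rsp = client.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
        client.close();
      }
    }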


Re: Selective field query

2015-10-09 Thread Colin Hunter
Ah ha...   the copy field...  makes sense.
Thank You.

On Fri, Oct 9, 2015 at 10:04 AM, Upayavira  wrote:

>
>
> On Fri, Oct 9, 2015, at 09:54 AM, Colin Hunter wrote:
> > Hi
> >
> > I am working on a complex search utility with an index created via data
> > import from an extensive MySQL database.
> > There are many ways in which the index is searched. One of the utility
> > input fields searches only on a Service Name. However, if I target the
> > query as q=ServiceName:"Searched service", this only returns an exact
> > string match. If q=Searched Service, the query still returns results from
> > all indexed data.
> >
> > Is there a way to construct a query to only return results from one field
> > of a doc ?
> > I have tried setting indexed=false, stored=true on unwanted fields, but
> > these
> > appear to have still been returned in results.
>
> q=ServiceName:(Searched Service)
>
> That'll look in just one field.
>
> Remember changing indexed to false doesn't impact the stuff already in
> your index. And the reason you are likely getting all that stuff is
> because you have a copyField that copies it over into the 'text' field.
> If you'll never want to search on some fields, switch them to
> > indexed=false, make sure you aren't doing a copyField on them, and then
> reindex.
>
> Upayavira
>



-- 
www.gfc.uk.net


java.util.EmptyStackException during SPLITSHARD

2015-10-09 Thread Oliver Schrenk
Hi,

trying to experiment with oversharding on our Solr 4.7.2 cluster, I called the
SPLITSHARD command, which after ~30 minutes of work failed with

curl
"http://solrhost:1234/solr/admin/collections?collection=acme&shard=shard1&action=SPLITSHARD"


[response XML flattened by the mail archive: status 500, sub-shard cores
elmar_v3_shard1_0_replica1 and elmar_v3_shard1_1_replica1 reported with
buffer state EMPTY_BUFFER, and the failure java.util.EmptyStackException
(wrapped in org.apache.solr.common.SolrException):]
at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:248)
at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:484)
at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:165)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:720)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:949)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1011)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


It also seems that for 30 minutes it was very slow and not even half way done:

acme@solrhost:/var/lib/solr/cores$ du -h acme_shard1_replica1/
271M    acme_shard1_replica1/data/tlog
5.4G    acme_shard1_replica1/data/index
5.7G    acme_shard1_replica1/data
5.7G    acme_shard1_replica1/
acme@solrhost:/var/lib/solr/cores$ du -h acme_shard1_1_replica1/
4.0K    acme_shard1_1_replica1/data/tlog
1004M   acme_shard1_1_replica1/data/index
1004M   acme_shard1_1_replica1/data
1004M   acme_shard1_1_replica1/
acme@solrhost:/var/lib/solr/cores$ du -h acme_shard1_0_replica1/
4.0K    acme_shard1_0_replica1/data/tlog
898M    acme_shard1_0_replica1/data/index
898M    acme_shard1_0_replica1/data
898M    acme_shard1_0_replica1/


What is the recommended cleanup strategy?

Also, is there a way for me to run the command successfully on my end without
upgrading?


Cheers,
Oliver



Re: Exclude documents having same data in two fields

2015-10-09 Thread Aman Tandon
Hi,

I tried to use the same approach as mentioned in the url
<http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal>.

And I used the description field for the check because the mapping field
is multivalued.

So I added fq={!frange%20l=0%20u=1}strdist(title,description,edit) to my
url, but I am getting the error shown below. Please take a look.

*Solr Version 4.8.1*

*Url is*
http://localhost:8150/solr/core1/select?q.alt=*:*&fl=big*,title,catid&fq={!frange%20l=0%20u=1}strdist(title,description,edit)&defType=edismax

> <response>
> <lst name="responseHeader">
> <int name="status">500</int>
> <int name="QTime">8</int>
> <lst name="params">
> <str name="q.alt">*:*</str>
> <str name="defType">edismax</str>
> <str name="fl">big*,title,catid</str>
> <str name="fq">{!frange l=0 u=1}strdist(title,description,edit)</str>
> </lst>
> </lst>
> <lst name="error">
> <str name="trace">java.lang.RuntimeException
> at org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.<init>(ExtendedDismaxQParser.java:1455)
> at org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239)
> at org.apache.solr.search.ExtendedDismaxQParser.<init>(ExtendedDismaxQParser.java:108)
> at org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37)
> at org.apache.solr.search.QParser.getParser(QParser.java:315)
> at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
> at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> </str>
> <int name="code">500</int>
> </lst>
> </response>
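
For reference, a minimal SolrJ sketch of this kind of filter (an illustration,
not a fix for the error above): strdist returns 1.0 for identical strings, so
an upper bound just below 1 keeps only documents whose title and description
differ. The core URL comes from the thread; everything else is a placeholder.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ExcludeEqualFields {
      public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8150/solr/core1");
        SolrQuery q = new SolrQuery("*:*");
        // strdist(title,description,edit) is 1.0 when the two fields match
        // exactly; u=0.99 excludes those documents. Note this function is
        // evaluated per document, which is costly on large indexes (see
        // Upayavira's warning later in this thread).
        q.addFilterQuery("{!frange l=0 u=0.99}strdist(title,description,edit)");
        QueryResponse rsp = client.query(q);
        System.out.println("non-duplicates: " + rsp.getResults().getNumFound());
        client.close();
      }
    }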

With Regards
Aman Tandon

On Thu, Oct 8, 2015 at 8:07 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Hi, I agree with NutchDev;
> using the Function Range Query Parser should do the trick:
>
>
> https://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html
>
> Cheers
>
> On 8 October 2015 at 13:31, NutchDev  wrote:
>
> > Hi Aman,
> >
> > Have a look at this , it has query time approach also using Solr function
> > query,
> >
> >
> >
> http://stackoverflow.com/questions/15927893/how-to-check-equality-of-two-solr-fields
> >
> >
> http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Exclude-documents-having-same-data-in-two-fields-tp4233408p4233489.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Solr Pagination

2015-10-09 Thread Shawn Heisey
On 10/9/2015 1:39 PM, Salman Ansari wrote:

> INFO  - 2015-10-09 18:46:17.953; [c:sabr102 s:shard1 r:core_node2
> x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
> [sabr102_shard1_replica1] webapp=/solr path=/select
> params={start=0&q=(content_text:Football)&rows=10} hits=24408 status=0
> QTime=3391

Over 3 seconds for a query like this definitely sounds like there's a
problem.

> INFO  - 2015-10-09 18:47:04.727; [c:sabr102 s:shard1 r:core_node2
> x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
> [sabr102_shard1_replica1] webapp=/solr path=/select
> params={start=1000&q=(content_text:Football)&rows=10} hits=24408 status=0
> QTime=21569

Adding a start value of 1000 increases QTime by a factor of more than
6?  Even more evidence of a performance problem.

For comparison purposes, I did a couple of simple queries on a large
index of mine.  Here are the response headers showing the QTime value
and all the parameters (except my shard URLs) for each query:

  "responseHeader": {
"status": 0,
"QTime": 1253,
"params": {
  "df": "catchall",
  "spellcheck.maxCollationEvaluations": "2",
  "spellcheck.dictionary": "default",
  "echoParams": "all",
  "spellcheck.maxCollations": "5",
  "q.op": "AND",
  "shards.info": "true",
  "spellcheck.maxCollationTries": "2",
  "rows": "70",
  "spellcheck.extendedResults": "false",
  "shards": "REDACTED SEVEN SHARD URLS",
  "shards.tolerant": "true",
  "spellcheck.onlyMorePopular": "false",
  "facet.method": "enum",
  "spellcheck.count": "9",
  "q": "catchall:carriage",
  "indent": "true",
  "wt": "json",
  "_": "120900498"
}


  "responseHeader": {
"status": 0,
"QTime": 176,
"params": {
  "df": "catchall",
  "spellcheck.maxCollationEvaluations": "2",
  "spellcheck.dictionary": "default",
  "echoParams": "all",
  "spellcheck.maxCollations": "5",
  "q.op": "AND",
  "shards.info": "true",
  "spellcheck.maxCollationTries": "2",
  "rows": "70",
  "spellcheck.extendedResults": "false",
  "shards": "REDACTED SEVEN SHARD URLS",
  "shards.tolerant": "true",
  "spellcheck.onlyMorePopular": "false",
  "facet.method": "enum",
  "spellcheck.count": "9",
  "q": "catchall:wibble",
  "indent": "true",
  "wt": "json",
  "_": "121001024"
}

The first query had a numFound of 120906, the second a numFound of 32. 
When I re-executed the first  query (the one with a QTime of 1253) so it
would use the Solr caches, QTime was 17.

This is an index that has six cold shards with 38.8 million documents
each and a hot shard with 1.5 million documents.  Total document count
for the index is over 234 million documents, and the total size of the
index is about 272GB.  Each copy of the index has its shards split
between two servers that each have 64GB of RAM, with an 8GB max Java
heap.  I do not have enough memory to cache all the index contents in
RAM, but I can get a little less than half of it in the cache -- each
machine has about 56GB of cache available and contains around 135GB of
index data.  The index data is stored on a RAID10 array with six SATA
disks, so it's fairly fast, but nowhere near as fast as SSD.

You've already mentioned the SolrPerformanceProblems wiki page that I
wrote, which is where I would normally send you for more information. 
You said that your machine has 14GB of RAM and 4GB is allocated to Solr,
leaving about 10GB for caching.  That 10GB number assumes there's no
other software on the machine, like a database server or a webserver. 
How much index data is on the machine?  You need to count all the Solr
cores.  If the "10GB for caching" figure is accurate, then more than
about 20GB of index data means you might need more memory.  If it's more
than about 40GB of index data, you definitely need more memory.

A memory size of 14GB would be unusual for a physical machine, and makes
me wonder if you're using virtual machines.  Bare metal is always going
to offer better performance than a VM.  Another potential problem with
VMs is that the host system might have its memory oversubscribed -- the
total amount of memory in the host machine might be less than the total
amount of memory allocated to all the running virtual machines.  Solr
performance will be terrible if VM memory is oversubscribed.

Thanks,
Shawn



Re: Query Keyword Storage

2015-10-09 Thread Erik Hatcher
There’s no built-in query log handling, other than the (jetty) request logs.

More and more these days, folks are logging directly, or processing log files
back into Solr, in a separate collection, and driving analytics from that.
You can do a lot with logstash + banana (https://github.com/LucidWorks/banana).
We, at Lucidworks, wrap all this up into our [excuse the commercial
interruption] platform Fusion.  Fusion (optionally) logs all requests to the
query pipeline to a logs collection and drives the Silk (banana) dashboard
from that.
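
To answer the plugin question below: one common extension point is a custom
SearchComponent added as a last-component, so it sees every request. A minimal
untested sketch, assuming Solr 5.x APIs (the class name and the logging
destination are made up):

    import java.io.IOException;
    import org.apache.solr.common.params.CommonParams;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class QueryLogComponent extends SearchComponent {
      private static final Logger log = LoggerFactory.getLogger(QueryLogComponent.class);

      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to prepare; this component only observes
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
        String q = rb.req.getParams().get(CommonParams.Q);
        if (q != null) {
          // Here it just logs; it could instead send the query to a separate
          // logs collection, as described above.
          log.info("user query: {}", q);
        }
      }

      @Override
      public String getDescription() {
        return "Logs user queries";
      }
    }

Registered in solrconfig.xml and appended to a handler's last-components list,
it sees every request to that handler.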

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 




> On Oct 9, 2015, at 6:29 PM, Imtiaz Shakil Siddique  
> wrote:
> 
> Hi,
> 
> I'd like to know whether there is any built-in feature/plugin in Solr that
> can store user queries.
> 
> I know that I can always check the Jetty server's log files, which ship
> with Solr, for collecting user queries. But is there a better way? And if
> I needed to write a plugin for this case, which plugin should I extend?
> 
> Thank you.
> Imtiaz Shakil Siddique
> Senior Software Engineer
> Chorki Limited
> www.chorki.com



RE: which one is faster synonym_edismax & edismax faster?

2015-10-09 Thread Markus Jelsma
Hi - if you run a CPU sampler or profiler you will probably see it doesn't 
matter.
Markus

 
 
-Original message-
> From:Aman Tandon 
> Sent: Friday 9th October 2015 6:52
> To: solr-user@lucene.apache.org
> Subject: which one is faster synonym_edismax & edismax faster?
> 
> Hi,
> 
> Currently we are using the *synonym_edismax query parser* plugin to handle
> multi-word synonyms. I want to know which is faster, *edismax* or
> *synonym_edismax*.
> 
> Since we have very few multi-word synonyms in our dictionary, we are
> considering the standard edismax query parser.
> 
> Any suggestions or observations will be helpful.
> 
> With Regards
> Aman Tandon
> 
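
One pragmatic way to check is to time the same query under each parser and
compare QTime; a hedged SolrJ sketch (URL, query, and qf are placeholders,
and "synonym_edismax" assumes the plugin is registered under that name):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class ParserTiming {
      public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        for (String parser : new String[]{"edismax", "synonym_edismax"}) {
          SolrQuery q = new SolrQuery("red couch");
          q.set("defType", parser);
          q.set("qf", "title description"); // hypothetical query fields
          // QTime is server-side parse+search time in ms; average several
          // runs and ignore the first (cold caches) for a fair comparison.
          System.out.println(parser + ": " + client.query(q).getQTime() + " ms");
        }
        client.close();
      }
    }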


Re: Highlighting tag is not showing occasionally

2015-10-09 Thread Zheng Lin Edwin Yeo
I found that it could be due to the EdgeNGramFilterFactory. This issue
didn't happen if I did not apply the EdgeNGramFilterFactory filter for my
fieldType.

But does anyone know why using the EdgeNGramFilterFactory causes this
problem?

Regards,
Edwin


On 7 October 2015 at 17:46, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> Has anyone faced the problem where, when using highlighting, sometimes there
> are results which are returned, but there is no highlighting in the result
> (ie: no <em> tag)?
>
> I found that there is a match in another field which I did not include in
> my hl.fl parameters when I do fl=*, but that same word actually does appear
> in content.
>
> Would like to find out, why sometimes there is a match in content, but it
> is not highlighted (the word is not in the stopword list)? Did I make any
> mistakes in my configuration?
>
> I've include my highlighting request handler from solrconfig.xml here.
>
> [The requestHandler XML was stripped by the mail archive. The surviving
> parameter values, in order: explicit, 10, json, true, text,
> "id, title, content_type, last_modified, url, score", on,
> "id, title, content, author, tag", true, true, html, 200, true,
> signature, true, 100]
>
>
> Regards,
> Edwin
>
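
Since the XML above was stripped, the request side of a highlighting query can
be sketched from SolrJ instead (standard parameters only; the hl.fl list comes
from the email, the query and core URL are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class HighlightCheck {
      public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("content:test"); // placeholder query
        q.setHighlight(true);
        q.set("hl.fl", "id,title,content,author,tag"); // fields from the email
        q.set("hl.fragsize", "200");
        QueryResponse rsp = client.query(q);
        // Compare the highlighting section against the hit list: a doc with no
        // entry here matched on a field outside hl.fl, or via analysis (such
        // as EdgeNGram terms) that the highlighter does not re-match.
        System.out.println(rsp.getHighlighting());
        client.close();
      }
    }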


Re: Best Indexing Approaches - To max the throughput

2015-10-09 Thread Alessandro Benedetti
For doing what?
We were talking about the best approaches for both the single-server
infrastructure and the cloud one.


Cheers

On 8 October 2015 at 19:45, Susheel Kumar  wrote:

> The ConcurrentUpdateSolrClient is not cloud-aware and does not take a
> zkHostString as input.  So the only option is to use CloudSolrClient with
> SolrJ and a thread pool executor framework.
>
> On Thu, Oct 8, 2015 at 12:50 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > This depends on the number of active producers, but ideally it's ok.
> > Different threads will access the thread-safe ConcurrentUpdateSolrClient
> > and send the documents in batches.
> >
> > Or you were meaning something different ?
> >
> >
> > On 8 October 2015 at 16:00, Mugeesh Husain  wrote:
> >
> > > A good way is using SolrJ with a thread pool executor framework;
> > > increase the number of threads as per your requirement.
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/Best-Indexing-Approaches-To-max-the-throughput-tp4232740p4233513.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
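
A minimal sketch of the CloudSolrClient-plus-thread-pool approach discussed in
this thread, assuming SolrJ 5.x (ZooKeeper hosts, collection name, and the
producer loop are placeholders):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexer {
      public static void main(String[] args) throws Exception {
        // CloudSolrClient is thread-safe, so producers can share one instance.
        CloudSolrClient client = new CloudSolrClient("zkhost1:2181,zkhost2:2181");
        client.setDefaultCollection("collection1");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int worker = 0; worker < 4; worker++) {
          pool.submit(() -> {
            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {      // placeholder producer loop
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", java.util.UUID.randomUUID().toString());
              batch.add(doc);
              if (batch.size() >= 1000) {           // send in large batches
                try { client.add(batch); } catch (Exception e) { e.printStackTrace(); }
                batch.clear();
              }
            }
            if (!batch.isEmpty()) {
              try { client.add(batch); } catch (Exception e) { e.printStackTrace(); }
            }
            return null;
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        client.commit();
        client.close();
      }
    }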


Query Keyword Storage

2015-10-09 Thread Imtiaz Shakil Siddique
Hi,

I'd like to know whether there is any built-in feature/plugin in Solr that
can store user queries.

I know that I can always check the Jetty server's log files, which ship
with Solr, for collecting user queries. But is there a better way? And if
I needed to write a plugin for this case, which plugin should I extend?

Thank you.
Imtiaz Shakil Siddique
Senior Software Engineer
Chorki Limited
www.chorki.com


Re: Selective field query

2015-10-09 Thread Erick Erickson
Colin:

Adding &debug=all to your query is your friend here; the
parsed_query.toString output will show you exactly what
is searched against.

Best,
Erick

On Fri, Oct 9, 2015 at 2:09 AM, Colin Hunter  wrote:
> Ah ha...   the copy field...  makes sense.
> Thank You.
>
> On Fri, Oct 9, 2015 at 10:04 AM, Upayavira  wrote:
>
>>
>>
>> On Fri, Oct 9, 2015, at 09:54 AM, Colin Hunter wrote:
>> > Hi
>> >
>> > I am working on a complex search utility with an index created via data
>> > import from an extensive MySQL database.
>> > There are many ways in which the index is searched. One of the utility
>> > input fields searches only on a Service Name. However, if I target the
>> > query as q=ServiceName:"Searched service", this only returns an exact
>> > string match. If q=Searched Service, the query still returns results from
>> > all indexed data.
>> >
>> > Is there a way to construct a query to only return results from one field
>> > of a doc ?
> > I have tried setting indexed=false, stored=true on unwanted fields, but
>> > these
>> > appear to have still been returned in results.
>>
>> q=ServiceName:(Searched Service)
>>
>> That'll look in just one field.
>>
>> Remember changing indexed to false doesn't impact the stuff already in
>> your index. And the reason you are likely getting all that stuff is
>> because you have a copyField that copies it over into the 'text' field.
>> If you'll never want to search on some fields, switch them to
>> indexed=false, make sure you aren't doing a copyField on them, and then
>> reindex.
>>
>> Upayavira
>>
>
>
>
> --
> www.gfc.uk.net
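
The same debug flag can be set from SolrJ; a small sketch (the core URL is a
placeholder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DebugQuery {
      public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/services");
        SolrQuery q = new SolrQuery("ServiceName:(Searched Service)");
        q.set("debug", "all"); // asks Solr to echo timing, parsing and explain info
        QueryResponse rsp = client.query(q);
        // parsedquery shows which fields the terms were actually searched against
        System.out.println(rsp.getDebugMap().get("parsedquery"));
        client.close();
      }
    }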


How do I set up custom collection cores?

2015-10-09 Thread espeake

We are installing Alfresco One 5.0.1 with solr4 on a server that has an
existing instance of tomcat7.  I am trying to find some better
documentation on how to set up our cores.  The solr4.xml located
at /etc/tomcat7/Catalina/localhost has this inside of it:

[Context XML stripped by the mail archive]
Then at /data/alfresco/alf_data/solr4/solr4.xml I have:

[solr4.xml contents stripped by the mail archive]

This is what I get in the catalina.out showing that it is trying to create
collection1

2015-10-09 08:31:42,789  ERROR [solr.core.CoreContainer]
[coreLoadExecutor-5-thread-1] Unable to create core: collection1
org.apache.solr.common.SolrException: Could not load core configuration for core collection1
at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:554)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error loading solr config from solr/collection1/solrconfig.xml
at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:148)
at org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:79)
at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:61)
... 9 more
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/var/lib/tomcat7/solr/collection1/conf'
at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:362)
at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:308)
at org.apache.solr.core.Config.<init>(Config.java:116)
at org.apache.solr.core.Config.<init>(Config.java:86)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:161)
at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:144)
... 11 more
2015-10-09 08:31:42,808  ERROR [solr.core.CoreContainer]
[coreLoadExecutor-5-thread-1] null:org.apache.solr.common.SolrException:
Unable to create core: collection1
at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:911)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:568)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Could not load core configuration for core collection1
at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:554)
... 8 more
Caused by: org.apache.solr.common.SolrException: Error loading solr config from solr/collection1/solrconfig.xml
at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:148)
at org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:79)
at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:61)
... 9 more
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/var/lib/tomcat7/solr/collection1/conf'
at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:362)
at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:308)
at org.apache.solr.core.Config.<init>(Config.java:116)
at org.apache.solr.core.Config.<init>(Config.java:86)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:161)
at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:144)

When I try to define a docBase in
the /etc/tomcat7/Catalina/localhost/solr4.xml file, catalina.out has
this:

WARNING: A docBase /var/lib/tomcat7/webapps/solr4/WEB-INF/lib inside the
host appBase has been specified, and will be ignored

With this in here as the entire file:

[file contents stripped by the mail archive]

I get this in the catalina.out file:

INFO: Deploying configuration descriptor

Re: [SolrJ] Indexing Java Map into Solr

2015-10-09 Thread Erick Erickson
Hmmm, what does the code look like for Java? One of the cardinal sins
of indexing with SolrJ is sending docs one at a time rather than as
batches of at least 100 (I usually use 1,000). See:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/

One technique I often use to chase this kind of thing down:
comment out just the
server.add()
call. That determines whether the time is spent acquiring the docs
or actually sending them to Solr, at least that tells you where to start
looking.

If commenting that out substantially speeds up your throughput _and_
you're batching, then check the CPU utilization. If it's not very high, you
can add a bunch more clients/threads.

Bottom line: I'm doubtful that parsing your input is all that expensive, but
what do I know? Until you can actually pinpoint where the time is being
spent it's all guesswork, so a profiler seems in order.

Best,
Erick

On Fri, Oct 9, 2015 at 8:44 AM, Alessandro Benedetti
 wrote:
> Hi guys,
> I was evaluating an Indexer application.
> This application takes as input a Collection of Objects that are basically
> Java Maps.
> This is to cover, Solr side, a big group of dynamic fields and to avoid
> that complexity Java side.
>
> Let's get to the point: currently the indexing approach is through the JSON
> update handler. Each object is serialised to JSON and sent in a batch to
> the update handler.
>
> If possible I would like to speed this up by moving to javabin indexing (as
> discussed previously).
> But apparently it is slower (I guess because of the conversion between the
> Map object and the SolrDocument that needs to happen before indexing, and
> that is not straightforward for business logic).
>
> Is there any better way to index java maps, apart simply converting it to
> SolrInputDocuments on the fly before sending it to the SolrClient ?
>
> Cheers
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
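
A minimal sketch of the batched Map-to-SolrInputDocument conversion this
thread is about, assuming SolrJ 5.x (the URL, batch size, and record source
are placeholders):

    import java.util.*;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class MapIndexer {
      public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        List<Map<String, Object>> records = fetchRecords(); // hypothetical source
        List<SolrInputDocument> batch = new ArrayList<>();
        for (Map<String, Object> record : records) {
          SolrInputDocument doc = new SolrInputDocument();
          for (Map.Entry<String, Object> e : record.entrySet()) {
            doc.addField(e.getKey(), e.getValue()); // each key maps to a (dynamic) field
          }
          batch.add(doc);
          if (batch.size() >= 1000) {  // the batch size Erick suggests above
            client.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) client.add(batch);
        client.commit();
        client.close();
      }
      private static List<Map<String, Object>> fetchRecords() {
        return Collections.emptyList(); // stand-in for the real producer
      }
    }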


[SolrJ] Indexing Java Map into Solr

2015-10-09 Thread Alessandro Benedetti
Hi guys,
I was evaluating an Indexer application.
This application takes as input a Collection of Objects that are basically
Java Maps.
This is to cover, Solr side, a big group of dynamic fields and to avoid
that complexity Java side.

Let's get to the point: currently the indexing approach is through the JSON
update handler. Each object is serialised to JSON and sent in a batch to
the update handler.

If possible I would like to speed this up by moving to javabin indexing (as
discussed previously).
But apparently it is slower (I guess because of the conversion between the
Map object and the SolrDocument that needs to happen before indexing, and
that is not straightforward for business logic).

Is there any better way to index java maps, apart simply converting it to
SolrInputDocuments on the fly before sending it to the SolrClient ?

Cheers

-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr Pagination

2015-10-09 Thread Erick Erickson
bq: 10GB JVM as mentioned here...and they were getting 140 ms response
time for 10 Billion documents

This simply could _not_ work in a single shard as there's a hard 2B
doc limit per shard. On slide 14
it states "both collections are sharded". They are not fitting 10B
docs in 10G of JVM on a single
machine. Trust me on this ;). The slides do not state how many shards they've
split their collection into, but I suspect it's a bunch. Each
application is different enough that the
numbers wouldn't translate anyway...

70M docs can fit on a single shard with quite good response time, but
YMMV. You simply
have to experiment. Here's a long blog on the subject:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Start with a profiler and see where you're spending your time. My
first guess is that
you're spending a lot of CPU cycles in garbage collection. This
sometimes happens when you are running near your JVM limit: a GC kicks in,
recovers a tiny bit of memory,
and then initiates another GC cycle immediately. Turn on GC logging
and take a look
at the stats provided, see:
https://lucidworks.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/

Tens of seconds is entirely unexpected though. Do the Solr logs point
to anything happening?

Best,
Erick

On Fri, Oct 9, 2015 at 8:51 AM, Salman Ansari  wrote:
> Thanks Eric for your response. If you find pagination is not the main
> culprit, what other factors do you guys suggest I need to tweak to test
> that? As I mentioned, by navigating to 2 results using start and row I
> am getting time out from Solr.NET and I need a way to fix that.
>
> You suggested that 4GB JVM is not enough, I have seen MapQuest going with
> 10GB JVM as mentioned here
> http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
> and they were getting 140 ms response time for 10 Billion documents. Not
> sure how many shards they had though. With data of around 70M documents,
> what do you guys suggest as how many shards should I use and how much
> should I dedicate for RAM and JVM?
>
> Regards,
> Salman
>
> On Fri, Oct 9, 2015 at 6:37 PM, Erick Erickson 
> wrote:
>
>> I think paging is something of a red herring. You say:
>>
>> bq: but still I get delays of around 16 seconds and sometimes even more.
>>
>> Even for a start of 1,000, this is ridiculously long for Solr. All
>> you're really saving
>> here is keeping a record of the id and score for a list 1,000 cells
>> long (or even
>> 20,000 assuming 1,000 pages and 20 docs/page). that's somewhat wasteful,
>> but it's still hard to believe it's responsible for what you're seeing.
>>
>> Having 4G of RAM for 70M docs is very little memory, assuming this is on
>> a single shard.
>>
>> So my suspicion is that you have something fundamentally slow about
>> your system, the additional overhead shouldn't be as large as you're
>> reporting.
>>
>> And I'll second Toke's comment. It's very rare that users see anything
>> _useful_ by navigating that deep. Make them hit next next next and they'll
>> tire out way before that.
>>
>> Cursor mark's sweet spot is handling some kind of automated process that
>> goes through the whole result set. It'll work for what you're trying
>> to do though.
>>
>> Best,
>> Erick
>>
>> On Fri, Oct 9, 2015 at 8:27 AM, Salman Ansari 
>> wrote:
>> > Is this a real problem or a worry? Do you have users that page really
>> deep
>> > and if so, have you considered other mechanisms for delivering what they
>> > need?
>> >
>> > The issue is that currently I have around 70M documents and some generic
>> > queries are resulting in lots of pages. Now if I try deep navigation (to
>> > page# 1000 for example), a lot of times the query takes so long that
>> > Solr.NET throws operation time out exception. The first page is
>> relatively
>> > faster to load but it does take around few seconds as well. After reading
>> > some documentation I realized that cursors could help and it does. I have
>> > tried to following the test better performance:
>> >
>> > 1) Used cursors instead of start and row
>> > 2) Increased the RAM on my Solr machine to 14GB
>> > 3) Increase the JVM on that machine to 4GB
>> > 4) Increased the filterChache
>> > 5) Increased the docCache
>> > 6) Run Optimize on the Solr Admin
>> >
>> > but still I get delays of around 16 seconds and sometimes even more.
>> > What other mechanisms do you suggest I should use to handle this issue?
>> >
>> > While pagination is faster than increasing the start parameter, the
>> > difference is small as long as you stay below a start of 1000. 10K might
>> > also work for you. Do your users page beyond that?
>> > I can limit users not to go beyond 10K but still think at that level
>> > cursors will be much faster than increasing the start variable as
>> explained
>> > here (
>> 

Re: Is solr.StandardDirectoryFactory an MMapDirectory?

2015-10-09 Thread Eric Torti
Ok, thanks Shawn!

That makes sense. We'll be experimenting with it.

Best,
Eric

On Wed, Oct 7, 2015 at 5:54 PM, Shawn Heisey  wrote:
> On 10/7/2015 12:00 PM, Eric Torti wrote:
>> Can we read "high reopen rate" as "frequent soft commits"? (In our
>> case, hard commits do not open a searcher. But soft commits do).
>>
>> Considering it does mean "frequent soft commits", I'd say that it
>> doesn't fit our setup because we have an index rate of about 10
>> updates/s and we perform a soft commit at each 15min. So our scenario
>> is not near real time in that sense. In light of this, do you thing
>> using NRTCachingDirectory is still convenient?
>
> The NRT factory achieves high speed in NRT situations by flushing very
> small updates to RAM instead of the disk.  As more updates come in,
> older index segments sitting in RAM will eventually be flushed to disk,
> so a sustained flood of updates doesn't really achieve a speed increase,
> but a short burst of updates will be searchable *very* quickly.
>
> NRTCachingDirectoryFactory was chosen for Solr examples (and I think
> it's the Solr default) because it has no real performance downsides, but
> has a strong possibility to be noticeably faster than the standard
> factory in NRT situations.
>
> The only problem with it is that small index segments from recent
> updates might only exist in RAM, and not get flushed to disk, so they
> would be lost if Solr dies or is killed suddenly.  This is part of why
> the updateLog feature exists -- when Solr is started, the transaction
> logs will be replayed, inserting/replacing (at a minimum) all documents
> indexed since the last hard commit.  When the replay is finished, you
> will not lose data.  This does require a defined uniqueKey to operate
> correctly.
>
> Thanks,
> Shawn
>
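
For reference, the Lucene class behind solr.NRTCachingDirectoryFactory can be
seen in isolation; a hedged sketch, assuming Lucene 5.x (the path and size
thresholds are illustrative):

    import java.nio.file.Paths;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.NRTCachingDirectory;

    public class NrtDirDemo {
      public static void main(String[] args) throws Exception {
        // On-disk directory; the path is illustrative.
        Directory disk = FSDirectory.open(Paths.get("/var/solr/data/index"));
        // Wrap it: segments from merges up to 4MB, and up to 48MB in total,
        // are cached in RAM instead of being flushed straight to disk.
        Directory dir = new NRTCachingDirectory(disk, 4.0, 48.0);
        // Pass `dir` to an IndexWriter; recent tiny segments are then served
        // from RAM, which is what makes NRT reopens cheap.
        dir.close();
      }
    }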


Re: Solr Pagination

2015-10-09 Thread Salman Ansari
Thanks Eric for your response. If you find pagination is not the main
culprit, what other factors do you guys suggest I need to tweak to test
that? As I mentioned, by navigating to 2 results using start and row I
am getting time out from Solr.NET and I need a way to fix that.

You suggested that 4GB JVM is not enough, I have seen MapQuest going with
10GB JVM as mentioned here
http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
and they were getting 140 ms response time for 10 Billion documents. Not
sure how many shards they had though. With data of around 70M documents,
what do you guys suggest as how many shards should I use and how much
should I dedicate for RAM and JVM?

Regards,
Salman

On Fri, Oct 9, 2015 at 6:37 PM, Erick Erickson 
wrote:

> I think paging is something of a red herring. You say:
>
> bq: but still I get delays of around 16 seconds and sometimes even more.
>
> Even for a start of 1,000, this is ridiculously long for Solr. All
> you're really saving
> here is keeping a record of the id and score for a list 1,000 cells
> long (or even
> 20,000 assuming 1,000 pages and 20 docs/page). that's somewhat wasteful,
> but it's still hard to believe it's responsible for what you're seeing.
>
> Having 4G of RAM for 70M docs is very little memory, assuming this is on
> a single shard.
>
> So my suspicion is that you have something fundamentally slow about
> your system, the additional overhead shouldn't be as large as you're
> reporting.
>
> And I'll second Toke's comment. It's very rare that users see anything
> _useful_ by navigating that deep. Make them hit next next next and they'll
> tire out way before that.
>
> Cursor mark's sweet spot is handling some kind of automated process that
> goes through the whole result set. It'll work for what you're trying
> to do though.
>
> Best,
> Erick
>
> On Fri, Oct 9, 2015 at 8:27 AM, Salman Ansari 
> wrote:
> > Is this a real problem or a worry? Do you have users that page really
> deep
> > and if so, have you considered other mechanisms for delivering what they
> > need?
> >
> > The issue is that currently I have around 70M documents and some generic
> > queries are resulting in lots of pages. Now if I try deep navigation (to
> > page# 1000 for example), a lot of times the query takes so long that
> > Solr.NET throws operation time out exception. The first page is
> relatively
> > faster to load but it does take around few seconds as well. After reading
> > some documentation I realized that cursors could help and it does. I have
> > tried to following the test better performance:
> >
> > 1) Used cursors instead of start and row
> > 2) Increased the RAM on my Solr machine to 14GB
> > 3) Increase the JVM on that machine to 4GB
> > 4) Increased the filterChache
> > 5) Increased the docCache
> > 6) Run Optimize on the Solr Admin
> >
> > but still I get delays of around 16 seconds and sometimes even more.
> > What other mechanisms do you suggest I should use to handle this issue?
> >
> > While pagination is faster than increasing the start parameter, the
> > difference is small as long as you stay below a start of 1000. 10K might
> > also work for you. Do your users page beyond that?
> > I can limit users not to go beyond 10K but still think at that level
> > cursors will be much faster than increasing the start variable as
> explained
> > here (
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
> > ), have you tried both ways on your collection and it was giving you
> > similar results?
> >
> > On Fri, Oct 9, 2015 at 5:20 PM, Toke Eskildsen 
> > wrote:
> >
> >> Salman Ansari  wrote:
> >>
> >> [Pagination with cursors]
> >>
> >> > For example, what happens if the user navigates from page 1 to page 2,
> >> > does the front end  need to store the next cursor at each query?
> >>
> >> Yes.
> >>
> >> > What about going to a previous page, do we need to store all cursors
> >> > that have been navigated up to now at the client side?
> >>
> >> Yes, if you want to provide that functionality.
> >>
> >> Is this a real problem or a worry? Do you have users that page really
> deep
> >> and if so, have you considered other mechanisms for delivering what they
> >> need?
> >>
> >> While pagination is faster than increasing the start parameter, the
> >> difference is small as long as you stay below a start of 1000. 10K might
> >> also work for you. Do your users page beyond that?
> >>
> >> - Toke Eskildsen
> >>
>


Re: Exclude documents having same data in two fields

2015-10-09 Thread Aman Tandon
okay Thanks

With Regards
Aman Tandon

On Fri, Oct 9, 2015 at 4:25 PM, Upayavira  wrote:

> Just beware of performance here. This is fine for smaller indexes, but
> for larger ones won't work so well. It will need to do this calculation
> for every document in your index, thereby undoing all benefits of having
> an inverted index.
>
> If your index (or resultset) is small enough, it can work, but might
> catch you out later.
>
> Upayavira
>
> On Fri, Oct 9, 2015, at 10:59 AM, Aman Tandon wrote:
> > Hi,
> >
> > I tried to use the same approach as mentioned in the url
> > <http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal>.
> >
> > And I used the description field for the check because the mapping field
> > is multivalued.
> >
> > So I added fq={!frange%20l=0%20u=1}strdist(title,description,edit) to my
> > url, but I am getting the error shown below. Please take a look.
> >
> > *Solr Version 4.8.1*
> >
> > *Url is*
> >
> > http://localhost:8150/solr/core1/select?q.alt=*:*&fl=big*,title,catid&fq={!frange%20l=0%20u=1}strdist(title,description,edit)&defType=edismax
> >
> > > <response>
> > > <lst name="responseHeader">
> > > <int name="status">500</int>
> > > <int name="QTime">8</int>
> > > <lst name="params">
> > > <str name="q.alt">*:*</str>
> > > <str name="defType">edismax</str>
> > > <str name="fl">big*,title,catid</str>
> > > <str name="fq">{!frange l=0 u=1}strdist(title,description,edit)</str>
> > > </lst>
> > > </lst>
> > > <lst name="error">
> > > <str name="trace">java.lang.RuntimeException
> > > at org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.<init>(ExtendedDismaxQParser.java:1455)
> > > at org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239)
> > > at org.apache.solr.search.ExtendedDismaxQParser.<init>(ExtendedDismaxQParser.java:108)
> > > at org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37)
> > > at org.apache.solr.search.QParser.getParser(QParser.java:315)
> > > at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144)
> > > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
> > > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
> > > at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> > > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> > > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> > > at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> > > at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> > > at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> > > at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> > > at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
> > > at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> > > at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> > > at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
> > > at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> > > at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > at java.lang.Thread.run(Thread.java:745)
> > > </str>
> > > <int name="code">500</int>
> > > </lst>
> > > </response>
> >
> > With Regards
> > Aman Tandon
> >
> > On Thu, Oct 8, 2015 at 8:07 PM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > Hi, I agree with NutchDev;
> > > using the Function Range Query Parser should do the trick:
> > >
> > >
> > >
> https://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html
> > >
> > > Cheers
> > >
> > > On 8 October 2015 at 13:31, NutchDev  wrote:
> > >
> > > > Hi Aman,
> > > >
> > > > Have a look at this , it has query time approach also using Solr
> function
> > > > query,
> > > >
> > > >
> > > >
> > >
> http://stackoverflow.com/questions/15927893/how-to-check-equality-of-two-solr-fields
> > > >
> > > >
> > >
> http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal
> > > >
> > > >
> > > >
> > > > --
> > > > View this message in context:
> > > >
> > >
> 

Solr Pagination

2015-10-09 Thread Salman Ansari
Hi guys,

I have been working with Solr and Solr.NET for some time for a big project
that requires around 300M documents. Consequently, I faced an issue and I
am highlighting it here in case you have any comments:

As mentioned here (
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results),
cursors are introduced to solve the problem of pagination. However, I was
not able to find an example to do proper handling of page navigation with
multiple users. For example, what happens if the user navigates from page 1
to page 2, does the front end  need to store the next cursor at each query?
What about going to a previous page, do we need to store all cursors that
have been navigated up to now at the client side? Any comments/sample on
how proper pagination should be handled using cursors?

Regards,
Salman
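
To make the bookkeeping concrete, a hedged SolrJ sketch (4.7+ APIs; the URL,
query, and uniqueKey name "id" are assumptions): the front end keeps each
page's cursor so "previous" can be replayed, as discussed in the replies.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorPager {
      public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("content_text:Football"); // placeholder query
        q.setRows(10);
        q.setSort("score", SolrQuery.ORDER.desc);
        q.addSort("id", SolrQuery.ORDER.asc); // cursorMark needs a uniqueKey tie-breaker

        Deque<String> history = new ArrayDeque<>(); // cursors of pages already shown
        String cursor = CursorMarkParams.CURSOR_MARK_START; // "*"

        // "Next page": query with the current cursor, remember it, then advance.
        for (int page = 0; page < 3; page++) {
          q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
          QueryResponse rsp = client.query(q);
          System.out.println("page " + page + ": " + rsp.getResults().size() + " docs");
          history.push(cursor);
          cursor = rsp.getNextCursorMark();
        }
        // "Previous page": pop the last stored cursor and re-run with it.
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, history.pop());
        client.query(q);
        client.close();
      }
    }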


Re: Exclude documents having same data in two fields

2015-10-09 Thread Upayavira
Just beware of performance here. This is fine for smaller indexes, but
for larger ones won't work so well. It will need to do this calculation
for every document in your index, thereby undoing all benefits of having
an inverted index.

If your index (or resultset) is small enough, it can work, but might
catch you out later.

Upayavira

On Fri, Oct 9, 2015, at 10:59 AM, Aman Tandon wrote:
> Hi,
> 
> I tried to use the same approach as mentioned in the url
> <http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal>.
> 
> And I used the description field for the check because the mapping field
> is multivalued.
> 
> So I added fq={!frange%20l=0%20u=1}strdist(title,description,edit) to my
> url, but I am getting the error shown below. Please take a look.
> 
> *Solr Version 4.8.1*
> 
> *Url is*
> http://localhost:8150/solr/core1/select?q.alt=*:*&fl=big*,title,catid&fq={!frange%20l=0%20u=1}strdist(title,description,edit)&defType=edismax
> 
> > <response>
> > <lst name="responseHeader">
> > <int name="status">500</int>
> > <int name="QTime">8</int>
> > <lst name="params">
> > <str name="q.alt">*:*</str>
> > <str name="defType">edismax</str>
> > <str name="fl">big*,title,catid</str>
> > <str name="fq">{!frange l=0 u=1}strdist(title,description,edit)</str>
> > </lst>
> > </lst>
> > <lst name="error">
> > <str name="trace">java.lang.RuntimeException
> > at org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.<init>(ExtendedDismaxQParser.java:1455)
> > at org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239)
> > at org.apache.solr.search.ExtendedDismaxQParser.<init>(ExtendedDismaxQParser.java:108)
> > at org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37)
> > at org.apache.solr.search.QParser.getParser(QParser.java:315)
> > at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144)
> > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
> > at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> > at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> > at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> > at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> > at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> > at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
> > at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> > at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> > at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
> > at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> > at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> > </str>
> > <int name="code">500</int>
> > </lst>
> > </response>
> 
> With Regards
> Aman Tandon
> 
> On Thu, Oct 8, 2015 at 8:07 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
> 
> > Hi, I agree with NutchDev;
> > using the Function Range Query Parser should do the trick:
> >
> >
> > https://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html
> >
> > Cheers
> >
> > On 8 October 2015 at 13:31, NutchDev  wrote:
> >
> > > Hi Aman,
> > >
> > > Have a look at this , it has query time approach also using Solr function
> > > query,
> > >
> > >
> > >
> > http://stackoverflow.com/questions/15927893/how-to-check-equality-of-two-solr-fields
> > >
> > >
> > http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> > http://lucene.472066.n3.nabble.com/Exclude-documents-having-same-data-in-two-fields-tp4233408p4233489.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > 

Re: Solr Pagination

2015-10-09 Thread Toke Eskildsen
Salman Ansari  wrote:

[Pagination with cursors]

> For example, what happens if the user navigates from page 1 to page 2,
> does the front end  need to store the next cursor at each query?

Yes.

> What about going to a previous page, do we need to store all cursors
> that have been navigated up to now at the client side?

Yes, if you want to provide that functionality.

Is this a real problem or a worry? Do you have users that page really deep and 
if so, have you considered other mechanisms for delivering what they need? 

While pagination is faster than increasing the start parameter, the difference 
is small as long as you stay below a start of 1000. 10K might also work for 
you. Do your users page beyond that?

- Toke Eskildsen


OverseerCollectionMessageHandler logging

2015-10-09 Thread Alan Woodward
Hi all,

The OverseerCollectionMessageHandler logs all messages that it processes at 
WARN level, which seems wrong?  Particularly as it handles OVERSEERSTATUS 
messages, which means that monitoring systems can trigger warnings all over the 
place.  Is there a specific reason for this, or should I change it to INFO?

Alan Woodward
www.flax.co.uk




Re: Solr Pagination

2015-10-09 Thread Salman Ansari
I agree 10B will not be residing on the same machine :)

About the other issue you raised, while submitting the query to Solr I was
keeping a close eye on RAM and JVM consumption on Solr Admin and for
queries at the beginning that were taking most of the time, neither RAM nor
JVM was hitting the limit so I doubt that is the problem. For reference, I
did have an issue with JVM raising an exception of "Out of Memory" when it
was around 500MB but then I raised the machine capacity to 14GB RAM and 4GB
JVM.  I have read here (
https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache) that
for best performance I should be able to put my entire collection in
memory. Does that sound reasonable?

As for the logs, I searched for "Salman" with rows=10 and start=1000 and it
took about 29 seconds to complete. However, it took less at each shard as
shown in the log file

INFO  - 2015-10-09 16:43:39.170; [c:sabr102 s:shard1 r:core_node4
x:sabr102_shard1_replica2] org.apache.solr.core.SolrCore;
[sabr102_shard1_replica2] webapp=/solr path=/select
params={distrib=false&wt=javabin&version=2&rows=1010&df=text&fl=id&fl=score&shard.url=http://[MySolrIP]:8983/solr/sabr102_shard1_replica1/|http://100.114.184.37:7574/solr/sabr102_shard1_replica2/&NOW=109019061&start=0&shards.purpose=4&q=(content_text:Salman)&isShard=true&fsv=true&facet=false}
hits=1819 status=0 QTime=91

INFO  - 2015-10-09 16:44:08.116; [c:sabr102 s:shard1 r:core_node4
x:sabr102_shard1_replica2] org.apache.solr.core.SolrCore;
[sabr102_shard1_replica2] webapp=/solr path=/select
params={ids=584673511333089281,584680513887010816,584697461744111616,584668540118044672,583299685516984320&distrib=false&wt=javabin&version=2&rows=10&df=text&shard.url=http://100.114.184.37:8983/solr/sabr102_shard1_replica1/|http://[MySolrIP]:7574/solr/sabr102_shard1_replica2/&NOW=109019061&start=1000&shards.purpose=64&q=(content_text:Salman)&isShard=true&facet=false}
status=0 QTime=4

the search in the second shard started AFTER 29 seconds. Any logic behind
what I am seeing here?

Moreover, I do understand that everyone's needs are different and that I do
need to prototype, but there must be strategies to follow even when
prototyping; that is what I am hoping to hear from you and the community. My
concurrent user count is not that high, but I do have a good amount of data
to store/index in Solr, and even if a single user cannot execute queries
efficiently, that will be problematic.

Regards,
Salman

On Fri, Oct 9, 2015 at 7:06 PM, Erick Erickson 
wrote:

> bq: 10GB JVM as mentioned here...and they were getting 140 ms response
> time for 10 Billion documents
>
> This simply could _not_ work in a single shard as there's a hard 2B
> doc limit per shard. On slide 14
> it states "both collections are sharded". They are not fitting 10B
> docs in 10G of JVM on a single
> machine. Trust me on this ;). The slides do not state how many shards
> they've
> split their collection into, but I suspect it's a bunch. Each
> application is different enough that the
> numbers wouldn't translate anyway...
>
> 70M docs can fit on a single shard with quite good response time, but
> YMMV. You simply
> have to experiment. Here's a long blog on the subject:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Start with a profiler and see where you're spending your time. My
> first guess is that
> you're spending a lot of CPU cycles in garbage collection. This
> sometimes happens
> when you are running near your JVM limit, a GC kicks in and recovers a
> tiny bit of memory
> and then initiates another GC cycle immediately. Turn on GC logging
> and take a look
> at the stats provided, see:
> https://lucidworks.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
>
> Tens of seconds is entirely unexpected though. Do the Solr logs point
> to anything happening?
>
> Best,
> Erick
>
> On Fri, Oct 9, 2015 at 8:51 AM, Salman Ansari 
> wrote:
> > Thanks Eric for your response. If you find pagination is not the main
> > culprit, what other factors do you guys suggest I need to tweak to test
> > that? As I mentioned, by navigating to 2 results using start and row
> I
> > am getting time out from Solr.NET and I need a way to fix that.
> >
> > You suggested that 4GB JVM is not enough, I have seen MapQuest going with
> > 10GB JVM as mentioned here
> >
> http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
> > and they were getting 140 ms response time for 10 Billion documents. Not
> > sure how many shards they had though. With data of around 70M documents,
> > what do you guys suggest as how many shards should I use and how much
> > should I dedicate for RAM and JVM?
> >
> > Regards,
> > Salman
> >
> > On Fri, Oct 9, 2015 at 6:37 PM, Erick Erickson 
> > wrote:
> >
> >> I think paging is something of a red herring. You say:
> >>
> >> bq: but still I get delays of around 16 seconds and sometimes even more.
> >>
> >> Even for a start of 

Re: Solr Pagination

2015-10-09 Thread Toke Eskildsen
Salman Ansari  wrote:

> As for the logs, I searched for "Salman" with rows=10 and start=1000 and it
> took about 29 seconds to complete. However, it took less at each shard as
> shown in the log file

> [...] QTime=91
> [...] QTime=4

> the search in the second shard started AFTER 29 seconds. Any logic behind
> what I am seeing here?

It shows that the shard-searches themselves are not what is slowing you down. 
Are the returned documents very large? Try setting fl=id,score and see if it 
brings response times below 1 second.
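Concretely, something along the lines of:

http://[MySolrIP]:8983/solr/sabr102/select?q=(content_text:Salman)&start=1000&rows=10&fl=id,score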

- Toke Eskildsen


Re: Solr Pagination

2015-10-09 Thread Erick Erickson
OK, this makes very little sense. The individual queries are taking < 100ms
yet the total response is 29 seconds. I do note that one of your
queries has rows=1010, a typo?

Anyway, not at all sure what's going on here. If these are gigantic files you're
returning, then it could be decompressing time, unlikely but possible.

Try again with rows=0&start=1000 to see if it's something weird with getting
the stored data, but that's highly doubtful.

I think the only real way to get to the bottom of it will be to slap a profiler
on it and see where the time is being spent.

Best,
Erick

On Fri, Oct 9, 2015 at 9:53 AM, Toke Eskildsen  wrote:
> Salman Ansari  wrote:
>> Thanks Eric for your response. If you find pagination is not the main
>> culprit, what other factors do you guys suggest I need to tweak to test
>> that?
>
> Well, is basic search slow? What are your response times for plain un-warmed 
> top-20 searches?
>
>> As I mentioned, by navigating to 2 results using start and row I
>> am getting time out from Solr.NET and I need a way to fix that.
>
> You still haven't answered my question: Do your users actually need to page 
> that far?
>
>
> Again: I know there can be 10 million results. Why would your users need to 
> page through all of them? Why would they need to page through just the first
> 1000? What are they trying to achieve?
>
> If they used it automatically for full export of the result set, then I can 
> understand it, but you talked about next & previous page, which indicates 
> that this is a manual process. A manual process that requires clicking next 
> 1000 times is a severe indicator that something can be done differently.
>
> - Toke Eskildsen


Re: How do I set up custom collection cores?

2015-10-09 Thread Shawn Heisey
On 10/9/2015 10:03 AM, espe...@oreillyauto.com wrote:
> We are installing Alfresco One 5.0.1 with solr4 on a server that has an
> existing instance of tomcat7.  I am trying to find some better
> documentation on how to setup our cores.  In the solr4.xml located



> Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in
> classpath or '/var/lib/tomcat7/solr/collection1/conf'
> at org.apache.solr.core.SolrResourceLoader.openResource
> (SolrResourceLoader.java:362)
> at org.apache.solr.core.SolrResourceLoader.openConfig
> (SolrResourceLoader.java:308)
> at org.apache.solr.core.Config.<init>(Config.java:116)
> at org.apache.solr.core.Config.<init>(Config.java:86)
> at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:161)
> at org.apache.solr.core.SolrConfig.readFromResourceLoader
> (SolrConfig.java:144)

Solr can't find the config for the collection1 core.

> When I try to define a docBase in
> the /etc/tomcat7/Catalina/localhost/solr4.xml file catalina.out logs has
> this:

It looks like docBase needs to point to the war file.  If you want to
change where Solr puts its data, you need to define either the
solr.solr.home java system property (on the java commandline --
-Dsolr.solr.home=/my/path) or the solr/home JNDI property.

https://wiki.apache.org/solr/SolrTomcat#Configuring_Solr_Home_with_JNDI
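For the JNDI route, that page boils down to a context fragment along these
lines (paths are illustrative) in a file such as
/etc/tomcat7/Catalina/localhost/solr4.xml:

<Context docBase="/opt/solr/solr4.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/var/lib/tomcat7/solr" override="true"/>
</Context>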

Exactly what file/directory layout you need is dependent on the precise
Solr version and whether your solr.xml file (not the solr4.xml you
mentioned -- that's for your container and is not used by solr) is in
the new or old format.  The solr.xml file lives in the solr home directory.

http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond
https://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29

Thanks,
Shawn



Re: Solr Pagination

2015-10-09 Thread Toke Eskildsen
Salman Ansari  wrote:
> Thanks Eric for your response. If you find pagination is not the main
> culprit, what other factors do you guys suggest I need to tweak to test
> that?

Well, is basic search slow? What are your response times for plain un-warmed 
top-20 searches?

> As I mentioned, by navigating to 2 results using start and row I
> am getting time out from Solr.NET and I need a way to fix that.

You still haven't answered my question: Do your users actually need to page 
that far?


Again: I know there can be 10 million results. Why would your users need to 
page through all of them? Why would they need to page through just the first 
1000? What are they trying to achieve?

If they used it automatically for full export of the result set, then I can 
understand it, but you talked about next & previous page, which indicates that 
this is a manual process. A manual process that requires clicking next 1000 
times is a severe indicator that something can be done differently.

- Toke Eskildsen


Re: Exclude documents having same data in two fields

2015-10-09 Thread Aman Tandon
Thanks Mikhail for the suggestion. I will try that on Monday and will let you know.

@Walter This was just a random requirement: to find those documents in which
the fields are not the same and then reindex only those. I could do a full
reindex, but I was wondering if there might be some function or similar.

With Regards
Aman Tandon

On Fri, Oct 9, 2015 at 9:05 PM, Mikhail Khludnev  wrote:

> Aman,
>
> You can invoke the Terms Component for the field M, and let it return terms:
> {a,c,d,f}
> then you invoke it for field T and let it return {b,c,f,e},
> then you intersect both lists (which is straightforward if they are kept
> ordered), and you've got {c,f}
> and then you apply the filter:
> fq=-((+M:c +T:c) (+M:f +T:f))
> etc
>
>
> On Thu, Oct 8, 2015 at 8:29 AM, Aman Tandon 
> wrote:
>
> > Hi,
> >
> > Is there a way in solr to remove all those documents from the search
> > results in which two of the fields, *mapping* and *title*, are exactly the
> > same.
> >
> > With Regards
> > Aman Tandon
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Exclude documents having same data in two fields

2015-10-09 Thread Susheel Kumar
Hi Aman, did the problem get resolved, or are you still having errors?

Thnx

On Fri, Oct 9, 2015 at 8:28 AM, Aman Tandon  wrote:

> okay Thanks
>
> With Regards
> Aman Tandon
>
> On Fri, Oct 9, 2015 at 4:25 PM, Upayavira  wrote:
>
> > Just beware of performance here. This is fine for smaller indexes, but
> > for larger ones won't work so well. It will need to do this calculation
> > for every document in your index, thereby undoing all benefits of having
> > an inverted index.
> >
> > If your index (or resultset) is small enough, it can work, but might
> > catch you out later.
> >
> > Upayavira
> >
> > On Fri, Oct 9, 2015, at 10:59 AM, Aman Tandon wrote:
> > > Hi,
> > >
> > > I tried to use the same as mentioned in the url
> > > <
> >
> http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal
> > >
> > > .
> > >
> > > And I used the description field to check because mapping field
> > > is multivalued.
> > >
> > > So I add the fq={!frange%20l=0%20u=1}strdist(title,description,edit) in
> > > my
> > > url, but I am getting this error. As mentioned below. Please take a
> look.
> > >
> > > *Solr Version 4.8.1*
> > >
> > > *Url is*
> > >
> >
> http://localhost:8150/solr/core1/select?q.alt=*:*&fl=big*,title,catid&fq={!frange%20l=0%20u=1}strdist(title,description,edit)&defType=edismax
> > >
> > > > 
> > > > 
> > > > 500
> > > > 8
> > > > 
> > > > *:*
> > > > edismax
> > > > big*,title,catid
> > > > {!frange l=0 u=1}strdist(title,description,edit)
> > > > 
> > > > 
> > > > 
> > > > 
> > > > java.lang.RuntimeException at
> > > >
> >
> org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.<init>(ExtendedDismaxQParser.java:1455)
> > > > at
> > > >
> >
> org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239)
> > > > at
> > > >
> >
> org.apache.solr.search.ExtendedDismaxQParser.<init>(ExtendedDismaxQParser.java:108)
> > > > at
> > > >
> >
> org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37)
> > > > at org.apache.solr.search.QParser.getParser(QParser.java:315) at
> > > >
> >
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144)
> > > > at
> > > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
> > > > at
> > > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at
> > > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
> > > > at
> > > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> > > > at
> > > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> > > > at
> > > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> > > > at
> > > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> > > > at
> > > >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> > > > at
> > > >
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> > > > at
> > > >
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> > > > at
> > > >
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> > > > at
> > > >
> > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
> > > > at
> > > >
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> > > > at
> > > >
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> > > > at
> > > >
> >
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
> > > > at
> > > >
> >
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> > > > at
> > > >
> >
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
> > > > at
> > > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > > at
> > > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > > at java.lang.Thread.run(Thread.java:745)
> > > > 
> > > > 500
> > > > 
> > > > 
> > > >
> > >
> > > With Regards
> > > Aman Tandon
> > >
> > > On Thu, Oct 8, 2015 at 8:07 PM, Alessandro Benedetti <
> > > benedetti.ale...@gmail.com> wrote:
> > >
> > > > Hi agree with Nutch,
> > > > using the Function Range Query Parser, should do your trick :
> > > >
> > > >
> > > >
> >
> https://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html
> > > >
> > > > Cheers
> > > >
> > > > On 8 October 2015 at 13:31, NutchDev 
> wrote:
> > > >
> > > > > Hi Aman,
> > > > >
> > > > 

Re: Solr Pagination

2015-10-09 Thread Salman Ansari
> Thanks Eric for your response. If you find pagination is not the main
> culprit, what other factors do you guys suggest I need to tweak to test
> that?
Well, is basic search slow? What are your response times for plain
un-warmed top-20 searches?

I have restarted Solr and tried running the query "Football", and here are
the results:
for start=0, rows=10 it took around 3.391 seconds
for start=1000, rows=10 it took around 21.569 seconds *(btw, after running
the query a second time, it took around 332 ms; could you explain this
behavior?)*
I am not quite sure what you mean by an un-warmed search, but I do have
autowarming configured for the filterCache.
Btw, here is the log for both queries, and it does indeed look like Solr
takes that long to query:

INFO  - 2015-10-09 18:46:17.937; [c:sabr102 s:shard2 r:core_node1
x:sabr102_shard2_replica1] org.apache.solr.core.SolrCore;
[sabr102_shard2_replica1] webapp=/solr path=/select
params={ids=592367114956177408,590296378955407362,585347065619750912,584382847948951552=false=javabin=2=10=text=http://
[MySolrIP]:8983/solr/sabr102_shard2_replica1/|http://[MySolrIP]:7574/solr/sabr102_shard2_replica2/=116374563=0=64=(content_text:Football)=true=false}
status=0 QTime=13
INFO  - 2015-10-09 18:46:17.953; [c:sabr102 s:shard1 r:core_node2
x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
[sabr102_shard1_replica1] webapp=/solr path=/select
params={start=0&q=(content_text:Football)&rows=10} hits=24408 status=0
QTime=3391

INFO  - 2015-10-09 18:46:43.207; [c:sabr102 s:shard2 r:core_node1
x:sabr102_shard2_replica1] org.apache.solr.core.SolrCore;
[sabr102_shard2_replica1] webapp=/solr path=/select
params={distrib=false=javabin=2=1010=text=id=score=http://
[MySolrIP]:8983/solr/sabr102_shard2_replica1/|http://[MySolrIP]:7574/solr/sabr102_shard2_replica2/=116403161=0=4=(content_text:Football)=true=true=false}
hits=12198 status=0 QTime=32
INFO  - 2015-10-09 18:47:04.727; [c:sabr102 s:shard1 r:core_node2
x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
[sabr102_shard1_replica1] webapp=/solr path=/select
params={start=1000&q=(content_text:Football)&rows=10} hits=24408 status=0
QTime=21569


> As I mentioned, by navigating to 2 results using start and row I
> am getting time out from Solr.NET and I need a way to fix that.
You still haven't answered my question: Do your users actually need to page
that far?

No, they do not need to navigate to that level, but I was checking the edge
cases. Moreover, based on my previous query results, even navigating to the
100th page (1,000 results, as each page has 10 results) gives performance
results that are not promising. Users can easily reach such a page from the
query string in the URL, or by jumping several pages at once in the UI,
since I expose 10 pages at a time like Google or LinkedIn.

It shows that the shard-searches themselves are not what is slowing you
down. Are the returned documents very large? Try setting fl=id,score and
see if it brings response times below 1 second.
I have around 50-60 fields per document in schema but not all of them get
populated for each document. The main field that I am searching on is
called content_text but that is usually small.
I have tried running the following query on Solr
http://[MySolrMachine]:8983/solr/sabr102/select?q=(content_text:Football)&start=1000&rows=10&fl=id,score

and it took around 13.567 seconds *(the same goes here: after running the
query a second time, it took around 244 ms)*
The log shows that it did take Solr that long

INFO  - 2015-10-09 18:54:44.271; [c:sabr102 s:shard1 r:core_node2
x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
[sabr102_shard1_replica1] webapp=/solr path=/select
params={fl=id,score&start=1000&q=(content_text:Football)&rows=10}
hits=24408 status=0 QTime=13567

INFO  - 2015-10-09 19:02:41.732; [c:sabr102 s:shard2 r:core_node1
x:sabr102_shard2_replica1] org.apache.solr.core.SolrCore;
[sabr102_shard2_replica1] webapp=/solr path=/select
params={distrib=false=javabin=2=1010=text=id=score=http://
[MySolrIP]:8983/solr/sabr102_shard2_replica1/|http://[MySolrIP]:7574/solr/sabr102_shard2_replica2/=117361716=0=4=(content_text:Football)=true=true=false}
hits=12198 status=0 QTime=9

*Why is it that shard1 is taking way longer than shard2?*

 I do note that one of your queries has rows=1010, a typo?
No that was not a typo,

Try again with rows=0&start=1000 to see if it's something weird with getting
the stored data, but that's highly doubtful.
I have tried the query "Salman" with rows=0, start=1000 and it took around
13.819 seconds.

I think the only real way to get to the bottom of it will be to slap a
profiler on it and see where the time is being spent.
Can you direct me to a good profiler for Solr?

Regards,
Salman

On Fri, Oct 9, 2015 at 8:02 PM, Erick Erickson 
wrote:

> OK, this makes very little sense. The individual queries are taking < 100ms
> yet the total response is 29 seconds. I do note that one 

Re: how to deployed another web project into jetty server(solr inbuilt)

2015-10-09 Thread Mugeesh Husain
Thank you Upayavira.
Clearly understood; they now agree to install another server.

 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-deployed-another-web-project-into-jetty-server-solr-inbuilt-tp4233288p4233733.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OverseerCollectionMessageHandler logging

2015-10-09 Thread Shalin Shekhar Mangar
Yes, that should be INFO

On Fri, Oct 9, 2015 at 8:02 PM, Alan Woodward  wrote:
> Hi all,
>
> The OverseerCollectionMessageHandler logs all messages that it processes at 
> WARN level, which seems wrong?  Particularly as it handles OVERSEERSTATUS 
> messages, which means that monitoring systems can trigger warnings all over 
> the place.  Is there a specific reason for this, or should I change it to 
> INFO?
>
> Alan Woodward
> www.flax.co.uk
>
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Exclude documents having same data in two fields

2015-10-09 Thread Aman Tandon
No Susheel. As our index size is 62 GB, it seems hard to find those
records.

With Regards
Aman Tandon

On Fri, Oct 9, 2015 at 7:30 PM, Susheel Kumar  wrote:

> Hi Aman, did the problem get resolved, or are you still having errors?
>
> Thnx
>
> On Fri, Oct 9, 2015 at 8:28 AM, Aman Tandon 
> wrote:
>
> > okay Thanks
> >
> > With Regards
> > Aman Tandon
> >
> > On Fri, Oct 9, 2015 at 4:25 PM, Upayavira  wrote:
> >
> > > Just beware of performance here. This is fine for smaller indexes, but
> > > for larger ones won't work so well. It will need to do this calculation
> > > for every document in your index, thereby undoing all benefits of
> having
> > > an inverted index.
> > >
> > > If your index (or resultset) is small enough, it can work, but might
> > > catch you out later.
> > >
> > > Upayavira
> > >
> > > On Fri, Oct 9, 2015, at 10:59 AM, Aman Tandon wrote:
> > > > Hi,
> > > >
> > > > I tried to use the same as mentioned in the url
> > > > <
> > >
> >
> http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal
> > > >
> > > > .
> > > >
> > > > And I used the description field to check because mapping field
> > > > is multivalued.
> > > >
> > > > So I add the fq={!frange%20l=0%20u=1}strdist(title,description,edit)
> in
> > > > my
> > > > url, but I am getting this error. As mentioned below. Please take a
> > look.
> > > >
> > > > *Solr Version 4.8.1*
> > > >
> > > > *Url is*
> > > >
> > >
> >
> http://localhost:8150/solr/core1/select?q.alt=*:*&fl=big*,title,catid&fq={!frange%20l=0%20u=1}strdist(title,description,edit)&defType=edismax
> > > >
> > > > > 
> > > > > 
> > > > > 500
> > > > > 8
> > > > > 
> > > > > *:*
> > > > > edismax
> > > > > big*,title,catid
> > > > > {!frange l=0
> u=1}strdist(title,description,edit)
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > java.lang.RuntimeException at
> > > > >
> > >
> >
> org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.<init>(ExtendedDismaxQParser.java:1455)
> > > > > at
> > > > >
> > >
> >
> org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239)
> > > > > at
> > > > >
> > >
> >
> org.apache.solr.search.ExtendedDismaxQParser.<init>(ExtendedDismaxQParser.java:108)
> > > > > at
> > > > >
> > >
> >
> org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37)
> > > > > at org.apache.solr.search.QParser.getParser(QParser.java:315) at
> > > > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144)
> > > > > at
> > > > >
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
> > > > > at
> > > > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at
> > > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
> > > > > at
> > > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> > > > > at
> > > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> > > > > at
> > > > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> > > > > at
> > > > >
> > >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> > > > > at
> > > > >
> > >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> > > > > at
> > > > >
> > >
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> > > > > at
> > > > >
> > >
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> > > > > at
> > > > >
> > >
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> > > > > at
> > > > >
> > >
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
> > > > > at
> > > > >
> > >
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> > > > > at
> > > > >
> > >
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> > > > > at
> > > > >
> > >
> >
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
> > > > > at
> > > > >
> > >
> >
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> > > > > at
> > > > >
> > >
> >
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
> > > > > at
> > > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > > > at
> > > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > > > at java.lang.Thread.run(Thread.java:745)
> > > > > 
> > > > > 500
> > > > > 

schema.xml field configuration

2015-10-09 Thread Vincenzo D'Amore
Hi,

I have this fieldType configuration (the XML itself was stripped by the list archive):

Using the Solr Field Analysis tool for the string "aaa", at the end of the
last step I see this:

text | aaa |  | aaa | aaa
position | 1   | 1| 1   | 2
start| 0   | 0| 0   | 4
end  | 8   | 4| 7   | 7
type | word| word | word| word
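(The fieldType XML above was eaten by the archive. Judging from the attribute
fragments quoted in Erick's reply below, it presumably looked something like
the following; the tokenizer, the type name, and the pattern are guesses:

<fieldType name="text_xxx" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="..." replacement=" "/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
            splitOnNumerics="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Worth noting: RemoveDuplicatesTokenFilterFactory only drops a token whose text
and position both match an earlier token, which is consistent with the second
"aaa" at position 2 in the table above.)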


Now I'm quite surprised to see there are two occurrences of "aaa".
Why? I suppose it has something to do with the position, but I don't
understand what.
Shouldn't RemoveDuplicatesTokenFilterFactory remove all the duplicates?


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Exclude documents having same data in two fields

2015-10-09 Thread Walter Underwood
Please explain why you do not want to use an extra field. That is the only 
solution that will perform well on your large index.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
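(For illustration, the extra-field approach is roughly: compute the comparison
once at index time and filter on it; all names below are made up.

In schema.xml:

  <field name="titleEqualsMapping" type="boolean" indexed="true" stored="false"/>

At index time, e.g. from SolrJ:

  doc.addField("titleEqualsMapping", title.equals(mapping));

At query time:

  fq=-titleEqualsMapping:true

One indexed boolean turns the per-document string comparison into a cheap
filter, which is why it scales where strdist() does not.)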


> On Oct 9, 2015, at 7:47 AM, Aman Tandon  wrote:
> 
> No Susheel. As our index size is 62 GB, it seems hard to find those
> records.
> 
> With Regards
> Aman Tandon
> 
> On Fri, Oct 9, 2015 at 7:30 PM, Susheel Kumar  wrote:
> 
>> Hi Aman, did the problem get resolved, or are you still having errors?
>> 
>> Thnx
>> 
>> On Fri, Oct 9, 2015 at 8:28 AM, Aman Tandon 
>> wrote:
>> 
>>> okay Thanks
>>> 
>>> With Regards
>>> Aman Tandon
>>> 
>>> On Fri, Oct 9, 2015 at 4:25 PM, Upayavira  wrote:
>>> 
 Just beware of performance here. This is fine for smaller indexes, but
 for larger ones won't work so well. It will need to do this calculation
 for every document in your index, thereby undoing all benefits of
>> having
 an inverted index.
 
 If your index (or resultset) is small enough, it can work, but might
 catch you out later.
 
 Upayavira
 
 On Fri, Oct 9, 2015, at 10:59 AM, Aman Tandon wrote:
> Hi,
> 
> I tried to use the same as mentioned in the url
> <
 
>>> 
>> http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal
> 
> .
> 
> And I used the description field to check because mapping field
> is multivalued.
> 
> So I add the fq={!frange%20l=0%20u=1}strdist(title,description,edit)
>> in
> my
> url, but I am getting this error. As mentioned below. Please take a
>>> look.
> 
> *Solr Version 4.8.1*
> 
> *Url is*
> 
 
>>> 
>> http://localhost:8150/solr/core1/select?q.alt=*:*&fl=big*,title,catid&fq={!frange%20l=0%20u=1}strdist(title,description,edit)&defType=edismax
> 
>> 
>> 
>> 500
>> 8
>> 
>> *:*
>> edismax
>> big*,title,catid
>> {!frange l=0
>> u=1}strdist(title,description,edit)
>> 
>> 
>> 
>> 
>> java.lang.RuntimeException at
>> 
 
>>> 
>> org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.<init>(ExtendedDismaxQParser.java:1455)
>> at
>> 
 
>>> 
>> org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239)
>> at
>> 
 
>>> 
>> org.apache.solr.search.ExtendedDismaxQParser.<init>(ExtendedDismaxQParser.java:108)
>> at
>> 
 
>>> 
>> org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37)
>> at org.apache.solr.search.QParser.getParser(QParser.java:315) at
>> 
 
>>> 
>> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144)
>> at
>> 
 
>>> 
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
>> at
>> 
 
>>> 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at
>> 
 
>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
>> at
>> 
 
>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
>> at
>> 
 
>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>> at
>> 
 
>>> 
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>> at
>> 
 
>>> 
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>> at
>> 
 
>>> 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>> at
>> 
 
>>> 
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>> at
>> 
 
>>> 
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>> at
>> 
 
>>> 
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>> at
>> 
 
>> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
>> at
>> 
 
>>> 
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>> at
>> 
 
>>> 
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
>> at
>> 
 
>>> 
>> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
>> at
>> 
 
>>> 
>> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>> at
>> 
 
>>> 
>> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
>> at
>> 
 
>>> 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at

SolrCloud NoAuth for /unrelatednode error

2015-10-09 Thread Jamie Johnson
I am getting an error that essentially says solr does not have auth for
/unrelatednode/... I would be ok with the error being displayed, but I
think this may be what is causing my solr instances to be shown as down.
Currently I'm issuing the following command

http://localhost:8983/solr/admin/collections?action=CREATE&name=collection&numShards=2&replicationFactor=2&collection.configName=config&maxShardsPerNode=2

I see the collection and shards being created, but they appear as down in
the clusterstate.json.  The only exception I see when trying to show the
Cloud graph is shown below.  Could this be the cause for the shards showing
up as down?

WARN  ZookeeperInfoServlet - Keeper Exception
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode =
NoAuth for /unrelatednode/foo/bar
at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:308)
at
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:305)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:305)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:279)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:226)
at
org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:104)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1667)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:466)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
at java.lang.Thread.run(Thread.java:745)


Re: Solr Pagination

2015-10-09 Thread Salman Ansari
Is this a real problem or a worry? Do you have users that page really deep
and if so, have you considered other mechanisms for delivering what they
need?

The issue is that currently I have around 70M documents and some generic
queries are resulting in lots of pages. Now if I try deep navigation (to
page# 1000 for example), a lot of times the query takes so long that
Solr.NET throws an operation timeout exception. The first page is relatively
faster to load, but it still takes around a few seconds as well. After
reading some documentation I realized that cursors could help, and they do.
I have tried the following to get better performance:

1) Used cursors instead of start and row
2) Increased the RAM on my Solr machine to 14GB
3) Increase the JVM on that machine to 4GB
4) Increased the filterCache
5) Increased the documentCache
6) Run Optimize on the Solr Admin

but still I get delays of around 16 seconds and sometimes even more.
What other mechanisms do you suggest I should use to handle this issue?
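(For reference, steps 4 and 5 correspond to solrconfig.xml entries along these
lines; the sizes here are illustrative, not recommendations:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
)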

While pagination is faster than increasing the start parameter, the
difference is small as long as you stay below a start of 1000. 10K might
also work for you. Do your users page beyond that?
I can limit users to not go beyond 10K, but I still think at that level
cursors will be much faster than increasing the start variable, as explained
here (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
). Have you tried both ways on your collection, and did they give you
similar results?
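For reference, the cursor variant of that query looks like this, assuming id
is the uniqueKey (cursorMark requires a sort that ends on it); each response's
nextCursorMark is then fed into the following request:

http://[MySolrMachine]:8983/solr/sabr102/select?q=(content_text:Football)&rows=10&sort=score+desc,id+asc&cursorMark=*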

On Fri, Oct 9, 2015 at 5:20 PM, Toke Eskildsen 
wrote:

> Salman Ansari  wrote:
>
> [Pagination with cursors]
>
> > For example, what happens if the user navigates from page 1 to page 2,
> > does the front end  need to store the next cursor at each query?
>
> Yes.
>
> > What about going to a previous page, do we need to store all cursors
> > that have been navigated up to now at the client side?
>
> Yes, if you want to provide that functionality.
>
> Is this a real problem or a worry? Do you have users that page really deep
> and if so, have you considered other mechanisms for delivering what they
> need?
>
> While pagination is faster than increasing the start parameter, the
> difference is small as long as you stay below a start of 1000. 10K might
> also work for you. Do your users page beyond that?
>
> - Toke Eskildsen
>


Re: schema.xml field configuration

2015-10-09 Thread Erick Erickson
Seems odd to me as well. I suspect you can work around
this by setting either catenateAll="0" or preserveOriginal="0"

Best,
Erick

On Fri, Oct 9, 2015 at 7:50 AM, Vincenzo D'Amore  wrote:
> Hi,
>
> I have this fieldType configuration:
>
>  positionIncrementGap="100">
> 
> 
>  replacement=" " />
>  generateNumberParts="1" catenateWords="1"
> catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
> splitOnNumerics="1" preserveOriginal="1" />
> 
> 
> 
> 
> 
>
> Using Solr Field Analysis tool for the string "aaa", in the last step
> at end I see this:
>
> text | aaa |  | aaa | aaa
> position | 1   | 1| 1   | 2
> start| 0   | 0| 0   | 4
> end  | 8   | 4| 7   | 7
> type | word| word | word| word
>
>
> Now I'm quite surprised to see there are two occurrences of "aaa".
> Why? I suppose it has something to do with the position, but I
> don't understand what.
> Shouldn't RemoveDuplicatesTokenFilterFactory remove all the duplicates?
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251


Re: OverseerCollectionMessageHandler logging

2015-10-09 Thread Alan Woodward
I'll raise a Jira, thanks Shalin.

Alan Woodward
www.flax.co.uk


On 9 Oct 2015, at 16:05, Shalin Shekhar Mangar wrote:

> Yes, that should be INFO
> 
> On Fri, Oct 9, 2015 at 8:02 PM, Alan Woodward  wrote:
>> Hi all,
>> 
>> The OverseerCollectionMessageHandler logs all messages that it processes at 
>> WARN level, which seems wrong?  Particularly as it handles OVERSEERSTATUS 
>> messages, which means that monitoring systems can trigger warnings all over 
>> the place.  Is there a specific reason for this, or should I change it to 
>> INFO?
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.



Re: Exclude documents having same data in two fields

2015-10-09 Thread Mikhail Khludnev
Aman,

You can invoke the Terms Component for the field M, and let it return terms:
{a,c,d,f}
then you invoke it for field T and let it return {b,c,f,e},
then you intersect both lists (which is straightforward if they are kept
ordered), and you've got {c,f}
and then you apply the filter:
fq=-((+M:c +T:c) (+M:f +T:f))
etc
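A rough SolrJ sketch of this recipe (core and field names are illustrative;
the Terms Component returns indexed terms, so this works best on
non-analyzed fields):

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.TermsResponse;
import org.apache.solr.client.solrj.util.ClientUtils;

public class SameValueFilter {
    static Set<String> terms(HttpSolrClient solr, String field) throws Exception {
        SolrQuery q = new SolrQuery();
        q.setRequestHandler("/terms");   // the stock terms handler
        q.setTerms(true);
        q.addTermsField(field);
        q.setTermsLimit(-1);             // all terms; fine for moderate cardinality
        Set<String> out = new LinkedHashSet<>();
        for (TermsResponse.Term t : solr.query(q).getTermsResponse().getTerms(field)) {
            out.add(t.getTerm());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient("http://localhost:8150/solr/core1");
        Set<String> both = terms(solr, "mapping");
        both.retainAll(terms(solr, "title"));     // the intersection, e.g. {c,f}
        // build the negative filter described above (empty-set guard omitted for brevity)
        String fq = "-(" + both.stream()
            .map(t -> "(+mapping:" + ClientUtils.escapeQueryChars(t)
                    + " +title:" + ClientUtils.escapeQueryChars(t) + ")")
            .collect(Collectors.joining(" ")) + ")";
        System.out.println(fq);                   // pass as fq=... on the real query
        solr.close();
    }
}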


On Thu, Oct 8, 2015 at 8:29 AM, Aman Tandon  wrote:

> Hi,
>
> Is there a way in solr to remove all those documents from the search
> results in which two of the fields, *mapping* and *title*, are exactly the
> same.
>
> With Regards
> Aman Tandon
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: SolrCloud NoAuth for /unrelatednode error

2015-10-09 Thread Jamie Johnson
Ah, please ignore; it looks like this was totally unrelated and my issue was
configuration related.

On Fri, Oct 9, 2015 at 11:18 AM, Jamie Johnson  wrote:

> I am getting an error that essentially says solr does not have auth for
> /unrelatednode/... I would be ok with the error being displayed, but I
> think this may be what is causing my solr instances to be shown as down.
> Currently I'm issuing the following command
>
>
> http://localhost:8983/solr/admin/collections?action=CREATE&name=collection&numShards=2&replicationFactor=2&collection.configName=config&maxShardsPerNode=2
>
> I see the collection and shards being created, but they appear as down in
> the clusterstate.json.  The only exception I see when trying to show the
> Cloud graph is shown below.  Could this be the cause for the shards showing
> up as down?
>
> WARN  ZookeeperInfoServlet - Keeper Exception
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode =
> NoAuth for /unrelatednode/foo/bar
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
> at
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:308)
> at
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:305)
> at
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:305)
> at
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:279)
> at
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322)
> at
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322)
> at
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322)
> at
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:322)
> at
> org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.print(ZookeeperInfoServlet.java:226)
> at
> org.apache.solr.servlet.ZookeeperInfoServlet.doGet(ZookeeperInfoServlet.java:104)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1667)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:466)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:497)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
> at java.lang.Thread.run(Thread.java:745)
>
>


Re: Solr Pagination

2015-10-09 Thread Erick Erickson
I think paging is something of a red herring. You say:

bq: but still I get delays of around 16 seconds and sometimes even more.

Even for a start of 1,000, this is ridiculously long for Solr. All
you're really saving
here is keeping a record of the id and score for a list 1,000 cells
long (or even
20,000 assuming 1,000 pages and 20 docs/page). that's somewhat wasteful,
but it's still hard to believe it's responsible for what you're seeing.

Having 4G of RAM for 70M docs is very little memory, assuming this is on
a single shard.

So my suspicion is that you have something fundamentally slow about
your system, the additional overhead shouldn't be as large as you're
reporting.

And I'll second Toke's comment. It's very rare that users see anything
_useful_ by navigating that deep. Make them hit next next next and they'll
tire out way before that.

Cursor mark's sweet spot is handling some kind of automated process that
goes through the whole result set. It'll work for what you're trying
to do though.
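The canonical loop for that automated sweep, as a sketch assuming SolrJ 5.x
and that "id" is the uniqueKey (it stops when the cursor repeats):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class FullSweep {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/sabr102");
        SolrQuery q = new SolrQuery("content_text:Football");
        q.setRows(500);                           // larger pages suit a bulk sweep
        q.addSort("id", SolrQuery.ORDER.asc);     // the sort must end on the uniqueKey
        String cursor = CursorMarkParams.CURSOR_MARK_START;
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = solr.query(q);
            // ... process rsp.getResults() here ...
            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) break;       // cursor stopped advancing: done
            cursor = next;
        }
        solr.close();
    }
}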

Best,
Erick

On Fri, Oct 9, 2015 at 8:27 AM, Salman Ansari  wrote:
> Is this a real problem or a worry? Do you have users that page really deep
> and if so, have you considered other mechanisms for delivering what they
> need?
>
> The issue is that currently I have around 70M documents and some generic
> queries are resulting in lots of pages. Now if I try deep navigation (to
> page# 1000 for example), a lot of times the query takes so long that
> Solr.NET throws an operation timeout exception. The first page is relatively
> faster to load, but it still takes around a few seconds as well. After reading
> some documentation I realized that cursors could help, and they do. I have
> tried the following to get better performance:
>
> 1) Used cursors instead of start and row
> 2) Increased the RAM on my Solr machine to 14GB
> 3) Increase the JVM on that machine to 4GB
> 4) Increased the filterCache
> 5) Increased the documentCache
> 6) Run Optimize on the Solr Admin
>
> but still I get delays of around 16 seconds and sometimes even more.
> What other mechanisms do you suggest I should use to handle this issue?
>
> While pagination is faster than increasing the start parameter, the
> difference is small as long as you stay below a start of 1000. 10K might
> also work for you. Do your users page beyond that?
> I can limit users to not go beyond 10K, but I still think at that level
> cursors will be much faster than increasing the start variable, as explained
> here (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
> ). Have you tried both ways on your collection, and did they give you
> similar results?
>
> On Fri, Oct 9, 2015 at 5:20 PM, Toke Eskildsen 
> wrote:
>
>> Salman Ansari  wrote:
>>
>> [Pagination with cursors]
>>
>> > For example, what happens if the user navigates from page 1 to page 2,
>> > does the front end  need to store the next cursor at each query?
>>
>> Yes.
>>
>> > What about going to a previous page, do we need to store all cursors
>> > that have been navigated up to now at the client side?
>>
>> Yes, if you want to provide that functionality.
>>
>> Is this a real problem or a worry? Do you have users that page really deep
>> and if so, have you considered other mechanisms for delivering what they
>> need?
>>
>> While pagination is faster than increasing the start parameter, the
>> difference is small as long as you stay below a start of 1000. 10K might
>> also work for you. Do your users page beyond that?
>>
>> - Toke Eskildsen
>>