Problems with pointing to custom core directories during startup in Solr 5.3.0

2015-09-08 Thread Zheng Lin Edwin Yeo
Hi,

I have custom core directories in my Solr setup located at solrMain\node1\solr,
and I set them through the -s parameter in the Solr startup script, which
looks like this:
bin\solr.cmd start -cloud -p 8983 -s solrMain\node1\solr -m 12g -z
"localhost:2181,localhost:2182,localhost:2183"

This works fine in Solr 5.2.1.

However, when I tried to upgrade it to Solr 5.3.0 and start Solr using the
same startup script, it gives the following error:

HTTP ERROR 500

Problem accessing /solr/admin. Reason:

Server Error

Caused by:

java.lang.NullPointerException
at org.apache.solr.servlet.SolrDispatchFilter.authenticateRequest(SolrDispatchFilter.java:237)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:186)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)


If I start without the -s option, Solr is able to start with the
default directory (server/solr), but without any cores or indexes, as my
cores and indexes are not located there.

Solr is only able to start normally with the migrated cores and indexes
if I replace the solr.cmd with the one I used in Solr 5.2.1. I couldn't use
the one provided in Solr 5.3.0.

Are there any changes that I need to make to my settings in order to use the
solr.cmd provided in Solr 5.3.0, and also to point it to my custom core
directories using the -s parameter in my startup script? I checked the Solr
Start Script Reference page at
https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference,
but it doesn't mention any differences.


Regards,
Edwin


Re: SolrCloud Admin UI shows node is Down, but state.json says it's active/up

2015-09-08 Thread Arcadius Ahouansou
On Sep 8, 2015 6:25 AM, "Erick Erickson"  wrote:
>
> Perhaps the browser cache? What happens if you, say, use
> Zookeeper client tools to bring down the cluster state in
> question? Or perhaps just refresh the admin UI when showing
> the cluster status
>

Hello Erick.

Thank you very much for answering.
I did use the ZooInspector tool to check the state.json in all 5 ZK nodes,
and they are all out of date and identical to what I get through the tree
view in the Solr admin UI.

Looking at the source code of cloud.js, which correctly displays nodes as "gone"
in the graph view, it calls the endpoint /zookeeper?wt=json and relies on
the live nodes to mark a node as down instead of state.json.

Thanks.

> Shot in the dark,
> Erick
>
> On Mon, Sep 7, 2015 at 6:09 PM, Arcadius Ahouansou 
wrote:
> > We are running the latest Solr 5.3.0
> >
> > Thanks.


Re: Maximum Number of entries in External Field?

2015-09-08 Thread Upayavira
If you have just 5-7 items, then an external file will work, as will the
join query. You'll need to handle the 'default' case with the join
query, that is, making sure you do <main query> OR <join query> so that
documents matching the join are boosted above those matching only the main
query, rather than the join being a filter on the main query.

I can provide examples if needed.

Upayavira
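
As a rough illustration only, a minimal SolrJ sketch of that "main query OR
score-join" pattern might look like the following. The {!join score=max ...}
{!boost ...} part mirrors the earlier mail quoted below; the core name, field
names and the "prices" core are assumptions made here purely for illustration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class JoinBoostSketch {
    public static void main(String[] args) throws Exception {
        // Assumed core name and fields; adjust to your own setup.
        HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/products");

        // Main query OR score-join: documents matched by the join get extra score,
        // while the join does not act as a filter on the main query.
        // Depending on your data you may want to make the main clause mandatory (+).
        SolrQuery q = new SolrQuery();
        q.setQuery("_query_:\"{!edismax qf=name v=$uq}\" OR "
                 + "_query_:\"{!join score=max fromIndex=prices from=id to=id}{!boost b=sqrt(price)}*:*\"");
        q.set("uq", "television");

        QueryResponse rsp = solr.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
        solr.close();
    }
}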

On Mon, Sep 7, 2015, at 07:21 PM, Aman Tandon wrote:
> I am currently doing boosting for 5-7 things. will  it work great with
> this
> too?
> 
> With Regards
> Aman Tandon
> 
> On Mon, Sep 7, 2015 at 11:42 PM, Upayavira  wrote:
> 
> > External file field would work, but requires a full import of the
> > external file field every time you change a single entry, which is
> > pretty extreme.
> >
> > I've tested out "score joins" which seemed to perform very well and
> > achieved the same effect, but using another core, rather than an
> > external file.
> >
> > Thus:
> >
> > {!join score=max fromIndex=prices from=id to=id}{!boost b=price}*:*
> >
> > seemed to do the job of using the price as a boost. Of course you could
> > extend this like so:
> >
> > q={!join score=max fromIndex=prices from=id to=id}{!boost b=$b}*:*
> > b=sqrt(price)
> >
> > or such things to make the price a more reasonable value.
> >
> > Upayavira
> >
> > On Mon, Sep 7, 2015, at 06:21 PM, Aman Tandon wrote:
> > > Any suggestions?
> > >
> > > With Regards
> > > Aman Tandon
> > >
> > > On Mon, Sep 7, 2015 at 1:07 PM, Aman Tandon 
> > > wrote:
> > >
> > > > Hi Upayavira,
> > > >
> > > > Have you tried it?
> > > >
> > > >
> > > > No
> > > >
> > > > E.g. external file fields don't play nice with Solr Cloud
> > > >
> > > >
> > > > We are not using Solr Cloud.
> > > >
> > > >
> > > >> What are you using the external file for?
> > > >
> > > >
> > > > We are doing the boosting in the search result which are *having price
> > by
> > > > 1.2* & *country is India by 1.1*. We are doing this by using the boosting
> > > > parameter in conjunction with query & map function e.g.
> > *=map(query({!dismax
> > > > qf=hasPrice v='yes' pf=''},0),1,1,1,1)*
> > > >
> > > > This is being done with 5/6 parameters. And I am hoping it will
> > increase
> > > > query time. So I am planning to make the single score and populate it
> > in
> > > > external file field. And this might reduce some time.
> > > >
> > > > Just to mention we are doing incremental updates after every 10
> > minutes.
> > > >
> > > > With Regards
> > > > Aman Tandon
> > > >
> > > > On Mon, Sep 7, 2015 at 12:53 PM, Upayavira  wrote:
> > > >
> > > >> Have you tried it? I suspect your issue will be with the process of
> > > >> reloading the external file rather than consuming it once loaded.
> > > >>
> > > >> What are you using the external file for? There may be other ways
> > also.
> > > >> E.g. external file fields don't play nice with Solr Cloud.
> > > >>
> > > >> Upayavira
> > > >>
> > > >> On Mon, Sep 7, 2015, at 07:05 AM, Aman Tandon wrote:
> > > >> > Hi,
> > > >> >
> > > >> > How much ids information can I define in External File? Currently I
> > am
> > > >> > having the 100 Million records in my index.
> > > >> >
> > > >> > With Regards
> > > >> > Aman Tandon
> > > >>
> > > >
> > > >
> >


Re: Different boost values for multiple parsers in Solr 5.2.1

2015-09-08 Thread Upayavira
you can add bq= inside your {!synonym_edismax} section, if you wish and
it will apply to that query parser only.

Upayavira
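
For illustration, a minimal SolrJ sketch of that idea against the query quoted
below; the synonym_edismax parser, the fields and the warehouse boost come from
this thread, while the core name and the exact clause layout are assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class LocalBqSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/items"); // assumed core

        // bq passed as a local param, so it only boosts inside the
        // synonym_edismax clause and is not applied again by the outer query.
        SolrQuery q = new SolrQuery(
            "_query_:\"{!synonym_edismax qf='itemname itemnumber itemdesc' "
          + "bq='warehouse:Ind02^1000' mm=100 synonyms=true v='HTC'}\" "
          + "OR itemname:\"HTC\"");

        System.out.println(solr.query(q).getResults().getNumFound());
        solr.close();
    }
}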

On Mon, Sep 7, 2015, at 03:05 PM, dinesh naik wrote:
> Please find below the detail:
> 
>  My main query is like this:
> 
> q=(((_query_:"{!synonym_edismax qf='itemname OR itemnumber OR itemdesc'
> v='HTC' mm=100 synonyms=true synonyms.constructPhrases=true
> synonyms.ignoreQueryOperators=true}") OR (itemname:"HTC" OR
> itemnamecomp:HTC* OR itemnumber:"HTC" OR itemnumbercomp:HTC* OR
> itemdesc:"HTC"~500)) AND (warehouse:Ind02 OR warehouse:Ind03 OR
> warehouse:Ind04 ))
> 
>  Giving Boost of 1000 for warehouse Ind02
>  using below parameter:
> 
>  bq=warehouse:Ind02^1000
> 
> 
> Here I am expecting a boost of 1004 but, somehow, an extra 1000 is added, maybe
> because of my additional parser. How can I avoid this?
> 
> 
> Debug information for the boost :
> 
>  
> 2004.0 = sum of:
>   1004.0 = sum of:
> 1003.0 = sum of:
>   1001.0 = sum of:
> 1.0 = max of:
>   1.0 = weight(itemname:HTC in 235500) [CustomSimilarity], result
> of:
> 1.0 = fieldWeight in 235500, product of:
>   1.0 = tf(freq=1.0), with freq of:
> 1.0 = termFreq=1.0
>   1.0 = idf(docFreq=26, maxDocs=1738053)
>   1.0 = fieldNorm(doc=235500)
> 1000.0 = weight(warehouse:e02^1000.0 in 235500)
> [CustomSimilarity],
> result of:
>   1000.0 = score(doc=235500,freq=1.0), product of:
> 1000.0 = queryWeight, product of:
>   1000.0 = boost
>   1.0 = idf(docFreq=416190, maxDocs=1738053)
>   1.0 = queryNorm
> 1.0 = fieldWeight in 235500, product of:
>   1.0 = tf(freq=1.0), with freq of:
> 1.0 = termFreq=1.0
>   1.0 = idf(docFreq=416190, maxDocs=1738053)
>   1.0 = fieldNorm(doc=235500)
>   2.0 = sum of:
> 1.0 = weight(itemname:HTC in 235500) [CustomSimilarity], result
> of:
>   1.0 = fieldWeight in 235500, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 1.0 = idf(docFreq=26, maxDocs=1738053)
> 1.0 = fieldNorm(doc=235500)
> 1.0 = itemnamecomp:HTC*, product of:
>   1.0 = boost
>   1.0 = queryNorm
> 1.0 = sum of:
>   1.0 = weight(warehouse:e02 in 235500) [CustomSimilarity], result
>   of:
> 1.0 = fieldWeight in 235500, product of:
>   1.0 = tf(freq=1.0), with freq of:
> 1.0 = termFreq=1.0
>   1.0 = idf(docFreq=416190, maxDocs=1738053)
>   1.0 = fieldNorm(doc=235500)
>   1000.0 = weight(warehouse:e02^1000.0 in 235500) [CustomSimilarity],
> result of:
> 1000.0 = score(doc=235500,freq=1.0), product of:
>   1000.0 = queryWeight, product of:
> 1000.0 = boost
> 1.0 = idf(docFreq=416190, maxDocs=1738053)
> 1.0 = queryNorm
>   1.0 = fieldWeight in 235500, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 1.0 = idf(docFreq=416190, maxDocs=1738053)
> 1.0 = fieldNorm(doc=235500)
> 
> 
> On Mon, Sep 7, 2015 at 7:21 PM, dinesh naik 
> wrote:
> Hi all,
> 
> Is there a way to apply different boost , using bq parameter for
> different
> parser.
> 
> for example if i am using a synonym parser and edismax parser in a single
> query, my bq param value is getting applied for both the parser making
> the
> boost value double.
> 
> -- 
> Best Regards,
> Dinesh Naik
> 
> 
> 
> 
> 
> On Mon, Sep 7, 2015 at 7:21 PM, dinesh naik 
> wrote:
> 
> > Hi all,
> >
> > Is there a way to apply different boost , using bq parameter for different
> > parser.
> >
> > for example if i am using a synonym parser and edismax parser in a single
> > query, my bq param value is getting applied for both the parser making the
> > boost value double.
> >
> > --
> > Best Regards,
> > Dinesh Naik
> >
> 
> 
> 
> -- 
> Best Regards,
> Dinesh Naik


RE: Trouble making tests with BaseDistributedSearchTestCase

2015-09-08 Thread Markus Jelsma
Thanks! I went on using AbstractFullDistribZkTestBase and for some tests I
circumvent the control core. I do sometimes get a recovery timeout when
starting up the tests. I have set the timeout to 30 seconds, just like many
other tests that extend AbstractFullDistribZkTestBase. Any thoughts on that?

Issue created: https://issues.apache.org/jira/browse/SOLR-8018
 
-Original message-
> From:Chris Hostetter 
> Sent: Saturday 5th September 2015 2:23
> To: solr-user@lucene.apache.org
> Subject: RE: Trouble making tests with BaseDistributedSearchTestCase
> 
> : Strange enough, the following code gives different errors:
> : 
> : assertQ(
> 
> I'm not sure what exactly assertQ will do in a distributed test like this 
> ... probably nothing good.   you'll almost certainly want to stick with 
> the distributed indexDoc() and query* methods and avoid assertU and 
> assertQ
> 
> 
> : [TEST-TestComponent.test-seed#[EA2ED1E118114486]] ERROR 
> org.apache.solr.SolrTestCaseJ4 - REQUEST FAILED: 
> xpath=//result/doc[1]/str[@name='id'][.='1']
> : xml response was: 
>   ...
> : 
> : 
> 
> ...i'm guessing that's because assertQ is (probably) querying the "local" 
> core from the TestHarness, not any of the distributed cores setup by 
> BaseDistributedSearchTestCase and your docs didn't get indexed there.
> 
> : And, when I forcefully add distrib=true, I get an NPE in SearchHandler!
> 
> which is probably because, since you (manually) added the debug param but 
> didn't add a list of shards to query, you triggered some sloppy code in 
> SearchHandler that should be giving you a nice error about shards not 
> being specified.  (I bet you can manually reproduce this in a single-node 
> Solr setup by adding distrib=true to any query that doesn't have a 
> "shards" param; if so please file a bug that it should produce a sane 
> error message)
> 
> if you use something like BaseDistributedSearchTestCase.query on the other 
> hand, it takes care of adding the correct distrib-related request 
> params for the shards it creates under the covers.
> 
> (although at this point, in general, I would strongly suggest that you 
> instead consider using AbstractFullDistribZkTestBase instead of 
> BaseDistributedSearchTestCase -- assuming of course that your goal is good 
> tests of how some distributed queries behave in a modern SolrCloud setup.  
> If your goal is to test Solr under manual sharding/distributed queries, 
> BaseDistributedSearchTestCase still makes sense.)
> 
> 
> As to your first question (which applies to both old school and 
> cloud/zk related tests)...
> 
> : > Executing the above text either results in a: IOException occured when 
> talking to server at: https://127.0.0.1:44761//collection1
> 
> That might be due to a timing issue of the servers not completely starting 
> up before you start sending requests to them? Not really sure ... would 
> need to see the logs.
> 
> : > Or it fails with a curious error: .response.maxScore:1.0!=null
> : > 
> : > The score correctly changes according to whatever value i set for 
> parameter q.
> 
> that has to do with the way the BaseDistributedSearchTestCase plumbing 
> tries to help ensure that a distributed query returns the same results as a 
> single shard query by "diffing" the responses (note: this is why 
> BaseDistributedSearchTestCase.indexDoc adds your doc to both a random 
> shard *and* to a "control collection").  But there are some legacy quirks 
> about how things like "maxScore" are handled: notably SOLR-6612 
> (historically, because of the possibility of filter optimizations, solr 
> only kept track of the scores if it needed to.  in single core, this was 
> if you asked for "fl=score,..." but in a distributed query it might also 
> compute scores (and maxScore) if you are sorting on scores (which is the 
> default)
> 
> the way to indicate that you don't want BaseDistributedSearchTestCase's 
> response diff checking to freak out over the max score is using the 
> (horribly undocumented) "handle" feature...
> 
> handle.put("maxScore", SKIPVAL);
> 
> ...that's not the default in all tests because it could hide errors in 
> situations where tests *are* expecting the maxScore to be the same.
> 
> 
> the same mechanism can be used to ignore things like the _version_ 
> field, or timestamp fields, which are virtually guaranteed not to be the 
> same between two different collections.  (see uses of the "handle" Map in 
> existing test cases for examples).
> 
> 
> 
> -Hoss
> http://www.lucidworks.com/
> 


Re: Search opening hours

2015-09-08 Thread O. Klein
BTW any idea how index speed is influenced?

I used worldbounds with -1 and 1 y-axes. But figured this could also be 0.

After changing to 0 indexing became a lot slower though (no exceptions in
log).





Re: Different boost values for multiple parsers in Solr 5.2.1

2015-09-08 Thread dinesh naik
Thanks a lot, Upayavira. It worked as expected.


On Tue, Sep 8, 2015 at 2:09 PM, Upayavira  wrote:

> you can add bq= inside your {!synonym_edismax} section, if you wish and
> it will apply to that query parser only.
>
> Upayavira
>
> On Mon, Sep 7, 2015, at 03:05 PM, dinesh naik wrote:
> > Please find below the detail:
> >
> >  My main query is like this:
> >
> > q=(((_query_:"{!synonym_edismax qf='itemname OR itemnumber OR itemdesc'
> > v='HTC' mm=100 synonyms=true synonyms.constructPhrases=true
> > synonyms.ignoreQueryOperators=true}") OR (itemname:"HTC" OR
> > itemnamecomp:HTC* OR itemnumber:"HTC" OR itemnumbercomp:HTC* OR
> > itemdesc:"HTC"~500)) AND (warehouse:Ind02 OR warehouse:Ind03 OR
> > warehouse:Ind04 ))
> >
> >  Giving Boost of 1000 for warehouse Ind02
> >  using below parameter:
> >
> >  bq=warehouse:Ind02^1000
> >
> >
> > Here i am expecting a boost of 1004 but , somehow 1000 is added extra may
> > be because of my additional parser. How can i avoid this?
> >
> >
> > Debug information for the boost :
> >
> >  
> > 2004.0 = sum of:
> >   1004.0 = sum of:
> > 1003.0 = sum of:
> >   1001.0 = sum of:
> > 1.0 = max of:
> >   1.0 = weight(itemname:HTC in 235500) [CustomSimilarity], result
> > of:
> > 1.0 = fieldWeight in 235500, product of:
> >   1.0 = tf(freq=1.0), with freq of:
> > 1.0 = termFreq=1.0
> >   1.0 = idf(docFreq=26, maxDocs=1738053)
> >   1.0 = fieldNorm(doc=235500)
> > 1000.0 = weight(warehouse:e02^1000.0 in 235500)
> > [CustomSimilarity],
> > result of:
> >   1000.0 = score(doc=235500,freq=1.0), product of:
> > 1000.0 = queryWeight, product of:
> >   1000.0 = boost
> >   1.0 = idf(docFreq=416190, maxDocs=1738053)
> >   1.0 = queryNorm
> > 1.0 = fieldWeight in 235500, product of:
> >   1.0 = tf(freq=1.0), with freq of:
> > 1.0 = termFreq=1.0
> >   1.0 = idf(docFreq=416190, maxDocs=1738053)
> >   1.0 = fieldNorm(doc=235500)
> >   2.0 = sum of:
> > 1.0 = weight(itemname:HTC in 235500) [CustomSimilarity], result
> > of:
> >   1.0 = fieldWeight in 235500, product of:
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = termFreq=1.0
> > 1.0 = idf(docFreq=26, maxDocs=1738053)
> > 1.0 = fieldNorm(doc=235500)
> > 1.0 = itemnamecomp:HTC*, product of:
> >   1.0 = boost
> >   1.0 = queryNorm
> > 1.0 = sum of:
> >   1.0 = weight(warehouse:e02 in 235500) [CustomSimilarity], result
> >   of:
> > 1.0 = fieldWeight in 235500, product of:
> >   1.0 = tf(freq=1.0), with freq of:
> > 1.0 = termFreq=1.0
> >   1.0 = idf(docFreq=416190, maxDocs=1738053)
> >   1.0 = fieldNorm(doc=235500)
> >   1000.0 = weight(warehouse:e02^1000.0 in 235500) [CustomSimilarity],
> > result of:
> > 1000.0 = score(doc=235500,freq=1.0), product of:
> >   1000.0 = queryWeight, product of:
> > 1000.0 = boost
> > 1.0 = idf(docFreq=416190, maxDocs=1738053)
> > 1.0 = queryNorm
> >   1.0 = fieldWeight in 235500, product of:
> > 1.0 = tf(freq=1.0), with freq of:
> >   1.0 = termFreq=1.0
> > 1.0 = idf(docFreq=416190, maxDocs=1738053)
> > 1.0 = fieldNorm(doc=235500)
> > 
> >
> > On Mon, Sep 7, 2015 at 7:21 PM, dinesh naik 
> > wrote:
> > Hi all,
> >
> > Is there a way to apply different boost , using bq parameter for
> > different
> > parser.
> >
> > for example if i am using a synonym parser and edismax parser in a single
> > query, my bq param value is getting applied for both the parser making
> > the
> > boost value double.
> >
> > --
> > Best Regards,
> > Dinesh Naik
> >
> >
> >
> >
> >
> > On Mon, Sep 7, 2015 at 7:21 PM, dinesh naik 
> > wrote:
> >
> > > Hi all,
> > >
> > > Is there a way to apply different boost , using bq parameter for
> different
> > > parser.
> > >
> > > for example if i am using a synonym parser and edismax parser in a
> single
> > > query, my bq param value is getting applied for both the parser making
> the
> > > boost value double.
> > >
> > > --
> > > Best Regards,
> > > Dinesh Naik
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Dinesh Naik
>



-- 
Best Regards,
Dinesh Naik


tmp directory over load

2015-09-08 Thread LeZotte, Tom
HI

Solr/Tika uses the /tmp directory to process documents. At times the directory 
hits 100%. This causes alarms from Nagios for us. Is there a way in Solr/Tika 
to limit the amount of space used in /tmp? Value could be 80% or 570MB.

thanks

Tom LeZotte
Health I.T. - Senior Product Developer
(p) 615-875-8830








Re: Solr facets implementation question

2015-09-08 Thread Toke Eskildsen
adfel70  wrote:
> I am trying to understand why faceting on a field with lots of unique values
> has a great impact on query performance.

Faceting in Solr is performed in different ways: String faceting differs from 
numeric faceting, DocValues fields from non-DocValues fields, fc from 
enum. Let's look at String faceting with facet.method=fc and DocValues.

Strings (aka Terms) are represented in the faceting code with an ordinal, which 
is really just a number. The first term has number 0, the next number 1 and so 
forth. When doing a faceting call with the above premise, what happens is

1) A counter of int[unique_values] is allocated.
This is fairly fast, but still with a noticeable impact when the number of 
unique values creeps into the millions. On our machine it takes several hundred 
milliseconds for 100M values. Also relevant is the overall strain it puts on 
the garbage collector.

2) For each hit in the result set, the corresponding ordinals are resolved and 
counter[ordinal]++ is triggered.
This scales with the result set. Small sets are very fast, quite independent of 
the size of the counter-structure. Large result sets are (naturally) equally 
slow.

3) The counter-structure is iterated and top-X are determined.
This scales with the size of the counter-structure, (nearly) independent of the 
result set size.

4) The Terms for the top-X ordinals are resolved from the index.
This scales with X.
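
To make the scaling of steps 1-3 concrete, here is a rough, illustrative Java
sketch of the counting approach described above (simplified to one ordinal per
hit; this is not the actual Solr/Lucene code):

import java.util.PriorityQueue;

public class FacetCountSketch {

    /** Returns the ordinals of the top-X most frequent values among the hits. */
    static int[] topFacetOrdinals(int[] hitOrdinals, int uniqueValues, int topX) {
        // 1) Allocate one counter slot per unique value in the field.
        //    Cost scales with the number of unique values, not with the hits.
        int[] counters = new int[uniqueValues];

        // 2) For each hit, resolve its ordinal and increment the counter.
        //    Cost scales with the size of the result set.
        for (int ord : hitOrdinals) {
            counters[ord]++;
        }

        // 3) Iterate the whole counter structure, keeping the top-X in a small heap.
        //    Cost scales with the number of unique values, independent of the hits.
        PriorityQueue<int[]> top = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        for (int ord = 0; ord < counters.length; ord++) {
            if (counters[ord] == 0) continue;
            top.offer(new int[] { ord, counters[ord] });
            if (top.size() > topX) top.poll(); // drop the currently smallest count
        }

        int[] result = new int[top.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = top.poll()[0];
        }
        return result;
        // 4) The real implementation then resolves these ordinals back to Terms.
    }

    public static void main(String[] args) {
        // 6 hits over a field with 5 unique values; ask for the top 2 ordinals.
        int[] hits = { 0, 2, 2, 4, 2, 4 };
        for (int ord : topFacetOrdinals(hits, 5, 2)) {
            System.out.println("ordinal " + ord);
        }
    }
}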


Some of these parts have non-intuitive penalties: Even very tiny result 
sets have a constant overhead from allocation and iteration. Asking for top-1M 
hits means that the underlying priority queue will probably no longer fit in 
the CPU cache and will slow things down. Resolving Terms from ordinals relies 
on fast IO, and a large number of unique Terms might mean that the disk cache is 
not large enough.


Blatant plug: I have spent a fair amount of time trying to make some of this 
faster: http://tokee.github.io/lucene-solr/

- Toke Eskildsen


Source address of zookeeper connection

2015-09-08 Thread Jens Brandt
Hi,

We have multihomed hosts running Solr 5.2.1 as well as external ZooKeeper 
instances. In solr.in.sh, the value of SOLR_HOST is set correctly to the 
hostname with the correct IP address that must be used. However, in the 
zookeeper logs I find another IP address used as source address for the 
connection from the solr instances to the zookeeper nodes. Can anybody tell me 
how I can define the network source address used by solr for the connection to 
the zookeeper nodes?

Regards,
  Jens




Re: Exception using Json Facet API with Multivalue Int field +docValues=true

2015-09-08 Thread Alexandre Rafalovitch
A sanity check question. Was this test done with a completely new
index after you enabled docvalues? Not just "delete all" but actually
deleted index directory and rebuilt from scratch? If it still happens
after such a thorough cleanup, it might be a bug.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 8 September 2015 at 17:22, Curtis Fehr  wrote:
> Hello!
>
> I'm attempting to facet a multivalue int field that has docvalues enabled.  
> Using the new json facet api, running 5.3.0,  I get the exception here: 
> http://pastebin.com/xNaqGJRf
>
> Here's the relevant config:
>  docValues="true" />
>  docValues="true" />
>  multiValued="true" docValues="true" />
> 
>
> Here's my facet:
> json.facet={"pv-44":{"type":"query","q":"ReportingDate:[2015-04-22T0:00:0:000Z
>  TO 
> 2015-05-19T23:59:59:999Z]","facet":{"Streams":{"type":"terms","limit":-1,"field":"Streams","prefix":"","facet":{"SurveyID":{"type":"terms","limit":-1,"field":"SurveyID","prefix":"","facet":{"n":"sum(Score_TopBoxActual)","d":"sum(Score_TopLowBoxPossible)"}}}
>
> It was suggested to me that the new facet api may be going down the wrong 
> code path to where it assumes docvalues is not enabled on the field in 
> question and tries to uninvert it.  Is this a bug or am I missing something?
>
> Thanks


RE: Exception using Json Facet API with Multivalue Int field +docValues=true

2015-09-08 Thread Curtis Fehr
It's a very large index, will take a couple of days to reload it from scratch.  
I'll post back once I have tried this with either success or failure.

Thanks,
Curt


A sanity check question. Was this test done with a completely new index after 
you enabled docvalues? Not just "delete all" but actually deleted index 
directory and rebuilt from scratch? If it still happens after such a thorough 
cleanup, it might be a bug.


Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 8 September 2015 at 17:22, Curtis Fehr  wrote:
> Hello!
>
> I'm attempting to facet a multivalue int field that has docvalues 
> enabled.  Using the new json facet api, running 5.3.0,  I get the 
> exception here: http://pastebin.com/xNaqGJRf
>
> Here's the relevant config:
>  docValues="true" />  indexed="true" stored="true" docValues="true" />  type="tint" indexed="true" stored="true" multiValued="true" 
> docValues="true" />  indexed="true" stored="true" />
>
> Here's my facet:
> json.facet={"pv-44":{"type":"query","q":"ReportingDate:[2015-04-22T0:0
> 0:0:000Z TO 
> 2015-05-19T23:59:59:999Z]","facet":{"Streams":{"type":"terms","limit":
> -1,"field":"Streams","prefix":"","facet":{"SurveyID":{"type":"terms","
> limit":-1,"field":"SurveyID","prefix":"","facet":{"n":"sum(Score_TopBo
> xActual)","d":"sum(Score_TopLowBoxPossible)"}}}
>
> It was suggested to me that the new facet api may be going down the wrong 
> code path to where it assumes docvalues is not enabled on the field in 
> question and tries to uninvert it.  Is this a bug or am I missing something?
>
> Thanks


Re: Exception using Json Facet API with Multivalue Int field +docValues=true

2015-09-08 Thread Alexandre Rafalovitch
Could you make a small index from scratch using a subset of data and
see if the problem happens anyway? If yes, you have a test case. If
no, you may need to do a full rebuild to be fully assured.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 8 September 2015 at 17:36, Curtis Fehr  wrote:
> It's a very large index, will take a couple of days to reload it from 
> scratch.  I'll post back once I have tried this with either success or 
> failure.
>
> Thanks,
> Curt
> 
>
> A sanity check question. Was this test done with a completely new index after 
> you enabled docvalues? Not just "delete all" but actually deleted index 
> directory and rebuilt from scratch? If it still happens after such a thorough 
> cleanup, it might be a bug.
>
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 8 September 2015 at 17:22, Curtis Fehr  wrote:
>> Hello!
>>
>> I'm attempting to facet a multivalue int field that has docvalues
>> enabled.  Using the new json facet api, running 5.3.0,  I get the
>> exception here: http://pastebin.com/xNaqGJRf
>>
>> Here's the relevant config:
>> > docValues="true" /> > indexed="true" stored="true" docValues="true" /> > type="tint" indexed="true" stored="true" multiValued="true"
>> docValues="true" /> > indexed="true" stored="true" />
>>
>> Here's my facet:
>> json.facet={"pv-44":{"type":"query","q":"ReportingDate:[2015-04-22T0:0
>> 0:0:000Z TO
>> 2015-05-19T23:59:59:999Z]","facet":{"Streams":{"type":"terms","limit":
>> -1,"field":"Streams","prefix":"","facet":{"SurveyID":{"type":"terms","
>> limit":-1,"field":"SurveyID","prefix":"","facet":{"n":"sum(Score_TopBo
>> xActual)","d":"sum(Score_TopLowBoxPossible)"}}}
>>
>> It was suggested to me that the new facet api may be going down the wrong 
>> code path to where it assumes docvalues is not enabled on the field in 
>> question and tries to uninvert it.  Is this a bug or am I missing something?
>>
>> Thanks


Re: Solr Join between two indexes taking too long.

2015-09-08 Thread Mikhail Khludnev
Hello Russ,

It's an interesting case! Can you give a brief context?
- is it possible to keep both types of data in the same core? Why not?
- can you manually shard both indices by those longValues?
- It seems like you query plenty of data; don't you have another
query/filter to intersect that join result with?

Such a long time for a "universe of 5 docs" seems really strange. Can you
open the index with Solr 5.3 and run the same query with universe:universeValue,
noting the number of results, but adding the local param {!join ... score=none}?
That triggers an alternative algorithm.

Also, profiler snapshots always help, you know. I've given a brief intro to
join algorithms, and problems in Solr, at the recent Berlin Buzzwords; feel free
to have a look if you are interested.
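
For reference, a minimal SolrJ sketch of that suggested experiment; the host,
cores, join fields and the universe filter come from the original query quoted
below, and everything else is an assumption, not a definitive fix:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ScoreNoneJoinCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrClient indexA = new HttpSolrClient("http://127.0.0.1:8080/solr/indexA");

        // Same cross-index join as in the original query, but with score=none,
        // which makes Solr 5.3 pick the alternative (non-scoring) join algorithm.
        SolrQuery q = new SolrQuery(
            "{!join from=longValue to=longValue fromIndex=IndexB score=none}universe:universeValue");
        q.setRows(0); // only interested in numFound and QTime here

        System.out.println("numFound: " + indexA.query(q).getResults().getNumFound());
        indexA.close();
    }
}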

On Tue, Sep 8, 2015 at 3:09 PM, Russell Taylor <
russell.tay...@interactivedata.com> wrote:

> Hi,
>  I hope somebody can help.
>
> We have two indexes, one which holds the descriptive data and the other
> one which holds lists of docs which are
> of a certain type (called universes in our world). They need to be joined
> together to show a list of data from indexA
> where a filtered indexB (by universe:value) has matching longs (The join
> field).
>
> At the moment the query is taking 55 seconds we need to get it under a
> second, any help most appreciated.
>
> INDEXES:
>
> Index a (primary index)
> 31 million docs with a converted alphanumeric to a long value with a
> possible 10 million unique values.
>
> Index B (the joined index)
> 250 million documents with a converted alphanumeric to a long value with a
> possible 10 million unique values.
> IndexB is filtered by universe which could be between 1 and 500,000 docs.
>
> QUERY:
>
> http://127.0.0.1:8080/solr/indexA/select?q={!join+from=longValue+to=longValue+fromIndex=IndexB}universe:universeValue
>
> Qtime is 55 seconds for either a universe of 5 docs or 500,000 docs.
>
>
>
> Thanks
>
>
> Russ.
>
>
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





RE: Exception using Json Facet API with Multivalue Int field +docValues=true

2015-09-08 Thread Curtis Fehr
The issue persists even with a new core with a couple of documents.  One thing 
I did notice though, is that string multi-value fields do not have this 
problem.  I can probably use that as a workaround for now, but this seems like 
a bug.

Thanks,
Curt


Could you make a small index from scratch using a subset of data and see if the 
problem happens anyway? If yes, you have a test case. If no, you may need to do 
a full rebuild to be fully assured.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 8 September 2015 at 17:36, Curtis Fehr  wrote:
> It's a very large index, will take a couple of days to reload it from 
> scratch.  I'll post back once I have tried this with either success or 
> failure.
>
> Thanks,
> Curt
> 
>
> A sanity check question. Was this test done with a completely new index after 
> you enabled docvalues? Not just "delete all" but actually deleted index 
> directory and rebuilt from scratch? If it still happens after such a thorough 
> cleanup, it might be a bug.
>
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 8 September 2015 at 17:22, Curtis Fehr  wrote:
>> Hello!
>>
>> I'm attempting to facet a multivalue int field that has docvalues 
>> enabled.  Using the new json facet api, running 5.3.0,  I get the 
>> exception here: http://pastebin.com/xNaqGJRf
>>
>> Here's the relevant config:
>> > docValues="true" /> > indexed="true" stored="true" docValues="true" /> > type="tint" indexed="true" stored="true" multiValued="true"
>> docValues="true" /> > indexed="true" stored="true" />
>>
>> Here's my facet:
>> json.facet={"pv-44":{"type":"query","q":"ReportingDate:[2015-04-22T0:
>> 0
>> 0:0:000Z TO
>> 2015-05-19T23:59:59:999Z]","facet":{"Streams":{"type":"terms","limit":
>> -1,"field":"Streams","prefix":"","facet":{"SurveyID":{"type":"terms","
>> limit":-1,"field":"SurveyID","prefix":"","facet":{"n":"sum(Score_TopB
>> o xActual)","d":"sum(Score_TopLowBoxPossible)"}}}
>>
>> It was suggested to me that the new facet api may be going down the wrong 
>> code path to where it assumes docvalues is not enabled on the field in 
>> question and tries to uninvert it.  Is this a bug or am I missing something?
>>
>> Thanks


Exception using Json Facet API with Multivalue Int field +docValues=true

2015-09-08 Thread Curtis Fehr
Hello!

I'm attempting to facet a multivalue int field that has docvalues enabled.  
Using the new json facet api, running 5.3.0,  I get the exception here: 
http://pastebin.com/xNaqGJRf

Here's the relevant config:





Here's my facet:
json.facet={"pv-44":{"type":"query","q":"ReportingDate:[2015-04-22T0:00:0:000Z 
TO 
2015-05-19T23:59:59:999Z]","facet":{"Streams":{"type":"terms","limit":-1,"field":"Streams","prefix":"","facet":{"SurveyID":{"type":"terms","limit":-1,"field":"SurveyID","prefix":"","facet":{"n":"sum(Score_TopBoxActual)","d":"sum(Score_TopLowBoxPossible)"}}}

It was suggested to me that the new facet api may be going down the wrong code 
path to where it assumes docvalues is not enabled on the field in question and 
tries to uninvert it.  Is this a bug or am I missing something?

Thanks


Help on Out of memory when using Cursor with sort on Unique Key

2015-09-08 Thread Naresh Yadav
Cluster details :

Solr Version  : solr-4.10.4
No of nodes : 2 each 16 GB RAM
No of shards : 2
Replication : 1
Each node memory parameter : -Xms2g, -Xmx4g

Collection details :

No of docs in my collection : 12.31 million
Indexed field per document : 2
Unique key field : tids
Stored fields per document : varies 30-40
Total index size node1+node2 = 13gb+13gb=26gb

Query throwing Heap Space : /select?q=*:*&sort=tids+desc&rows=100&fl=tids

Query working : /select?q=*:*&rows=100&fl=tids

I am using a sort on the unique key field tids for cursor-based pagination with
a page size of 100.

Already tried :

I also tried tweaking Xmx but the problem is not solved.
I also tried a q with criteria on an indexed field that has only 4200 hits; that
also does not work
when the sort parameter is included.

Please help me here as I am clueless why there is an OOM error when fetching just 100 documents.

Thanks
Naresh


Re: Search opening hours

2015-09-08 Thread Darren Spehr
Sounds odd that the indexing times would change. Hopefully something else
was going on - I've not experienced this.

On Tue, Sep 8, 2015 at 4:31 AM, O. Klein  wrote:

> BTW any idea how index speed is influenced?
>
> I used worldbounds with -1 and 1 y-axes. But figured this could also be 0.
>
> After changing to 0 indexing became a lot slower though (no exceptions in
> log).
>
>
>
>



-- 
Darren


Solr Join between two indexes taking too long.

2015-09-08 Thread Russell Taylor
Hi,
 I hope somebody can help.

We have two indexes, one which holds the descriptive data and the other one 
which holds lists of docs which are
of a certain type (called universes in our world). They need to be joined 
together to show a list of data from indexA
where a filtered indexB (by universe:value) has matching longs (The join field).

At the moment the query is taking 55 seconds we need to get it under a second, 
any help most appreciated.

INDEXES:

Index a (primary index)
31 million docs with a converted alphanumeric to a long value with a possible 
10 million unique values.

Index B (the joined index)
250 million documents with a converted alphanumeric to a long value with a 
possible 10 million unique values.
IndexB is filtered by universe which could be between 1 and 500,000 docs.

QUERY:
http://127.0.0.1:8080/solr/indexA/select?q={!join+from=longValue+to=longValue+fromIndex=IndexB}universe:universeValue

Qtime is 55 seconds for either a universe of 5 docs or 500,000 docs.



Thanks


Russ.




Re: Help on Out of memory when using Cursor with sort on Unique Key

2015-09-08 Thread Raja Pothuganti
Hi Naresh

1) For 'sort by' fields, have you considered using docValues=true in the
schema definition?
If you are changing the schema definition, you would need to redo a full reindex
after backing up & deleting the current index from dataDir.
Also note that adding docValues=true would increase the size of the index.

2) > Each node memory parameter : -Xms2g, -Xmx4g
What is the basis for choosing the above memory sizes? Have you observed them
through jconsole or VisualVM?

Raja
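
For context, a minimal SolrJ sketch of cursor-based deep paging as described in
the original question; the tids field and page size of 100 come from that mail,
the collection URL is an assumption, and this by itself does not address the OOM:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPagingSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/collection1"); // assumed URL

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(100);
        q.setFields("tids");
        q.setSort(SolrQuery.SortClause.desc("tids")); // cursors require a sort on the uniqueKey

        String cursor = CursorMarkParams.CURSOR_MARK_START;
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = solr.query(q);
            // ... process rsp.getResults() here ...
            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) break; // last page reached
            cursor = next;
        }
        solr.close();
    }
}
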
On 9/8/15, 8:57 AM, "Naresh Yadav"  wrote:

>Cluster details :
>
>Solr Version  : solr-4.10.4
>No of nodes : 2 each 16 GB RAM
>Node of shards : 2
>Replication : 1
>Each node memory parameter : -Xms2g, -Xmx4g
>
>Collection details :
>
>No of docs in my collection : 12.31 million
>Indexed field per document : 2
>Unique key field : tids
>Stored filed per document : varies 30- 40
>Total index size node1+node2 = 13gb+13gb=26gb
>
>Query throwing Heap Space : /select?q=*:*&sort=tids+desc&rows=100&fl=tids
>
>Query working : /select?q=*:*&rows=100&fl=tids
>
>I am using sort on unique key field tids for Cursor based pagination of
>100
>size.
>
>Already tried :
>
>I also tried tweaking Xmx but problem not solved..
>I also tried q with criteria of indexed filed with only 4200 hits that
>also
>not working
>when sort parameter included.
>
>Please help me here as i am clueless why OOM error in getting 100
>documents.
>
>Thanks
>Naresh



Re: Exception using Json Facet API with Multivalue Int field +docValues=true

2015-09-08 Thread Mikhail Khludnev
Right. It seems like a functional gap - numeric DVs out of scope for a
while.
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/facet/FacetField.java#L147

On Tue, Sep 8, 2015 at 11:57 PM, Curtis Fehr  wrote:

> The issue persists even with a new core with a couple of documents.  One
> thing I did notice though, is that string multi-value fields do not have
> this problem.  I can probably use that as a workaround for now, but this
> seems like a bug.
>
> Thanks,
> Curt
> 
>
> Could you make a small index from scratch using a subset of data and see
> if the problem happens anyway? If yes, you have a test case. If no, you may
> need to do a full rebuild to be fully assured.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 8 September 2015 at 17:36, Curtis Fehr  wrote:
> > It's a very large index, will take a couple of days to reload it from
> scratch.  I'll post back once I have tried this with either success or
> failure.
> >
> > Thanks,
> > Curt
> > 
> >
> > A sanity check question. Was this test done with a completely new index
> after you enabled docvalues? Not just "delete all" but actually deleted
> index directory and rebuilt from scratch? If it still happens after such a
> thorough cleanup, it might be a bug.
> >
> >
> > Regards,
> >Alex.
> > 
> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > http://www.solr-start.com/
> >
> >
> > On 8 September 2015 at 17:22, Curtis Fehr  wrote:
> >> Hello!
> >>
> >> I'm attempting to facet a multivalue int field that has docvalues
> >> enabled.  Using the new json facet api, running 5.3.0,  I get the
> >> exception here: http://pastebin.com/xNaqGJRf
> >>
> >> Here's the relevant config:
> >>  >> docValues="true" />  >> indexed="true" stored="true" docValues="true" />  >> type="tint" indexed="true" stored="true" multiValued="true"
> >> docValues="true" />  >> indexed="true" stored="true" />
> >>
> >> Here's my facet:
> >> json.facet={"pv-44":{"type":"query","q":"ReportingDate:[2015-04-22T0:
> >> 0
> >> 0:0:000Z TO
> >> 2015-05-19T23:59:59:999Z]","facet":{"Streams":{"type":"terms","limit":
> >> -1,"field":"Streams","prefix":"","facet":{"SurveyID":{"type":"terms","
> >> limit":-1,"field":"SurveyID","prefix":"","facet":{"n":"sum(Score_TopB
> >> o xActual)","d":"sum(Score_TopLowBoxPossible)"}}}
> >>
> >> It was suggested to me that the new facet api may be going down the
> wrong code path to where it assumes docvalues is not enabled on the field
> in question and tries to uninvert it.  Is this a bug or am I missing
> something?
> >>
> >> Thanks
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Source address of zookeeper connection

2015-09-08 Thread Shawn Heisey
On 9/8/2015 3:09 PM, Jens Brandt wrote:
> We have multihomed hosts running solr 5.2.1 as well es external zookeeper 
> instances. In solr.in.sh, the value of SOLR_HOST is set correctly to the 
> hostname with the correct IP address that must be used. However, in the 
> zookeeper logs I find another IP address used as source address for the 
> connection from the solr instances to the zookeeper nodes. Can anybody tell 
> me how I can define the network source address used by solr for the 
> connection to the zookeeper nodes?

This is very likely left up to the operating system.  As far as I know,
there is nothing in Solr that sets the source interface for any
communication.

Most operating systems will source the packet with the address of
whichever interface is used to route to the destination, which is most
commonly the default route.

I had hoped to find that there was a system property honored by the
Zookeeper client that specifies the source address, but I was not able
to find anything.  I have asked the zookeeper mailing list whether there
is any way to accomplish this.

The SOLR_HOST variable just sets the value that SolrCloud will use to
register itself in the clusterstate.

Thanks,
Shawn



Re: Search opening hours

2015-09-08 Thread O. Klein
Doesn't sound odd to me. I just expected index time to be faster with a smaller
"world".

I used minutes as the scale first, but that slows it down even more. So I
changed to a 15 minute interval to keep it reasonable.

Maybe there is a setting that can speed this up, like the precisionStep in a
TrieField?





Re: conf Folder is not getting created while creating a collection on solr cloud

2015-09-08 Thread Zheng Lin Edwin Yeo
There is no conf folder located at
solr-5.3.0/server/solr/test_collection_shard1_replica1/.
That folder should only contain the data folder and the core.properties file.
The conf folder is only in
solr-5.3.0/server/solr/data_driven_schema_configs.

Why do you need the conf file in
solr-5.3.0/server/solr/test_collection_shard1_replica1/?
This is where Solr stores the indexes (in data\index) and normally I don't
touch anything in this folder.

Regards,
Edwin


On 8 September 2015 at 22:42, Ritesh Sinha  wrote:

> I am trying to create a collection on Solr cloud.
>
> I have created a 3 node zookeeper cluster on the same machine.
>
> using this command to start solr on three ports :
> bin/solr start  cloud -z localhost:2181,localhost:2182,localhost:2183 -p
> 8983
> bin/solr start  cloud -z localhost:2181,localhost:2182,localhost:2183 -p
> 8984
> bin/solr start  cloud -z localhost:2181,localhost:2182,localhost:2183 -p
> 8985
>
> running this to upload the configuration set to zookeeper before creating
> the collection
>
> /var/data/solr/solr-5.3.0/server/scripts/cloud-scripts/zkcli.sh -zkhost
> localhost:2181,localhost:2182,localhost:2183 -cmd upconfig -confdir
>
> /var/data/solr/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf
> -confname test_conf
>
> command for creating collection
> curl '
>
> http://localhost:8983/solr/admin/collections?action=CREATE=tenlaz=1=3=test_collection=test_conf
> '
>
> But when I check
> solr-5.3.0/server/solr/test_collection_shard1_replica1/
> There is no conf file.
>
> I know i can explicitly copy it.
>
> but is there any command which can automatically create the conf directory.
>
> I know i am missing something.Any help is appreciated.
>
> Reagrds
>


Re: SOLR DataImportHandler - Problem with XPathEntityProcessor

2015-09-08 Thread Alexandre Rafalovitch
Both versions seem to be painful in that they will retrieve the URL content
multiple times. The first version is definitely wrong. The second version
is probably wrong because both the inner and outer entities have the same
name. I would try giving a different name to the inner entity and seeing if
the issue resolves itself.

But, realistically, I would probably pre-process that document with XSLT
instead to flatten the structure. Solr apparently (I did not test) supports
that both in DIH and in the update handler:
https://wiki.apache.org/solr/XsltUpdateRequestHandler . You could XSLT your
source XML directly into a Solr XML update document and not even need DIH.

Regards,
   Alex.



Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

On 8 September 2015 at 09:04, Umang Agrawal  wrote:

> Hi All
>
> I am facing a problem with XPathEntityProcessor .
>
> Objective:
> When I index Resource XML file using DIH XPathEntityProcessor then there
> should be 2 solr documents
> 01) Link where id is 1000 with 2 tags ABC and DEF
> 02) Link where id is 2000 with 3 tags GHI, JKL and MNO
>
> Solr Version: 4.10.2
>
> Problem:
> I am not able to index  data properly.
>
> Expected Output:
> {
> "id": "1000",
> "field_name": "val1",
> "ABC": "ABC_VALUE",
> "DEF": "DEF_VALUE"
> },
> {
> "id": "2000",
> "field_name": "val2",
> "GHI": "GHI_VALUE",
> "JKL": "JKL_VALUE",
> "MNO": "MNO_VALUE"
> }
>
> 
>
> Resource XML:
>
> 
> 
> val1
> 
> ABC
> ABC_VALUE
> 
> 
> DEF
> DEF_VALUE
> 
> 
> 
> val2
> 
> GHI
> GHI_VALUE
> 
> 
> JKL
> JKL_VALUE
> 
> 
> MNO
> MNO_VALUE
> 
> 
> 
>
>
> 
>
> DataConfig XML (TRY 1):
> 
>  function f1(row) {
> var code = row.get("TAG_CODE");
> var val = row.get("TAG_VALUE");
> row.put(code, val);
> row.remove("TAG_CODE");
> row.remove("TAG_VALUE");
> return row;
> }
> ]]>
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor"
> forEach="/RESOURCE/LINK/TAG" transformer="script:f1">
> 
> 
> 
> 
> 
> 
>
> Output:
> {
> "id": "1000",
> "field_name": "val1",
> "ABC": "ABC_VALUE",
> "DEF": "DEF_VALUE",
> "GHI": "GHI_VALUE",
> "JKL": "JKL_VALUE",
> "MNO": "MNO_VALUE"
> },
> {
> "id": "2000",
> "field_name": "val2",
> "ABC": "ABC_VALUE",
> "DEF": "DEF_VALUE",
> "GHI": "GHI_VALUE",
> "JKL": "JKL_VALUE",
> "MNO": "MNO_VALUE"
> }
>
>
> 
>
> DataConfig XML (TRY 2):
> 
>  function f1(row) {
> var code = row.get("TAG_CODE");
> var val = row.get("TAG_VALUE");
> row.put(code, val);
> row.remove("TAG_CODE");
> row.remove("TAG_VALUE");
> return row;
> }
> ]]>
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor"
> forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
> 
> 
> 
> 
> 
> 
>
> Output:
> {
> "id": "1000",
> "field_name": "val1"
> },
> {
> "id": "2000",
> "field_name": "val2"
> }
>
>
> 
>
> DataConfig XML (TRY 3):
> 
>  function f1(row) {
> var code = row.get("TAG_CODE");
> var val = row.get("TAG_VALUE");
> row.put(code, val);
> row.remove("TAG_CODE");
> row.remove("TAG_VALUE");
> return row;
> }
> ]]>
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor"
> forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
>  xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_CODE"
> />
>  xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_VALUE"
> />
> 
> 
> 
> 
>
> Output:
> {
> "id": "1000",
> "field_name": "val1"
> },
> {
> "id": "2000",
> "field_name": "val2"
> }
>
>
> --
> Thanx & Regards
> Umang Agrawal
>
>
>


Re: Merging documents from a distributed search

2015-09-08 Thread tedsolr
Joel,

It needs to perform. Typically users will have 1 - 5 million rows in a
query, returning 10 - 15 fields. Grouping reduces the return by 50% or more
normally. Responses tend to be less than half a second.

It sounds like the manipulation of docs at the collector level has been left
to the single solr node implementations, and that your streaming API is the
way forward for cloud implementations. Even if it does have some performance
drawbacks. I can bear slower searches as long as they are not seconds
slower.

I could implement some business strategy that forks searching to either the
AnalyticsQuery or the streaming API based on the shard count in the
collection. Most of my customers will have single shard collections. A goal
of mine is to keep each collection whole as long as possible. If one gets
too big for the pond I'll move it to a bigger pond, until some heap limit is
reached when it will have to be split. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Merging-documents-from-a-distributed-search-tp4226802p4227595.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr facets implementation question

2015-09-08 Thread adfel70
I am trying to understand why faceting on a field with lots of unique values
has a great impact on query performance. Since Googling for the Solr facet
algorithm did not yield anything, I looked at how facets are implemented in
Lucene. I found out that there are 2 methods - taxonomy-based and
SortedSetDocValues-based. Are Solr's facet capabilities based on one of
those methods? If so, I still can't understand why unique values impact
query performance...





Re: Problems with pointing to custom core directories during startup in Solr 5.3.0

2015-09-08 Thread Zheng Lin Edwin Yeo
I have found that it could be a problem in solr.cmd.

It works after I replace this line in Solr 5.3.0

IF NOT "%EXAMPLE%"=="" goto run_example


With this segment of the code from Solr 5.2.1


 IF "%EXAMPLE%"=="" (

  IF NOT "%SOLR_HOME%"=="" (

REM Absolutize a relative solr home

IF EXIST "%cd%\%SOLR_HOME%" set "SOLR_HOME=%cd%\%SOLR_HOME%"

  )

  REM Otherwise SOLR_HOME just becomes %SOLR_SERVER_DIR%/solr

) ELSE IF "%EXAMPLE%"=="techproducts" (

  mkdir "%SOLR_TIP%\example\techproducts\solr"

  set "SOLR_HOME=%SOLR_TIP%\example\techproducts\solr"

  IF NOT EXIST "!SOLR_HOME!\solr.xml" (

copy "%DEFAULT_SERVER_DIR%\solr\solr.xml" "!SOLR_HOME!\solr.xml"

  )

  IF NOT EXIST "!SOLR_HOME!\zoo.cfg" (

copy "%DEFAULT_SERVER_DIR%\solr\zoo.cfg" "!SOLR_HOME!\zoo.cfg"

  )

) ELSE IF "%EXAMPLE%"=="cloud" (

  set SOLR_MODE=solrcloud

  goto cloud_example_start

) ELSE IF "%EXAMPLE%"=="dih" (

  set "SOLR_HOME=%SOLR_TIP%\example\example-DIH\solr"

) ELSE IF "%EXAMPLE%"=="schemaless" (

  mkdir "%SOLR_TIP%\example\schemaless\solr"

  set "SOLR_HOME=%SOLR_TIP%\example\schemaless\solr"

  IF NOT EXIST "!SOLR_HOME!\solr.xml" (

copy "%DEFAULT_SERVER_DIR%\solr\solr.xml" "!SOLR_HOME!\solr.xml"

  )

  IF NOT EXIST "!SOLR_HOME!\zoo.cfg" (

copy "%DEFAULT_SERVER_DIR%\solr\zoo.cfg" "!SOLR_HOME!\zoo.cfg"

  )

) ELSE (

  @echo.

  @echo 'Unrecognized example %EXAMPLE%!'

  @echo.

  goto start_usage

)



Could there be some issues with this line [IF NOT "%EXAMPLE%"=="" goto
run_example] in Solr 5.3.0 (line 601 in solr.cmd)?


Regards,

Edwin



On 8 September 2015 at 15:19, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I have a custom core directories in my Solr located at
> solrMain\node1\solr, and I set it through the -s  parameter in the
> Solr startup script, and it looks like this:
> bin\solr.cmd start -cloud -p 8983 -s solrMain\node1\solr -m 12g -z
> "localhost:2181,localhost:2182,localhost:2183"
>
> This works fine in Solr 5.2.1.
>
> However, when I tried to upgrade it to Solr 5.3.0 and start Solr using the
> same startup script, it gives the following error:
>
> HTTP ERROR 500
>
> Problem accessing /solr/admin. Reason:
>
> Server Error
>
> Caused by:
>
> java.lang.NullPointerException
>   at org.apache.solr.servlet.SolrDispatchFilter.authenticateRequest(SolrDispatchFilter.java:237)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:186)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.eclipse.jetty.server.Server.handle(Server.java:499)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>   at java.lang.Thread.run(Thread.java:745)
>
>
> If I start without the -s  command, it is able to start with the
> default directory (server/solr), but without any core and index, as my
> cores and indexes are not located there.
>
> The Solr is able to start normally with the migrated cores and indexes
> only if I replace the solr.cmd to the one I used in Solr 5.2.1. I couldn't
> use the one provided in Solr 5.3.0.
>
> Is there any changes that I need to do with my setting in order to use the
> solr.cmd provided in Solr 5.3.0, and also to link it to my custom core
> directories by using the -s  in my startup script? I check the Solr
> Start Script Reference Page at
> 

Re: Solr facets implementation question

2015-09-08 Thread Shawn Heisey
On 9/8/2015 9:10 AM, adfel70 wrote:
> I am trying to understand why faceting on a field with lots of unique values
> has such a great impact on query performance. Since Googling for the Solr facet
> algorithm did not yield anything, I looked at how facets are implemented in
> Lucene. I found out that there are 2 methods - taxonomy-based and
> SortedSetDocValues-based. Are Solr's facet capabilities based on one of
> those methods? If so, I still can't understand why unique values impact
> query performance...

Lucene's facet implementation is completely separate (and different)
from Solr's implementation.  I am not familiar with the inner workings
of either implementation.  Solr implemented faceting long before Lucene
did.  I think *Solr* actually contains at least two different facet
implementations, used for different kinds of facets.

Faceting on a field with many unique values uses a HUGE amount of heap
memory, which is likely why query performance is impacted.

I have a dev system with all my indexes (each of which has dedicated
hardware for production) on it.  Normally it requires 15GB of heap to
operate properly.  Every now and then, I get asked to do a duplicate
check on a field that *should* be unique, on an index with 250 million
docs in it.  The query that I am asked to do for the facet matches about
100 million docs.  This facet query, on a field that DOES have
docValues, will throw OOM if my heap is less than 27GB.  The dev machine
only has 32GB of RAM, so as you might imagine, performance is really
terrible when I do this query.  Thankfully it's a dev machine.  When I
was doing these queries, it was running 4.9.1.  I have since upgraded it
to 5.2.1, as a proof of concept for upgrading our production indexes ...
but I have not attempted the facet query since the upgrade.

Thanks,
Shawn
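(For reference, the kind of duplicate-check facet query described above might look roughly like this - collection and field names are illustrative; facet.mincount=2 returns only values that occur more than once:

curl 'http://localhost:8983/solr/mycollection/select?q=*:*&rows=0&facet=true&facet.field=supposedly_unique_field&facet.mincount=2&facet.limit=100'
)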



Log4J level from admin web UI

2015-09-08 Thread Nir Barel
Hi All,

I am using Solr 4.8.1, and when I try to change the log level via the admin
web UI it doesn't do anything. The only way to change the log level is to edit my
log4j file and restart the Solr process.

Is this a known issue?
Can you guide me on what I should check?


Re: Maximum Number of entries in External Field?

2015-09-08 Thread Aman Tandon
>
> I can provide examples if needed.

Yes, that would be very helpful. Thank you so much.

Then I will try both methodologies and report the results back here.

With Regards
Aman Tandon

On Tue, Sep 8, 2015 at 2:11 PM, Upayavira  wrote:

> If you have just 5-7 items, then an external file will work, as will the
> join query. You'll need to handle the 'default' case with the join
> query, that is, making sure you do (join query) OR (main query) so that
> documents matching the join are boosted above those matching the main
> query, rather than the join being a filter on the main query.
>
> I can provide examples if needed.
>
> Upayavira
>
> On Mon, Sep 7, 2015, at 07:21 PM, Aman Tandon wrote:
> > I am currently doing boosting for 5-7 things. will  it work great with
> > this
> > too?
> >
> > With Regards
> > Aman Tandon
> >
> > On Mon, Sep 7, 2015 at 11:42 PM, Upayavira  wrote:
> >
> > > External file field would work, but requires a full import of the
> > > external file field every time you change a single entry, which is
> > > pretty extreme.
> > >
> > > I've tested out "score joins" which seemed to perform very well and
> > > achieved the same effect, but using another core, rather than an
> > > external file.
> > >
> > > Thus:
> > >
> > > {!join score=max fromIndex=prices from=id to=id}{!boost b=price}*:*
> > >
> > > seemed to do the job of using the price as a boost. Of course you could
> > > extend this like so:
> > >
> > > q={!join score=max fromIndex=prices from=id to=id}{!boost b=$b}*:*
> > > b=sqrt(price)
> > >
> > > or such things to make the price a more reasonable value.
> > >
> > > Upayavira
> > >
> > > On Mon, Sep 7, 2015, at 06:21 PM, Aman Tandon wrote:
> > > > Any suggestions?
> > > >
> > > > With Regards
> > > > Aman Tandon
> > > >
> > > > On Mon, Sep 7, 2015 at 1:07 PM, Aman Tandon  >
> > > > wrote:
> > > >
> > > > > Hi Upayavira,
> > > > >
> > > > > Have you tried it?
> > > > >
> > > > >
> > > > > No
> > > > >
> > > > > E.g. external file fields don't play nice with Solr Cloud
> > > > >
> > > > >
> > > > > We are not using Solr Cloud.
> > > > >
> > > > >
> > > > >> What are you using the external file for?
> > > > >
> > > > >
> > > > > We are doing the boosting in the search result which are *having
> price
> > > by
> > > > > 1.2* &  *country is India by 1.1*. We are doing by using the
> boosting
> > > > > parameter in conjucation with query & map function e.g.
> > > *=map(query({!dismax
> > > > > qf=hasPrice v='yes' pf=''},0),1,1,1,1)*
> > > > >
> > > > > This is being done with 5/6 parameters. And I am hoping it will
> > > increase
> > > > > query time. So I am planning to make the single score and populate
> it
> > > in
> > > > > external file field. And this might reduce some time.
> > > > >
> > > > > Just to mention we are doing incremental updates after every 10
> > > minutes.
> > > > >
> > > > > With Regards
> > > > > Aman Tandon
> > > > >
> > > > > On Mon, Sep 7, 2015 at 12:53 PM, Upayavira  wrote:
> > > > >
> > > > >> Have you tried it? I suspect your issue will be with the process
> of
> > > > >> reloading the external file rather than consuming it once loaded.
> > > > >>
> > > > >> What are you using the external file for? There may be other ways
> > > also.
> > > > >> E.g. external file fields don't play nice with Solr Cloud.
> > > > >>
> > > > >> Upayavira
> > > > >>
> > > > >> On Mon, Sep 7, 2015, at 07:05 AM, Aman Tandon wrote:
> > > > >> > Hi,
> > > > >> >
> > > > >> > How much ids information can I define in External File?
> Currently I
> > > am
> > > > >> > having the 100 Million records in my index.
> > > > >> >
> > > > >> > With Regards
> > > > >> > Aman Tandon
> > > > >>
> > > > >
> > > > >
> > >
>
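
(For anyone following this thread later, a minimal sketch of the external file field approach being discussed - type, field and key names are illustrative, not the poster's actual schema:

<!-- schema.xml: a float value keyed by the uniqueKey field, read from a file -->
<fieldType name="externalScore" class="solr.ExternalFileField"
           keyField="id" defVal="0" stored="false" indexed="false" valType="pfloat"/>
<field name="combined_boost" type="externalScore"/>

<!-- index data directory: a file named external_combined_boost,        -->
<!-- one "docid=value" line per document, e.g.                          -->
<!--   DOC1=1.32                                                        -->
<!--   DOC2=1.1                                                         -->

<!-- query time: use it as a function, e.g. with edismax                -->
<!--   ...&defType=edismax&boost=field(combined_boost)                  -->
)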


Re: SOLR DataImportHandler - Problem with XPathEntityProcessor

2015-09-08 Thread Umang Agrawal
Thanks Alex.

The inner entity name should be different - it was a typo in my question.

Regarding using XsltUpdateRequestHandler: it's a good
solution, but I cannot use it in my application since I need to include a few
more transformers and Java manipulators.

Could you please suggest how to use an XPath expression like "/RESOURCE/LINK[@ID=${
testdata.id}]/TAG/TAG_VALUE" in the data config XML file?

On Tue, Sep 8, 2015 at 6:34 PM, Umang Agrawal  wrote:

> Hi All
>
> I am facing a problem with XPathEntityProcessor .
>
> Objective:
> When I index Resource XML file using DIH XPathEntityProcessor then there
> should be 2 solr documents
> 01) Link where id is 1000 with 2 tags ABC and DEF
> 02) Link where id is 2000 with 3 tags GHI, JKL and MNO
>
> Solr Version: 4.10.2
>
> Problem:
> I am not able to index  data properly.
>
> Expected Output:
> {
> "id": "1000",
> "field_name": "val1",
> "ABC": "ABC_VALUE",
> "DEF": "DEF_VALUE"
> },
> {
> "id": "2000",
> "field_name": "val2",
> "GHI": "GHI_VALUE",
> "JKL": "JKL_VALUE",
> "MNO": "MNO_VALUE"
> }
>
> 
>
> Resource XML:
>
> 
> 
> val1
> 
> ABC
> ABC_VALUE
> 
> 
> DEF
> DEF_VALUE
> 
> 
> 
> val2
> 
> GHI
> GHI_VALUE
> 
> 
> JKL
> JKL_VALUE
> 
> 
> MNO
> MNO_VALUE
> 
> 
> 
>
>
> 
>
> DataConfig XML (TRY 1):
> 
>  function f1(row) {
> var code = row.get("TAG_CODE");
> var val = row.get("TAG_VALUE");
> row.put(code, val);
> row.remove("TAG_CODE");
> row.remove("TAG_VALUE");
> return row;
> }
> ]]>
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor"
> forEach="/RESOURCE/LINK/TAG" transformer="script:f1">
> 
> 
> 
> 
> 
> 
>
> Output:
> {
> "id": "1000",
> "field_name": "val1",
> "ABC": "ABC_VALUE",
> "DEF": "DEF_VALUE",
> "GHI": "GHI_VALUE",
> "JKL": "JKL_VALUE",
> "MNO": "MNO_VALUE"
> },
> {
> "id": "2000",
> "field_name": "val2",
> "ABC": "ABC_VALUE",
> "DEF": "DEF_VALUE",
> "GHI": "GHI_VALUE",
> "JKL": "JKL_VALUE",
> "MNO": "MNO_VALUE"
> }
>
>
> 
>
> DataConfig XML (TRY 2):
> 
>  function f1(row) {
> var code = row.get("TAG_CODE");
> var val = row.get("TAG_VALUE");
> row.put(code, val);
> row.remove("TAG_CODE");
> row.remove("TAG_VALUE");
> return row;
> }
> ]]>
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor"
> forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
> 
> 
> 
> 
> 
> 
>
> Output:
> {
> "id": "1000",
> "field_name": "val1"
> },
> {
> "id": "2000",
> "field_name": "val2"
> }
>
>
> 
>
> DataConfig XML (TRY 3):
> 
>  function f1(row) {
> var code = row.get("TAG_CODE");
> var val = row.get("TAG_VALUE");
> row.put(code, val);
> row.remove("TAG_CODE");
> row.remove("TAG_VALUE");
> return row;
> }
> ]]>
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> 
> 
> http://host:port/uri;
> processor="XPathEntityProcessor"
> forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
>  xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_CODE"
> />
>  xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_VALUE"
> />
> 
> 
> 
> 
>
> Output:
> {
> "id": "1000",
> "field_name": "val1"
> },
> {
> "id": "2000",
> "field_name": "val2"
> }
>
>
> --
> Thanx & Regards
> Umang Agrawal
>
>
>



-- 
Thanx & Regards
Umang Agrawal


Re: Sorting on date with multivalued False attribute

2015-09-08 Thread Mugeesh Husain
Hi,
Stop the Solr server and delete the index. Before indexing, you should change or
add the schema fields as needed, then start Solr and index whatever you want.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-on-date-with-multivalued-False-attribute-tp4227495p4227625.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Cloud: Massive indexing

2015-09-08 Thread Bertrand Venzal
Hello,

I am indexing lots of big documents with SolrCloud in a MapReduce job:
every day it is 1 - 2 documents (avg: 8 MB, max 100 MB, total ~ 100
GB). This is done in 20 minutes. We have 5 nodes, and each Solr server is launched with
20 GB of RAM (and the G1 collector). We add around 200
SolrDocuments in parallel. Unfortunately, SolrCloud does not accept so much data and it
fails (org.apache.solr.client.solrj.SolrServerException: IOException occured
when talking to server at:). Many documents still get indexed thanks to the
multiple attempts, so if I launch my MapReduce job multiple times, I finally get
all my documents indexed...

Is there a way to check the availability of SolrCloud before adding a document,
or maybe to synchronize with the Solr server? What do you think?

Thanks
Best Regards
Bertrand
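
(There is no built-in "is the cloud ready" call, but a rough SolrJ sketch of the kind of guard-and-retry being asked about could look like the following; the retry counts, sleeps and method names here are illustrative assumptions, not something from the thread:

import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.cloud.ClusterState;

public class GuardedIndexer {
  // Send one batch, waiting/retrying if the cluster looks unavailable or the add fails.
  public static void addWithRetry(CloudSolrClient client, String collection,
                                  List<SolrInputDocument> batch) throws Exception {
    client.connect();                                  // make sure we are connected to ZK
    for (int attempt = 0; attempt < 3; attempt++) {
      ClusterState state = client.getZkStateReader().getClusterState();
      if (state.getLiveNodes().isEmpty()) {            // nothing under /live_nodes -> wait
        Thread.sleep(5000);
        continue;
      }
      try {
        client.add(collection, batch);                 // one request for the whole batch
        return;
      } catch (Exception e) {                          // e.g. SolrServerException on a dead node
        Thread.sleep(5000);                            // back off, then retry
      }
    }
    throw new RuntimeException("giving up after 3 attempts");
  }
}
)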

Re: conf Folder is not getting created while creating a collection on solr cloud

2015-09-08 Thread Erick Erickson
When you create a collection, you specify a "configset" via the
collection.configName
parameter _or_ it's the same name as your collection and already uploaded
_or_ it's the only configset up in ZK.

Anyway, thereafter whenever a node starts up it downloads the configs
from ZK. If
the config directory was resident on the individual nodes, it'd be
difficult to keep
them all in sync.

So the basic pattern is
1> you have some source of config files, often kept in a version control system
or as Edwin says located in the distro.
2> you make whatever modifications you want
3> you push the configs up to Zookeeper with the zkCli tool
4> you use the configs (reload the collection using them, create a new
collection, whatever)

Repeat <2> and <3> whenever you need to change your configs.

Best,
Erick
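
(A concrete sketch of steps 2-4 above; host names, paths, configset and collection names are illustrative:

# <3> push the edited configs up to ZooKeeper under a named configset
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181,localhost:2182,localhost:2183 \
    -cmd upconfig -confdir /path/to/my_configs/conf -confname my_conf

# <4> create a collection that uses that configset ...
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=3&collection.configName=my_conf'

# ... or reload an existing collection after re-uploading changed configs
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=my_collection'
)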

On Tue, Sep 8, 2015 at 8:31 AM, Zheng Lin Edwin Yeo
 wrote:
> There is no conf file located at
> solr-5.3.0/server/solr/test_collection_shard1_replica1/.
> Inside that folder should only contain data folder and core.properties file.
> The conf folder is only in the solr-5.3.0/server/solr/data_
> driven_schema_configs.
>
> Why do you need the conf file in
> solr-5.3.0/server/solr/test_collection_shard1_replica1/?
> This is where Solr stores the indexes (in data\index) and normally I don't
> touch anything in this folder.
>
> Regards,
> Edwin
>
>
> On 8 September 2015 at 22:42, Ritesh Sinha > wrote:
>
>> I am trying to create a collection on Solr cloud.
>>
>> I have created a 3 node zookeeper cluster on the same machine.
>>
>> using this command to start solr on three ports :
>> bin/solr start  cloud -z localhost:2181,localhost:2182,localhost:2183 -p
>> 8983
>> bin/solr start  cloud -z localhost:2181,localhost:2182,localhost:2183 -p
>> 8984
>> bin/solr start  cloud -z localhost:2181,localhost:2182,localhost:2183 -p
>> 8985
>>
>> running this to upload the configuration set to zookeeper before creating
>> the collection
>>
>> /var/data/solr/solr-5.3.0/server/scripts/cloud-scripts/zkcli.sh -zkhost
>> localhost:2181,localhost:2182,localhost:2183 -cmd upconfig -confdir
>>
>> /var/data/solr/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf
>> -confname test_conf
>>
>> command for creating collection
>> curl '
>>
>> http://localhost:8983/solr/admin/collections?action=CREATE=tenlaz=1=3=test_collection=test_conf
>> '
>>
>> But when I check
>> solr-5.3.0/server/solr/test_collection_shard1_replica1/
>> There is no conf file.
>>
>> I know i can explicitly copy it.
>>
>> but is there any command which can automatically create the conf directory.
>>
>> I know i am missing something.Any help is appreciated.
>>
>> Reagrds
>>


conf Folder is not getting created while creating a collection on solr cloud

2015-09-08 Thread Ritesh Sinha
I am trying to create a collection on Solr cloud.

I have created a 3 node zookeeper cluster on the same machine.

Using these commands to start Solr on three ports:
bin/solr start  cloud -z localhost:2181,localhost:2182,localhost:2183 -p
8983
bin/solr start  cloud -z localhost:2181,localhost:2182,localhost:2183 -p
8984
bin/solr start  cloud -z localhost:2181,localhost:2182,localhost:2183 -p
8985

running this to upload the configuration set to zookeeper before creating
the collection

/var/data/solr/solr-5.3.0/server/scripts/cloud-scripts/zkcli.sh -zkhost
localhost:2181,localhost:2182,localhost:2183 -cmd upconfig -confdir
/var/data/solr/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf
-confname test_conf

command for creating collection
curl '
http://localhost:8983/solr/admin/collections?action=CREATE=tenlaz=1=3=test_collection=test_conf
'

But when I check
solr-5.3.0/server/solr/test_collection_shard1_replica1/
there is no conf folder.

I know I can explicitly copy it.

But is there any command which can automatically create the conf directory?

I know I am missing something. Any help is appreciated.

Regards


Re: Solr facets implementation question

2015-09-08 Thread Walter Underwood
Every faceting implementation I’ve seen (not just Solr/Lucene) makes big 
in-memory lists. Lots of values means a bigger list.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Sep 8, 2015, at 8:33 AM, Shawn Heisey  wrote:

> On 9/8/2015 9:10 AM, adfel70 wrote:
>> I am trying to understand why faceting on a field with lots of unique values
>> has such a great impact on query performance. Since Googling for the Solr facet
>> algorithm did not yield anything, I looked at how facets are implemented in
>> Lucene. I found out that there are 2 methods - taxonomy-based and
>> SortedSetDocValues-based. Are Solr's facet capabilities based on one of
>> those methods? If so, I still can't understand why unique values impact
>> query performance...
> 
> Lucene's facet implementation is completely separate (and different)
> from Solr's implementation.  I am not familiar with the inner workings
> of either implementation.  Solr implemented faceting long before Lucene
> did.  I think *Solr* actually contains at least two different facet
> implementations, used for different kinds of facets.
> 
> Faceting on a field with many unique values uses a HUGE amount of heap
> memory, which is likely why query performance is impacted.
> 
> I have a dev system with all my indexes (each of which has dedicated
> hardware for production) on it.  Normally it requires 15GB of heap to
> operate properly.  Every now and then, I get asked to do a duplicate
> check on a field that *should* be unique, on an index with 250 million
> docs in it.  The query that I am asked to do for the facet matches about
> 100 million docs.  This facet query, on a field that DOES have
> docValues, will throw OOM if my heap is less than 27GB.  The dev machine
> only has 32GB of RAM, so as you might imagine, performance is really
> terrible when I do this query.  Thankfully it's a dev machine.  When I
> was doing these queries, it was running 4.9.1.  I have since upgraded it
> to 5.2.1, as a proof of concept for upgrading our production indexes ...
> but I have not attempted the facet query since the upgrade.
> 
> Thanks,
> Shawn
> 



SOLR DataImportHandler - Problem with XPathEntityProcessor

2015-09-08 Thread Umang Agrawal
Hi All

I am facing a problem with XPathEntityProcessor .

Objective:
When I index Resource XML file using DIH XPathEntityProcessor then there
should be 2 solr documents
01) Link where id is 1000 with 2 tags ABC and DEF
02) Link where id is 2000 with 3 tags GHI, JKL and MNO

Solr Version: 4.10.2

Problem:
I am not able to index  data properly.

Expected Output:
{
"id": "1000",
"field_name": "val1",
"ABC": "ABC_VALUE",
"DEF": "DEF_VALUE"
},
{
"id": "2000",
"field_name": "val2",
"GHI": "GHI_VALUE",
"JKL": "JKL_VALUE",
"MNO": "MNO_VALUE"
}


Resource XML:



val1

ABC
ABC_VALUE


DEF
DEF_VALUE



val2

GHI
GHI_VALUE


JKL
JKL_VALUE


MNO
MNO_VALUE






DataConfig XML (TRY 1):




http://host:port/uri;
processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">


http://host:port/uri;
processor="XPathEntityProcessor"
forEach="/RESOURCE/LINK/TAG" transformer="script:f1">







Output:
{
"id": "1000",
"field_name": "val1",
"ABC": "ABC_VALUE",
"DEF": "DEF_VALUE",
"GHI": "GHI_VALUE",
"JKL": "JKL_VALUE",
"MNO": "MNO_VALUE"
},
{
"id": "2000",
"field_name": "val2",
"ABC": "ABC_VALUE",
"DEF": "DEF_VALUE",
"GHI": "GHI_VALUE",
"JKL": "JKL_VALUE",
"MNO": "MNO_VALUE"
}



DataConfig XML (TRY 2):




http://host:port/uri;
processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">


http://host:port/uri;
processor="XPathEntityProcessor"
forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">







Output:
{
"id": "1000",
"field_name": "val1"
},
{
"id": "2000",
"field_name": "val2"
}



DataConfig XML (TRY 3):




http://host:port/uri;
processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">


http://host:port/uri;
processor="XPathEntityProcessor"
forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">







Output:
{
"id": "1000",
"field_name": "val1"
},
{
"id": "2000",
"field_name": "val2"
}


-- 
Thanx & Regards
Umang Agrawal




Re: SolrCloud Admin UI shows node is Down, but state.json says it's active/up

2015-09-08 Thread Erick Erickson
bq: You were probably referring to state.json

yep, I'm never sure whether people are on the old or new ZK versions.

OK, With Tomás' comment, I think it's explained... although confusing.

WDYT?


On Tue, Sep 8, 2015 at 10:03 AM, Arcadius Ahouansou
 wrote:
> Hello Erick.
>
> Yes,
>
> 1> liveNodes has N nodes listed (correctly): Correct, liveNodes is always
> right.
>
> 2> clusterstate.json has N+M nodes listed as "active": clusterstate.json is
> always empty as it's no longer being "used" in 5.3. You were
> probably referring to state.json which is in individual collections. Yes,
> that one reflects the wrong value i.e N+M
>
> 3> using the collection API to get CLUSTERSTATUS always return the correct
> value N
>
> 4> The Front-end code in code in cloud.js displays the right colour when
> nodes go down because it checks for the live node
>
> The problem is only with state.json under certain circumstances.
>
> Thanks.
>
> On 8 September 2015 at 17:51, Erick Erickson 
> wrote:
>
>> Arcadius:
>>
>> Hmmm. It may take a while for the cluster state to change, but I'm
>> assuming that this state persists for minutes/hours/days.
>>
>> So to recap: If dump the entire ZK node from the root, you have
>> 1> liveNodes has N nodes listed (correctly)
>> 2> clusterstate.json has N+M nodes listed as "active"
>>
>> Doesn't sound right to me, but I'll have to let people who are deep
>> into that code speculate from here.
>>
>> Best,
>> Erick
>>
>> On Tue, Sep 8, 2015 at 1:13 AM, Arcadius Ahouansou 
>> wrote:
>> > On Sep 8, 2015 6:25 AM, "Erick Erickson" 
>> wrote:
>> >>
>> >> Perhaps the browser cache? What happens if you, say, use
>> >> Zookeeper client tools to bring down the the cluster state in
>> >> question? Or perhaps just refresh the admin UI when showing
>> >> the cluster status
>> >>
>> >
>> > Hello Erick.
>> >
>> > Thank you very much for answering.
>> > I did use the ZooInspetor tool to check the state.json in all 5 zk nodes
>> > and they are all out of date and identical to what I get through the tree
>> > view in sole admin ui.
>> >
>> > Looking at the source code cloud.js that correctly display nodes as
>> "gone"
>> > in the graph view, it calls the end point /zookeeper?wt=json and relies
>> on
>> > the live nodes to mark a node as down instead of status.json.
>> >
>> > Thanks.
>> >
>> >> Shot in the dark,
>> >> Erick
>> >>
>> >> On Mon, Sep 7, 2015 at 6:09 PM, Arcadius Ahouansou <
>> arcad...@menelic.com>
>> > wrote:
>> >> > We are running the latest Solr 5.3.0
>> >> >
>> >> > Thanks.
>>
>
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Information is Power
> M: 07908761999
> W: www.menelic.com
> ---


Re: SOLR DataImportHandler - Problem with XPathEntityProcessor

2015-09-08 Thread Alexandre Rafalovitch
What about DIH's own XSL pre-processor? It is the "xsl" param on
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheXPathEntityProcessor

No other ideas, unfortunately - I don't usually nest XML processors.

Regards,
Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
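
(To make that concrete, a rough sketch of the xsl + useSolrAddSchema route; the XSLT element names and the stylesheet path are guesses, since the original sample XML was mangled by the list archive:

<!-- data-config.xml: let an XSLT flatten the feed into standard Solr add/doc XML -->
<entity name="links"
        processor="XPathEntityProcessor"
        url="http://host:port/uri"
        xsl="xslt/flatten-links.xsl"
        useSolrAddSchema="true"/>

<!-- flatten-links.xsl: one <doc> per LINK, one field per TAG -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/RESOURCE">
    <add>
      <xsl:for-each select="LINK">
        <doc>
          <field name="id"><xsl:value-of select="@ID"/></field>
          <xsl:for-each select="TAG">
            <field name="{TAG_CODE}"><xsl:value-of select="TAG_VALUE"/></field>
          </xsl:for-each>
        </doc>
      </xsl:for-each>
    </add>
  </xsl:template>
</xsl:stylesheet>
)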

On 8 September 2015 at 12:12, Umang Agrawal  wrote:

> Thanks Alex.
>
> Inner entity name should be different - It was a typo error in my question.
>
> Regarding using XsltUpdateRequestHandler
>  , It's a good
> solution but I can not use it in my application since I need to include few
> more transformer and java manipulators.
>
> Could you please suggest how to use XPATH syntax like "
> /RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_VALUE" in data config xml file?
>
> On Tue, Sep 8, 2015 at 6:34 PM, Umang Agrawal 
> wrote:
>
>> Hi All
>>
>> I am facing a problem with XPathEntityProcessor .
>>
>> Objective:
>> When I index Resource XML file using DIH XPathEntityProcessor then there
>> should be 2 solr documents
>> 01) Link where id is 1000 with 2 tags ABC and DEF
>> 02) Link where id is 2000 with 3 tags GHI, JKL and MNO
>>
>> Solr Version: 4.10.2
>>
>> Problem:
>> I am not able to index  data properly.
>>
>> Expected Output:
>> {
>> "id": "1000",
>> "field_name": "val1",
>> "ABC": "ABC_VALUE",
>> "DEF": "DEF_VALUE"
>> },
>> {
>> "id": "2000",
>> "field_name": "val2",
>> "GHI": "GHI_VALUE",
>> "JKL": "JKL_VALUE",
>> "MNO": "MNO_VALUE"
>> }
>>
>> 
>>
>> Resource XML:
>>
>> 
>> 
>> val1
>> 
>> ABC
>> ABC_VALUE
>> 
>> 
>> DEF
>> DEF_VALUE
>> 
>> 
>> 
>> val2
>> 
>> GHI
>> GHI_VALUE
>> 
>> 
>> JKL
>> JKL_VALUE
>> 
>> 
>> MNO
>> MNO_VALUE
>> 
>> 
>> 
>>
>>
>> 
>>
>> DataConfig XML (TRY 1):
>> 
>> > function f1(row) {
>> var code = row.get("TAG_CODE");
>> var val = row.get("TAG_VALUE");
>> row.put(code, val);
>> row.remove("TAG_CODE");
>> row.remove("TAG_VALUE");
>> return row;
>> }
>> ]]>
>> 
>> 
>> http://host:port/uri;
>> processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
>> 
>> 
>> http://host:port/uri;
>> processor="XPathEntityProcessor"
>> forEach="/RESOURCE/LINK/TAG" transformer="script:f1">
>> 
>> 
>> 
>> 
>> 
>> 
>>
>> Output:
>> {
>> "id": "1000",
>> "field_name": "val1",
>> "ABC": "ABC_VALUE",
>> "DEF": "DEF_VALUE",
>> "GHI": "GHI_VALUE",
>> "JKL": "JKL_VALUE",
>> "MNO": "MNO_VALUE"
>> },
>> {
>> "id": "2000",
>> "field_name": "val2",
>> "ABC": "ABC_VALUE",
>> "DEF": "DEF_VALUE",
>> "GHI": "GHI_VALUE",
>> "JKL": "JKL_VALUE",
>> "MNO": "MNO_VALUE"
>> }
>>
>>
>> 
>>
>> DataConfig XML (TRY 2):
>> 
>> > function f1(row) {
>> var code = row.get("TAG_CODE");
>> var val = row.get("TAG_VALUE");
>> row.put(code, val);
>> row.remove("TAG_CODE");
>> row.remove("TAG_VALUE");
>> return row;
>> }
>> ]]>
>> 
>> 
>> http://host:port/uri;
>> processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
>> 
>> 
>> http://host:port/uri;
>> processor="XPathEntityProcessor"
>> forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
>> 
>> 
>> 
>> 
>> 
>> 
>>
>> Output:
>> {
>> "id": "1000",
>> "field_name": "val1"
>> },
>> {
>> "id": "2000",
>> "field_name": "val2"
>> }
>>
>>
>> 
>>
>> DataConfig XML (TRY 3):
>> 
>> > function f1(row) {
>> var code = row.get("TAG_CODE");
>> var val = row.get("TAG_VALUE");
>> row.put(code, val);
>> row.remove("TAG_CODE");
>> row.remove("TAG_VALUE");
>> return row;
>> }
>> ]]>
>> 
>> 
>> http://host:port/uri;
>> processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
>> 
>> 
>> http://host:port/uri;
>> processor="XPathEntityProcessor"
>> forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
>> > xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_CODE"
>> />
>> > xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_VALUE"
>> />
>> 
>> 
>> 
>> 
>>
>> Output:
>> {
>> "id": "1000",
>> "field_name": "val1"
>> },
>> {
>> "id": "2000",
>> "field_name": "val2"
>> }
>>
>>
>> --
>> Thanx & Regards
>> Umang Agrawal
>>
>>
>>
>
>
>
> --
> Thanx & Regards
> Umang Agrawal
>


Solr score distribution usage

2015-09-08 Thread Ashish Mukherjee
Hello,

I would like to use the Solr score distribution to pick up most relevant
documents from the search result. Rather than top n results, I am
interested only in picking up the most relevant based on statistical
distribution of the scores.

A brief study of some sample searches (the most frequently searched terms)
on my data-set shows that the mode and median scores seem to coincide or be
very close together. Is this the kind of trend which is generally observed
in Solr (though I understand variations on specific searches)? Hence, I was
considering using statistical mode as the threshold above which I use the
documents from the result.

Has anyone done something like this before or would like to critique my
approach?

Regards,
Ashish


Re: SolrCloud Admin UI shows node is Down, but state.json says it's active/up

2015-09-08 Thread Erick Erickson
Arcadius:

Hmmm. It may take a while for the cluster state to change, but I'm
assuming that this state persists for minutes/hours/days.

So to recap: if you dump the entire ZK node from the root, you have
1> liveNodes has N nodes listed (correctly)
2> clusterstate.json has N+M nodes listed as "active"

Doesn't sound right to me, but I'll have to let people who are deep
into that code speculate from here.

Best,
Erick

On Tue, Sep 8, 2015 at 1:13 AM, Arcadius Ahouansou  wrote:
> On Sep 8, 2015 6:25 AM, "Erick Erickson"  wrote:
>>
>> Perhaps the browser cache? What happens if you, say, use
>> Zookeeper client tools to bring down the the cluster state in
>> question? Or perhaps just refresh the admin UI when showing
>> the cluster status
>>
>
> Hello Erick.
>
> Thank you very much for answering.
> I did use the ZooInspetor tool to check the state.json in all 5 zk nodes
> and they are all out of date and identical to what I get through the tree
> view in sole admin ui.
>
> Looking at the source code cloud.js that correctly display nodes as "gone"
> in the graph view, it calls the end point /zookeeper?wt=json and relies on
> the live nodes to mark a node as down instead of status.json.
>
> Thanks.
>
>> Shot in the dark,
>> Erick
>>
>> On Mon, Sep 7, 2015 at 6:09 PM, Arcadius Ahouansou 
> wrote:
>> > We are running the latest Solr 5.3.0
>> >
>> > Thanks.


Re: SolrCloud Admin UI shows node is Down, but state.json says it's active/up

2015-09-08 Thread Arcadius Ahouansou
Hello Erick.

Yes,

1> liveNodes has N nodes listed (correctly): Correct, liveNodes is always
right.

2> clusterstate.json has N+M nodes listed as "active": clusterstate.json is
always empty as it's no longer being "used" in 5.3. You were
probably referring to state.json, which is in individual collections. Yes,
that one reflects the wrong value, i.e. N+M.

3> Using the Collections API to get CLUSTERSTATUS always returns the correct
value N.

4> The front-end code in cloud.js displays the right colour when
nodes go down because it checks the live nodes.

The problem is only with state.json under certain circumstances.

Thanks.
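
(For reference, the Collections API call referred to in point 3 is along the lines of:

curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json'
)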

On 8 September 2015 at 17:51, Erick Erickson 
wrote:

> Arcadius:
>
> Hmmm. It may take a while for the cluster state to change, but I'm
> assuming that this state persists for minutes/hours/days.
>
> So to recap: If dump the entire ZK node from the root, you have
> 1> liveNodes has N nodes listed (correctly)
> 2> clusterstate.json has N+M nodes listed as "active"
>
> Doesn't sound right to me, but I'll have to let people who are deep
> into that code speculate from here.
>
> Best,
> Erick
>
> On Tue, Sep 8, 2015 at 1:13 AM, Arcadius Ahouansou 
> wrote:
> > On Sep 8, 2015 6:25 AM, "Erick Erickson" 
> wrote:
> >>
> >> Perhaps the browser cache? What happens if you, say, use
> >> Zookeeper client tools to bring down the the cluster state in
> >> question? Or perhaps just refresh the admin UI when showing
> >> the cluster status
> >>
> >
> > Hello Erick.
> >
> > Thank you very much for answering.
> > I did use the ZooInspetor tool to check the state.json in all 5 zk nodes
> > and they are all out of date and identical to what I get through the tree
> > view in sole admin ui.
> >
> > Looking at the source code cloud.js that correctly display nodes as
> "gone"
> > in the graph view, it calls the end point /zookeeper?wt=json and relies
> on
> > the live nodes to mark a node as down instead of status.json.
> >
> > Thanks.
> >
> >> Shot in the dark,
> >> Erick
> >>
> >> On Mon, Sep 7, 2015 at 6:09 PM, Arcadius Ahouansou <
> arcad...@menelic.com>
> > wrote:
> >> > We are running the latest Solr 5.3.0
> >> >
> >> > Thanks.
>



-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---


Re: Log4J level from admin web UI

2015-09-08 Thread Shawn Heisey
On 9/8/2015 10:02 AM, Nir Barel wrote:
> Hi All,
>
> I am using Solr 4.8.1, and when I try to change the log level via the admin
> web UI it doesn't do anything. The only way to change the log level is to edit my
> log4j file and restart the Solr process.
>
> Is this a known issue?
> Can you guide me on what I should check?

We'll need a lot more details.  What exactly are you changing, what are
you expecting to happen, and what actually happens?

https://wiki.apache.org/solr/UsingMailingLists

Changing the logging level of a class in the admin UI will usually not
affect what you can see in the Logging tab in the same UI. That is
limited to messages that are at least as severe as WARN, so lower
severity details like INFO and DEBUG will not normally appear there,
they will be in the Solr logfile, and possibly on the console (stdout)
of the Solr process.

Thanks,
Shawn
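
(For a change that persists across restarts, editing the log4j configuration - as Nir is already doing - is the usual route. A minimal sketch, assuming the stock log4j.properties that ships with Solr 4.x, located wherever -Dlog4j.configuration points for your install:

# keep the existing root logger / appenders ...
log4j.rootLogger=INFO, file, CONSOLE

# ... then pin specific packages, e.g. quieten updates but turn up core loading
log4j.logger.org.apache.solr.update=WARN
log4j.logger.org.apache.solr.core=DEBUG
)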



Re: SolrCloud Admin UI shows node is Down, but state.json says it's active/up

2015-09-08 Thread Tomás Fernández Löbbe
I believe this is expected in the current code. From Replica.State javadoc:


  /**
   * The replica's state. In general, if the node the replica is hosted on
is
   * not under {@code /live_nodes} in ZK, the replica's state should be
   * discarded.
   */
  public enum State {

/**
 * The replica is ready to receive updates and queries.
 * 
 * NOTE: when the node the replica is hosted on crashes, the
 * replica's state may remain ACTIVE in ZK. To determine if the replica
is
 * truly active, you must also verify that its {@link
Replica#getNodeName()
 * node} is under {@code /live_nodes} in ZK (or use
 * {@link ClusterState#liveNodesContain(String)}).
 * 
 */
ACTIVE,
...

On Tue, Sep 8, 2015 at 9:51 AM, Erick Erickson 
wrote:

> Arcadius:
>
> Hmmm. It may take a while for the cluster state to change, but I'm
> assuming that this state persists for minutes/hours/days.
>
> So to recap: If dump the entire ZK node from the root, you have
> 1> liveNodes has N nodes listed (correctly)
> 2> clusterstate.json has N+M nodes listed as "active"
>
> Doesn't sound right to me, but I'll have to let people who are deep
> into that code speculate from here.
>
> Best,
> Erick
>
> On Tue, Sep 8, 2015 at 1:13 AM, Arcadius Ahouansou 
> wrote:
> > On Sep 8, 2015 6:25 AM, "Erick Erickson" 
> wrote:
> >>
> >> Perhaps the browser cache? What happens if you, say, use
> >> Zookeeper client tools to bring down the the cluster state in
> >> question? Or perhaps just refresh the admin UI when showing
> >> the cluster status
> >>
> >
> > Hello Erick.
> >
> > Thank you very much for answering.
> > I did use the ZooInspetor tool to check the state.json in all 5 zk nodes
> > and they are all out of date and identical to what I get through the tree
> > view in sole admin ui.
> >
> > Looking at the source code cloud.js that correctly display nodes as
> "gone"
> > in the graph view, it calls the end point /zookeeper?wt=json and relies
> on
> > the live nodes to mark a node as down instead of status.json.
> >
> > Thanks.
> >
> >> Shot in the dark,
> >> Erick
> >>
> >> On Mon, Sep 7, 2015 at 6:09 PM, Arcadius Ahouansou <
> arcad...@menelic.com>
> > wrote:
> >> > We are running the latest Solr 5.3.0
> >> >
> >> > Thanks.
>


Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-08 Thread Kevin Lee
Thanks Dan!  Please let us know what you find.  I’m interested to know if this 
is an issue with anyone else’s setup or if I have an issue in my local 
configuration that is still preventing it from working on start/restart.

- Kevin
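
(For anyone trying to reproduce this, the upload step and a skeleton security.json look roughly like the following; the zkhost, paths, role names and the credentials value are placeholders, not Kevin's actual configuration:

# upload (or re-upload) the file to ZooKeeper
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
    -cmd putfile /security.json /path/to/security.json

# security.json (sketch)
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": { "solr": "<base64 sha256(password+salt)> <base64 salt>" }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [ { "name": "collection-admin-edit", "role": "adminRole" } ],
    "user-role": { "solr": "adminRole" }
  }
}
)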

> On Sep 5, 2015, at 8:45 AM, Dan Davis  wrote:
> 
> Kevin & Noble,
> 
> I'll take it on to test this.   I've built from source before, and I've
> wanted this authorization capability for awhile.
> 
> On Fri, Sep 4, 2015 at 9:59 AM, Kevin Lee  wrote:
> 
>> Noble,
>> 
>> Does SOLR-8000 need to be re-opened?  Has anyone else been able to test
>> the restart fix?
>> 
>> At startup, these are the log messages that say there is no security
>> configuration and the plugins aren’t being used even though security.json
>> is in Zookeeper:
>> 2015-09-04 08:06:21.205 INFO  (main) [   ] o.a.s.c.CoreContainer Security
>> conf doesn't exist. Skipping setup for authorization module.
>> 2015-09-04 08:06:21.205 INFO  (main) [   ] o.a.s.c.CoreContainer No
>> authentication plugin used.
>> 
>> Thanks,
>> Kevin
>> 
>>> On Sep 4, 2015, at 5:47 AM, Noble Paul  wrote:
>>> 
>>> There are no download links for 5.3.x branch  till we do a bug fix
>> release
>>> 
>>> If you wish to download the trunk nightly (which is not same as 5.3.0)
>>> check here
>> https://builds.apache.org/job/Solr-Artifacts-trunk/lastSuccessfulBuild/artifact/solr/package/
>>> 
>>> If you wish to get the binaries for 5.3 branch you will have to make it
>>> (you will need to install svn and ant)
>>> 
>>> Here are the steps
>>> 
>>> svn checkout
>> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_3/
>>> cd lucene_solr_5_3/solr
>>> ant server
>>> 
>>> 
>>> 
>>> On Fri, Sep 4, 2015 at 4:11 PM, davidphilip cherian
>>>  wrote:
 Hi Kevin/Noble,
 
 What is the download link to take the latest? What are the steps to
>> compile
 it, test and use?
 We also have a use case to have this feature in solr too. Therefore,
>> wanted
 to test and above info would help a lot to get started.
 
 Thanks.
 
 
 On Fri, Sep 4, 2015 at 1:45 PM, Kevin Lee 
>> wrote:
 
> Thanks, I downloaded the source and compiled it and replaced the jar
>> file
> in the dist and solr-webapp’s WEB-INF/lib directory.  It does seem to
>> be
> protecting the Collections API reload command now as long as I upload
>> the
> security.json after startup of the Solr instances.  If I shutdown and
>> bring
> the instances back up, the security is no longer in place and I have to
> upload the security.json again for it to take effect.
> 
> - Kevin
> 
>> On Sep 3, 2015, at 10:29 PM, Noble Paul  wrote:
>> 
>> Both these are committed. If you could test with the latest 5.3 branch
>> it would be helpful
>> 
>> On Wed, Sep 2, 2015 at 5:11 PM, Noble Paul 
>> wrote:
>>> I opened a ticket for the same
>>> https://issues.apache.org/jira/browse/SOLR-8004
>>> 
>>> On Wed, Sep 2, 2015 at 1:36 PM, Kevin Lee >> 
> wrote:
 I’ve found that completely exiting Chrome or Firefox and opening it
> back up re-prompts for credentials when they are required.  It was
> re-prompting with the /browse path where authentication was working
>> each
> time I completely exited and started the browser again, however it
>> won’t
> re-prompt unless you exit completely and close all running instances
>> so I
> closed all instances each time to test.
 
 However, to make sure I ran it via the command line via curl as
> suggested and it still does not give any authentication error when
>> trying
> to issue the command via curl.  I get a success response from all the
>> Solr
> instances that the reload was successful.
 
 Not sure why the pre-canned permissions aren’t working, but the one
>> to
> the request handler at the /browse path is.
 
 
> On Sep 1, 2015, at 11:03 PM, Noble Paul 
>> wrote:
> 
> " However, after uploading the new security.json and restarting the
> web browser,"
> 
> The browser remembers your login , So it is unlikely to prompt for
>> the
> credentials again.
> 
> Why don't you try the RELOAD operation using command line (curl) ?
> 
> On Tue, Sep 1, 2015 at 10:31 PM, Kevin Lee
>> 
> wrote:
>> The restart issues aside, I’m trying to lockdown usage of the
> Collections API, but that also does not seem to be working either.
>> 
>> Here is my security.json.  I’m using the “collection-admin-edit”
> permission and assigning it to the “adminRole”.  However, after
>> uploading
> the new security.json and restarting 

Re: SolrCloud Admin UI shows node is Down, but state.json says it's active/up

2015-09-08 Thread Arcadius Ahouansou
Thank you Tomás for pointing to the JavaDoc
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/common/cloud/Replica.State.html#ACTIVE

The Javadoc is quite clear. So this stale state.json is not an issue after
all.

However, it's very confusing that when a node goes down, state.json may be
updated for 1 collection while it remains stale in the other collection.
Also in our case, the node did not crash as per the JavaDoc... it was a
normal server stop/shut-down.
We may need to review our shut-down process and see whether things change.

Thank you very much Erick and Tomás for your valuable help... very
appreciated.

Arcadius.


On 8 September 2015 at 18:28, Erick Erickson 
wrote:

> bq: You were probably referring to state.json
>
> yep, I'm never sure whether people are on the old or new ZK versions.
>
> OK, With Tomás' comment, I think it's explained... although confusing.
>
> WDYT?
>
>
> On Tue, Sep 8, 2015 at 10:03 AM, Arcadius Ahouansou
>  wrote:
> > Hello Erick.
> >
> > Yes,
> >
> > 1> liveNodes has N nodes listed (correctly): Correct, liveNodes is always
> > right.
> >
> > 2> clusterstate.json has N+M nodes listed as "active": clusterstate.json
> is
> > always empty as it's no longer being "used" in 5.3. You were
> > probably referring to state.json which is in individual collections. Yes,
> > that one reflects the wrong value i.e N+M
> >
> > 3> using the collection API to get CLUSTERSTATUS always return the
> correct
> > value N
> >
> > 4> The Front-end code in code in cloud.js displays the right colour when
> > nodes go down because it checks for the live node
> >
> > The problem is only with state.json under certain circumstances.
> >
> > Thanks.
> >
> > On 8 September 2015 at 17:51, Erick Erickson 
> > wrote:
> >
> >> Arcadius:
> >>
> >> Hmmm. It may take a while for the cluster state to change, but I'm
> >> assuming that this state persists for minutes/hours/days.
> >>
> >> So to recap: If dump the entire ZK node from the root, you have
> >> 1> liveNodes has N nodes listed (correctly)
> >> 2> clusterstate.json has N+M nodes listed as "active"
> >>
> >> Doesn't sound right to me, but I'll have to let people who are deep
> >> into that code speculate from here.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Sep 8, 2015 at 1:13 AM, Arcadius Ahouansou <
> arcad...@menelic.com>
> >> wrote:
> >> > On Sep 8, 2015 6:25 AM, "Erick Erickson" 
> >> wrote:
> >> >>
> >> >> Perhaps the browser cache? What happens if you, say, use
> >> >> Zookeeper client tools to bring down the the cluster state in
> >> >> question? Or perhaps just refresh the admin UI when showing
> >> >> the cluster status
> >> >>
> >> >
> >> > Hello Erick.
> >> >
> >> > Thank you very much for answering.
> >> > I did use the ZooInspetor tool to check the state.json in all 5 zk
> nodes
> >> > and they are all out of date and identical to what I get through the
> tree
> >> > view in sole admin ui.
> >> >
> >> > Looking at the source code cloud.js that correctly display nodes as
> >> "gone"
> >> > in the graph view, it calls the end point /zookeeper?wt=json and
> relies
> >> on
> >> > the live nodes to mark a node as down instead of status.json.
> >> >
> >> > Thanks.
> >> >
> >> >> Shot in the dark,
> >> >> Erick
> >> >>
> >> >> On Mon, Sep 7, 2015 at 6:09 PM, Arcadius Ahouansou <
> >> arcad...@menelic.com>
> >> > wrote:
> >> >> > We are running the latest Solr 5.3.0
> >> >> >
> >> >> > Thanks.
> >>
> >
> >
> >
> > --
> > Arcadius Ahouansou
> > Menelic Ltd | Information is Power
> > M: 07908761999
> > W: www.menelic.com
> > ---
>



-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---


Replication Sync OR Async?

2015-09-08 Thread Maulin Rathod
We are using SolrCloud 5.3 with 2 shards and 2 replicas. We observed that
indexing is slower when the replicas are up and running. If we stop the replicas
then indexing becomes very fast.

Here are some readings for indexing 10 documents:

When replicas are running, it took around 900 seconds for indexing.
After stopping the replicas, it took around 500 seconds for indexing.

Does the replication happen synchronously or asynchronously? If it is synchronous,
can we make it asynchronous so that it does not affect indexing performance?

I have checked the Solr code, and it seems that the doFinish() method (of
the DistributedUpdateProcessor class) waits for the replication requests to complete.

DistributedUpdateProcessor
==========================

  private void doFinish() {
    // TODO: if not a forward and replication req is not specified, we could
    // send in a background thread

    cmdDistrib.finish();
    List<Error> errors = cmdDistrib.getErrors();
    // TODO - we may need to tell about more than one error...
    ...

    ...
  }


Solr Replication sometimes coming in log files

2015-09-08 Thread Kamal Kishore Aggarwal
Hi Team,

I am currently working with Java 1.7 and Solr 4.8.1 on Tomcat 7. The Solr
configuration has a master & slave (2 slaves) architecture.


The master & Slave 2 are in the same server location (say zone A), whereas Slave 1
is on another server in a different zone (say zone B). There is a latency of 40
ms between the two zones.

Nowadays we are facing high load on Slave 1, and we suspect that it is due
to a delay in data replication from the master server. These days we are finding
the replication information shown below in the log files, but such lines
were not in previous log files on the Slave 1 server. Also, such information is
not there in any Slave 2 log files (which might be due to the master &
Slave 2 being in the same zone).


> INFO: [Core] webapp=/solr path=/replication
> params={wt=json=details&_=1441708786003} status=0 QTime=173
> INFO: [Core] webapp=/solr path=/replication
> params={wt=json=details&_=1441708787976} status=0 QTime=1807
> INFO: [Core] webapp=/solr path=/replication
> params={wt=json=details&_=1441708791563} status=0 QTime=7140
> INFO: [Core] webapp=/solr path=/replication
> params={wt=json=details&_=1441708800450} status=0 QTime=1679



Please confirm whether our thought is right that the increased replication time (which can
be due to server connectivity issues) is the reason for the high load on Solr.
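
(One way to compare the two slaves directly is the replication handler's details command, which reports the slave's replication status and timings; host and core names are illustrative:

curl 'http://slave1-host:8983/solr/Core/replication?command=details&wt=json'
)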

Regards
Kamal Kishore