Re: regex-urlfilter help

2016-12-18 Thread forest_soup
Yeah, I'm curious why this thread is being used to discuss that topic.
I'll start a new thread for my questions.





Re: Very long young generation stop the world GC pause

2016-12-18 Thread forest_soup
Sorry, I misremembered. The swap is 16 GB.





Re: Soft commit and reading data just after the commit

2016-12-18 Thread Lasitha Wattaladeniya
I didn't look much into the REALTIME GET handler. Thanks for mentioning
it. I'm checking it now.

On 19 Dec 2016 10:09, "Lasitha Wattaladeniya"  wrote:

> Hi all,
>
> Thanks for your replies,
>
> @dorian : the requirement is, we are showing a list of entries on a page.
> For each user there's a read / unread flag. The data for the listing is
> fetched from Solr, and you can see whether an entry was previously read or
> not. When a user views an entry by clicking it, we update the database flag
> to READ and use real-time indexing to update the Solr entry. So when the
> user closes the full view of the entry and goes back to the entry listing
> page, the data fetched from Solr should show it as READ. That's the use
> case we are trying to fix.
>
> @eric : thanks for the lengthy reply. So let's say I increase the
> autoSoftCommit timeout to maybe 100 ms. In that case, do I have to wait
> roughly that long on the client side before calling search? What's the
> correct way of achieving this?
>
> Regards,
> Lasitha
>
> On 18 Dec 2016 23:52, "Erick Erickson"  wrote:
>
>> 1 ms autocommit is far too frequent. And it's not
>> helping you anyway.
>>
>> There is some lag between when a commit happens
>> and when the docs are really available. The sequence is:
>> 1> commit (soft or hard-with-opensearcher=true doesn't matter).
>> 2> a new searcher is opened and autowarming starts
>> 3> until the new searcher is opened, queries continue to be served by
>> the old searcher
>> 4> the new searcher is fully opened
>> 5> _new_ requests are served by the new searcher.
>> 6> the last request is finished by the old searcher and it's closed.
>>
>> So what's probably happening is that you send docs and then send a
>> query and Solr is still in step <3>. You can look at your admin UI
>> plugins/stats page or your log to see how long it takes for a
>> searcher to open and adjust your expectations accordingly.
>>
>> If you want to fetch only the document (not try to get it by a
>> search), Real Time Get is designed to ensure that you always get the
>> most recent copy whether it's searchable or not.
>>
>> All that said, Solr wasn't designed for autocommits that are that
>> frequent. That's why the documentation talks about _Near_ Real Time.
>> You may need to adjust your expectations.
>>
>> Best,
>> Erick
>>
>> On Sun, Dec 18, 2016 at 6:49 AM, Dorian Hoxha 
>> wrote:
>> > There's a very high probability that you're using the wrong tool for the
>> > job if you need a 1 ms softCommit time. Especially when you always need it
>> > (e.g. there are apps where you need commit-after-insert very rarely).
>> >
>> > So explain what you're using it for?
>> >
>> > On Sun, Dec 18, 2016 at 3:38 PM, Lasitha Wattaladeniya <
>> watt...@gmail.com>
>> > wrote:
>> >
>> >> Hi Furkan,
>> >>
>> >> Thanks for the links. I had read the first one but not the second one; I
>> >> read it after you sent it. The configurations in my current
>> >> solrconfig.xml are below:
>> >>
>> >> <autoSoftCommit>
>> >>   <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime>
>> >> </autoSoftCommit>
>> >>
>> >> <autoCommit>
>> >>   <maxTime>15000</maxTime>
>> >>   <openSearcher>false</openSearcher>
>> >> </autoCommit>
>> >>
>> >> The problem I'm facing is that just after adding documents to Solr using
>> >> solrj, when I retrieve data from Solr I am not getting the updated
>> >> results. This happens from time to time. Most of the time I get the
>> >> correct data, but on some occasions I get stale results. So, as you
>> >> suggest, what is the best practice here? Should I wait 1 millisecond
>> >> before asking for updated results?
>> >>
>> >> Regards,
>> >> Lasitha
>> >>
>> >> Lasitha Wattaladeniya
>> >> Software Engineer
>> >>
>> >> Mobile : +6593896893
>> >> Blog : techreadme.blogspot.com
>> >>
>> >> On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI > >
>> >> wrote:
>> >>
>> >> > Hi Lasitha,
>> >> >
>> >> > First of all, did you check these:
>> >> >
>> >> > https://cwiki.apache.org/confluence/display/solr/Near+
>> >> Real+Time+Searching
>> >> > https://lucidworks.com/blog/2013/08/23/understanding-
>> >> > transaction-logs-softcommit-and-commit-in-sorlcloud/
>> >> >
>> >> > after that, if you cannot adjust your configuration you can give more
>> >> > information and we can find a solution.
>> >> >
>> >> > Kind Regards,
>> >> > Furkan KAMACI
>> >> >
>> >> > On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya <
>> >> watt...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi furkan,
>> >> >>
>> >> >> Thanks for your reply, it is generally a query heavy system. We are
>> >> using
>> >> >> realtime indexing for editing the available data
>> >> >>
>> >> >> Regards,
>> >> >> Lasitha
>> >> >>
>> >> >> Lasitha Wattaladeniya
>> >> >> Software Engineer
>> >> >>
>> >> >> Mobile : +6593896893 <+65%209389%206893>
>> >> >> Blog : techreadme.blogspot.com
>> >> >>
>> >> >> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI <
>> furkankam...@gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >>> Hi Lasitha,

Re: Soft commit and reading data just after the commit

2016-12-18 Thread Lasitha Wattaladeniya
Hi all,

Thanks for your replies,

@dorian : the requirement is, we are showing a list of entries on a page.
For each user there's a read / unread flag. The data for the listing is
fetched from Solr, and you can see whether an entry was previously read or
not. When a user views an entry by clicking it, we update the database flag
to READ and use real-time indexing to update the Solr entry. So when the
user closes the full view of the entry and goes back to the entry listing
page, the data fetched from Solr should show it as READ. That's the use
case we are trying to fix.

@eric : thanks for the lengthy reply. So let's say I increase the
autoSoftCommit timeout to maybe 100 ms. In that case, do I have to wait
roughly that long on the client side before calling search? What's the
correct way of achieving this?

Regards,
Lasitha

On 18 Dec 2016 23:52, "Erick Erickson"  wrote:

> 1 ms autocommit is far too frequent. And it's not
> helping you anyway.
>
> There is some lag between when a commit happens
> and when the docs are really available. The sequence is:
> 1> commit (soft or hard-with-opensearcher=true doesn't matter).
> 2> a new searcher is opened and autowarming starts
> 3> until the new searcher is opened, queries continue to be served by
> the old searcher
> 4> the new searcher is fully opened
> 5> _new_ requests are served by the new searcher.
> 6> the last request is finished by the old searcher and it's closed.
>
> So what's probably happening is that you send docs and then send a
> query and Solr is still in step <3>. You can look at your admin UI
> plugins/stats page or your log to see how long it takes for a
> searcher to open and adjust your expectations accordingly.
>
> If you want to fetch only the document (not try to get it by a
> search), Real Time Get is designed to ensure that you always get the
> most recent copy whether it's searchable or not.
>
> All that said, Solr wasn't designed for autocommits that are that
> frequent. That's why the documentation talks about _Near_ Real Time.
> You may need to adjust your expectations.
>
> Best,
> Erick
>
> On Sun, Dec 18, 2016 at 6:49 AM, Dorian Hoxha 
> wrote:
> > There's a very high probability that you're using the wrong tool for the
> > job if you need a 1 ms softCommit time. Especially when you always need it
> > (e.g. there are apps where you need commit-after-insert very rarely).
> >
> > So explain what you're using it for?
> >
> > On Sun, Dec 18, 2016 at 3:38 PM, Lasitha Wattaladeniya <
> watt...@gmail.com>
> > wrote:
> >
> >> Hi Furkan,
> >>
> >> Thanks for the links. I had read the first one but not the second one; I
> >> read it after you sent it. The configurations in my current
> >> solrconfig.xml are below:
> >>
> >> <autoSoftCommit>
> >>   <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime>
> >> </autoSoftCommit>
> >>
> >> <autoCommit>
> >>   <maxTime>15000</maxTime>
> >>   <openSearcher>false</openSearcher>
> >> </autoCommit>
> >>
> >> The problem I'm facing is that just after adding documents to Solr using
> >> solrj, when I retrieve data from Solr I am not getting the updated
> >> results. This happens from time to time. Most of the time I get the
> >> correct data, but on some occasions I get stale results. So, as you
> >> suggest, what is the best practice here? Should I wait 1 millisecond
> >> before asking for updated results?
> >>
> >> Regards,
> >> Lasitha
> >>
> >> Lasitha Wattaladeniya
> >> Software Engineer
> >>
> >> Mobile : +6593896893
> >> Blog : techreadme.blogspot.com
> >>
> >> On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI 
> >> wrote:
> >>
> >> > Hi Lasitha,
> >> >
> >> > First of all, did you check these:
> >> >
> >> > https://cwiki.apache.org/confluence/display/solr/Near+
> >> Real+Time+Searching
> >> > https://lucidworks.com/blog/2013/08/23/understanding-
> >> > transaction-logs-softcommit-and-commit-in-sorlcloud/
> >> >
> >> > after that, if you cannot adjust your configuration you can give more
> >> > information and we can find a solution.
> >> >
> >> > Kind Regards,
> >> > Furkan KAMACI
> >> >
> >> > On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya <
> >> watt...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi furkan,
> >> >>
> >> >> Thanks for your reply, it is generally a query heavy system. We are
> >> using
> >> >> realtime indexing for editing the available data
> >> >>
> >> >> Regards,
> >> >> Lasitha
> >> >>
> >> >> Lasitha Wattaladeniya
> >> >> Software Engineer
> >> >>
> >> >> Mobile : +6593896893 <+65%209389%206893>
> >> >> Blog : techreadme.blogspot.com
> >> >>
> >> >> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI <
> furkankam...@gmail.com>
> >> >> wrote:
> >> >>
> >> >>> Hi Lasitha,
> >> >>>
> >> >>> What is your indexing / querying requirements. Do you have an index
> >> >>> heavy/light  - query heavy/light system?
> >> >>>
> >> >>> Kind Regards,
> >> >>> Furkan KAMACI
> >> >>>
> >> >>> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya <
> >> >>> watt...@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>> > Hello devs,

Re: Very long young generation stop the world GC pause

2016-12-18 Thread forest_soup
Thanks a lot, Pushkar! And sorry for the late response.
Our OS RAM is 128 GB, and we have two Solr nodes on one machine. Each Solr
node has a max heap size of 32 GB.
And we do not have swap.






Re: Stemming with SOLR

2016-12-18 Thread Lasitha Wattaladeniya
Thank you all for the replies.  I am considering the suggestions

On 17 Dec 2016 01:50, "Susheel Kumar"  wrote:

> To handle irregular nouns (
> http://www.ef.com/english-resources/english-grammar/
> singular-and-plural-nouns/),
> the simplest way is to handle them with StemmerOverrideFilterFactory. The
> list is not that long. Otherwise, go for commercial solutions like Basistech
> etc., as Alex suggested, or you can customize Hunspell extensively to handle
> most of them.
>
> Thanks,
> Susheel
>
> On Thu, Dec 15, 2016 at 9:46 PM, Alexandre Rafalovitch  >
> wrote:
>
> > If you need the full fidelity solution taking care of multiple
> > edge-cases, it could be worth looking at commercial solutions.
> >
> >
> > http://www.basistech.com/ has one, including a free-level SAAS plan.
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> > On 15 December 2016 at 21:28, Lasitha Wattaladeniya 
> > wrote:
> > > Hi all,
> > >
> > > Thanks for the replies,
> > >
> > > @eric, ahmet : since those stemmers are algorithmic, they won't work on
> > > irregular words such as caught, ran and so on. So in our case they won't
> > > work.
> > >
> > > @susheel : Yes, I thought about it, but the problem we have is that the
> > > documents we index are somewhat large texts, so copy-fielding these into
> > > duplicate fields will affect index time (we have jobs to index data
> > > periodically) and query time. I wonder why there isn't a proper solution
> > > to this.
> > >
> > > Regards,
> > > Lasitha
> > >
> > > Lasitha Wattaladeniya
> > > Software Engineer
> > >
> > > Mobile : +6593896893
> > > Blog : techreadme.blogspot.com
> > >
> > > On Fri, Dec 16, 2016 at 12:58 AM, Susheel Kumar  >
> > > wrote:
> > >
> > >> We did an extensive comparison in the past of Snowball, KStem and
> > >> Hunspell, and there are cases where one of them works better than the
> > >> others, or vice-versa. You may utilise all three of them by having 3
> > >> different fields (fieldTypes) and, during query, searching in all of them.
> > >>
> > >> For some of the cases where none of them works (e.g. wolves, wolf etc.),
> > >> use StemmerOverrideFilterFactory.
> > >>
> > >> HTH.
> > >>
> > >> Thanks,
> > >> Susheel
> > >>
> > >> On Thu, Dec 15, 2016 at 11:32 AM, Ahmet Arslan
> > 
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > KStemFilter returns legitimate English words, please use it.
> > >> >
> > >> > Ahmet
> > >> >
> > >> >
> > >> >
> > >> > On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya <
> > >> > watt...@gmail.com> wrote:
> > >> > Hello devs,
> > >> >
> > >> > I'm trying to develop an indexing and querying flow that converts
> > >> > words to their original form (lemmatization). I have been doing a bit
> > >> > of research lately, but the information on the internet is very
> > >> > limited. I tried using HunspellStemFilterFactory, but it doesn't
> > >> > convert a word to its original form; instead it gives suggestions for
> > >> > some words (Hunspell works for some English words correctly, but for
> > >> > others it gives multiple suggestions or no suggestions; I used the
> > >> > en_US.dic provided by OpenOffice).
> > >> >
> > >> > I know this is a generic problem in searching, so is there anyone
> > >> > who can point me in the right direction or to some information :)
> > >> >
> > >> > Best regards,
> > >> > Lasitha Wattaladeniya
> > >> > Software Engineer
> > >> >
> > >> > Mobile : +6593896893
> > >> > Blog : techreadme.blogspot.com
> > >> >
> > >>
> >
>
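
As a concrete illustration of the StemmerOverrideFilterFactory suggestion in this thread, here is a minimal sketch of a field type and override dictionary; the field type name, dictionary file name, and entries are hypothetical:

<fieldType name="text_stem" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- tokens found in stemdict.txt are replaced and protected from the stemmer below -->
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" ignoreCase="true"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>

stemdict.txt holds one tab-separated token/replacement pair per line, e.g.:

caught	catch
ran	run
wolves	wolf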


Re: Solr on HDFS: Streaming API performance tuning

2016-12-18 Thread Joel Bernstein
Ok, based on the stack trace I suspect one of your sort fields has NULL
values, which in the 5x branch could produce null pointers if a segment had
no values for a sort field. This is also fixed in the Solr 6x branch.

Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, Dec 17, 2016 at 2:44 PM, Chetas Joshi 
wrote:

> Here is the stack trace.
>
> java.lang.NullPointerException
>
> at
> org.apache.solr.client.solrj.io.comp.FieldComparator$2.
> compare(FieldComparator.java:85)
>
> at
> org.apache.solr.client.solrj.io.comp.FieldComparator.
> compare(FieldComparator.java:92)
>
> at
> org.apache.solr.client.solrj.io.comp.FieldComparator.
> compare(FieldComparator.java:30)
>
> at
> org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:45)
>
> at
> org.apache.solr.client.solrj.io.comp.MultiComp.compare(MultiComp.java:33)
>
> at
> org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> TupleWrapper.compareTo(CloudSolrStream.java:396)
>
> at
> org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> TupleWrapper.compareTo(CloudSolrStream.java:381)
>
> at java.util.TreeMap.put(TreeMap.java:560)
>
> at java.util.TreeSet.add(TreeSet.java:255)
>
> at
> org.apache.solr.client.solrj.io.stream.CloudSolrStream._
> read(CloudSolrStream.java:366)
>
> at
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> read(CloudSolrStream.java:353)
>
> at
>
> *.*.*.*.SolrStreamResultIterator$$anon$1.run(SolrStreamResultIterator.
> scala:101)
>
> at java.lang.Thread.run(Thread.java:745)
>
> 16/11/17 13:04:31 *ERROR* SolrStreamResultIterator:missing exponent
> number:
> char=A,position=106596
> BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
> AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'
>
> org.noggit.JSONParser$ParseException: missing exponent number:
> char=A,position=106596
> BEFORE='p":1477189323},{"uuid":"//699/UzOPQx6thu","timestamp": 6EA'
> AFTER='E 1476861439},{"uuid":"//699/vG8k4Tj'
>
> at org.noggit.JSONParser.err(JSONParser.java:356)
>
> at org.noggit.JSONParser.readExp(JSONParser.java:513)
>
> at org.noggit.JSONParser.readNumber(JSONParser.java:419)
>
> at org.noggit.JSONParser.next(JSONParser.java:845)
>
> at org.noggit.JSONParser.nextEvent(JSONParser.java:951)
>
> at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:127)
>
> at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:57)
>
> at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
>
> at
> org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> next(JSONTupleStream.java:84)
>
> at
> org.apache.solr.client.solrj.io.stream.SolrStream.read(
> SolrStream.java:147)
>
> at
> org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.next(
> CloudSolrStream.java:413)
>
> at
> org.apache.solr.client.solrj.io.stream.CloudSolrStream._
> read(CloudSolrStream.java:365)
>
> at
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> read(CloudSolrStream.java:353)
>
>
> Thanks!
>
> On Fri, Dec 16, 2016 at 11:45 PM, Reth RM  wrote:
>
> > If you could provide the JSON parse exception stack trace, it might help
> > to pinpoint the issue there.
> >
> >
> > On Fri, Dec 16, 2016 at 5:52 PM, Chetas Joshi 
> > wrote:
> >
> > > Hi Joel,
> > >
> > > The only non-alphanumeric characters I have in my data are '+' and '/'. I
> > > don't have any backslashes.
> > >
> > > If the special characters were the issue, I should get the JSON parsing
> > > exceptions every time, irrespective of the index size and irrespective of
> > > the available memory on the machine. That is not the case here. The
> > > streaming API successfully returns all the documents when the index size
> > > is small and fits in the available memory. That's the reason I am
> > > confused.
> > >
> > > Thanks!
> > >
> > > On Fri, Dec 16, 2016 at 5:43 PM, Joel Bernstein 
> > > wrote:
> > >
> > > > The Streaming API may have been throwing exceptions because the JSON
> > > > special characters were not escaped. This was fixed in Solr 6.0.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi <
> chetas.jo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I am running Solr 5.5.0.
> > > > > It is a SolrCloud of 50 nodes and I have the following config for all
> > > > > the collections:
> > > > > maxShardsPerNode: 1
> > > > > replicationFactor: 1
> > > > >
> > > > > I was using the Streaming API to get back results from Solr. It worked
> > > > > fine for a while, until the index data size grew beyond 40 GB per
> > > > > shard (i.e. per node). It started throwing JSON parsing exceptions while reading

Re: Confusing debug=timing parameter

2016-12-18 Thread S G
Thank you Furkan.

I am still a little confused.
So I will shorten the response and post only the relevant pieces for easier
understanding.

 "responseHeader": {
"status": 0,
"QTime": 2978
}
 "response": {
"numFound": 1565135270,
  },
  "debug": {
"timing": {
  "time": 19320,
  "prepare": {
"time": 4,
"query": {
  "time": 3
},
  "process": {
"time": 19315,
"query": {
  "time": 19309
}
   }
}

As I understand it, QTime is the total time spent by the core.
"prepare", "process", etc. are the parts that together make up query
processing, so their times should approximately add up to QTime.
Numbers-wise, I would have expected prepare-time + process-time <= QTime,
i.e. 4 + 19315 <= 2978.
This obviously doesn't hold.

Where am I making a mistake?
Any pointers would be greatly appreciated.
Any pointers would be greatly appreciated.

Thanks
SG




On Sun, Dec 18, 2016 at 4:40 AM, Furkan KAMACI 
wrote:

> Hi,
>
> Let me explain the *time parameters* in Solr:
>
> The *timing* parameter of debug returns information about how long the query
> took to process.
>
> *Query time* shows how long it took Solr to compute the search
> results. It doesn't include reading bits from disk, etc.
>
> Also, there is another measure, *elapsed time*. It covers the time frame
> from when the query is sent to Solr until the response is returned. It
> includes query time, reading bits from disk, constructing the response,
> transmitting it, etc.
>
> Kind Regards,
> Furkan KAMACI
>
> On Sat, Dec 17, 2016 at 6:43 PM, S G  wrote:
>
> > Hi,
> >
> > I am using Solr 4.10 and its response time for the clients is not very
> > good.
> > Even though Solr's plugins/stats page shows less than 200 milliseconds,
> > clients report several seconds in response time.
> >
> > So I tried the debug=timing parameter from the Solr UI, and this is what
> > I got.
> > Note how the QTime is 2978 while the time in debug-timing is 19320.
> >
> > What does this mean?
> > How can Solr return a result in 3 seconds when time taken between two
> > points in the same path is 20 seconds ?
> >
> > {
> >   "responseHeader": {
> > "status": 0,
> > "QTime": 2978,
> > "params": {
> >   "q": "*:*",
> >   "debug": "timing",
> >   "indent": "true",
> >   "wt": "json",
> >   "_": "1481992653008"
> > }
> >   },
> >   "response": {
> > "numFound": 1565135270,
> > "start": 0,
> > "maxScore": 1,
> > "docs": [
> >   
> > ]
> >   },
> >   "debug": {
> > "timing": {
> >   "time": 19320,
> >   "prepare": {
> > "time": 4,
> > "query": {
> >   "time": 3
> > },
> > "facet": {
> >   "time": 0
> > },
> > "mlt": {
> >   "time": 0
> > },
> > "highlight": {
> >   "time": 0
> > },
> > "stats": {
> >   "time": 0
> > },
> > "expand": {
> >   "time": 0
> > },
> > "debug": {
> >   "time": 0
> > }
> >   },
> >   "process": {
> > "time": 19315,
> > "query": {
> >   "time": 19309
> > },
> > "facet": {
> >   "time": 0
> > },
> > "mlt": {
> >   "time": 1
> > },
> > "highlight": {
> >   "time": 0
> > },
> > "stats": {
> >   "time": 0
> > },
> > "expand": {
> >   "time": 0
> > },
> > "debug": {
> >   "time": 5
> > }
> >   }
> > }
> >   }
> > }
> >
>


Re: Has anyone used linode.com to run Solr | Best way to deliver PHP/Apache clients with Solr question

2016-12-18 Thread GW
Wow, thanks.

So assuming I have a five node ensemble and one machine is rolling along as
leader, am I correct to assume that as a leader becomes taxed it can lose
the election and another takes over as leader? The leader actually floats
about the ensemble under load? I was thinking the leader was merely for
referential integrity and things stayed that way until a physical failure.

This would all seem important when building indexes.

I think I need to set up a sniffer.

Identifying the node with a hash id seems very cool. If my app makes the
call to the server with the appropriate shard, then there might only be
messaging on the Zookeeper network. Is this a correct assumption?

Is my terminology cross threaded?

Oh well, time to build my first cluster. I wrote all my clients with single
shard collections on a stand alone. Now I need to make sure my app is not a
cluster buster.

I feel like I am on the right path.

Thanks and Best,

GW

On 18 December 2016 at 09:53, Dorian Hoxha  wrote:

> On Sun, Dec 18, 2016 at 3:48 PM, GW  wrote:
>
> > Yeah,
> >
> >
> > I'll look at the proxy you suggested shortly.
> >
> > I've discovered that the idea of making a zookeeper aware app is
> pointless
> > when scripting REST calls right after I installed libzookeeper.
> >
> > Zookeeper is there to provide the zookeeping for Solr: End of story. Me
> > thinks
> >
> > I believe what really has to happen is: connect to the admin API to get
> > status
> >
> > /solr/admin/collections?action=CLUSTERSTATUS
> >
> > I think it is more sensible to make a cluster aware app.
> >
> > <lst name="shards">
> >   <lst name="shard1">
> >     <str name="range">80000000-7fffffff</str>
> >     <str name="state">active</str>
> >     <lst name="replicas">
> >       <lst name="core_node1">
> >         <str name="core">FrogMerchants_shard1_replica1</str>
> >         <str name="base_url">http://10.128.0.2:8983/solr</str>
> >         <str name="node_name">10.128.0.2:8983_solr</str>
> >         <str name="state">active</str>
> >         <str name="leader">true</str>
> >       </lst>
> >     </lst>
> >   </lst>
> > </lst>
> >
> > I can get an array of nodes that have a state of active. So if I have 7
> > nodes that are state = active, I will have those in an array. Then I can
> > use the rand() function with the array count to select a node/url to post
> > a JSON string. It would eliminate the need for a load balancer. I think.
> >
> If you send to a random node, there is a high chance (increasing with the
> number of nodes/shards) that the node won't host the leader, so that node
> will redirect the request to the leader. What you can do is compute the
> hash of the 'id' field locally. With that hash you get the shard (because
> each shard owns a hash range); with the shard you find the leader, and you
> find which node the leader is on (cluster status), then send the request
> directly to that leader and be certain it won't be redirected again (fewer
> network hops).
>
>
> > // pseudo code
> >
> > $array_count = count($active_nodes);
> >
> > $url_target = rand(0, $array_count - 1); // rand() bounds are inclusive
> >
> > // create a function to pull the url, something like
> >
> > $url = get_solr_url($url_target);
> >
> > I have a test server on my bench. I'll spin up a 5-node cluster today, get
> > my app cluster aware and then get into some Solr indexes with Vi and totally
> > screw with some shards.
> >
> > If I am correct I will post again.
> >
> > Best,
> >
> > GW
> >
> > On 15 December 2016 at 12:34, Shawn Heisey  wrote:
> >
> > > On 12/14/2016 7:36 AM, GW wrote:
> > > > I understand accessing solr directly. I'm doing REST calls to a
> single
> > > > machine.
> > > >
> > > > If I have a cluster of five servers and say three Apache servers, I
> can
> > > > round robin the REST calls to all five in the cluster?
> > > >
> > > > I guess I'm going to find out. :-)  If so I might be better off just
> > > > running Apache on all my solr instances.
> > >
> > > If you're running SolrCloud (which uses zookeeper) then sending
> multiple
> > > query requests to any node will load balance the requests across all
> > > replicas for the collection.  This is an inherent feature of SolrCloud.
> > > Indexing requests will be forwarded to the correct place.
> > >
> > > The node you're sending to is a potential single point of failure,
> which
> > > you can eliminate by putting a load balancer in front of Solr that
> > > connects to at least two of the nodes.  As I just mentioned, SolrCloud
> > > will do further load balancing to all nodes which are capable of
> serving
> > > the requests.
> > >
> > > I use haproxy for a load balancer in front of Solr.  I'm not running in
> > > Cloud mode, but a load balancer would also work for Cloud, and is
> > > required for high availability when your client only connects to one
> > > server and isn't cloud aware.
> > >
> > > http://www.haproxy.org/
> > >
> > > Solr includes a cloud-aware Java client that talks to zookeeper and
> > > always knows the state of the cloud.  This eliminates the requirement
> > > for a load balancer, but using that client would require that you write
> > > your website in Java.
> > >
> > > The PHP clients are third-party software, and 

Re: Separating Search and Indexing in SolrCloud

2016-12-18 Thread Иван Иванов
Stop

On 16 Dec 2016, 3:31 PM, "Jaroslaw Rozanski" <
m...@jarekrozanski.com> wrote:

> Hi all,
>
> According to documentation, in normal operation (not recovery) in Solr
> Cloud configuration the leader sends updates it receives to all the
> replicas.
>
> This means all nodes in the shard perform the same effort to index a
> single document. Correct?
>
> Is there then a benefit to *not* to send search requests to leader, but
> only to replicas?
>
> Given index & search heavy Solr Cloud system, is it possible to separate
> search from indexing nodes?
>
>
> RE: Solr 5.5.0
>
> --
> Jaroslaw Rozanski | e: m...@jarekrozanski.com
> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>
>


Re: Separating Search and Indexing in SolrCloud

2016-12-18 Thread Erick Erickson
Analyzed documents. The transaction log stores the raw input.

On Sun, Dec 18, 2016 at 5:32 AM, Jaroslaw Rozanski  
wrote:
> Hi Erick,
>
>
> Not talking about separation any more. I merely summarized the message from
> Pushkar. As I said, it was clear that it was not possible.
>
>
> About the RAMBufferSizeMB, getting back to my original question: is this
> buffer for storing update requests, or for ready-to-index, analyzed
> documents?
>
> The documentation suggests the former; your first mention, however,
> suggests the latter.
>
>
> Thanks,
> Jaroslaw
>
>
> On 18/12/16 02:16, Erick Erickson wrote:
>> Yes indexing is adding stress. No you can't separate
>> the two in SolrCloud. End of story, why beat it to death?
>> You'll have to figure out the sharding strategy that
>> meets your indexing and querying needs and live
>> within that framework. I'd advise setting up a small
>> cluster and driving it to its tipping point and extrapolating
>> from there. Here's the long version of "the sizing exercise".
>>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> My point was that while indexing to Solr/Lucene there is
>> additional pressure. That pressure has a fixed upper
>> limit that doesn't grow with the number of docs. That's not
>> true for searching, as you add more docs per node, the
>> pressure (especially memory) increases. Concentrate
>> your efforts there IMO.
>>
>> Best
>> Erick
>>
>>
>>
>> On Sat, Dec 17, 2016 at 12:54 PM, Jaroslaw Rozanski
>>  wrote:
>>> Hi Erick,
>>>
>>> So what does this buffer represent? What does it actually store? Raw
>>> update request or analyzed document?
>>>
>>> The documentation suggest that it stores actual update requests.
>>>
>>> Obviously an analyzed document can and will occupy much more space than a
>>> raw one. Also, analysis will create a lot of new allocations and
>>> subsequent GC work.
>>>
>>> Yes, you are probably right that search puts more stress and is the main
>>> memory user, but the combination of:
>>> - non-trivial analysis,
>>> - a high volume of updates, and
>>> - search on the same node
>>>
>>> seems to add fuel to the fire.
>>>
>>> From the previous response by Pushkar, it is clear that separation is not
>>> achievable with the existing SolrCloud mechanism.
>>>
>>> Thanks
>>>
>>>
>>> On 17/12/16 20:24, Erick Erickson wrote:
 bq: I am more concerned with indexing memory requirements at volume

 By and large this isn't much of a problem. RAMBufferSizeMB in
 solrconfig.xml governs how much memory is consumed in Solr for
 indexing. When that limit is exceeded, the buffer is flushed to disk.
 I've rarely heard of indexing being a memory issue. Anecdotally I
 haven't seen throughput benefit with buffer sizes over 128M.
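
For reference, the knob under discussion lives in solrconfig.xml's indexConfig section; a minimal sketch, with the value chosen to match the 128M figure above:

<indexConfig>
  <ramBufferSizeMB>128</ramBufferSizeMB>
</indexConfig>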

 You're correct in that master/slave style replication would use less
 memory on the slave, although there are other costs. I.e. rather than
 the data for document X being sent to the replicas once as in
 SolrCloud, that data is re-sent to the slave every time it's merged
 into a new segment.

 That said, memory issues are _far_ more prevalent on the search side
 of things so unless this is a proven issue in your environment I would
 fight other fires.

 Best,
 Erick

 On Fri, Dec 16, 2016 at 1:06 PM, Jaroslaw Rozanski 
  wrote:
> Thanks, that issue looks interesting!
>
> On 16/12/16 16:38, Pushkar Raste wrote:
>> This kind of separation is not supported yet.  There however some work
>> going on,  you can read about it on
>> https://issues.apache.org/jira/browse/SOLR-9835
>>
>> This unfortunately would not support soft commits and hence would not be a
>> good solution for near real time indexing.
>>
>> On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski"  
>> wrote:
>>
>>> Sorry, not what I meant.
>>>
>>> Leader is responsible for distributing update requests to replica. So
>>> eventually all replicas have same state as leader. Not a problem.
>>>
>>> It is more about the performance of such. If I gather correctly normal
>>> replication happens by standard update request. Not by, say, segment 
>>> copy.
>>>
>>> Which means update on leader is as "expensive" as on replica.
>>>
>>> Hence, if my understanding is correct, sending search requests to replicas
>>> only, in an index-heavy environment, would bring no benefit.
>>>
>>> So the question is: is there a mechanism, in SolrCloud (not legacy
>>> master/slave set-up) to make one node take a load of indexing which
>>> other nodes focus on searching.
>>>
>>> This is not a question of SolrClient cause that is clear how to direct
>>> search request to specific nodes. This is more about index optimization
>>> so that certain nodes (ie. replicas) could suffer less due 

Re: Soft commit and reading data just after the commit

2016-12-18 Thread Erick Erickson
1 ms autocommit is far too frequent. And it's not
helping you anyway.

There is some lag between when a commit happens
and when the docs are really available. The sequence is:
1> commit (soft or hard-with-opensearcher=true doesn't matter).
2> a new searcher is opened and autowarming starts
3> until the new searcher is opened, queries continue to be served by
the old searcher
4> the new searcher is fully opened
5> _new_ requests are served by the new searcher.
6> the last request is finished by the old searcher and it's closed.

So what's probably happening is that you send docs and then send a
query and Solr is still in step <3>. You can look at your admin UI
plugins/stats page or your log to see how long it takes for a
searcher to open and adjust your expectations accordingly.

If you want to fetch only the document (not try to get it by a
search), Real Time Get is designed to ensure that you always get the
most recent copy whether it's searchable or not.
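
For illustration, a minimal SolrJ sketch of fetching a document through Real Time Get; the collection URL and document id are hypothetical, and the Builder API assumes SolrJ 6.x:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class RealTimeGetExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      // getById() goes through the /get (Real Time Get) handler, so it
      // returns the latest copy of the document even before a new
      // searcher has been opened.
      SolrDocument doc = client.getById("doc-123");
      System.out.println(doc);
    }
  }
}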

All that said, Solr wasn't designed for autocommits that are that
frequent. That's why the documentation talks about _Near_ Real Time.
You may need to adjust your expectations.
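
If the client truly must block until its own changes are visible, one option (with the caveat above that frequent explicit commits are discouraged) is a commit that waits for the new searcher; a minimal SolrJ sketch, collection URL hypothetical, Builder API assuming SolrJ 6.x:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class BlockingSoftCommit {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      // waitFlush=true, waitSearcher=true, softCommit=true: the call does not
      // return until the new searcher is registered, so a query issued after
      // this line will see the committed documents.
      client.commit(true, true, true);
    }
  }
}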

Best,
Erick

On Sun, Dec 18, 2016 at 6:49 AM, Dorian Hoxha  wrote:
> There's a very high probability that you're using the wrong tool for the
> job if you need a 1 ms softCommit time. Especially when you always need it
> (e.g. there are apps where you need commit-after-insert very rarely).
>
> So explain what you're using it for?
>
> On Sun, Dec 18, 2016 at 3:38 PM, Lasitha Wattaladeniya 
> wrote:
>
>> Hi Furkan,
>>
>> Thanks for the links. I had read the first one but not the second one; I
>> read it after you sent it. The configurations in my current solrconfig.xml
>> are below:
>>
>> <autoSoftCommit>
>>   <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime>
>> </autoSoftCommit>
>>
>> <autoCommit>
>>   <maxTime>15000</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> The problem I'm facing is that just after adding documents to Solr using
>> solrj, when I retrieve data from Solr I am not getting the updated
>> results. This happens from time to time. Most of the time I get the
>> correct data, but on some occasions I get stale results. So, as you
>> suggest, what is the best practice here? Should I wait 1 millisecond
>> before asking for updated results?
>>
>> Regards,
>> Lasitha
>>
>> Lasitha Wattaladeniya
>> Software Engineer
>>
>> Mobile : +6593896893
>> Blog : techreadme.blogspot.com
>>
>> On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI 
>> wrote:
>>
>> > Hi Lasitha,
>> >
>> > First of all, did you check these:
>> >
>> > https://cwiki.apache.org/confluence/display/solr/Near+
>> Real+Time+Searching
>> > https://lucidworks.com/blog/2013/08/23/understanding-
>> > transaction-logs-softcommit-and-commit-in-sorlcloud/
>> >
>> > after that, if you cannot adjust your configuration you can give more
>> > information and we can find a solution.
>> >
>> > Kind Regards,
>> > Furkan KAMACI
>> >
>> > On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya <
>> watt...@gmail.com>
>> > wrote:
>> >
>> >> Hi furkan,
>> >>
>> >> Thanks for your reply, it is generally a query heavy system. We are
>> using
>> >> realtime indexing for editing the available data
>> >>
>> >> Regards,
>> >> Lasitha
>> >>
>> >> Lasitha Wattaladeniya
>> >> Software Engineer
>> >>
>> >> Mobile : +6593896893 <+65%209389%206893>
>> >> Blog : techreadme.blogspot.com
>> >>
>> >> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI 
>> >> wrote:
>> >>
>> >>> Hi Lasitha,
>> >>>
>> >>> What is your indexing / querying requirements. Do you have an index
>> >>> heavy/light  - query heavy/light system?
>> >>>
>> >>> Kind Regards,
>> >>> Furkan KAMACI
>> >>>
>> >>> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya <
>> >>> watt...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Hello devs,
>> >>> >
>> >>> > I'm here with another problem I'm facing. I'm trying to do a commit
>> >>> > (soft commit) through solrj and, just after the commit, retrieve the
>> >>> > data from solr (the requirement is to get an updated data list).
>> >>> >
>> >>> > I'm using soft commit instead of hard commit, as previously I got an
>> >>> > error "Exceeded limit of maxWarmingSearchers=2, try again later"
>> >>> > because of too many commit requests. Now I have removed the explicit
>> >>> > commit and have let Solr do the commit using the autoSoftCommit *(1
>> >>> > millisecond)* and autoCommit *(30 seconds)* configurations. Now I'm
>> >>> > not getting any errors when I'm committing frequently.
>> >>> >
>> >>> > The problem I'm facing now is, I'm not getting the updated data when I
>> >>> > fetch from solr just after the soft commit. So in this case, what are
>> >>> > the best practices? To wait 1 millisecond before retrieving data after
>> >>> > a soft commit? I don't feel like waiting on the client side is a good
>> >>> > option. Please give me some help from your expert knowledge.

Re: Has anyone used linode.com to run Solr | Best way to deliver PHP/Apache clients with Solr question

2016-12-18 Thread Dorian Hoxha
On Sun, Dec 18, 2016 at 3:48 PM, GW  wrote:

> Yeah,
>
>
> I'll look at the proxy you suggested shortly.
>
> I've discovered that the idea of making a zookeeper aware app is pointless
> when scripting REST calls right after I installed libzookeeper.
>
> Zookeeper is there to provide the zookeeping for Solr: End of story. Me
> thinks
>
> I believe what really has to happen is: connect to the admin API to get
> status
>
> /solr/admin/collections?action=CLUSTERSTATUS
>
> I think it is more sensible to make a cluster aware app.
>
> <lst name="shards">
>   <lst name="shard1">
>     <str name="range">80000000-7fffffff</str>
>     <str name="state">active</str>
>     <lst name="replicas">
>       <lst name="core_node1">
>         <str name="core">FrogMerchants_shard1_replica1</str>
>         <str name="base_url">http://10.128.0.2:8983/solr</str>
>         <str name="node_name">10.128.0.2:8983_solr</str>
>         <str name="state">active</str>
>         <str name="leader">true</str>
>       </lst>
>     </lst>
>   </lst>
> </lst>
>
> I can get an array of nodes that have a state of active. So if I have 7
> nodes that are state = active, I will have those in an array. Then I can
> use the rand() function with the array count to select a node/url to post a
> JSON string. It would eliminate the need for a load balancer. I think.
>
If you send to a random node, there is a high chance (increasing with the
number of nodes/shards) that the node won't host the leader, so that node
will redirect the request to the leader. What you can do is compute the
hash of the 'id' field locally. With that hash you get the shard (because
each shard owns a hash range); with the shard you find the leader, and you
find which node the leader is on (cluster status), then send the request
directly to that leader and be certain it won't be redirected again (fewer
network hops).
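
For what it's worth, SolrJ's CloudSolrClient does this routing for you: it watches cluster state in ZooKeeper, hashes the id, and sends each update straight to the owning shard's leader. A minimal sketch, ZooKeeper address hypothetical, Builder API assuming SolrJ 6.x:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class LeaderRoutedUpdate {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client =
        new CloudSolrClient.Builder().withZkHost("zk1:2181").build()) {
      client.setDefaultCollection("FrogMerchants");
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "42");
      // The client routes this add directly to the leader of the shard
      // that owns hash("42"), avoiding the extra redirect hop.
      client.add(doc);
      client.commit();
    }
  }
}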


> // pseudo code
>
> $array_count = count($active_nodes);
>
> $url_target = rand(0, $array_count - 1); // rand() bounds are inclusive
>
> // create a function to pull the url, something like
>
> $url = get_solr_url($url_target);
>
> I have a test server on my bench. I'll spin up a 5-node cluster today, get my
> app cluster aware and then get into some Solr indexes with Vi and totally
> screw with some shards.
>
> If I am correct I will post again.
>
> Best,
>
> GW
>
> On 15 December 2016 at 12:34, Shawn Heisey  wrote:
>
> > On 12/14/2016 7:36 AM, GW wrote:
> > > I understand accessing solr directly. I'm doing REST calls to a single
> > > machine.
> > >
> > > If I have a cluster of five servers and say three Apache servers, I can
> > > round robin the REST calls to all five in the cluster?
> > >
> > > I guess I'm going to find out. :-)  If so I might be better off just
> > > running Apache on all my solr instances.
> >
> > If you're running SolrCloud (which uses zookeeper) then sending multiple
> > query requests to any node will load balance the requests across all
> > replicas for the collection.  This is an inherent feature of SolrCloud.
> > Indexing requests will be forwarded to the correct place.
> >
> > The node you're sending to is a potential single point of failure, which
> > you can eliminate by putting a load balancer in front of Solr that
> > connects to at least two of the nodes.  As I just mentioned, SolrCloud
> > will do further load balancing to all nodes which are capable of serving
> > the requests.
> >
> > I use haproxy for a load balancer in front of Solr.  I'm not running in
> > Cloud mode, but a load balancer would also work for Cloud, and is
> > required for high availability when your client only connects to one
> > server and isn't cloud aware.
> >
> > http://www.haproxy.org/
> >
> > Solr includes a cloud-aware Java client that talks to zookeeper and
> > always knows the state of the cloud.  This eliminates the requirement
> > for a load balancer, but using that client would require that you write
> > your website in Java.
> >
> > The PHP clients are third-party software, and as far as I know, are not
> > cloud-aware.
> >
> > https://wiki.apache.org/solr/IntegratingSolr#PHP
> >
> > Some advantages of using a Solr client over creating HTTP requests
> > yourself:  The code is easier to write, and to read.  You generally do
> > not need to worry about making sure that your requests are properly
> > escaped for URLs, XML, JSON, etc.  The response to the requests is
> > usually translated into data structures appropriate to the language --
> > your program probably doesn't need to know how to parse XML or JSON.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Soft commit and reading data just after the commit

2016-12-18 Thread Dorian Hoxha
There's a very high probability that you're using the wrong tool for the
job if you need a 1 ms softCommit time. Especially when you always need it
(e.g. there are apps where you need commit-after-insert very rarely).

So explain what you're using it for?

On Sun, Dec 18, 2016 at 3:38 PM, Lasitha Wattaladeniya 
wrote:

> Hi Furkan,
>
> Thanks for the links. I had read the first one but not the second one; I
> read it after you sent it. The configurations in my current solrconfig.xml
> are below:
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime>
> </autoSoftCommit>
>
> <autoCommit>
>   <maxTime>15000</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> The problem I'm facing is that just after adding documents to Solr using
> solrj, when I retrieve data from Solr I am not getting the updated
> results. This happens from time to time. Most of the time I get the
> correct data, but on some occasions I get stale results. So, as you
> suggest, what is the best practice here? Should I wait 1 millisecond
> before asking for updated results?
>
> Regards,
> Lasitha
>
> Lasitha Wattaladeniya
> Software Engineer
>
> Mobile : +6593896893
> Blog : techreadme.blogspot.com
>
> On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI 
> wrote:
>
> > Hi Lasitha,
> >
> > First of all, did you check these:
> >
> > https://cwiki.apache.org/confluence/display/solr/Near+
> Real+Time+Searching
> > https://lucidworks.com/blog/2013/08/23/understanding-
> > transaction-logs-softcommit-and-commit-in-sorlcloud/
> >
> > after that, if you cannot adjust your configuration you can give more
> > information and we can find a solution.
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya <
> watt...@gmail.com>
> > wrote:
> >
> >> Hi furkan,
> >>
> >> Thanks for your reply, it is generally a query heavy system. We are
> using
> >> realtime indexing for editing the available data
> >>
> >> Regards,
> >> Lasitha
> >>
> >> Lasitha Wattaladeniya
> >> Software Engineer
> >>
> >> Mobile : +6593896893 <+65%209389%206893>
> >> Blog : techreadme.blogspot.com
> >>
> >> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI 
> >> wrote:
> >>
> >>> Hi Lasitha,
> >>>
> >>> What is your indexing / querying requirements. Do you have an index
> >>> heavy/light  - query heavy/light system?
> >>>
> >>> Kind Regards,
> >>> Furkan KAMACI
> >>>
> >>> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya <
> >>> watt...@gmail.com>
> >>> wrote:
> >>>
> >>> > Hello devs,
> >>> >
> >>> > I'm here with another problem I'm facing. I'm trying to do a commit
> >>> > (soft commit) through solrj and, just after the commit, retrieve the
> >>> > data from solr (the requirement is to get an updated data list).
> >>> >
> >>> > I'm using soft commit instead of hard commit, as previously I got an
> >>> > error "Exceeded limit of maxWarmingSearchers=2, try again later"
> >>> > because of too many commit requests. Now I have removed the explicit
> >>> > commit and have let Solr do the commit using the autoSoftCommit *(1
> >>> > millisecond)* and autoCommit *(30 seconds)* configurations. Now I'm
> >>> > not getting any errors when I'm committing frequently.
> >>> >
> >>> > The problem I'm facing now is, I'm not getting the updated data when I
> >>> > fetch from solr just after the soft commit. So in this case, what are
> >>> > the best practices? To wait 1 millisecond before retrieving data after
> >>> > a soft commit? I don't feel like waiting on the client side is a good
> >>> > option. Please give me some help from your expert knowledge.
> >>> >
> >>> > Best regards,
> >>> > Lasitha Wattaladeniya
> >>> > Software Engineer
> >>> >
> >>> > Mobile : +6593896893
> >>> > Blog : techreadme.blogspot.com
> >>> >
> >>>
> >>
> >>
> >
>


Re: Has anyone used linode.com to run Solr | Best way to deliver PHP/Apache clients with Solr question

2016-12-18 Thread GW
Yeah,


I'll look at the proxy you suggested shortly.

I've discovered that the idea of making a zookeeper aware app is pointless
when scripting REST calls right after I installed libzookeeper.

Zookeeper is there to provide the zookeeping for Solr: End of story. Me
thinks

I believe what really has to happen is: connect to the admin API to get
status

/solr/admin/collections?action=CLUSTERSTATUS

I think it is more sensible to make a cluster aware app.

<lst name="shards">
  <lst name="shard1">
    <str name="range">80000000-7fffffff</str>
    <str name="state">active</str>
    <lst name="replicas">
      <lst name="core_node1">
        <str name="core">FrogMerchants_shard1_replica1</str>
        <str name="base_url">http://10.128.0.2:8983/solr</str>
        <str name="node_name">10.128.0.2:8983_solr</str>
        <str name="state">active</str>
        <str name="leader">true</str>
      </lst>
    </lst>
  </lst>
</lst>

I can get an array of nodes that have a state of active. So if I have 7
nodes that are state = active, I will have those in an array. Then I can
use the rand() function with the array count to select a node/url to post a
JSON string. It would eliminate the need for a load balancer. I think.

// pseudo code

$array_count = count($active_nodes);

$url_target = rand(0, $array_count - 1); // rand() bounds are inclusive

// create a function to pull the url, something like

$url = get_solr_url($url_target);

I have a test server on my bench. I'll spin up a 5-node cluster today, get my
app cluster aware and then get into some Solr indexes with Vi and totally
screw with some shards.

If I am correct I will post again.

Best,

GW

On 15 December 2016 at 12:34, Shawn Heisey  wrote:

> On 12/14/2016 7:36 AM, GW wrote:
> > I understand accessing solr directly. I'm doing REST calls to a single
> > machine.
> >
> > If I have a cluster of five servers and say three Apache servers, I can
> > round robin the REST calls to all five in the cluster?
> >
> > I guess I'm going to find out. :-)  If so I might be better off just
> > running Apache on all my solr instances.
>
> If you're running SolrCloud (which uses zookeeper) then sending multiple
> query requests to any node will load balance the requests across all
> replicas for the collection.  This is an inherent feature of SolrCloud.
> Indexing requests will be forwarded to the correct place.
>
> The node you're sending to is a potential single point of failure, which
> you can eliminate by putting a load balancer in front of Solr that
> connects to at least two of the nodes.  As I just mentioned, SolrCloud
> will do further load balancing to all nodes which are capable of serving
> the requests.
>
> I use haproxy for a load balancer in front of Solr.  I'm not running in
> Cloud mode, but a load balancer would also work for Cloud, and is
> required for high availability when your client only connects to one
> server and isn't cloud aware.
>
> http://www.haproxy.org/
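
For a concrete picture, a minimal haproxy sketch along the lines Shawn describes; the listen port and Solr node addresses are hypothetical:

frontend solr_front
    bind *:8984
    mode http
    default_backend solr_nodes

backend solr_nodes
    mode http
    balance roundrobin
    # round-robin across two Solr nodes, with health checks
    server solr1 10.128.0.2:8983 check
    server solr2 10.128.0.3:8983 check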
>
> Solr includes a cloud-aware Java client that talks to zookeeper and
> always knows the state of the cloud.  This eliminates the requirement
> for a load balancer, but using that client would require that you write
> your website in Java.
>
> The PHP clients are third-party software, and as far as I know, are not
> cloud-aware.
>
> https://wiki.apache.org/solr/IntegratingSolr#PHP
>
> Some advantages of using a Solr client over creating HTTP requests
> yourself:  The code is easier to write, and to read.  You generally do
> not need to worry about making sure that your requests are properly
> escaped for URLs, XML, JSON, etc.  The response to the requests is
> usually translated into data structures appropriate to the language --
> your program probably doesn't need to know how to parse XML or JSON.
>
> Thanks,
> Shawn
>
>


Re: Soft commit and reading data just after the commit

2016-12-18 Thread Lasitha Wattaladeniya
Hi Furkan,

Thanks for the links. I had read the first one but not the second one; I
read it after you sent it. The configurations in my current solrconfig.xml
are below:

<autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime>
</autoSoftCommit>

<autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
</autoCommit>

The problem I'm facing is that just after adding documents to Solr using
solrj, when I retrieve data from Solr I am not getting the updated
results. This happens from time to time. Most of the time I get the
correct data, but on some occasions I get stale results. So, as you
suggest, what is the best practice here? Should I wait 1 millisecond
before asking for updated results?
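
For what it's worth, a common middle ground is to put the visibility deadline on the update itself with commitWithin instead of a near-zero autoSoftCommit; a minimal SolrJ sketch, collection URL and field names hypothetical, Builder API assuming SolrJ 6.x:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "entry-1");
      doc.addField("read_flag", true);
      // Ask Solr to make this update searchable within 1000 ms, without
      // issuing an explicit commit per request.
      client.add(doc, 1000);
    }
  }
}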

Regards,
Lasitha

Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com

On Sun, Dec 18, 2016 at 8:46 PM, Furkan KAMACI 
wrote:

> Hi Lasitha,
>
> First of all, did you check these:
>
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
> https://lucidworks.com/blog/2013/08/23/understanding-
> transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> after that, if you cannot adjust your configuration you can give more
> information and we can find a solution.
>
> Kind Regards,
> Furkan KAMACI
>
> On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya 
> wrote:
>
>> Hi furkan,
>>
>> Thanks for your reply, it is generally a query heavy system. We are using
>> realtime indexing for editing the available data
>>
>> Regards,
>> Lasitha
>>
>> Lasitha Wattaladeniya
>> Software Engineer
>>
>> Mobile : +6593896893 <+65%209389%206893>
>> Blog : techreadme.blogspot.com
>>
>> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI 
>> wrote:
>>
>>> Hi Lasitha,
>>>
>>> What is your indexing / querying requirements. Do you have an index
>>> heavy/light  - query heavy/light system?
>>>
>>> Kind Regards,
>>> Furkan KAMACI
>>>
>>> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya <
>>> watt...@gmail.com>
>>> wrote:
>>>
>>> > Hello devs,
>>> >
>>> > I'm here with another problem I'm facing. I'm trying to do a commit
>>> > (soft commit) through solrj and, just after the commit, retrieve the
>>> > data from solr (the requirement is to get an updated data list).
>>> >
>>> > I'm using soft commit instead of hard commit, as previously I got an
>>> > error "Exceeded limit of maxWarmingSearchers=2, try again later"
>>> > because of too many commit requests. Now I have removed the explicit
>>> > commit and have let Solr do the commit using the autoSoftCommit *(1
>>> > millisecond)* and autoCommit *(30 seconds)* configurations. Now I'm
>>> > not getting any errors when I'm committing frequently.
>>> >
>>> > The problem I'm facing now is, I'm not getting the updated data when I
>>> > fetch from solr just after the soft commit. So in this case, what are
>>> > the best practices? To wait 1 millisecond before retrieving data after
>>> > a soft commit? I don't feel like waiting on the client side is a good
>>> > option. Please give me some help from your expert knowledge.
>>> >
>>> > Best regards,
>>> > Lasitha Wattaladeniya
>>> > Software Engineer
>>> >
>>> > Mobile : +6593896893
>>> > Blog : techreadme.blogspot.com
>>> >
>>>
>>
>>
>


Re: Separating Search and Indexing in SolrCloud

2016-12-18 Thread Jaroslaw Rozanski
Hi Erick,


Not talking about separation any more. I merely summarized the message from
Pushkar. As I said, it was clear that it was not possible.


About the RAMBufferSizeMB, getting back to my original question: is this
buffer for storing update requests, or for ready-to-index, analyzed
documents?

The documentation suggests the former; your first mention, however, suggests
the latter.


Thanks,
Jaroslaw


On 18/12/16 02:16, Erick Erickson wrote:
> Yes indexing is adding stress. No you can't separate
> the two in SolrCloud. End of story, why beat it to death?
> You'll have to figure out the sharding strategy that
> meets your indexing and querying needs and live
> within that framework. I'd advise setting up a small
> cluster and driving it to its tipping point and extrapolating
> from there. Here's the long version of "the sizing exercise".
> 
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> 
> My point was that while indexing to Solr/Lucene there is
> additional pressure. That pressure has a fixed upper
> limit that doesn't grow with the number of docs. That's not
> true for searching, as you add more docs per node, the
> pressure (especially memory) increases. Concentrate
> your efforts there IMO.
> 
> Best
> Erick
> 
> 
> 
> On Sat, Dec 17, 2016 at 12:54 PM, Jaroslaw Rozanski
>  wrote:
>> Hi Erick,
>>
>> So what does this buffer represent? What does it actually store? Raw
>> update request or analyzed document?
>>
>> The documentation suggest that it stores actual update requests.
>>
>> Obviously an analyzed document can and will occupy much more space than a
>> raw one. Also, analysis will create a lot of new allocations and
>> subsequent GC work.
>>
>> Yes, you are probably right that search puts more stress and is the main
>> memory user, but the combination of:
>> - non-trivial analysis,
>> - a high volume of updates, and
>> - search on the same node
>>
>> seems to add fuel to the fire.
>>
>> From the previous response by Pushkar, it is clear that separation is not
>> achievable with the existing SolrCloud mechanism.
>>
>> Thanks
>>
>>
>> On 17/12/16 20:24, Erick Erickson wrote:
>>> bq: I am more concerned with indexing memory requirements at volume
>>>
>>> By and large this isn't much of a problem. RAMBufferSizeMB in
>>> solrconfig.xml governs how much memory is consumed in Solr for
>>> indexing. When that limit is exceeded, the buffer is flushed to disk.
>>> I've rarely heard of indexing being a memory issue. Anecdotally I
>>> haven't seen throughput benefit with buffer sizes over 128M.
>>>
>>> You're correct in that master/slave style replication would use less
>>> memory on the slave, although there are other costs. I.e. rather than
>>> the data for document X being sent to the replicas once as in
>>> SolrCloud, that data is re-sent to the slave every time it's merged
>>> into a new segment.
>>>
>>> That said, memory issues are _far_ more prevalent on the search side
>>> of things so unless this is a proven issue in your environment I would
>>> fight other fires.
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Dec 16, 2016 at 1:06 PM, Jaroslaw Rozanski  
>>> wrote:
 Thanks, that issue looks interesting!

 On 16/12/16 16:38, Pushkar Raste wrote:
> This kind of separation is not supported yet. There is, however, some work
> going on; you can read about it at
> https://issues.apache.org/jira/browse/SOLR-9835
>
> This unfortunately would not support soft commits and hence would not be a
> good solution for near-real-time indexing.
>
> On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski"  
> wrote:
>
>> Sorry, that's not what I meant.
>>
>> The leader is responsible for distributing update requests to replicas, so
>> eventually all replicas have the same state as the leader. Not a problem.
>>
>> It is more about the performance of this. If I gather correctly, normal
>> replication happens via standard update requests, not by, say, segment
>> copy.
>>
>> Which means an update on the leader is as "expensive" as on a replica.
>>
>> Hence, if my understanding is correct, sending search requests only to
>> replicas, in an index-heavy environment, would bring no benefit.
>>
>> So the question is: is there a mechanism in SolrCloud (not the legacy
>> master/slave set-up) to make one node take the load of indexing while
>> other nodes focus on searching?
>>
>> This is not a question about SolrClient, because it is clear how to direct
>> search requests to specific nodes. It is more about index optimization,
>> so that certain nodes (i.e. replicas) could suffer less from high-volume
>> indexing while serving search requests.
>>
>>
>>
>>
>> On 16/12/16 12:35, Dorian Hoxha wrote:
>>> The leader is the source of truth. Do you expect to make the replica the
>>> source of truth or something? That doesn't make sense.
>>> 
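
For reference on the buffer question above: at the Lucene level, Solr's
<ramBufferSizeMB> setting controls the IndexWriter buffer, which accounts for
documents in their analyzed, inverted in-memory form, not as raw update
requests. A minimal Lucene-level sketch of that behavior (Lucene 5+ API; the
index path and field name are illustrative assumptions):

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class RamBufferSketch {
  public static void main(String[] args) throws Exception {
    IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
    // The same knob Solr exposes as <ramBufferSizeMB>: once the in-memory,
    // already-analyzed index data reaches this size, it is flushed to disk
    // as a segment.
    cfg.setRAMBufferSizeMB(128.0);
    try (IndexWriter writer =
        new IndexWriter(FSDirectory.open(Paths.get("/tmp/ram-buffer-demo")), cfg)) {
      Document doc = new Document();
      doc.add(new TextField("body", "buffered after analysis", Field.Store.NO));
      writer.addDocument(doc); // analyzed here; held in RAM until the buffer fills
      writer.commit();
    }
  }
}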

Re: Soft commit and reading data just after the commit

2016-12-18 Thread Furkan KAMACI
Hi Lasitha,

First of all, did you check these:

https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

After that, if you cannot adjust your configuration, give us more
information and we can find a solution.

Kind Regards,
Furkan KAMACI

On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya 
wrote:

> Hi Furkan,
>
> Thanks for your reply. It is generally a query-heavy system. We are using
> real-time indexing for editing the available data.
>
> Regards,
> Lasitha
>
> Lasitha Wattaladeniya
> Software Engineer
>
> Mobile : +6593896893
> Blog : techreadme.blogspot.com
>
> On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI 
> wrote:
>
>> Hi Lasitha,
>>
>> What are your indexing/querying requirements? Do you have an index
>> heavy/light, query heavy/light system?
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya <
>> watt...@gmail.com>
>> wrote:
>>
>> > Hello devs,
>> >
>> > I'm here with another problem I'm facing. I'm trying to do a commit
>> > (soft commit) through SolrJ and, just after the commit, retrieve the
>> > data from Solr (the requirement is to get an updated data list).
>> >
>> > I'm using a soft commit instead of a hard commit because previously I
>> > got the error "Exceeded limit of maxWarmingSearchers=2, try again
>> > later" due to too many commit requests. Now I have removed the
>> > explicit commit and let Solr do the commit using the autoSoftCommit
>> > *(1 millisecond)* and autoCommit *(30 seconds)* configurations. Now
>> > I'm not getting any errors when committing frequently.
>> >
>> > The problem I'm facing now is that I'm not getting the updated data
>> > when I fetch from Solr just after the soft commit. So in this case,
>> > what are the best practices? To wait 1 millisecond before retrieving
>> > data after the soft commit? I don't feel like waiting on the client
>> > side is a good option. Please give me some help from your expert
>> > knowledge.
>> >
>> > Best regards,
>> > Lasitha Wattaladeniya
>> > Software Engineer
>> >
>> > Mobile : +6593896893
>> > Blog : techreadme.blogspot.com
>> >
>>
>
>
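
One option sometimes suggested for workloads like the one quoted above,
instead of a very aggressive autoSoftCommit interval, is commitWithin, which
asks Solr to make an update visible within a deadline and lets the server
coalesce many updates into far fewer commits. A minimal SolrJ sketch
(assuming a SolrJ 6+ client; the URL, id and field name are illustrative):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "entry-42");
    doc.addField("read_flag", true);
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(500); // make this update searchable within ~500 ms
    req.process(client);      // Solr batches commits across such requests
    client.close();
  }
}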


Re: Confusing debug=timing parameter

2016-12-18 Thread Furkan KAMACI
Hi,

Let me explain the *time parameters* in Solr:

The *timing* parameter of debug returns information about how long the query
took to process.

*Query time* shows how long it took Solr to get the search results. It
doesn't include reading bits from disk, etc.

Also, there is another measurement named *elapsed time*. It covers the time
frame from when the query is sent to Solr until the response is returned. It
includes the query time, reading bits from disk, constructing the response
and transmitting it, etc.

Kind Regards,
Furkan KAMACI

On Sat, Dec 17, 2016 at 6:43 PM, S G  wrote:

> Hi,
>
> I am using Solr 4.10, and its response time for clients is not very good.
> Even though Solr's plugin/stats page shows less than 200 milliseconds,
> clients report response times of several seconds.
>
> So I tried the debug=timing parameter from the Solr UI, and this is what I
> got. Note how QTime is 2978 while the time in the debug timing is 19320.
>
> What does this mean?
> How can Solr return a result in 3 seconds when the time taken between two
> points in the same path is 20 seconds?
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 2978,
> "params": {
>   "q": "*:*",
>   "debug": "timing",
>   "indent": "true",
>   "wt": "json",
>   "_": "1481992653008"
> }
>   },
>   "response": {
> "numFound": 1565135270,
> "start": 0,
> "maxScore": 1,
> "docs": [
>   
> ]
>   },
>   "debug": {
> "timing": {
>   "time": 19320,
>   "prepare": {
> "time": 4,
> "query": {
>   "time": 3
> },
> "facet": {
>   "time": 0
> },
> "mlt": {
>   "time": 0
> },
> "highlight": {
>   "time": 0
> },
> "stats": {
>   "time": 0
> },
> "expand": {
>   "time": 0
> },
> "debug": {
>   "time": 0
> }
>   },
>   "process": {
> "time": 19315,
> "query": {
>   "time": 19309
> },
> "facet": {
>   "time": 0
> },
> "mlt": {
>   "time": 1
> },
> "highlight": {
>   "time": 0
> },
> "stats": {
>   "time": 0
> },
> "expand": {
>   "time": 0
> },
> "debug": {
>   "time": 5
> }
>   }
> }
>   }
> }
>
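
The gap between the two numbers can also be observed from SolrJ, which
reports the server-side QTime alongside the wall-clock elapsed time measured
on the client. A minimal sketch (assuming a SolrJ 6+ client; the URL and
collection name are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimingSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
    SolrQuery query = new SolrQuery("*:*");
    query.set("debug", "timing"); // per-component breakdown, as in the response above
    QueryResponse rsp = client.query(query);
    // Server-reported search time vs. the round trip the client actually saw:
    System.out.println("QTime:   " + rsp.getQTime() + " ms");
    System.out.println("Elapsed: " + rsp.getElapsedTime() + " ms");
    client.close();
  }
}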


Re: Soft commit and reading data just after the commit

2016-12-18 Thread Lasitha Wattaladeniya
Hi Furkan,

Thanks for your reply. It is generally a query-heavy system. We are using
real-time indexing for editing the available data.

Regards,
Lasitha

Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com

On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI 
wrote:

> Hi Lasitha,
>
> What are your indexing/querying requirements? Do you have an index
> heavy/light, query heavy/light system?
>
> Kind Regards,
> Furkan KAMACI
>
> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya  >
> wrote:
>
> > Hello devs,
> >
> > I'm here with another problem I'm facing. I'm trying to do a commit
> > (soft commit) through SolrJ and, just after the commit, retrieve the
> > data from Solr (the requirement is to get an updated data list).
> >
> > I'm using a soft commit instead of a hard commit because previously I
> > got the error "Exceeded limit of maxWarmingSearchers=2, try again
> > later" due to too many commit requests. Now I have removed the
> > explicit commit and let Solr do the commit using the autoSoftCommit
> > *(1 millisecond)* and autoCommit *(30 seconds)* configurations. Now
> > I'm not getting any errors when committing frequently.
> >
> > The problem I'm facing now is that I'm not getting the updated data
> > when I fetch from Solr just after the soft commit. So in this case,
> > what are the best practices? To wait 1 millisecond before retrieving
> > data after the soft commit? I don't feel like waiting on the client
> > side is a good option. Please give me some help from your expert
> > knowledge.
> >
> > Best regards,
> > Lasitha Wattaladeniya
> > Software Engineer
> >
> > Mobile : +6593896893
> > Blog : techreadme.blogspot.com
> >
>


Re: Soft commit and reading data just after the commit

2016-12-18 Thread Furkan KAMACI
Hi Lasitha,

What are your indexing/querying requirements? Do you have an index
heavy/light, query heavy/light system?

Kind Regards,
Furkan KAMACI

On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya 
wrote:

> Hello devs,
>
> I'm here with another problem I'm facing. I'm trying to do a commit
> (soft commit) through SolrJ and, just after the commit, retrieve the
> data from Solr (the requirement is to get an updated data list).
>
> I'm using a soft commit instead of a hard commit because previously I
> got the error "Exceeded limit of maxWarmingSearchers=2, try again
> later" due to too many commit requests. Now I have removed the
> explicit commit and let Solr do the commit using the autoSoftCommit
> *(1 millisecond)* and autoCommit *(30 seconds)* configurations. Now
> I'm not getting any errors when committing frequently.
>
> The problem I'm facing now is that I'm not getting the updated data
> when I fetch from Solr just after the soft commit. So in this case,
> what are the best practices? To wait 1 millisecond before retrieving
> data after the soft commit? I don't feel like waiting on the client
> side is a good option. Please give me some help from your expert
> knowledge.
>
> Best regards,
> Lasitha Wattaladeniya
> Software Engineer
>
> Mobile : +6593896893
> Blog : techreadme.blogspot.com
>


Soft commit and reading data just after the commit

2016-12-18 Thread Lasitha Wattaladeniya
Hello devs,

I'm here with another problem I'm facing. I'm trying to do a commit
(soft commit) through SolrJ and, just after the commit, retrieve the
data from Solr (the requirement is to get an updated data list).

I'm using a soft commit instead of a hard commit because previously I
got the error "Exceeded limit of maxWarmingSearchers=2, try again
later" due to too many commit requests. Now I have removed the
explicit commit and let Solr do the commit using the autoSoftCommit
*(1 millisecond)* and autoCommit *(30 seconds)* configurations. Now
I'm not getting any errors when committing frequently.

The problem I'm facing now is that I'm not getting the updated data
when I fetch from Solr just after the soft commit. So in this case,
what are the best practices? To wait 1 millisecond before retrieving
data after the soft commit? I don't feel like waiting on the client
side is a good option. Please give me some help from your expert
knowledge.

Best regards,
Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com
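
One way to avoid racing the autoSoftCommit timer for this read-after-write
case is an explicit soft commit with waitSearcher=true, which blocks until
the new searcher is registered, so the very next query sees the update. This
assumes such commits stay infrequent enough not to trip the
maxWarmingSearchers limit mentioned above. A minimal SolrJ sketch (SolrJ 6+
client; the URL, id and field name are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ReadAfterWriteSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "entry-42");
    doc.addField("read_flag", true);
    client.add(doc);
    // waitFlush=true, waitSearcher=true, softCommit=true: returns only after
    // the new searcher is open, so the update above is already visible.
    client.commit(true, true, true);
    System.out.println(client.query(new SolrQuery("id:entry-42")).getResults());
    client.close();
  }
}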