Re: Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-15 Thread Pradeep Chandra
Hi Sir,

Let me give some clarification on the IsWithin(POLYGON(())) query... It is not
giving any results for polygons beyond 213 points...

Thanks
M Pradeep Chandra



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regarding-google-maps-polyline-to-use-IsWithin-POLYGON-in-solr-tp4263975p4264046.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query behavior.

2016-03-15 Thread Modassar Ather
Jack, as suggested I have created the following jira issue.

https://issues.apache.org/jira/browse/SOLR-8853

Thanks,
Modassar


On Tue, Mar 15, 2016 at 8:15 PM, Jack Krupansky 
wrote:

> That was precisely the point of the need for a new Jira - to answer exactly
> the questions that you have posed - and that I had proposed as well. Until
> some of the senior committers comment on that Jira you won't have answers.
> They've painted themselves into a corner and now I am curious how they will
> unpaint themselves out of that corner.
>
> -- Jack Krupansky
>
> On Tue, Mar 15, 2016 at 1:46 AM, Modassar Ather 
> wrote:
>
> > Thanks Jack for your response.
> > The following jira bug for this issue is already present so I have not
> > created a new one.
> > https://issues.apache.org/jira/browse/SOLR-8812
> >
> > Kindly help me understand whether it is possible to achieve search on
> > ORed terms as it was done in earlier Solr versions.
> > Is this behavior intentional or is it a bug? I need to migrate to
> > Solr-5.5.0 but am not doing so due to this behavior.
> >
> > Thanks,
> > Modassar
> >
> >
> > On Fri, Mar 11, 2016 at 3:18 AM, Jack Krupansky <
> jack.krupan...@gmail.com>
> > wrote:
> >
> > > We probably need a Jira to investigate whether this really is an
> > > explicitly intentional feature change, or whether it really is a bug.
> > > And if it truly was intentional, how people can work around the change
> > > to get the desired, pre-5.5 behavior. Personally, I always thought it
> > > was a mistake that q.op and mm were so tightly linked in Solr even
> > > though they are independent in Lucene.
> > >
> > > In short, I think people want to be able to set the default behavior
> > > for individual terms (MUST vs. SHOULD) if explicit operators are not
> > > used, and that OR is an explicit operator. And that mm should control
> > > only how many SHOULD terms are required (Lucene MinShouldMatch.)
> > >
> > >
> > > -- Jack Krupansky
> > >
> > > On Thu, Mar 10, 2016 at 3:41 AM, Modassar Ather <
> modather1...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Shawn for pointing to the jira issue. I was not sure whether
> > > > it is an expected behavior or a bug, or whether there could have been
> > > > a way to get the desired result.
> > > >
> > > > Best,
> > > > Modassar
> > > >
> > > > On Thu, Mar 10, 2016 at 11:32 AM, Shawn Heisey 
> > > > wrote:
> > > >
> > > > > On 3/9/2016 10:55 PM, Shawn Heisey wrote:
> > > > > > The ~2 syntax, when not attached to a phrase query (quotes), is
> > > > > > the way you express a fuzzy query. If it's attached to a query in
> > > > > > quotes, then it is a proximity query. I'm not sure whether it means
> > > > > > something different when it's attached to a query clause in
> > > > > > parentheses; someone with more knowledge will need to comment.
> > > > > 
> > > > > > https://issues.apache.org/jira/browse/SOLR-8812
> > > > >
> > > > > After I read SOLR-8812 more closely, it seems that the ~2 syntax
> > > > > with parentheses is the way that the effective mm value is expressed
> > > > > for a particular query clause in the parsed query.  I've learned
> > > > > something new today.
> > > > >
> > > > > Thanks,
> > > > > Shawn
> > > > >
> > > > >
> > > >
> > >
> >
>
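The q.op/mm linkage discussed above is easiest to see by sending the same query with different parameter combinations and comparing the parsed query in the debug output. A minimal sketch of building such request parameters (the terms and values are placeholders, not taken from the thread):

```python
from urllib.parse import urlencode

def edismax_params(query, q_op="OR", mm=None):
    """Build Solr edismax query parameters.

    q.op sets the default operator between bare terms, while
    mm (min-should-match) controls how many SHOULD clauses are
    required -- the linkage discussed in SOLR-8812/SOLR-8853.
    """
    params = {"q": query, "defType": "edismax", "q.op": q_op}
    if mm is not None:
        params["mm"] = mm
    return urlencode(params)

# With q.op=OR and mm=2, at least two of the three terms must match.
print(edismax_params("term1 term2 term3", q_op="OR", mm="2"))
```

Appending these parameter strings to a normal /select request and inspecting parsedquery with debugQuery=true shows how the effective mm changes with q.op.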


Re: from zookeper embedded to standalone

2016-03-15 Thread Rachid Bouacheria
Thank you very much Erick.
You described almost word for word what I thought about doing.
I just wasn't sure how "drinkable" a cocktail made out of embedded
and external zookeepers would be.
I was worried I'd get a severe hangover :-)

I will give it a try and give an update.
Rachid.



On Tue, Mar 15, 2016 at 2:29 PM, Erick Erickson 
wrote:

> Hmmm, I don't think anyone's really documented this as the
> supposition is that one would only run embedded for sandboxes
> and set up an external ensemble "for real".
>
> So, with the caveat that I haven't personally tried this, I'd
> add external zookeepers as part of an ensemble that
> contained my embedded zookeepers. That means your ZK
> configurations would point to your embedded ZK instances. At
> that point the external ZK _should_ replicate the data from the
> embedded instances to the external ones.
>
> Then shut down all your Solrs and change your external
> ensemble configurations to only point to each other (i.e.
> take the embedded stuff out). Now start your solrs pointing
> to the external ensemble.
>
> As I said, though, I haven't personally done this so
> go ahead and give it a try.
>
> Or, if you're feeling _really_ brave, just copy the zookeeper data
> directory from the place the embedded Zookeeper put it to the place
> you specify in your external ensemble then start your external
> ensemble.
>
> Theoretically, this should work but as I said I haven't tried either of
> these personally.
>
> Best,
> Erick
>
> On Tue, Mar 15, 2016 at 9:59 AM, Rachid Bouacheria 
> wrote:
> > Hi,
> >
> > I am running solr 4.x on 3 servers with zookeeper embedded in prod.
> >
> > Each server has 1 leader and 2 replicas.
> >
> > I want to switch zookeeper from embedded to standalone.
> >
> > I want to know if the steps are documented anywhere? I could not find them.
> >
> > I am worried my index will get messed up in the transition.
> >
> > Thank you very much!
>
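For anyone attempting Erick's ensemble-growing approach, the two configuration phases could look roughly like the sketch below. The hostnames, ports, and data directory are illustrative placeholders, and as Erick says, the procedure itself is untested:

```python
def zoo_cfg(servers, data_dir="/var/lib/zookeeper"):
    """Render a zoo.cfg for a ZooKeeper ensemble.

    `servers` maps a 1-based server id to "host:peerPort:electionPort".
    All hostnames and paths here are hypothetical.
    """
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        "clientPort=2181",
    ]
    for sid, addr in sorted(servers.items()):
        lines.append(f"server.{sid}={addr}")
    return "\n".join(lines)

# Phase 1: external nodes join an ensemble that still includes the
# three embedded ZooKeepers running inside the Solr processes.
phase1 = zoo_cfg({
    1: "solr1:2888:3888", 2: "solr2:2888:3888", 3: "solr3:2888:3888",
    4: "zk1:2888:3888", 5: "zk2:2888:3888", 6: "zk3:2888:3888",
})
# Phase 2: embedded entries removed; the external nodes point only at
# each other before the Solr nodes are restarted against them.
phase2 = zoo_cfg({4: "zk1:2888:3888", 5: "zk2:2888:3888", 6: "zk3:2888:3888"})
print("server.1=solr1:2888:3888" in phase1, "server.1" in phase2)
```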


Re: Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-15 Thread David Smiley
Hi Pradeep,

Are you seeing an error when it doesn't work?  I believe a shape
overlapping itself will cause an error from JTS.  If you do see that, then
you can ask Spatial4j (used by Lucene/Solr) to attempt to deal with it in a
number of ways.  See "validationRule":
https://locationtech.github.io/spatial4j/apidocs/org/locationtech/spatial4j/context/jts/JtsSpatialContextFactory.html

Probably try validationRule="repairBuffer0".

If it still doesn't work (and if you can't use what I say next), I
suggest debugging this at the JTS level.  You might then wind up
submitting a question to the JTS list.

Spatial4j extends the WKT syntax with a BUFFER() syntax which is possibly
easier/better than your approach of manually building up the buffered path
with your own code to produce a large polygon to send to Solr.  You would
do something like BUFFER(LINESTRING(...),0.001), where the second argument
is the distance in degrees if you have geo="true", otherwise whatever units
your data was put in.  You can use that with or without JTS since Spatial4j has
a native BufferedLineString shape.  But FYI it doesn't support geo="true"
very well (i.e. working in degrees); the buffer will be skewed very much
away from the equator.  So you could set geo="false" and supply, say,
web-mercator bounding box and work in that Euclidean/2D projected space.
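As a sketch of the BUFFER() alternative David describes, one could build the Solr filter query client-side along these lines (the field name, coordinates, and exact fq shape are illustrative assumptions, not taken from the thread):

```python
from urllib.parse import urlencode

def buffered_route_fq(field, line_points, distance):
    """Build Solr request params using Spatial4j's BUFFER(LINESTRING(...))
    WKT extension instead of hand-computing a buffered polygon.

    `line_points` is a list of (x, y) pairs along the route and
    `distance` is the buffer width (degrees when geo="true").
    The field name passed in below ("geo") is a placeholder.
    """
    coords = ", ".join(f"{x} {y}" for x, y in line_points)
    shape = f"BUFFER(LINESTRING({coords}), {distance})"
    return urlencode({"q": "*:*", "fq": f'{field}:"IsWithin({shape})"'})

route = [(-74.0, 40.7), (-73.9, 40.75), (-73.8, 40.76)]
print(buffered_route_fq("geo", route, 0.001))
```

This keeps the buffering inside Spatial4j, avoiding the self-overlapping polygon produced by computing parallel offset lines manually.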

Another FYI, Lucene has a "Geo3d" package within the Spatial3d module that
has a native implementation of a buffered LineString as well, one that
works on the surface of the earth.  It hasn't yet been hooked into
Spatial4j, after which Solr would need no changes.  There's a user "Chris"
who is working on that; it's filed here:
https://github.com/locationtech/spatial4j/issues/134

Good luck.

~ David


On Tue, Mar 15, 2016 at 2:45 PM Pradeep Chandra <
pradeepchandra@gmail.com> wrote:

> Hi Sir,
>
> I want to draw a polyline along the route given by google maps (from one
> place to another place).
>
> I applied the logic of calculating parallel lines between the two markers
> on the route, on both sides of the route. Because of the non-linear nature
> of the route, in some cases the polyline overlaps itself.
>
> What I finally want to do, after drawing that polyline along the
> route, is give that polygon to Solr in order to get the results within
> the polygon. But the problem I am getting is that, because of the
> overlapping nature of the polyline, Solr is not accepting that shape.
>
> Can you suggest a logic to draw a polyline along the route, or let me know
> whether there is any way to fetch the data with that type of polyline in Solr.
>
> I constructed a polygon with 300 points, but for that Solr is not giving
> any result, whereas it gives results for polygons having fewer than 200
> points. Can you tell me the maximum number of points for a polygon in
> Solr, or whether it is restricted to that many points in Solr?
>
> I am sending some images of my final desired one & my applied one. Please
> find those attachments.
>
> Thanks and Regards
> M Pradeep Chandra
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Understanding parsed queries.

2016-03-15 Thread Modassar Ather
The query parsing is not strict Boolean logic, here's a great
writeup on the topic:
https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/

Thanks for pointing to the link. I have gone through this post. Following
is mentioned in the post.
Practically speaking this means that NOT takes precedence over AND which
takes precedence over OR — but only if the default operator for the query
parser has not been changed from the default (“Or”). If the default
operator is set to “And” then the behavior is just plain weird.

I have q.op set as AND. Not sure how it will behave. Kindly provide your
inputs.

My guess as to why the counts are the same with and without the fl
term is that it's present only in docs with term2 and term3 in them
perhaps?

I have checked fl:term1 alone and found many more documents containing it
than documents having all three terms, so the results should include
documents containing only term1.

Thanks,
Modassar


On Tue, Mar 15, 2016 at 9:16 PM, Erick Erickson 
wrote:

> The query parsing is not strict Boolean logic, here's a great
> writeup on the topic:
> https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/
>
> The outer "+" is simply the entire clause (of which there is only one)
> must be present, i.e. it's the whole query.
>
> My guess as to why the counts are the same with and without the fl
> term is that it's present only in docs with term2 and term3 in them
> perhaps?
>
> Best,
> Erick
>
> Best,
> Erick
>
> On Tue, Mar 15, 2016 at 12:22 AM, Modassar Ather 
> wrote:
> > Hi,
> >
> > Kindly help me understand the parsed queries of the following three
> > queries, and how these parsed queries can be interpreted as boolean logic.
> > Please ignore the boost part.
> >
> > *Query : *fl:term1 OR fl:term2 AND fl:term3
> > *"parsedquery_toString" : *"boost(+(fl:term1 +fl:term2
> > +fl:term3),int(doc_wt))",
> > *matches : *50685
> >
> > The above query seems to be ignoring fl:term1, as the result of
> > fl:term2 AND fl:term3 is exactly 50685.
> >
> > *Query : *fl:term1 OR (fl:term2 AND fl:term3)
> > *parsedquery_toString:* "boost(+(fl:term1 (+fl:term2
> > +fl:term3)),int(doc_wt))",
> > *matches : *809006
> >
> > *Query : *(fl:term1 OR fl:term2) AND fl:term3
> > *parsedquery_toString:* "boost(+(+(fl:term1 fl:term2)
> > +fl:term3),int(doc_wt))",
> > *matches : *293949
> >
> > Per my understanding a term having + is a must and must be present in
> > the document, whereas a term without it may or may not be present, but
> > query one seems to be ignoring the first term completely.
> > How does the outer plus define the behavior? E.g. the *outer +* in query
> > +(fl:term1 +fl:term2 +fl:term3)
> >
> > Thanks,
> > Modassar
>
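The behavior discussed in this thread falls out of a Lucene rule: when MUST clauses are present in a BooleanQuery, a bare SHOULD clause affects only scoring, not matching. A simplified sketch of that matching rule (ignoring scoring and most mm edge cases):

```python
def matches(doc, clauses, mm=0):
    """Evaluate a flat Lucene-style BooleanQuery clause list against a
    document (a set of terms).  Each clause is (occur, node) where occur
    is "MUST" or "SHOULD" and node is a term or a nested clause list."""
    def hit(node):
        return node in doc if isinstance(node, str) else matches(doc, node)
    musts = [n for occur, n in clauses if occur == "MUST"]
    shoulds = [n for occur, n in clauses if occur == "SHOULD"]
    if not all(hit(n) for n in musts):
        return False
    # With MUST clauses present, SHOULDs matter only if mm > 0;
    # with no MUSTs, at least one SHOULD (or mm of them) must match.
    required = mm if musts else max(mm, 1 if shoulds else 0)
    return sum(hit(n) for n in shoulds) >= required

# Parsed form of query 1, +(fl:term1 +fl:term2 +fl:term3):
# term1 is a bare SHOULD next to two MUSTs.
q1 = [("SHOULD", "term1"), ("MUST", "term2"), ("MUST", "term3")]
# Parsed form of query 2, +(fl:term1 (+fl:term2 +fl:term3)).
q2 = [("SHOULD", "term1"),
      ("SHOULD", [("MUST", "term2"), ("MUST", "term3")])]

doc_only_t1 = {"term1"}
print(matches(doc_only_t1, q1), matches(doc_only_t1, q2))  # False True
```

Under this rule a document containing only term1 fails the first parsed form but passes the second, which is consistent with the counts reported in the thread.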


Re: Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration

2016-03-15 Thread Shawn Heisey
On 3/15/2016 2:56 PM, Paul Hoffman wrote:
>> It sure looks like I started Solr from my blacklight project dir.
>>
>> Any ideas?  Thanks,
>>

You may need to get some help from the blacklight project.  I've got
absolutely no idea what sort of integration they may have done with
Solr, what they may have changed, or how they've arranged the filesystem.

Regarding the Jetty problem, in the directory where the "start.jar" that
you are running lives, there should be a lib directory, with various
jetty jars.  The jetty-xml jar should be one of them.  Here's a listing
of Jetty's lib directory from a Solr 4.9.1 install that I've got.  I
have upgraded to a newer version of Jetty:

root@bigindy5:/opt/solr4# ls -al lib
total 1496
drwxr-xr-x  3 solr solr   4096 Aug 31  2015 .
drwxr-xr-x 13 solr solr   4096 Aug 31  2015 ..
drwxr-xr-x  2 solr solr   4096 Aug 31  2015 ext
-rw-r--r--  1 solr solr  21162 Aug 31  2015
jetty-continuation-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr  61908 Aug 31  2015
jetty-deploy-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr  96122 Aug 31  2015 jetty-http-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr 104219 Aug 31  2015 jetty-io-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr  24770 Aug 31  2015 jetty-jmx-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr  89923 Aug 31  2015
jetty-security-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr 357704 Aug 31  2015
jetty-server-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr 101714 Aug 31  2015
jetty-servlet-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr 287680 Aug 31  2015 jetty-util-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr 110096 Aug 31  2015
jetty-webapp-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr  39065 Aug 31  2015 jetty-xml-8.1.14.v20131031.jar
-rw-r--r--  1 solr solr 200387 Aug 31  2015 servlet-api-3.0.jar

The Jetty included with blacklight may contain more jars than this.  The
Solr Jetty install is stripped down, so it's very lean.
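To check a Jetty lib directory against the jar list above, a quick script along these lines could help. The required-jar list mirrors Shawn's 4.9.1 listing and is an assumption for other versions:

```python
import re

REQUIRED_JETTY_PREFIXES = [
    "jetty-continuation", "jetty-deploy", "jetty-http", "jetty-io",
    "jetty-jmx", "jetty-security", "jetty-server", "jetty-servlet",
    "jetty-util", "jetty-webapp", "jetty-xml",
]

def missing_jetty_jars(jar_names):
    """Given the file names in Jetty's lib directory, report which jars
    from a stock Solr 4.x install are absent.  A missing jetty-xml jar
    is what produces the XmlConfiguration ClassNotFoundException above."""
    present = {m.group(1) for n in jar_names
               if (m := re.match(r"(jetty-[a-z]+)-", n))}
    return [p for p in REQUIRED_JETTY_PREFIXES if p not in present]

libs = ["jetty-server-8.1.14.v20131031.jar",
        "jetty-util-8.1.14.v20131031.jar"]
print(missing_jetty_jars(libs))  # jetty-xml (among others) is reported
```

Feeding it the output of `ls lib` would show at a glance whether jetty-xml is the jar that went missing.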

Thanks,
Shawn



Re: Solr Wiki - Request to add to contributors group

2016-03-15 Thread Shawn Heisey
On 3/15/2016 5:06 PM, Alessandro Benedetti wrote:
> Can I ask what the steps are to contribute to the solr wiki?
> Accessing the confluence instance :
> https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

I can only grant you edit permission on the old community wiki, which is
powered by MoinMoin.  That wiki can be found here:

https://wiki.apache.org/solr

Permission to edit the Solr Reference Guide (confluence wiki) is only
granted to committers.  The reason for the tight control is that the
confluence wiki is published and released as PDF documentation for
Solr.  In the Apache world, only committers are permitted to work on
things that result in official release artifacts.

http://archive.apache.org/dist/lucene/solr/ref-guide/

Thanks,
Shawn



Re: from zookeper embedded to standalone

2016-03-15 Thread Upayavira
You need to be careful adding new nodes to Zookeeper as you change the
number of nodes required for quorum. If you then remove nodes
afterwards, how will the cluster know whether they were intentionally
removed, rather than being simply down?
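The quorum arithmetic behind that caution is simple majority voting:

```python
def quorum(ensemble_size):
    """Minimum number of live ZooKeeper nodes for the ensemble to
    serve requests: a strict majority."""
    return ensemble_size // 2 + 1

# Growing a 3-node ensemble to 6 raises quorum from 2 to 4, so the
# three original nodes alone can no longer form a majority -- the
# hazard described above when nodes are added and later removed.
for n in (3, 5, 6):
    print(n, quorum(n))
```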

One thing I've done recently, which seems to have worked, is to run a
Solr node with embedded Zookeeper but no collections, as a dedicated
Zookeeper host. This seems to work fine. So, what you could do is just add more nodes to
your Solr setup, replicate your collections onto those new nodes, and
once they've replicated successfully, remove them from the original
nodes, leaving them doing nothing but serving Zookeeper.

Essentially the issue with embedded Zookeeper, at least as I understand
it, is that it conflates two roles which should be kept separate - that
of hosting Zookeeper and that of hosting indexes. The above solution
fits in with that role separation.

Upayavira

On Tue, 15 Mar 2016, at 09:29 PM, Erick Erickson wrote:
> Hmmm, I don't think anyone's really documented this as the
> supposition is that one would only run embedded for sandboxes
> and set up an external ensemble "for real".
> 
> So, with the caveat that I haven't personally tried this, I'd
> add external zookeepers as part of an ensemble that
> contained my embedded zookeepers. That means your ZK
> configurations would point to your embedded ZK instances. At
> that point the external ZK _should_ replicate the data from the
> embedded instances to the external ones.
> 
> Then shut down all your Solrs and change your external
> ensemble configurations to only point to each other (i.e.
> take the embedded stuff out). Now start your solrs pointing
> to the external ensemble.
> 
> As I said, though, I haven't personally done this so
> go ahead and give it a try.
> 
> Or, if you're feeling _really_ brave, just copy the zookeeper data
> directory from the place the embedded Zookeeper put it to the place
> you specify in your external ensemble then start your external
> ensemble.
> 
> Theoretically, this should work but as I said I haven't tried either of
> these personally.
> 
> Best,
> Erick
> 
> On Tue, Mar 15, 2016 at 9:59 AM, Rachid Bouacheria 
> wrote:
> > Hi,
> >
> > I am running solr 4.x on 3 servers with zookeeper embedded in prod.
> >
> > Each server has 1 leader and 2 replicas.
> >
> > I want to switch zookeeper from embedded to standalone.
> >
> > I want to know if the steps are documented anywhere? I could not find them.
> >
> > I am worried my index will get messed up in the transition.
> >
> > Thank you very much!


Re: Solr Wiki - Request to add to contributors group

2016-03-15 Thread Alessandro Benedetti
Thanks Shawn !
Can I ask what the steps are to contribute to the solr wiki?
Accessing the confluence instance :
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

I still cannot create pages; is there still something I need to do to get started?

Cheers

On Tue, Mar 15, 2016 at 6:06 PM, Shawn Heisey  wrote:

> On 3/15/2016 11:59 AM, Alessandro Benedetti wrote:
> > I need to document better the early contributed Classification
> Lucene/Solr
> > plugins .
> >
> > *Account* :
> > Full Name: Alessandro Benedetti
> > Email : benedetti.ale...@gmail.com
>
> Done.
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0.

2016-03-15 Thread John Mitchell
Hi,

I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0.

I have Solr correctly set up via "bin/solr start -e cloud -noprompt" and I
have even crawled data with Norconex web crawler and been able to
successfully commit this crawled data into Solr but I want to see if I can
commit Apache Nutch crawled data into Solr.

I tried the tutorial "Integrate Solr with Nutch" at
https://wiki.apache.org/nutch/NutchTutorial#Integrate_Solr_with_Nutch but
the location and files referred to don't match my Solr 5.3.0 setup.


Thanks,

John Mitchell


RE: Solr debug 'explain' values differ from the Solr score

2016-03-15 Thread Chris Hostetter

Sounds like a mismatch in the way the BooleanQuery explanation generation 
code is handling situations where there is/isn't a coord factor involved 
in computing the score itself.  (the bug is almost certainly in the 
"explain" code, since that is less rigorously tested in most cases, and 
the score itself is probably correct)

I tried to trivially reproduce the symptoms you described using the 
techproducts example and was unable to generate a discrepancy using a 
simple boolean query w/a fuzzy clause...

http://localhost:8983/solr/techproducts/query?q=ipod~%20belkin=id,name,score=query=results=true

...can you distill one of your problematic queries down to a 
shorter/simpler reproducible example, and/or provide us with the field & 
fieldType details for all of the fields used in your example?

(i'm guessing it probably relates to your firstName_phonetic field?)
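If the coord factor is indeed the culprit, the arithmetic would look like this sketch of classic (pre-6.0) Lucene BooleanQuery scoring. This is an illustration, not the actual Lucene code:

```python
def coord_adjusted_score(clause_scores):
    """Classic Lucene BooleanQuery scoring sketch: the sum of the
    matching optional clauses is multiplied by
    coord = overlap / maxOverlap.  A debug explanation that omits the
    coord factor would report only the bare sum."""
    max_overlap = len(clause_scores)
    matching = [s for s in clause_scores if s is not None]  # None = no match
    return sum(matching) * len(matching) / max_overlap

# Two OR'd clauses, one matching: the final score is half the summed
# (debug) value -- consistent with the 26.517654 vs. 13.258826 pair
# reported in this thread.
print(coord_adjusted_score([26.517654, None]))
```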



: Date: Tue, 15 Mar 2016 13:17:04 -0700
: From: Rick Sullivan 
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org" 
: Subject: RE: Solr debug 'explain' values differ from the Solr score
: 
: After some digging and experimentation, here are some more details on the 
issue I'm seeing.
: 
: 
: 1. The adjusted documents' scores are always exactly (debug_score/N), where N 
is the number of OR items in the query. 
: 
: For example, `=firstName:gabby~ firstName_phonetic:gabby 
firstName_tokens:(gabby)` will result in some of the documents with 
firstName==GABBY receiving a score 1/3 of the score of other GABBY documents, 
even though the debug explanation shows that they generated the same score.
: 
: 
: 2. This doesn't appear to be a brand new issue, or an issue with SolrCloud.
: 
: I've tested the problem using SolrCloud 5.5.0, Solr 5.5.0 (not cloud), and 
Solr 5.4.1.
: 
: 
: Anyone have any ideas?
: 
: Thanks,
: -Rick
: 
: From: r...@ricksullivan.net
: To: solr-user@lucene.apache.org
: Subject: Solr debug 'explain' values differ from the Solr score
: Date: Thu, 10 Mar 2016 08:34:30 -0800
: 
: Hi,
: 
: I'm seeing behavior in Solr 5.5.0 where the top-level values I see in the 
debug response don't always correspond with the scores Solr assigns to the 
matched documents.
: 
: For example, here is the top-level debug information for two documents 
matched by a query:
: 
: 114628: Object
:   description: "sum of:"
:   details: Array[2]
:   match: true
:   value: 20.542768
: 
: 357547: Object
:   description: "sum of:"
:   details: Array[2]
:   match: true
:   value: 26.517654
: 
: But they have scores
: 
: 114628: 20.542767
: 357547: 13.258826
: 
: I expect the second document to be the most relevant for my query, and the 
debug values seem to agree. However, in the final score I receive, that 
document's score has been adjusted down.
: 
: The relevant debug response information can be found here: 
http://apaste.info/mju
: 
: Does anyone have an idea why the Solr score may differ from the debug value?
: 
: Thanks,
: -Rick   

-Hoss
http://www.lucidworks.com/


Re: from zookeper embedded to standalone

2016-03-15 Thread Erick Erickson
Hmmm, I don't think anyone's really documented this as the
supposition is that one would only run embedded for sandboxes
and set up an external ensemble "for real".

So, with the caveat that I haven't personally tried this, I'd
add external zookeepers as part of an ensemble that
contained my embedded zookeepers. That means your ZK
configurations would point to your embedded ZK instances. At
that point the external ZK _should_ replicate the data from the
embedded instances to the external ones.

Then shut down all your Solrs and change your external
ensemble configurations to only point to each other (i.e.
take the embedded stuff out). Now start your solrs pointing
to the external ensemble.

As I said, though, I haven't personally done this so
go ahead and give it a try.

Or, if you're feeling _really_ brave, just copy the zookeeper data
directory from the place the embedded Zookeeper put it to the place
you specify in your external ensemble then start your external
ensemble.

Theoretically, this should work but as I said I haven't tried either of
these personally.

Best,
Erick

On Tue, Mar 15, 2016 at 9:59 AM, Rachid Bouacheria  wrote:
> Hi,
>
> I am running solr 4.x on 3 servers with zookeeper embedded in prod.
>
> Each server has 1 leader and 2 replicas.
>
> I want to switch zookeeper from embedded to standalone.
>
> I want to know if the steps are documented anywhere? I could not find them.
>
> I am worried my index will get messed up in the transition.
>
> Thank you very much!


Re: Inconsistent Shard Usage for Distributed Queries

2016-03-15 Thread Erick Erickson
If you're talking about the core admin API, it operates on an entirely local
basis; that code is completely unaware of anything having to do with
collections.

So it works, but the chance of changing one core and then having questions
is pretty high.

Best,
Erick

On Tue, Mar 15, 2016 at 12:05 PM, Nick Vasilyev
 wrote:
> I had another collection I was running into this issue with, so I decided
> to play around with it. This one had active indexing going on, so I was
> able to confirm how the counts get updated. Basically, it looks like
> clicking the reload button will only send a commit to that one core, it
> will not be propagated to other shards and the same shard on the other
> replica. Full commit update?commit=true=true works fine. I
> know that the reload button was not intended to issue commits, but it's
> quicker than typing out the command.
>
> On Tue, Mar 15, 2016 at 12:24 PM, Nick Vasilyev 
> wrote:
>
>> Yea, the code sends actual commits, but I hate typing so usually just
>> click the reload button unless it's production.
>> On Mar 15, 2016 12:22 PM, "Erick Erickson" 
>> wrote:
>>
>>> bq: Not sure what the issue was, in previous versions of Solr, clicking
>>> reload would send a commit to all replicas, right
>>>
>>> Reloading doesn't really have anything to do with commits. Reload
>>> would certainly cause a new searcher to be opened and thus would pick
>>> up any changes that had been hard-committed (openSearcher=false), but
>>> that's a complete side-effect. Simply issuing a commit on the url to
>>> the _collection_ will cause commits to happen on all replicas, as:
>>>
>>> blah/solr/collection/update?commit=true
>>>
>>> Best,
>>> Erick
>>>
>>> On Tue, Mar 15, 2016 at 9:11 AM, Nick Vasilyev 
>>> wrote:
>>> > I reloaded the collection and ran distrib=false queries for several
>>> > shards on both replicas. The counts matched exactly.
>>> >
>>> > I then reloaded the second replica (through the UI) and now it seems
>>> > like it is working fine; I am getting consistent matches.
>>> >
>>> > Not sure what the issue was; in previous versions of Solr, clicking
>>> > reload would send a commit to all replicas, right? Is that still the case?
>>> >
>>> >
>>> >
>>> > On Tue, Mar 15, 2016 at 11:53 AM, Erick Erickson <
>>> erickerick...@gmail.com>
>>> > wrote:
>>> >
>>> >> This is very strange. What are the results you get when
>>> >> you compare replicas in the _same_ shard? It doesn't really
>>> >> mean anything when you say
>>> >> "shard1 has X docs, shard2 has Y docs". The only way
>>> >> you should be getting different results from
>>> >> the match all docs query is if different replicas within the
>>> >> _same_ shard have different counts.
>>> >>
>>> >> And just as a sanity check, issue a commit. It's highly unlikely
>>> >> that you have uncommitted changes, but it never hurts to try.
>>> >>
>>> >> All distributed queries should have a sub query sent to one
>>> >> replica of each shard, is that what you're seeing? And I'd ping
>>> >> the cores  directly rather than provide shards parameters,
>>> >> something like:
>>> >>
>>> >> blah blah blah/products/query/shard1_core3/query?q=*:*. That
>>> >> addresses the specific core rather than rely on any internal query
>>> >> routing logic..
>>> >>
>>> >> Best,
>>> >> Erick
>>> >>
>>> >> On Tue, Mar 15, 2016 at 8:43 AM, Nick Vasilyev <
>>> nick.vasily...@gmail.com>
>>> >> wrote:
>>> >> > Hello,
>>> >> >
>>> >> > I have a brand new installation of Solr 5.4.1 and I am running into a
>>> >> > strange problem with one of my collections. Collection *products*
>>> >> > has 5 shards and a replication factor of two. Both replicas are up
>>> >> > and show green status on the Cloud page in the UI.
>>> >> >
>>> >> > When I run a default search on the query page (q=*:*) I always get a
>>> >> > different numFound although there is no active indexing and
>>> >> > everything is committed. I checked the logs and it looks like every
>>> >> > time it runs a search, it is sent to different shards. Below, search1
>>> >> > went to shards 5, 2 and 4, search2 went to shards 5, 3, 1 and search3
>>> >> > went to shards 3, 4, 1, 5.
>>> >> >
>>> >> > To confirm this, I ran a distrib=false query on shard 5 and got
>>> >> > 8,928,379 items, 8,917,318 for shard 2, and 9,005,295 for shard 4.
>>> >> > The results from the shard 2 distrib=false query did not match the
>>> >> > results that were in the distributed query (from the logs). The
>>> >> > query returned 8917318. Here is the log entry for the query.
>>> >> >
>>> >> > 214467874 INFO  (qtp1013423070-21019) [c:products s:shard2
>>> r:core_node7
>>> >> > x:products_shard2_replica2] o.a.s.c.S.Request
>>> [products_shard2_replica2]
>>> >> > webapp=/solr path=/select
>>> >> > params={q=*:*=false=true=json&_=1458056340020}
>>> >> > hits=8917318 status=0 QTime=0
>>> >> >
>>> >> >
>>> >> > Here are the logs 
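A systematic way to run the suggested check is to collect distrib=false counts per core and compare replicas within each shard, since only disagreement between replicas of the same shard signals a real problem. A sketch of the comparison step (the shard4 discrepancy below is hypothetical; only the shard2 count comes from the thread):

```python
def inconsistent_shards(counts):
    """counts: {shard: {replica: numFound}} gathered from distrib=false
    queries against each core.  Returns the shards whose replicas
    disagree -- the real signal to look for, rather than different
    totals across different shards."""
    return sorted(shard for shard, replicas in counts.items()
                  if len(set(replicas.values())) > 1)

counts = {
    "shard2": {"replica1": 8917318, "replica2": 8917318},
    "shard4": {"replica1": 9005295, "replica2": 9005101},  # hypothetical drift
}
print(inconsistent_shards(counts))  # ['shard4']
```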

Re: Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration

2016-03-15 Thread Paul Hoffman
On Tue, Mar 15, 2016 at 01:46:32PM -0600, Shawn Heisey wrote:
> On 3/15/2016 1:34 PM, Paul Hoffman wrote:
> > I've been running Solr successfully until this morning, when I stopped 
> > it to pick up a change in my schema, and now it won't start up again.  
> > I've whittled the problem down to this:
> >
> > 
> > # cd /home/paul/proj/blacklight/jetty
> >
> > # java -jar start.jar -Djetty.port=8983 -Dsolr.solr.home=$PWD/solr
> > WARNING: System properties and/or JVM args set.  Consider using --dry-run 
> > or --exec
> > java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> > at org.eclipse.jetty.start.Main.invokeMain(Main.java:440)
> > at org.eclipse.jetty.start.Main.start(Main.java:615)
> > at org.eclipse.jetty.start.Main.main(Main.java:96)
> > ClassNotFound: org.eclipse.jetty.xml.XmlConfiguration
> 
> There are no Solr classes in that stacktrace.  The class that can't be
> found is a Jetty class.  I think the problem here is in Jetty, not
> Solr.  It probably can't find a jar with a name like one of these:
> 
> jetty-xml-8.1.14.v20131031.jar
> jetty-xml-9.2.13.v20150730.jar
> 
> What version of Solr?  I'm assuming it's not 5.x, since the command used
> to start those versions is very different, and Solr would probably not
> be located within a blacklight folder.

Thanks, Shawn.  Which version indeed -- I have a mishmash of cruft lying 
around from earlier attempts to get Solr and Blacklight running, so I 
don't want to assume anything.  I found the log file that shows me 
stopping and starting Solr today:


# ls -ltr $(find $(locate log | egrep 'solr|jetty') -type f -mtime -1) | head 
-n5
find: `/home/paul/proj/blacklight/jetty/logs/solr.log': No such file or 
directory
-rw-rw-r-- 1 paul paul 2885083 Mar 15 11:38 
/home/paul/proj/blacklight/jetty/logs/solr_log_20160315_1152
-rw-r--r-- 1 root root5088 Mar 15 11:49 
/home/paul/proj/solr-5.3.1/server/logs/solr_log_20160315_1150
-rw-r--r-- 1 root root   26701 Mar 15 11:49 
/home/paul/proj/solr-5.3.1/server/logs/solr_gc_log_20160315_1150
-rw-rw-r-- 1 paul paul5086 Mar 15 11:51 
/home/paul/proj/solr-5.3.1/server/logs/solr_log_20160315_1546
-rw-rw-r-- 1 paul paul   23537 Mar 15 11:51 
/home/paul/proj/solr-5.3.1/server/logs/solr_gc_log_20160315_1546

# LOGFILE=/home/paul/proj/blacklight/jetty/logs/solr_log_20160315_1152

# egrep -nw 'stopped|org.eclipse.jetty.server.Server;' $LOGFILE | tail
5852:INFO  - 2016-01-08 16:16:32.222; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
6128:INFO  - 2016-01-13 13:01:58.338; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
6281:INFO  - 2016-01-14 08:41:03.025; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
7792:INFO  - 2016-02-08 11:57:41.131; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
7957:INFO  - 2016-02-08 12:01:48.361; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
8174:INFO  - 2016-02-08 15:03:18.641; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
8773:INFO  - 2016-02-10 12:05:25.639; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
12244:INFO  - 2016-03-15 11:38:16.810; org.eclipse.jetty.server.Server; 
Graceful shutdown SocketConnector@0.0.0.0:8983
12245:INFO  - 2016-03-15 11:38:16.814; org.eclipse.jetty.server.Server; 
Graceful shutdown 
o.e.j.w.WebAppContext{/solr,file:/home/paul/proj/blacklight/jetty/solr-webapp/webapp/},/home/paul/proj/blacklight/jetty/webapps/solr.war
12262:INFO  - 2016-03-15 11:38:18.473; 
org.eclipse.jetty.server.handler.ContextHandler; stopped 
o.e.j.w.WebAppContext{/solr,file:/home/paul/proj/blacklight/jetty/solr-webapp/webapp/},/home/paul/proj/blacklight/jetty/webapps/solr.war


It looks like it was last restarted on February 10 (line 8773).  The log 
file doesn't show the Solr version directly, but 
maybe the first lines will help: 


# sed -n 8773,8795p $LOGFILE
INFO  - 2016-02-10 12:05:25.639; org.eclipse.jetty.server.Server; 
jetty-8.1.10.v20130312
INFO  - 2016-02-10 12:05:25.703; 
org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor 
/home/paul/proj/blacklight/jetty/contexts at interval 0
INFO  - 2016-02-10 12:05:25.714; org.eclipse.jetty.deploy.DeploymentManager; 
Deployable added: /home/paul/proj/blacklight/jetty/contexts/solr.xml
INFO  - 2016-02-10 12:05:27.477; 
org.eclipse.jetty.webapp.StandardDescriptorProcessor; NO JSP Support for /solr, 
did not find org.apache.jasper.servlet.JspServlet
INFO  - 2016-02-10 12:05:27.576; 

RE: Solr debug 'explain' values differ from the Solr score

2016-03-15 Thread Rick Sullivan
After some digging and experimentation, here are some more details on the issue 
I'm seeing.


1. The adjusted documents' scores are always exactly (debug_score/N), where N 
is the number of OR items in the query. 

For example, `q=firstName:gabby~ firstName_phonetic:gabby 
firstName_tokens:(gabby)` will result in some of the documents with 
firstName==GABBY receiving a score 1/3 of the score of other GABBY documents, 
even though the debug explanation shows that they generated the same score.


2. This doesn't appear to be a brand new issue, or an issue with SolrCloud.

I've tested the problem using SolrCloud 5.5.0, Solr 5.5.0 (not cloud), and Solr 
5.4.1.


Anyone have any ideas?

Thanks,
-Rick

From: r...@ricksullivan.net
To: solr-user@lucene.apache.org
Subject: Solr debug 'explain' values differ from the Solr score
Date: Thu, 10 Mar 2016 08:34:30 -0800

Hi,

I'm seeing behavior in Solr 5.5.0 where the top-level values I see in the debug 
response don't always correspond with the scores Solr assigns to the matched 
documents.

For example, here is the top-level debug information for two documents matched 
by a query:

114628: Object
  description: "sum of:"
  details: Array[2]
  match: true
  value: 20.542768

357547: Object
  description: "sum of:"
  details: Array[2]
  match: true
  value: 26.517654

But they have scores

114628: 20.542767
357547: 13.258826

I expect the second document to be the most relevant for my query, and the 
debug values seem to agree. However, in the final score I receive, that 
document's score has been adjusted down.

The relevant debug response information can be found here: 
http://apaste.info/mju

Does anyone have an idea why the Solr score may differ from the debug value?

Thanks,
-Rick 

Re: Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration

2016-03-15 Thread Shawn Heisey
On 3/15/2016 1:34 PM, Paul Hoffman wrote:
> I've been running Solr successfully until this morning, when I stopped 
> it to pick up a change in my schema, and now it won't start up again.  
> I've whittled the problem down to this:
>
> 
> # cd /home/paul/proj/blacklight/jetty
>
> # java -jar start.jar -Djetty.port=8983 -Dsolr.solr.home=$PWD/solr
> WARNING: System properties and/or JVM args set.  Consider using --dry-run or 
> --exec
> java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at org.eclipse.jetty.start.Main.invokeMain(Main.java:440)
> at org.eclipse.jetty.start.Main.start(Main.java:615)
> at org.eclipse.jetty.start.Main.main(Main.java:96)
> ClassNotFound: org.eclipse.jetty.xml.XmlConfiguration

There are no Solr classes in that stacktrace.  The class that can't be
found is a Jetty class.  I think the problem here is in Jetty, not
Solr.  It probably can't find a jar with a name like one of these:

jetty-xml-8.1.14.v20131031.jar
jetty-xml-9.2.13.v20150730.jar

What version of Solr?  I'm assuming it's not 5.x, since the command used
to start those versions is very different, and Solr would probably not
be located within a blacklight folder.
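
Shawn's missing-jar hypothesis is easy to check from the shell. A minimal 
sketch, assuming the Jetty jars live under a lib directory such as 
$JETTY_HOME/lib (the demo runs against a scratch directory instead of a 
real install):

```shell
# Prints "present" if a jetty-xml jar exists in the given directory,
# "missing" otherwise (an unmatched glob makes ls fail).
check_jetty_xml() {
  ls "$1"/jetty-xml-*.jar >/dev/null 2>&1 && echo present || echo missing
}

# Demo against a scratch directory standing in for $JETTY_HOME/lib:
d=$(mktemp -d)
touch "$d/jetty-xml-8.1.14.v20131031.jar"
check_jetty_xml "$d"   # -> present
```

If the real lib directory reports "missing", restoring the jar (or 
reinstalling Jetty/Solr) should clear the ClassNotFoundException.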

Thanks,
Shawn



Solr won't start -- java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration

2016-03-15 Thread Paul Hoffman
I've been running Solr successfully until this morning, when I stopped 
it to pick up a change in my schema, and now it won't start up again.  
I've whittled the problem down to this:


# cd /home/paul/proj/blacklight/jetty

# java -jar start.jar -Djetty.port=8983 -Dsolr.solr.home=$PWD/solr
WARNING: System properties and/or JVM args set.  Consider using --dry-run or 
--exec
java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:440)
at org.eclipse.jetty.start.Main.start(Main.java:615)
at org.eclipse.jetty.start.Main.main(Main.java:96)
ClassNotFound: org.eclipse.jetty.xml.XmlConfiguration

Usage: java -jar start.jar [options] [properties] [configs]
   java -jar start.jar --help  # for more information

# readlink -e $(which java)
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java

# uname -srvmpio
Linux 3.16.0-57-generic #77~14.04.1-Ubuntu SMP Thu Dec 17 23:20:00 UTC 2015 
x86_64 x86_64 x86_64 GNU/Linux

# env | fgrep JAVA
[no output]


I only have one JVM installed -- openjdk-8-jre-headless.  Judging from 
the file timestamps within /usr/lib/jvm, the package hasn't been updated 
since last August at the latest; the server has only been up for 62 
days.

Just in case it matters, I was running Solr successfully under 
Blacklight's jetty wrapper, and the command line above is what it uses 
(or claims to use).

Does anyone have any idea what might be causing this problem?

Thanks in advance,

Paul.

-- 
Paul Hoffman 
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)


Re: Inconsistent Shard Usage for Distributed Queries

2016-03-15 Thread Nick Vasilyev
I had another collection I was running into this issue with, so I decided
to play around with it. This one had active indexing going on, so I was
able to confirm how the counts get updated. Basically, it looks like
clicking the reload button will only send a commit to that one core; it
will not be propagated to the other shards or to the same shard on the
other replica. A full commit via update?commit=true works fine. I
know that the reload button was not intended to issue commits, but it's
quicker than typing out the command.
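
For reference, a collection-level commit can be sketched like this (host, 
port, and collection name are placeholders for this deployment):

```shell
# A commit sent to the collection URL is distributed to every shard and
# replica, unlike a core reload, which touches only one core.
SOLR="http://localhost:8983/solr"
COLL="products"
URL="$SOLR/$COLL/update?commit=true"
echo "$URL"
# Live use, assuming a running SolrCloud node at $SOLR:
#   curl -s "$URL"
```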

On Tue, Mar 15, 2016 at 12:24 PM, Nick Vasilyev 
wrote:

> Yea, the code sends actual commits, but I hate typing so usually just
> click the reload button unless it's production.
> On Mar 15, 2016 12:22 PM, "Erick Erickson" 
> wrote:
>
>> bq: Not sure what the issue was, in previous versions of Solr, clicking
>> reload
>> would send a commit to all replicas, right
>>
>> Reloading doesn't really have anything to do with commits. Reload
>> would certainly
>> cause a new searcher to be opened and thus would pick up any changes
>> that had been hard-committed (openSearcher=false), but that's a complete
>> side-effect. Simply issuing a commit on the url to the _collection_ will
>> cause
>> commits to happen on all replicas, as:
>>
>> blah/solr/collection/update?commit=true
>>
>> Best,
>> Erick
>>
>> On Tue, Mar 15, 2016 at 9:11 AM, Nick Vasilyev 
>> wrote:
>> > I reloaded the collection and ran distrib=false query for several
>> shards on
>> > both replicas. The counts matched exactly.
>> >
>> > I then reloaded the second replica (through the UI) and now it seems
>> like
>> > it is working fine, I am getting consistent matches.
>> >
>> > Not sure what the issue was, in previous versions of Solr, clicking
>> reload
>> > would send a commit to all replicas, right? Is that still the case?
>> >
>> >
>> >
>> > On Tue, Mar 15, 2016 at 11:53 AM, Erick Erickson <
>> erickerick...@gmail.com>
>> > wrote:
>> >
>> >> This is very strange. What are the results you get when
>> >> you compare replicas in the _same_ shard? It doesn't really
>> >> mean anything when you say
>> >> "shard1 has X docs, shard2 has Y docs". The only way
>> >> you should be getting different results from
>> >> the match all docs query is if different replicas within the
>> >> _same_ shard have different counts.
>> >>
>> >> And just as a sanity check, issue a commit. It's highly unlikely
>> >> that you have uncommitted changes, but it never hurts to try.
>> >>
>> >> All distributed queries should have a sub query sent to one
>> >> replica of each shard, is that what you're seeing? And I'd ping
>> >> the cores  directly rather than provide shards parameters,
>> >> something like:
>> >>
>> >> blah blah blah/products/query/shard1_core3/query?q=*:*. That
>> >> addresses the specific core rather than rely on any internal query
>> >> routing logic..
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Tue, Mar 15, 2016 at 8:43 AM, Nick Vasilyev <
>> nick.vasily...@gmail.com>
>> >> wrote:
>> >> > Hello,
>> >> >
>> >> > I have a brand new installation of Solr 5.4.1 and I am running into a
>> >> > strange problem with one of my collections. Collection *products*
>> has 5
>> >> > shards and replication factor of two. Both replicas are up and show
>> green
>> >> > status on the Cloud page in the UI.
>> >> >
>> >> > When I run a default search on the query page (q=*:*) I always get a
>> >> > different numFound although there is no active indexing and
>> everything is
>> >> > committed. I checked the logs and it looks like every time it runs a
>> >> > search, it is sent to different shards. Below, search1 went to shard
>> 5, 2
>> >> > and 4, search2 went to shard 5, 3, 1 and search 3 went to shard 3,
>> 4, 1,
>> >> 5.
>> >> >
>> >> > To confirm this, I ran a distrib=false query on shard 5 and got
>> >> 8,928,379
>> >> > items, 8,917,318 for shard 2, and 9,005,295 for shard 4. The results
>> from
>> >> > shard 2 distrib=false query did not match the results that were in
>> the
>> >> > distributed query (from the logs). The query returned 8917318. Here
>> is
>> >> the
>> >> > log entry for the query.
>> >> >
>> >> > 214467874 INFO  (qtp1013423070-21019) [c:products s:shard2
>> r:core_node7
>> >> > x:products_shard2_replica2] o.a.s.c.S.Request
>> [products_shard2_replica2]
>> >> > webapp=/solr path=/select
>> >> > params={q=*:*=false=true=json&_=1458056340020}
>> >> > hits=8917318 status=0 QTime=0
>> >> >
>> >> >
>> >> > Here are the logs from other queries.
>> >> >
>> >> > Search 1 - numFound 18309764
>> >> >
>> >> > 213941984 INFO  (qtp1013423070-21046) [c:products s:shard5
>> r:core_node4
>> >> > x:products_shard5_replica2] o.a.s.c.S.Request
>> [products_shard5_replica2]
>> >> > webapp=/solr path=/select
>> >> >
>> >>
>> params={df=text=false=id=score=4=0=true=
>> >> >
>> >>
>> 

Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-15 Thread Pradeep Chandra
Hi Sir,

I want to draw a polyline along the route given by google maps (from one
place to another place).

I applied the logic of calculating parallel lines on both sides of the
route between the two markers. Because of the non-linear nature of the
route, in some cases the polyline overlaps itself.

What I finally want to do is draw that polyline along the route and give
the resulting polygon to Solr in order to get the results within it. The
problem I am running into is that, because of the overlapping nature of the
polyline, Solr is not accepting that shape.

Can you suggest a way to draw a polyline along the route, or let me know
whether there is any type in Solr that can fetch data with that kind of polyline.

I constructed a polygon with 300 points, but for that Solr is not giving any
result, whereas it does give results for polygons with fewer than 200
points. Can you tell me the maximum number of points for constructing a
polygon in Solr, or whether it is restricted to that many points?
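
Since the failures seem to begin somewhere between 200 and 300 vertices, it 
may help to count the vertices of each WKT polygon before sending it to 
Solr. A rough sketch (plain comma counting on a single-ring, one-line 
polygon; holes and multipolygons are ignored):

```shell
# Count the comma-separated coordinate pairs in a one-line WKT string.
wkt_points() { printf '%s\n' "$1" | awk -F',' '{print NF}'; }

wkt_points "POLYGON((0 0, 0 1, 1 1, 0 0))"   # -> 4
```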

I am sending some images of my final desired one & my applied one. Please
find those attachments.

Thanks and Regards
M Pradeep Chandra


Get Recently Added/Updated Documents

2016-03-15 Thread Lyuba Romanchuk
Hi,

I have the following scenario:

   - there are 2 machines running solr 4.8.1
   - there are different time zones on both machines
   - the clock is not synchronized on both machines

An auto-refresh query running every X-2 seconds should return documents from
the last X seconds, with as little performance impact as possible (ideally,
it should take less than a second).

First of all, I added a first-component that overrides the NOW param set by
the main shard, so that each Solr machine uses its local NOW time.
And I added a new custom function:
recent_docs(ms_since_now(_version_), X) = recip(ms(NOW, _version_ converted
to milliseconds), 0.01/X, 1, 1).

Then I thought about 2 possible solutions, but each has a disadvantage, and
now I am trying to decide which one is optimal.
Maybe there are other solutions that I didn't think of.

   1. *Solution 1*: use boosting on the _version_ field like this:
      q={!boost b=recent_docs(ms_since_now(_version_),X)}*:*
      1. I use _version_ because I need the recently updated documents and
         the time of the document shouldn't change; I saw from the code that
         _version_ is calculated based on the time.
      2. It's good for sorting because all documents are sorted by score,
         but in this case all documents are matched and I need to return
         only documents with a score from [0.1 to 1]. I could filter by the
         _version_ field, but I prefer not to for performance reasons.
      3. *Questions*:
         1. What is the performance impact of such scoring?
         2. *How can I return only documents with a score from 0.1 to 1*?
   2. *Solution 2*: use a function range query like this: fq={!frange l=0.1
      u=1}recent_docs(ms_since_now(_version_),X)
      1. In this case only relevant documents are returned, but they are not
         sorted, and sorting by _version_ or adding scoring seems
         inefficient because the same function would be calculated twice.
      2. It seems there is a very high performance impact from using this
         function query on large cores with hundreds of millions of
         documents.
      3. *Questions*:
         1. *What is the most optimal way to sort the returned documents
            without calculating the same function twice*?
         2. What is the performance impact of such a filter query? Is the
            FieldCache used?
         3. Might it drastically increase the memory consumption of Solr on
            heavily updated cores with millions of documents?
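
For what it's worth, Solr's recip(x,m,a,b) evaluates to a / (m*x + b), so
with m = 0.01/X the recent_docs score decays from 1 toward 0 as a document
ages. A quick sketch of the curve (the constants mirror the definition
above; this only models the math, not Solr itself):

```shell
# recip(x, m, a, b) = a / (m*x + b) with m = 0.01/X, a = b = 1,
# where x is the document age in milliseconds.
recent_score() {
  awk -v ms="$1" -v X="$2" 'BEGIN { printf "%.4f\n", 1 / ((0.01 / X) * ms + 1) }'
}

recent_score 0 10      # brand-new document      -> 1.0000
recent_score 1000 10   # 1000 ms old with X = 10 -> 0.5000
```

By this formula, the l=0.1 cutoff of Solution 2 is reached when
(0.01/X)*ms = 9, i.e. at an age of 900*X milliseconds.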


Any assistance/suggestion/comment will be very appreciated.

Thank you.

Best regards,
Lyuba


Re: Solr Wiki - Request to add to contributors group

2016-03-15 Thread Shawn Heisey
On 3/15/2016 11:59 AM, Alessandro Benedetti wrote:
> I need to document better the early contributed Classification Lucene/Solr
> plugins .
>
> *Account* :
> Full Name: Alessandro Benedetti
> Email : benedetti.ale...@gmail.com

Done.



Solr Wiki - Request to add to contributors group

2016-03-15 Thread Alessandro Benedetti
Hi guys,
I need to document better the early contributed Classification Lucene/Solr
plugins .

*Account* :
Full Name: Alessandro Benedetti
Email : benedetti.ale...@gmail.com

*References* :  https://issues.apache.org/jira/browse/SOLR-7739 ,
http://alexbenedetti.blogspot.co.uk/2015/07/lucene-document-classification.html
,
http://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html

Cheers

-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


from zookeper embedded to standalone

2016-03-15 Thread Rachid Bouacheria
Hi,

I am running Solr 4.x on 3 servers with embedded ZooKeeper in prod.

Each server has 1 leader and 2 replicas.

I want to switch ZooKeeper from embedded to standalone.

I want to know if the steps are documented anywhere; I could not find them.

I am worried my index will get messed up in the transition.

Thank you very much!


Re: Inconsistent Shard Usage for Distributed Queries

2016-03-15 Thread Nick Vasilyev
Yea, the code sends actual commits, but I hate typing so usually just click
the reload button unless it's production.
On Mar 15, 2016 12:22 PM, "Erick Erickson"  wrote:

> bq: Not sure what the issue was, in previous versions of Solr, clicking
> reload
> would send a commit to all replicas, right
>
> Reloading doesn't really have anything to do with commits. Reload
> would certainly
> cause a new searcher to be opened and thus would pick up any changes
> that had been hard-committed (openSearcher=false), but that's a complete
> side-effect. Simply issuing a commit on the url to the _collection_ will
> cause
> commits to happen on all replicas, as:
>
> blah/solr/collection/update?commit=true
>
> Best,
> Erick
>
> On Tue, Mar 15, 2016 at 9:11 AM, Nick Vasilyev 
> wrote:
> > I reloaded the collection and ran distrib=false query for several shards
> on
> > both replicas. The counts matched exactly.
> >
> > I then reloaded the second replica (through the UI) and now it seems like
> > it is working fine, I am getting consistent matches.
> >
> > Not sure what the issue was, in previous versions of Solr, clicking
> reload
> > would send a commit to all replicas, right? Is that still the case?
> >
> >
> >
> > On Tue, Mar 15, 2016 at 11:53 AM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> This is very strange. What are the results you get when
> >> you compare replicas in the _same_ shard? It doesn't really
> >> mean anything when you say
> >> "shard1 has X docs, shard2 has Y docs". The only way
> >> you should be getting different results from
> >> the match all docs query is if different replicas within the
> >> _same_ shard have different counts.
> >>
> >> And just as a sanity check, issue a commit. It's highly unlikely
> >> that you have uncommitted changes, but it never hurts to try.
> >>
> >> All distributed queries should have a sub query sent to one
> >> replica of each shard, is that what you're seeing? And I'd ping
> >> the cores  directly rather than provide shards parameters,
> >> something like:
> >>
> >> blah blah blah/products/query/shard1_core3/query?q=*:*. That
> >> addresses the specific core rather than rely on any internal query
> >> routing logic..
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Mar 15, 2016 at 8:43 AM, Nick Vasilyev <
> nick.vasily...@gmail.com>
> >> wrote:
> >> > Hello,
> >> >
> >> > I have a brand new installation of Solr 5.4.1 and I am running into a
> >> > strange problem with one of my collections. Collection *products* has
> 5
> >> > shards and replication factor of two. Both replicas are up and show
> green
> >> > status on the Cloud page in the UI.
> >> >
> >> > When I run a default search on the query page (q=*:*) I always get a
> >> > different numFound although there is no active indexing and
> everything is
> >> > committed. I checked the logs and it looks like every time it runs a
> >> > search, it is sent to different shards. Below, search1 went to shard
> 5, 2
> >> > and 4, search2 went to shard 5, 3, 1 and search 3 went to shard 3, 4,
> 1,
> >> 5.
> >> >
> >> > To confirm this, I ran a distrib=false query on shard 5 and got
> >> 8,928,379
> >> > items, 8,917,318 for shard 2, and 9,005,295 for shard 4. The results
> from
> >> > shard 2 distrib=false query did not match the results that were in the
> >> > distributed query (from the logs). The query returned 8917318. Here is
> >> the
> >> > log entry for the query.
> >> >
> >> > 214467874 INFO  (qtp1013423070-21019) [c:products s:shard2
> r:core_node7
> >> > x:products_shard2_replica2] o.a.s.c.S.Request
> [products_shard2_replica2]
> >> > webapp=/solr path=/select
> >> > params={q=*:*=false=true=json&_=1458056340020}
> >> > hits=8917318 status=0 QTime=0
> >> >
> >> >
> >> > Here are the logs from other queries.
> >> >
> >> > Search 1 - numFound 18309764
> >> >
> >> > 213941984 INFO  (qtp1013423070-21046) [c:products s:shard5
> r:core_node4
> >> > x:products_shard5_replica2] o.a.s.c.S.Request
> [products_shard5_replica2]
> >> > webapp=/solr path=/select
> >> >
> >>
> params={df=text=false=id=score=4=0=true=
> >> >
> >>
> http://192.168.1.211:9000/solr/products_shard5_replica2/|http://192.168.1.212:9000/solr/products_shard5_replica1/=10=2=*:*=1458055805759=true=javabin&_=1458055814096
> >> }
> >> > hits=8928379 status=0 QTime=3
> >> > 213941985 INFO  (qtp1013423070-21028) [c:products s:shard4
> r:core_node6
> >> > x:products_shard4_replica2] o.a.s.c.S.Request
> [products_shard4_replica2]
> >> > webapp=/solr path=/select
> >> >
> >>
> params={df=text=false=id=score=4=0=true=
> >> >
> >>
> http://192.168.1.212:9000/solr/products_shard4_replica1/|http://192.168.1.211:9000/solr/products_shard4_replica2/=10=2=*:*=1458055805759=true=javabin&_=1458055814096
> >> }
> >> > hits=9005295 status=0 QTime=3
> >> > 213942045 INFO  (qtp1013423070-21042) [c:products s:shard2
> r:core_node7
> >> > x:products_shard2_replica2] o.a.s.c.S.Request
> [products_shard2_replica2]
> 

Re: Inconsistent Shard Usage for Distributed Queries

2016-03-15 Thread Erick Erickson
bq: Not sure what the issue was, in previous versions of Solr, clicking reload
would send a commit to all replicas, right

Reloading doesn't really have anything to do with commits. Reload
would certainly
cause a new searcher to be opened and thus would pick up any changes
that had been hard-committed (openSearcher=false), but that's a complete
side-effect. Simply issuing a commit on the url to the _collection_ will cause
commits to happen on all replicas, as:

blah/solr/collection/update?commit=true

Best,
Erick

On Tue, Mar 15, 2016 at 9:11 AM, Nick Vasilyev  wrote:
> I reloaded the collection and ran distrib=false query for several shards on
> both replicas. The counts matched exactly.
>
> I then reloaded the second replica (through the UI) and now it seems like
> it is working fine, I am getting consistent matches.
>
> Not sure what the issue was, in previous versions of Solr, clicking reload
> would send a commit to all replicas, right? Is that still the case?
>
>
>
> On Tue, Mar 15, 2016 at 11:53 AM, Erick Erickson 
> wrote:
>
>> This is very strange. What are the results you get when
>> you compare replicas in the _same_ shard? It doesn't really
>> mean anything when you say
>> "shard1 has X docs, shard2 has Y docs". The only way
>> you should be getting different results from
>> the match all docs query is if different replicas within the
>> _same_ shard have different counts.
>>
>> And just as a sanity check, issue a commit. It's highly unlikely
>> that you have uncommitted changes, but it never hurts to try.
>>
>> All distributed queries should have a sub query sent to one
>> replica of each shard, is that what you're seeing? And I'd ping
>> the cores  directly rather than provide shards parameters,
>> something like:
>>
>> blah blah blah/products/query/shard1_core3/query?q=*:*. That
>> addresses the specific core rather than rely on any internal query
>> routing logic..
>>
>> Best,
>> Erick
>>
>> On Tue, Mar 15, 2016 at 8:43 AM, Nick Vasilyev 
>> wrote:
>> > Hello,
>> >
>> > I have a brand new installation of Solr 5.4.1 and I am running into a
>> > strange problem with one of my collections. Collection *products* has 5
>> > shards and replication factor of two. Both replicas are up and show green
>> > status on the Cloud page in the UI.
>> >
>> > When I run a default search on the query page (q=*:*) I always get a
>> > different numFound although there is no active indexing and everything is
>> > committed. I checked the logs and it looks like every time it runs a
>> > search, it is sent to different shards. Below, search1 went to shard 5, 2
>> > and 4, search2 went to shard 5, 3, 1 and search 3 went to shard 3, 4, 1,
>> 5.
>> >
>> > To confirm this, I ran a distrib=false query on shard 5 and got
>> 8,928,379
>> > items, 8,917,318 for shard 2, and 9,005,295 for shard 4. The results from
>> > shard 2 distrib=false query did not match the results that were in the
>> > distributed query (from the logs). The query returned 8917318. Here is
>> the
>> > log entry for the query.
>> >
>> > 214467874 INFO  (qtp1013423070-21019) [c:products s:shard2 r:core_node7
>> > x:products_shard2_replica2] o.a.s.c.S.Request [products_shard2_replica2]
>> > webapp=/solr path=/select
>> > params={q=*:*=false=true=json&_=1458056340020}
>> > hits=8917318 status=0 QTime=0
>> >
>> >
>> > Here are the logs from other queries.
>> >
>> > Search 1 - numFound 18309764
>> >
>> > 213941984 INFO  (qtp1013423070-21046) [c:products s:shard5 r:core_node4
>> > x:products_shard5_replica2] o.a.s.c.S.Request [products_shard5_replica2]
>> > webapp=/solr path=/select
>> >
>> params={df=text=false=id=score=4=0=true=
>> >
>> http://192.168.1.211:9000/solr/products_shard5_replica2/|http://192.168.1.212:9000/solr/products_shard5_replica1/=10=2=*:*=1458055805759=true=javabin&_=1458055814096
>> }
>> > hits=8928379 status=0 QTime=3
>> > 213941985 INFO  (qtp1013423070-21028) [c:products s:shard4 r:core_node6
>> > x:products_shard4_replica2] o.a.s.c.S.Request [products_shard4_replica2]
>> > webapp=/solr path=/select
>> >
>> params={df=text=false=id=score=4=0=true=
>> >
>> http://192.168.1.212:9000/solr/products_shard4_replica1/|http://192.168.1.211:9000/solr/products_shard4_replica2/=10=2=*:*=1458055805759=true=javabin&_=1458055814096
>> }
>> > hits=9005295 status=0 QTime=3
>> > 213942045 INFO  (qtp1013423070-21042) [c:products s:shard2 r:core_node7
>> > x:products_shard2_replica2] o.a.s.c.S.Request [products_shard2_replica2]
>> > webapp=/solr path=/select
>> > params={q=*:*=true=json&_=1458055814096} hits=18309764 status=0
>> > QTime=81
>> >
>> >
>> > Search 2 - numFound 27072144
>> > 213995779 INFO  (qtp1013423070-21046) [c:products s:shard5 r:core_node4
>> > x:products_shard5_replica2] o.a.s.c.S.Request [products_shard5_replica2]
>> > webapp=/solr path=/select
>> >
>> params={df=text=false=id=score=4=0=true=
>> >
>> 

SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-15 Thread Kelly, Frank
Just wondering if my observation of SolrCloud behavior after ZooKeeper loses a 
quorum is normal or to-be-expected

Version of Solr 5.3.1
Version of ZooKeeper: 3.4.7
Using SolrCloud with external ZooKeeper
Deployed on AWS

Our Zookeeper ensemble consists of three nodes with the same config e.g.

$ more ../conf/zoo.cfg
tickTime=2000
dataDir=/var/zookeeper
dataLogDir=/var/log/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
standaloneEnabled=false
server.1=zookeeper1.qa.eu-west-1.mysearch.com:2888:3888
server.2=zookeeper2.qa.eu-west-1.mysearch.com:2888:3888
server.3=zookeeper3.qa.eu-west-1.mysearch.com:2888:3888

If we terminate one of the ZooKeeper nodes we get a ZK election and (I
think) a quorum is maintained.
Operation continues OK; we detect the terminated instance and relaunch a new 
ZK node, which comes up fine.
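
One way to confirm that each relaunched member actually rejoined the 
ensemble (and in which role) is ZooKeeper's 'srvr' four-letter command. A 
hedged sketch (the hostnames below are from this deployment's zoo.cfg; the 
demo parses canned output rather than a live server):

```shell
# Extract the Mode line (leader / follower / standalone) from 'srvr' output.
zk_mode() { grep -m1 '^Mode:' | awk '{print $2}'; }

# Live use, assuming nc is available and port 2181 is reachable:
#   printf 'srvr' | nc -w 2 zookeeper1.qa.eu-west-1.mysearch.com 2181 | zk_mode
# Demo on canned output:
printf 'Zookeeper version: 3.4.7\nMode: follower\nNode count: 4\n' | zk_mode   # -> follower
```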

If we terminate two of the ZK nodes we lose a quorum and then we observe the 
following

1.1) Admin UI shows the following
[cid:7B4ADA74-9257-4B60-8109-F8EF0C4E2125]

1.2) SolrJ returns the following

org.apache.solr.common.SolrException: Could not load collection from 
ZK:qa_eu-west-1_public_index
at 
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:850)
at org.apache.solr.common.cloud.ZkStateReader$7.get(ZkStateReader.java:515)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
at 
com.here.scbe.search.solr.SolrFacadeImpl.addToSearchIndex(SolrFacadeImpl.java:112)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for 
/collections/qa_eu-west-1_public_index/state.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
at 
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:841)
... 24 more

This makes sense based on our understanding.
When our AutoScale groups launch two new ZooKeeper nodes, initialize them, fix 
the DNS, etc., we regain a quorum, but at this point:

2.1) Admin UI shows the shards as “GONE”
[cid:DC7412DD-FF95-4DE1-AA4E-9C6F7A47C74C]
2.2) SolrJ returns the same error even though the ZooKeeper DNS names are now 
bound to new IP addresses

So at this point I restart the Solr nodes. Then:

3.1) Admin UI shows the following – yeah the nodes are back!
[cid:765921A3-CE96-4989-9C46-838F96A8F05B]

3.2) SolrJ Client still shows the same error – namely

org.apache.solr.common.SolrException: Could not load collection from 
ZK:qa_eu-west-1_here_account
at 
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:850)
at org.apache.solr.common.cloud.ZkStateReader$7.get(ZkStateReader.java:515)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:825)
at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:788)
at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:803)
at com.here.scbe.search.solr.SolrFacadeImpl.deleteById(SolrFacadeImpl.java:257)
.
.
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for 
/collections/qa_eu-west-1_here_account/state.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
at 

Re: Inconsistent Shard Usage for Distributed Queries

2016-03-15 Thread Nick Vasilyev
I reloaded the collection and ran distrib=false query for several shards on
both replicas. The counts matched exactly.

I then reloaded the second replica (through the UI) and now it seems like
it is working fine, I am getting consistent matches.

Not sure what the issue was. In previous versions of Solr, clicking reload
would send a commit to all replicas, right? Is that still the case?



On Tue, Mar 15, 2016 at 11:53 AM, Erick Erickson 
wrote:

> This is very strange. What are the results you get when
> you compare replicas in the _same_ shard? It doesn't really
> mean anything when you say
> "shard1 has X docs, shard2 has Y docs". The only way
> you should be getting different results from
> the match all docs query is if different replicas within the
> _same_ shard have different counts.
>
> And just as a sanity check, issue a commit. It's highly unlikely
> that you have uncommitted changes, but it never hurts to try.
>
> All distributed queries should have a sub query sent to one
> replica of each shard, is that what you're seeing? And I'd ping
> the cores  directly rather than provide shards parameters,
> something like:
>
> blah blah blah/products/query/shard1_core3/query?q=*:*. That
> addresses the specific core rather than rely on any internal query
> routing logic..
>
> Best,
> Erick
>
> On Tue, Mar 15, 2016 at 8:43 AM, Nick Vasilyev 
> wrote:
> > Hello,
> >
> > I have a brand new installation of Solr 5.4.1 and I am running into a
> > strange problem with one of my collections. Collection *products* has 5
> > shards and replication factor of two. Both replicas are up and show green
> > status on the Cloud page in the UI.
> >
> > When I run a default search on the query page (q=*:*) I always get a
> > different numFound although there is no active indexing and everything is
> > committed. I checked the logs and it looks like every time it runs a
> > search, it is sent to different shards. Below, search1 went to shard 5, 2
> > and 4, search2 went to shard 5, 3, 1 and search 3 went to shard 3, 4, 1,
> 5.
> >
> > To confirm this, I ran a distrib=false query on shard 5 and got
> 8,928,379
> > items, 8,917,318 for shard 2, and 9,005,295 for shard 4. The results from
> > shard 2 distrib=false query did not match the results that were in the
> > distributed query (from the logs). The query returned 8917318. Here is
> the
> > log entry for the query.
> >
> > 214467874 INFO  (qtp1013423070-21019) [c:products s:shard2 r:core_node7
> > x:products_shard2_replica2] o.a.s.c.S.Request [products_shard2_replica2]
> > webapp=/solr path=/select
> > params={q=*:*&distrib=false&indent=true&wt=json&_=1458056340020}
> > hits=8917318 status=0 QTime=0
> >
> >
> > Here are the logs from other queries.
> >
> > Search 1 - numFound 18309764
> >
> > 213941984 INFO  (qtp1013423070-21046) [c:products s:shard5 r:core_node4
> > x:products_shard5_replica2] o.a.s.c.S.Request [products_shard5_replica2]
> > webapp=/solr path=/select
> >
> > params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.211:9000/solr/products_shard5_replica2/|http://192.168.1.212:9000/solr/products_shard5_replica1/&rows=10&version=2&q=*:*&NOW=1458055805759&isShard=true&wt=javabin&_=1458055814096}
> > hits=8928379 status=0 QTime=3
> > 213941985 INFO  (qtp1013423070-21028) [c:products s:shard4 r:core_node6
> > x:products_shard4_replica2] o.a.s.c.S.Request [products_shard4_replica2]
> > webapp=/solr path=/select
> >
> > params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.212:9000/solr/products_shard4_replica1/|http://192.168.1.211:9000/solr/products_shard4_replica2/&rows=10&version=2&q=*:*&NOW=1458055805759&isShard=true&wt=javabin&_=1458055814096}
> > hits=9005295 status=0 QTime=3
> > 213942045 INFO  (qtp1013423070-21042) [c:products s:shard2 r:core_node7
> > x:products_shard2_replica2] o.a.s.c.S.Request [products_shard2_replica2]
> > webapp=/solr path=/select
> > params={q=*:*&indent=true&wt=json&_=1458055814096} hits=18309764 status=0
> > QTime=81
> >
> >
> > Search 2 - numFound 27072144
> > 213995779 INFO  (qtp1013423070-21046) [c:products s:shard5 r:core_node4
> > x:products_shard5_replica2] o.a.s.c.S.Request [products_shard5_replica2]
> > webapp=/solr path=/select
> >
> > params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.211:9000/solr/products_shard5_replica2/|http://192.168.1.212:9000/solr/products_shard5_replica1/&rows=10&version=2&q=*:*&NOW=1458055859563&isShard=true&wt=javabin&_=1458055867894}
> > hits=8928379 status=0 QTime=1
> > 213995781 INFO  (qtp1013423070-20985) [c:products s:shard3 r:core_node10
> > x:products_shard3_replica2] o.a.s.c.S.Request [products_shard3_replica2]
> > webapp=/solr path=/select
> >
> > params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.212:9000/solr/products_shard3_replica1/|http://192.168.1.211:9000/solr/products_shard3_replica2/&rows=10&version=2&q=*:*&NOW=1458055859563&isShard=true&wt=javabin&_=1458055867894}
> > hits=8980542 status=0 QTime=3
> > 213995785 INFO  (qtp1013423070-21042) [c:products s:shard1 r:core_node9
> > x:products_shard1_replica2] o.a.s.c.S.Request 

Re: New to Solr 5.5

2016-03-15 Thread Bhanu Prasad
So I was able to create the core using the cli not with root but with
'solr' user access. I sudo as solr in the box and ran the command to create
the core

[solr@solr bin]$ ./solr create -c demo

Copying configuration to new core instance directory:
/var/solr/data/demo

Creating new core 'demo' using
command: http://localhost:8983/solr/admin/cores?action=CREATE&name=demo&instanceDir=demo

{
  "responseHeader":{
"status":0,
"QTime":2724},
  "core":"demo"}

This created the required file/folder structure under "/var/solr/data" with
all the default files. I need some direction on configuring my existing
Cassandra keyspace as a data source for my Solr core. I have already placed
the Cassandra JDBC drivers into my Java/lib; I believe I should be using the
'DIH' example configs for getting the data into Solr?

Regards,
Bhanu Prasad
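Yes, the DIH route with a JdbcDataSource is the usual approach. A minimal data-config.xml sketch follows; the driver class, JDBC URL, keyspace, table and column names below are placeholders, not verified values for any particular Cassandra JDBC driver:

```xml
<!-- data-config.xml: minimal DIH sketch; driver/url/table names are assumed -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.example.CassandraJdbcDriver"
              url="jdbc:cassandra://localhost:9042/mykeyspace"/>
  <document>
    <entity name="item" query="SELECT id, name FROM mytable">
      <field column="id"   name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

This file is referenced from a /dataimport request handler in solrconfig.xml, as in the stock DIH examples shipped with Solr.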

On Tue, Mar 15, 2016 at 11:38 AM, Erick Erickson 
wrote:

> Creating a core in stand-alone mode _also_ requires that the config files
> are findable, in this case on disk rather than on Zookeeper
> (sorry for the misdirection).
>
> So you need to create a directory, usually under solr_home that has
> the conf directory in it. That is the "instanceDir" that is one of the
> core creation
> parameters.
>
> Yes, this is a little arcane
>
> Best,
> Erick
>
> On Tue, Mar 15, 2016 at 7:34 AM, Bhanu Prasad 
> wrote:
> > I went through the solr-ref-guide. I got a brief idea on how it works,
> But
> > I can't help to think as to why I am unable to create a core through web
> UI
> > ? Does it have any dependency on SolrCloud ? If I am planning to run an
> > standalone instance do I need to create core's only through the command
> > line and with 'solr' user privileges ? Please advise.
> >
> >
> >
> > On Mon, Mar 14, 2016 at 6:26 PM, Erick Erickson  >
> > wrote:
> >
> >> OK, take Cassandra out of it for the time being and spend
> >> some time familiarizing yourself with Solr would be my
> >> advice ;)
> >>
> >> Yeah, the Solr documentation is a bit scattered, but your most
> >> complete and up to date reference is the Solr reference guide
> >> here:
> >>
> >> In particular, see the
> >>
> >>
> https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
> >> upconfig command.
> >>
> >> You can download the complete reference guide through the link in the
> >> upper left.
> >>
> >> The general idea here is that your configurations (solrconfig.xml,
> >> schema.xml and all the rest) are
> >> kept in Zookeeper. When creating a collection, you must reference that
> >> configuration set. The
> >> examples automatically push the configuration set up to Zookeeper,
> >> which you can see in the
> >> adminUI>>cloud>>tree view.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Mar 14, 2016 at 12:20 PM, Bhanu Prasad 
> >> wrote:
> >> > Hi Erick,
> >> >
> >> > I am very new to this, I haven't uploaded any configsets. I need help
> to
> >> > get existing cassandra keyspace into solr to do analysis. I am
> completely
> >> > new to this technology so having trouble with finding right
> documentation
> >> > on how to do it.
> >> >
> >> > Regards,
> >> > Bhanu
> >> >
> >> > On Mon, Mar 14, 2016 at 3:11 PM, Erick Erickson <
> erickerick...@gmail.com
> >> >
> >> > wrote:
> >> >
> >> >> What configsets have you uploaded to Solr? The canned example does
> >> >> this for you. The configurations must reside in Zookeeper, NOT on the
> >> >> local disk. I think that's probably what you're seeing...
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Mon, Mar 14, 2016 at 11:33 AM, Bhanu Prasad <
> bhanupras...@gmail.com>
> >> >> wrote:
> >> >> > I was able to create a core using create -c option, But this time
> with
> >> >> user
> >> >> > as solr. It worked. How can I make sure that the solr user is
> running
> >> the
> >> >> > webapplication command requests as well ? Any help ?
> >> >> >
> >> >> > [solr@solr bin]$ ./solr create -c cassie
> >> >> >
> >> >> > Copying configuration to new core instance directory:
> >> >> > /var/solr/data/cassie
> >> >> >
> >> >> > Creating new core 'cassie' using command:
> >> >> >
> >> >>
> >>
> http://localhost:8983/solr/admin/cores?action=CREATE&name=cassie&instanceDir=cassie
> >> >> >
> >> >> > {
> >> >> >   "responseHeader":{
> >> >> > "status":0,
> >> >> > "QTime":709},
> >> >> >   "core":"cassie"}
> >> >> >
> >> >> > Regards,
> >> >> > Bhanu Prasad
> >> >> >
> >> >> > On Mon, Mar 14, 2016 at 1:30 PM, Bhanu Prasad <
> bhanupras...@gmail.com
> >> >
> >> >> > wrote:
> >> >> >
> >> >> >> Hello,
> >> >> >>
> >> >> >>
> >> >> >> I installed a new solr instance in lab on Cent OS 7
> >> >> >>
> >> >> >> # java -version
> >> >> >> java version "1.8.0_72"
> >> >> >> Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
> >> >> >> Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)
> >> >> >>
> >> >> >> #wget
> >> http://apache.mirror.gtcomm.net/lucene/solr/5.5.0/solr-5.5.0.tgz
> >> >> >> #tar -zxvf 

Re: accessing data in hdfs by solr in standalone mode

2016-03-15 Thread Erick Erickson
This should just be using the HDFS directory in solrconfig.xml
just like SolrCloud. There's nothing that I know of that would
prevent that.

Best,
Erick
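For reference, the relevant solrconfig.xml switch is the same one SolrCloud uses. A minimal sketch, with an assumed namenode URI (the lockType setting belongs inside the indexConfig section):

```xml
<!-- Store the index on HDFS; the namenode URI below is an assumption -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>

<!-- Inside <indexConfig>: -->
<lockType>${solr.lock.type:hdfs}</lockType>
```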

On Mon, Mar 14, 2016 at 10:14 PM, vidya  wrote:
> Hi
> Can Solr access the data from HDFS in standalone mode? If so, can you
> briefly explain how it is done?
>
>   Thanks in advance
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/accessing-data-in-hdfs-by-solr-in-standalone-mode-tp4263805.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Inconsistent Shard Usage for Distributed Queries

2016-03-15 Thread Erick Erickson
This is very strange. What are the results you get when
you compare replicas in the _same_ shard? It doesn't really
mean anything when you say
"shard1 has X docs, shard2 has Y docs". The only way
you should be getting different results from
the match all docs query is if different replicas within the
_same_ shard have different counts.

And just as a sanity check, issue a commit. It's highly unlikely
that you have uncommitted changes, but it never hurts to try.

All distributed queries should have a sub query sent to one
replica of each shard, is that what you're seeing? And I'd ping
the cores  directly rather than provide shards parameters,
something like:

blah blah blah/products/query/shard1_core3/query?q=*:*. That
addresses the specific core rather than rely on any internal query
routing logic..

Best,
Erick
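The per-core check above can be scripted. A minimal sketch follows; the host, port and core names are assumptions for illustration, and the script only prints the URLs (fetch them with curl and compare the numFound values between replicas of the same shard):

```shell
# Host/port/core names are assumed; prints one distrib=false URL per replica
SOLR="http://localhost:8983/solr"
for core in products_shard2_replica1 products_shard2_replica2; do
  url="$SOLR/$core/select?q=*:*&rows=0&distrib=false&wt=json"
  echo "$url"   # fetch with: curl "$url"
done
```

Because distrib=false bypasses all distributed routing, any count difference between the two printed queries points at the replicas themselves being out of sync.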

On Tue, Mar 15, 2016 at 8:43 AM, Nick Vasilyev  wrote:
> Hello,
>
> I have a brand new installation of Solr 5.4.1 and I am running into a
> strange problem with one of my collections. Collection *products* has 5
> shards and replication factor of two. Both replicas are up and show green
> status on the Cloud page in the UI.
>
> When I run a default search on the query page (q=*:*) I always get a
> different numFound although there is no active indexing and everything is
> committed. I checked the logs and it looks like every time it runs a
> search, it is sent to different shards. Below, search1 went to shard 5, 2
> and 4, search2 went to shard 5, 3, 1 and search 3 went to shard 3, 4, 1, 5.
>
> To confirm this, I ran a distrib=false query on shard 5 and got 8,928,379
> items, 8,917,318 for shard 2, and 9,005,295 for shard 4. The results from
> shard 2 distrib=false query did not match the results that were in the
> distributed query (from the logs). The query returned 8917318. Here is the
> log entry for the query.
>
> 214467874 INFO  (qtp1013423070-21019) [c:products s:shard2 r:core_node7
> x:products_shard2_replica2] o.a.s.c.S.Request [products_shard2_replica2]
> webapp=/solr path=/select
> params={q=*:*&distrib=false&indent=true&wt=json&_=1458056340020}
> hits=8917318 status=0 QTime=0
>
>
> Here are the logs from other queries.
>
> Search 1 - numFound 18309764
>
> 213941984 INFO  (qtp1013423070-21046) [c:products s:shard5 r:core_node4
> x:products_shard5_replica2] o.a.s.c.S.Request [products_shard5_replica2]
> webapp=/solr path=/select
> params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.211:9000/solr/products_shard5_replica2/|http://192.168.1.212:9000/solr/products_shard5_replica1/&rows=10&version=2&q=*:*&NOW=1458055805759&isShard=true&wt=javabin&_=1458055814096}
> hits=8928379 status=0 QTime=3
> 213941985 INFO  (qtp1013423070-21028) [c:products s:shard4 r:core_node6
> x:products_shard4_replica2] o.a.s.c.S.Request [products_shard4_replica2]
> webapp=/solr path=/select
> params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.212:9000/solr/products_shard4_replica1/|http://192.168.1.211:9000/solr/products_shard4_replica2/&rows=10&version=2&q=*:*&NOW=1458055805759&isShard=true&wt=javabin&_=1458055814096}
> hits=9005295 status=0 QTime=3
> 213942045 INFO  (qtp1013423070-21042) [c:products s:shard2 r:core_node7
> x:products_shard2_replica2] o.a.s.c.S.Request [products_shard2_replica2]
> webapp=/solr path=/select
> params={q=*:*&indent=true&wt=json&_=1458055814096} hits=18309764 status=0
> QTime=81
>
>
> Search 2 - numFound 27072144
> 213995779 INFO  (qtp1013423070-21046) [c:products s:shard5 r:core_node4
> x:products_shard5_replica2] o.a.s.c.S.Request [products_shard5_replica2]
> webapp=/solr path=/select
> params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.211:9000/solr/products_shard5_replica2/|http://192.168.1.212:9000/solr/products_shard5_replica1/&rows=10&version=2&q=*:*&NOW=1458055859563&isShard=true&wt=javabin&_=1458055867894}
> hits=8928379 status=0 QTime=1
> 213995781 INFO  (qtp1013423070-20985) [c:products s:shard3 r:core_node10
> x:products_shard3_replica2] o.a.s.c.S.Request [products_shard3_replica2]
> webapp=/solr path=/select
> params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.212:9000/solr/products_shard3_replica1/|http://192.168.1.211:9000/solr/products_shard3_replica2/&rows=10&version=2&q=*:*&NOW=1458055859563&isShard=true&wt=javabin&_=1458055867894}
> hits=8980542 status=0 QTime=3
> 213995785 INFO  (qtp1013423070-21042) [c:products s:shard1 r:core_node9
> x:products_shard1_replica2] o.a.s.c.S.Request [products_shard1_replica2]
> webapp=/solr path=/select
> params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.212:9000/solr/products_shard1_replica1/|http://192.168.1.211:9000/solr/products_shard1_replica2/&rows=10&version=2&q=*:*&NOW=1458055859563&isShard=true&wt=javabin&_=1458055867894}
> hits=8914801 status=0 QTime=3
> 213995798 INFO  (qtp1013423070-21028) [c:products s:shard2 r:core_node7
> x:products_shard2_replica2] o.a.s.c.S.Request [products_shard2_replica2]
> webapp=/solr path=/select
> params={q=*:*&indent=true&wt=json&_=1458055867894} hits=27072144 status=0
> QTime=30
>
>
> Search 3 - numFound 35953734
>
> 214022457 INFO  (qtp1013423070-21019) [c:products s:shard3 r:core_node10
> 

Re: Understanding parsed queries.

2016-03-15 Thread Erick Erickson
The query parsing is not strict Boolean logic, here's a great
writeup on the topic:
https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/

The outer "+" simply means the entire clause (of which there is only one)
must be present, i.e. it applies to the whole query.

My guess as to why the counts are the same with and without the fl:term1
clause is that perhaps it only occurs in docs that also contain term2 and
term3?

Best,
Erick
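To see how each of the three forms is parsed, the debug=query parameter returns the parsed Lucene query without executing a full search. A sketch that only builds the request URL, with an assumed host and collection name:

```shell
# Assumed host/collection; minimal URL-encoding of spaces and colons only
SOLR="http://localhost:8983/solr/mycollection/select"
q='fl:term1 OR fl:term2 AND fl:term3'
qenc=$(printf '%s' "$q" | sed 's/ /%20/g; s/:/%3A/g')
echo "$SOLR?q=$qenc&debug=query&rows=0"
```

The parsedquery entry in the debug section shows exactly which clauses end up with a leading "+" (required) versus bare SHOULD clauses.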


On Tue, Mar 15, 2016 at 12:22 AM, Modassar Ather  wrote:
> Hi,
>
> Kindly help me understand the parsed queries of following three queries.
> How these parsed queries can be interpreted for boolean logic.
> Please ignore the boost part.
>
> *Query : *fl:term1 OR fl:term2 AND fl:term3
> *"parsedquery_toString" : *"boost(+(fl:term1 +fl:term2
> +fl:term3),int(doc_wt))",
> *matches : *50685
>
> The above query seems to be ignoring the fl:term1 as the result of fl:term2
> AND fl:term3 is exactly 50685.
>
> *Query : *fl:term1 OR (fl:term2 AND fl:term3)
> *parsedquery_toString:* "boost(+(fl:term1 (+fl:term2
> +fl:term3)),int(doc_wt))",
> *matches : *809006
>
> *Query : *(fl:term1 OR fl:term2) AND fl:term3
> *parsedquery_toString:* "boost(+(+(fl:term1 fl:term2)
> +fl:term3),int(doc_wt))",
> *matches : *293949
>
> Per my understanding the terms having + is a must and must be present in
> the document whereas a term without it may or may not be present but query
> one seems to be ignoring the first term completely.
> How the outer plus defines the behavior. E.g. *outer +* in query +(fl:term1
> +fl:term2 +fl:term3)
>
> Thanks,
> Modassar


Inconsistent Shard Usage for Distributed Queries

2016-03-15 Thread Nick Vasilyev
Hello,

I have a brand new installation of Solr 5.4.1 and I am running into a
strange problem with one of my collections. Collection *products* has 5
shards and replication factor of two. Both replicas are up and show green
status on the Cloud page in the UI.

When I run a default search on the query page (q=*:*) I always get a
different numFound although there is no active indexing and everything is
committed. I checked the logs and it looks like every time it runs a
search, it is sent to different shards. Below, search1 went to shard 5, 2
and 4, search2 went to shard 5, 3, 1 and search 3 went to shard 3, 4, 1, 5.

To confirm this, I ran a distrib=false query on shard 5 and got 8,928,379
items, 8,917,318 for shard 2, and 9,005,295 for shard 4. The results from
shard 2 distrib=false query did not match the results that were in the
distributed query (from the logs). The query returned 8917318. Here is the
log entry for the query.

214467874 INFO  (qtp1013423070-21019) [c:products s:shard2 r:core_node7
x:products_shard2_replica2] o.a.s.c.S.Request [products_shard2_replica2]
webapp=/solr path=/select
params={q=*:*&distrib=false&indent=true&wt=json&_=1458056340020}
hits=8917318 status=0 QTime=0


Here are the logs from other queries.

Search 1 - numFound 18309764

213941984 INFO  (qtp1013423070-21046) [c:products s:shard5 r:core_node4
x:products_shard5_replica2] o.a.s.c.S.Request [products_shard5_replica2]
webapp=/solr path=/select
params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.211:9000/solr/products_shard5_replica2/|http://192.168.1.212:9000/solr/products_shard5_replica1/&rows=10&version=2&q=*:*&NOW=1458055805759&isShard=true&wt=javabin&_=1458055814096}
hits=8928379 status=0 QTime=3
213941985 INFO  (qtp1013423070-21028) [c:products s:shard4 r:core_node6
x:products_shard4_replica2] o.a.s.c.S.Request [products_shard4_replica2]
webapp=/solr path=/select
params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.212:9000/solr/products_shard4_replica1/|http://192.168.1.211:9000/solr/products_shard4_replica2/&rows=10&version=2&q=*:*&NOW=1458055805759&isShard=true&wt=javabin&_=1458055814096}
hits=9005295 status=0 QTime=3
213942045 INFO  (qtp1013423070-21042) [c:products s:shard2 r:core_node7
x:products_shard2_replica2] o.a.s.c.S.Request [products_shard2_replica2]
webapp=/solr path=/select
params={q=*:*&indent=true&wt=json&_=1458055814096} hits=18309764 status=0
QTime=81


Search 2 - numFound 27072144
213995779 INFO  (qtp1013423070-21046) [c:products s:shard5 r:core_node4
x:products_shard5_replica2] o.a.s.c.S.Request [products_shard5_replica2]
webapp=/solr path=/select
params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.211:9000/solr/products_shard5_replica2/|http://192.168.1.212:9000/solr/products_shard5_replica1/&rows=10&version=2&q=*:*&NOW=1458055859563&isShard=true&wt=javabin&_=1458055867894}
hits=8928379 status=0 QTime=1
213995781 INFO  (qtp1013423070-20985) [c:products s:shard3 r:core_node10
x:products_shard3_replica2] o.a.s.c.S.Request [products_shard3_replica2]
webapp=/solr path=/select
params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.212:9000/solr/products_shard3_replica1/|http://192.168.1.211:9000/solr/products_shard3_replica2/&rows=10&version=2&q=*:*&NOW=1458055859563&isShard=true&wt=javabin&_=1458055867894}
hits=8980542 status=0 QTime=3
213995785 INFO  (qtp1013423070-21042) [c:products s:shard1 r:core_node9
x:products_shard1_replica2] o.a.s.c.S.Request [products_shard1_replica2]
webapp=/solr path=/select
params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.212:9000/solr/products_shard1_replica1/|http://192.168.1.211:9000/solr/products_shard1_replica2/&rows=10&version=2&q=*:*&NOW=1458055859563&isShard=true&wt=javabin&_=1458055867894}
hits=8914801 status=0 QTime=3
213995798 INFO  (qtp1013423070-21028) [c:products s:shard2 r:core_node7
x:products_shard2_replica2] o.a.s.c.S.Request [products_shard2_replica2]
webapp=/solr path=/select
params={q=*:*&indent=true&wt=json&_=1458055867894} hits=27072144 status=0
QTime=30


Search 3 - numFound 35953734

214022457 INFO  (qtp1013423070-21019) [c:products s:shard3 r:core_node10
x:products_shard3_replica2] o.a.s.c.S.Request [products_shard3_replica2]
webapp=/solr path=/select
params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.212:9000/solr/products_shard3_replica1/|http://192.168.1.211:9000/solr/products_shard3_replica2/&rows=10&version=2&q=*:*&NOW=1458055886247&isShard=true&wt=javabin&_=1458055894580}
hits=8980542 status=0 QTime=0
214022458 INFO  (qtp1013423070-21036) [c:products s:shard4 r:core_node6
x:products_shard4_replica2] o.a.s.c.S.Request [products_shard4_replica2]
webapp=/solr path=/select
params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=http://192.168.1.212:9000/solr/products_shard4_replica1/|http://192.168.1.211:9000/solr/products_shard4_replica2/&rows=10&version=2&q=*:*&NOW=1458055886247&isShard=true&wt=javabin&_=1458055894580}
hits=9005295 status=0 QTime=1
214022459 INFO  (qtp1013423070-21046) [c:products s:shard1 r:core_node9
x:products_shard1_replica2] o.a.s.c.S.Request [products_shard1_replica2]
webapp=/solr path=/select
params={df=text&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=

Re: New to Solr 5.5

2016-03-15 Thread Erick Erickson
Creating a core in stand-alone mode _also_ requires that the config files
are findable, in this case on disk rather than on Zookeeper
(sorry for the misdirection).

So you need to create a directory, usually under solr_home that has
the conf directory in it. That is the "instanceDir" that is one of the
core creation
parameters.

Yes, this is a little arcane

Best,
Erick
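Concretely, for the install described in this thread (paths taken from the thread; basic_configs is the stock 5.5 configset name), the sequence is roughly the following, printed here rather than executed:

```shell
# Assumed paths from this thread; copy a stock configset, then CREATE
SOLR_HOME=/var/solr/data
CORE=netflow
echo "mkdir -p $SOLR_HOME/$CORE"
echo "cp -r server/solr/configsets/basic_configs/conf $SOLR_HOME/$CORE/conf"
echo "curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=$CORE&instanceDir=$CORE'"
```

Once the conf directory exists under the instanceDir, the same CREATE call works from the admin UI as well.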

On Tue, Mar 15, 2016 at 7:34 AM, Bhanu Prasad  wrote:
> I went through the solr-ref-guide. I got a brief idea of how it works, but
> I can't help wondering why I am unable to create a core through the web UI.
> Does it have any dependency on SolrCloud? If I am planning to run a
> standalone instance, do I need to create cores only through the command
> line and with 'solr' user privileges? Please advise.
>
>
>
> On Mon, Mar 14, 2016 at 6:26 PM, Erick Erickson 
> wrote:
>
>> OK, take Cassandra out of it for the time being and spend
>> some time familiarizing yourself with Solr would be my
>> advice ;)
>>
>> Yeah, the Solr documentation is a bit scattered, but your most
>> complete and up to date reference is the Solr reference guide
>> here:
>>
>> In particular, see the
>>
>> https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
>> upconfig command.
>>
>> You can download the complete reference guide through the link in the
>> upper left.
>>
>> The general idea here is that your configurations (solrconfig.xml,
>> schema.xml and all the rest) are
>> kept in Zookeeper. When creating a collection, you must reference that
>> configuration set. The
>> examples automatically push the configuration set up to Zookeeper,
>> which you can see in the
>> adminUI>>cloud>>tree view.
>>
>> Best,
>> Erick
>>
>> On Mon, Mar 14, 2016 at 12:20 PM, Bhanu Prasad 
>> wrote:
>> > Hi Erick,
>> >
>> > I am very new to this, I haven't uploaded any configsets. I need help to
>> > get existing cassandra keyspace into solr to do analysis. I am completely
>> > new to this technology so having trouble with finding right documentation
>> > on how to do it.
>> >
>> > Regards,
>> > Bhanu
>> >
>> > On Mon, Mar 14, 2016 at 3:11 PM, Erick Erickson > >
>> > wrote:
>> >
>> >> What configsets have you uploaded to Solr? The canned example does
>> >> this for you. The configurations must reside in Zookeeper, NOT on the
>> >> local disk. I think that's probably what you're seeing...
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Mon, Mar 14, 2016 at 11:33 AM, Bhanu Prasad 
>> >> wrote:
>> >> > I was able to create a core using create -c option, But this time with
>> >> user
>> >> > as solr. It worked. How can I make sure that the solr user is running
>> the
>> >> > webapplication command requests as well ? Any help ?
>> >> >
>> >> > [solr@solr bin]$ ./solr create -c cassie
>> >> >
>> >> > Copying configuration to new core instance directory:
>> >> > /var/solr/data/cassie
>> >> >
>> >> > Creating new core 'cassie' using command:
>> >> >
>> >>
>> http://localhost:8983/solr/admin/cores?action=CREATE&name=cassie&instanceDir=cassie
>> >> >
>> >> > {
>> >> >   "responseHeader":{
>> >> > "status":0,
>> >> > "QTime":709},
>> >> >   "core":"cassie"}
>> >> >
>> >> > Regards,
>> >> > Bhanu Prasad
>> >> >
>> >> > On Mon, Mar 14, 2016 at 1:30 PM, Bhanu Prasad > >
>> >> > wrote:
>> >> >
>> >> >> Hello,
>> >> >>
>> >> >>
>> >> >> I installed a new solr instance in lab on Cent OS 7
>> >> >>
>> >> >> # java -version
>> >> >> java version "1.8.0_72"
>> >> >> Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
>> >> >> Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)
>> >> >>
>> >> >> #wget
>> http://apache.mirror.gtcomm.net/lucene/solr/5.5.0/solr-5.5.0.tgz
>> >> >> #tar -zxvf solr-5.5.0.tgz
>> >> >> #cd solr-5.5.0
>> >> >>
>> >> >> #bin/install_solr_service.sh /tmp/solr-5.5.0.tgz
>> >> >> #id solr
>> >> >> uid=1000(solr) gid=1000(solr) groups=1000(solr)
>> >> >>
>> >> >> I am getting an error when creating a new core from the UI and CLI.
>> >> Kindly
>> >> >> someone guide me what I am missing ?
>> >> >>
>> >> >> org.apache.solr.common.SolrException: Could not load conf for core
>> >> netflow: Error loading solr config from
>> >> /var/solr/data/netflow/conf/solrconfig.xml
>> >> >>   at
>> >>
>> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:84)
>> >> >>   at
>> >> org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
>> >> >>   at
>> >> org.apache.solr.core.CoreContainer.create(CoreContainer.java:751)
>> >> >>   at
>> >>
>> org.apache.solr.handler.admin.CoreAdminOperation$1.call(CoreAdminOperation.java:129)
>> >> >>   at
>> >>
>> org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:354)
>> >> >>   at
>> >>
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:153)
>> >> >>   at
>> >>
>> 

Re: using data from external service in Solr: value source or auxiliary core?

2016-03-15 Thread Jitka
Charlie, thank you for replying and for the link to the blog about XJoin.
I am still concerned about performance and caching.  Our environment is
distributed and the external service would not be running on localhost.  In
our still experimental set-up, at least, the call to the service is much
too slow to be a viable option without caching.   I am new to Solr caching,
hence all the questions in my original post.

Thanks again,
Jitka

On Mon, Mar 14, 2016 at 2:07 AM, Charlie Hull-3 [via Lucene] <
ml-node+s472066n4263572...@n3.nabble.com> wrote:

> On 11/03/2016 17:36, Jitka wrote:
> > Hello.  Our company uses Solr-4.10 in a distributed environment.  We are
> > considering how best to customize results based on user preferences,
> > information about which is obtained from an external   service.  At
> present
> > the preferences can be expressed as filters, but eventually we might
> want to
> > introduce boosts as well.
>
> You might want to consider XJoin as well: here's a blog post we wrote on
> filtering using price discount data from an external source
>
> http://www.flax.co.uk/blog/2016/01/25/xjoin-solr-part-1-filtering-using-price-discount-data/
>
>
>
> Cheers
>
> Charlie
>
> >
> > The options we are considering are
> >
> > (1) a value source whose FunctionValues() function would make use of
> data
> > from the external service, and
> >
> > (2) a separate core containing docs whose primary key is user_id and
> whose
> > other fields represent data from the service.  We would create filter
> > queries based on joins with this core.
> >
> > Any suggestions would be most welcome.  Are we missing an obvious
> > alternative?
> >
> > I assume that we would want to cache data from the service (in option
> (1))
> > or the join (in option (2)), and possibly also the values of the value
> > source's FunctionValues() function in option (1).  Would it make
> > sense to use SolrCaches for this purpose and register them in
> > solrconfig.xml?   If so, or if not, how would we ensure that these
> caches
> > could be updated whenever the service is updated?  How about ensuring
> that
> > the built-in caches don't prevent Solr from even looking at the custom
> > caches?  We could use the 'cache=false' option for the filterCache;
> would we
> > have to worry about other ones?
> >
> > I see how how a join on a second core could support a filter, but how
> could
> > we use a second core to support boosts?  I suppose that if worse came to
> > worst we could forgo the join and simply make a call to the second core
> on
> > localhost to retrieve data to be used in a value source.  Is that a
> > practical solution?
> >
> > Thanks in advance for your time and advice.
> >
> > Jitka
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/using-data-from-external-service-in-Solr-value-source-or-auxiliary-core-tp4263334.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/using-data-from-external-service-in-Solr-value-source-or-auxiliary-core-tp4263334p4263924.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr and units

2016-03-15 Thread Shane McCarthy
Thank you Binoy.  I like simple and this should work well.

On Tue, Mar 15, 2016 at 11:42 AM, Binoy Dalal 
wrote:

> The simplest thing would be to index a separate field for the unit for each
> distinct value you're storing.
>
> On Tue, Mar 15, 2016 at 7:55 PM Shane McCarthy  wrote:
>
> > I am curious if it is possible to have a unit associated with a number in
> > Solr.  I have a field currently that has a value of x where x is an
> integer
> > or float, can I associate a unit with that?  So I know the value is x
> gram
> > or x Watt.
> >
> > Thank you,
> >
> > Shane
> >
> --
> Regards,
> Binoy Dalal
>
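As a concrete illustration of the separate-unit-field approach: the core and field names below are made up, though the *_f/*_s dynamic-field suffixes are stock Solr defaults. The sketch only prints the update command:

```shell
# Assumed core/field names; pair each numeric field with a unit field
doc='{"id":"1","weight_f":12.5,"weight_unit_s":"gram"}'
echo "curl 'http://localhost:8983/solr/demo/update?commit=true' -H 'Content-Type: application/json' -d '[$doc]'"
```

Keeping the unit in its own field also lets it be faceted or filtered on independently of the numeric value.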


Solr switch zookeeper from memory to standalone

2016-03-15 Thread Rachid Bouacheria
Hi,

I am running Solr 4.x on 3 servers with ZooKeeper embedded in prod.

Each server has 1 leader and 2 replicas.

I want to switch ZooKeeper from embedded to standalone.

I want to know if the steps are documented anywhere? I could not find them.

I am worried my index will get messed up during the transition.

Thank you very much!


Re: Query behavior.

2016-03-15 Thread Jack Krupansky
That was precisely the point of the need for a new Jira - to answer exactly
the questions that you have posed - and that I had proposed as well. Until
some of the senior committers comment on that Jira you won't have answers.
They've painted themselves into a corner and now I am curious how they will
unpaint themselves out of that corner.

-- Jack Krupansky

On Tue, Mar 15, 2016 at 1:46 AM, Modassar Ather 
wrote:

> Thanks Jack for your response.
> The following jira bug for this issue is already present so I have not
> created a new one.
> https://issues.apache.org/jira/browse/SOLR-8812
>
> Kindly help me understand whether it is possible to achieve search on
> ORed terms as it was done in earlier Solr versions.
> Is this behavior intentional or is it a bug? I need to migrate to
> Solr-5.5.0 but not doing so due to this behavior.
>
> Thanks,
> Modassar
>
>
> On Fri, Mar 11, 2016 at 3:18 AM, Jack Krupansky 
> wrote:
>
> > We probably need a Jira to investigate whether this really is an
> explicitly
> > intentional feature change, or whether it really is a bug. And if it
> truly
> > was intentional, how people can work around the change to get the
> desired,
> > pre-5.5 behavior. Personally, I always thought it was a mistake that q.op
> > and mm were so tightly linked in Solr even though they are independent in
> > Lucene.
> >
> > In short, I think people want to be able to set the default behavior for
> > individual terms (MUST vs. SHOULD) if explicit operators are not used,
> and
> > that OR is an explicit operator. And that mm should control only how many
> > SHOULD terms are required (Lucene MinShouldMatch.)
> >
> >
> > -- Jack Krupansky
> >
> > On Thu, Mar 10, 2016 at 3:41 AM, Modassar Ather 
> > wrote:
> >
> > > Thanks Shawn for pointing to the jira issue. I was not sure that if it
> is
> > > an expected behavior or a bug or there could have been a way to get the
> > > desired result.
> > >
> > > Best,
> > > Modassar
> > >
> > > On Thu, Mar 10, 2016 at 11:32 AM, Shawn Heisey 
> > > wrote:
> > >
> > > > On 3/9/2016 10:55 PM, Shawn Heisey wrote:
> > > > > The ~2 syntax, when not attached to a phrase query (quotes) is the
> > way
> > > > > you express a fuzzy query. If it's attached to a query in quotes,
> > then
> > > > > it is a proximity query. I'm not sure whether it means something
> > > > > different when it's attached to a query clause in parentheses,
> > someone
> > > > > with more knowledge will need to comment.
> > > > 
> > > > > https://issues.apache.org/jira/browse/SOLR-8812
> > > >
> > > > After I read SOLR-8812 more closely, it seems that the ~2 syntax with
> > > > parentheses is the way that the effective mm value is expressed for a
> > > > particular query clause in the parsed query.  I've learned something
> > new
> > > > today.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
> >
>


need a help

2016-03-15 Thread Adel Mohamed Khalifa
Any help for my problem, please?

Hello everybody,
 
I built a website (Java EE) and want to search some JSON files, so I installed
the Solr server on an Ubuntu server, created a new core, and indexed the JSON
files. The search worked correctly, but when I moved my code from Windows to
the server it stopped and cannot connect to the Solr server. I tried to debug
using NetBeans on Ubuntu; it stops with no exception on this statement:
(SolrServer server = new HttpSolrServer("localhost:8983/solr/SearchCore")).

I need some help, please.
 
Note :- I attached the servlet I used to search and connect to solr server.
 
Regards,
Adel Khalifa




Re: Solr and units

2016-03-15 Thread Binoy Dalal
The simplest thing would be to index a separate field for the unit for each
distinct value you're storing.

On Tue, Mar 15, 2016 at 7:55 PM Shane McCarthy  wrote:

> I am curious if it is possible to have a unit associated with a number in
> Solr.  I have a field currently that has a value of x where x is an integer
> or float, can I associate a unit with that?  So I know the value is x gram
> or x Watt.
>
> Thank you,
>
> Shane
>
-- 
Regards,
Binoy Dalal


Re: New to Solr 5.5

2016-03-15 Thread Bhanu Prasad
I went through the solr-ref-guide. I got a brief idea of how it works, but
I can't help wondering why I am unable to create a core through the web
UI. Does it have any dependency on SolrCloud? If I am planning to run a
standalone instance, do I need to create cores only through the command
line and with 'solr' user privileges? Please advise.



On Mon, Mar 14, 2016 at 6:26 PM, Erick Erickson 
wrote:

> OK, take Cassandra out of it for the time being and spend
> some time familiarizing yourself with Solr would be my
> advice ;)
>
> Yeah, the Solr documentation is a bit scattered, but your most
> complete and up to date reference is the Solr reference guide
> here:
>
> In particular, see the
>
> https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
> upconfig command.
>
> You can download the complete reference guide through the link in the
> upper left.
>
> The general idea here is that your configurations (solrconfig.xml,
> schema.xml and all the rest) are
> kept in Zookeeper. When creating a collection, you must reference that
> configuration set. The
> examples automatically push the configuration set up to Zookeeper,
> which you can see in the
> adminUI>>cloud>>tree view.
>
> Best,
> Erick
>
> On Mon, Mar 14, 2016 at 12:20 PM, Bhanu Prasad 
> wrote:
> > Hi Erick,
> >
> > I am very new to this; I haven't uploaded any configsets. I need help to
> > get an existing Cassandra keyspace into Solr to do analysis. I am
> > completely new to this technology, so I am having trouble finding the
> > right documentation on how to do it.
> >
> > Regards,
> > Bhanu
> >
> > On Mon, Mar 14, 2016 at 3:11 PM, Erick Erickson  >
> > wrote:
> >
> >> What configsets have you uploaded to Solr? The canned example does
> >> this for you. The configurations must reside in Zookeeper, NOT on the
> >> local disk. I think that's probably what you're seeing...
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Mar 14, 2016 at 11:33 AM, Bhanu Prasad 
> >> wrote:
> >> > I was able to create a core using the create -c option, but this time
> >> > as the solr user. It worked. How can I make sure that the solr user is
> >> > running the web application command requests as well? Any help?
> >> >
> >> > [solr@solr bin]$ ./solr create -c cassie
> >> >
> >> > Copying configuration to new core instance directory:
> >> > /var/solr/data/cassie
> >> >
> >> > Creating new core 'cassie' using command:
> >> >
> >>
> http://localhost:8983/solr/admin/cores?action=CREATE&name=cassie&instanceDir=cassie
> >> >
> >> > {
> >> >   "responseHeader":{
> >> > "status":0,
> >> > "QTime":709},
> >> >   "core":"cassie"}
> >> >
> >> > Regards,
> >> > Bhanu Prasad
> >> >
> >> > On Mon, Mar 14, 2016 at 1:30 PM, Bhanu Prasad  >
> >> > wrote:
> >> >
> >> >> Hello,
> >> >>
> >> >>
> >> >> I installed a new solr instance in lab on Cent OS 7
> >> >>
> >> >> # java -version
> >> >> java version "1.8.0_72"
> >> >> Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
> >> >> Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)
> >> >>
> >> >> #wget
> http://apache.mirror.gtcomm.net/lucene/solr/5.5.0/solr-5.5.0.tgz
> >> >> #tar -zxvf solr-5.5.0.tgz
> >> >> #cd solr-5.5.0
> >> >>
> >> >> #bin/install_solr_service.sh /tmp/solr-5.5.0.tgz
> >> >> #id solr
> >> >> uid=1000(solr) gid=1000(solr) groups=1000(solr)
> >> >>
> >> >> I am getting an error when creating a new core from the UI and CLI.
> >> Kindly
> >> >> someone guide me what I am missing ?
> >> >>
> >> >> org.apache.solr.common.SolrException: Could not load conf for core
> >> netflow: Error loading solr config from
> >> /var/solr/data/netflow/conf/solrconfig.xml
> >> >>   at
> >>
> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:84)
> >> >>   at
> >> org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
> >> >>   at
> >> org.apache.solr.core.CoreContainer.create(CoreContainer.java:751)
> >> >>   at
> >>
> org.apache.solr.handler.admin.CoreAdminOperation$1.call(CoreAdminOperation.java:129)
> >> >>   at
> >>
> org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:354)
> >> >>   at
> >>
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:153)
> >> >>   at
> >>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
> >> >>   at
> >>
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:676)
> >> >>   at
> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:439)
> >> >>   at
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
> >> >>   at
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
> >> >>   at
> >>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> >> 

Solr and units

2016-03-15 Thread Shane McCarthy
I am curious if it is possible to have a unit associated with a number in
Solr.  I have a field currently that has a value of x where x is an integer
or float, can I associate a unit with that?  So I know the value is x gram
or x Watt.

Thank you,

Shane


Re: Re: Avoid Duplication of record in searching

2016-03-15 Thread Jack Krupansky
It's called "live indexing" and is in DSE 4.7:
http://docs.datastax.com/en/datastax_enterprise/4.7/datastax_enterprise/srch/srchConfIncrIndexThruPut.html


-- Jack Krupansky

On Tue, Mar 15, 2016 at 4:41 AM,  wrote:

> Hi Jack,
> I am using DSE Search in  Datastax DSE 4.7.3, Cassandra version -
> Cassandra 2.1.8.689
>
> Which recent version of DSE has the real-time search feature that does not
> require commit?
>
>
> ---
> Thanks and regards,
> Rajesh Kumar Sountarrajan
> Software Developer - IT Team
>
> Mobile: 91 - 9600984804
> Email - rajeshkuma...@maxval-ip.com
>
>
> - Original Message - Subject: Re: Avoid Duplication of
> record in searching
> From: "Jack Krupansky" 
> Date: 3/14/16 9:57 pm
> To: solr-user@lucene.apache.org
>
> Are you using DSE Search or some custom integration of Solr and Cassandra?
>
>  Generally, changes in Solr are only visible after a commit operation is
>  performed, either an explicit commit or a time-based auto-commit. Recent
>  DSE Search also has a real-time search feature that does not require
> commit
>  - are you using that?
>
>  -- Jack Krupansky
>
>  On Mon, Mar 14, 2016 at 12:18 PM,  wrote:
>
>  > HI,
>  > I am having SOLR search on a Cassandra table. When I do some updates in
>  > the Cassandra table to which SOLR is configured, the updated record gets
>  > duplicated in SOLR search. But when we re-index SOLR, we get unique
>  > records.
>  >
>  > We can re-index every time via the application before the search process
>  > is started, but it will degrade the performance of the search.
>  >
>  > So kindly, can anyone point me to what can be done to avoid duplication
>  > if we make updates to the Cassandra table configured in SOLR.
>  >
>  >
>  > ---
>  > Thanks and regards,
>  > Rajesh Kumar Sountarrajan
>  > Software Developer - IT Team
>  >
>


Re: solr & docker in production

2016-03-15 Thread Jay Potharaju
I have not tried it in production yet; I will post my findings.
Thanks
Jay

> On Mar 14, 2016, at 11:42 PM, Georg Sorst  wrote:
> 
> Hi,
> 
> sounds great!
> 
> Did you run any benchmarks? What's the IO penalty?
> 
> Best,
> Georg
> 
Jay Potharaju  wrote on Tue, 15 Mar 2016, 04:25:
> 
>> Upayavira,
>> Thanks for the feedback.  I plan to deploy solr on its own instance rather
>> than on instance running multiple applications.
>> 
>> Jay
>> 
>>> On Mon, Mar 14, 2016 at 3:19 PM, Upayavira  wrote:
>>> 
>>> There is a default Docker image for Solr on the Docker Registry. I've
>>> used it to great effect in creating a custom Solr install.
>>> 
>>> The main thing I'd say is that Docker generally encourages you to run
>>> many apps on the same host, whereas Solr benefits hugely from a host of
>>> its own - so don't be misled into installing Solr alongside lots of
>>> other things.
>>> 
>>> Even if the only thing that gets put onto a node is a Docker install,
>>> then a Solr Docker image, it is *still* way easier to do than anything
>>> else I've tried and still very worth it.
>>> 
>>> Upayavira (who doesn't, yet, have Dockerised Solr in production, but
>>> will soon)
>>> 
 On Mon, 14 Mar 2016, at 07:53 PM, Jay Potharaju wrote:
 Hi,
 I was wondering about running Solr inside a docker container. Are there
 any recommendations for this?
 
 
 --
 Thanks
 Jay
>> 
>> 
>> 
>> --
>> Thanks
>> Jay Potharaju
> -- 
> *Georg M. Sorst I CTO*
> FINDOLOGIC GmbH
> 
> 
> 
> Jakob-Haringer-Str. 5a | 5020 Salzburg I T.: +43 662 456708
> E.: g.so...@findologic.com
> www.findologic.com Folgen Sie uns auf: XING
> facebook
>  Twitter
> 
> 
See you at the *Shopware Community Day in Ahaus on 20.05.2016!* Arrange
an appointment here!
See you at *dmexco in Köln on 14.09. and 15.09.2016!* Arrange
an appointment here!


Re: Importing data from SQL server to Solr (Event or realtime)

2016-03-15 Thread Shawn Heisey
On 3/15/2016 2:12 AM, Pascal Ruppert wrote:
> Hi, I'd like to know how the DIH handles updating Solr. Does it update after
> a specific amount of time, or is there some trigger that activates the DIH
> every time something is committed to the RDBMS?
> It would be best if there were something like "realtime" synchronization
> between Solr and our SQL servers.
>
> Also, how reliable is the DIH? I read that Elasticsearch's JDBC plugin has
> some problems reconstructing data with too many joins. Are there any issues
> with Solr in that way?

Every modern operating system comes with scheduling capability.  For
UNIX/Linux, it's cron.  For Windows, it's the task scheduler.

Solr itself has no scheduling capability built in ... and with
well-tested and debugged scheduling already available to you, it's not
likely that it ever will.

I use DIH when I need to do a full rebuild on my index.  It has always
been reliable.  I am *indirectly* doing joins -- on the DB server, with
a view.  Solr itself should be unaffected by joins.  If your database
software can handle the join query without problems, DIH should work
with it.

You might need to increase your maxMergeCount setting in solrconfig.xml
(to 6) if you're importing millions of records on a single DIH run, or
you might find that the database connection will time out and close. 
Here's a couple of mailing list messages (both from the same thread)
with some details:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3c50ed9da9.1030...@elyograg.org%3E
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3c1357883772192-4032440.p...@n3.nabble.com%3E

Thanks,
Shawn
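
As a concrete sketch of the cron approach described above (host, port, core name, and schedule are all illustrative assumptions, not from this thread), a crontab entry can trigger a DIH delta-import periodically:

```cron
# Hypothetical crontab entry: ask DIH for a delta-import every 10 minutes.
# delta-import requires deltaQuery support in the DIH configuration.
*/10 * * * * curl -s "http://localhost:8983/solr/mycore/dataimport?command=delta-import" > /dev/null
```

For a full rebuild, command=full-import is used instead; this only schedules requests and adds no realtime behavior.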



Re: indexing pdf files using post tool

2016-03-15 Thread roshan agarwal
Yes vidya, you just have to use copy field

Roshan

On Tue, Mar 15, 2016 at 3:07 PM, vidya  wrote:

> Hi
> I got data into my content field, but I wanted different fields to be
> allocated for the data in my file. How can I achieve this?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/indexing-pdf-files-using-post-tool-tp4263811p4263840.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Roshan Agarwal
Managing Director
Siddhast IP Innovation (P) Ltd
Phone: +91 11-65246257
M:+91 9871549769
email: ros...@siddhast.com
-
About SIDDHAST(www.siddhast.com)
SIDDHAST is a research and analytical company, which provide service in the
following area-Intellectual Property, Market Research, Business
Research,Technology Transfer. The company is Incorporated in March 2007,
and has completed more than 100 assignments.
URL: www.siddhast.com

--
This message (including attachments, if any) is confidential and may be
privileged. Before opening the attachments please check them for viruses
and defects. M/s Siddhast Intellectual Property Innovations Pvt Ltd will
not be responsible for any viruses or defects or any forwarded attachments
emanating either from within SIDDHAST or outside.


Re: indexing pdf files using post tool

2016-03-15 Thread Binoy Dalal
You should use copy fields.
https://cwiki.apache.org/confluence/display/solr/Copying+Fields

On Tue, 15 Mar 2016, 15:07 vidya,  wrote:

> Hi
> I got data into my content field, but I wanted different fields to be
> allocated for the data in my file. How can I achieve this?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/indexing-pdf-files-using-post-tool-tp4263811p4263840.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal
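
A minimal schema.xml sketch of the copyField idea discussed above (field and type names here are illustrative assumptions, not from this thread):

```xml
<!-- Field that receives the Tika-extracted text of the uploaded file -->
<field name="content" type="text_general" indexed="true" stored="true"/>
<!-- A separate field to hold just the document title -->
<field name="doc_title" type="text_general" indexed="true" stored="true"/>
<!-- Copy the extracted "title" metadata field into the custom field at index time -->
<copyField source="title" dest="doc_title"/>
```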


RE: how to force rescan of core.properties file in solr

2016-03-15 Thread Gian Maria Ricci - aka Alkampfer
Actually, a simple solution I've found is unloading the core and recreating
it, passing parameters to the action=CREATE API call.

Gian Maria.
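
A sketch of that unload-and-recreate sequence via the CoreAdmin API (the core name and property below are illustrative assumptions; the property.* prefix on CREATE writes entries into the new core.properties):

```shell
# Hypothetical example: unload the core definition (index data is kept by
# default), then recreate it with an updated core property.
curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=mycore"
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&instanceDir=mycore&property.myProp=newValue"
```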

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: giovedì 10 marzo 2016 15:45
To: solr-user@lucene.apache.org
Subject: Re: how to force rescan of core.properties file in solr

On 3/10/2016 3:00 AM, Gian Maria Ricci - aka Alkampfer wrote:
> but this change in core.properties is not available until I restart 
> the service and Solr does core autodiscovery. Issuing a Core RELOAD 
> does not work.
>
>  
>
> How I can force solr to reload core.properties when I change it?
>

Through your experiments, you have confirmed something that I
suspected: the core.properties file is only read when Solr first starts up --
during core discovery. I think it would probably be a very major effort to
change this, but there may be a much easier way: the project could allow
properties that can change on reload.

From what I can read in the code, not even the filename in the "properties"
element in the core.properties file is re-checked on core reload.  The reload
action simply re-uses the CoreDescriptor object, which is where these things
are held.

Unless there's another properties file that I'm not aware of that *does* get 
checked when a core gets re-loaded, I think you've got an excellent use case 
for an "Improvement" issue in Jira.

Here's the change that I think Solr needs:  When a core is reloaded, all 
property definitions that originated in the file referenced by the "properties" 
property should be dropped and re-read.

Thanks,
Shawn



Re: indexing pdf files using post tool

2016-03-15 Thread vidya
Hi
I got data into my content field, but I wanted different fields to be
allocated for the data in my file. How can I achieve this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-pdf-files-using-post-tool-tp4263811p4263840.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue with Auto Suggester Component

2016-03-15 Thread Alessandro Benedetti
Hi Manohar,
It is not clear to me what your ideal ranking of suggestions should be.

"I want prefix search of
entire keyword to be of high preference (#1 to #5 in the below example)
followed by prefix part of any other string (the last 2 in the below
example). I am not bothered about ordering within 1st and 2nd set.

ABC Corporation
ABCD Corporation
Abc Tech
AbCorporation
ABCD company
The ABC Company
The ABCDEF"

Could you take the example you posted and show an example query and the
expected sort order?
According to your description of the problem:
Query : abc
Criterion 1 : entire keyword to be of high preference
I can't understand why you didn't count #3 and #6 but did count #5.

Criterion 2 : followed by prefix part of any other string
It is not that clear; you probably mean all the rest.
Anyway, an infix lookup algorithm with a boost for exact search should do
the trick.

Please give us some more details!

Cheers

On Tue, Mar 15, 2016 at 8:19 AM, Manohar Sripada 
wrote:

> Consider the below company names indexed. I want the below auto suggestions
> to be listed when searched for "abc". Basically, I want prefix search of
> entire keyword to be of high preference (#1 to #5 in the below example)
> followed by prefix part of any other string (the last 2 in the below
> example). I am not bothered about ordering within 1st and 2nd set.
>
> ABC Corporation
> ABCD Corporation
> Abc Tech
> AbCorporation
> ABCD company
> The ABC Company
> The ABCDEF
>
> I am using Suggest feature of solr as mentioned in the wiki
> . I used
> different Lookup implementations available, but, I couldn't get the result
> as above. Here's is one sample config I used with BlendedInfixLookupFactory
>
>
>  businessNameBlendedInfixSuggester1
>  BlendedInfixLookupFactory
>  DocumentDictionaryFactory
>  business_name_suggest
>  id
>  text_suggest
>  business_name
>  linear
>  true
>  /app/solrnode/suggest_test_1_blendedinfix1
>  0
>  true
>  true
>  false
>
> Can someone please suggest on how I can achieve this?
>
> Thanks,
> Manohar
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: Re: Avoid Duplication of record in searching

2016-03-15 Thread rajeshkumar . s
Hi Jack,
I am using DSE Search in  Datastax DSE 4.7.3, Cassandra version -  
Cassandra 2.1.8.689 
 
Which recent version of DSE has the real-time search feature that does not
require commit?
 
 
---
Thanks and regards,
Rajesh Kumar Sountarrajan
Software Developer - IT Team
 
Mobile: 91 - 9600984804
Email - rajeshkuma...@maxval-ip.com
 
 
- Original Message - Subject: Re: Avoid Duplication of record 
in searching
From: "Jack Krupansky" 
Date: 3/14/16 9:57 pm
To: solr-user@lucene.apache.org

Are you using DSE Search or some custom integration of Solr and Cassandra?
 
 Generally, changes in Solr are only visible after a commit operation is
 performed, either an explicit commit or a time-based auto-commit. Recent
 DSE Search also has a real-time search feature that does not require commit
 - are you using that?
 
 -- Jack Krupansky
 
 On Mon, Mar 14, 2016 at 12:18 PM,  wrote:
 
 > HI,
 > I am having SOLR search on a Cassandra table. When I do some updates in
 > the Cassandra table to which SOLR is configured, the updated record gets
 > duplicated in SOLR search. But when we re-index SOLR, we get unique
 > records.
 >
 > We can re-index every time via the application before the search process is
 > started, but it will degrade the performance of the search.
 >
 > So kindly, can anyone point me to what can be done to avoid duplication if
 > we make updates to the Cassandra table configured in SOLR.
 >
 >
 > ---
 > Thanks and regards,
 > Rajesh Kumar Sountarrajan
 > Software Developer - IT Team
 >


Issue with Auto Suggester Component

2016-03-15 Thread Manohar Sripada
Consider the below company names indexed. I want the below auto suggestions
to be listed when searched for "abc". Basically, I want prefix search of
entire keyword to be of high preference (#1 to #5 in the below example)
followed by prefix part of any other string (the last 2 in the below
example). I am not bothered about ordering within 1st and 2nd set.

ABC Corporation
ABCD Corporation
Abc Tech
AbCorporation
ABCD company
The ABC Company
The ABCDEF

I am using Suggest feature of solr as mentioned in the wiki
. I used
different Lookup implementations available, but, I couldn't get the result
as above. Here's is one sample config I used with BlendedInfixLookupFactory


 businessNameBlendedInfixSuggester1
 BlendedInfixLookupFactory
 DocumentDictionaryFactory
 business_name_suggest
 id
 text_suggest
 business_name
 linear
 true
 /app/solrnode/suggest_test_1_blendedinfix1
 0
 true
 true
 false

Can someone please suggest on how I can achieve this?

Thanks,
Manohar


Importing data from SQL server to Solr (Event or realtime)

2016-03-15 Thread Pascal Ruppert
Hi, I'd like to know how the DIH handles updating Solr. Does it update after a
specific amount of time, or is there some trigger that activates the DIH every
time something is committed to the RDBMS?
It would be best if there were something like "realtime" synchronization
between Solr and our SQL servers.

Also, how reliable is the DIH? I read that Elasticsearch's JDBC plugin has some
problems reconstructing data with too many joins. Are there any issues with
Solr in that way?

Thanks for your help.


Re: Need a group custom function(fieldcollapsing)

2016-03-15 Thread Binoy Dalal
What you need is a search component. I've written an example on how to use
one.
Check https://github.com/lttazz99/SolrPluginsExamples

On Tue, 15 Mar 2016, 13:37 Abhishek Mishra,  wrote:

> Any update on this???
>
> On Mon, Mar 14, 2016 at 4:06 PM, Abhishek Mishra 
> wrote:
>
> > Hi all
> > We are running on solr5.2.1 . Now the requirement come that we need the
> > data on basis on some algo. The algorithm part we need to put on result
> > obtained from query. So best we can do is using
> > group.field,group.main,group.func. In group.func we need to use custom
> > function which will run the algorithm part. My doubts are where we need
> to
> > put custom function in which file??.  I found some articles related to
> this
> > https://dzone.com/articles/how-write-custom-solr
> > in this it's not explained where to put the code part in which file.
> >
> >
> > Regards,
> > Abhishek
> >
>
-- 
Regards,
Binoy Dalal


Re: Need a group custom function(fieldcollapsing)

2016-03-15 Thread Abhishek Mishra
Any update on this???

On Mon, Mar 14, 2016 at 4:06 PM, Abhishek Mishra 
wrote:

> Hi all
> We are running on Solr 5.2.1. A requirement has come up that we need the
> data ordered by some algorithm. The algorithm part needs to be applied to
> the result obtained from the query, so the best we can do is use
> group.field, group.main, and group.func. In group.func we need to use a
> custom function which will run the algorithm part. My doubt is where we
> need to put the custom function, and in which file. I found some articles
> related to this: https://dzone.com/articles/how-write-custom-solr
> but it is not explained where to put the code, or in which file.
>
>
> Regards,
> Abhishek
>


Re: return and highlight the most relevant child with BlockJoinQuery

2016-03-15 Thread michael solomon
Thanks Mikhail,
Regarding the former :) can you elaborate? I didn't understand the
context of the JIRA issue that you mentioned (SOLR-8202).

Regarding highlighting, I think it's possible because:
https://issues.apache.org/jira/browse/LUCENE-5929
BUT HOW?

On Mon, Mar 14, 2016 at 7:28 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Michae,
> Regarding the former, it's not a feature of [child] result transformer, it
> might be separately requested, but I prefer to provide via generic
> SOLR-8202.
>
> Regarding highlighting, I can't comment, I only saw that there is some
> highlighting case for {!parent} queries. Sorry.
>
> On Mon, Mar 14, 2016 at 6:13 PM, michael solomon 
> wrote:
>
> > Hi,
> > how can I *highlight* and *return* the most relevant child with
> > BlockJoinQuery.
> > for this:
> >
> > > {!parent which="is_parent:*" score=max}(title:(terms)
> >
> >
> > I expect to get:
> >
> > .
> > .
> > .
> > docs:[
> >
> > {
> >doc parent
> >_childDocuments_:{the most relevant child}
> > }
> > {
> >doc parent2
> >_childDocuments_:{the most relevant child}
> > }
> > .
> > .
> > .
> >
> > ]
> > highlight:{
> >
> > {doc parent: highlight from the children}
> > {doc parent: highlight from the children}
> >
> > }
> >
> > Thanks a lot,
> > Michae
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Understanding parsed queries.

2016-03-15 Thread Modassar Ather
Hi,

Kindly help me understand the parsed queries of the following three queries.
How can these parsed queries be interpreted as boolean logic?
Please ignore the boost part.

*Query : *fl:term1 OR fl:term2 AND fl:term3
*"parsedquery_toString" : *"boost(+(fl:term1 +fl:term2
+fl:term3),int(doc_wt))",
*matches : *50685

The above query seems to be ignoring the fl:term1 as the result of fl:term2
AND fl:term3 is exactly 50685.

*Query : *fl:term1 OR (fl:term2 AND fl:term3)
*parsedquery_toString:* "boost(+(fl:term1 (+fl:term2
+fl:term3)),int(doc_wt))",
*matches : *809006

*Query : *(fl:term1 OR fl:term2) AND fl:term3
*parsedquery_toString:* "boost(+(+(fl:term1 fl:term2)
+fl:term3),int(doc_wt))",
*matches : *293949

Per my understanding, a term having + is a MUST and must be present in
the document, whereas a term without it may or may not be present, but query
one seems to be ignoring the first term completely.
How does the outer plus define the behavior? E.g. the *outer +* in the query
+(fl:term1 +fl:term2 +fl:term3)

Thanks,
Modassar
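
One way to read these parsed queries is through Lucene's MUST (+) vs. SHOULD clause semantics; the toy Python model below (my own illustration, not Solr code) shows why +(fl:term1 +fl:term2 +fl:term3) matches exactly the documents containing both term2 and term3, with term1 contributing only to scoring:

```python
def boolean_matches(doc_terms, clauses):
    """Toy model of a Lucene BooleanQuery clause list.

    clauses: list of (term, occur) pairs, where occur is "MUST" or "SHOULD".
    When MUST clauses are present (and minimum-should-match is 0), SHOULD
    clauses affect only scoring, not matching.
    """
    musts = [t for t, o in clauses if o == "MUST"]
    shoulds = [t for t, o in clauses if o == "SHOULD"]
    if musts:
        return all(t in doc_terms for t in musts)
    # A pure-SHOULD query needs at least one matching term.
    return any(t in doc_terms for t in shoulds)

# Query 1: +(fl:term1 +fl:term2 +fl:term3) -> term1 SHOULD, term2/term3 MUST.
q1 = [("term1", "SHOULD"), ("term2", "MUST"), ("term3", "MUST")]
print(boolean_matches({"term2", "term3"}, q1))  # True: term1 is not required
print(boolean_matches({"term1", "term2"}, q1))  # False: MUST term3 is missing

# Query 2: +(fl:term1 (+fl:term2 +fl:term3)) -> term1 OR (term2 AND term3).
def query2_matches(doc_terms):
    return ("term1" in doc_terms) or boolean_matches(
        doc_terms, [("term2", "MUST"), ("term3", "MUST")])

print(query2_matches({"term1"}))  # True: the optional sub-clause matches
```

This mirrors the observation above: the first query's match count equals term2 AND term3, because the un-plussed term1 clause becomes optional once any MUST clause is present.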


Re: indexing pdf files using post tool

2016-03-15 Thread Binoy Dalal
Do you have a "content" field defined in your schema? Is it stored?

By default, the content from the docs uploaded through post should be
mapped to a field called "content".

On Tue, 15 Mar 2016, 12:47 vidya,  wrote:

> Hi
> I am trying to index a PDF file using the post tool on my Linux system. When
> I give the command
> bin/post -c core2 -p 8984 /root/solr/My_CV.pdf
> it shows the search results like
> "response": {
> "numFound": 1,
> "start": 0,
> "docs": [
>   {
> "id": "/root/solr-5.5.0/My_CV.pdf",
> "meta_creation_date": [
>   "2016-03-15T06:22:17Z"
> ],
> "pdf_pdfversion": [
>   1.4
> ],
> "dcterms_created": [
>   "2016-03-15T06:22:17Z"
> ],
> "x_parsed_by": [
>   "org.apache.tika.parser.DefaultParser",
>   "org.apache.tika.parser.pdf.PDFParser"
> ],
> "xmptpg_npages": [
>   1
> ],
> "creation_date": [
>   "2016-03-15T06:22:17Z"
> ],
> "pdf_encrypted": [
>   false
> ],
> "title": [
>   "My CV"
> ],
> "stream_content_type": [
>   "application/pdf"
> ],
> "created": [
>   "Tue Mar 15 06:22:17 UTC 2016"
> ],
> "stream_size": [
>   18289
> ],
> "dc_format": [
>   "application/pdf; version=1.4"
> ],
> "producer": [
>   "wkhtmltopdf"
> ],
> "content_type": [
>   "application/pdf"
> ],
> "xmp_creatortool": [
>   "þÿ"
> ],
> "resourcename": [
>   "/root/solr/My_CV.pdf"
> ],
> "dc_title": [
>   "My CV"
> ],
> "_version_": 1528851429701189600
>   }
>
>
> but not the actual content of the PDF file.
> How do I index that data?
> Please help me with this.
> Can the post tool be used for indexing data from HDFS?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/indexing-pdf-files-using-post-tool-tp4263811.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


indexing pdf files using post tool

2016-03-15 Thread vidya
Hi
I am trying to index a PDF file using the post tool on my Linux system. When I
give the command
bin/post -c core2 -p 8984 /root/solr/My_CV.pdf
it shows the search results like
"response": {
"numFound": 1,
"start": 0,
"docs": [
  {
"id": "/root/solr-5.5.0/My_CV.pdf",
"meta_creation_date": [
  "2016-03-15T06:22:17Z"
],
"pdf_pdfversion": [
  1.4
],
"dcterms_created": [
  "2016-03-15T06:22:17Z"
],
"x_parsed_by": [
  "org.apache.tika.parser.DefaultParser",
  "org.apache.tika.parser.pdf.PDFParser"
],
"xmptpg_npages": [
  1
],
"creation_date": [
  "2016-03-15T06:22:17Z"
],
"pdf_encrypted": [
  false
],
"title": [
  "My CV"
],
"stream_content_type": [
  "application/pdf"
],
"created": [
  "Tue Mar 15 06:22:17 UTC 2016"
],
"stream_size": [
  18289
],
"dc_format": [
  "application/pdf; version=1.4"
],
"producer": [
  "wkhtmltopdf"
],
"content_type": [
  "application/pdf"
],
"xmp_creatortool": [
  "þÿ"
],
"resourcename": [
  "/root/solr/My_CV.pdf"
],
"dc_title": [
  "My CV"
],
"_version_": 1528851429701189600
  }


but not the actual content of the PDF file.
How do I index that data?
Please help me with this.
Can the post tool be used for indexing data from HDFS?
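One way to narrow this down (a hedged sketch, not from this thread: the target
field name `text_content`, the port, and the document id are assumptions) is to
build the extract request explicitly, mapping Tika's `content` output into a
field that is stored in the schema via the extract handler's `fmap.*`
parameters:

```python
# Hedged sketch: construct the parameters that bin/post would pass through to
# Solr's /update/extract handler. "text_content" is an assumed field name --
# it must exist (and be stored) in your schema for the body text to come back
# in search results.
from urllib.parse import urlencode

def build_extract_request(base_url, core, doc_id, content_field="text_content"):
    """Return (url, params) for an extract request that maps Tika's
    'content' output into `content_field` and commits immediately."""
    params = {
        "literal.id": doc_id,           # explicit document id
        "fmap.content": content_field,  # route the extracted body text here
        "uprefix": "ignored_",          # prefix for unknown Tika-derived fields
        "commit": "true",
    }
    url = f"{base_url}/solr/{core}/update/extract?{urlencode(params)}"
    return url, params

url, params = build_extract_request("http://localhost:8984", "core2",
                                    "/root/solr/My_CV.pdf")
print(url)
```

The same `fmap.content=...` mapping can be passed on the command line via
`bin/post -params`, so the PDF body ends up in a field you can actually
retrieve.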



--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-pdf-files-using-post-tool-tp4263811.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr & docker in production

2016-03-15 Thread Georg Sorst
Hi,

Sounds great!

Did you run any benchmarks? What's the IO penalty?

Best,
Georg

Jay Potharaju  wrote on Tue., Mar 15, 2016, 04:25:

> Upayavira,
> Thanks for the feedback. I plan to deploy Solr on its own instance rather
> than on instance running multiple applications.
>
> Jay
>
> On Mon, Mar 14, 2016 at 3:19 PM, Upayavira  wrote:
>
> > There is a default Docker image for Solr on the Docker Registry. I've
> > used it to great effect in creating a custom Solr install.
> >
> > The main thing I'd say is that Docker generally encourages you to run
> > many apps on the same host, whereas Solr benefits hugely from a host of
> > its own - so don't be misled into installing Solr alongside lots of
> > other things.
> >
> > Even if the only thing that gets put onto a node is a Docker install,
> > then a Solr Docker image, it is *still* way easier to do than anything
> > else I've tried and still very worth it.
> >
> > Upayavira (who doesn't, yet, have Dockerised Solr in production, but
> > will soon)
> >
> > On Mon, 14 Mar 2016, at 07:53 PM, Jay Potharaju wrote:
> > > Hi,
> > > I was wondering about running Solr inside a Docker container. Are there
> > > any recommendations for this?
> > >
> > >
> > > --
> > > Thanks
> > > Jay
> >
>
>
>
> --
> Thanks
> Jay Potharaju
>
-- 
*Georg M. Sorst | CTO*
FINDOLOGIC GmbH



Jakob-Haringer-Str. 5a | 5020 Salzburg | T.: +43 662 456708
E.: g.so...@findologic.com
www.findologic.com | Follow us on: XING, Facebook, Twitter

See you at the *Shopware Community Day in Ahaus on 20.05.2016!* Book an
appointment here!
See you at *dmexco in Cologne on 14.09 and 15.09.2016!* Book an
appointment here!
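
The dedicated-host, plain-Docker-install setup described above can be sketched
as a minimal compose file (a hedged sketch, not from this thread: the image
tag, port, and volume path are assumptions to check against the official
image's own documentation):

```yaml
# Minimal single-node Solr service using the official "solr" image from the
# Docker registry. Pinning a tag and persisting the index on a named volume
# keeps index IO on the host disk and survives container recreation.
version: "3"
services:
  solr:
    image: solr:8          # tag is an assumption; pin whatever you deploy
    ports:
      - "8983:8983"
    volumes:
      - solr-data:/var/solr   # /var/solr is the data dir in recent image tags
volumes:
  solr-data:
```

Running Solr as the only service on the node, as suggested above, keeps the
container from competing with other applications for page cache and disk IO.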