Re: what to try next for replica that will not stay up.

2016-09-01 Thread John Bickerstaff
This may be too simplistic of course, but what if you totally wipe Solr off
that machine, re-install it from scratch, bring it up and let it join the
cluster, then add the replica?
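
Something like this with the Collections API, for the record (untested;
the collection, shard, and replica names are placeholders for yours):

  curl 'http://host:8983/solr/admin/collections?action=DELETEREPLICA&collection=documents&shard=shard1&replica=core_node5'
  curl 'http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=documents&shard=shard1&node=rebuilthost:8983_solr'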

If it was me (since I have the luxury of VMs) I'd turn it off and build a
new VM from the ground up, but I get that this is a luxury not everyone
has...

Possibly it will work - and if it doesn't, that would, to my mind, point to
something somewhere else, not that machine.

Of course, if porting all the data over to the new replica is too expensive
in time or bandwidth, that wouldn't be a good option...

Feel free to ignore - I haven't read the entire thread carefully...

On Thu, Sep 1, 2016 at 9:45 AM, Jon Hawkesworth <
jon.hawkeswo...@medquist.onmicrosoft.com> wrote:

> Hi
>
>
>
> If anyone has any suggestions of things I could try to resolve my issue
> where one replica on one of my SolrCloud 6.0.1 shards refuses to stay up,
> I'd love to hear them.  In fact, I'll get you something off your Amazon
> wishlist, within reason, if you can solve this puzzle.
>
>
>
> Today we pruned the dead replica, restarted the machine where it ran, and
> once the node had rejoined the cluster, we added a new replica.
>
> The replica was marked as Active for about 10 minutes, then went down.
>
>
>
> I've put some example logging below; it looks much the same as last
> time.
>
>
>
> There's a bunch of warnings about a checksum being different even though
> the file size is the same, and then RecoveryStrategy reports 'Could not
> publish as ACTIVE after succesful recovery'.
>
>
>
> I think I've found where that message comes from in the code here:
> https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java;h=abd00aef19a731b42b314f8b526cdb2d77baf89f;hb=refs/heads/master
>
> (I am running 6.0.1, though, so this could have changed in the latest development branch.)
>
>
>
> So it seems this chunk of code…
>
>
>
> 451 if (successfulRecovery) {
> 452   LOG.info("Registering as Active after recovery.");
> 453   try {
> 454     zkController.publish(core.getCoreDescriptor(), Replica.State.ACTIVE);
> 455   } catch (Exception e) {
> 456     LOG.error("Could not publish as ACTIVE after succesful recovery", e);
> 457     successfulRecovery = false;
> 458   }
> 459
> 460   if (successfulRecovery) {
> 461     close = true;
> 462     recoveryListener.recovered();
> 463   }
> 464 }
>
>
>
> results in this:
>
>
>
> org.apache.solr.common.SolrException: Cannot publish state of core
> 'documents_shard1_replica2' as active without recovering first!
>
>    at org.apache.solr.cloud.ZkController.publish(ZkController.java:1141)
>    at org.apache.solr.cloud.ZkController.publish(ZkController.java:1097)
>    at org.apache.solr.cloud.ZkController.publish(ZkController.java:1093)
>    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:457)
>    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:224)
>    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>    at java.util.concurrent.FutureTask.run(Unknown Source)
>    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>    at java.lang.Thread.run(Unknown Source)
>
>
>
> I don't yet understand the interaction with ZooKeeper, but there's some
> disagreement about whether recovery has happened or not (if it hadn't,
> from Solr's point of view, the successfulRecovery boolean would
> presumably be false).
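>
> Digging into that a bit more: publish() appears to consult a
> "leader-initiated recovery" (LIR) flag kept in ZooKeeper before it will
> write ACTIVE.  Roughly this shape - my paraphrase of the 6.x source, not
> the exact code, so treat it as a sketch:
>
>   // inside ZkController.publish(), before the new state is written (paraphrased)
>   Replica.State lirState =
>       getLeaderInitiatedRecoveryState(collection, shardId, coreNodeName);
>   if (state == Replica.State.ACTIVE && lirState != null
>       && lirState != Replica.State.ACTIVE) {
>     // the shard leader still considers this replica down/recovering
>     throw new SolrException(ErrorCode.INVALID_STATE, "Cannot publish state of core '"
>         + cd.getName() + "' as active without recovering first!");
>   }
>
> If that reading is right, the disagreement is between RecoveryStrategy
> (which believes it recovered) and the LIR znode the shard leader wrote
> (which still says otherwise).  The znode should be inspectable with the
> zkcli that ships with Solr - the path and the core_nodeN name below are
> from memory / made up, so double-check both:
>
>   server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
>     -cmd get /collections/documents/leader_initiated_recovery/shard1/core_node3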
>
>
>
> Should I raise a JIRA?  Is there any other useful information I could
> gather?
>
>
>
> I haven't really had any similar problems with the other 3 shards, just
> shard1.
>
>
>
> The nodes that it is running on are all pretty similar - all VMs built to
> the same specification - and the deployment of Java and SolrCloud is
> automated, so there shouldn't be any differences in the stack.
>
>
>
> Many thanks,
>
>
>
> Jon
>
>
>
>
>
>
>
>
>
> Example log output below
>
>
>
>
>
> 9/1/2016, 12:37:06 PM  WARN  IndexFetcher  File _jnux.si did not match.
> expected checksum is 1186898951 and actual is checksum 1994281621.
> expected length is 417 and actual length is 417
>
> 9/1/2016, 12:37:06 PM  WARN  IndexFetcher  File _jnuy.nvd did not match.
> expected checksum is 2200422612 and actual is checksum 3635321041.
> expected length is 63 and actual length is 65
>
> WARN  IndexFetcher  File _jnuy.fdx did not match. expected

Re: Mailing list subscriptions

2016-08-31 Thread John Bickerstaff
On that page, look for an "options" link.

Choose "subscribe via email" and there will be some options to choose.

At least that's all I've been able to find.  It doesn't allow "only threads
I've started" but it does allow a once-a-day summary and a few other
choices...

On Wed, Aug 31, 2016 at 6:52 AM, jennifer.coston <
jennifer.cos...@raytheon.com> wrote:

> Hi,
>
> I had the same question and went to the link provided, where I can see my
> account information, but I don't see a setting to only get messages I ask
> for.  Can you please provide more directions?
>
> Thank you!
>
> -Jennifer
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Mailing-list-subscriptions-tp4294026p4294177.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: changing the /solr path, additional steps needed for 6.1

2016-08-29 Thread John Bickerstaff
Bless you Chris!  And if you were local, I'd buy you a beer!

This was a big help - I was trying to figure this one out.
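
In case it helps the next person who finds this in the archive, the jetty
context piece of that change looks like this if memory serves (double-check
against your Solr version):

  <!-- server/contexts/solr-jetty-context.xml -->
  <Set name="contextPath"><Property name="hostContext" default="/ourspecialpath"/></Set>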

On Thu, Aug 25, 2016 at 1:27 PM, Chris Morley  wrote:

> This might help some people:
>
>  Changing the URL from server:port/solr to server:port/ourspecialpath is a
> bit inconvenient.  You have to change several files where the solr part of
> the request path is hardcoded:
>
>  server/solr-webapp/webapp/WEB-INF/web.xml
>  server/solr/solr.xml
>  server/contexts/solr-jetty-context.xml
>
>  Now that the new UI defaults to on in 6.1, you also have to change:
>  server/solr-webapp/webapp/js/angular/services.js
>  (in a bunch of places)
>
>  -Chris.
>
>
>
>


Re: Solr for Multi Tenant architecture

2016-08-27 Thread John Bickerstaff
In my own work, the risk to the business if every single client loses
access to search is so great that we would never consider putting
everything in one collection.  You should certainly ask that question of
the business stakeholders before you decide.

For that reason, I might recommend that each of the multiple collections
suggested above by Erick could also be on a separate SolrCloud (or single
Solr instance) so that no single failure can ever take down every tenant's
ability to search -- only those on that particular SolrCloud...
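
For the composite-ID routing option, the mechanics are just a tenant prefix
on the document id, e.g. (tenant and id invented for illustration):

  id = tenantA!doc123

Then _route_=tenantA! at query time confines the request to the shard
holding that tenant, and a filter query on a tenant field keeps other
tenants' documents out of the results.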

On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson 
wrote:

> There's no one right answer here. I've also seen a hybrid approach
> where there are multiple collections each of which has some
> number of tenants resident. Eventually, you need to think of some
> kind of partitioning, my rough number of documents for a single core
> is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).
>
> All that said, you may also be interested in the "transient cores"
> option, see: https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
> and the transient and transientCacheSize (this latter in solr.xml). Note
> that this is stand-alone only so you can't move that concept to
> SolrCloud if you eventually go there.
>
> Best,
> Erick
>
> On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha 
> wrote:
> > Dear Solr Members,
> >
> > We are using SolrCloud as the search provider of a multi-tenant cloud
> based
> > application. We have one schema for all the tenants. The indexes will
> have
> > large number(millions) of documents.
> >
> > As of our research, we have two options,
> >
> >- One large collection for all the tenants and use Composite-ID
> routing
> >- Collection per tenant
> >
> > The below mail says,
> >
> >
> > https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c5324cd4b.2020...@protulae.com%3E
> >
> > SolrCloud is *more scalable in terms of index size*. Plus you get
> > redundancy which can't be underestimated in a hosted solution.
> >
> >
> > AND
> >
> > The issue is management. 1000s of cores/collections require a level of
> > automation. On the other hand, having a single core/collection means if
> > you make one change to the schema or solrconfig, it affects everyone.
> >
> >
> > Based on the above facts we think One large collection will be the way to
> > go.
> >
> > Questions:
> >
> >1. Is that the right way to go?
> >2. Will it be a hassle when we need to do reindexing?
> >    3. What is the chance of an entire collection crash? (In that case all
> >    tenants will be affected and reindexing will be painful.)
> >
> > Thank you in advance for your kind opinion.
> >
> > Best Regards,
> > Chamil
> >
> > --
> > http://kavimalla.blgospot.com
> > http://kdchamil.blogspot.com
>


Re: /select results different between 5.4 and 6.1

2016-08-19 Thread John Bickerstaff
Many thanks.
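
(For anyone who lands on this thread later: if you need the old scoring
back while you compare, I believe you can pin it in the schema with
something like

  <similarity class="solr.ClassicSimilarityFactory"/>

though BM25, the new default, is generally considered the better ranking.)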

On Fri, Aug 19, 2016 at 4:22 PM, Anshum Gupta <ans...@anshumgupta.net>
wrote:

> The default similarity changed from TF-IDF to BM25 in 6.0.
>
> On Fri, Aug 19, 2016 at 3:00 PM John Bickerstaff <j...@johnbickerstaff.com
> >
> wrote:
>
> > Bump!
> >
> > TL;DR Question: Are scores (and debug output) *expected* to be different
> > between 5.4 and 6.1?
> >
> > On Thu, Aug 18, 2016 at 2:44 PM, John Bickerstaff <
> > j...@johnbickerstaff.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > TL:DR -
> > > Is it expected that the /select endpoint would produce different
> > > scores/result order between versions 5.4 and 6.1?
> > >
> > >
> > > (I'm aware that it's certainly possible I've done something different
> to
> > > these environments, although at this point I can't see any difference
> in
> > > configs etc... and I used a very simple search against /select to test
> > this)
> > >
> > > == Detail ==
> > >
> > > I'm currently seeing different scoring and different result order when
> I
> > > compare Solr results in the Admin console for a 5.4 and 6.1
> environment.
> > >
> > > I'm using the /select endpoint to try to avoid any difference in
> > > configuration.  To the best of my knowledge (and reading) I haven't
> ever
> > > modified the xml for that endpoint.
> > >
> > > As I was looking into it, I saw that the debug output looks quite
> > > different in 6.1...
> > >
> > > Any advice, including "You must have broken it yourself, that's
> > > impossible" is much appreciated.
> > >
> > >
> > >
> > > Here's debug from the "old" 5.4 SolrCloud environment.  The id's are a
> > > pain to read, but not only am I getting different scores, I'm getting
> > > different docs (or docs in a clearly different order)
> > >
> > > "debug": { "rawquerystring": "chiari", "querystring": "chiari", "
> > > parsedquery": "text:chiari", "parsedquery_toString": "text:chiari", "
> > > explain": { "d9644f86-5fe2-4a9f-8517-545e2cde0b64": "\n4.3581347 =
> > > weight(text:chiari in 26783) [ClassicSimilarity], result of:\n
> 4.3581347
> > =
> > > fieldWeight in 26783, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> > 1.0
> > > = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> > > fieldNorm(doc=26783)\n", "1347f707-6fdd-4864-b9dd-6d3e7cc32bf5":
> > "\n4.3581347
> > > = weight(text:chiari in 26792) [ClassicSimilarity], result of:\n
> > 4.3581347
> > > = fieldWeight in 26792, product of:\n 1.0 = tf(freq=1.0), with freq
> of:\n
> > > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> > 0.625 =
> > > fieldNorm(doc=26792)\n", "d01c32ad-e29d-4b65-9930-f8a6844a2613":
> > "\n4.3581347
> > > = weight(text:chiari in 27028) [ClassicSimilarity], result of:\n
> > 4.3581347
> > > = fieldWeight in 27028, product of:\n 1.0 = tf(freq=1.0), with freq
> of:\n
> > > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> > 0.625 =
> > > fieldNorm(doc=27028)\n", "0c5a4be7-1162-4b1a-ab83-4b48a690fc3a":
> > "\n4.3581347
> > > = weight(text:chiari in 27029) [ClassicSimilarity], result of:\n
> > 4.3581347
> > > = fieldWeight in 27029, product of:\n 1.0 = tf(freq=1.0), with freq
> of:\n
> > > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> > 0.625 =
> > > fieldNorm(doc=27029)\n", "e1cb441d-9d60-482d-956b-3fbc964a17c1":
> > "\n4.3581347
> > > = weight(text:chiari in 27042) [ClassicSimilarity], result of:\n
> > 4.3581347
> > > = fieldWeight in 27042, product of:\n 1.0 = tf(freq=1.0), with freq
> of:\n
> > > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> > 0.625 =
> > > fieldNorm(doc=27042)\n", "f87951f1-e163-4f17-a628-904b9df0c609":
> > "\n4.3581347
> > > = weight(text:chiari in 27043) [ClassicSimilarity], result of:\n
> > 4.3581347
> > > = fieldWeight in 27043, product of:\n 1.0 = tf(freq=1.0), with freq
> of:\n
> > > 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n
> > 0.625 =
> > > fieldNorm(doc=27043)\n", "caaa7ca1-34cb-44a8-8dd9-12c90

Re: /select results different between 5.4 and 6.1

2016-08-19 Thread John Bickerstaff
Bump!

TL;DR Question: Are scores (and debug output) *expected* to be different
between 5.4 and 6.1?

On Thu, Aug 18, 2016 at 2:44 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Hi all,
>
> TL:DR -
> Is it expected that the /select endpoint would produce different
> scores/result order between versions 5.4 and 6.1?
>
>
> (I'm aware that it's certainly possible I've done something different to
> these environments, although at this point I can't see any difference in
> configs etc... and I used a very simple search against /select to test this)
>
> == Detail ==
>
> I'm currently seeing different scoring and different result order when I
> compare Solr results in the Admin console for a 5.4 and 6.1 environment.
>
> I'm using the /select endpoint to try to avoid any difference in
> configuration.  To the best of my knowledge (and reading) I haven't ever
> modified the xml for that endpoint.
>
> As I was looking into it, I saw that the debug output looks quite
> different in 6.1...
>
> Any advice, including "You must have broken it yourself, that's
> impossible" is much appreciated.
>
>
>
> Here's debug from the "old" 5.4 SolrCloud environment.  The id's are a
> pain to read, but not only am I getting different scores, I'm getting
> different docs (or docs in a clearly different order)
>
> "debug": { "rawquerystring": "chiari", "querystring": "chiari", "
> parsedquery": "text:chiari", "parsedquery_toString": "text:chiari", "
> explain": { "d9644f86-5fe2-4a9f-8517-545e2cde0b64": "\n4.3581347 =
> weight(text:chiari in 26783) [ClassicSimilarity], result of:\n 4.3581347 =
> fieldWeight in 26783, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0
> = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> fieldNorm(doc=26783)\n", "1347f707-6fdd-4864-b9dd-6d3e7cc32bf5": "\n4.3581347
> = weight(text:chiari in 26792) [ClassicSimilarity], result of:\n 4.3581347
> = fieldWeight in 26792, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> fieldNorm(doc=26792)\n", "d01c32ad-e29d-4b65-9930-f8a6844a2613": "\n4.3581347
> = weight(text:chiari in 27028) [ClassicSimilarity], result of:\n 4.3581347
> = fieldWeight in 27028, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> fieldNorm(doc=27028)\n", "0c5a4be7-1162-4b1a-ab83-4b48a690fc3a": "\n4.3581347
> = weight(text:chiari in 27029) [ClassicSimilarity], result of:\n 4.3581347
> = fieldWeight in 27029, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> fieldNorm(doc=27029)\n", "e1cb441d-9d60-482d-956b-3fbc964a17c1": "\n4.3581347
> = weight(text:chiari in 27042) [ClassicSimilarity], result of:\n 4.3581347
> = fieldWeight in 27042, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> fieldNorm(doc=27042)\n", "f87951f1-e163-4f17-a628-904b9df0c609": "\n4.3581347
> = weight(text:chiari in 27043) [ClassicSimilarity], result of:\n 4.3581347
> = fieldWeight in 27043, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> fieldNorm(doc=27043)\n", "caaa7ca1-34cb-44a8-8dd9-12c909db8c2d": "\n4.3581347
> = weight(text:chiari in 27044) [ClassicSimilarity], result of:\n 4.3581347
> = fieldWeight in 27044, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> fieldNorm(doc=27044)\n", "ada7a87e-725a-4533-b72e-3817af4c7179": "\n4.3581347
> = weight(text:chiari in 27055) [ClassicSimilarity], result of:\n 4.3581347
> = fieldWeight in 27055, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> fieldNorm(doc=27055)\n", "ac6d47fd-9a59-47d6-8cfb-11b34c7ded54": "\n4.3581347
> = weight(text:chiari in 27056) [ClassicSimilarity], result of:\n 4.3581347
> = fieldWeight in 27056, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
> 1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
> fieldNorm(doc=27056)\n", "4aaa7697-b26a-4bea-ba4e-70d18ea649f0": "\n4.3581347
> = weight(text:chiari in 62240) [ClassicSimilarity], result of:\n 4.3581347
> = fieldWeight in 62240, product of:\n 1

/select results different between 5.4 and 6.1

2016-08-18 Thread John Bickerstaff
Hi all,

TL:DR -
Is it expected that the /select endpoint would produce different
scores/result order between versions 5.4 and 6.1?


(I'm aware that it's certainly possible I've done something different to
these environments, although at this point I can't see any difference in
configs etc... and I used a very simple search against /select to test this)

== Detail ==

I'm currently seeing different scoring and different result order when I
compare Solr results in the Admin console for a 5.4 and 6.1 environment.

I'm using the /select endpoint to try to avoid any difference in
configuration.  To the best of my knowledge (and reading) I haven't ever
modified the xml for that endpoint.

As I was looking into it, I saw that the debug output looks quite different
in 6.1...

Any advice, including "You must have broken it yourself, that's impossible"
is much appreciated.



Here's debug from the "old" 5.4 SolrCloud environment.  The id's are a pain
to read, but not only am I getting different scores, I'm getting different
docs (or docs in a clearly different order)

"debug": { "rawquerystring": "chiari", "querystring": "chiari", "parsedquery":
"text:chiari", "parsedquery_toString": "text:chiari", "explain": { "
d9644f86-5fe2-4a9f-8517-545e2cde0b64": "\n4.3581347 = weight(text:chiari in
26783) [ClassicSimilarity], result of:\n 4.3581347 = fieldWeight in 26783,
product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n
6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
fieldNorm(doc=26783)\n", "1347f707-6fdd-4864-b9dd-6d3e7cc32bf5": "\n4.3581347
= weight(text:chiari in 26792) [ClassicSimilarity], result of:\n 4.3581347
= fieldWeight in 26792, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
fieldNorm(doc=26792)\n", "d01c32ad-e29d-4b65-9930-f8a6844a2613": "\n4.3581347
= weight(text:chiari in 27028) [ClassicSimilarity], result of:\n 4.3581347
= fieldWeight in 27028, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
fieldNorm(doc=27028)\n", "0c5a4be7-1162-4b1a-ab83-4b48a690fc3a": "\n4.3581347
= weight(text:chiari in 27029) [ClassicSimilarity], result of:\n 4.3581347
= fieldWeight in 27029, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
fieldNorm(doc=27029)\n", "e1cb441d-9d60-482d-956b-3fbc964a17c1": "\n4.3581347
= weight(text:chiari in 27042) [ClassicSimilarity], result of:\n 4.3581347
= fieldWeight in 27042, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
fieldNorm(doc=27042)\n", "f87951f1-e163-4f17-a628-904b9df0c609": "\n4.3581347
= weight(text:chiari in 27043) [ClassicSimilarity], result of:\n 4.3581347
= fieldWeight in 27043, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
fieldNorm(doc=27043)\n", "caaa7ca1-34cb-44a8-8dd9-12c909db8c2d": "\n4.3581347
= weight(text:chiari in 27044) [ClassicSimilarity], result of:\n 4.3581347
= fieldWeight in 27044, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
fieldNorm(doc=27044)\n", "ada7a87e-725a-4533-b72e-3817af4c7179": "\n4.3581347
= weight(text:chiari in 27055) [ClassicSimilarity], result of:\n 4.3581347
= fieldWeight in 27055, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
fieldNorm(doc=27055)\n", "ac6d47fd-9a59-47d6-8cfb-11b34c7ded54": "\n4.3581347
= weight(text:chiari in 27056) [ClassicSimilarity], result of:\n 4.3581347
= fieldWeight in 27056, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
fieldNorm(doc=27056)\n", "4aaa7697-b26a-4bea-ba4e-70d18ea649f0": "\n4.3581347
= weight(text:chiari in 62240) [ClassicSimilarity], result of:\n 4.3581347
= fieldWeight in 62240, product of:\n 1.0 = tf(freq=1.0), with freq of:\n
1.0 = termFreq=1.0\n 6.9730153 = idf(docFreq=281, maxDocs=110738)\n 0.625 =
fieldNorm(doc=62240)\n" }, "QParser": "LuceneQParser", "timing": { "time": 2,
"prepare": { "time": 0, "query": { "time": 0 },

... and here's the same from the Solr Cloud 6.0 environment

"debug":{ "rawquerystring":"chiari", "querystring":"chiari", "parsedquery":
"text:chiari", "parsedquery_toString":"text:chiari", "explain":{ "
85249c23-ef68-4276-9ef7-48c290033993":"\n9.735645 = weight(text:chiari in
106960) [], result of:\n 9.735645 = score(doc=106960,freq=50.0 =
termFreq=50.0\n), product of:\n 4.798444 = idf(docFreq=281,
docCount=34151)\n 2.0289173 = tfNorm, computed from:\n 50.0 =
termFreq=50.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 941.3421 =
avgFieldLength\n 4096.0 = fieldLength\n", "
495b660d-8e8f-4b75-a523-106440468818":"\n9.655164 = 

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
Thanks - I'll look at it...
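
(For the archive, the rerank syntax looks roughly like this -- untested,
and the reRankDocs/reRankWeight values and the weighting query are made up:

  q=...&rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}&rqq=category_weight:[1 TO *]

)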

On Fri, Aug 12, 2016 at 1:21 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Maybe rerankqparserplugin?
>
> On Aug 12, 2016 11:54, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
>
> > @Hossman --  thanks again.
> >
> > I've made the following change and so far things look good.  I couldn't
> see
> > debug or find results for what I put in for $func, so I just removed it,
> > but making modifications as you suggested appears to be working.
> >
> > Including the actual line from my endpoint XML in case this thread helps
> > someone else...
> >
> > {!boost defType=synonym_edismax qf='title' synonyms='true'
> > synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
> > v=$q}
> >
> > On Fri, Aug 12, 2016 at 12:09 PM, John Bickerstaff <
> > j...@johnbickerstaff.com
> > > wrote:
> >
> > > Thanks!  I'll check it out.
> > >
> > > On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar <susheel2...@gmail.com
> >
> > > wrote:
> > >
> > >> Not exactly sure what you are looking from chaining the results but
> > >> similar
> > >> functionality is available in Streaming expressions where result of
> > inner
> > >> expressions are passed to outer expressions and so on
> > >> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> > >>
> > >> HTH
> > >> Susheel
> > >>
> > >> On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff <
> > >> j...@johnbickerstaff.com>
> > >> wrote:
> > >>
> > >> > Hossman - many thanks again for your comprehensive and very helpful
> > >> answer!
> > >> >
> > >> > All,
> > >> >
> > >> > I am (possibly mis-remembering) reading something about being able
> to
> > >> pass
> > >> > the results of one query to another query...  Essentially "chaining"
> > >> result
> > >> > sets.
> > >> >
> > >> > I have looked in docs and can't find anything on a quick search -- I
> > may
> > >> > have been reading about the Re-Ranking feature, which doesn't help
> me
> > (I
> > >> > know because I just tried and it seems to return all results anyway,
> > >> just
> > >> > re-ranking the number specified in the reRankDocs flag...)
> > >> >
> > >> > Is there a way to (cleanly) send the results of one query to another
> > >> query
> > >> > for further processing?  Essentially, pass ONLY the results
> (including
> > >> an
> > >> > empty set of results) to another query for processing?
> > >> >
> > >> > thanks...
> > >> >
> > >> > On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <
> > >> > j...@johnbickerstaff.com>
> > >> > wrote:
> > >> >
> > >> > > Thanks!
> > >> > >
> > >> > > To answer your questions, while I digest the rest of that
> > >> information...
> > >> > >
> > >> > > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
> > >> > > https://github.com/healthonnet/hon-lucene-synonyms
> > >> > >
> > >> > > The config looks like this - and IIRC, is simply a copy from the
> > >> > > recommended config on the site mentioned above.
> > >> > >
> > >> > >   class="com.github.healthonnet.
> > >> > search.
> > >> > > SynonymExpandingExtendedDismaxQParserPlugin">
> > >> > > 
> > >> > > 
> > >> > >   
> > >> > >   
> > >> > > 
> > >> > > 
> > >> > >   solr.PatternTokenizerFactory
> > >> > >   
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > >   solr.ShingleFilterFactory
> > >> > >   true
> > >> > >   true
> > >> > >   2
> > >> > >   4
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > >   solr.SynonymFilt

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
@Hossman --  thanks again.

I've made the following change and so far things look good.  I couldn't see
debug or find results for what I put in for $func, so I just removed it,
but making modifications as you suggested appears to be working.

Including the actual line from my endpoint XML in case this thread helps
someone else...

{!boost defType=synonym_edismax qf='title' synonyms='true'
synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
v=$q}

On Fri, Aug 12, 2016 at 12:09 PM, John Bickerstaff <j...@johnbickerstaff.com
> wrote:

> Thanks!  I'll check it out.
>
> On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
>
>> Not exactly sure what you are looking from chaining the results but
>> similar
>> functionality is available in Streaming expressions where result of inner
>> expressions are passed to outer expressions and so on
>> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>>
>> HTH
>> Susheel
>>
>> On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff <
>> j...@johnbickerstaff.com>
>> wrote:
>>
>> > Hossman - many thanks again for your comprehensive and very helpful
>> answer!
>> >
>> > All,
>> >
>> > I am (possibly mis-remembering) reading something about being able to
>> pass
>> > the results of one query to another query...  Essentially "chaining"
>> result
>> > sets.
>> >
>> > I have looked in docs and can't find anything on a quick search -- I may
>> > have been reading about the Re-Ranking feature, which doesn't help me (I
>> > know because I just tried and it seems to return all results anyway,
>> just
>> > re-ranking the number specified in the reRankDocs flag...)
>> >
>> > Is there a way to (cleanly) send the results of one query to another
>> query
>> > for further processing?  Essentially, pass ONLY the results (including
>> an
>> > empty set of results) to another query for processing?
>> >
>> > thanks...
>> >
>> > On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <
>> > j...@johnbickerstaff.com>
>> > wrote:
>> >
>> > > Thanks!
>> > >
>> > > To answer your questions, while I digest the rest of that
>> information...
>> > >
>> > > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
>> > > https://github.com/healthonnet/hon-lucene-synonyms
>> > >
>> > > The config looks like this - and IIRC, is simply a copy from the
>> > > recommended config on the site mentioned above.
>> > >
>> > >  
>> > > 
>> > > 
>> > >   
>> > >   
>> > > 
>> > > 
>> > >   solr.PatternTokenizerFactory
>> > >   
>> > > 
>> > > 
>> > > 
>> > >   solr.ShingleFilterFactory
>> > >   true
>> > >   true
>> > >   2
>> > >   4
>> > > 
>> > > 
>> > > 
>> > >   solr.SynonymFilterFactory
>> > >   solr.
>> > KeywordTokenizerFactory
>> > >   example_synonym_file.txt
>> > >   true
>> > >   true
>> > > 
>> > >   
>> > > 
>> > >   
>> > >
>> > >
>> > >
>> > > On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <
>> > hossman_luc...@fucit.org
>> > > > wrote:
>> > >
>> > >>
>> > >> : First let me say that this is very possibly the "x - y problem" so
>> let
>> > >> me
>> > >> : state up front what my ultimate need is -- then I'll ask about the
>> > >> thing I
>> > >> : imagine might help...  which, of course, is heavily biased in the
>> > >> direction
>> > >> : of my experience coding Java and writing SQL...
>> > >>
>> > >> Thank you so much for asking your question this way!
>> > >>
>> > >> Right off the bat, the background you've provided seems suspicious...
>> > >>
>> > >> : I have a piece of a query that calculates a score based on a
>> > "weighting"
>> > >> ...
>> > >> : Th

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
Thanks!  I'll check it out.
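
(Sketching what Susheel describes, an innerJoin of two searches -- untested,
and the collections and fields here are invented:

  innerJoin(
    search(products, q="*:*", fl="id,name", sort="id asc"),
    search(reviews, q="stars:[4 TO *]", fl="productId,stars", sort="productId asc"),
    on="id=productId")

both streams have to be sorted on the join keys for innerJoin to work.)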

On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> Not exactly sure what you are looking from chaining the results but similar
> functionality is available in Streaming expressions where result of inner
> expressions are passed to outer expressions and so on
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>
> HTH
> Susheel
>
> On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > Hossman - many thanks again for your comprehensive and very helpful
> answer!
> >
> > All,
> >
> > I am (possibly mis-remembering) reading something about being able to
> pass
> > the results of one query to another query...  Essentially "chaining"
> result
> > sets.
> >
> > I have looked in docs and can't find anything on a quick search -- I may
> > have been reading about the Re-Ranking feature, which doesn't help me (I
> > know because I just tried and it seems to return all results anyway, just
> > re-ranking the number specified in the reRankDocs flag...)
> >
> > Is there a way to (cleanly) send the results of one query to another
> query
> > for further processing?  Essentially, pass ONLY the results (including an
> > empty set of results) to another query for processing?
> >
> > thanks...
> >
> > On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <
> > j...@johnbickerstaff.com>
> > wrote:
> >
> > > Thanks!
> > >
> > > To answer your questions, while I digest the rest of that
> information...
> > >
> > > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
> > > https://github.com/healthonnet/hon-lucene-synonyms
> > >
> > > The config looks like this - and IIRC, is simply a copy from the
> > > recommended config on the site mentioned above.
> > >
> > >  
> > > 
> > > 
> > >   
> > >   
> > > 
> > > 
> > >   solr.PatternTokenizerFactory
> > >   
> > > 
> > > 
> > > 
> > >   solr.ShingleFilterFactory
> > >   true
> > >   true
> > >   2
> > >   4
> > > 
> > > 
> > > 
> > >   solr.SynonymFilterFactory
> > >   solr.
> > KeywordTokenizerFactory
> > >   example_synonym_file.txt
> > >   true
> > >   true
> > > 
> > >   
> > > 
> > >   
> > >
> > >
> > >
> > > On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <
> > hossman_luc...@fucit.org
> > > > wrote:
> > >
> > >>
> > >> : First let me say that this is very possibly the "x - y problem" so
> let
> > >> me
> > >> : state up front what my ultimate need is -- then I'll ask about the
> > >> thing I
> > >> : imagine might help...  which, of course, is heavily biased in the
> > >> direction
> > >> : of my experience coding Java and writing SQL...
> > >>
> > >> Thank you so much for asking your question this way!
> > >>
> > >> Right off the bat, the background you've provided seems suspicious...
> > >>
> > >> : I have a piece of a query that calculates a score based on a
> > "weighting"
> > >> ...
> > >> : The specific line is this:
> > >> : product(field(category_weight),20)
> > >> :
> > >> : What I just realized is that when I query Solr for a string that has
> > NO
> > >> : matches in the entire corpus, I still get a slew of results because
> > >> EVERY
> > >> : doc has the weighting value in the category_weight field - and
> > therefore
> > >> : every doc gets some score.
> > >>
> > >> ...that is *NOT* how dismax and edismax normally work.
> > >>
> > >> While both the "bf" and "bq" params result in "additive" boosting, and
> > the
> > >> implementation of that "additive boost" comes from adding new optional
> > >> clauses to the top level BooleanQuery that is executed, that only
> > happens
> > >> after the "main" query (from your "q" param) is added to that top
> 

Re: Getting "collection already exists" when creating collection in admin UI

2016-08-12 Thread John Bickerstaff
I don't know if this helps, but I had trouble creating collections due to a
number of issues, and I think I got this error (I was using the command
line, not the UI).

As I recall, if it exists in Zookeeper, it will error out.  It was a while
ago, but I think the way I had to solve it was to go into Zookeeper and
delete the "node".

This was easier for me because I was using "chroot" in Zookeeper such that
each collection was separate - so all I had to do was delete the entire
node and start over.

Take me with a grain of salt - it was a while ago.

If you want, I have linux command lines for most / all of this... let me
know.
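
Actually, the two commands I remember using are below (paths from memory --
double-check them, and swap in your own zkhost and collection name):

  # see what's registered in ZK
  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd list

  # remove the half-created collection znode
  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd clear /collections/MyNewNode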

On Fri, Aug 12, 2016 at 11:10 AM, Alexandre Drouin <
alexandre.dro...@orckestra.com> wrote:

> Hi Esther-Melaine,
>
> The collection exists in Zookeeper under the /collections node and I can
> see the shardX_replicaX folders under $SOLR_HOME/server/solr of both
> servers.
>
> I was not able to replicate the issue using the collection API.  Here are
> the logs where I added the 'MyNewerNode': https://gist.github.com/orck-adrouin/4d074cbb60141cba90c0aae9c55360d4
>
> I took a closer look at the admin UI and here are my findings:
>   - In Chrome's devtool I can see the first create request
>   - After 10 seconds the request getting aborted and a second create
> request is sent to the server
>   - In Fiddler I can see that the first request completes successfully
> without any issues.  The second request is sent a few seconds before the
> first one ends so it looks like a admin UI issue.
>
> Is it possible that the admin UI has some kind of TTL for requests set to
> 10 seconds?
>
> You mentioned something about the nodes going into recovery.  Any idea how
> I can fix this issue?
>
> My development environment (if it makes a difference):
>   - OS: Windows
>   - 2 Solr 6.1 nodes using SolrCloud.  They both are running on the same
> server using different ports.
>   - Zookeeper 3.4.8
>
> Alexandre Drouin
>
>
> -Original Message-
> From: Esther-Melaine Quansah [mailto:esther.quan...@lucidworks.com]
> Sent: August 12, 2016 10:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Getting "collection already exists" when creating collection
> in admin UI
> Importance: High
>
> Hi Alexandre,
>
> The question here is why the create action is called twice. You’re getting
> that “collection already exists” error after the second action is called.
> Can you verify whether MyNewNode exists in /collections in ZK, or on the
> machines running Solr at $SOLR_HOME/server/solr/?  Your logs show a lot of
> issues around the overseer, and it looks like those nodes are going into
> recovery pretty frequently. Can you replicate this issue by creating a
> collection through the API (not through the UI):
>
> http://localhost:8983/admin/collections?action=CREATE&name=MyNewerNode&numShards=1&replicationFactor=2&maxShardsPerNode=1&collection.configName=DefaultConfig
>
> Thanks,
> Esther
>
>
> > On Aug 12, 2016, at 10:05 AM, Alexandre Drouin <
> alexandre.dro...@orckestra.com> wrote:
> >
> > Hello,
> >
> > I am running SolrCloud with 2 nodes (Solr 6.1 with SSL and basic auth)
> and with one Zookeeper node (for development purposes) and when I try to
> create a new collection in the admin UI with 'replicationFactor=2' I get a
> "Connection to Solr lost" message and another message telling me "
> collection already exists: MyNewNode".  I made sure that a collection with
> the same name does not exists and the issue does not appear with a
> replication factor of 1.
> >
> > While debugging I saw that the create action is called twice with the
> > following parameters:
> > /solr/admin/collections?_=1471010473184&action=CREATE&collection.configName=DefaultConfig&maxShardsPerNode=1&name=aaa&numShards=1&replicationFactor=2&router.name=compositeId&routerName=compositeId&wt=json
> >
> > Can anyone replicate this issue?  I have not found it in JIRA.
> >
> >
> > Below is the relevant log (if useful) and I posted the full logs here
> > https://gist.github.com/orck-adrouin/690d485ba0835320273e7b2e09fb3771
> >
> > 63549 ERROR (OverseerThreadFactory-5-thread-5-processing-n:orc-dev-solr-cd.local:8444_solr) [   ] o.a.s.c.OverseerCollectionMessageHandler Collection: MyNewNode operation: create failed: org.apache.solr.common.SolrException: collection already exists: MyNewNode
> >   at org.apache.solr.cloud.OverseerCollectionMessageHandler.createCollection(OverseerCollectionMessageHandler.java:1832)
> >   at org.apache.solr.cloud.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:224)
> >   at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:463)
> >   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >   at java.lang.Thread.run(Thread.java:745)
> >
> > Thanks,
> > Alexandre Drouin
>
>


Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
Hossman - many thanks again for your comprehensive and very helpful answer!

All,

I remember (or possibly mis-remember) reading something about being able to
pass the results of one query to another query...  Essentially "chaining"
result sets.

I have looked in docs and can't find anything on a quick search -- I may
have been reading about the Re-Ranking feature, which doesn't help me (I
know because I just tried and it seems to return all results anyway, just
re-ranking the number specified in the reRankDocs flag...)

Is there a way to (cleanly) send the results of one query to another query
for further processing?  Essentially, pass ONLY the results (including an
empty set of results) to another query for processing?

thanks...

On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Thanks!
>
> To answer your questions, while I digest the rest of that information...
>
> I'm using the hon-lucene-synonyms.5.0.4.jar from here:
> https://github.com/healthonnet/hon-lucene-synonyms
>
> The config looks like this - and IIRC, is simply a copy from the
> recommended config on the site mentioned above.
>
>  
> 
> 
>   
>   
> 
> 
>   solr.PatternTokenizerFactory
>   
> 
> 
> 
>   solr.ShingleFilterFactory
>   true
>   true
>   2
>   4
> 
> 
> 
>   solr.SynonymFilterFactory
>   solr.KeywordTokenizerFactory
>   example_synonym_file.txt
>   true
>   true
> 
>   
> 
>   
>
>
>
> On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <hossman_luc...@fucit.org
> > wrote:
>
>>
>> : First let me say that this is very possibly the "x - y problem" so let
>> me
>> : state up front what my ultimate need is -- then I'll ask about the
>> thing I
>> : imagine might help...  which, of course, is heavily biased in the
>> direction
>> : of my experience coding Java and writing SQL...
>>
>> Thank you so much for asking your question this way!
>>
>> Right off the bat, the background you've provided seems suspicious...
>>
>> : I have a piece of a query that calculates a score based on a "weighting"
>> ...
>> : The specific line is this:
>> : product(field(category_weight),20)
>> :
>> : What I just realized is that when I query Solr for a string that has NO
>> : matches in the entire corpus, I still get a slew of results because
>> EVERY
>> : doc has the weighting value in the category_weight field - and therefore
>> : every doc gets some score.
>>
>> ...that is *NOT* how dismax and edismax normally work.
>>
>> While both the "bf" and "bq" params result in "additive" boosting, and the
>> implementation of that "additive boost" comes from adding new optional
>> clauses to the top level BooleanQuery that is executed, that only happens
>> after the "main" query (from your "q" param) is added to that top level
>> BooleanQuery as a "mandatory" clause.
>>
>> So, for example, "bf=true()" and "bq=*:*" should match & boost every doc,
>> but with the techproducts configs/data these requests still don't match
>> anything...
>>
>> /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
>> /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query
>>
>> ...and if you look at the debug output, the parsed queries shows that the
>> "bogus" part of the query is mandatory...
>>
>> +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*)
>> FunctionQuery(const(true))
>>
>> (i didn't use "pf" in that example, but the effect is the same, the "pf"
>> based clauses are optional, while the "qf" based clauses are mandatory)
>>
>> If you compare that example to your debug output, you'll notice a
>> difference in structure -- it's a bit hard to see in your example, but if
>> you simplify your qf, pf, and q fields it should be more obvious, but
>> AFAICT the "main" parts of your query are getting wrapped in an extra
>> layer of parens (ie: an extra BooleanQuery) which is *not* mandatory in
>> the top level query ... i don't see *any* mandatory clauses in your top
>> level BooleanQuery, which is why any match on a bf or bq function is
>> enough to cause a document to match.
>>
>> I suspect the reason your parsed query structure is so diff has to do with
>> this...
>>
>> :synonym_ed

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-11 Thread John Bickerstaff
Thanks!

To answer your questions, while I digest the rest of that information...

I'm using the hon-lucene-synonyms.5.0.4.jar from here:
https://github.com/healthonnet/hon-lucene-synonyms

The config looks like this - and IIRC, is simply a copy from the
recommended config on the site mentioned above.

 


  
  


  solr.PatternTokenizerFactory
  



  solr.ShingleFilterFactory
  true
  true
  2
  4



  solr.SynonymFilterFactory
  solr.KeywordTokenizerFactory
  example_synonym_file.txt
  true
  true

  

  



On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter 
wrote:

>
> : First let me say that this is very possibly the "x - y problem" so let me
> : state up front what my ultimate need is -- then I'll ask about the thing
> I
> : imagine might help...  which, of course, is heavily biased in the
> direction
> : of my experience coding Java and writing SQL...
>
> Thank you so much for asking your question this way!
>
> Right off the bat, the background you've provided seems suspicious...
>
> : I have a piece of a query that calculates a score based on a "weighting"
> ...
> : The specific line is this:
> : product(field(category_weight),20)
> :
> : What I just realized is that when I query Solr for a string that has NO
> : matches in the entire corpus, I still get a slew of results because EVERY
> : doc has the weighting value in the category_weight field - and therefore
> : every doc gets some score.
>
> ...that is *NOT* how dismax and edismax normally work.
>
> While both the "bf" and "bq" params result in "additive" boosting, and the
> implementation of that "additive boost" comes from adding new optional
> clauses to the top level BooleanQuery that is executed, that only happens
> after the "main" query (from your "q" param) is added to that top level
> BooleanQuery as a "mandatory" clause.
>
> So, for example, "bf=true()" and "bq=*:*" should match & boost every doc,
> but with the techprducts configs/data these requests still don't match
> anything...
>
> /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
> /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query
>
> ...and if you look at the debug output, the parsed queries shows that the
> "bogus" part of the query is mandatory...
>
> +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*)
> FunctionQuery(const(true))
>
> (i didn't use "pf" in that example, but the effect is the same, the "pf"
> based clauses are optional, while the "qf" based clauses are mandatory)
>
> If you compare that example to your debug output, you'll notice a
> difference in structure -- it's a bit hard to see in your example, but if
> you simplify your qf, pf, and q fields it should be more obvious, but
> AFAICT the "main" parts of your query are getting wrapped in an extra
> layer of parens (ie: an extra BooleanQuery) which is *not* mandatory in
> the top level query ... i don't see *any* mandatory clauses in your top
> level BooleanQuery, which is why any match on a bf or bq function is
> enough to cause a document to match.
>
> I suspect the reason your parsed query structure is so diff has to do with
> this...
>
> :synonym_edismax>
>
>
> 1) how exactly is "synonym_edismax" defined in your solrconfig.xml?
> 2) what QParserPlugin are you using to implement that?
>
> I suspect whatever QParserPlugin you are using has a bug in it :)
>
>
> If you can't fix the bug, one possible workaround would be to abandon bf
> and bq params completely, and instead wrap the query it produces in a
> {!boost} parser with whatever function you want (using functions like
> sum() or prod() to combine multiple functions, and query() to incorporate
> your current bq param).  Doing this will require changing how you specify
> your input (example below) and it will result in *multiplicative* boosts --
> so your scores will be much diff, and you will likely have to adjust your
> constants, but: 1) multiplicative boosts are almost always what people
> *really* want anyway; 2) it will ensure the boosts are only applied for
> things matching your main query, no matter how that query parser works or
> what bugs it has.
>
> Example of using {!boost} to wrap an arbitrary other parser...
>
> instead of...
>   defType=foofoo
>   q=barbarbar
>
> use...
>q={!boost b=$func defType=foofoo v=$qq}
>   qq=barbarbar
> func=sum(something,somethingelse)
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>
>
>
>
> :
> : What I would like is to return zero results if there is no match for the
> : querystring.  My collection is small enough that I don't care if the
> actual
> : calculation runs on each doc (although that's wasteful) -- I just don't
> : want to see results come back for zero matches to the querystring

Want zero results from SOLR when there are no matches for "querystring"

2016-08-11 Thread John Bickerstaff
First let me say that this is very possibly the "x - y problem" so let me
state up front what my ultimate need is -- then I'll ask about the thing I
imagine might help...  which, of course, is heavily biased in the direction
of my experience coding Java and writing SQL...

I have a piece of a query that calculates a score based on a "weighting"
number stored in each solr doc.  I'm including the xml for my custom
endpoint below...

The specific line is this:
product(field(category_weight),20)

What I just realized is that when I query Solr for a string that has NO
matches in the entire corpus, I still get a slew of results because EVERY
doc has the weighting value in the category_weight field - and therefore
every doc gets some score.

What I would like is to return zero results if there is no match for the
querystring.  My collection is small enough that I don't care if the actual
calculation runs on each doc (although that's wasteful) -- I just don't
want to see results come back for zero matches to the querystring

(The /select endpoint does this of course, but my custom endpoint includes
this "weighting" piece and therefore returns every doc in the corpus
because they all have the weighting.)


Enter my imagined solution...  The potential X-Y problem...


So - given that I come from a programming background, I immediately start
thinking of an if statement ...

 if(some_score_for_the_primary_search_string) {
  run_the_category_weight_calculation;
 } else {
  do_NOT_run_category_weight_calc;
 }


Another way of thinking of it would be something like the "WHERE" clause in
SQL...

 run_category_weight_calculation WHERE "searchstring" is found in the
document, not otherwise.

I'm aware that things could be handled in the client-side of my web app,
but if possible, I'd like the interface to SOLR to be as clean as possible,
and massage incoming SOLR data as little as possible.

In other words, do NOT return any docs if the querystring (and any
synonyms) match zero docs.

Here is the endpoint XML for the query.  I've highlighted the specific line
that is causing the unintended results...


 

 
   all
   20
   
   text
  
   synonym_edismax>
   true

   1.5
   1.1
   75%
   *:*
   20
   meta_doc_type:chapterDoc
   {!synonym_edismax qf='title' synonyms='true'
synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
v=$q}
   id category_weight title category_ss score
contentType
   {!edismax qf='title' bf='' bq='' v=$q}
=
   *product(field(category_weight),20)*
=
   product(query($titleQuery),4)
   text contentType^1000
   python
   true
   true
   true
   all
 
  

And here is the debug output for a query.  (This was a test for synonyms,
which you'll see in the output.)  The original query string was, of course,
"μ-heavy chain disease".

You'll note that although there is no score in the first doc explain for
the actual querystring, the highlighted section does get a score for
product(double(category_weight)=1.5,const(20))

... which is the thing that is currently causing all the docs in the
collection to "match" even though the querystring is not in any of them.

"debug":{ "rawquerystring":"\"μ-heavy chain disease\"",
"querystring":"\"μ-heavy
chain disease\"", "parsedquery":"(DisjunctionMaxQuery((text:\"μ heavy chain
disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
((+DisjunctionMaxQuery((text:\"mu heavy chain disease\" | (contentType:\"mu
heavy chain disease\")^1000.0)))/no_coord^1.1)
((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0)))/no_coord^1.1) ((+DisjunctionMaxQuery((text:\"μ heavy chain
disease\" | (contentType:\"μ heavy chain disease\")^1000.0)))/no_coord^1.1)
((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0)))/no_coord^1.1)) ((DisjunctionMaxQuery((title:\"μ heavy
chain disease\"))^2.5 ((+DisjunctionMaxQuery((title:\"mu heavy chain
disease\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ
hcd\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ heavy chain
disease\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ
hcd\")))/no_coord^1.1)))
FunctionQuery(product(double(category_weight),const(20)))
FunctionQuery(product(query(+(title:\"μ heavy chain
disease\"),def=0.0),const(4)))", "parsedquery_toString":"(((text:\"μ heavy
chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
((+(text:\"mu heavy chain disease\" | (contentType:\"mu heavy chain
disease\")^1000.0))^1.1) ((+(text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0))^1.1) ((+(text:\"μ heavy chain disease\" | (contentType:\"μ
heavy chain disease\")^1000.0))^1.1) ((+(text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0))^1.1)) title:\"μ heavy chain disease\"))^2.5
((+(title:\"mu heavy 

Re: How to re-index SOLR data

2016-08-10 Thread John Bickerstaff
Right...  SOLR doesn't work quite that way...

Keep in mind the value of the data import jar if you have the data from
MySQL stored in a text file, although that would require a little
programming to get the data into the proper format...

But once you get everything into a text file or similar, you don't have to
task your MySQL database every time you want to reindex...  Unless your data
changes frequently, in which case you'll probably have to hit MySQL every
time.
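
(If you do keep every Solr document as XML, as I suggested earlier,
replaying it into a fresh collection is just an HTTP post -- host and
collection name here are placeholders:

  curl 'http://localhost:8983/solr/yourcollection/update?commit=true' \
    -H 'Content-Type: text/xml' --data-binary @docs.xml

)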

Good luck!

On Aug 10, 2016 6:24 PM, "Bharath Kumar" <bharath.mvku...@gmail.com> wrote:

> Hi All,
>
> Thanks so much for your inputs. We have a MYSQL data source and i think we
> will try to re-index using the MYSQL data.
>
> I wanted something where i can export all my current data say to an excel
> file or some data source and then import it on another node with the same
> collection with empty data.
>
> On Tue, Aug 9, 2016 at 8:44 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Assuming you can re-index
> >
> > Consider "collection aliasing". Say your current collection is C1.
> > Create C2 (using the same cluster, Zookeeper and the like). Go
> > ahead and index to C2 (however you do that). NOTE: the physical
> > machines may be _different_ than C1, or not. That's up to you. The
> > critical bit is that you use the same Zookeeper.
> >
> > Now, when you are done you use the Collections API CREATEALIAS
> > command to point a "pseudo collection" to C1 (call it "prod"). This is
> > seamless to the users.
> >
> > The flaw in my plan so far is that you probably go at Collection C1
> > directly. So what you might do is create the "prod" alias and point it at
> > C1. Now change your LB (or client or whatever) to use the "prod"
> > collection,
> > then when indexing is complete use CREATEALIAS to point "prod" at C2
> > instead.
> >
> > This is actually a quite well-tested process, often used when you want to
> > change "atomically", e.g. when you reindex the same data nightly but want
> > all the new data available in its entirety only after it has been QA'd or
> > such.
> >
> > Best,
> > Erick
> >
> > On Tue, Aug 9, 2016 at 2:43 PM, John Bickerstaff
> > <j...@johnbickerstaff.com> wrote:
> > > In my case, I've done two things  neither of them involved taking
> the
> > > data from SOLR to SOLR...  although in my reading, I've seen that this
> is
> > > theoretically possible (I.E. sending data from one SOLR server to
> another
> > > SOLR server and  having the second SOLR instance re-index...)
> > >
> > > I haven't used the python script...  that was news to me, but it sounds
> > > interesting...
> > >
> > > What I've done is one of the following:
> > >
> > > a. Get the data from the original source (database, whatever) and
> massage
> > > it again so that it's ready for SOLR and then submit it to my new
> > SolrCloud
> > > for indexing.
> > >
> > > b. Keep a separate store of EVERY Solr document as it comes out of my
> > code
> > > (in xml) and store it in Kafka or a text file.  Then it's easy to push
> > back
> > > into another SOLR instance any time - multiple times if necessary.
> > >
> > > I'm guessing you don't have the data stored away as in "b"...  And if
> you
> > > don't have a way of getting the data from some central source, then "a"
> > > won't work either...  Which leaves you with the concept of sending data
> > > from SOLR "A" to SOLR "B" and having "B" reindex...
> > >
> > > This might serve as a starting point in that case...
> > > https://wiki.apache.org/solr/HowToReindex
> > >
> > > You'll note that there are limitations and a strong caveat against
> doing
> > > this with SOLR, but if you have no other option, then it's the best you
> > can
> > > do.
> > >
> > > Do you have the ability to get all the data again from an authoritative
> > > source?  (Relational Database or something similar?)
> > >
> > > On Tue, Aug 9, 2016 at 3:21 PM, Bharath Kumar <
> bharath.mvku...@gmail.com
> > >
> > > wrote:
> > >
> > >> Hi John,
> > >>
> > >> Thanks so much for your inputs. We have time to build another system.
> So
> > >> how did you index the same data on the main SOLR node to the new SOLR
> > node?
> > >> Did you use the re-

Re: How to re-index SOLR data

2016-08-09 Thread John Bickerstaff
In my case, I've done two things  neither of them involved taking the
data from SOLR to SOLR...  although in my reading, I've seen that this is
theoretically possible (I.E. sending data from one SOLR server to another
SOLR server and  having the second SOLR instance re-index...)

I haven't used the python script...  that was news to me, but it sounds
interesting...

What I've done is one of the following:

a. Get the data from the original source (database, whatever) and massage
it again so that it's ready for SOLR and then submit it to my new SolrCloud
for indexing.

b. Keep a separate store of EVERY Solr document as it comes out of my code
(in xml) and store it in Kafka or a text file.  Then it's easy to push back
into another SOLR instance any time - multiple times if necessary.
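
A minimal sketch of the replay step in option "b", assuming the archived
documents are Solr update XML files on disk (the collection name and paths
here are hypothetical):

  # Re-feed archived update XML into a fresh collection with the post tool
  /opt/solr/bin/post -c mycollection /data/solr-archive/*.xml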

I'm guessing you don't have the data stored away as in "b"...  And if you
don't have a way of getting the data from some central source, then "a"
won't work either...  Which leaves you with the concept of sending data
from SOLR "A" to SOLR "B" and having "B" reindex...

This might serve as a starting point in that case...
https://wiki.apache.org/solr/HowToReindex

You'll note that there are limitations and a strong caveat against doing
this with SOLR, but if you have no other option, then it's the best you can
do.

Do you have the ability to get all the data again from an authoritative
source?  (Relational Database or something similar?)

On Tue, Aug 9, 2016 at 3:21 PM, Bharath Kumar <bharath.mvku...@gmail.com>
wrote:

> Hi John,
>
> Thanks so much for your inputs. We have time to build another system. So
> how did you index the same data on the main SOLR node to the new SOLR node?
> Did you use the re-index python script? The new data will be indexed
> correctly with the new rules, but what about the old data?
>
> Our SOLR data is around 30GB with around 60 million documents. We use SOLR
> cloud with 3 solr nodes and 3 zookeepers.
>
> On Tue, Aug 9, 2016 at 2:13 PM, John Bickerstaff <j...@johnbickerstaff.com
> >
> wrote:
>
> > In case this helps...
> >
> > Assuming you have the resources to build a copy of your production
> > environment and assuming you have the time, you don't need to take your
> > production down - or even affect its processing...
> >
> > What I've done (with admittedly smaller data sets) is build a separate
> > environment (usually on VM's) and once it's set up, I do the new indexing
> > according to the new "rules"  (Like your change of long to string)
> >
> > Then, in a sense, I don't care how long it takes because it is not
> > affecting Prod.
> >
> > When it's done, I simply switch my load balancer to point to the new
> > environment and shut down the old one.
> >
> > To users, this could be seamless if you handle the load balancer
> correctly
> > and have it refuse new connections to the old servers while routing all
> new
> > connections to the new Solr servers...
> >
> > On Tue, Aug 9, 2016 at 3:04 PM, Bharath Kumar <bharath.mvku...@gmail.com
> >
> > wrote:
> >
> > > Hi Nick and Shawn,
> > >
> > > Thanks so much for the pointers. I will try that out. Thank you again!
> > >
> > > On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev <
> nick.vasily...@gmail.com>
> > > wrote:
> > >
> > > > Hi, I work on a python Solr Client
> > > > <http://solrclient.readthedocs.io/en/latest/> library and there is a
> > > > reindexing helper module that you can use if you are on Solr 4.9+. I
> > use
> > > it
> > > > all the time and I think it works pretty well. You can re-index all
> > > > documents from a collection into another collection or dump them to
> the
> > > > filesystem as JSON. It also supports parallel execution and can run
> > > > independently on each shard. There is also a way to resume if your
> job
> > > > craps out half way through if your existing schema is set up with a
> > good
> > > > date field and unique id.
> > > >
> > > > You can read the documentation here:
> > > > http://solrclient.readthedocs.io/en/latest/Reindexer.html
> > > >
> > > > Code is pretty short and is here:
> > > > https://github.com/moonlitesolutions/SolrClient/
> > blob/master/SolrClient/
> > > > helpers/reindexer.py
> > > >
> > > > Here is sample:
> > > > from SolrClient import SolrClient
> > > > from SolrClient.helpers import Reindexer
> > > >
> > > > r = Reindexer(SolrClient('ht

Re: How to re-index SOLR data

2016-08-09 Thread John Bickerstaff
In case this helps...

Assuming you have the resources to build a copy of your production
environment and assuming you have the time, you don't need to take your
production down - or even affect its processing...

What I've done (with admittedly smaller data sets) is build a separate
environment (usually on VM's) and once it's set up, I do the new indexing
according to the new "rules"  (Like your change of long to string)

Then, in a sense, I don't care how long it takes because it is not
affecting Prod.

When it's done, I simply switch my load balancer to point to the new
environment and shut down the old one.

To users, this could be seamless if you handle the load balancer correctly
and have it refuse new connections to the old servers while routing all new
connections to the new Solr servers...

On Tue, Aug 9, 2016 at 3:04 PM, Bharath Kumar 
wrote:

> Hi Nick and Shawn,
>
> Thanks so much for the pointers. I will try that out. Thank you again!
>
> On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev 
> wrote:
>
> > Hi, I work on a python Solr Client
> > <http://solrclient.readthedocs.io/en/latest/> library and there is a
> > reindexing helper module that you can use if you are on Solr 4.9+. I use
> it
> > all the time and I think it works pretty well. You can re-index all
> > documents from a collection into another collection or dump them to the
> > filesystem as JSON. It also supports parallel execution and can run
> > independently on each shard. There is also a way to resume if your job
> > craps out half way through if your existing schema is set up with a good
> > date field and unique id.
> >
> > You can read the documentation here:
> > http://solrclient.readthedocs.io/en/latest/Reindexer.html
> >
> > Code is pretty short and is here:
> > https://github.com/moonlitesolutions/SolrClient/blob/master/SolrClient/
> > helpers/reindexer.py
> >
> > Here is sample:
> > from SolrClient import SolrClient
> > from SolrClient.helpers import Reindexer
> >
> > r = Reindexer(SolrClient('http://source_solr:8983/solr'), SolrClient('
> > http://destination_solr:8983/solr') , source_coll='source_collection',
> > dest_coll='destination-collection')
> > r.reindex()
> >
> >
> >
> >
> >
> >
> > On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey 
> wrote:
> >
> > > On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > > > What would be the best way to re-index the data in the SOLR cloud? We
> > > > have around 65 million data and we are planning to change the schema
> > > > by changing the unique key type from long to string. How long does it
> > > > take to re-index 65 million documents in SOLR and can you please
> > > > suggest how to do that?
> > >
> > > There is no magic bullet.  And there's no way for anybody but you to
> > > determine how long it's going to take.  There are people who have
> > > achieved over 50K inserts per second, and others who have difficulty
> > > reaching 1000 per second.  Many factors affect indexing speed,
> including
> > > the size of your documents, the complexity of your analysis, the
> > > capabilities of your hardware, and how many threads/processes you are
> > > using at the same time when you index.
> > >
> > > Here's some more detailed info about reindexing, but it's probably not
> > > what you wanted to hear:
> > >
> > > https://wiki.apache.org/solr/HowToReindex
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Bharath MV Kumar
>
> "Life is short, enjoy every moment of it"
>


Re: Solr and Drupal

2016-08-09 Thread John Bickerstaff
Rose --

Further reading on the drupal site suggests to me that the latest Drupal
(8?) comes with a generic "connector" that can be tied to any search engine
and that the instructions on the page I sent may be superseded by the new
connector...

I'm not familiar with Drupal beyond simple experimentation a few years ago,
but that's how I'd build it - make a connector and consume the returned
data (json, xml, whatever) and then turn it into Drupal-formatted html or
something similar.

I think you might want to pursue the particulars on the Drupal list (I
assume one exists...)

HTH

On Tue, Aug 9, 2016 at 12:23 PM, Rose, John B  wrote:

> Sameer, John
>
> Thanks
>
>
> From: Sameer Maggon 
> Reply-To: "solr-user@lucene.apache.org" 
> Date: Tuesday, August 9, 2016 at 1:46 PM
> To: "solr-user@lucene.apache.org" 
> Subject: Re: Solr and Drupal
>
> Hi John,
>
> As John B. mentioned, you can utilize the plugin here -
> https://www.drupal.org/project/apachesolr. If you are looking to not have
> to worry about hosting, deployment, scaling and management, you can take a
> look at SearchStax by Measured Search to get a Solr deployment up and
> running in a couple of minutes and not have to get into installing Solr and
> going through a learning curve around setup and scale.
>
>
> Thanks,
> Sameer.
>
>
>
> On Tue, Aug 9, 2016 at 12:11 PM, Rose, John B > wrote:
> We are looking at Solr for a Drupal web site. We have never installed Solr.
>
>
> From my readings it is not clear exactly what we need to implement a
> search in Drupal with Solr. Some sites have implied Lucene and/or Tomcat
> are needed.
>
>
> Can someone point me to the site that explains minimally what is needed to
> implement Solr within Drupal?
>
>
> Thanks for your time
>
>
>
> --
> Sameer Maggon
> www.measuredsearch.com
> 1.844.9.SEARCH
> Measured Search is the only Fully Managed Solr as a Service multi-cloud
> capable offering.
> Plus utilize our On Demand Expertise to build your applications faster and
> with more confidence.
>


Re: Solr and Drupal

2016-08-09 Thread John Bickerstaff
This might be a good place to start...

https://www.drupal.org/project/apachesolr

On Tue, Aug 9, 2016 at 11:11 AM, Rose, John B  wrote:

> We are looking at Solr for a Drupal web site. We have never installed Solr.
>
>
> From my readings it is not clear exactly what we need to implement a
> search in Drupal with Solr. Some sites have implied Lucene and/or Tomcat
> are needed.
>
>
> Can someone point me to the site that explains minimally what is needed to
> implement Solr within Drupal?
>
>
> Thanks for your time
>


Re: NoNode error on -downconfig when node does exist?

2016-08-08 Thread John Bickerstaff
LOL!  Thanks.

Oh yeah.  I've done my time in a support role!  Nothing more maddening than
a user who won't share the facts!
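
For anyone skimming the thread: the fix (per Kevin's catch below) was the
chroot in the -z argument -- underscore, not period. The working command is
the original one with that single change:

  sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig \
    -confdir /home/john/conf/ -confname statdx -z 192.168.56.5/solr6_1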

On Mon, Aug 8, 2016 at 9:24 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> BTW, kudos for including the commands in your first problem statement
> even though, I'm sure, you wondered if it was necessary. Saved at least
> three back-and-forths to get to the root of the problem (little pun
> there)...
>
> Erick
>
> On Mon, Aug 8, 2016 at 3:11 PM, John Bickerstaff
> <j...@johnbickerstaff.com> wrote:
> > OMG!
> >
> > Thanks.  Too long staring at the same string.
> >
> > On Mon, Aug 8, 2016 at 3:49 PM, Kevin Risden <compuwizard...@gmail.com>
> > wrote:
> >
> >> Just a quick guess: do you have a period (.) in your zk connection
> string
> >> chroot when you meant an underscore (_)?
> >>
> >> When you do the ls you use /solr6_1/configs, but you have /solr6.1 in
> your
> >> zk connection string chroot.
> >>
> >> Kevin Risden
> >>
> >> On Mon, Aug 8, 2016 at 4:44 PM, John Bickerstaff <
> j...@johnbickerstaff.com
> >> >
> >> wrote:
> >>
> >> > First, the caveat:  I understand this is technically a zookeeper
> error.
> >> It
> >> > is an error that occurs when trying to deal with Solr however, so I'm
> >> > hoping someone on the list may have some insight.  Also, I'm getting
> the
> >> > error via the zkcli.sh tool that comes with Solr...
> >> >
> >> > I have created a collection in SolrCloud (6.1) giving the
> "techproducts"
> >> > sample directory as the location of the conf files.
> >> >
> >> > I then wanted to download those files from zookeeper to the local
> machine
> >> > via the -cmd downconfig command, so I issue this command:
> >> >
> >> > sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig
> >> > -confdir /home/john/conf/ -confname statdx -z 192.168.56.5/solr6.1
> >> >
> >> > Instead of the files, I get a stacktrace / error back which says :
> >> >
> >> > exception in thread "main" java.io.IOException: Error downloading
> files
> >> > from zookeeper path /configs/statdx to /home/john/conf
> >> > at
> >> > org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(
> >> > ZkConfigManager.java:117)
> >> > at
> >> > org.apache.solr.common.cloud.ZkConfigManager.downloadConfigDir(
> >> > ZkConfigManager.java:153)
> >> > at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:237)
> >> > *Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> >> > KeeperErrorCode = NoNode for /configs/statdx*
> >> > at org.apache.zookeeper.KeeperException.create(
> KeeperException.java:111)
> >> > at org.apache.zookeeper.KeeperException.create(
> KeeperException.java:51)
> >> > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
> >> > at
> >> > org.apache.solr.common.cloud.SolrZkClient$6.execute(
> >> SolrZkClient.java:331)
> >> > at
> >> > org.apache.solr.common.cloud.SolrZkClient$6.execute(
> >> SolrZkClient.java:328)
> >> > at
> >> > org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(
> >> > ZkCmdExecutor.java:60)
> >> > at
> >> > org.apache.solr.common.cloud.SolrZkClient.getChildren(
> >> > SolrZkClient.java:328)
> >> > at
> >> > org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(
> >> > ZkConfigManager.java:101)
> >> > ... 2 more
> >> >
> >> > However, when I actually look in Zookeeper, I find that the
> "directory"
> >> > does exist and that inside it are listed all the files.
> >> >
> >> > Here is the output from zookeeper:
> >> >
> >> > [zk: localhost:2181(CONNECTED) 0] *ls /solr6_1/configs*
> >> > [statdx]
> >> >
> >> > and...
> >> >
> >> > [zk: localhost:2181(CONNECTED) 1] *ls /solr6_1/configs/statdx*
> >> > [mapping-FoldToASCII.txt, currency.xml, managed-schema, protwords.txt,
> >> > synonyms.txt, stopwords.txt, _schema_analysis_synonyms_english.json,
> >> > velocity, admin-extra.html, update-script.js,
> >> > _schema_analysis_stopwords_english.json, solrconfig.xml,
> >> > admin-extra.menu-top.html, elevate.xml, clustering, xslt,
> >> > _rest_managed.json, mapping-ISOLatin1Accent.txt, spellings.txt, lang,
> >> > admin-extra.menu-bottom.html]
> >> >
> >> > I've rebooted all my zookeeper nodes and restarted them - just in
> case...
> >> > Same deal.
> >> >
> >> > Has anyone seen anything like this?
> >> >
> >>
>


Re: NoNode error on -downconfig when node does exist?

2016-08-08 Thread John Bickerstaff
OMG!

Thanks.  Too long staring at the same string.

On Mon, Aug 8, 2016 at 3:49 PM, Kevin Risden <compuwizard...@gmail.com>
wrote:

> Just a quick guess: do you have a period (.) in your zk connection string
> chroot when you meant an underscore (_)?
>
> When you do the ls you use /solr6_1/configs, but you have /solr6.1 in your
> zk connection string chroot.
>
> Kevin Risden
>
> On Mon, Aug 8, 2016 at 4:44 PM, John Bickerstaff <j...@johnbickerstaff.com
> >
> wrote:
>
> > First, the caveat:  I understand this is technically a zookeeper error.
> It
> > is an error that occurs when trying to deal with Solr however, so I'm
> > hoping someone on the list may have some insight.  Also, I'm getting the
> > error via the zkcli.sh tool that comes with Solr...
> >
> > I have created a collection in SolrCloud (6.1) giving the "techproducts"
> > sample directory as the location of the conf files.
> >
> > I then wanted to download those files from zookeeper to the local machine
> > via the -cmd downconfig command, so I issue this command:
> >
> > sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig
> > -confdir /home/john/conf/ -confname statdx -z 192.168.56.5/solr6.1
> >
> > Instead of the files, I get a stacktrace / error back which says :
> >
> > exception in thread "main" java.io.IOException: Error downloading files
> > from zookeeper path /configs/statdx to /home/john/conf
> > at
> > org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(
> > ZkConfigManager.java:117)
> > at
> > org.apache.solr.common.cloud.ZkConfigManager.downloadConfigDir(
> > ZkConfigManager.java:153)
> > at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:237)
> > *Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> > KeeperErrorCode = NoNode for /configs/statdx*
> > at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
> > at
> > org.apache.solr.common.cloud.SolrZkClient$6.execute(
> SolrZkClient.java:331)
> > at
> > org.apache.solr.common.cloud.SolrZkClient$6.execute(
> SolrZkClient.java:328)
> > at
> > org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(
> > ZkCmdExecutor.java:60)
> > at
> > org.apache.solr.common.cloud.SolrZkClient.getChildren(
> > SolrZkClient.java:328)
> > at
> > org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(
> > ZkConfigManager.java:101)
> > ... 2 more
> >
> > However, when I actually look in Zookeeper, I find that the "directory"
> > does exist and that inside it are listed all the files.
> >
> > Here is the output from zookeeper:
> >
> > [zk: localhost:2181(CONNECTED) 0] *ls /solr6_1/configs*
> > [statdx]
> >
> > and...
> >
> > [zk: localhost:2181(CONNECTED) 1] *ls /solr6_1/configs/statdx*
> > [mapping-FoldToASCII.txt, currency.xml, managed-schema, protwords.txt,
> > synonyms.txt, stopwords.txt, _schema_analysis_synonyms_english.json,
> > velocity, admin-extra.html, update-script.js,
> > _schema_analysis_stopwords_english.json, solrconfig.xml,
> > admin-extra.menu-top.html, elevate.xml, clustering, xslt,
> > _rest_managed.json, mapping-ISOLatin1Accent.txt, spellings.txt, lang,
> > admin-extra.menu-bottom.html]
> >
> > I've rebooted all my zookeeper nodes and restarted them - just in case...
> > Same deal.
> >
> > Has anyone seen anything like this?
> >
>


NoNode error on -downconfig when node does exist?

2016-08-08 Thread John Bickerstaff
First, the caveat:  I understand this is technically a zookeeper error.  It
is an error that occurs when trying to deal with Solr however, so I'm
hoping someone on the list may have some insight.  Also, I'm getting the
error via the zkcli.sh tool that comes with Solr...

I have created a collection in SolrCloud (6.1) giving the "techproducts"
sample directory as the location of the conf files.

I then wanted to download those files from zookeeper to the local machine
via the -cmd downconfig command, so I issue this command:

sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig
-confdir /home/john/conf/ -confname statdx -z 192.168.56.5/solr6.1

Instead of the files, I get a stacktrace / error back which says :

exception in thread "main" java.io.IOException: Error downloading files
from zookeeper path /configs/statdx to /home/john/conf
at
org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(ZkConfigManager.java:117)
at
org.apache.solr.common.cloud.ZkConfigManager.downloadConfigDir(ZkConfigManager.java:153)
at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:237)
*Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /configs/statdx*
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at
org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:331)
at
org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:328)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:328)
at
org.apache.solr.common.cloud.ZkConfigManager.downloadFromZK(ZkConfigManager.java:101)
... 2 more

However, when I actually look in Zookeeper, I find that the "directory"
does exist and that inside it are listed all the files.

Here is the output from zookeeper:

[zk: localhost:2181(CONNECTED) 0] *ls /solr6_1/configs*
[statdx]

and...

[zk: localhost:2181(CONNECTED) 1] *ls /solr6_1/configs/statdx*
[mapping-FoldToASCII.txt, currency.xml, managed-schema, protwords.txt,
synonyms.txt, stopwords.txt, _schema_analysis_synonyms_english.json,
velocity, admin-extra.html, update-script.js,
_schema_analysis_stopwords_english.json, solrconfig.xml,
admin-extra.menu-top.html, elevate.xml, clustering, xslt,
_rest_managed.json, mapping-ISOLatin1Accent.txt, spellings.txt, lang,
admin-extra.menu-bottom.html]

I've rebooted all my zookeeper nodes and restarted them - just in case...
Same deal.

Has anyone seen anything like this?


Re: Problems using fieldType text_general in copyField

2016-08-05 Thread John Bickerstaff
Many thanks for the assistance Hoss!  After a couple of bumps, it worked
great.



I followed the recommendations (and read the explanation - thanks!)

Although I swear it threw the error once again, just to be sure I rebooted
everything (Zookeeper included), then reloaded the configs into Zookeeper
and restarted my Solr servers - at that point the errors disappeared and
everything worked.

This will make upgrading super easy for us.  Given the relatively small
size of our data set, we have the luxury of just creating new Solr 6.1
instances in AWS, making a new node in Zookeeper, creating a collection,
adding the custom_schema file as you described and loading the data into
Solr from our Kafka store.  Gotta love it when your complete indexing into
Solr is in the neighborhood of two hours rather than two days or two weeks!

On Thu, Aug 4, 2016 at 8:42 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Just as a note, TYPO3 uses a lot of include files though I do not remember
> which specific mechanism they rely on.
>
> Regards,
> Alex
>
> On 5 Aug 2016 10:51 AM, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
>
> > Many thanks for your time!  Yes, it does make sense.
> >
> > I'll give your recommendation a shot tomorrow and update the thread.
> >
> > On Aug 4, 2016 6:22 PM, "Chris Hostetter" <hossman_luc...@fucit.org>
> > wrote:
> >
> >
> > TL;DR: use entity includes *WITH OUT TOP LEVEL WRAPPER ELEMENTS* like in
> > this example...
> >
> > https://github.com/apache/lucene-solr/blob/master/solr/
> > core/src/test-files/solr/collection1/conf/schema-snippet-types.incl
> > https://github.com/apache/lucene-solr/blob/master/solr/
> > core/src/test-files/solr/collection1/conf/schema-xinclude.xml
> >
> >
> > : The file I pasted last time is the file I was trying to include into
> the
> > : main schema.xml.  It was when that file was getting processed that I
> got
> > : the error  ['content' is not a glob and doesn't match any explicit
> field
> > or
> > : dynamicField. ]
> >
> > Ok -- so just to be crystal clear, you have two files, that look roughly
> > like this...
> >
> > --- BEGIN schema.xml ---
> > <?xml version="1.0" encoding="UTF-8"?>
> > <schema name="..." version="1.5">
> >   <!-- ...field and fieldType declarations... -->
> >   <xi:include href="statdx_custom_schema.xml" xmlns:xi="http://www.w3.org/
> > 2001/XInclude"/>
> > </schema>
> > --- END schema.xml ---
> >
> > -- BEGIN statdx_custom_schema.xml ---
> > <?xml version="1.0" encoding="UTF-8"?>
> > <schema name="..." version="1.6">
> >   <field name="content" type="text_general" ... />
> > </schema>
> > --- END statdx_custom_schema.xml ---
> >
> > ...am I correct?
> >
> >
> > I'm going to skip a lot of the nitty gritty and just summarize by saying
> > that ultimately there are 2 problems here that combine to lead to the
> > error you are getting:
> >
> > 1) what you are trying to do as far as the xinclude is not really what
> > xinclude is designed for and doesn't work the way you (or any other sane
> > person) would think it does.
> >
> > 2) for historical reasons, Solr is being sloppy in what <copyField>
> > entries it recognizes.  If anything the "bug" is that Solr is
> > willing to try to load any parts of your include file at all -- if it were
> > behaving consistently it should be ignoring all of it.
> >
> >
> > Ok ... that seems terse, i'll clarify with a little of the nitty
> gritty...
> >
> >
> > The root of the issue is really something you alluded to earlier that
> > didn't make sense to me at the time because I didn't realize you were
> > showing us the *includED* file when you said it...
> >
> > >>> I assumed (perhaps wrongly) that I could duplicate the <field> /
> > >>> <copyField> arrangement from the schema.xml file.
> >
> > ...that assumption is the crux of the problem, because when the XML parser
> > evaluates your xinclude, what it produces is functionally equivalent to if
> > you had a schema.xml file that looked like this
> >
> > --- BEGIN EFFECTIVE schema.xml ---
> > <?xml version="1.0" encoding="UTF-8"?>
> > <schema name="..." version="1.5">
> >   <!-- ...field and fieldType declarations... -->
> >   <schema name="..." version="1.6">
> >     <field name="content" type="text_general" ... />
> >   </schema>
> > </schema>
> > --- END EFFECTIVE schema.xml ---
> >
> > ...that extra <schema> element nested inside of the original <schema>
> > element is what's confusing the hell out of solr.  The <field> and
> > <fieldType> parsing is fairly strict, and only expects to find them as top
> > level elements (or, for historical purposes, as children of <fields> and
> > <types> -- note the plurals) while the <copyField> parsing is sloppy and
> > finds the one that gives you an error.
> >
> > (Even if the <field> and <fieldType> parsing was equally sloppy, only the
> > outermost <schema> tag would be recognized, so your default field props
> > would be based on the vers

Re: Problems using fieldType text_general in copyField

2016-08-04 Thread John Bickerstaff
Many thanks for your time!  Yes, it does make sense.

I'll give your recommendation a shot tomorrow and update the thread.

On Aug 4, 2016 6:22 PM, "Chris Hostetter"  wrote:


TL;DR: use entity includes *WITHOUT TOP LEVEL WRAPPER ELEMENTS* like in
this example...

https://github.com/apache/lucene-solr/blob/master/solr/
core/src/test-files/solr/collection1/conf/schema-snippet-types.incl
https://github.com/apache/lucene-solr/blob/master/solr/
core/src/test-files/solr/collection1/conf/schema-xinclude.xml


: The file I pasted last time is the file I was trying to include into the
: main schema.xml.  It was when that file was getting processed that I got
: the error  ['content' is not a glob and doesn't match any explicit field
or
: dynamicField. ]

Ok -- so just to be crystal clear, you have two files, that look roughly
like this...

--- BEGIN schema.xml ---
<?xml version="1.0" encoding="UTF-8"?>
<schema name="..." version="1.5">
  <!-- ...field and fieldType declarations... -->
  <xi:include href="statdx_custom_schema.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
</schema>
--- END schema.xml ---

-- BEGIN statdx_custom_schema.xml ---
<?xml version="1.0" encoding="UTF-8"?>
<schema name="..." version="1.6">
  <field name="content" type="text_general" ... />
</schema>
--- END statdx_custom_schema.xml ---

...am I correct?


I'm going to skip a lot of the nitty gritty and just summarize by saying
that ultimately there are 2 problems here that combine to lead to the
error you are getting:

1) what you are trying to do as far as the xinclude is not really what
xinclude is designed for and doesn't work the way you (or any other sane
person) would think it does.

2) for historical reasons, Solr is being sloppy in what <copyField>
entries it recognizes.  If anything the "bug" is that Solr is
willing to try to load any parts of your include file at all -- if it were
behaving consistently it should be ignoring all of it.


Ok ... that seems terse, i'll clarify with a little of the nitty gritty...


The root of the issue is really something you alluded to earlier that
didn't make sense to me at the time because I didn't realize you were
showing us the *includED* file when you said it...

>>> I assumed (perhaps wrongly) that I could duplicate the <field> /
>>> <copyField> arrangement from the schema.xml file.

...that assumption is the crux of the problem, because when the XML parser
evaluates your xinclude, what it produces is functionally equivalent to if
you had a schema.xml file that looked like this

--- BEGIN EFFECTIVE schema.xml ---
<?xml version="1.0" encoding="UTF-8"?>
<schema name="..." version="1.5">
  <!-- ...field and fieldType declarations... -->
  <schema name="..." version="1.6">
    <field name="content" type="text_general" ... />
  </schema>
</schema>
--- END EFFECTIVE schema.xml ---

...that extra <schema> element nested inside of the original <schema>
element is what's confusing the hell out of solr.  The <field> and
<fieldType> parsing is fairly strict, and only expects to find them as top
level elements (or, for historical purposes, as children of <fields> and
<types> -- note the plurals) while the <copyField> parsing is sloppy and
finds the one that gives you an error.

(Even if the <field> and <fieldType> parsing was equally sloppy, only the
outermost <schema> tag would be recognized, so your default field props
would be based on the version="1.5" declaration, not the version="1.6"
declaration of the included file they'd be in ... which would be confusing
as hell, so it's a good thing Solr isn't sloppy about that parsing too)


In contrast to xincludes, XML Entity includes are (almost as a side effect
of the triviality of their design) vastly superior 90% of the time, and
capable of doing what you want.  The key diff being that Entity includes
do not require that the file being included is valid XML -- it can be an
arbitrary snippet of xml content (w/o a top level element) that will be
inlined verbatim.  so you can/should do something like this...

--- BEGIN schema.xml ---
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE schema [
<!ENTITY statdx_custom_include SYSTEM "statdx_custom_schema.incl">
]>
<schema name="..." version="1.5">
  <!-- ...field and fieldType declarations... -->
  &statdx_custom_include;
</schema>
--- END schema.xml ---

-- BEGIN statdx_custom_schema.incl ---
<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>
--- END statdx_custom_schema.incl ---


...make sense?


-Hoss
http://www.lucidworks.com/
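
One way to sanity-check an entity include before handing the files to Solr
is to expand it locally; a hedged sketch using xmllint, run from the conf
directory holding both files:

  # --noent substitutes entity references and --noout prints nothing on
  # success, so a zero exit status means the merged schema.xml is well-formed
  xmllint --noent --noout schema.xml && echo "schema.xml parses cleanly"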


Re: Problems using fieldType text_general in copyField

2016-08-04 Thread John Bickerstaff
   
   




 
[John's reply pasted his entire schema.xml here -- the techproducts copy
plus his custom additions -- but the mail archive stripped all of the XML
markup, leaving only scattered fragments such as the uniqueKey "id" and
analyzer attribute lists. The paste ended with the xinclude line, whose
surviving tail follows:]
http://www.w3.org/2001/XInclude"/>


On Thu, Aug 4, 2016 at 5:26 PM, Chris Hostetter <hossman_luc...@fucit.org>
wrote:

>
> : The schema is a copy of the techproducts sample.
> :
> : Entire include here - and I take your point about the possibility of
> : malformation - thanks.
> :
> : I assumed (perhaps wrongly) that I could duplicate the <field> /
> : <copyField> arrangement from the schema.xml file.
>
> I really can't make heads or tails of what you're saying here -- what i
> asked you to provide was the full details on your schema.xml file and the
> file you are xincluding into it -- what you provided looks like a normal
> schema.xml, w/o any xinclude tags.  I also don't see any mention of any
> other files you are attempting to include in your schema.xml so i can see
> what its structure looks like.
>
> For comparison, here is an example of a schema.xml file that uses an
> xinclude element...
> https://github.com/apache/lucene-solr/blob/master/solr/
> core/src/test-files/solr/collection1/conf/schema-xinclude.xml
>
> Here is a specific example of one xinclude element from that file...
> <xi:include href="schema-snippet-type.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
>
> here is the file that is included by that xinclude element...
> https://github.com/apache/lucene-solr/blob/master/solr/
> core/src/test-files/solr/collection1/conf/schema-snippet-type.xml
>
>
> ...if you can provide the corresponding specifics for your schema --
> showing the xinclude elements, and the files referenced by them -- we can
> try to help make sense of why it's not working for you.
>
>
>
> :
> : I'm unfamiliar with xml entity includes, but I'll go look them up...
> :
> : [John's pasted include file was stripped by the archive. From the
> : surviving fragments it held his custom field definitions, the "content"
> : field (annotated *//HERE IS WHERE "CONTENT" IS DEFINED*), the copyField
> : list with the failing entry (annotated /*/THROWING ERROR ABOUT "CONTENT"
> : NOT EXISTING HERE*), and the text_general fieldType whose analyzer used
> : stopwords.txt and contentType_synonyms.txt.]
> :
> :
> :
> : On Thu, Aug 4, 2016 at 3:55 PM, Chris Hostetter <
> hossman_luc...@fucit.org>
> : wrote:
> :
> : >
> : > you mentioned that the problem only happens when you use xinclude, but
> you
> : > haven't shown us the details of your xinclude -- what exactly does your
> : > schema.xml look like (with the xinclude call) and what exactly does the
> : > file being included look like (entire contents)
> : >
> : > (I suspect the problem you are seeing is related to the way xinclude
> : > doesn't really support "snippets" of malformed xml, and instead
> requires
> : > some root tag -- i can't imagine what root tag you are using in the
> : > included file that would play nicely with mixing/matching field
> : > declarations. ... using xml entity includes may be a simpler/safer
> option)
> : >
> : >
> : >
> : > : Date: Thu, 4 Aug 2016 15:47:00 -0600
> : > : From: John Bickerstaff <j...@johnbickerstaff.com>
> : > : Reply-To: solr-user@lucene.apache.org
> : > : To: solr-user@lucene.apache.org
> : > : Subject: Re: Problems using fieldType text_general in copyField
> : > :
> : > : I would call this a bug...
> : > :
> : > : I'm going out on a limb and say that if you define a field in the
> : > included
> : > : XML file, you will get this error.
> : > :
> : >

Re: Problems using fieldType text_general in copyField

2016-08-04 Thread John Bickerstaff
I get the same error with the Entity Includes - with or without the
<schema> wrapper tag...

I'm probably just going to make a section in schema.xml rather than worry
about this.

Includes are "nice to have" but not critical.

On Thu, Aug 4, 2016 at 4:25 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Found the Entity Includes - thanks.
>
> On Thu, Aug 4, 2016 at 4:22 PM, John Bickerstaff <j...@johnbickerstaff.com
> > wrote:
>
>> Thanks!
>>
>> The schema is a copy of the techproducts sample.
>>
>> Entire include here - and I take your point about the possibility of
>> malformation - thanks.
>>
>> I assumed (perhaps wrongly) that I could duplicate the <field> /
>> <copyField> arrangement from the schema.xml file.
>>
>> I'm unfamiliar with xml entity includes, but I'll go look them up...
>>
>> [John's pasted include file was stripped by the archive: his custom field
>> definitions, the "content" field, the failing copyField list, and the
>> text_general fieldType.]
>>
>>
>>
>> On Thu, Aug 4, 2016 at 3:55 PM, Chris Hostetter <hossman_luc...@fucit.org
>> > wrote:
>>
>>>
>>> you mentioned that the problem only happens when you use xinclude, but
>>> you
>>> haven't shown us the details of your xinclude -- what exactly does your
>>> schema.xml look like (with the xinclude call) and what exactly does the
>>> file being included look like (entire contents)
>>>
>>> (I suspect the problem you are seeing is related to the way xinclude
>>> doesn't really support "snippets" of malformed xml, and instead requires
>>> some root tag -- i can't imagine what root tag you are using in the
>>> included file that would play nicely with mixing/matching field
>>> declarations. ... using xml entity includes may be a simpler/safer
>>> option)
>>>
>>>
>>>
>>> : Date: Thu, 4 Aug 2016 15:47:00 -0600
>>> : From: John Bickerstaff <j...@johnbickerstaff.com>
>>> : Reply-To: solr-user@lucene.apache.org
>>> : To: solr-user@lucene.apache.org
>>> : Subject: Re: Problems using fieldType text_general in copyField
>>> :
>>> : I would call this a bug...
>>> :
>>> : I'm going out on a limb and say that if you define a field in the
>>> included
>>> : XML file, you will get this error.
>>> :
>>> : As long as the field is defined first in schema.xml, you can
>>> "copyFIeld" it
>>> : or whatever in the include file, but apparently fields MUST be created
>>> in
>>> : the schema.xml file.
>>> :
>>> : That makes use of the include for custom things somewhat moot - at
>>> least in
>>> : my situation.
>>> :
>>> : I'd love to be wrong by the way, but that's what my tests suggest right
>>> : now...
>>> :
>>> : On Thu, Aug 4, 2016 at 1:37 PM, John Bickerstaff <
>>> j...@johnbickerstaff.com>
>>> : wrote:
>>> :
>>> : > Summary:
>>> : >
>>> : > Using xinclude to include an xml file into schema.xml
>>> : >
>>&

Re: Problems using fieldType text_general in copyField

2016-08-04 Thread John Bickerstaff
Found the Entity Includes - thanks.

On Thu, Aug 4, 2016 at 4:22 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Thanks!
>
> The schema is a copy of the techproducts sample.
>
> Entire include here - and I take your point about the possibility of
> malformation - thanks.
>
> I assumed (perhaps wrongly) that I could duplicate the <field> /
> <copyField> arrangement from the schema.xml file.
>
> I'm unfamiliar with xml entity includes, but I'll go look them up...
>
> [John's pasted include file was stripped by the archive: his custom field
> definitions, the "content" field, the failing copyField list, and the
> text_general fieldType.]
>
>
>
> On Thu, Aug 4, 2016 at 3:55 PM, Chris Hostetter <hossman_luc...@fucit.org>
> wrote:
>
>>
>> you mentioned that the problem only happens when you use xinclude, but you
>> haven't shown us the details of your xinclude -- what exactly does your
>> schema.xml look like (with the xinclude call) and what exactly does the
>> file being included look like (entire contents)
>>
>> (I suspect the problem you are seeing is related to the way xinclude
>> doesn't really support "snippets" of malformed xml, and instead requires
>> some root tag -- i can't imagine what root tag you are using in the
>> included file that would play nicely with mixing/matching field
>> declarations. ... using xml entity includes may be a simpler/safer option)
>>
>>
>>
>> : Date: Thu, 4 Aug 2016 15:47:00 -0600
>> : From: John Bickerstaff <j...@johnbickerstaff.com>
>> : Reply-To: solr-user@lucene.apache.org
>> : To: solr-user@lucene.apache.org
>> : Subject: Re: Problems using fieldType text_general in copyField
>> :
>> : I would call this a bug...
>> :
>> : I'm going out on a limb and say that if you define a field in the
>> included
>> : XML file, you will get this error.
>> :
>> : As long as the field is defined first in schema.xml, you can
>> "copyFIeld" it
>> : or whatever in the include file, but apparently fields MUST be created
>> in
>> : the schema.xml file.
>> :
>> : That makes use of the include for custom things somewhat moot - at
>> least in
>> : my situation.
>> :
>> : I'd love to be wrong by the way, but that's what my tests suggest right
>> : now...
>> :
>> : On Thu, Aug 4, 2016 at 1:37 PM, John Bickerstaff <
>> j...@johnbickerstaff.com>
>> : wrote:
>> :
>> : > Summary:
>> : >
>> : > Using xinclude to include an xml file into schema.xml
>> : >
>> : > The following line
>> : >
>> : > 
>> : >
>> : > generates an error:  about a field being "not a glob and not matching
>> an
>> : > explicit field" even though I declare the field in the line just
>> above.
>> : >
>> : > This seems to happen only for fieldType text_general?
>> : >
>> : > 
>> : >
>> : > Explanation:
>> : >
>> : > I need a little help - keep getting an error when trying to use the
>> : > ability to include an additional XML file.  I may be overlooking
>> something,
>> : > but if so, I need help to see it.
>> : >
>> : > I have the following two lines which throw zero errors when part of
>> : > schema.xml:
>> : >
>> : > > stored="true"
>> : > multiValued=&qu

Re: Problems using fieldType text_general in copyField

2016-08-04 Thread John Bickerstaff
Thanks!

The schema is a copy of the techproducts sample.

Entire include here - and I take your point about the possibility of
malformation - thanks.

I assumed (perhaps wrongly) that I could duplicate the <field> /
<copyField> arrangement from the schema.xml file.

I'm unfamiliar with xml entity includes, but I'll go look them up...

[John's pasted include file was stripped by the archive. From the surviving
fragments it held his custom field definitions, the "content" field
(annotated *//HERE IS WHERE "CONTENT" IS DEFINED*), the copyField list with
the failing entry (annotated /*/THROWING ERROR ABOUT "CONTENT" NOT EXISTING
HERE*), and the text_general fieldType whose analyzer used stopwords.txt and
contentType_synonyms.txt.]


On Thu, Aug 4, 2016 at 3:55 PM, Chris Hostetter <hossman_luc...@fucit.org>
wrote:

>
> you mentioned that the problem only happens when you use xinclude, but you
> haven't shown us the details of your xinclude -- what exactly does your
> schema.xml look like (with the xinclude call) and what exactly does the
> file being included look like (entire contents)
>
> (I suspect the problem you are seeing is related to the way xinclude
> doesn't really support "snippets" of malformed xml, and instead requires
> some root tag -- i can't imagine what root tag you are using in the
> included file that would play nicely with mixing/matching field
> declarations. ... using xml entity includes may be a simpler/safer option)
>
>
>
> : Date: Thu, 4 Aug 2016 15:47:00 -0600
> : From: John Bickerstaff <j...@johnbickerstaff.com>
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: Re: Problems using fieldType text_general in copyField
> :
> : I would call this a bug...
> :
> : I'm going out on a limb and say that if you define a field in the
> included
> : XML file, you will get this error.
> :
> : As long as the field is defined first in schema.xml, you can "copyField"
> it
> : or whatever in the include file, but apparently fields MUST be created in
> : the schema.xml file.
> :
> : That makes use of the include for custom things somewhat moot - at least
> in
> : my situation.
> :
> : I'd love to be wrong by the way, but that's what my tests suggest right
> : now...
> :
> : On Thu, Aug 4, 2016 at 1:37 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> : wrote:
> :
> : > Summary:
> : >
> : > Using xinclude to include an xml file into schema.xml
> : >
> : > The following line
> : >
> : > 
> : >
> : > generates an error:  about a field being "not a glob and not matching
> an
> : > explicit field" even though I declare the field in the line just above.
> : >
> : > This seems to happen only for fieldType text_general?
> : >
> : > 
> : >
> : > Explanation:
> : >
> : > I need a little help - keep getting an error when trying to use the
> : > ability to include an additional XML file.  I may be overlooking
> something,
> : > but if so, I need help to see it.
> : >
> : > I have the following two lines which throw zero errors when part of
> : > schema.xml:
> : >
> : >  : > multiValued="true"/>
> : >  
> : >
> : > However, when I put this into an include file and use xinclude, then I
> get
> : > this error when starting Solr.
> : >
> : >
> : >
> : >- *statdx_shard1_replica3:* org.apache.solr.common.
> : >SolrException:org.apache.solr.common.SolrException: Could not load
> : >conf for core statdx_shard1_replica3: Can't load schema schema.xml:
> : >copyField source :'content' is not a glob and doesn't match any
> explicit
> : >field or dynamicField.
> : >
> : >
> : > Given that I am defining the field in the line right above the
> copyField
> : > statement, I'm confused about why this works fine in schema.xml but
> NOT in
> : > an included file.
> : >
> : > I experimented and found that any field of type "text_general" will
> throw
> : > this same error if it is part of the included xml file.  Other
> fieldTypes
> : > that I tried (string, int, double) did not have this issue.
> : >
> : > I'm using Solr 5.4, although I'm pulling custom config into an included
> : > file for purposes of moving to 6.1
> : >
> : > I have the following list of copyField commands in the included xml
> file,
> : > and get no errors on any but the "content" one.  It just so happens
> that
> : > "content" is the only field of type "text_general" in there.
> : >
> : >
> : > Any hints greatly appreciated.
> : >
> : >   
> : >
> : >
> : >
> : >
> : >
> : >
> : >
> : >
> : >
> :
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Problems using fieldType text_general in copyField

2016-08-04 Thread John Bickerstaff
I would call this a bug...

I'm going out on a limb and say that if you define a field in the included
XML file, you will get this error.

As long as the field is defined first in schema.xml, you can "copyField" it
or whatever in the include file, but apparently fields MUST be created in
the schema.xml file.

That makes use of the include for custom things somewhat moot - at least in
my situation.

I'd love to be wrong by the way, but that's what my tests suggest right
now...

On Thu, Aug 4, 2016 at 1:37 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Summary:
>
> Using xinclude to include an xml file into schema.xml
>
> The following line
>
> [the <copyField> line was stripped by the archive]
>
> generates an error:  about a field being "not a glob and not matching an
> explicit field" even though I declare the field in the line just above.
>
> This seems to happen only for fieldType text_general?
>
> 
>
> Explanation:
>
> I need a little help - keep getting an error when trying to use the
> ability to include an additional XML file.  I may be overlooking something,
> but if so, I need help to see it.
>
> I have the following two lines which throw zero errors when part of
> schema.xml:
>
> [the <field name="content" .../> definition and the <copyField
> source="content" .../> line -- both stripped by the archive]
>
> However, when I put this into an include file and use xinclude, then I get
> this error when starting Solr.
>
>
>
>- *statdx_shard1_replica3:* org.apache.solr.common.
>SolrException:org.apache.solr.common.SolrException: Could not load
>conf for core statdx_shard1_replica3: Can't load schema schema.xml:
>copyField source :'content' is not a glob and doesn't match any explicit
>field or dynamicField.
>
>
> Given that I am defining the field in the line right above the copyField
> statement, I'm confused about why this works fine in schema.xml but NOT in
> an included file.
>
> I experimented and found that any field of type "text_general" will throw
> this same error if it is part of the included xml file.  Other fieldTypes
> that I tried (string, int, double) did not have this issue.
>
> I'm using Solr 5.4, although I'm pulling custom config into an included
> file for purposes of moving to 6.1
>
> I have the following list of copyField commands in the included xml file,
> and get no errors on any but the "content" one.  It just so happens that
> "content" is the only field of type "text_general" in there.
>
>
> Any hints greatly appreciated.
>
> [the list of <copyField> commands was stripped by the archive]
>


Problems using fieldType text_general in copyField

2016-08-04 Thread John Bickerstaff
Summary:

Using xinclude to include an xml file into schema.xml

The following line

[the <copyField> line was stripped by the archive]

generates an error:  about a field being "not a glob and not matching an
explicit field" even though I declare the field in the line just above.

This seems to happen only for fieldType text_general?



Explanation:

I need a little help - keep getting an error when trying to use the ability
to include an additional XML file.  I may be overlooking something, but if
so, I need help to see it.

I have the following two lines which throw zero errors when part of
schema.xml:

[the <field name="content" .../> definition and the <copyField
source="content" .../> line -- both stripped by the archive]

However, when I put this into an include file and use xinclude, then I get
this error when starting Solr.



   - *statdx_shard1_replica3:*
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
   Could not load conf for core statdx_shard1_replica3: Can't load schema
   schema.xml: copyField source :'content' is not a glob and doesn't match any
   explicit field or dynamicField.


Given that I am defining the field in the line right above the copyField
statement, I'm confused about why this works fine in schema.xml but NOT in
an included file.

I experimented and found that any field of type "text_general" will throw
this same error if it is part of the included xml file.  Other fieldTypes
that I tried (string, int, double) did not have this issue.

I'm using Solr 5.4, although I'm pulling custom config into an included
file for purposes of moving to 6.1

I have the following list of copyField commands in the included xml file,
and get no errors on any but the "content" one.  It just so happens that
"content" is the only field of type "text_general" in there.


Any hints greatly appreciated.

[the list of <copyField> commands was stripped by the archive]


Re: how to specify a tailored schema.xml

2016-07-29 Thread John Bickerstaff
@ Immanuel --

Not sure if this is what you're asking for, but inside /opt/solr/bin is a
post script that allows you to "post" the contents of a file to Solr..

./post -help will get you the list of options.  You could submit films.xml
for indexing this way I believe.
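
A hedged sketch of that, assuming a standard install with the films example
data under the Solr home:

  # Index the films example into the cinema core with the post tool
  /opt/solr/bin/post -c cinema example/films/films.xml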

On Fri, Jul 29, 2016 at 9:57 AM, Immanuel Normann <
immanuel.norm...@gmail.com> wrote:

> Ok, I just figured it out myself: I have to "commit" the newly updated files.
> One way (probably not the best) is to restart solr again. The alternative
> way: For an autocommit there is an option to configure in solrconfig.xml.
>
> Now the query returns results.
>
> Is there a method to re-index a single collection on the command line or
> via http request without restarting solr?
>
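
A commit can be issued over plain HTTP without restarting Solr; a minimal
sketch against the cinema core used in this thread (host and port as in the
examples below):

  # Make previously posted documents searchable with an explicit commit
  curl "http://localhost:8983/solr/cinema/update?commit=true"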
> 2016-07-29 17:48 GMT+02:00 Immanuel Normann :
>
> > Thanks Alexandre for your minimal config example!
> >
> > I am trying to use it as a start to understanding, but I cannot get it
> > running. To make it more explicit:
> >
> > I am running a freshly installed solr 6.1.0. Suppose I am in its home
> > directory for the following steps:
> >
> > solr-6.1.0$ bin/solr start
> >
> > solr-6.1.0$ bin/solr create -c cinema
> >
> > solr-6.1.0$ cd server/solr/cinema/conf
> >
> > solr-6.1.0/server/solr/cinema/conf$ ls
> >
> > currency.xml  elevate.xml  lang  params.json  protwords.txt
> > managed-schema  solrconfig.xml  stopwords.txt  synonyms.txt
> >
> > Here I replace managed-schema by your minimal schema.xml and
> > solrconfig.xml by your minimal solrconfig.xml and restart solr (don't
> know
> > whether this is actually necessary to activate the new config files).
> >
> > solr-6.1.0$ bin/solr restart
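
A restart works, but a core reload also picks up replaced config files
without bouncing the JVM; a hedged sketch using the CoreAdmin API (core name
as above):

  # Reload the cinema core so the new schema and solrconfig take effect
  curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=cinema"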
> >
> > solr-6.1.0$ curl http://localhost:8983/solr/cinema/update -H
> > "Content-Type: text/xml" --data-binary @example/films/films.xml
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <response>
> > <lst name="responseHeader"><int name="status">0</int><int
> > name="QTime">123</int></lst>
> > </response>
> >
> > So no complains from solr! But the response comes too quick in my
> opinion.
> > And in fact the data folder still contains an empty index and empty tlog
> > subfolder. Consequently queries fail, too:
> >
> > $ curl http://localhost:8983/solr/cinema/select?q=genre:Drama
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <response>
> > <lst name="responseHeader"><int name="status">0</int><int
> > name="QTime">1</int></lst><result name="response" numFound="0"
> > start="0"/>
> > </response>
> >
> > What am I doing wrong?
> >
> > Regards, Immanuel
> >
> >
> >
> >
> > 2016-07-29 14:51 GMT+02:00 Alexandre Rafalovitch :
> >
> >> I have the minimal 5.5 version that should work with 6.1 at:
> >>
> >>
> https://github.com/arafalov/simplest-solr-config/tree/master/solr-5.5/configset
> >>
> >> It is obviously not a good production setup (e.g. no cache), but could
> >> be a start to understanding. It uses classical schema.xml approach,
> >> and not a dynamic one.
> >>
> >> Regards,
> >> Alex.
> >> 
> >> Newsletter and resources for Solr beginners and intermediates:
> >> http://www.solr-start.com/
> >>
> >>
> >> On 29 July 2016 at 20:54, Immanuel Normann 
> >> wrote:
> >> > Hi,
> >> >
> >> > I am a returner to solr with limited experience in solr-5.2 now diving
> >> into
> >> > solr-6.1. My problem is
> >> > how to specify a tailored schema.xml
> >> >
> >> > After reading several tutorials and book chapters about how to
> configure
> >> > schema.xml I have a basic understanding about its concepts and
> >> structure.
> >> >
> >> > Now I created as exercise a core "cinema" where I intended to load the
> >> > example/films/films.xml using the command:
> >> >
> >> > bin/solr create -c cinema
> >> >
> >> > this creates server/solr/cinema and therein conf/managed-schema. The
> >> > comment inside managed-schema says: 'This is the Solr schema file.
> This
> >> > file should be named "schema.xml"' and "This example schema is the
> >> > recommended starting point for users."
> >> >
> >> > Unfortunately I have a hard time to make use of managed-schema as
> >> starting
> >> > point! The problem is that I want to understand how to configure a
> >> > lightweight schema.xml which is tailored to a doc structure which is
> >> pretty
> >> > much under my control. For instance, the films.xml docs have such a
> >> simple
> >> > structure that it should be sufficient to have a simple schema.xml as
> >> that:
> >> >
> >> > [Immanuel's 12-line schema.xml was stripped by the archive; from the
> >> > fragments it declared a handful of <field> entries for the films data
> >> > (a mix of single- and multi-valued fields), <uniqueKey>id</uniqueKey>,
> >> > a string <fieldType> with sortMissingLast="true", and a trie date
> >> > <fieldType> with precisionStep="0" and positionIncrementGap="0".]
> >> >
> >> > However, the managed-schema provided in
> >> > example/techproducts/solr/films/conf has 480 lines instead of my 12
> >> lines.
> >> > It is full of fieldType and dynamicField specification that never
> apply
> >> for
> >> > this data.
> >> >
> >> > Unfortunately my schema.xml doesn't work with the rest of the conf
> >> setting
> >> > that is generated with
> >> > bin/solr create -c cinema. The problem seems to be the 

Re: Solr 6 managed-schema & version control

2016-07-29 Thread John Bickerstaff
Gratzi!

On Jul 28, 2016 9:14 PM, "Rachid Bouacheria" <willi...@gmail.com> wrote:

> John,
> I don't want to answer on Erick's behalf, but I am pretty sure there is no
> UI built in solr 6 that allows you to update the schema and somehow check
> it in VCS. I would guess that you could do this by exposing an MBean.
> Anyway that's how I interpreted Erick's reply.
>
>
> On Wed, Jul 27, 2016 at 6:01 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > Erick - the UI you mention -- something that exists or something that has
> > to be built?  (I'm upgrading to version 6 as well and this question is
> one
> > I'll have to deal with...)
> >
> > On Wed, Jul 27, 2016 at 5:31 PM, Rachid Bouacheria <willi...@gmail.com>
> > wrote:
> >
> > > Thank you very much Erick, I appreciate your feed back.
> > >
> > > On Wed, Jul 27, 2016 at 2:24 PM, Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > >
> > > > Using classic schema is perfectly acceptable/reasonable, you can
> > > > continue to do so freely (you'll have to change to
> > > > ClassicSchemaFactory though).
> > > >
> > > > Also, you can freely edit managed-schema just as you did schema.xml.
> > > > The "trick" here is that you have to take some care _not_ to issue
> > > > commands that modify the schema or the in-memory version will
> > > > overwrite the one in ZK. Otherwise, though, you can freely use
> > > > managed-schema just as you do classic schema.
> > > >
> > > > So you can do just what you do now, keep managed-schema in VCS and
> > > > upconfig it. Also note that Solr 6.2 has "bin/solr zk
> > > > upconfig/downconfig/cp/mv/ls" functionality.
> > > >
> > > > Managed lends itself to some kind of UI that maintains it. The
> process
> > > > (IMO) for using that in prod would be something like:
> > > > > Use the UI to build your schema
> > > > > copy from ZK to your local machine
> > > > > put the configs in VCS
> > > > > Deploy using the VCS as your system-of-record.
> > > >
> > > > But that's just my approach. If you don't want to use the
> > > > managed-schema features, switch back to classic IMO.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Jul 27, 2016 at 11:37 AM, Rachid Bouacheria <
> > willi...@gmail.com>
> > > > wrote:
> > > > > Hi All,
> > > > >
> > > > > I am upgrading from solr 4 to 6.
> > > > > In solr 4 I have a schema.xml that is under version control.
> > > > > But solr 6 has the notion of a managed schema that could be
> modified
> > > via
> > > > a
> > > > > solr api call.
> > > > > This seems great and flexible, but my assumption is that in this
> case
> > > > > zookeeper becomes the authoritative copy and not SVN or Git.
> > > > >
> > > > > And this is where things become unclear to me.
> > > > > Is the expectation to download the configuration from zk the same
> way
> > > we
> > > > do
> > > > > an svn checkout to have the configuration and run locally?
> > > > > How do we know who changed what and when?
> > > > >
> > > > > I know that there still is the option to use schema.xml by using
> > > > > the ClassicIndexSchemaFactory but I am curious to hear from y'all
> > that
> > > > use
> > > > > managed schema how you are doing it and if there are any downside,
> > > > gotchas,
> > > > > or if all is just much better :-)
> > > > >
> > > > > Seems to me that running locally is harder as you cannot just
> > checkout
> > > a
> > > > > project that contains the up to date schema.
> > > > >
> > > > > Thank you,
> > > > > Rachid.
> > > >
> > >
> >
>
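
For reference, the 6.2+ round trip Erick describes looks roughly like this
(the config name, local path, and ZooKeeper hosts here are examples):

bin/solr zk downconfig -n myconfig -d ./configs/myconfig -z zk1:2181,zk2:2181,zk3:2181/solr
# commit ./configs/myconfig to VCS, edit as needed, then push it back up:
bin/solr zk upconfig -n myconfig -d ./configs/myconfig -z zk1:2181,zk2:2181,zk3:2181/solr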


Re: Solr 6 managed-schema & version control

2016-07-27 Thread John Bickerstaff
Erick - the UI you mention -- something that exists or something that has
to be built?  (I'm upgrading to version 6 as well and this question is one
I'll have to deal with...)

On Wed, Jul 27, 2016 at 5:31 PM, Rachid Bouacheria 
wrote:

> Thank you very much Erick, I appreciate your feedback.
>
> On Wed, Jul 27, 2016 at 2:24 PM, Erick Erickson 
> wrote:
>
> > Using classic schema is perfectly acceptable/reasonable, you can
> > continue to do so freely (you'll have to change to
> > ClassicSchemaFactory though).
> >
> > Also, you can freely edit managed-schema just as you did schema.xml.
> > The "trick" here is that you have to take some care _not_ to issue
> > commands that modify the schema or the in-memory version will
> > overwrite the one in ZK. Otherwise, though, you can freely use
> > managed-schema just as you do classic schema.
> >
> > So you can do just what you do now, keep managed-schema in VCS and
> > upconfig it. Also note that Solr 6.2 has "bin/solr zk
> > upconfig/downconfig/cp/mv/ls" functionality.
> >
> > Managed lends itself to some kind of UI that maintains it. The process
> > (IMO) for using that in prod would be something like:
> > > Use the UI to build your schema
> > > copy from ZK to your local machine
> > > put the configs in VCS
> > > Deploy using the VCS as your system-of-record.
> >
> > But that's just my approach. If you don't want to use the
> > managed-schema features, switch back to classic IMO.
> >
> > Best,
> > Erick
> >
> >
> >
> >
> >
> > On Wed, Jul 27, 2016 at 11:37 AM, Rachid Bouacheria 
> > wrote:
> > > Hi All,
> > >
> > > I am upgrading from solr 4 to 6.
> > > In solr 4 I have a schema.xml that is under version control.
> > > But solr 6 has the notion of a managed schema that could be modified
> via
> > a
> > > solr api call.
> > > This seems great and flexible, but my assumption is that in this case
> > > zookeeper becomes the authoritative copy and not SVN or Git.
> > >
> > > And this is where things become unclear to me.
> > > Is the expectation to download the configuration from zk the same way
> we
> > do
> > > an svn checkout to have the configuration and run locally?
> > > How do we know who changed what and when?
> > >
> > > I know that there still is the option to use schema.xml by using
> > > the ClassicIndexSchemaFactory but I am curious to hear from y'all that
> > use
> > > managed schema how you are doing it and if there are any downside,
> > gotchas,
> > > or if all is just much better :-)
> > >
> > > Seems to me that running locally is harder as you cannot just checkout
> a
> > > project that contains the up to date schema.
> > >
> > > Thank you,
> > > Rachid.
> >
>


Re: Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread John Bickerstaff
@Michael - somewhere there should be a "conf" directory for your SOLR
instance.  For my Dev efforts, I moved it to a different directory and I
forget where it was, originally -- but if you search for solrconfig.xml or
schema.xml, you should find it.

It could be on your servers (or on only one of them) or, if someone has
done a really good job, it's in source control somewhere...
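
For reference, one quick way to hunt those files down on a Linux box (install
paths vary, so this just searches everywhere):

find / -name solrconfig.xml 2>/dev/null
find / -name schema.xml -o -name managed-schema 2>/dev/null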

On Tue, Jul 26, 2016 at 2:17 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
>
> and further on in the file...
>
> <uniqueKey>id</uniqueKey>
>
>
> On Tue, Jul 26, 2016 at 2:17 PM, John Bickerstaff <
> j...@johnbickerstaff.com> wrote:
>
>> I don't see a managed schema file.  As far as I understand it, id is set
>> as a "uniqueKey" in the schema.xml file...
>>
>> On Tue, Jul 26, 2016 at 2:11 PM, Michael Joyner <mich...@newsrx.com>
>> wrote:
>>
>>> ok, I think I need to do a manual edit on the managed-schema file but I
>>> get "NoNode" for /managed-schema when trying to use the zkcli.sh file?
>>>
>>>
>>> How can I get to this file and edit it?
>>>
>>>
>>> On 07/26/2016 03:05 PM, Alexandre Drouin wrote:
>>>
>>>> Hello,
>>>>
>>>> You may have a uniqueKey that points to a field that does not exist
>>>> anymore.  You can try adding an "id" field using Solr's UI or the schema
>>>> API since you are using the managed-schema.
>>>>
>>>>
>>>> Alexandre Drouin
>>>>
>>>> -Original Message-
>>>> From: Michael Joyner [mailto:mich...@newsrx.com]
>>>> Sent: July 26, 2016 2:34 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Can't load schema managed-schema: unknown field 'id'
>>>>
>>>> Help!
>>>>
>>>> What is the best way to recover from:
>>>>
>>>> Can't load schema managed-schema: unknown field 'id'
>>>>
>>>> I was managing the schema on a test collection, fat fingered it, but now
>>>> I find out the schema ops seem to be altering all collections on the core?
>>>> SolrCloud 5.5.1
>>>>
>>>> -Mike
>>>>
>>>
>>>
>>
>
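
For reference, the Schema API route Alexandre suggests would look something
like this (the collection name here is an example; adjust the field
definition to match what the uniqueKey expects):

curl -X POST -H 'Content-type:application/json' \
  --data-binary '{"add-field":{"name":"id","type":"string","indexed":true,"stored":true}}' \
  http://localhost:8983/solr/mycollection/schema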


Re: Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread John Bickerstaff
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

and further on in the file...

<uniqueKey>id</uniqueKey>


On Tue, Jul 26, 2016 at 2:17 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> I don't see a managed schema file.  As far as I understand it, id is set
> as a "uniqueKey" in the schema.xml file...
>
> On Tue, Jul 26, 2016 at 2:11 PM, Michael Joyner <mich...@newsrx.com>
> wrote:
>
>> ok, I think I need to do a manual edit on the managed-schema file but I
>> get "NoNode" for /managed-schema when trying to use the zkcli.sh file?
>>
>>
>> How can I get to this file and edit it?
>>
>>
>> On 07/26/2016 03:05 PM, Alexandre Drouin wrote:
>>
>>> Hello,
>>>
>>> You may have a uniqueKey that points to a field that does not exist
>>> anymore.  You can try adding an "id" field using Solr's UI or the schema
>>> API since you are using the managed-schema.
>>>
>>>
>>> Alexandre Drouin
>>>
>>> -Original Message-
>>> From: Michael Joyner [mailto:mich...@newsrx.com]
>>> Sent: July 26, 2016 2:34 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Can't load schema managed-schema: unknown field 'id'
>>>
>>> Help!
>>>
>>> What is the best way to recover from:
>>>
>>> Can't load schema managed-schema: unknown field 'id'
>>>
>>> I was managing the schema on a test collection, fat fingered it, but now
>>> I find out the schema ops seem to be altering all collections on the core?
>>> SolrCloud 5.5.1
>>>
>>> -Mike
>>>
>>
>>
>


Re: Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread John Bickerstaff
I don't see a managed schema file.  As far as I understand it, id is set as
a "uniqueKey" in the schema.xml file...

On Tue, Jul 26, 2016 at 2:11 PM, Michael Joyner  wrote:

> ok, I think I need to do a manual edit on the managed-schema file but I
> get "NoNode" for /managed-schema when trying to use the zkcli.sh file?
>
>
> How can I get to this file and edit it?
>
>
> On 07/26/2016 03:05 PM, Alexandre Drouin wrote:
>
>> Hello,
>>
>> You may have a uniqueKey that points to a field that does not exist
>> anymore.  You can try adding an "id" field using Solr's UI or the schema
>> API since you are using the managed-schema.
>>
>>
>> Alexandre Drouin
>>
>> -Original Message-
>> From: Michael Joyner [mailto:mich...@newsrx.com]
>> Sent: July 26, 2016 2:34 PM
>> To: solr-user@lucene.apache.org
>> Subject: Can't load schema managed-schema: unknown field 'id'
>>
>> Help!
>>
>> What is the best way to recover from:
>>
>> Can't load schema managed-schema: unknown field 'id'
>>
>> I was managing the schema on a test collection, fat fingered it, but now
>> I find out the schema ops seem to be altering all collections on the core?
>> SolrCloud 5.5.1
>>
>> -Mike
>>
>
>


Re: Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread John Bickerstaff
@Michael - if you're on Linux and decide to take Alexandre's advice, I can
possibly save you some time.  I wrestled with getting the data in and out
of zookeeper a while ago...

sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir
/home/john/conf/ -confname collectionName -z 192.168.56.5/solr5_4

Explanation:

sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig = run
the code that sends config files (whatever files you modify) over to
Zookeeper

-confdir /home/john/conf/ = find the configuration directory here

-confname collectionName  = apply the configuration to this collection name

-z 192.168.56.5/solr5_4 - find Zookeeper here - and use the solr5_4
"chroot" which already exists in Zookeeper  (If you don't have chroot in
Zookeeper, ignore and don't use the slash)
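
The reverse direction - pulling the current config out of Zookeeper so you can
edit it - is the same command with downconfig (same example host, chroot, and
paths as above):

sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig -confdir
/home/john/conf/ -confname collectionName -z 192.168.56.5/solr5_4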





On Tue, Jul 26, 2016 at 1:55 PM, Alexandre Drouin <
alexandre.dro...@orckestra.com> wrote:

> Other than deleting the collection, I think you'll have to edit the
> manage-schema file manually.
>
> Since you are using SolrCloud you will need to use Solr's zkcli (
> https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities)
> utility to download and upload the file from ZooKeeper.
>
>
> Alexandre Drouin
>
>
> -Original Message-
> From: Michael Joyner [mailto:mich...@newsrx.com]
> Sent: July 26, 2016 3:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can't load schema managed-schema: unknown field 'id'
> Importance: High
>
> Same error via the UI:
>
> Can't load schema managed-schema: unknown field 'id'
>
>
> On 07/26/2016 03:05 PM, Alexandre Drouin wrote:
> > Hello,
> >
> > You may have a uniqueKey that points to a field that does not exist
> anymore.  You can try adding an "id" field using Solr's UI or the schema
> API since you are using the managed-schema.
> >
> >
> > Alexandre Drouin
> >
> > -Original Message-
> > From: Michael Joyner [mailto:mich...@newsrx.com]
> > Sent: July 26, 2016 2:34 PM
> > To: solr-user@lucene.apache.org
> > Subject: Can't load schema managed-schema: unknown field 'id'
> >
> > Help!
> >
> > What is the best way to recover from:
> >
> > Can't load schema managed-schema: unknown field 'id'
> >
> > I was managing the schema on a test collection, fat fingered it, but now
> > I find out the schema ops seem to be altering all collections on the core?
> > SolrCloud 5.5.1
> >
> > -Mike
>
>


Re: SolrCloud start up script

2016-06-23 Thread John Bickerstaff
Good luck!  Let me know if you run into any more trouble.

On Thu, Jun 23, 2016 at 4:01 PM, Jose-Marcio Martins da Cruz <
jose-marcio.mart...@mines-paristech.fr> wrote:

>
> Thank you John ! This is a good start.
>
> I understand that it seems more complicated than I thought, but... Got the
> way!!!
>
> Regards
>
> José-Marcio
>
>
> On 06/23/2016 10:44 PM, John Bickerstaff wrote:
>
>> So, if you installed with the install script (warning: I used 5.4 but I
>> think everything is the same) and add this setting in your solr.in.sh file,
>> your Solr boxes should start in cloud mode when the server starts up.
>>
>> The -c option also works, but for a production type installation, it's a
>> lot easier to have it "just start" when you start the server and issue
>> service solr restart when you want to restart...
>>
>> There is an instruction page for "installing Solr for Production" which
>> you
>> may want to look at...
>>
>> https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
>>
>> Pay special attention to the part that talks about ZK_HOME and the chroot
>> suggestion.  Using the chroot on the end of the list of ZK hosts keeps
>> your
>> solr "data files" separate from everything else in Zookeeper - which is
>> very handy if you're using Zookeeper for other tools.
>>
>> I wrestled through all of this a few months back - take a look at the
>> "Install Solr for Prod" page - I think that will help - then ping back if
>> you need more assistance getting it to run in cloud mode automatically.
>>
>> On Thu, Jun 23, 2016 at 2:36 PM, Jose-Marcio Martins da Cruz <
>> jose-marcio.mart...@mines-paristech.fr> wrote:
>>
>>
>>> Hi John,
>>>
>>> On 06/23/2016 10:18 PM, John Bickerstaff wrote:
>>>
>>> Jose,
>>>>
>>>> There is a setting in the solr.in.sh script that should make Solr start
>>>> in
>>>> "cloud" mode...
>>>>
>>>> It's ZK_HOST
>>>>
>>>> That's where you list the IP addresses (or hostnames) of your zookeeper
>>>> machines...
>>>>
>>>> Is this set?
>>>>
>>>>
>>> No! :-(
>>>
>>> I thought I should just launch Solr with the "-c" option...
>>>
>>>
>>> What version of Solr are you using?
>>>>
>>>>
>>> The latest one: 6.1.0; but I also tried 6.0.1.
>>>
>>>
>>>
>>> On Thu, Jun 23, 2016 at 2:13 PM, Jose-Marcio Martins da Cruz <
>>>> jose-marcio.mart...@mines-paristech.fr> wrote:
>>>>
>>>>
>>>> Hello,
>>>>>
>>>>> I have a quite dumb question. I'm new to Nutch/Solr and we're migrating
>>>>> our Web indexer from a commercial product to Nutch/Solr.  I haven't yet
>>>>> understood all the internals I need. After spending some time, I have a
>>>>> problem...
>>>>>
>>>>> I've installed Nutch/Solr and everything works fine with Solr in
>>>>> standalone mode. I've even installed Solr with the
>>>>> install_solr_service.sh
>>>>> script. Everything is fine.
>>>>>
>>>>> Now, I want to go further and pass Solr to cloud mode. I haven't found
>>>>> an
>>>>> elegant way to modify /etc/init.d/solr script nor /etc/default/
>>>>> solr.in.sh
>>>>> script in order to launch Solr in cloud mode. But I succeeded to start
>>>>> it
>>>>> with some dirty workaround.
>>>>>
>>>>> Can you point me to some hints or links on how to do this cleanly?
>>>>>
>>>>> Remark : the only reason I'm trying to migrate to SolrCloud is to be
>>>>> able
>>>>> to do Basic Auth. We've just around 30,000 documents to index and
>>>>> standalone is enough to us.
>>>>>
>>>>> Regards
>>>>>
>>>>> JMarcio
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-23 Thread John Bickerstaff
I'll add my vote that reading the book will really expand your
understanding of search and "Relevance".

If you're working in the Search space, this book is really worth your time!

On Thu, Jun 23, 2016 at 2:59 PM, MaryJo Sminkey  wrote:

> > For someone familiar with Solr, will it be an issue to run those examples
> > in a Solr instance instead ?
> >
>
> You can't run ES code on Solr, and the syntax is quite different, so you do
> have to figure out how to convert it yourself and it's not always very
> obvious. There is an appendix that covers the differences in how you would
> do the same searches/etc in Solr chapter by chapter but it's certainly not
> a comprehensive comparison of the code examples, I did notice for instance
> since I'm working on synonyms right now that it didn't have any mention of
> the issues on Solr with multi-term synonyms or even how to do synonyms on
> Solr. I definitely would have loved to get a copy of the book that was more
> specific to Solr but I still would recommend it for the material that it
> does have and how to think about relevancy and work to improve your
> results. But it's definitely more geared for intermediate to advanced users
> that already have a good handle on all the elements of Solr and how to
> write code for them.
>
> HTH
>
> MJ
>
>
>
>


Re: SolrCloud start up script

2016-06-23 Thread John Bickerstaff
So, if you installed with the install script (warning: I used 5.4 but I
think everything is the same) and add this setting in your solr.in.sh file,
your Solr boxes should start in cloud mode when the server starts up.

The -c option also works, but for a production type installation, it's a
lot easier to have it "just start" when you start the server and issue
service solr restart when you want to restart...

There is an instruction page for "installing Solr for Production" which you
may want to look at...

https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production

Pay special attention to the part that talks about ZK_HOME and the chroot
suggestion.  Using the chroot on the end of the list of ZK hosts keeps your
solr "data files" separate from everything else in Zookeeper - which is
very handy if you're using Zookeeper for other tools.
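
The relevant solr.in.sh line is just the comma-separated ZooKeeper list plus
the optional chroot, along the lines of (hostnames and the /solr chroot here
are examples):

ZK_HOST="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"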

I wrestled through all of this a few months back - take a look at the
"Install Solr for Prod" page - I think that will help - then ping back if
you need more assistance getting it to run in cloud mode automatically.

On Thu, Jun 23, 2016 at 2:36 PM, Jose-Marcio Martins da Cruz <
jose-marcio.mart...@mines-paristech.fr> wrote:

>
> Hi John,
>
> On 06/23/2016 10:18 PM, John Bickerstaff wrote:
>
>> Jose,
>>
>> There is a setting in the solr.in.sh script that should make Solr start
>> in
>> "cloud" mode...
>>
>> It's ZK_HOST
>>
>> That's where you list the IP addresses (or hostnames) of your zookeeper
>> machines...
>>
>> Is this set?
>>
>
> No! :-(
>
> I thought I should just launch Solr with the "-c" option...
>
>
>> What version of Solr are you using?
>>
>
> The latest one: 6.1.0; but I also tried 6.0.1.
>
>
>
>> On Thu, Jun 23, 2016 at 2:13 PM, Jose-Marcio Martins da Cruz <
>> jose-marcio.mart...@mines-paristech.fr> wrote:
>>
>>
>>> Hello,
>>>
>>> I have a quite dumb question. I'm new to Nutch/Solr and we're migrating
>>> our Web indexer from a commercial product to Nutch/Solr.  I haven't yet
>>> understood all the internals I need. After spending some time, I have a
>>> problem...
>>>
>>> I've installed Nutch/Solr and everything works fine with Solr in
>>> standalone mode. I've even installed Solr with the
>>> install_solr_service.sh
>>> script. Everything is fine.
>>>
>>> Now, I want to go further and pass Solr to cloud mode. I haven't found an
>>> elegant way to modify /etc/init.d/solr script nor /etc/default/
>>> solr.in.sh
>>> script in order to launch Solr in cloud mode. But I succeeded to start it
>>> with some dirty workaround.
>>>
>>> Can you point me to some hints or links on how to do this cleanly?
>>>
>>> Remark : the only reason I'm trying to migrate to SolrCloud is to be able
>>> to do Basic Auth. We've just around 30,000 documents to index and
>>> standalone is enough to us.
>>>
>>> Regards
>>>
>>> JMarcio
>>>
>>>
>>>
>>>
>>>
>>
>


Re: SolrCloud start up script

2016-06-23 Thread John Bickerstaff
Jose,

There is a setting in the solr.in.sh script that should make Solr start in
"cloud" mode...

It's ZK_HOST

That's where you list the IP addresses (or hostnames) of your zookeeper
machines...

Is this set?

What version of Solr are you using?

On Thu, Jun 23, 2016 at 2:13 PM, Jose-Marcio Martins da Cruz <
jose-marcio.mart...@mines-paristech.fr> wrote:

>
> Hello,
>
> I have a quite dumb question. I'm new to Nutch/Solr and we're migrating
> our Web indexer from a commercial product to Nutch/Solr.  I haven't yet
> understood all the internals I need. After spending some time, I have a
> problem...
>
> I've installed Nutch/Solr and everything works fine with Solr in
> standalone mode. I've even installed Solr with the install_solr_service.sh
> script. Everything is fine.
>
> Now, I want to go further and pass Solr to cloud mode. I haven't found an
> elegant way to modify /etc/init.d/solr script nor /etc/default/solr.in.sh
> script in order to launch Solr in cloud mode. But I succeeded to start it
> with some dirty workaround.
>
> Can you point me to some hints or links on how to do this cleanly?
>
> Remark : the only reason I'm trying to migrate to SolrCloud is to be able
> to do Basic Auth. We've just around 30,000 documents to index and
> standalone is enough to us.
>
> Regards
>
> JMarcio
>
>
>
>


Re: Solr Query Processing Detail

2016-06-23 Thread John Bickerstaff
Perfect - got it.

Now I know where I stand when I'm writing the bq, bf, etc...

Thank you very much.

On Thu, Jun 23, 2016 at 1:35 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq:
>
> My big question however is whether I can
> trust that the Boost Functions and
> Boost Query are running against the entire index
>
>
> In a word "no". The bf's are intended to alter
> the score of the documents found by the primary query
> by multiplying the function results by the raw score.
> If a document doesn't satisfy the primary
> query, its score is zero so there's no point in dealing with
> the bf clauses. So the question of whether
> the bf's run against the entire index doesn't
> "make sense" in the Solr context.
>
> Another way of thinking about it is:
> "for all the docs that satisfy the primary query,
> multiply their score by the results of the bf
> specified".
>
> Best,
> Erick
>
>
> On Wed, Jun 22, 2016 at 3:04 PM, John Bickerstaff
> <j...@johnbickerstaff.com> wrote:
> > Oh - gotcha...  Thanks for taking the time to reply.  My use of the
> phrase
> > "sub query" is probably misleading...
> >
> > Here's the XML (below).  I'm calling the Boost Query and Boost Function
> > statements "sub queries"...
> >
> > The thing I was referencing was this -- where I create an "alias" for the
> > query (titleQuery) and then use it in boost function on the next line...
> >
> > <str name="titleQuery">{!edismax qf='title' bf='' bq='' v=$q}</str>
> > <str name="bf">product(query($titleQuery),4)</str>
> >
> >
> > My big question however is whether I can trust that the Boost Functions
> and
> > Boost Query are running against the entire index (what I would prefer).
> >
> > The alternative (it seems to me) is that the default queryParser (edismax
> > in this case) gathers a primary data set and the subsequent queries ONLY
> > query on that set, not the entire index.
> >
> > I can't find documentation anywhere (so far) that covers things in that
> > depth...  I am seeing some behavior in this requestHandler's results that
> > makes me wonder what happens under the covers...
> >
> >
> > =
> >
> > BTW - That's a nice finesse move with the fq clause!  I'll file that away
> > for future reference for sure...
> >
> > <requestHandler name="..." class="solr.SearchHandler">
> >   <lst name="defaults">
> >     <str name="echoParams">all</str>
> >     <int name="rows">20</int>
> >
> >     <str name="defType">edismax</str>
> >     <str name="q.alt">*:*</str>
> >     <int name="rows">20</int>
> >     <str name="fq">meta_doc_type:chapterDoc</str>
> >     <str name="fl">id category_weight title category_ss score contentType</str>
> >
> >     I'm calling the things below "sub queries" but that may be misleading...?
> >
> >     <str name="q">{!synonym_edismax qf='text' synonyms='true'
> >         synonyms.originalBoost='1.2' synonyms.synonymBoost='1.1' bf=''
> >         bq='' v=$q}</str>
> >
> >     <str name="titleQuery">{!edismax qf='title' bf='' bq='' v=$q}</str>
> >     <str name="bf">product(query($titleQuery),4)</str>
> >     <str name="bf">product(field(category_weight),20)</str>
> >     <str name="qf">text contentType^1000</str>
> >
> >     <str name="wt">python</str>
> >     <str name="indent">true</str>
> >     <str name="debugQuery">true</str>
> >     <str name="debug">true</str>
> >     <str name="echoParams">all</str>
> >   </lst>
> > </requestHandler>
> >
> > On Wed, Jun 22, 2016 at 2:34 PM, Erick Erickson <erickerick...@gmail.com
> >
> > wrote:
> >
> >> John:
> >>
> >> I'm not objecting to the XML, but to the very presence of "more than
> >> one query in a request handler". Request handlers don't have, AFAIK,
> >> "query chains". They have a list of defaults for the _single_ query
> >> being sent at a time to that handler. So having
> >>
> >>  blah blah 
> >>
> >> is something I've never seen before, thus my weaseling that it may be
> >> functionality that's new ;).
> >>
> >> And also, AFAIK, there's no real sense of query chaining except
> >> for the Rerank stuff.
> >>
> >> That said, if you do want to use the results of one query in another
> >> you can just put the whole thing into an fq clause (perhaps with
> >> {!cache=false}. At that point you'd get back the top N does
> >> that made it through the fq clause in score order.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Jun 22, 2016 at 3:20 PM, John Bickerstaff
> >> <j...@johnbickerstaff.com> wrote:
> >> > Hi Erick -
> >> >
> >> > I was trying to simplify and not waste anyone's time parsing my
> >> > requestHandler...
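
For reference, the fq finesse Erick mentions above looks roughly like this as
raw request parameters (the query strings are placeholders); the main query
then only scores documents that made it through the filter:

q={!edismax qf='text' v='user query'}&fq={!edismax cache=false qf='title' v='user query'}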

Re: Solr Query Processing Detail

2016-06-22 Thread John Bickerstaff
Oh - gotcha...  Thanks for taking the time to reply.  My use of the phrase
"sub query" is probably misleading...

Here's the XML (below).  I'm calling the Boost Query and Boost Function
statements "sub queries"...

The thing I was referencing was this -- where I create an "alias" for the
query (titleQuery) and then use it in boost function on the next line...

<str name="titleQuery">{!edismax qf='title' bf='' bq='' v=$q}</str>
<str name="bf">product(query($titleQuery),4)</str>


My big question however is whether I can trust that the Boost Functions and
Boost Query are running against the entire index (what I would prefer).

The alternative (it seems to me) is that the default queryParser (edismax
in this case) gathers a primary data set and the subsequent queries ONLY
query on that set, not the entire index.

I can't find documentation anywhere (so far) that covers things in that
depth...  I am seeing some behavior in this requestHandler's results that
makes me wonder what happens under the covers...


=

BTW - That's a nice finesse move with the fq clause!  I'll file that away
for future reference for sure...

<requestHandler name="..." class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <int name="rows">20</int>

    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <int name="rows">20</int>
    <str name="fq">meta_doc_type:chapterDoc</str>
    <str name="fl">id category_weight title category_ss score contentType</str>

    I'm calling the things below "sub queries" but that may be misleading...?

    <str name="q">{!synonym_edismax qf='text' synonyms='true'
        synonyms.originalBoost='1.2' synonyms.synonymBoost='1.1' bf=''
        bq='' v=$q}</str>

    <str name="titleQuery">{!edismax qf='title' bf='' bq='' v=$q}</str>
    <str name="bf">product(query($titleQuery),4)</str>
    <str name="bf">product(field(category_weight),20)</str>
    <str name="qf">text contentType^1000</str>

    <str name="wt">python</str>
    <str name="indent">true</str>
    <str name="debugQuery">true</str>
    <str name="debug">true</str>
    <str name="echoParams">all</str>
  </lst>
</requestHandler>

On Wed, Jun 22, 2016 at 2:34 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> John:
>
> I'm not objecting to the XML, but to the very presence of "more than
> one query in a request handler". Request handlers don't have, AFAIK,
> "query chains". They have a list of defaults for the _single_ query
> being sent at a time to that handler. So having
>
>  blah blah 
>
> is something I've never seen before, thus my weaseling that it may be
> functionality that's new ;).
>
> And also, AFAIK, there's no real sense of query chaining except
> for the Rerank stuff.
>
> That said, if you do want to use the results of one query in another
> you can just put the whole thing into an fq clause (perhaps with
> {!cache=false}. At that point you'd get back the top N does
> that made it through the fq clause in score order.
>
> Best,
> Erick
>
> On Wed, Jun 22, 2016 at 3:20 PM, John Bickerstaff
> <j...@johnbickerstaff.com> wrote:
> > Hi Erick -
> >
> > I was trying to simplify and not waste anyone's time parsing my
> > requestHandler...  That is, as you imply, bogus xml.
> >
> > The basic question is:  If I have two "sub queries" in a single
> > requestHandler, do they both run independently against the entire index?
> >
> > Alternatively, is there some kind of "chain of results" whereby (a la
> SQL)
> > earlier result sets are passed to later subqueries and therefore those
> > later subqueries are ONLY looking at the results of earlier subqueries?
> >
> > I have always assumed the former (each subquery runs against the entire
> > index) but I can't find documentation to prove it and I have some odd
> > behavior (that I won't go into here yet) that suggests something else is
> > happening...
> >
> > Oh - and thanks for the ReRanking mention!  That sounds like it may be
> > quite useful!
> >
> > On Wed, Jun 22, 2016 at 12:08 AM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> Where are you seeing that this does anything? It wouldn't be the first
> time
> >> new functionality happened that I totally missed, but I've never seen
> that
> >> config.
> >>
> >> You might get some mileage out of ReRankingQParserPlugin though, that
> runs
> >> the top N queries from one query through another.
> >>
> >> Best,
> >> Erick
> >> On Jun 21, 2016 2:43 PM, "John Bickerstaff" <j...@johnbickerstaff.com>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I have a question about whether sub-queries in Solr requestHandlers go
> >> > against the total index or against the results of the previous query.
> >> >
> >> > Here's a simple example:
> >> >
> >> > 
> >> >
> >> >   {!edismax qf=blah, blah}
> >> >
> >> >   {!edismax qf=blah, blah}
> >> >
> >> > 
> >> >
> >> > My question is:
> >> >
> >> > What does Query2 run "against"?
> >> >   a. The entire Solr Index
> >> >   b. The results of Query1
> >> >
> >> > If this is clearly documented anywhere, I'm very interested in a link.
> >> >
> >> > Thanks
> >> >
> >>
>


Re: Solr Query Processing Detail

2016-06-22 Thread John Bickerstaff
Hi Erick -

I was trying to simplify and not waste anyone's time parsing my
requestHandler...  That is, as you imply, bogus xml.

The basic question is:  If I have two "sub queries" in a single
requestHandler, do they both run independently against the entire index?

Alternatively, is there some kind of "chain of results" whereby (a la SQL)
earlier result sets are passed to later subqueries and therefore those
later subqueries are ONLY looking at the results of earlier subqueries?

I have always assumed the former (each subquery runs against the entire
index) but I can't find documentation to prove it and I have some odd
behavior (that I won't go into here yet) that suggests something else is
happening...

Oh - and thanks for the ReRanking mention!  That sounds like it may be
quite useful!

On Wed, Jun 22, 2016 at 12:08 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Where are you seeing that this does anything? It wouldn't be the first time
> new functionality happened that I totally missed, but I've never seen that
> config.
>
> You might get some mileage out of ReRankingQParserPlugin though, that runs
> the top N queries from one query through another.
>
> Best,
> Erick
> On Jun 21, 2016 2:43 PM, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
>
> > Hi all,
> >
> > I have a question about whether sub-queries in Solr requestHandlers go
> > against the total index or against the results of the previous query.
> >
> > Here's a simple example:
> >
> > 
> >
> >   {!edismax qf=blah, blah}
> >
> >   {!edismax qf=blah, blah}
> >
> > 
> >
> > My question is:
> >
> > What does Query2 run "against"?
> >   a. The entire Solr Index
> >   b. The results of Query1
> >
> > If this is clearly documented anywhere, I'm very interested in a link.
> >
> > Thanks
> >
>


Solr Query Processing Detail

2016-06-21 Thread John Bickerstaff
Hi all,

I have a question about whether sub-queries in Solr requestHandlers go
against the total index or against the results of the previous query.

Here's a simple example:



  {!edismax qf=blah, blah}

  {!edismax qf=blah, blah}



My question is:

What does Query2 run "against"?
  a. The entire Solr Index
  b. The results of Query1

If this is clearly documented anywhere, I'm very interested in a link.

Thanks


Re: [ANN] Relevant Search by Manning out! (Thanks Solr community!)

2016-06-21 Thread John Bickerstaff
Congrats!

Now you can enjoy those huge royalty payments that I'm sure are coming
in... 

Great book and it's been hugely helpful to me.

--JohnB

On Tue, Jun 21, 2016 at 12:12 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Not much more to add than my post here! This book is targeted towards
> Lucene-based search (Elasticsearch and Solr) relevance.
>
> Announcement with discount code:
> http://opensourceconnections.com/blog/2016/06/21/relevant-search-published/
>
> Related hacker news thread:
> https://news.ycombinator.com/item?id=11946636
>
> Thanks to everyone in the Solr community that was helpful to my efforts.
> Specifically Trey Grainger, Eric Pugh (for keeping me employed), Charlie
> Hull and the Flax team, Alex Rafalovitch, Timothy Potter, Yonik Seeley,
> Grant Ingersoll (for basically teaching me Solr back in the day), Drew
> Farris (for encouraging my early blogging), everyone at OSC, and many
> others I'm probably forgetting!
>
> Best
> -Doug
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-17 Thread John Bickerstaff
OK - Slapping forehead now... D'oh!

<float name="synonyms.originalBoost">1.2</float>

On Fri, Jun 17, 2016, John Bickerstaff <j...@johnbickerstaff.com> wrote:

> Hi all -
>
> I've successfully run the hon-lucene-synonyms plugin from the Admin
> console by adding the following to the Raw Query Parameters field...
>
>
> qf=text&defType=synonym_edismax&synonyms=true&synonyms.originalBoost=1.2&synonyms.synonymBoost=1.1
>
> I got those from the Read Me on the github account.
>
> Now I'm trying to make this work via a requestHandler in solrconfig.xml.
>
> I think the following should work, but it just hangs if I add the last
> line referencing synonyms.originalBoost
>
> <requestHandler name="/test1" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <str name="defType">synonym_edismax</str>
>     <str name="qf">text</str>
>     <str name="synonyms">true</str>
>     <str name="synonyms.originalBoost">1.2</str>  --> If I add this
> line, the admin console just hangs when I hit /test1
>   </lst>
> </requestHandler>
>
> If I do NOT add the last line and only have the line that sets
> synonyms=true, it appears to work fine.
>
> I see the dot notation all over the sample entries in solrconfig.xml...
> Am I missing something here?
>
> Essentially, how do I get these variables set correctly from inside a
> requestHandler configured in the solrconfig.xml file?
>
> On Tue, Jun 7, 2016 at 11:47 AM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
>
>> MaryJo you might want to start a new thread, I think we kinda hijacked
>> this
>> one. Also if you are interested in tuning queries check out
>> http://splainer.io/ and https://www.quepid.com which are interactive
>> tools
>> (both of which my company makes) to tune for search relevancy.
>>
>> On Tue, Jun 7, 2016 at 1:45 PM, MaryJo Sminkey 
>> wrote:
>>
>> > I'm really thinking this just might not be the right tool for us, what
>> we
>> > really need is a solution that works like the normal synonym filter
>> does,
>> > just with proper multi-term support, so I can apply the synonyms only on
>> > certain fields (copied fields) that have their own, lower boost
>> settings.
>> > The way this plugin works across the entire query just seems too
>> > problematic when you need to do complex queries with lots of different
>> > boost settings to get good relevancy. Anyone used a different method of
>> > handling multi-term synonyms that isn't as global?
>> >
>> > Mary Jo
>> >
>> >
>> >
>> > On Tue, Jun 7, 2016 at 1:31 PM, MaryJo Sminkey 
>> > wrote:
>> >
>> > > Here's the issue I am still having with getting the right search
>> > relevancy
>> > > with the synonym plugin in place. We typically have users searching on
>> > > multiple terms, and we want matches across multiple terms,
>> particularly
>> > > those that appears as phrases, to appear higher than matches for the
>> same
>> > > term multiple times. The synonym filter makes this complicated since
>> we
>> > may
>> > > have cases where the term the user enters, like "sbc", maps to a
>> > multi-term
>> > > synonym like "small block", and we always want the matches for the
>> > original
>> > > term to pop up first, so I'm trying to make sure the original boost is
>> > high
>> > > enough to override a phrase boost that the multi-term synonym would
>> give.
>> > > Unfortunately this then means matches on the same term multiple times
>> get
>> > > pushed up over my phrase matches...those aren't going to be the most
>> > > relevant matches. Not sure there's a way to solve this successfully,
>> > > without a completely different approach to the synonyms... or not
>> > counting
>> > > the number of matches on terms (I assume you can drop that ability,
>> > > although that's not ideal either...just better than what I have now).
>> > >
>> > > MJ
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey 
>> > > wrote:
>> > >
>> > >>
>> > >> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
>> > >> jlaw...@opensourceconnections.com> wrote:
>> > >>
>> > >>>
>> > >>> We were thinking, as you experimented with, that the 0.5 and 2.0
>> boosts
>> > >>> were no match for the product name and keyword field boosts so that
>> > would
>> > >>> influence your search as well.
>> > >>
>> > >>
>> > >>
>> > >> Yeah I definitely will have to play with the values a bit as we want
>> the
>> > >> product name matches to always appear highest, whether original or
>> > >> synonyms, but I'll have to figure out how to get that result without
>> one
>> > >> word terms that have multi word synonyms getting overly boosted for a
>> > >> phrase match while still sufficiently boosting the normal phrase
>> > match
>> > >> stuff too. With the normal synonym filter I was able to just copy
>> fields
>> > >> that could have synonyms to a new field (which would be the only one
>> > with
>> > >> the synonym filter), and use a different, lower boost on those
>> fields,
>> > but
>> > >> that won't work with this plugin which applies across everything in
>> the
>> > >> query. Makes it a bit more complicated to get everything just right.
>> > >>
>> > >> MJ
>> > >>
>> > >>

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-17 Thread John Bickerstaff
Hi all -

I've successfully run the hon-lucene-synonyms plugin from the Admin console
by adding the following to the Raw Query Parameters field...

qf=text&defType=synonym_edismax&synonyms=true&synonyms.originalBoost=1.2&synonyms.synonymBoost=1.1

I got those from the Read Me on the github account.

Now I'm trying to make this work via a requestHandler in solrconfig.xml.

I think the following should work, but it just hangs if I add the last line
referencing synonyms.originalBoost

<requestHandler name="/test1" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">synonym_edismax</str>
    <str name="qf">text</str>
    <str name="synonyms">true</str>
    <str name="synonyms.originalBoost">1.2</str>  --> If I add this line,
the admin console just hangs when I hit /test1
  </lst>
</requestHandler>

If I do NOT add the last line and only have the line that sets
synonyms=true, it appears to work fine.

I see the dot notation all over the sample entries in solrconfig.xml...  Am
I missing something here?

Essentially, how do I get these variables set correctly from inside a
requestHandler configured in the solrconfig.xml file?

On Tue, Jun 7, 2016 at 11:47 AM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> MaryJo you might want to start a new thread, I think we kinda hijacked this
> one. Also if you are interested in tuning queries check out
> http://splainer.io/ and https://www.quepid.com which are interactive tools
> (both of which my company makes) to tune for search relevancy.
>
> On Tue, Jun 7, 2016 at 1:45 PM, MaryJo Sminkey 
> wrote:
>
> > I'm really thinking this just might not be the right tool for us, what we
> > really need is a solution that works like the normal synonym filter does,
> > just with proper multi-term support, so I can apply the synonyms only on
> > certain fields (copied fields) that have their own, lower boost settings.
> > The way this plugin works across the entire query just seems too
> > problematic when you need to do complex queries with lots of different
> > boost settings to get good relevancy. Anyone used a different method of
> > handling multi-term synonyms that isn't as global?
> >
> > Mary Jo
> >
> >
> >
> > On Tue, Jun 7, 2016 at 1:31 PM, MaryJo Sminkey 
> > wrote:
> >
> > > Here's the issue I am still having with getting the right search
> > relevancy
> > > with the synonym plugin in place. We typically have users searching on
> > > multiple terms, and we want matches across multiple terms, particularly
> > > those that appears as phrases, to appear higher than matches for the
> same
> > > term multiple times. The synonym filter makes this complicated since we
> > may
> > > have cases where the term the user enters, like "sbc", maps to a
> > multi-term
> > > synonym like "small block", and we always want the matches for the
> > original
> > > term to pop up first, so I'm trying to make sure the original boost is
> > high
> > > enough to override a phrase boost that the multi-term synonym would
> give.
> > > Unfortunately this then means matches on the same term multiple times
> get
> > > pushed up over my phrase matches...those aren't going to be the most
> > > relevant matches. Not sure there's a way to solve this successfully,
> > > without a completely different approach to the synonyms... or not
> > counting
> > > the number of matches on terms (I assume you can drop that ability,
> > > although that's not ideal either...just better than what I have now).
> > >
> > > MJ
> > >
> > >
> > >
> > >
> > > On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey 
> > > wrote:
> > >
> > >>
> > >> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
> > >> jlaw...@opensourceconnections.com> wrote:
> > >>
> > >>>
> > >>> We were thinking, as you experimented with, that the 0.5 and 2.0
> boosts
> > >>> were no match for the product name and keyword field boosts so that
> > would
> > >>> influence your search as well.
> > >>
> > >>
> > >>
> > >> Yeah I definitely will have to play with the values a bit as we want
> the
> > >> product name matches to always appear highest, whether original or
> > >> synonyms, but I'll have to figure out how to get that result without
> one
> > >> word terms that have multi word synonyms getting overly boosted for a
> > >> phrase match while still sufficiently boosting the normal phrase
> > match
> > >> stuff too. With the normal synonym filter I was able to just copy
> fields
> > >> that could have synonyms to a new field (which would be the only one
> > with
> > >> the synonym filter), and use a different, lower boost on those fields,
> > but
> > >> that won't work with this plugin which applies across everything in
> the
> > >> query. Makes it a bit more complicated to get everything just right.
> > >>
> > >> MJ
> > >>
> > >>
> > >>
> > >
> > >
> >
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-05 Thread John Bickerstaff
Yes, query parameters/modifications mentioned in the readme.  Beyond those
I don't have useful advice at this point
On Jun 4, 2016 10:56 PM, "MaryJo Sminkey" <mjsmin...@gmail.com> wrote:

> On Sat, Jun 4, 2016 at 11:47 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > MaryJo - I'm on vacation but can't resist... iirc there are some very
> > useful query modifications suggested in the readme on the github for the
> > plugin... can't access right now.
> >
>
>
> I'm assuming you mean the various query parameters. The only ones I see in
> there that would be of use for me are the ones I'm already using. As far as
> I can tell from their description.
>
> MJ
>
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-04 Thread John Bickerstaff
MaryJo - I'm on vacation but can't resist... iirc there are some very
useful query modifications suggested in the readme on the github for the
plugin... can't access right now.

You may know about them already, but if it's been a while since you looked,
those may help...
On Jun 3, 2016 12:28 PM, "MaryJo Sminkey"  wrote:

On some additional tests, it looks like it's the phrase matching in
particular that is the issue, if I take that out I do seem to be getting
better results. I definitely don't want to get rid of those so need to find
a way to make them work together.




On Fri, Jun 3, 2016 at 2:21 PM, MaryJo Sminkey  wrote:

> Okay so big thanks for the help with getting the hon_lucene_synonyms
> plugin working. That is a big load off to finally have a solution in place
> for all our multi-term synonyms. We did find that the information in Step
8
> about the plugin showing "SynonymExpandingExtendedDismaxQParser" for
> QParser does not seem to be correct, we only ever get
> "ExtendedDismaxQParser" but the synonym expansion is definitely working.
>
> In implementing it though, the one thing I'm still having an issue with is
> trying to figure out how I can get results on the original term to appear
> first in our results and matches on the synonyms lower in the results. The
> plugin includes settings for an originalboost and synonymboost, but that
> doesn't seem to be working along with all the other edismax boosts I'm
> doing. We search across a number of fields, each with their own boost and
> then do phrase searches with boosts as well. My params look like this:
>
> params["defType"] = 'synonym_edismax';
> params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0
> prodnumbertext^20.0';
> params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["ps"] = 1;
> params["tie"] = 0.1;
> params["synonyms"] = true;
> params["synonyms.originalBoost"] = 2.0;
> params["synonyms.synonymBoost"] = 0.5;
>
> And here's an example of what the plugin gives me for a search on "sbc"
> which includes synonyms for "sb" and "small block" I don't really know
> enough about this to figure out what exactly it's doing but since all of
> the results I am getting first are ones with "small block" in the name,
and
> the ones with "sbc" in the prodname field which should be first are buried
> about 1000 documents in, I know the originalboost and synonymboost aren't
> working with all this other stuff. Ideas how to fix this? With the normal
> synonym filter we just set up copies of the fields that could have
synonyms
> to use with that filter applied and had a lower boost on those. Not sure
> how to make it work with this custom query parser though.
>
> +((prodname:sbc^10.0 | body:sbc^0.5 | productinfo:sbc | keywords:sbc^2.0 |
> (((prodnumbertext:sbc prodnumbertext:small prodnumbertext:sb)
> prodnumbertext:block)~2)^20.0)~0.1^2.0 (((+(prodname:sb^10.0 | body:sb^0.5
> | productinfo:sb | keywords:sb^2.0 | (((prodnumbertext:sb
> prodnumbertext:small prodnumbertext:sbc)
prodnumbertext:block)~2)^20.0)~0.1
> ()))^0.5) (((+(((prodname:small^10.0 | body:small^0.5 | productinfo:small
|
> keywords:small^2.0 | prodnumbertext:small^20.0)~0.1 (prodname:block^10.0 |
> body:block^0.5 | productinfo:block | keywords:block^2.0 |
> prodnumbertext:block^20.0)~0.1)~2) (productinfo:"small block"~1 |
> body:"small block"~1^5.0 | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1 (productinfo:"small block"~1 | body:"small block"~1^5.0
> | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1))^0.5)) ()
>
>
> Mary Jo
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
Yes, I get that, thanks.
On Jun 1, 2016 6:38 PM, "Joe Lawson" <jlaw...@opensourceconnections.com>
wrote:

> 2.0 is compiled with Solr 5 and Java 7. It uses the namespace
> solr.SynonymExpandingExtendedDismaxQParserPlugin
>
> 5.0.4 is compiled with Solr 6 and Java 8 and is the first release that made
> it to maven central. It uses the namespace
> com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin
>
> The features are the same for all versions.
>
> Hope this clears things up.
>
> -Joe
> On Jun 1, 2016 8:11 PM, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
>
> > Just to be clear, I got version 2.0 of the jar from github...  should I
> be
> > looking for something in a maven repository?  A bit confused at this point
> > given all the version numbers...
> >
> > I want the latest and greatest unless there's any special
> considerations..
> >
> > Thanks for the assistance!
> > On Jun 1, 2016 5:46 PM, "MaryJo Sminkey" <mjsmin...@gmail.com> wrote:
> >
> > Yup that was the issue for us as well. It doesn't seem to be throwing the
> > class error now, although I have not been able to successfully get back
> > results that seem to be using it, it's showing up as the deftype in my
> > params but the QParser in my debug is the normal edismax one. I will have
> > to play around with my config some more tomorrow and try to figure out
> what
> > we're doing wrong.
> >
> > MJ
> >
> >
> >
> > On Wed, Jun 1, 2016 at 6:38 PM, Joe Lawson <
> > jlaw...@opensourceconnections.com> wrote:
> >
> > > Nothing up until 5.0.4 was distributed on maven central. 5.0 -> 5.0.4
> was
> > > just a bunch of clean up to get it ready for maven (including the
> > namespace
> > > change).
> > >
> > > Being that nearly all docs and articles talking about the plugin
> > reference
> > > the old 2.0 one could reasonably get confused as to what config to use
> > esp
> > > when I linked the latest 5.0.4 test config prior.
> > >
> > > You can get the older jars from the links off the readme.md.
> > > On Jun 1, 2016 6:14 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:
> > >
> > > On 6/1/2016 1:10 PM, John Bickerstaff wrote:
> > > > @Joe:
> > > >
> > > > Is it possible that the jar's package name does not match the entry
> in
> > > the
> > > > sample solrconfig.xml file?
> > > >
> > > > The solrconfig.xml example file in the test directory contains the
> > > > following package name:
> > > > <queryParser name="synonym_edismax" class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> > > >
> > > > However, the jar file (when unzipped) has the following directory
> > > structure
> > > > down to the same class name:
> > > >
> > > > org --> apache --> solr --> search
> > > >
> > > > I just tried with the name change to the org.apache package name
> in
> > > the
> > > > solrconfig.xml file and got no errors.
> > >
> > > Looks like the package name is indeed the problem here.
> > >
> > > They changed the package name from org.apache.solr.search to
> > > com.github.healthonnet.search in the LATEST source code release --
> > > 5.0.4.  The code in the 5.0.3 version (and the 2.0.0 version indicated
> > > in the earlier message) uses org.apache.solr.search.
> > >
> > > I cannot find any files in the 2.0.0 zipfile download that contain the
> > > new package name, so I'm curious where the incorrect information on how
> > > to configure Solr to use the plugin was found.  I did not check the
> > > tarball download.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
Just to be clear, I got version 2.0 of the jar from github...  should I be
looking for something in a maven repository?  A bit confused at this point
given all the version numbers...

I want the latest and greatest unless there's any special considerations..

Thanks for the assistance!
On Jun 1, 2016 5:46 PM, "MaryJo Sminkey" <mjsmin...@gmail.com> wrote:

Yup that was the issue for us as well. It doesn't seem to be throwing the
class error now, although I have not been able to successfully get back
results that seem to be using it, it's showing up as the deftype in my
params but the QParser in my debug is the normal edismax one. I will have
to play around with my config some more tomorrow and try to figure out what
we're doing wrong.

MJ



On Wed, Jun 1, 2016 at 6:38 PM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> Nothing up until 5.0.4 was distributed on maven central. 5.0 -> 5.0.4 was
> just a bunch of clean up to get it ready for maven (including the
namespace
> change).
>
> Being that nearly all docs and articles talking about the plugin reference
> the old 2.0 one could reasonably get confused as to what config to use esp
> when I linked the latest 5.0.4 test config prior.
>
> You can get the older jars from the links off the readme.md.
> On Jun 1, 2016 6:14 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:
>
> On 6/1/2016 1:10 PM, John Bickerstaff wrote:
> > @Joe:
> >
> > Is it possible that the jar's package name does not match the entry in
> the
> > sample solrconfig.xml file?
> >
> > The solrconfig.xml example file in the test directory contains the
> > following package name:
> > <queryParser name="synonym_edismax" class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> >
> > However, the jar file (when unzipped) has the following directory
> structure
> > down to the same class name:
> >
> > org --> apache --> solr --> search
> >
> > I just tried with the name change to the org.apache package name in
> the
> > solrconfig.xml file and got no errors.
>
> Looks like the package name is indeed the problem here.
>
> They changed the package name from org.apache.solr.search to
> com.github.healthonnet.search in the LATEST source code release --
> 5.0.4.  The code in the 5.0.3 version (and the 2.0.0 version indicated
> in the earlier message) uses org.apache.solr.search.
>
> I cannot find any files in the 2.0.0 zipfile download that contain the
> new package name, so I'm curious where the incorrect information on how
> to configure Solr to use the plugin was found.  I did not check the
> tarball download.
>
> Thanks,
> Shawn
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
@Joe:

Is it possible that the jar's package name does not match the entry in the
sample solrconfig.xml file?

The solrconfig.xml example file in the test directory contains the
following package name:
<queryParser name="synonym_edismax" class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">

However, the jar file (when unzipped) has the following directory structure
down to the same class name:

org --> apache --> solr --> search

I just tried with the name change to the org.apache package name in the
solrconfig.xml file and got no errors.

I haven't yet tried to see synonym "stuff" in the debug for a query, but
I'm betting it's much ado about nothing - just the package name has
changed...

If that makes sense to you, you may want to edit the example file...

Thanks a lot for all the work you contributed to this by the way!

--JohnB

@ MaryJo - this may be the problem in your situation for this specific file
-- good luck!

I put it in $SOLR_HOME/lib  - which, taking the default "for production"
install script on Ubuntu resolved to /var/solr/data/lib

Good luck!

On Wed, Jun 1, 2016 at 12:49 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> I tried this - it didn't fail.  I don't know if it really started in
> -Denable.runtime.lib=true mode or not:
>
> service solr start -Denable.runtime.lib=true
>
> Of course, I'd still really rather be able to just drop jars into
> /var/solr/data/lib and have them work...
>
> Thanks all.
>
> On Wed, Jun 1, 2016 at 12:42 PM, John Bickerstaff <
> j...@johnbickerstaff.com> wrote:
>
>> So - the instructions on using the Blob Store API say to use the
>> -Denable.runtime.lib=true option when starting Solr.
>>
>> Thing is, I've installed per the "for production" instructions which
>> gives me an entry in /etc/init.d called solr.
>>
>> Two questions.
>>
>> To test this can I still use the start.jar in /opt/solr/server as long as
>> I issue the "cloud mode" flag or does that no longer work in 5.x?
>>
>> Do I instead have to modify that start script in /etc/init.d ?
>>
>> On Wed, Jun 1, 2016 at 10:42 AM, John Bickerstaff <
>> j...@johnbickerstaff.com> wrote:
>>
>>> Ahhh - gotcha.
>>>
>>> Well, not sure why it's not picked up - seems lots of other jars are...
>>> Maybe Joe will comment...
>>>
>>> On Wed, Jun 1, 2016 at 10:22 AM, MaryJo Sminkey <mjsmin...@gmail.com>
>>> wrote:
>>>
>>>> That refers to running Solr in cloud mode. We aren't there yet.
>>>>
>>>> MJ
>>>>
>>>>
>>>>
>>>> On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff <
>>>> j...@johnbickerstaff.com>
>>>> wrote:
>>>>
>>>> > Hi Mary Jo,
>>>> >
>>>> > I'll point you to Joe's earlier comment about needing to use the Blob
>>>> Store
>>>> > API...  He put a link in his response.
>>>> >
>>>> > I'm about to try that today...  Given that Joe is a contributor to
>>>> > hon_lucene there's a good chance his experience is correct here -
>>>> > especially given the evidence you just provided...
>>>> >
>>>> > Here's a copy - paste for your convenience.  It's a bit convoluted,
>>>> > although I totally get how this kind of approach is great for large
>>>> Solr
>>>> > Cloud installations that have machines or VMs coming up and going
>>>> down as
>>>> > part of a services-based approach...
>>>> >
>>>> > Joe said:
>>>> > The docs are out of date for the synonym_edismax but it does work.
>>>> Check
>>>> > out the tests for working examples. I'll try to update it soon. I've
>>>> run
>>>> > the plugin on Solr 5 and 6, solrcloud and standalone. For running in
>>>> > SolrCloud make sure you follow
>>>> >
>>>> >
>>>> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
>>>> >
>>>> > On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey <mjsmin...@gmail.com>
>>>> > wrote:
>>>> >
>>>> > > So we still can't get this to work, here's the latest update my
>>>> server
>>>> > guy
>>>> > > gave me: It seems to not matter where the file is located, it does
>>>> not
>>>> > > load. Yet, the Solr Java class path shows the file has loaded.
>>>> Only
>>>> > > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
I tried this - it didn't fail.  I don't know if it really started in
-Denable.runtime.lib=true mode or not:

service solr start -Denable.runtime.lib=true

Of course, I'd still really rather be able to just drop jars into
/var/solr/data/lib and have them work...

Thanks all.
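
(If the service script just swallows extra arguments - which I suspect
is what happened above - the surer route is probably to put the
property in the include file the installer creates, something like:

echo 'SOLR_OPTS="$SOLR_OPTS -Denable.runtime.lib=true"' | sudo tee -a /etc/default/solr.in.sh
sudo service solr restart

/etc/default/solr.in.sh is where my "for production" install keeps its
startup settings - adjust if yours lives elsewhere.)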

On Wed, Jun 1, 2016 at 12:42 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> So - the instructions on using the Blob Store API say to use the
> -Denable.runtime.lib=true option when starting Solr.
>
> Thing is, I've installed per the "for production" instructions which gives
> me an entry in /etc/init.d called solr.
>
> Two questions.
>
> To test this can I still use the start.jar in /opt/solr/server as long as
> I issue the "cloud mode" flag or does that no longer work in 5.x?
>
> Do I instead have to modify that start script in /etc/init.d ?
>
> On Wed, Jun 1, 2016 at 10:42 AM, John Bickerstaff <
> j...@johnbickerstaff.com> wrote:
>
>> Ahhh - gotcha.
>>
>> Well, not sure why it's not picked up - seems lots of other jars are...
>> Maybe Joe will comment...
>>
>> On Wed, Jun 1, 2016 at 10:22 AM, MaryJo Sminkey <mjsmin...@gmail.com>
>> wrote:
>>
>>> That refers to running Solr in cloud mode. We aren't there yet.
>>>
>>> MJ
>>>
>>>
>>>
>>> On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff <
>>> j...@johnbickerstaff.com>
>>> wrote:
>>>
>>> > Hi Mary Jo,
>>> >
>>> > I'll point you to Joe's earlier comment about needing to use the Blob
>>> Store
>>> > API...  He put a link in his response.
>>> >
>>> > I'm about to try that today...  Given that Joe is a contributor to
>>> > hon_lucene there's a good chance his experience is correct here -
>>> > especially given the evidence you just provided...
>>> >
>>> > Here's a copy - paste for your convenience.  It's a bit convoluted,
>>> > although I totally get how this kind of approach is great for large
>>> Solr
>>> > Cloud installations that have machines or VMs coming up and going down
>>> as
>>> > part of a services-based approach...
>>> >
>>> > Joe said:
>>> > The docs are out of date for the synonym_edismax but it does work.
>>> Check
>>> > out the tests for working examples. I'll try to update it soon. I've
>>> run
>>> > the plugin on Solr 5 and 6, solrcloud and standalone. For running in
>>> > SolrCloud make sure you follow
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
>>> >
>>> > On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey <mjsmin...@gmail.com>
>>> > wrote:
>>> >
>>> > > So we still can't get this to work, here's the latest update my
>>> server
>>> > guy
>>> > > gave me: It seems to not matter where the file is located, it does
>>> not
>>> > > load. Yet, the Solr Java class path shows the file has loaded.
>>> Only
>>> > > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in
>>> that
>>> > it
>>> > > loads in the java class path.  I've yet to find out what the error
>>> is.
>>> > All
>>> > > I can see is this "Error loading class". Okay, but why? What error
>>> was
>>> > > encountered in trying to load the class?  I can't find any of this
>>> > > information. I'm trying to work with the documentation that is
>>> located
>>> > here
>>> > > http://wiki.apache.org/solr/SolrPlugins
>>> > >
>>> > > I found that the jar file was put into each of these locations in an
>>> > > attempt to find a place where it will load without error.
>>> > >
>>> > > find .|grep hon-lucene
>>> > >
>>> > > ./server/lib/hon-lucene-synonyms-2.0.0.jar
>>> > >
>>> > > ./server/solr/plugins/hon-lucene-synonyms-2.0.0.jar
>>> > >
>>> > > ./server/solr/classic_newdb/lib/hon-lucene-synonyms-2.0.0.jar
>>> > >
>>> > > ./server/solr/classic_search/lib/hon-lucene-synonyms-2.0.0.jar
>>> > >
>>> > > ./server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
>>> > >
>>> > >  The config specifies that files in certain paths can be loaded as

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
Thanks Jeff.  I've installed "out of the box" with 5.4 and didn't make any
modifications on Ubuntu - so I'm not sure why it wouldn't get picked up,
but I'll keep chipping away at it...

I appreciate the new one to try.  That's a good test.

On Wed, Jun 1, 2016 at 12:45 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

> In the interests of the specific questions to me:
>
> I’m using 5.4, solrcloud.
> I’ve never used the blob store thing, didn’t even know it existed before
> this thread.
>
> I’m uncertain how not finding the class could be specific to hon, it
> really feels like a general solr config issue, but you could try some other
> foreign jar and see if that works.
> Here’s one I use: https://github.com/whitepages/SOLR-4449 (although this
> one is also why I use WEB-INF/lib, because it overrides a protected method,
> so it might not be the greatest example)
>
>
> On 5/31/16, 4:02 PM, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
>
> >Thanks Jeff,
> >
> >I believe I tried that, and it still refused to load..  But I'd sure love
> >it to work since the other process is a bit convoluted - although I see
> >it's value in a large Solr installation.
> >
> >When I "locate" the jar on the linux command line I get:
> >
>
> >/opt/solr-5.4.0/server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
> >
> >But the log file is still carrying class not found exceptions when I
> >restart...
> >
> >Are you in "Cloud" mode?  What version of Solr are you using?
> >
> >On Tue, May 31, 2016 at 4:08 PM, Jeff Wartes <jwar...@whitepages.com>
> wrote:
> >
> >> I’ve generally been dropping foreign plugin jars in this dir:
> >> server/solr-webapp/webapp/WEB-INF/lib/
> >> This is because it then gets loaded by the same classloader as Solr
> >> itself, which can be useful if you’re, say, overriding some
> >> solr-protected-space method.
> >>
> >> If you don’t care about the classloader, I believe you can use whatever
> >> dir you want, with the appropriate bit of solrconfig.xml to load it.
> >> Something like:
> >> 
> >>
> >>
> >> On 5/31/16, 2:13 PM, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
> >>
> >> >All --
> >> >
> >> >I'm now attempting to use the hon_lucene_synonyms project from github.
> >> >
> >> >I found the documents that were inferred by the dead links on the
> readme in
> >> >the repository -- however, given that I'm using Solr 5.4.x, I no longer
> >> >have the need to integrate into a war file (as far as I can see).
> >> >
> >> >The suggestion on the readme is that I can drop the hon_lucene_synonyms
> >> jar
> >> >file into the $SOLR_HOME directory, but this does not seem to be
> working -
> >> >I'm getting class not found exceptions.
> >> >
> >> >Does anyone on this list have direct experience with getting this
> plugin
> >> to
> >> >work in Solr 5.x?
> >> >
> >> >Thanks in advance...
> >> >
> >> >On Mon, May 30, 2016 at 6:57 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> >> wrote:
> >> >
> >> >> It's been awhile since I installed it so I really can't say. I'm more
> >> of a
> >> >> code monkey than a server gal (particularly Linux... I'm amazed I got
> >> Solr
> >> >> installed in the first place, LOL!) So I had asked our network guy to
> >> look
> >> >> it over recently and see if it looked like I did it okay. He said
> since
> >> it
> >> >> shows up in the list of jars in the Solr admin that it's
> installed
> >> if
> >> >> that's not necessarily true, I probably need to point him in the
> right
> >> >> direction for what else to do since he really doesn't know Solr well
> >> >> either.
> >> >>
> >> >> Mary Jo
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Mon, May 30, 2016 at 7:49 PM, John Bickerstaff <
> >> >> j...@johnbickerstaff.com>
> >> >> wrote:
> >> >>
> >> >> > Thanks for the comment Mary Jo...
> >> >> >
> >> >> > The error loading the class rings a bell - did you find and follow
> >> >> > instructions for adding that to the WAR file?  I vaguely remember
>

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
So - the instructions on using the Blob Store API say to use the
-Denable.runtime.lib=true option when starting Solr.

Thing is, I've installed per the "for production" instructions which gives
me an entry in /etc/init.d called solr.

Two questions.

To test this can I still use the start.jar in /opt/solr/server as long as I
issue the "cloud mode" flag or does that no longer work in 5.x?

Do I instead have to modify that start script in /etc/init.d ?
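
For anyone following along, the steps on that cwiki page boil down to
something like this (the blob and collection names here are just
examples - untested on my end, so treat it as a sketch):

# one-time: create the special .system collection backing the blob store
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=.system&replicationFactor=1"

# upload the plugin jar as a blob
curl -X POST -H 'Content-Type: application/octet-stream' \
  --data-binary @hon-lucene-synonyms-2.0.0.jar \
  "http://localhost:8983/solr/.system/blob/honsynonyms"

# tell the target collection to load that blob as a runtime lib
curl "http://localhost:8983/solr/mycollection/config" \
  -H 'Content-type:application/json' \
  -d '{"add-runtimelib": {"name":"honsynonyms", "version":1}}'

After that, the plugin's entry in solrconfig.xml also needs
runtimeLib="true", per the cwiki page.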

On Wed, Jun 1, 2016 at 10:42 AM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Ahhh - gotcha.
>
> Well, not sure why it's not picked up - seems lots of other jars are...
> Maybe Joe will comment...
>
> On Wed, Jun 1, 2016 at 10:22 AM, MaryJo Sminkey <mjsmin...@gmail.com>
> wrote:
>
>> That refers to running Solr in cloud mode. We aren't there yet.
>>
>> MJ
>>
>>
>>
>> On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff <
>> j...@johnbickerstaff.com>
>> wrote:
>>
>> > Hi Mary Jo,
>> >
>> > I'll point you to Joe's earlier comment about needing to use the Blob
>> Store
>> > API...  He put a link in his response.
>> >
>> > I'm about to try that today...  Given that Joe is a contributor to
>> > hon_lucene there's a good chance his experience is correct here -
>> > especially given the evidence you just provided...
>> >
>> > Here's a copy - paste for your convenience.  It's a bit convoluted,
>> > although I totally get how this kind of approach is great for large Solr
>> > Cloud installations that have machines or VMs coming up and going down
>> as
>> > part of a services-based approach...
>> >
>> > Joe said:
>> > The docs are out of date for the synonym_edismax but it does work. Check
>> > out the tests for working examples. I'll try to update it soon. I've run
>> > the plugin on Solr 5 and 6, solrcloud and standalone. For running in
>> > SolrCloud make sure you follow
>> >
>> >
>> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
>> >
>> > On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey <mjsmin...@gmail.com>
>> > wrote:
>> >
>> > > So we still can't get this to work, here's the latest update my server
>> > guy
>> > > gave me: It seems to not matter where the file is located, it does not
>> > > load. Yet, the Solr Java class path shows the file has loaded.
>> Only
>> > > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in
>> that
>> > it
>> > > loads in the java class path.  I've yet to find out what the error is.
>> > All
>> > > I can see is this "Error loading class". Okay, but why? What error was
>> > > encountered in trying to load the class?  I can't find any of this
>> > > information. I'm trying to work with the documentation that is located
>> > here
>> > > http://wiki.apache.org/solr/SolrPlugins
>> > >
>> > > I found that the jar file was put into each of these locations in an
>> > > attempt to find a place where it will load without error.
>> > >
>> > > find .|grep hon-lucene
>> > >
>> > > ./server/lib/hon-lucene-synonyms-2.0.0.jar
>> > >
>> > > ./server/solr/plugins/hon-lucene-synonyms-2.0.0.jar
>> > >
>> > > ./server/solr/classic_newdb/lib/hon-lucene-synonyms-2.0.0.jar
>> > >
>> > > ./server/solr/classic_search/lib/hon-lucene-synonyms-2.0.0.jar
>> > >
>> > > ./server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
>> > >
>> > >  The config specifies that files in certain paths can be loaded as
>> > plugins
>> > > or I can specify a path. Following the instructions I added this path
>> > >
>> > > <lib dir="${solr.install.dir:../../../..}/contrib/hon-lucene-synonyms/lib"
>> > > regex=".*\.jar" />
>> > >
>> > > And I put the jar file in that location.  This did not work either. I
>> > also
>> > > tried using an absolute path like this.
>> > >
>> > > <lib
>> > > dir="/opt/solr/contrib/hon-lucene-synonyms/lib/hon-lucene-synonyms-2.0.0.jar"
>> > > />
>> > >
>> > > This did not work.
>> > >
>> > >
>> > >

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
Ahhh - gotcha.

Well, not sure why it's not picked up - seems lots of other jars are...
Maybe Joe will comment...

On Wed, Jun 1, 2016 at 10:22 AM, MaryJo Sminkey <mjsmin...@gmail.com> wrote:

> That refers to running Solr in cloud mode. We aren't there yet.
>
> MJ
>
>
>
> On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > Hi Mary Jo,
> >
> > I'll point you to Joe's earlier comment about needing to use the Blob
> Store
> > API...  He put a link in his response.
> >
> > I'm about to try that today...  Given that Joe is a contributor to
> > hon_lucene there's a good chance his experience is correct here -
> > especially given the evidence you just provided...
> >
> > Here's a copy - paste for your convenience.  It's a bit convoluted,
> > although I totally get how this kind of approach is great for large Solr
> > Cloud installations that have machines or VMs coming up and going down as
> > part of a services-based approach...
> >
> > Joe said:
> > The docs are out of date for the synonym_edismax but it does work. Check
> > out the tests for working examples. I'll try to update it soon. I've run
> > the plugin on Solr 5 and 6, solrcloud and standalone. For running in
> > SolrCloud make sure you follow
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
> >
> > On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey <mjsmin...@gmail.com>
> > wrote:
> >
> > > So we still can't get this to work, here's the latest update my server
> > guy
> > > gave me: It seems to not matter where the file is located, it does not
> > > load. Yet, the Solr Java class path shows the file has loaded.
> Only
> > > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in
> that
> > it
> > > loads in the java class path.  I've yet to find out what the error is.
> > All
> > > I can see is this "Error loading class". Okay, but why? What error was
> > > encountered in trying to load the class?  I can't find any of this
> > > information. I'm trying to work with the documentation that is located
> > here
> > > http://wiki.apache.org/solr/SolrPlugins
> > >
> > > I found that the jar file was put into each of these locations in an
> > > attempt to find a place where it will load without error.
> > >
> > > find .|grep hon-lucene
> > >
> > > ./server/lib/hon-lucene-synonyms-2.0.0.jar
> > >
> > > ./server/solr/plugins/hon-lucene-synonyms-2.0.0.jar
> > >
> > > ./server/solr/classic_newdb/lib/hon-lucene-synonyms-2.0.0.jar
> > >
> > > ./server/solr/classic_search/lib/hon-lucene-synonyms-2.0.0.jar
> > >
> > > ./server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
> > >
> > >  The config specifies that files in certain paths can be loaded as
> > plugins
> > > or I can specify a path. Following the instructions I added this path
> > >
> > > <lib dir="${solr.install.dir:../../../..}/contrib/hon-lucene-synonyms/lib"
> > > regex=".*\.jar" />
> > >
> > > And I put the jar file in that location.  This did not work either. I
> > also
> > > tried using an absolute path like this.
> > >
> > > <lib
> > > dir="/opt/solr/contrib/hon-lucene-synonyms/lib/hon-lucene-synonyms-2.0.0.jar"
> > > />
> > >
> > > This did not work.
> > >
> > >
> > >
> > > I'm starting to think this isn't a configuration problem, but a
> > > compatibility problem. I have not seen anything from the maker of this
> > > plugin that it works on the exact version of Solr we are using.
> > >
> > >
> > >
> > >
> > >
> > > The best info I have found so far in the logs is this stack trace of
> the
> > > error. It still does not say why it failed to load.
> > >
> > > 2016-06-01 00:22:13.470 ERROR (qtp2096057945-14) [   ]
> > o.a.s.s.HttpSolrCall
> > > null:org.apache.solr.common.SolrException: SolrCore 'classic_search' is
> > not
> > > available due to init failure: Error loading class
> > > 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
> > >
> > > at
> > > org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:993)
> >

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
etty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>
> at
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>
> at
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>
> at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>
> at org.eclipse.jetty.server.Server.handle(Server.java:499)
>
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>
> at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>
> at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: org.apache.solr.common.SolrException: Error loading class
> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
>
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:824)
>
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:665)
>
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
>
> at
> org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:462)
>
> at
> org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:453)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at
>
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)
>
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
> ... 1 more
>
> Caused by: org.apache.solr.common.SolrException: Error loading class
> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
>
> at
>
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:559)
>
> at
>
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:490)
>
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:573)
>
> at org.apache.solr.core.PluginBag.createPlugin(PluginBag.java:123)
>
> at org.apache.solr.core.PluginBag.init(PluginBag.java:223)
>
> at org.apache.solr.core.PluginBag.init(PluginBag.java:212)
>
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:768)
>
> ... 9 more
>
> Caused by: java.lang.ClassNotFoundException:
> com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin
>
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>
> at
> java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>
> at java.lang.Class.forName0(Native Method)
>
> at java.lang.Class.forName(Class.java:348)
>
> at
>
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:543)
>
> ... 15 more
>
> So we're giving up on this one, unless someone knows for sure it will work
> on standalone Solr installs on 5.4+. Because as far as we can tell, it
> simply doesn't work.
>
>
> Mary Jo
>
>
>
> On Wed, Jun 1, 2016 at 1:35 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 5/31/2016 3:13 PM, John Bickerstaff wrote:
> > > The suggestion on the readme is that I can drop the
> > > hon_lucene_synonyms jar file into the $SOLR_HOME directory, but this
> > > does not seem to be working - I'm getting class not found exceptions.
> >
> > What I typically do with *all* extra jars (dataimport, mysql, ICU jars,
> > etc) is put them into $SOLR_HOME/lib ... a directory that you will
> > usually need to create.  If the installer script is used with default
> > options, that directory will be /var/solr/data/lib.
> >
> > Any jar that you place in that directory will be loaded once at Solr
> > startup and available to all cores.  The best thing about this directory
> > is that it requires zero configuration.
> >
> > For 5.3 and later, loading jars into
> > server/solr-webapp/webapp/WEB-INF/lib should also work, but then you are
> > modifying the actual Solr install, which I normally avoid because it
> > makes it a little bit harder to upgrade Solr.
> >
> > > Does anyone on this list have direct experience with getting this
> > > plugin to work in Solr 5.x?
> >
> > I don't have any experience with that specific plugin, but I have
> > successfully used other plugin jars with the lib directory mentioned
> above.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
Thanks Shawn

Yup - I created a /lib inside my $SOLR_HOME directory (which by default was
/var/solr/data)

I put the hon_lucene jar file in there and rebooted - same errors
about class not found.

Tried again in what looked like the next most obvious spot
server/solr-webapp/webapp/WEB-INF/lib

Same result...  Class not found.

I'll go back and triple check

Joe - is that recommendation of using the Blob Store API an absolute?  I
know my IT guys are going to want to have the signing - it would be a lot
easier to just drop in jars we care about without worrying about the
signing.  Yes - I'm being lazy, I know. 

Thanks all!

On Tue, May 31, 2016 at 11:35 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/31/2016 3:13 PM, John Bickerstaff wrote:
> > The suggestion on the readme is that I can drop the
> > hon_lucene_synonyms jar file into the $SOLR_HOME directory, but this
> > does not seem to be working - I'm getting class not found exceptions.
>
> What I typically do with *all* extra jars (dataimport, mysql, ICU jars,
> etc) is put them into $SOLR_HOME/lib ... a directory that you will
> usually need to create.  If the installer script is used with default
> options, that directory will be /var/solr/data/lib.
>
> Any jar that you place in that directory will be loaded once at Solr
> startup and available to all cores.  The best thing about this directory
> is that it requires zero configuration.
>
> For 5.3 and later, loading jars into
> server/solr-webapp/webapp/WEB-INF/lib should also work, but then you are
> modifying the actual Solr install, which I normally avoid because it
> makes it a little bit harder to upgrade Solr.
>
> > Does anyone on this list have direct experience with getting this
> > plugin to work in Solr 5.x?
>
> I don't have any experience with that specific plugin, but I have
> successfully used other plugin jars with the lib directory mentioned above.
>
> Thanks,
> Shawn
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
Jeff - Looking at the page, I'm unclear exactly how to set things up.  I
get using the blob api and I get adding the blob/jar to the collection, but
the bit about  runtimeLib=true  is confusing.

Does that go on the entry in the solrconfig.xml file like this?  Is
anything else required?  (The bit about the valueSourceParser is a bit
confusing)



Thanks

On Tue, May 31, 2016 at 5:02 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Thanks Jeff,
>
> I believe I tried that, and it still refused to load..  But I'd sure love
> it to work since the other process is a bit convoluted - although I see
> it's value in a large Solr installation.
>
> When I "locate" the jar on the linux command line I get:
>
>
> /opt/solr-5.4.0/server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
>
> But the log file is still carrying class not found exceptions when I
> restart...
>
> Are you in "Cloud" mode?  What version of Solr are you using?
>
> On Tue, May 31, 2016 at 4:08 PM, Jeff Wartes <jwar...@whitepages.com>
> wrote:
>
>> I’ve generally been dropping foreign plugin jars in this dir:
>> server/solr-webapp/webapp/WEB-INF/lib/
>> This is because it then gets loaded by the same classloader as Solr
>> itself, which can be useful if you’re, say, overriding some
>> solr-protected-space method.
>>
>> If you don’t care about the classloader, I believe you can use whatever
>> dir you want, with the appropriate bit of solrconfig.xml to load it.
>> Something like:
>> 
>>
>>
>> On 5/31/16, 2:13 PM, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
>>
>> >All --
>> >
>> >I'm now attempting to use the hon_lucene_synonyms project from github.
>> >
>> >I found the documents that were inferred by the dead links on the readme
>> in
>> >the repository -- however, given that I'm using Solr 5.4.x, I no longer
>> >have the need to integrate into a war file (as far as I can see).
>> >
>> >The suggestion on the readme is that I can drop the hon_lucene_synonyms
>> jar
>> >file into the $SOLR_HOME directory, but this does not seem to be working
>> -
>> >I'm getting class not found exceptions.
>> >
>> >Does anyone on this list have direct experience with getting this plugin
>> to
>> >work in Solr 5.x?
>> >
>> >Thanks in advance...
>> >
>> >On Mon, May 30, 2016 at 6:57 PM, MaryJo Sminkey <mjsmin...@gmail.com>
>> wrote:
>> >
>> >> It's been awhile since I installed it so I really can't say. I'm more
>> of a
>> >> code monkey than a server gal (particularly Linux... I'm amazed I got
>> Solr
>> >> installed in the first place, LOL!) So I had asked our network guy to
>> look
>> >> it over recently and see if it looked like I did it okay. He said
>> since it
>> >> shows up in the list of jars in the Solr admin that it's installed
>> if
>> >> that's not necessarily true, I probably need to point him in the right
>> >> direction for what else to do since he really doesn't know Solr well
>> >> either.
>> >>
>> >> Mary Jo
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, May 30, 2016 at 7:49 PM, John Bickerstaff <
>> >> j...@johnbickerstaff.com>
>> >> wrote:
>> >>
>> >> > Thanks for the comment Mary Jo...
>> >> >
>> >> > The error loading the class rings a bell - did you find and follow
>> >> > instructions for adding that to the WAR file?  I vaguely remember
>> seeing
>> >> > something about that.
>> >> >
>> >> > I'm going to try my own tests on the auto phrasing one..  If I'm
>> >> > successful, I'll post back.
>> >> >
>> >> > On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey <mjsmin...@gmail.com
>> >
>> >> > wrote:
>> >> >
>> >> > > This is a very timely discussion for me as well as we're trying to
>> >> tackle
>> >> > > the multi term synonym issue as well and have not been able to get the
>> >> hon-lucene
>> >> > > plugin to work, the jar shows up as installed but when we set up
>> the
>> >> > sample
>> >> > > request handler it throws this error:
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.solr.common.SolrException:org.apach

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
Thanks Jeff,

I believe I tried that, and it still refused to load..  But I'd sure love
it to work since the other process is a bit convoluted - although I see
it's value in a large Solr installation.

When I "locate" the jar on the linux command line I get:

/opt/solr-5.4.0/server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar

But the log file is still carrying class not found exceptions when I
restart...

Are you in "Cloud" mode?  What version of Solr are you using?

On Tue, May 31, 2016 at 4:08 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

> I’ve generally been dropping foreign plugin jars in this dir:
> server/solr-webapp/webapp/WEB-INF/lib/
> This is because it then gets loaded by the same classloader as Solr
> itself, which can be useful if you’re, say, overriding some
> solr-protected-space method.
>
> If you don’t care about the classloader, I believe you can use whatever
> dir you want, with the appropriate bit of solrconfig.xml to load it.
> Something like:
> 
>
>
> On 5/31/16, 2:13 PM, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
>
> >All --
> >
> >I'm now attempting to use the hon_lucene_synonyms project from github.
> >
> >I found the documents that were inferred by the dead links on the readme in
> >the repository -- however, given that I'm using Solr 5.4.x, I no longer
> >have the need to integrate into a war file (as far as I can see).
> >
> >The suggestion on the readme is that I can drop the hon_lucene_synonyms
> jar
> >file into the $SOLR_HOME directory, but this does not seem to be working -
> >I'm getting class not found exceptions.
> >
> >Does anyone on this list have direct experience with getting this plugin
> to
> >work in Solr 5.x?
> >
> >Thanks in advance...
> >
> >On Mon, May 30, 2016 at 6:57 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> wrote:
> >
> >> It's been awhile since I installed it so I really can't say. I'm more
> of a
> >> code monkey than a server gal (particularly Linux... I'm amazed I got
> Solr
> >> installed in the first place, LOL!) So I had asked our network guy to
> look
> >> it over recently and see if it looked like I did it okay. He said since
> it
> >> shows up in the list of jars in the Solr admin that it's installed
> if
> >> that's not necessarily true, I probably need to point him in the right
> >> direction for what else to do since he really doesn't know Solr well
> >> either.
> >>
> >> Mary Jo
> >>
> >>
> >>
> >>
> >> On Mon, May 30, 2016 at 7:49 PM, John Bickerstaff <
> >> j...@johnbickerstaff.com>
> >> wrote:
> >>
> >> > Thanks for the comment Mary Jo...
> >> >
> >> > The error loading the class rings a bell - did you find and follow
> >> > instructions for adding that to the WAR file?  I vaguely remember
> seeing
> >> > something about that.
> >> >
> >> > I'm going to try my own tests on the auto phrasing one..  If I'm
> >> > successful, I'll post back.
> >> >
> >> > On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> >> > wrote:
> >> >
> >> > > This is a very timely discussion for me as well as we're trying to
> >> tackle
> >> > > the multi term synonym issue as well and have not been able to get the
> >> hon-lucene
> >> > > plugin to work, the jar shows up as installed but when we set up the
> >> > sample
> >> > > request handler it throws this error:
> >> > >
> >> > >
> >> >
> >>
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> >> > > Error loading class
> >> > >
> >> >
> >>
> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
> >> > >
> >> > > I have tried the auto-phrasing one as well (I did set up a field
> using
> >> > copy
> >> > > to configure it on) but when testing it didn't seem to return the
> >> > synonyms
> >> > > as expected. So gave up on that one too (am willing to give it
> another
> >> > try
> >> > > though, that was awhile ago). Would definitely like to hear what
> other
> >> > > people have found works on the latest versions of Solr 5.x and/or 6.
> >> Just
> >> > > sucks that this issue has never been fixed in the core product such
> 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
Many thanks Joe!  I'll follow the instructions on the linked webpage.

On Tue, May 31, 2016 at 4:05 PM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> The docs are out of date for the synonym_edismax but it does work. Check
> out the tests for working examples. I'll try to update it soon. I've run
> the plugin on Solr 5 and 6, solrcloud and standalone. For running in
> SolrCloud make sure you follow
>
> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
> On May 31, 2016 5:13 PM, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
>
> > All --
> >
> > I'm now attempting to use the hon_lucene_synonyms project from github.
> >
> > I found the documents that were inferred by the dead links on the readme
> in
> > the repository -- however, given that I'm using Solr 5.4.x, I no longer
> > have the need to integrate into a war file (as far as I can see).
> >
> > The suggestion on the readme is that I can drop the hon_lucene_synonyms
> jar
> > file into the $SOLR_HOME directory, but this does not seem to be working
> -
> > I'm getting class not found exceptions.
> >
> > Does anyone on this list have direct experience with getting this plugin
> to
> > work in Solr 5.x?
> >
> > Thanks in advance...
> >
> > On Mon, May 30, 2016 at 6:57 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> > wrote:
> >
> > > It's been awhile since I installed it so I really can't say. I'm more
> of
> > a
> > > code monkey than a server gal (particularly Linux... I'm amazed I got
> > Solr
> > > installed in the first place, LOL!) So I had asked our network guy to
> > look
> > > it over recently and see if it looked like I did it okay. He said since
> > it
> > > shows up in the list of jars in the Solr admin that it's installed
> if
> > > that's not necessarily true, I probably need to point him in the right
> > > direction for what else to do since he really doesn't know Solr well
> > > either.
> > >
> > > Mary Jo
> > >
> > >
> > >
> > >
> > > On Mon, May 30, 2016 at 7:49 PM, John Bickerstaff <
> > > j...@johnbickerstaff.com>
> > > wrote:
> > >
> > > > Thanks for the comment Mary Jo...
> > > >
> > > > The error loading the class rings a bell - did you find and follow
> > > > instructions for adding that to the WAR file?  I vaguely remember
> > seeing
> > > > something about that.
> > > >
> > > > I'm going to try my own tests on the auto phrasing one..  If I'm
> > > > successful, I'll post back.
> > > >
> > > > On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey <mjsmin...@gmail.com
> >
> > > > wrote:
> > > >
> > > > > This is a very timely discussion for me as well as we're trying to
> > > tackle
> > > > > the multi term synonym issue as well and have not been able to get the
> > > hon-lucene
> > > > > plugin to work, the jar shows up as installed but when we set up
> the
> > > > sample
> > > > > request handler it throws this error:
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > > > > Error loading class
> > > > >
> > > >
> > >
> >
> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
> > > > >
> > > > > I have tried the auto-phrasing one as well (I did set up a field
> > using
> > > > copy
> > > > > to configure it on) but when testing it didn't seem to return the
> > > > synonyms
> > > > > as expected. So gave up on that one too (am willing to give it
> > another
> > > > try
> > > > > though, that was awhile ago). Would definitely like to hear what
> > other
> > > > > people have found works on the latest versions of Solr 5.x and/or
> 6.
> > > Just
> > > > > sucks that this issue has never been fixed in the core product such
> > > that
> > > > > you still need to mess with plugins and patches to get such a basic
> > > > > functionality working properly.
> > > > >
> > > > >
> > > > > *Mary Jo Sminkey*
> > > > > *Senior ColdFusion Developer*
> > > > >
> > > > > *CF We

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
All --

I'm now attempting to use the hon_lucene_synonyms project from github.

I found the documents that were inferred by the dead links on the readme in
the repository -- however, given that I'm using Solr 5.4.x, I no longer
have the need to integrate into a war file (as far as I can see).

The suggestion on the readme is that I can drop the hon_lucene_synonyms jar
file into the $SOLR_HOME directory, but this does not seem to be working -
I'm getting class not found exceptions.

Does anyone on this list have direct experience with getting this plugin to
work in Solr 5.x?

Thanks in advance...

On Mon, May 30, 2016 at 6:57 PM, MaryJo Sminkey <mjsmin...@gmail.com> wrote:

> It's been awhile since I installed it so I really can't say. I'm more of a
> code monkey than a server gal (particularly Linux... I'm amazed I got Solr
> installed in the first place, LOL!) So I had asked our network guy to look
> it over recently and see if it looked like I did it okay. He said since it
> shows up in the list of jars in the Solr admin that it's installed if
> that's not necessarily true, I probably need to point him in the right
> direction for what else to do since he really doesn't know Solr well
> either.
>
> Mary Jo
>
>
>
>
> On Mon, May 30, 2016 at 7:49 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > Thanks for the comment Mary Jo...
> >
> > The error loading the class rings a bell - did you find and follow
> > instructions for adding that to the WAR file?  I vaguely remember seeing
> > something about that.
> >
> > I'm going to try my own tests on the auto phrasing one..  If I'm
> > successful, I'll post back.
> >
> > On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> > wrote:
> >
> > > This is a very timely discussion for me as well as we're trying to
> tackle
> > > the multi term synonym issue as well and have not been able to get the
> hon-lucene
> > > plugin to work, the jar shows up as installed but when we set up the
> > sample
> > > request handler it throws this error:
> > >
> > >
> >
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > > Error loading class
> > >
> >
> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
> > >
> > > I have tried the auto-phrasing one as well (I did set up a field using
> > copy
> > > to configure it on) but when testing it didn't seem to return the
> > synonyms
> > > as expected. So gave up on that one too (am willing to give it another
> > try
> > > though, that was awhile ago). Would definitely like to hear what other
> > > people have found works on the latest versions of Solr 5.x and/or 6.
> Just
> > > sucks that this issue has never been fixed in the core product such
> that
> > > you still need to mess with plugins and patches to get such a basic
> > > functionality working properly.
> > >
> > >
> > > *Mary Jo Sminkey*
> > > *Senior ColdFusion Developer*
> > >
> > > *CF Webtools*
> > > You Dream It... We Build It. <https://www.cfwebtools.com/>
> > > 11204 Davenport Suite 100
> > > Omaha, Nebraska 68154
> > > O: 402.408.3733 x128
> > > E:  maryjo.smin...@cfwebtools.com
> > > Skype: maryjos.cfwebtools
> > >
> > >
> > > On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
> > > j...@johnbickerstaff.com>
> > > wrote:
> > >
> > > > So I'm looking at the solution mentioned here:
> > > >
> > > >
> > >
> >
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > > >
> > > > The thing that's troubling me slightly is that the way it's
> documented
> > it
> > > > seems to be missing a small but important link...
> > > >
> > > > What exactly causes the results listed to be returned?
> > > >
> > > > Here's my thought process:
> > > >
> > > > 1. The entry for /autophrase searchHandler does not specify a default
> > > > search field.
> > > > 2. The field type "text_autophrase" is set up as the one with the
> > > > AutoPhrasingFilterFactory as part of its indexing
> > > >
> > > > There isn't any mention (perhaps because it's too obvious) of the
> need
> > to
> > > > copy or otherwise get data into the "text_autophrase" field at ind

Re: Alternate Port Not Working for Solr 6.0.0

2016-05-31 Thread John Bickerstaff
This may be no help at all, but my first thought is to wonder if anything
else is already running on port 80?

That might explain the somewhat silent "fail"...

Nicely said by the way - resisting the urge 
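
Something like this would confirm it:

sudo netstat -tlnp | grep ':80 '

Also worth noting: on Linux a non-root process (and the installer runs
Solr as the "solr" user) can't bind ports below 1024 at all, which
could produce exactly this kind of quiet failure.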

On Tue, May 31, 2016 at 2:02 PM, Teague James 
wrote:

> Hello,
>
> I am trying to install Solr 6.0.0 and have been successful with the default
> installation, following the instructions provided on the Apache Solr
> website. However, I do not want Solr running on port 8983, I want it to run
> on port 80. I started a new Ubuntu 14.04 VM, installed open JDK 8, then
> installed Solr with the following commands:
>
> Command: tar xzf solr-6.0.0.tgz solr-6.0.0/bin/install_solr_service.sh
> --strip-components=2
> Response: None, which is good.
>
> Command: ./install_solr_service.sh solr-6.0.0.tgz -p 80
> Response: Misplaced or Unknown flag -p
>
> So I tried...
> Command: ./install_solr_service.sh solr-6.0.0.tgz -i /opt -d /var/solr -u
> solr -s solr -p 80
> Response: A dump of the log, which is INFO only with no errors or warnings,
> at the top of which is "Solr process 4831 from /var/solr/solr-80.pid not
> found"
>
> If I look in the /var/solr directory I find a file called solr-80.pid, but
> nothing else. What did I miss? Previous versions of Solr, which I deployed
> with Tomcat instead of Jetty, allowed me to control this in the server.xml
> file in /etc/tomcat7/, but obviously this no longer applies. I like the
> ease
> of the installation script; I just want to be able to control the port
> assignment. Any help is appreciated! Thanks!
>
> -Teague
>
> PS - Please resist the urge to ask me why I want it on port 80. I am well
> aware of the security implications, etc., but regardless I still need to
> make this operational on port 80. Cheers!
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-30 Thread John Bickerstaff
Thanks for the comment Mary Jo...

The error loading the class rings a bell - did you find and follow
instructions for adding that to the WAR file?  I vaguely remember seeing
something about that.

I'm going to try my own tests on the auto phrasing one..  If I'm
successful, I'll post back.

On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey <mjsmin...@gmail.com> wrote:

> This is a very timely discussion for me as well as we're trying to tackle
> the multi term synonym issue as well and have not been able to get the hon-lucene
> plugin to work, the jar shows up as installed but when we set up the sample
> request handler it throws this error:
>
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Error loading class
> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
>
> I have tried the auto-phrasing one as well (I did set up a field using copy
> to configure it on) but when testing it didn't seem to return the synonyms
> as expected. So gave up on that one too (am willing to give it another try
> though, that was awhile ago). Would definitely like to hear what other
> people have found works on the latest versions of Solr 5.x and/or 6. Just
> sucks that this issue has never been fixed in the core product such that
> you still need to mess with plugins and patches to get such a basic
> functionality working properly.
>
>
> *Mary Jo Sminkey*
> *Senior ColdFusion Developer*
>
> *CF Webtools*
> You Dream It... We Build It. <https://www.cfwebtools.com/>
> 11204 Davenport Suite 100
> Omaha, Nebraska 68154
> O: 402.408.3733 x128
> E:  maryjo.smin...@cfwebtools.com
> Skype: maryjos.cfwebtools
>
>
> On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > So I'm looking at the solution mentioned here:
> >
> >
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> > The thing that's troubling me slightly is that the way it's documented it
> > seems to be missing a small but important link...
> >
> > What exactly causes the results listed to be returned?
> >
> > Here's my thought process:
> >
> > 1. The entry for /autophrase searchHandler does not specify a default
> > search field.
> > 2. The field type "text_autophrase" is set up as the one with the
> > AutoPhrasingFilterFactory as part of its indexing
> >
> > There isn't any mention (perhaps because it's too obvious) of the need to
> > copy or otherwise get data into the "text_autophrase" field at index
> time.
> >
> > There isn't any explicit listing of "text_autophrase" as the default
> search
> > field in the /autophrase search handler
> >
> > There isn't any explicit statement of "df=text_autophrase" in the query
> > statement: [/autophrase?q=New+York]
> >
> > Therefore it seems to me that if someone tries to implement this, they're
> > going to be disappointed in the results unless they:
> > a. copy or otherwise get ALL the text they're interested in -- into the
> > "text_autophrase" field as part of the schema.xml setup (to happen at
> index
> > time)
> > b. somehow explicitly declare "text_autophrase" as the default search
> field
> > - either in the searchHandler or wherever else the default field is
> > configured.
> >
> > If anyone out there has done this specific approach - could you validate
> > whether my thought process is correct and / or if I'm missing something?
> > Yes - I get that I can set it all up and try - but it's what I don't
> know I
> > don't know that bothers me...
> >
> > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
> > j...@johnbickerstaff.com
> > > wrote:
> >
> > > Thank you Steve -- very helpful.
> > >
> > > I can see that whatever implementation I decide to try, some testing
> will
> > > be in order.  If anyone is aware of significant gotchas with this
> synonym
> > > thing that are not mentioned in the already-listed URLs, please feel
> free
> > > to comment.
> > >
> > > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com> wrote:
> > >
> > >> I’m working on addressing problems using multi-term synonyms at query
> > >> time in Lucene and Solr.
> > >>
> > >> I recommend these two blogs for understanding the issues (the second
> one
> > >> was mentioned earlier in this thread):
> > >>
> > >> <
> > >>

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-30 Thread John Bickerstaff
So I'm looking at the solution mentioned here:
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

The thing that's troubling me slightly is that the way it's documented it
seems to be missing a small but important link...

What exactly causes the results listed to be returned?

Here's my thought process:

1. The entry for /autophrase searchHandler does not specify a default
search field.
2. The field type "text_autophrase" is set up as the one with the
AutoPhrasingFilterFactory as part of its indexing

There isn't any mention (perhaps because it's too obvious) of the need to
copy or otherwise get data into the "text_autophrase" field at index time.

There isn't any explicit listing of "text_autophrase" as the default search
field in the /autophrase search handler

There isn't any explicit statement of "df=text_autophrase" in the query
statement: [/autophrase?q=New+York]

Therefore it seems to me that if someone tries to implement this, they're
going to be disappointed in the results unless they:
a. copy or otherwise get ALL the text they're interested in -- into the
"text_autophrase" field as part of the schema.xml setup (to happen at index
time)
b. somehow explicitly declare "text_autophrase" as the default search field
- either in the searchHandler or wherever else the default field is
configured.

If anyone out there has done this specific approach - could you validate
whether my thought process is correct and / or if I'm missing something?
Yes - I get that I can set it all up and try - but it's what I don't know I
don't know that bothers me...
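
(One cheap sanity check once it's wired up - pass df explicitly and
turn on debug, e.g.:

curl "http://localhost:8983/solr/collection1/autophrase?q=New+York&df=text_autophrase&debugQuery=true"

The collection name is just an example. If the parsed query in the
debug output isn't hitting text_autophrase, the default-field wiring
is the missing link.)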

On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <j...@johnbickerstaff.com
> wrote:

> Thank you Steve -- very helpful.
>
> I can see that whatever implementation I decide to try, some testing will
> be in order.  If anyone is aware of significant gotchas with this synonym
> thing that are not mentioned in the already-listed URLs, please feel free
> to comment.
>
> On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com> wrote:
>
>> I’m working on addressing problems using multi-term synonyms at query
>> time in Lucene and Solr.
>>
>> I recommend these two blogs for understanding the issues (the second one
>> was mentioned earlier in this thread):
>>
>> <
>> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>> >
>> <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
>>
>> In addition to the already-mentioned projects, there is also:
>>
>> <https://issues.apache.org/jira/browse/SOLR-5379>
>>
>> All of these projects try in various ways to work around the fact that
>> Lucene’s QueryParser splits on whitespace before sending text to analysis,
>> one token at a time, so in a synonym filter, multi-word synonyms can never
>> match and add alternatives.  See <
>> https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted a
>> patch to directly address that problem - note that it’s still a work in
>> progress.
>>
>> Once LUCENE-2605 has been fixed, there is still work to do getting
>> (e)dismax to work with the modified Lucene QueryParser, and addressing
>> problems with how queries are constructed from Lucene’s “sausagized” token
>> stream.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>> > On May 26, 2016, at 2:21 PM, John Bickerstaff <j...@johnbickerstaff.com>
>> wrote:
>> >
>> > Thanks Chris --
>> >
>> > The two projects I'm aware of are:
>> >
>> > https://github.com/healthonnet/hon-lucene-synonyms
>> >
>> > and the one referenced from the Lucidworks page here:
>> >
>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> >
>> > ... which is here :
>> https://github.com/LucidWorks/auto-phrase-tokenfilter
>> >
>> > Is there anything else out there that you would recommend I look at?
>> >
>> > On Thu, May 26, 2016 at 12:01 PM, Chris Morley <ch...@depahelix.com>
>> wrote:
>> >
>> >> Chris Morley here, from Wayfair.  (Depahelix = my domain)
>> >>
>> >> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
>> >> We worked mostly off of Ted Sullivan's work and also off of some
>> >> suggestions from Koorosh Vakhshoori.  We have gotten to a point where
>> we
>> >> have a more sophisticated internal implementation, however, we've found
>> >> that it is very diff

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-27 Thread John Bickerstaff
Thank you Steve -- very helpful.

I can see that whatever implementation I decide to try, some testing will
be in order.  If anyone is aware of significant gotchas with this synonym
thing that are not mentioned in the already-listed URLs, please feel free
to comment.

On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com> wrote:

> I’m working on addressing problems using multi-term synonyms at query time
> in Lucene and Solr.
>
> I recommend these two blogs for understanding the issues (the second one
> was mentioned earlier in this thread):
>
> <
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> >
> <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
>
> In addition to the already-mentioned projects, there is also:
>
> <https://issues.apache.org/jira/browse/SOLR-5379>
>
> All of these projects try in various ways to work around the fact that
> Lucene’s QueryParser splits on whitespace before sending text to analysis,
> one token at a time, so in a synonym filter, multi-word synonyms can never
> match and add alternatives.  See <
> https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted a
> patch to directly address that problem - note that it’s still a work in
> progress.
>
> Once LUCENE-2605 has been fixed, there is still work to do getting
> (e)dismax to work with the modified Lucene QueryParser, and addressing
> problems with how queries are constructed from Lucene’s “sausagized” token
> stream.
>
> --
> Steve
> www.lucidworks.com
>
> > On May 26, 2016, at 2:21 PM, John Bickerstaff <j...@johnbickerstaff.com>
> wrote:
> >
> > Thanks Chris --
> >
> > The two projects I'm aware of are:
> >
> > https://github.com/healthonnet/hon-lucene-synonyms
> >
> > and the one referenced from the Lucidworks page here:
> >
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> > ... which is here :
> https://github.com/LucidWorks/auto-phrase-tokenfilter
> >
> > Is there anything else out there that you would recommend I look at?
> >
> > On Thu, May 26, 2016 at 12:01 PM, Chris Morley <ch...@depahelix.com>
> wrote:
> >
> >> Chris Morley here, from Wayfair.  (Depahelix = my domain)
> >>
> >> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
> >> We worked mostly off of Ted Sullivan's work and also off of some
> >> suggestions from Koorosh Vakhshoori.  We have gotten to a point where we
> >> have a more sophisticated internal implementation, however, we've found
> >> that it is very difficult to make it do what you want it to do, and
> also be
> >> sufficiently performant.  Watch out for exceptional situations with mm
> >> (minimum should match).
> >>
> >> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
> >> done work in this area.
> >>
> >> It should be very possible to get this kind of thing working on
> >> SolrCloud.  I haven't tried it yet but I think theoretically, it should
> >> just work.  The synonyms stuff is mostly about doing things at index
> time
> >> and query time.  The index time stuff should translate to SolrCloud
> >> directly, while the query time stuff might pose some issues, but
> probably
> >> not too bad, if there are any issues at all.
> >>
> >> I've had decent luck porting our various plugins from 4.10.x to 5.5.0
> >> because a lot of stuff is just Java, and it still works within the Jetty
> >> context.
> >>
> >> -Chris.
> >>
> >>
> >>
> >>
> >> 
> >> From: "John Bickerstaff" <j...@johnbickerstaff.com>
> >> Sent: Thursday, May 26, 2016 1:51 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax
> parser
> >> Hey Jeff (or anyone interested in multi-word synonyms) here are some
> >> potentially interesting links...
> >>
> >> http://wiki.apache.org/solr/QueryParser (search the page for
> >> synonum_edismax)
> >>
> >> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> (blog
> >> post about what became the synonym_edissmax Query Parser)
> >>
> >>
> >>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >>
> >> This last was us

Re: import efficiencies

2016-05-26 Thread John Bickerstaff
Having more carefully read Erick's post - I see that is essentially what he
said in a much more straightforward way.

I will also second Erick's suggestion of hammering on the SQL.  We found
that fruitful many times at the same gig.  I develop and am not a SQL
master.  In a similar situation I'll usually seek out a specialist to help
me make sure the query isn't wasteful.  It frequently was and I learned a
lot.

On Thu, May 26, 2016 at 12:31 PM, John Bickerstaff <j...@johnbickerstaff.com
> wrote:

> It may or may not be helpful, but there's a similar class of problem that
> is frequently solved either by stored procedures or by running the query on
> a time-frame and storing the results...  Doesn't matter if the end-point
> for the data is Solr or somewhere else.
>
> The problem is long running queries that are extremely complex and stress
> the database performance too heavily.
>
> The solution is to de-normalize the data you need... store it in that form
> and then the query gets really fast... sort of like a data warehouse type
> of thing.  (Don't shoot, I know this isn't data warehousing...)
>
> Postgres even has something called an "automatically updateable view" that
> might serve - if that's your back end.
>
> Anyway - the underlying strategy is to find a way to flatten your data
> preparatory to turning it into solr documents by some means - either by
> getting it out on shorter-running queries all the time into some kind of
> store (Kafka, text file, whatever) or by using some feature of the database
> (stored procs writing to a summary table, automatically updatable view or
> similar).
>
> In this way, when you make your query, you make it against the "flattened"
> data - which is, ideally, all in one table - and then all the complexity of
> joins etc... is washed away and things ought to run pretty fast.
>
> The cost, of course, is a huge table with tons of duplicated data...  Only
> you can say if that's worth it.  I did this at my last gig and we truncated
> the table every 2 weeks to prevent it growing forever.
>
> In case it's helpful...
>
> PS - if you have the resources, a duplicate database can really help here
> too - again my experience is mostly with Postgres which allows a "warm"
> backup to be live.  We frequently used this for executive queries that were
> using the database like a data warehouse because they were so
> time-consuming.  It kept the load off production.
>
> On Thu, May 26, 2016 at 12:18 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Forgot to add... sometimes really hammering at the SQL query in DIH
>> can be fruitful, can you make a huge, monster query that's faster than
>> the sub-queries?
>>
>> I've also seen people run processes on the DB that move all the
>> data into a temporary place making use of all of the nifty stuff you
>> can do there and then use DIH on _that_. Or the view.
>>
>> All that said, I generally prefer using SolrJ if DIH doesn't do the job
>> after a day or two of fiddling, it gives more control.
>>
>> Good Luck!
>> Erick
>>
>> On Thu, May 26, 2016 at 11:02 AM, John Blythe <j...@curvolabs.com> wrote:
>> > oo gotcha. cool, will make sure to check it out and bounce any related
>> > questions through here.
>> >
>> > thanks!
>> >
>> > best,
>> >
>> >
>> > --
>> > *John Blythe*
>> > Product Manager & Lead Developer
>> >
>> > 251.605.3071 | j...@curvolabs.com
>> > www.curvolabs.com
>> >
>> > 58 Adams Ave
>> > Evansville, IN 47713
>> >
>> > On Thu, May 26, 2016 at 1:45 PM, Erick Erickson <
>> erickerick...@gmail.com>
>> > wrote:
>> >
>> >> Solr commits aren't the issue I'd guess. All the time is
>> >> probably being spent getting the data from MySQL.
>> >>
>> >> I've had some luck writing to Solr from a DB through a
>> >> SolrJ program, here's a place to get started:
>> >> searchhub.org/2012/02/14/indexing-with-solrj/
>> >> you can peel out the Tika bits pretty easily I should
>> >> think.
>> >>
>> >> One technique I've used is to cache
>> >> some of the DB tables in Java's memory to keep
>> >> from having to do the secondary lookup(s). This only
>> >> really works if the "secondary table" is small enough to fit in
>> >> Java's memory of course. You can do some creative
>> >> things with caching partial tables if you can sort appropriately.
>> >>
>> >> Best,
>> Erick

Re: import efficiencies

2016-05-26 Thread John Bickerstaff
It may or may not be helpful, but there's a similar class of problem that
is frequently solved either by stored procedures or by running the query on
a time-frame and storing the results...  Doesn't matter if the end-point
for the data is Solr or somewhere else.

The problem is long running queries that are extremely complex and stress
the database performance too heavily.

The solution is to de-normalize the data you need... store it in that form
and then the query gets really fast... sort of like a data warehouse type
of thing.  (Don't shoot, I know this isn't data warehousing...)

Postgres even has something called an "automatically updateable view" that
might serve - if that's your back end.

Anyway - the underlying strategy is to find a way to flatten your data
preparatory to turning it into solr documents by some means - either by
getting it out on shorter-running queries all the time into some kind of
store (Kafka, text file, whatever) or by using some feature of the database
(stored procs writing to a summary table, automatically updatable view or
similar).

In this way, when you make your query, you make it against the "flattened"
data - which is, ideally, all in one table - and then all the complexity of
joins etc... is washed away and things ought to run pretty fast.

The cost, of course, is a huge table with tons of duplicated data...  Only
you can say if that's worth it.  I did this at my last gig and we truncated
the table every 2 weeks to prevent it growing forever.

In case it's helpful...

PS - if you have the resources, a duplicate database can really help here
too - again my experience is mostly with Postgres which allows a "warm"
backup to be live.  We frequently used this for executive queries that were
using the database like a data warehouse because they were so
time-consuming.  It kept the load off production.

On Thu, May 26, 2016 at 12:18 PM, Erick Erickson 
wrote:

> Forgot to add... sometimes really hammering at the SQL query in DIH
> can be fruitful, can you make a huge, monster query that's faster than
> the sub-queries?
>
> I've also seen people run processes on the DB that move all the
> data into a temporary place making use of all of the nifty stuff you
> can do there and then use DIH on _that_. Or the view.
>
> All that said, I generally prefer using SolrJ if DIH doesn't do the job
> after a day or two of fiddling, it gives more control.
>
> Good Luck!
> Erick
>
> On Thu, May 26, 2016 at 11:02 AM, John Blythe  wrote:
> > oo gotcha. cool, will make sure to check it out and bounce any related
> > questions through here.
> >
> > thanks!
> >
> > best,
> >
> >
> > --
> > *John Blythe*
> > Product Manager & Lead Developer
> >
> > 251.605.3071 | j...@curvolabs.com
> > www.curvolabs.com
> >
> > 58 Adams Ave
> > Evansville, IN 47713
> >
> > On Thu, May 26, 2016 at 1:45 PM, Erick Erickson  >
> > wrote:
> >
> >> Solr commits aren't the issue I'd guess. All the time is
> >> probably being spent getting the data from MySQL.
> >>
> >> I've had some luck writing to Solr from a DB through a
> >> SolrJ program, here's a place to get started:
> >> searchhub.org/2012/02/14/indexing-with-solrj/
> >> you can peel out the Tika bits pretty easily I should
> >> think.
> >>
> >> One technique I've used is to cache
> >> some of the DB tables in Java's memory to keep
> >> from having to do the secondary lookup(s). This only
> >> really works if the "secondary table" is small enough to fit in
> >> Java's memory of course. You can do some creative
> >> things with caching partial tables if you can sort appropriately.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, May 26, 2016 at 9:01 AM, John Blythe 
> wrote:
> >> > hi all,
> >> >
> >> > i've got layered entities in my solr import. it's calling on some
> >> > transactional data from a MySQL instance. there are two fields that
> are
> >> > used to then lookup other information from other tables via their
> related
> >> > UIDs, one of which has its own child entity w yet another select
> >> statement
> >> > to grab up more data.
> >> >
> >> > it fetches at about 120/s but processes at ~50-60/s. we currently only
> >> have
> >> > close to 500k records, but it's growing quickly and thus is becoming
> >> > increasingly painful to make modifications due to the reimport that
> needs
> >> > to then occur.
> >> >
> >> > i feel like i'd seen some threads regarding commits of new data,
> >> > master/slave, or solrcloud/sharding that could help in some ways
> related
> >> to
> >> > this but as of yet can't scrounge them up w my searches (ironic :p).
> >> >
> >> > can someone help by pointing me to some good material related to this
> >> sort
> >> > of thing?
> >> >
> >> > thanks-
> >>
>
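
A minimal sketch of the SolrJ route Erick describes above: one flattened
query for the main data, the small secondary table cached in a HashMap, and
documents sent to Solr in batches. All table, column, and field names here
are hypothetical, and the plain HttpSolrClient(String) constructor is the
SolrJ 5.x/6.0-era one - an illustration, not a drop-in tool.

  import java.sql.*;
  import java.util.*;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class DbIndexer {
    public static void main(String[] args) throws Exception {
      // Cache the small "secondary" lookup table in memory once,
      // instead of running a sub-query per row (Erick's suggestion).
      Map<Integer, String> vendorById = new HashMap<>();
      try (Connection conn = DriverManager.getConnection(
               "jdbc:mysql://localhost/mydb", "user", "pass");
           SolrClient solr = new HttpSolrClient(
               "http://localhost:8983/solr/mycollection")) {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, name FROM vendors")) {
          while (rs.next()) vendorById.put(rs.getInt("id"), rs.getString("name"));
        }
        // One flat query over the (ideally denormalized) main table.
        List<SolrInputDocument> batch = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT id, vendor_id, descr FROM transactions")) {
          while (rs.next()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rs.getString("id"));
            doc.addField("vendor_name_s", vendorById.get(rs.getInt("vendor_id")));
            doc.addField("description_t", rs.getString("descr"));
            batch.add(doc);
            if (batch.size() == 1000) { solr.add(batch); batch.clear(); }
          }
        }
        if (!batch.isEmpty()) solr.add(batch);
        solr.commit();   // one commit at the end, not per document
      }
    }
  }

Batching the adds and committing once at the end matters: per-document
add/commit round trips are a common reason a hand-rolled SolrJ indexer ends
up no faster than DIH.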


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
Thanks Chris --

The two projects I'm aware of are:

https://github.com/healthonnet/hon-lucene-synonyms

and the one referenced from the Lucidworks page here:
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

... which is here : https://github.com/LucidWorks/auto-phrase-tokenfilter

Is there anything else out there that you would recommend I look at?

On Thu, May 26, 2016 at 12:01 PM, Chris Morley <ch...@depahelix.com> wrote:

> Chris Morley here, from Wayfair.  (Depahelix = my domain)
>
>  Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
> We worked mostly off of Ted Sullivan's work and also off of some
> suggestions from Koorosh Vakhshoori.  We have gotten to a point where we
> have a more sophisticated internal implementation; however, we've found
> that it is very difficult to make it do what you want it to do, and also be
> sufficiently performant.  Watch out for exceptional situations with mm
> (minimum should match).
>
>  Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
> done work in this area.
>
>  It should be very possible to get this kind of thing working on
> SolrCloud.  I haven't tried it yet but I think theoretically, it should
> just work.  The synonyms stuff is mostly about doing things at index time
> and query time.  The index time stuff should translate to SolrCloud
> directly, while the query time stuff might pose some issues, but probably
> not too bad, if there are any issues at all.
>
>  I've had decent luck porting our various plugins from 4.10.x to 5.5.0
> because a lot of stuff is just Java, and it still works within the Jetty
> context.
>
>  -Chris.
>
>
>
>
> 
>  From: "John Bickerstaff" <j...@johnbickerstaff.com>
> Sent: Thursday, May 26, 2016 1:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
> Hey Jeff (or anyone interested in multi-word synonyms) here are some
> potentially interesting links...
>
> http://wiki.apache.org/solr/QueryParser (search the page for
> synonum_edismax)
>
> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
> post about what became the synonym_edismax Query Parser)
>
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> This last was useful for lots of reasons and contains links to other
> interesting, related web pages...
>
> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <jwar...@whitepages.com>
> wrote:
>
> > Oh, interesting. I've certainly encountered issues with multi-word
> > synonyms, but I hadn't come across this. If you end up using it with a
> > recent solr version, I'd be glad to hear your experience.
> >
> > I haven't used it, but I am aware of one other project in this vein that
> > you might be interested in looking at:
> > https://github.com/LucidWorks/auto-phrase-tokenfilter
> >
> >
> > On 5/26/16, 9:29 AM, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
> >
> > >Ahh - for question #3 I may have spoken too soon. This line from the
> > >github repository readme suggests a way.
> > >
> > >Update: We have tested to run with the jar in $SOLR_HOME/lib as well,
> and
> > >it works (Jetty).
> > >
> > >I'll try that and only respond back if that doesn't work.
> > >
> > >Questions 1 and 2 still stand of course... If anyone on the list has
> > >experience in this area...
> > >
> > >Thanks.
> > >
> > >On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
> > j...@johnbickerstaff.com
> > >> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'm creating a Solr Cloud that will index and search medical text.
> > >> Multi-word synonyms are a pretty important factor.
> > >>
> > >> I find that there are some challenges around multi-word synonyms and I
> > >> also found on the wiki that there is a recommended 3rd-party parser
> > >> (synonym_edismax parser) created by Nolan Lawson and found here:
> > >> https://github.com/healthonnet/hon-lucene-synonyms
> > >>
> > >> Here's the thing - the instructions on the github site involve
> bringing
> > >> the jar file into the war file - which is not applicable any more...
> at
> > >> least I think it's not...
> > >>
> > >> I have three questions:
> > >>
> > >> 1. Is this still a good solution for multi-word synonyms (I.e. Solr
> > Cloud
> > >> doesn't break it in some way)
> > >> 2. Is there a tool or plug-in out there that the contributors would
> > >> recommend above this one?
> > >> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated
> procedure
> > >> for bringing it in to Solr Cloud (I'm running 5.4.x)
> > >>
> > >> Thanks
> > >>
> >
> >
>
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
fixing typo:

http://wiki.apache.org/solr/QueryParser  (search the page for
synonym_edismax)

On Thu, May 26, 2016 at 11:50 AM, John Bickerstaff <j...@johnbickerstaff.com
> wrote:

> Hey Jeff (or anyone interested in multi-word synonyms) here are some
> potentially interesting links...
>
> http://wiki.apache.org/solr/QueryParser  (search the page for
> synonum_edismax)
>
> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
>  (blog post about what became the synonym_edissmax Query Parser)
>
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> This last was useful for lots of reasons and contains links to other
> interesting, related web pages...
>
> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <jwar...@whitepages.com>
> wrote:
>
>> Oh, interesting. I’ve certainly encountered issues with multi-word
>> synonyms, but I hadn’t come across this. If you end up using it with a
>> recent solr version, I’d be glad to hear your experience.
>>
>> I haven’t used it, but I am aware of one other project in this vein that
>> you might be interested in looking at:
>> https://github.com/LucidWorks/auto-phrase-tokenfilter
>>
>>
>> On 5/26/16, 9:29 AM, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
>>
>> >Ahh - for question #3 I may have spoken too soon.  This line from the
>> >github repository readme suggests a way.
>> >
>> >Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
>> >it works (Jetty).
>> >
>> >I'll try that and only respond back if that doesn't work.
>> >
>> >Questions 1 and 2 still stand of course...  If anyone on the list has
>> >experience in this area...
>> >
>> >Thanks.
>> >
>> >On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
>> j...@johnbickerstaff.com
>> >> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I'm creating a Solr Cloud that will index and search medical text.
>> >> Multi-word synonyms are a pretty important factor.
>> >>
>> >> I find that there are some challenges around multi-word synonyms and I
>> >> also found on the wiki that there is a recommended 3rd-party parser
>> >> (synonym_edismax parser) created by Nolan Lawson and found here:
>> >> https://github.com/healthonnet/hon-lucene-synonyms
>> >>
>> >> Here's the thing - the instructions on the github site involve bringing
>> >> the jar file into the war file - which is not applicable any more... at
>> >> least I think it's not...
>> >>
>> >> I have three questions:
>> >>
>> >> 1. Is this still a good solution for multi-word synonyms (I.e. Solr
>> Cloud
>> >> doesn't break it in some way)
>> >> 2. Is there a tool or plug-in out there that the contributors would
>> >> recommend above this one?
>> >> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
>> >> for bringing it in to Solr Cloud (I'm running 5.4.x)
>> >>
>> >> Thanks
>> >>
>>
>>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
Hey Jeff (or anyone interested in multi-word synonyms) here are some
potentially interesting links...

http://wiki.apache.org/solr/QueryParser  (search the page for
synonum_edismax)

https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/  (blog
post about what became the synonym_edismax Query Parser)

https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

This last was useful for lots of reasons and contains links to other
interesting, related web pages...

On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <jwar...@whitepages.com>
wrote:

> Oh, interesting. I’ve certainly encountered issues with multi-word
> synonyms, but I hadn’t come across this. If you end up using it with a
> recent solr version, I’d be glad to hear your experience.
>
> I haven’t used it, but I am aware of one other project in this vein that
> you might be interested in looking at:
> https://github.com/LucidWorks/auto-phrase-tokenfilter
>
>
> On 5/26/16, 9:29 AM, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
>
> >Ahh - for question #3 I may have spoken too soon.  This line from the
> >github repository readme suggests a way.
> >
> >Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
> >it works (Jetty).
> >
> >I'll try that and only respond back if that doesn't work.
> >
> >Questions 1 and 2 still stand of course...  If anyone on the list has
> >experience in this area...
> >
> >Thanks.
> >
> >On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
> j...@johnbickerstaff.com
> >> wrote:
> >
> >> Hi all,
> >>
> >> I'm creating a Solr Cloud that will index and search medical text.
> >> Multi-word synonyms are a pretty important factor.
> >>
> >> I find that there are some challenges around multi-word synonyms and I
> >> also found on the wiki that there is a recommended 3rd-party parser
> >> (synonym_edismax parser) created by Nolan Lawson and found here:
> >> https://github.com/healthonnet/hon-lucene-synonyms
> >>
> >> Here's the thing - the instructions on the github site involve bringing
> >> the jar file into the war file - which is not applicable any more... at
> >> least I think it's not...
> >>
> >> I have three questions:
> >>
> >> 1. Is this still a good solution for multi-word synonyms (I.e. Solr
> Cloud
> >> doesn't break it in some way)
> >> 2. Is there a tool or plug-in out there that the contributors would
> >> recommend above this one?
> >> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
> >> for bringing it in to Solr Cloud (I'm running 5.4.x)
> >>
> >> Thanks
> >>
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
Ahh - for question #3 I may have spoken too soon.  This line from the
github repository readme suggests a way.

Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
it works (Jetty).

I'll try that and only respond back if that doesn't work.

Questions 1 and 2 still stand of course...  If anyone on the list has
experience in this area...

Thanks.

On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <j...@johnbickerstaff.com
> wrote:

> Hi all,
>
> I'm creating a Solr Cloud that will index and search medical text.
> Multi-word synonyms are a pretty important factor.
>
> I find that there are some challenges around multi-word synonyms and I
> also found on the wiki that there is a recommended 3rd-party parser
> (synonym_edismax parser) created by Nolan Lawson and found here:
> https://github.com/healthonnet/hon-lucene-synonyms
>
> Here's the thing - the instructions on the github site involve bringing
> the jar file into the war file - which is not applicable any more... at
> least I think it's not...
>
> I have three questions:
>
> 1. Is this still a good solution for multi-word synonyms (I.e. Solr Cloud
> doesn't break it in some way)
> 2. Is there a tool or plug-in out there that the contributors would
> recommend above this one?
> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
> for bringing it in to Solr Cloud (I'm running 5.4.x)
>
> Thanks
>


Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
Hi all,

I'm creating a Solr Cloud that will index and search medical text.
Multi-word synonyms are a pretty important factor.

I find that there are some challenges around multi-word synonyms and I also
found on the wiki that there is a recommended 3rd-party parser
(synonym_edismax parser) created by Nolan Lawson and found here:
https://github.com/healthonnet/hon-lucene-synonyms

Here's the thing - the instructions on the github site involve bringing the
jar file into the war file - which is not applicable any more... at least I
think it's not...

I have three questions:

1. Is this still a good solution for multi-word synonyms (I.e. Solr Cloud
doesn't break it in some way)
2. Is there a tool or plug-in out there that the contributors would
recommend above this one?
3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure for
bringing it in to Solr Cloud (I'm running 5.4.x)

Thanks
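
For readers landing on this thread, the underlying problem is that the stock
query parsers split the incoming query on whitespace before analysis runs,
so a query-time synonym filter never sees "heart attack" as a single unit.
A minimal sketch of the standard setup that hits this wall - the field type
wiring and the synonym entries are invented examples, not from this thread:

  In the field type's query analyzer (schema.xml):

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>

  In synonyms.txt:

    heart attack, myocardial infarction, MI

Index-time expansion sidesteps the query-parser splitting, but it bakes the
synonym list into the index (changing it means reindexing); the plugins
discussed above exist precisely to get multi-word behavior at query time.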


Re: API call for optimising a collection

2016-05-17 Thread John Bickerstaff
Having run the optimize from the admin UI on one of my three cores in a
Solr Cloud collection, I find that when I go to try to run it on one of
the other cores, it is already "optimized"

I realize that's not the same thing as an API call, but thought it might
help.
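
For what it's worth, the plain update handler accepts optimize as a URL
parameter, so the per-core route can at least be scripted. A sketch with an
assumed core name - whether one call fans out across every shard of a
SolrCloud collection may depend on the version, hence the per-core advice
in this thread:

  curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=1'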

On Tue, May 17, 2016 at 11:22 PM, Nick Vasilyev 
wrote:

> As far as I know, you have to run it on each core.
> On May 18, 2016 1:04 AM, "Binoy Dalal"  wrote:
>
> > Is there no api call that can optimize an entire collection?
> >
> > I tried the collections api page on the confluence wiki but couldn't find
> > anything, and a Google search also yielded no meaningful results.
> > --
> > Regards,
> > Binoy Dalal
> >
>


Re: [scottchu] How to specify multiple zk nodes using solr start command under Windows

2016-05-17 Thread John Bickerstaff
I think those zk server warning messages are expected.  Until you have 3
running instances you don't have a "Quorum" and the Zookeeper instances
complain.  Once the third one comes up they are "happy" and don't complain
any more.  You'd get similar messages if one of the Zookeeper nodes ever
went down.

As for the stopping of zk server - I've never had any problem issuing a
stop command, but I'm running Linux so I may not be much good to you in
that regard.

On Tue, May 17, 2016 at 8:41 PM, scott.chu  wrote:

> I tested yesterday and it proves my theory. I'll share what I do under
> Windows on 1 PC here with you experienced guys and further newbies:
>
> 1>Download zookeeper 3.4.8. I unzip it and copy to 3 other different
> folders: zk_1, zk_2, zk_3.
> 2>For each zk_n folder, I do these things  (Note: {n} means the last digit
> in the zk_n folder name):
>  a. Create a zoo_data folder under the root and create 'myid' with
> notepad; the contents are just '{n}'.
>  b. Create zoo.cfg under conf folder with following contents:
>  clientPort=218{n}
>  initLimit=5
>  syncLimit=2
>  dataDir=D:/zk_{n}/zoo_data
>  ;if the p2p-connect-ports or leader-election-ports are all the
> same, then we should set maxClientCnxns=n
>  ;maxClientCnxns=3
>  ;server.x=host:p2p-connect-port:leader-election-port
>  server.1=localhost:2888:3888
>  server.2=localhost:2889:3889
>  server.3=localhost:2890:3890
 3> I download ZOOKEEPER-1122's zkServer.cmd and go into each zk_n folder
> and issue command:
>  bin\zkServer.cmd start
>
>    [Question]: There's something I'd like to ask you guys: When I start
> zk_1 and zk_2, the console keeps showing some warning messages.
>  Only after I start zk_3 do the warning messages
> stop.  Is that normal?
>
> 4>  I use zkui_win to see them all go online successfully.
> 5> I goto Solr-5.4.1 folder, and issue following commands:
>bin\solr start -c -s mynodes\node1 -z localhost:2181
>bin\solr start -c -s mynodes\node1 -z localhost:2181 -p
> 7973
>bin\solr create -c cugna -d myconfigsets\cugna -shards
> 1 -replicationFactor 2 -p 8983
> 6> By using zkui_win again,  I see:
>   ** the 'cugna' config is synchronized on zk_1 to zk_3. So this
> proves my theory: we only have to specify one zk node and they'll
> sync themselves. **
>
> [Question]: I go into each zk_n folder and issue 'bin\zkServer stop'.
> However, this shows an error message. It seems it can't taskkill the zk
> process for some reason. The only way I can stop them
>  is by closing the DOS windows that issued the
> 'bin\zkServer start' command. Does anybody know why 'bin\zkServer stop'
> doesn't work?
>
> Note: Gotta say sorry for the repetition of localhost:2181. It's my typo.
>
> scott.chu,scott@udngroup.com
> 2016/5/18 (週三)
> - Original Message -
> From: Abdel Belkasri
> To: solr-user
> CC:
> Date: 2016/5/18 (週三) 00:17
> Subject: Re: [scottchu] How to specify multiple zk nodes using solr
> start command under Windows
>
>
> The repetition is just a cut and paste from Scott's post.
>
> How can I check if I am getting the ensemble or just a single zk?
>
> Also if this is not the way to specify an ensemble, what is the right way?
>
>
> Because the comma delimited list does not work, I concur with Scott.
>
> On Tue, May 17, 2016 at 11:49 AM, Erick Erickson 
>
> wrote:
>
> > Are you absolutely sure you're getting an _ensemble_ and
> > not just connecting to a single node? My suspicion (without
> > proof) is that you're just getting one -z option. It'll work as
> > long as that ZK instance stays up, but it won't be fault-tolerant.
> >
> > And again you repeated the port (2181) twice.
> >
> > Best,
> > Erick
> >
> > On Tue, May 17, 2016 at 8:02 AM, Abdel Belkasri 
> > wrote:
> > > Hi Scott,
> > > what worked for me in Windows is this (no ",")
> > > bin\Solr start -c -s mynodes\node1 -z localhost:2181 -z localhost:2181
> -z
> > > localhost:2183
> > >
> > > -- Hope this helps
> > > Abdel.
> > >
> > > On Tue, May 17, 2016 at 3:35 AM, scott.chu 
> > wrote:
> > >
> > >> I start 3 zk nodes at port 2181,2182, and 2183 on my local machine.
> > >> Go into the Solr 5.4.1 root folder and issue the command from the
> > article
> > >> 'Setting Up an External ZooKeeper Ensemble' in the reference guide
> > >>
> > >> bin\Solr start -c -s mynodes\node1 -z
> > >> localhost:2181,localhost:2181,localhost:2183
> > >>
> > >> but it doesn't run; it just shows the help page of the start command in
> solr.cmd.
> > >> How should I issue the correct command?
> > >>
> > >
> > >
> > >
> > > --
> > > Abdel K. Belkasri, PhD
> >
>
>
>
> --
> Abdel K. Belkasri, PhD

Re: [scottchu] How to specify multiple zk nodes using solr start command under Windows

2016-05-17 Thread John Bickerstaff
It's roundabout, but this might work -- ask for the healthcheck status
(from the solr box) and hit each zkNode separately.

I'm on Linux so you'll have to translate to Windows...  using the solr.cmd
file I assume...

./solr healthcheck -z 192.168.56.5:2181/solr5_4 -c collectionName
./solr healthcheck -z 192.168.56.6:2181/solr5_4 -c collectionName
./solr healthcheck -z 192.168.56.7:2181/solr5_4 -c collectionName

My original command included all the IP addresses without port numbers.
This does return healthcheck info (the same info) when I do it for each
zkNode separately...

I assume that if they've all got info for all your replicas and/or shards,
they're working as an ensemble.

You could also go looking at each zkNode (using the zkCli tool) and verify
that your collection appears where you expect it.
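
For the zkCli route, something like the following should show whether each
node carries the same cluster state (the /solr5_4 chroot matches the
healthcheck commands above - adjust to your own setup):

  ./zkCli.sh -server 192.168.56.5:2181
  ls /solr5_4/collections
  get /solr5_4/clusterstate.json

If the same collections and state come back from each of the three servers,
they are almost certainly running as one ensemble.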



On Tue, May 17, 2016 at 10:17 AM, Abdel Belkasri  wrote:

> The repetition is just a cut and paste from Scott's post.
>
> How can I check if I am getting the ensemble or just a single zk?
>
> Also if this is not the way to specify an ensemble, what is the right way?
>
> Because the comma delimited list does not work, I concur with Scott.
>
> On Tue, May 17, 2016 at 11:49 AM, Erick Erickson 
> wrote:
>
> > Are you absolutely sure you're getting an _ensemble_ and
> > not just connecting to a single node? My suspicion (without
> > proof) is that you're just getting one -z option. It'll work as
> > long as that ZK instance stays up, but it won't be fault-tolerant.
> >
> > And again you repeated the port (2181) twice.
> >
> > Best,
> > Erick
> >
> > On Tue, May 17, 2016 at 8:02 AM, Abdel Belkasri 
> > wrote:
> > > Hi Scott,
> > > what worked for me in Windows is this (no ",")
> > > bin\Solr start -c -s mynodes\node1 -z localhost:2181 -z localhost:2181
> -z
> > > localhost:2183
> > >
> > > -- Hope this helps
> > > Abdel.
> > >
> > > On Tue, May 17, 2016 at 3:35 AM, scott.chu 
> > wrote:
> > >
> > >> I start 3 zk nodes at port 2181,2182, and 2183 on my local machine.
> > >> Go into the Solr 5.4.1 root folder and issue the command from the
> > article
> > >> 'Setting Up an External ZooKeeper Ensemble' in the reference guide
> > >>
> > >> bin\Solr start -c -s mynodes\node1 -z
> > >> localhost:2181,localhost:2181,localhost:2183
> > >>
> > >> but it doesn't run; it just shows the help page of the start command in
> solr.cmd.
> > >> How should I issue the correct command?
> > >>
> > >
> > >
> > >
> > > --
> > > Abdel K. Belkasri, PhD
> >
>
>
>
> --
> Abdel K. Belkasri, PhD
>


Re: [scottchu] How to specify multiple zk nodes using solr start command under Windows

2016-05-17 Thread John Bickerstaff
In your original command, you listed the same port twice.  That may have
been at least part of the difficulty.

It's probably fine to just use one zk node - as the zookeeper instances
should be aware of each other.

I also assume that if your solr.in.sh (or Windows equivalent) has the
properly formatted entry for all the zk nodes, Solr will be able to find a
different one if the one you pass in goes down...

Here's the entry from my file in case it's helpful.  I got away without the
port number (I assume) because I'm using the default 2181 on all my
zookeeper nodes which are separate servers.

# Set the ZooKeeper connection string if using an external ZooKeeper
ensemble
# e.g. host1:2181,host2:2181/chroot
# Leave empty if not using SolrCloud
ZK_HOST="192.168.56.5,192.168.56.6,192.168.56.7/solr5_4"


On Tue, May 17, 2016 at 4:30 AM, scott.chu  wrote:

>
> I issue  '-z localhost:2181 -z localhost:2182 -z localhost:2183' for each
> node's start command and later when I create collection, all 3 zk nodes has
> registered my configset.
> Never try but I think maybe only use -z localhost:2181, then all 3 nodes
> in zk ensemble will synchronize themselves.
>
> scott.chu,scott@udngroup.com
> 2016/5/17 (週二)
> - Original Message -
> From: scott(自己)
> To: solr-user
> CC:
> Date: 2016/5/17 (週二) 15:35
> Subject: [scottchu] How to specify multiple zk nodes using solr start
> command under Windows
>
>
> I start 3 zk nodes at port 2181,2182, and 2183 on my local machine.
> Go into the Solr 5.4.1 root folder and issue the command from the article
> 'Setting Up an External ZooKeeper Ensemble' in the reference guide
>
> bin\Solr start -c -s mynodes\node1 -z
> localhost:2181,localhost:2181,localhost:2183
>
> but it doesn't run; it just shows the help page of the start command in solr.cmd.
> How should I issue the correct command?
>
>
>


Re: Does anybody crawl to a database and then index from the database to Solr?

2016-05-13 Thread John Bickerstaff
I've been working on a less-complex thing along the same lines - taking all
the data from our corporate database and pumping it into Kafka for
long-term storage -- and the ability to "play back" all the Kafka messages
any time we need to re-index.

That simpler scenario has worked like a charm.  I don't need to massage the
data much once it's at rest in Kafka, so that was a straightforward
solution, although I could have gone with a DB and just stored the solr
documents with their IDs one per row in an RDBMS...
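
A rough sketch of that replay idea, assuming each message was written with
the Solr id as the Kafka key and the document body as the value - the topic
name, field names, and 2016-era client calls are illustrative assumptions:

  import java.util.*;
  import org.apache.kafka.clients.consumer.*;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class KafkaReplayIndexer {
    public static void main(String[] args) throws Exception {
      Properties props = new Properties();
      props.put("bootstrap.servers", "localhost:9092");
      // a fresh group id plus earliest => replay the topic from the start
      props.put("group.id", "solr-reindex-" + System.currentTimeMillis());
      props.put("auto.offset.reset", "earliest");
      props.put("key.deserializer",
          "org.apache.kafka.common.serialization.StringDeserializer");
      props.put("value.deserializer",
          "org.apache.kafka.common.serialization.StringDeserializer");
      try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
           HttpSolrClient solr =
               new HttpSolrClient("http://localhost:8983/solr/mycollection")) {
        consumer.subscribe(Collections.singletonList("solr-docs"));
        while (true) {
          ConsumerRecords<String, String> records = consumer.poll(1000);
          if (records.count() == 0) break;  // crude "caught up" heuristic
          for (ConsumerRecord<String, String> r : records) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", r.key());
            doc.addField("body_t", r.value());
            solr.add(doc);
          }
        }
        solr.commit();
      }
    }
  }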

The rest sounds like good ideas for your situation as Solr isn't the best
candidate for the kind of manipulation of data you're proposing and a
database excels at that.  It's more work, but you get a lot more
flexibility and you de-couple Solr from the data crawling as you say.

It all sounds pretty good to me, but I've only been on the list here a
short time - so I'll leave it to others to add their comments.

On Fri, May 13, 2016 at 2:46 PM, Pryor, Clayton J 
wrote:

> Question:
> Do any of you have your crawlers write to a database rather than directly
> to Solr and then use a connector to index to Solr from the database?  If
> so, have you encountered any issues with this approach?  If not, why not?
>
> I have searched forums and the Solr/Lucene email archives (including
> browsing of http://www.apache.org/foundation/public-archives.html) but
> have not found any discussions of this idea.  I am certain that I am not
> the first person to think of it.  I suspect that I have just not figured
> out the proper queries to find what I am looking for.  Please forgive me if
> this idea has been discussed before and I just couldn't find the
> discussions.
>
> Background:
> I am new to Solr and have been asked to make improvements to our Solr
> configurations and crawlers.  I have read that the Solr index should not be
> considered a source of record data.  It is in essence a highly optimized
> index to be used for generating search results rather than a retainer for
> record copies of data.  The better approach is to rely on corporate data
> sources for record data and retain the ability to completely blow away a
> Solr index and repopulate it as needed for changing search requirements.
> This made me think that perhaps it would be a good idea for us to create a
> database of crawled data for our Solr index.  The idea is that the crawlers
> would write their findings to a corporate supported database of our own
> design for our own purposes and then we would populate our Solr index from
> this database using a connector that writes from the database to the Solr
> index.
> The only disadvantage that I can think of for this approach is that we
> will need to write a simple interface to the database that allows our admin
> personnel to "Delete" a record from the Solr index.  Of course, it won't be
> deleted from the database but simply flagged as not to be indexed to Solr.
> It will then send a delete command to Solr for any successfully "deleted"
> records from the database.  I suspect this admin interface will grow over
> time but we really only need to be able to delete records from the database
> for now.  All of the rest of our admin work is query related which can
> still be done through the Solr Console.
> I can think of the following advantages:
>
>   *   We have a corporate sponsored and backed up repository for our
> crawled data which would buffer us from any inadvertent losses of our Solr
> index.
>   *   We would divorce the time it takes to crawl web pages from the time
> it takes to populate our Solr index with data from the crawlers.  I have
> found that my Solr Connector takes minutes to populate the entire Solr
> index from the current Solr prod to the new Solr instances.  Compare that
> to hours and even days to actually crawl the web pages.
>   *   We use URLs for our unique IDs in our Solr index.  We can resolve
> the problem of retaining the shortest URL when duplicate content is
> detected in Solr simply by sorting the query used to populate Solr from the
> database by id length descending - this will ensure the last URL
> encountered for any duplicate is always the shortest.
>   *   We can easily ensure that certain classes of crawled content are
> always added last (or first if you prefer) whenever the data is indexed to
> Solr - rather than having to rely on the timing of crawlers.
>   *   We could quickly and easily rebuild our Solr index from scratch at
> any time.  This would be very valuable when changes to our Solr
> configurations require re-indexing our data.
>   *   We can assign unique boost values to individual "documents" at index
> time by assigning a boost value for that document in the database and then
> applying that boost at index time.
>   *   We can continuously run a batch program that removes broken links
> against this database with no impact to Solr and then refresh Solr on a
> more frequent basis than we do now because the connector 

Re: Is there an equivalent to an SQL "select distinct" in Solr

2016-05-13 Thread John Bickerstaff
I should clarify:

http://XXX.XXX.XX.XX:8983/solr/yourCoreName/select?
q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=category

"yourCoreName" will get built in for you if you use the Solr Admin UI for
queries --

On Fri, May 13, 2016 at 9:36 AM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> In case it's helpful for a quick and dirty peek at your facets, the
> following URL (in a browser or Curl) will get you basic facets for a field
> named "category" -- assuming you change the IP address / hostname to match
> yours.
>
> http://XXX.XXX.XX.XX:8983/solr/statdx_shard1_replica3/select?
> q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=category
>
> You can also do this in the Admin UI by checking the "facet" box, and
> entering the field name in the facet.field that pops up.  You can leave the
> query field at the default *:*
>
> You need to make sure that you put a "0" in the rows field as well (right
> under "sort") in order to just get back the facet counts.
>
> On Fri, May 13, 2016 at 7:52 AM, Joel Bernstein <joels...@gmail.com>
> wrote:
>
>> You may also want to try out the SQL interface in Solr 6.0 which supports
>> SELECT DISTINCT queries.
>>
>>
>> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SELECTDISTINCTQueries
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Fri, May 13, 2016 at 9:47 AM, GW <thegeofo...@gmail.com> wrote:
>>
>> > Thank you Shawn,
>> >
>> > I will toy with these over the weekend. Solr/Hadoop/Hbase has been a
>> nasty
>> > learning curve for me,
>> > It probably would have been a lot easier if I didn't have 30
>> years of
>> > RDBMS stuck in my head.
>> >
>> > Again,
>> >
>> > Many thanks for your response.
>> >
>> >
>> > On 13 May 2016 at 08:57, Shawn Heisey <apa...@elyograg.org> wrote:
>> >
>> > > On 5/13/2016 6:48 AM, GW wrote:
>> > > > Let's say I have 10,000 documents and there is a field named
>> "category"
>> > > and
>> > > > lets say there are 200 categories but I do not know what they are.
>> > > >
>> > > > My question: Is there a query/filter that can pull a list of
>> distinct
>> > > > categories?
>> > >
>> > > Sounds like a job for faceting or grouping.  Which one of them to use
>> > > will depend on exactly what you're trying to obtain in your results.
>> > >
>> > > https://cwiki.apache.org/confluence/display/solr/Faceting
>> > > https://cwiki.apache.org/confluence/display/solr/Result+Grouping
>> > >
>> > > Thanks,
>> > > Shawn
>> > >
>> > >
>> >
>>
>
>


Re: Is there an equivalent to an SQL "select distinct" in Solr

2016-05-13 Thread John Bickerstaff
In case it's helpful for a quick and dirty peek at your facets, the
following URL (in a browser or Curl) will get you basic facets for a field
named "category" -- assuming you change the IP address / hostname to match
yours.

http://XXX.XXX.XX.XX:8983/solr/statdx_shard1_replica3/select?
q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=category

You can also do this in the Admin UI by checking the "facet" box, and
entering the field name in the facet.field that pops up.  You can leave the
query field at the default *:*

You need to make sure that you put a "0" in the rows field as well (right
under "sort") in order to just get back the facet counts.

On Fri, May 13, 2016 at 7:52 AM, Joel Bernstein  wrote:

> You may also want to try out the SQL interface in Solr 6.0 which supports
> SELECT DISTINCT queries.
>
>
> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SELECTDISTINCTQueries
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, May 13, 2016 at 9:47 AM, GW  wrote:
>
> > Thank you Shawn,
> >
> > I will toy with these over the weekend. Solr/Hadoop/Hbase has been a
> nasty
> > learning curve for me,
> > It probably would have been a lot easier if I didn't have 30 years
> of
> > RDBMS stuck in my head.
> >
> > Again,
> >
> > Many thanks for your response.
> >
> >
> > On 13 May 2016 at 08:57, Shawn Heisey  wrote:
> >
> > > On 5/13/2016 6:48 AM, GW wrote:
> > > > Let's say I have 10,000 documents and there is a field named
> "category"
> > > and
> > > > lets say there are 200 categories but I do not know what they are.
> > > >
> > > > My question: Is there a query/filter that can pull a list of distinct
> > > > categories?
> > >
> > > Sounds like a job for faceting or grouping.  Which one of them to use
> > > will depend on exactly what you're trying to obtain in your results.
> > >
> > > https://cwiki.apache.org/confluence/display/solr/Faceting
> > > https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


Re: Atomicity of commits in Solr Cloud?

2016-05-12 Thread John Bickerstaff
I'm not a dev, but I would assume the following if I were concerned with
speed and atomicity

A.  A commit WILL be reflected in all appropriate shards / replicas in a
very short time.
  I believe Solr Cloud guarantees this, although the time frame
will be dependent on "B"
B. Network, processor, size, complexity of commit and indexing, and other
variables will have an effect on how fast it happens
C. It is possible that during the process of getting the data to all Solr
instances, a user could query at just the right moment and get "stale"
data, but that state will not last very long.

Others on this list may correct me... but that's my best understanding.
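
One knob that narrows the window described in C: an explicit commit can be
asked to block until the new searcher is registered. A hedged example, with
the collection name assumed:

  curl 'http://localhost:8983/solr/mycollection/update?commit=true&waitSearcher=true'

waitSearcher=true makes the call return only once the new searcher is in
place; it shrinks, but does not eliminate, the chance of a query catching
one replica ahead of another.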

On Thu, May 12, 2016 at 10:46 AM, Lars Noschinski  wrote:

> Hi everyone,
>
> does Solr (6.0) give an guarantees about the atomicity of commits in Solr
> Cloud? I.e., if I add a large amount of documents, then commit, do the
> changed state become visible on all shards at the same time? I could not
> find anything towards that, so I assume that there is no such guarantee.
>
> Best regards,
>   Lars
> --
> Lars Noschinski
> Softwareingenieur
>
> QAware GmbH
> Aschauer Str. 32
> 81549 München, Germany
> Tel +49 89 232315-213
> Fax +49 89 232315-129
> lars.noschin...@qaware.de
> www.qaware.de
> --
> Geschäftsführer: Christian Kamm, Bernd Schlüter, Johannes Weigend, Dr.
> Josef Adersberger
> Registergericht: München
> Handelsregisternummer: HRB 163761
>


Re: changing web context and port for SolrCloud Zookeeper

2016-05-11 Thread John Bickerstaff
Excellent! That file gave me fits at first.  It lives in two locations, but
the one that counts for booting SOLR is the /etc/default one.
On May 11, 2016 12:53 PM, "Tom Gullo" <tomgu...@gmail.com> wrote:

That helps.  I ended up updating the solr.in.sh file in /etc/default and
that was getting picked up.  Thanks

> On May 11, 2016, at 2:05 PM, Tom Gullo <tomgu...@gmail.com> wrote:
>
> My Solr installation is running on Tomcat on port 8080 with a  web
context name that is different than /solr.   We want to move to a basic
jetty setup with all the defaults.  I haven’t found a clean way to do
this.  A lot of the values like baseurl and /leader/elect/shard1 have
values that need to be updated.  If I try shutting down the servers, change
the zookeeper settings and then restart Solr in Jetty I get issues - like
Solr thinks they are replicas.   So I’m looking to see if anyone knows what
is the cleanest way to move from a Tomcat/8080 install to a Jetty/8983 one.
>
> Thanks
>
>> On May 11, 2016, at 1:59 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:
>>
>> I may be answering the wrong question - but SolrCloud goes in by default
on
>> 8983, yes?  Is yours currently on 8080?
>>
>> I don't recall where, but I think I saw a config file setting for the
port
>> number (In Solr I mean)
>>
>> Am I on the right track or are you asking something other than how to get
>> Solr on host:8983/solr ?
>>
>> On Wed, May 11, 2016 at 11:56 AM, Tom Gullo <tomgu...@gmail.com> wrote:
>>
>>> I need to change the web context and the port for a SolrCloud
installation.
>>>
>>> Example, change:
>>>
>>> host:8080/some-api-here/
>>>
>>> to this:
>>>
>>> host:8983/solr/
>>>
>>> Does anyone know how to do this with SolrCloud?  There are values stored
>>> in clusterstate.json and /leader/elect and I could change
them
>>> but that seems a little messy.
>>>
>>> Thanks
>


Re: changing web context and port for SolrCloud Zookeeper

2016-05-11 Thread John Bickerstaff
Oh, I see -

Hmmm... I just did a disaster recovery work up for my IT guys and basically
I recommended they build SOLR from scratch and reindex rather than try to
recover (same for changing versions)

However, we've got a small-ish data set and that may not work for everyone.

Any chance you can just rebuild (with the default Jetty) and re-index?

On Wed, May 11, 2016 at 12:05 PM, Tom Gullo <tomgu...@gmail.com> wrote:

> My Solr installation is running on Tomcat on port 8080 with a  web context
> name that is different than /solr.   We want to move to a basic jetty setup
> with all the defaults.  I haven’t found a clean way to do this.  A lot of
> the values like baseurl and /leader/elect/shard1 have values that need to
> be updated.  If I try shutting down the servers, change the zookeeper
> settings and then restart Solr in Jetty I get issues - like Solr thinks
> they are replicas.   So I’m looking to see if anyone knows what is the
> cleanest way to move from a Tomcat/8080 install to a Jetty/8983 one.
>
> Thanks
>
> > On May 11, 2016, at 1:59 PM, John Bickerstaff <j...@johnbickerstaff.com>
> wrote:
> >
> > I may be answering the wrong question - but SolrCloud goes in by default
> on
> > 8983, yes?  Is yours currently on 8080?
> >
> > I don't recall where, but I think I saw a config file setting for the
> port
> > number (In Solr I mean)
> >
> > Am I on the right track or are you asking something other than how to get
> > Solr on host:8983/solr ?
> >
> > On Wed, May 11, 2016 at 11:56 AM, Tom Gullo <tomgu...@gmail.com> wrote:
> >
> >> I need to change the web context and the port for a SolrCloud
> installation.
> >>
> >> Example, change:
> >>
> >> host:8080/some-api-here/
> >>
> >> to this:
> >>
> >> host:8983/solr/
> >>
> >> Does anyone know how to do this with SolrCloud?  There are values stored
> >> in clusterstate.json and /leader/elect and I could change
> them
> >> but that seems a little messy.
> >>
> >> Thanks
>
>


Re: changing web context and port for SolrCloud Zookeeper

2016-05-11 Thread John Bickerstaff
Yup - bottom of solr.in.sh - if you used the "install for production"
script.

/etc/default/solr.in.sh (on linux which is all I do these days)

Hope that helps...  Ping back if not.

SOLR_PID_DIR="/var/solr"
SOLR_HOME="/var/solr/data"
LOG4J_PROPS="/var/solr/log4j.properties"
SOLR_LOGS_DIR="/var/solr/logs"
SOLR_PORT="8983"

On Wed, May 11, 2016 at 11:59 AM, John Bickerstaff <j...@johnbickerstaff.com
> wrote:

> I may be answering the wrong question - but SolrCloud goes in by default
> on 8983, yes?  Is yours currently on 8080?
>
> I don't recall where, but I think I saw a config file setting for the port
> number (In Solr I mean)
>
> Am I on the right track or are you asking something other than how to get
> Solr on host:8983/solr ?
>
> On Wed, May 11, 2016 at 11:56 AM, Tom Gullo <tomgu...@gmail.com> wrote:
>
>> I need to change the web context and the port for a SolrCloud
>> installation.
>>
>> Example, change:
>>
>> host:8080/some-api-here/
>>
>> to this:
>>
>> host:8983/solr/
>>
>> Does anyone know how to do this with SolrCloud?  There are values stored
>> in clusterstate.json and /leader/elect and I could change them
>> but that seems a little messy.
>>
>> Thanks
>
>
>


Re: changing web context and port for SolrCloud Zookeeper

2016-05-11 Thread John Bickerstaff
I may be answering the wrong question - but SolrCloud goes in by default on
8983, yes?  Is yours currently on 8080?

I don't recall where, but I think I saw a config file setting for the port
number (In Solr I mean)

Am I on the right track or are you asking something other than how to get
Solr on host:8983/solr ?

On Wed, May 11, 2016 at 11:56 AM, Tom Gullo  wrote:

> I need to change the web context and the port for a SolrCloud installation.
>
> Example, change:
>
> host:8080/some-api-here/
>
> to this:
>
> host:8983/solr/
>
> Does anyone know how to do this with SolrCloud?  There are values stored
> in clusterstate.json and /leader/elect and I could change them
> but that seems a little messy.
>
> Thanks


Re: I need Consultation/Suggestion and I am even willing to pay fee for that

2016-05-05 Thread John Bickerstaff
This statement has two possible meanings in my mind...

"I want everything as automated manner with minimal manual work."

Do you mean minimal work for your users?  Or do you mean minimal work to
get your idea up and running and generating income for you or your company?

The first meaning is laudable and a good idea -- and generally the nicer
you want to make things for your users, the more time you will spend in
analysis and development (I.E. greater cost in time and money)

The second meaning suggests you want to spend a minimum of time and money
to get something working -- which is generally incompatible with a really
great user experience...

And, of course, I may have totally missed your meaning and you may have had
something totally different in mind...

On Thu, May 5, 2016 at 8:33 AM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> I'll just briefly add some thoughts...
>
> #1 This can be done several ways - including keeping a totally separate
> document that contains ONLY the data you're willing to expose for free --
> but what you want to accomplish is not clear enough to me for me to start
> making recommendations.  I'll just say that this is not a problem or an
> issue.  A way can be found to address #1 without much problem.
>
> #2 is difficult to understand.  I have the sense that you're only
> beginning to think about a full application you want to build - with Search
> at the center -- answering #2 is going to take a lot more clarity about
> exactly what you're trying to accomplish.
>
>
> #3  SOLR allows you to store original content so that you can return it
> from Solr to an application at some future point.  You don't need to worry
> about that.  By far the simplest way to handle images is to store metadata
> about the image (including a link, or some way to get it quickly out of
> your database, say, the DB id) and then go get the image as part of a
> secondary process of building your web page after Solr has returned
> results...  At least that's the way I and the teams I've worked with have
> always handled it.
>
> #4  I must admit, I don't understand question #4...  Do you mean "Will the
> way I'm handling documents affect the way my site is ranked by Google?"
> Um.  Probably?  If you were giving everything away for free you'd
> probably end up with a higher rank over time, but that's not what you want
> to do, so maybe it's not an issue?  I'm not an expert on getting good
> rankings from Google, so I'll leave that to others to comment on.
>
> As for 5 - what is the something you want to do?  I could try to answer,
> but I don't have enough information to be sure my answer will match what
> you're looking for.
>
> On Thu, May 5, 2016 at 4:46 AM, Zara Parst <edotserv...@gmail.com> wrote:
>
>> What is in my mind!!
>>
>>
>>
>> I have data in TB, mainly educational assignments and projects, which will
>> contain text, images and maybe code as well if it is from computer
>> science.  I will index all the documents into Solr and I will also have
>> original copies of those documents. Now, I want to create a library where
>> users can search the content and can see a few parts of relevant documents
>> (like 5 to 10 related documents) in a restricted manner.  For unrestricted
>> access they have to pay for each document.
>>
>>
>>
>> I also want to create pages for the content which has already been shown
>> to the user as a restricted part, so that the number of pages on my website
>> keeps
>> increasing, which will give a boost to my website for search engine
>> ranking. Obviously more pages mean better rank. I want everything done in an
>> automated manner with minimal manual work. Now the issues that I am facing:
>>
>> 1.  How to generate the most relevant restricted part out of Solr
>> (I can implement a sliding-window display which might serve this, but if
>> there is already something in Solr then I will prefer that)
>>
>>
>> 2.  How to create pages from that content and how to manage the URL of
>> each
>> page on my website (one solution would be a URL based on the query, but
>> what if someone searches for almost the same thing and some other document
>> comes up as the first option; how do I resolve the issue of the same URL?
>> This will also create the issue of overlapping content with different URLs
>> if I am implementing a sliding window)
>>
>>
>>
>> 3.  About creating pages: shall I create the page from the Solr content or
>> from the original content? It might have images in the content, so the
>> better option looks like
>> the original content. If that is the case, then how do I extract the parts
>> of the original content corresponding to the Solr result?
>>
>>
>>
>> 4.  Will this affect my site ranking in a negative way?
>>
>>
>>
>> 5.  Can we do something for the meta keywords, title, etc. of the
>> generated pages?
>>
>
>


Re: I need Consultation/Suggestion and I am even willing to pay fee for that

2016-05-05 Thread John Bickerstaff
I'll just briefly add some thoughts...

#1 This can be done several ways - including keeping a totally separate
document that contains ONLY the data you're willing to expose for free --
but what you want to accomplish is not clear enough to me for me to start
making recommendations.  I'll just say that this is not a problem or an
issue.  A way can be found to address #1 without much problem.

#2 is difficult to understand.  I have the sense that you're only beginning
to think about a full application you want to build - with Search at the
center -- answering #2 is going to take a lot more clarity about exactly
what you're trying to accomplish.


#3  SOLR allows you to store original content so that you can return it
from Solr to an application at some future point.  You don't need to worry
about that.  By far the simplest way to handle images is to store metadata
about the image (including a link, or some way to get it quickly out of
your database, say, the DB id) and then go get the image as part of a
secondary process of building your web page after Solr has returned
results...  At least that's the way I and the teams I've worked with have
always handled it.

#4  I must admit, I don't understand question #4...  Do you mean "Will the
way I'm handling documents affect the way my site is ranked by Google?"
Um.  Probably?  If you were giving everything away for free you'd
probably end up with a higher rank over time, but that's not what you want
to do, so maybe it's not an issue?  I'm not an expert on getting good
rankings from Google, so I'll leave that to others to comment on.

As for 5 - what is the something you want to do?  I could try to answer,
but I don't have enough information to be sure my answer will match what
you're looking for.

On Thu, May 5, 2016 at 4:46 AM, Zara Parst  wrote:

> What is in my mind!!
>
>
>
> I have data in TB, mainly educational assignments and projects, which will
> contain text, images and maybe code as well if it is from computer
> science.  I will index all the documents into Solr and I will also have
> original copies of those documents. Now, I want to create a library where
> users can search the content and can see a few parts of relevant documents
> (like 5 to 10 related documents) in a restricted manner.  For unrestricted
> access they have to pay for each document.
>
>
>
> I also want to create pages for the content which has already been shown
> to the user as a restricted part, so that the number of pages on my website
> keeps increasing, which will give a boost to my website for search engine
> ranking. Obviously more pages mean better rank. I want everything done in an
> automated manner with minimal manual work. Now the issues that I am facing:
>
> 1.  How to generate the most relevant restricted part out of Solr?
> (I can implement a sliding-window display which might serve this, but if
> there is already something in Solr then I will prefer that.)
>
>
> 2.  How to create pages from that content, and how to manage the URL of
> that page on my website? (One solution would be a URL based on the query,
> but what if someone searches for almost the same thing and some other
> document comes up as the first option? How do I resolve the issue of the
> same URL? This will also create an issue of overlapping content with
> different URLs if I am implementing a sliding window.)
>
>
>
> 3.  About creating the page: shall I create the page from the Solr
> content or from the original content? It might have images in the
> content, so the better option would be from the original content. The
> more suitable choice looks like the original content; if that is the
> case, then how do I extract the parts of the original content
> corresponding to the Solr result?
>
>
>
> 4.  Will this affect my site ranking in a negative way?
>
>
>
> 5.  Can we do something for the meta keywords, title, etc. of the
> generated page?
>


Re: What does the "Max Doc" means in Admin interface?

2016-05-04 Thread John Bickerstaff
Max Doc is the total number of documents in the collection, INCLUDING the
ones that have been deleted but not yet physically removed.  Don't worry,
deleted docs are not returned in search results.

Yes, you can change the number by "optimizing" (see the button), but this
does take time and bandwidth, so do it in a way that won't negatively
affect Production.  Right after the optimize, Num Docs and Max Doc should
be the same, I believe.

The -1 is (as I learned in an email on this forum a day or three ago) a
sign of a bug and should be ignored for now.
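
If it helps, the optimize can also be triggered over HTTP rather than from
the Admin UI -- a minimal sketch, with the collection name assumed:

    curl "http://localhost:8983/solr/mycollection/update?optimize=true"

Same caveat as above: run it at a quiet time for Production.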

On Mon, May 2, 2016 at 12:25 AM, Bastien Latard - MDPI AG <
lat...@mdpi.com.invalid> wrote:

> Hi All,
>
> Everything is in the title...
>
>
> Can this value be modified?
> Or is it because of my environment?
>
> Also, what does "Heap Memory Usage: -1" mean?
>
> Kind regards,
> Bastien Latard
> Web engineer
> --
> MDPI AG
> Postfach, CH-4005 Basel, Switzerland
> Office: Klybeckstrasse 64, CH-4057
> Tel. +41 61 683 77 35
> Fax: +41 61 302 89 18
> E-mail: latard@mdpi.com
> http://www.mdpi.com/
>
>


Re: [Installation] Solr log directory

2016-05-03 Thread John Bickerstaff
Hoss - I'm guessing this is all in the install script that gets created
when you run that command (can't remember it) on the tar.gz file...

In other words, Yunee can edit that file, find those variables (like
SOLR_SERVICE) and change them from what they're set to by default to
whatever he wants...

On Tue, May 3, 2016 at 4:31 PM, Chris Hostetter 
wrote:

>
> : I have a question for installing solr server. Using '
> : install_solr_service.sh' with option -d , the solr home directory can be
> : set. But the default log directory is under $SOLR_HOME/logs.
> :
> : Is it possible to specify the logs directory separately from solr home
> directory during installation?
>
> install_solr_service.sh doesn't do anything special as far as where logs
> should live -- it just writes out a (default) "/etc/default/$SOLR_SERVICE.in.sh"
> (if it doesn't already exist) that specifies a (default) log directory for
> Solr to use once the service starts.
>
> you are absolutely expected to overwrite that "$SOLR_SERVICE.in.sh" file
> with your own specific settings -- in fact you *must*, to configure things
> like ZooKeeper or SSL -- after the installation script finishes, and you
> are welcome to change the SOLR_LOGS_DIR setting to anything you want.
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
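
For illustration, the override Hoss describes might end up looking like
this (paths, addresses, and the "solr" service name are assumptions based
on a default install):

    # /etc/default/solr.in.sh -- written by install_solr_service.sh,
    # then edited with your own settings
    SOLR_LOGS_DIR=/var/solr/logs
    ZK_HOST=zk1:2181,zk2:2181,zk3:2181
    # SOLR_SSL_* settings would also go here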


Re: [Installation] Solr log directory

2016-05-03 Thread John Bickerstaff
I think you should be able to set the log directory (SOLR_LOGS_DIR) to any
valid path.

For example: /var/logs/solr_logs

On Tue, May 3, 2016 at 4:02 PM, Yunee Lee  wrote:

> Hi, solr experts.
>
> I have a question for installing solr server.
> Using ' install_solr_service.sh' with option -d , the solr home directory
> can be set. But the default log directory is under $SOLR_HOME/logs.
>
> Is it possible to specify the logs directory separately from solr home
> directory during installation?
>
> Thank you for your help.
>
> -Y
>
>
>


Re: solr.StrField or solr.StringField?

2016-05-03 Thread John Bickerstaff
You'll note that the "name" of the field in schema.xml is "string" and the
class is solr.StrField.

Easy to get confused when you're writing something up quickly... in a sense
the "string" field IS a solr.StrField

... but I could be wrong of course.
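
For reference, the relevant entry in the default schema.xml looks like
this, followed by a hedged sketch of the copyField usage discussed below
(the author_facet field name is made up for illustration):

    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

    <field name="author_facet" type="string" indexed="true" stored="false"/>
    <copyField source="author" dest="author_facet"/>

So anywhere you see type="string" you are in fact using solr.StrField.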



On Tue, May 3, 2016 at 2:14 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> I'm assuming it's another "class" or data type that someone built - but
> I'm afraid I don't know any more than that.
>
> An alternative possibility (supported by at least one of the links on that
> page you linked) is that it's just a typo -- people typing quickly and
> forgetting the exact (truncated) spelling of the field.
>
> In that case, it's talking about using it for faceting and IIRC, you want
> a non-analyzed field for that - preserve it exactly as it is for facet
> queries -- that suggests to me that the author actually meant StrField
>
> 
>
> I might want to index the same data differently in three different fields
> (perhaps using the Solr copyField
> <http://wiki.apache.org/solr/SchemaXml#Copy_Fields> directive):
>
>- For searching: Tokenized, case-folded, punctuation-stripped:
>   - schildt / herbert / wolpert / lewis / davies / p
>- For sorting: Untokenized, case-folded, punctuation-stripped:
>   - schildt herbert wolpert lewis davies p
>    - For faceting: Primary author only, using a solr.StringField:
>       - Schildt, Herbert
>
> Then when the user drills down on the "Schildt, Herbert" string I would
> reissue the query with an added fq=author:"Schildt, Herbert" parameter.
>
> On Tue, May 3, 2016 at 2:01 PM, Steven White <swhite4...@gmail.com> wrote:
>
>> Thanks John.
>>
>> Yes, the out-of-the-box schema.xml does not have solr.StringField.
>> However, a number of Solr pages on the web mention solr.StringField [1],
>> and thus I'm not sure whether it's a typo or a real thing that is missing
>> from the official Solr wikis.
>>
>> Steve
>>
>> [1] https://wiki.apache.org/solr/SolrFacetingOverview,
>>
>> http://grokbase.com/t/lucene/solr-commits/06cw5038rk/solr-wiki-update-of-solrfacetingoverview-by-jjlarrea
>> ,
>>
>> On Tue, May 3, 2016 at 3:35 PM, John Bickerstaff <
>> j...@johnbickerstaff.com>
>> wrote:
>>
>> > My default schema.xml does not have an entry for solr.StringField so I
>> > can't tell you what that one does.
>> >
>> > If you look for solr.StrField in the schema.xml file, you'll get some
>> idea
>> > of how it's defined.  The default setting is for it not to be analyzed.
>> >
>> > On Tue, May 3, 2016 at 10:16 AM, Steven White <swhite4...@gmail.com>
>> > wrote:
>> >
>> > > Hi Everyone,
>> > >
>> > > Is solr.StrField and solr.StringField the same thing?
>> > >
>> > > Thanks in advance!
>> > >
>> > > Steve
>> > >
>> >
>>
>
>


Re: solr.StrField or solr.StringField?

2016-05-03 Thread John Bickerstaff
I'm assuming it's another "class" or data type that someone built - but I'm
afraid I don't know any more than that.

An alternative possibility (supported by at least one of the links on that
page you linked) is that it's just a typo -- people typing quickly and
forgetting the exact (truncated) spelling of the field.

In that case, it's talking about using it for faceting and IIRC, you want a
non-analyzed field for that - preserve it exactly as it is for facet
queries -- that suggests to me that the author actually meant StrField



I might want to index the same data differently in three different fields
(perhaps using the Solr copyField
<http://wiki.apache.org/solr/SchemaXml#Copy_Fields> directive):

   - For searching: Tokenized, case-folded, punctuation-stripped:
  - schildt / herbert / wolpert / lewis / davies / p
   - For sorting: Untokenized, case-folded, punctuation-stripped:
  - schildt herbert wolpert lewis davies p
   - For faceting: Primary author only, using a solr.StringField:
      - Schildt, Herbert

Then when the user drills down on the "Schildt, Herbert" string I would
reissue the query with an added fq=author:"Schildt, Herbert" parameter.

On Tue, May 3, 2016 at 2:01 PM, Steven White <swhite4...@gmail.com> wrote:

> Thanks John.
>
> Yes, the out-of-the-box schema.xml does not have solr.StringField.
> However, a number of Solr pages on the web mention solr.StringField [1],
> and thus I'm not sure whether it's a typo or a real thing that is missing
> from the official Solr wikis.
>
> Steve
>
> [1] https://wiki.apache.org/solr/SolrFacetingOverview,
>
> http://grokbase.com/t/lucene/solr-commits/06cw5038rk/solr-wiki-update-of-solrfacetingoverview-by-jjlarrea
> ,
>
> On Tue, May 3, 2016 at 3:35 PM, John Bickerstaff <j...@johnbickerstaff.com
> >
> wrote:
>
> > My default schema.xml does not have an entry for solr.StringField so I
> > can't tell you what that one does.
> >
> > If you look for solr.StrField in the schema.xml file, you'll get some
> idea
> > of how it's defined.  The default setting is for it not to be analyzed.
> >
> > On Tue, May 3, 2016 at 10:16 AM, Steven White <swhite4...@gmail.com>
> > wrote:
> >
> > > Hi Everyone,
> > >
> > > Is solr.StrField and solr.StringField the same thing?
> > >
> > > Thanks in advance!
> > >
> > > Steve
> > >
> >
>


Re: solr.StrField or solr.StringField?

2016-05-03 Thread John Bickerstaff
My default schema.xml does not have an entry for solr.StringField so I
can't tell you what that one does.

If you look for solr.StrField in the schema.xml file, you'll get some idea
of how it's defined.  The default setting is for it not to be analyzed.

On Tue, May 3, 2016 at 10:16 AM, Steven White  wrote:

> Hi Everyone,
>
> Is solr.StrField and solr.StringField the same thing?
>
> Thanks in advance!
>
> Steve
>


Re: Solr5.5:DocValues/CopyField does not work with Atomic updates

2016-04-21 Thread John Bickerstaff
Which field are you trying to update atomically?  A, B, or some other?
On Apr 21, 2016 8:29 PM, "Tirthankar Chatterjee" 
wrote:

> Hi,
> Here is the scenario for SOLR5.5:
>
> FieldA type= stored=true indexed=true
>
> FieldB type= stored=false indexed=true docValues=true
> useDocValuesAsStored=false
>
> FieldA copyTo FieldB
>
> Try an Atomic update and we are getting this error:
>
> possible analysis error: DocValuesField "mtmround" appears more than once
> in this document (only one value is allowed per field)
>
> How do we resolve this.
>
>
>
> ***Legal Disclaimer***
> "This communication may contain confidential and privileged material for
> the
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
> by others is strictly prohibited. If you have received the message by
> mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **
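
For anyone following along, an atomic update request looks roughly like
this (collection and document id are illustrative only; FieldA is the
field from the mail above):

    curl -X POST -H 'Content-Type: application/json' \
      'http://localhost:8983/solr/mycollection/update?commit=true' \
      --data-binary '[{"id":"doc1","FieldA":{"set":"new value"}}]'

Note that the Solr Reference Guide says copyField destinations must not be
stored when using atomic updates, or values can get duplicated when the
document is rebuilt -- which looks a lot like the error above, though I
can't be certain that's the cause here.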


Re: complete cluster shutdown

2016-04-21 Thread John Bickerstaff
I guess errors like "fsync-ing the write ahead log in SyncThread:5 took
7268ms which will adversely effect operation latency."

and: "likely client has closed socket"

make me wonder if something went wrong in terms of running out of disk
space for logs (thus giving your OS no space for necessary functions)  or
if you ran into memory issues, or if something changed your network /
firewall settings to prevent communication on ports that used to work...?

I'm not an expert on the code, but those kinds of external problems are
where I'd start looking if I saw errors like this.

Were all the VM's up and running or were they down too?
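
One common mitigation for slow-fsync warnings like that (assuming disk
contention is the culprit) is to give the ZooKeeper transaction log its
own disk via zoo.cfg -- a sketch, with paths invented for illustration:

    # zoo.cfg
    dataDir=/var/zookeeper/data
    # put the write-ahead log on a separate, fast disk:
    dataLogDir=/var/zookeeper/datalog

That keeps snapshot and log I/O from competing with everything else on
the box.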

On Wed, Apr 20, 2016 at 10:06 PM, Zap Org  wrote:

> I have 5 ZooKeeper and 2 Solr machines, and after a month or two the
> whole cluster shuts down and I don't know why. The ZooKeeper logs I get
> are attached below; otherwise I don't get any errors. All of this runs
> on Linux VMs.
>
> 2016-03-11 16:50:18,159 [myid:5] - WARN  [SyncThread:5:FileTxnLog@334] -
> fsync-ing the write ahead log in SyncThread:5 took 7268ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2016-03-11 16:50:18,161 [myid:5] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2185:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x4535f00ee370001, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
>
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2016-03-11 16:50:18,163 [myid:5] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2185:NIOServerCnxn@1007] - Closed socket connection for
> client /localhost which had sessionid 0x4535f00ee370001
> 2016-03-11 16:50:18,166 [myid:5] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2185:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x2535ef744dd0005, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
>
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
>


Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-19 Thread John Bickerstaff
When combining a load balancer with SolrCloud, the handler definitions
in solrconfig.xml should set preferLocalShards to true (which Tom
mentioned)

Thanks Shawn!  I was wondering where to set this...

Yup - my IT guy is sharp, sharp, sharp -- nice to get this confirmation
from the list...
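
For reference, a sketch of where that setting lives -- the handler name
and the rest of the defaults are assumptions, not from Shawn's mail:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <bool name="preferLocalShards">true</bool>
      </lst>
    </requestHandler>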

On Tue, Apr 19, 2016 at 7:59 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/18/2016 11:22 AM, John Bickerstaff wrote:
> > So - my IT guy makes the case that we don't really need Zookeeper / Solr
> > Cloud...
> 
> > I'm biased in terms of using the most recent functionality, but I'm aware
> > that bias is not necessarily based on facts and want to do my due
> > diligence...
> >
> > Aside from the obvious benefits of spreading work across nodes (which may
> > not be a big deal in our application and which my IT guy proposes is more
> > transparently handled with a load balancer he understands) are there any
> > other considerations that would drive a choice for Solr Cloud (zookeeper
> > etc)?
>
> Erick has a point.  If your IT guy feels comfortable with a load
> balancer, he should go ahead and set that up.
>
> For a new install like you're describing, I would probably still use
> SolrCloud on the back end, even with a load balancer.
>
> As Daniel said, a non-cloud replicated setup requires configuration of
> masters and slaves.  Instead of replication, you could go with a build
> system that sends updates to each copy of the index independently.
>
> When using replication, switching master/slave roles in the event of a
> master server failure is not trivial.  SolrCloud handles all that,
> making multi-server management a LOT easier.  Initial setup is slightly
> more complicated due to zookeeper, and configuration management requires
> an "upload to zookeeper" step ... but I do not think these are not high
> hurdles considering how much easier it is to manage multiple servers.
>
> With the deployment you have described (which I trimmed out of this
> reply), I think you'd be fine running a standalone zookeeper process on
> three of your Solr servers, so you won't even need a bunch of extra
> hardware.
>
> When combining a load balancer with SolrCloud, the handler definitions
> in solrconfig.xml should set preferLocalShards to true (which Tom
> mentioned) so the load balancer target is the machine that actually
> processes the request.  Troubleshooting becomes more difficult if you
> don't do this, and avoiding the extra network hop will help performance.
>
> Thanks,
> Shawn
>
>


Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-19 Thread John Bickerstaff
@Charlie

It's easy to do and wow does it save time and database resources...

I've built a Spring Boot Micro-services architecture that also registers in
Zookeeper.  One micro-service pulls from the original data source and
pushes to Kafka.  The second micro-service pulls from Kafka into SOLR.

Because they're registered in Zookeeper, the micro-services can be brought
up anywhere in the infrastructure I'm building and "rebuild" SOLR indices
from scratch.

I.e., if you lose SOLR completely, just bring up a new VM copy with an empty
index, start your microservice, and rebuild the index from scratch

We're dropping it all into AWS eventually.

It's sweet.  The original "run" to consolidate the data from various
databases takes over an hour -- IF the load on production is light. Running
out of Kafka takes less than 10 minutes and totally avoids loading
production databases.

If you're interested, ping me -- I'm happy to share what I've got...
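
To make the shape of it concrete, here's a stripped-down sketch of the
second micro-service -- every name (brokers, topic, collection, fields)
is invented for illustration, and error handling is omitted:

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class KafkaToSolrLoader {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka1:9092");  // assumed broker
            props.put("group.id", "solr-rebuild");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 CloudSolrClient solr = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
                consumer.subscribe(Collections.singletonList("solr-docs")); // assumed topic
                solr.setDefaultCollection("documents");                      // assumed collection

                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> rec : records) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", rec.key());
                        doc.addField("body_txt", rec.value()); // real mapping depends on message format
                        solr.add(doc);
                    }
                    if (records.count() > 0) {
                        solr.commit(); // or lean on autoCommit in solrconfig.xml
                    }
                }
            }
        }
    }

A production version would want batching, offset management, and retries,
but the core loop is about this small.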

On Tue, Apr 19, 2016 at 2:08 AM, Charlie Hull <char...@flax.co.uk> wrote:

> On 18/04/2016 18:22, John Bickerstaff wrote:
>
>> So - my IT guy makes the case that we don't really need Zookeeper / Solr
>> Cloud...
>>
>> He may be right - we're serving static data (changes to the collection
>> occur only 2 or 3 times a year and are minor)
>>
>> We probably could have 3 or 4 Solr nodes running in non-Cloud mode -- each
>> configured the same way, behind a load balancer and do fine.
>>
>> I've got a Kafka server set up with the solr docs as topics.  It takes
>> about 10 minutes to reload a "blank" Solr Server from the Kafka topic...
>> If I target 3-4 SOLR servers from my microservice instead of one, it
>> wouldn't take much longer than 10 minutes to concurrently reload all 3 or
>> 4
>> Solr servers from scratch...
>>
>
> This is something we've been discussing as a concept - to offload all the
> scaling stuff to Kafka (which is very good at that sort of thing) and
> simply hang Solr instances onto a Kafka topic. We've not taken it any
> further than a concept at this point but interesting to hear about others
> doing so!
>
> Charlie
>
>
>
>> I'm biased in terms of using the most recent functionality, but I'm aware
>> that bias is not necessarily based on facts and want to do my due
>> diligence...
>>
>> Aside from the obvious benefits of spreading work across nodes (which may
>> not be a big deal in our application and which my IT guy proposes is more
>> transparently handled with a load balancer he understands) are there any
>> other considerations that would drive a choice for Solr Cloud (zookeeper
>> etc)?
>>
>>
>>
>> On Mon, Apr 18, 2016 at 9:26 AM, Tom Evans <tevans...@googlemail.com>
>> wrote:
>>
>> On Mon, Apr 18, 2016 at 3:52 PM, John Bickerstaff
>>> <j...@johnbickerstaff.com> wrote:
>>>
>>>> Thanks all - very helpful.
>>>>
>>>> @Shawn - your reply implies that even if I'm hitting the URL for a
>>>> single
>>>> endpoint via HTTP - the "balancing" will still occur across the Solr
>>>>
>>> Cloud
>>>
>>>> (I understand the caveat about that single endpoint being a potential
>>>>
>>> point
>>>
>>>> of failure).  I just want to verify that I'm interpreting your response
>>>> correctly...
>>>>
>>>> (I have been asked to provide IT with a comprehensive list of options
>>>>
>>> prior
>>>
>>>> to a design discussion - which is why I'm trying to get clear about the
>>>> various options)
>>>>
>>>> In a nutshell, I think I understand the following:
>>>>
>>>> a. Even if hitting a single URL, the Solr Cloud will "balance" across
>>>> all
>>>> available nodes for searching
>>>>Caveat: That single URL represents a potential single point
>>>> of
>>>> failure and this should be taken into account
>>>>
>>>> b. SolrJ's CloudSolrClient API provides the ability to distribute load
>>>> --
>>>> based on Zookeeper's "knowledge" of all available Solr instances.
>>>>Note: This is more robust than "a" due to the fact that it
>>>> eliminates the "single point of failure"
>>>>
>>>> c.  Use of a load balancer hitting all known Solr instances will be fine
>>>>
>>> -
>>>
>>>> although the search requests may not run on the Solr instance the load
>>>> balancer targeted - due to "a" above.

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread John Bickerstaff
Thanks, Erick, for the confirmation.
On Apr 18, 2016 5:48 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

> In short, I'm afraid I have to agree with your IT guy.
>
> I like SolrCloud, it's way cool. But in your situation I really
> can't say it's compelling.
>
> The places SolrCloud shines: automatically routing docs to shards...
> You're not sharding.
>
> Automatically electing a new leader (analogous to master) ... You
> don't care since the pain of reindexing is so little.
>
> Not losing data when a leader/master goes down during indexing... You
> don't care since you can reindex quickly and you're indexing so
> rarely.
>
> In fact, I'd also optimize the index, something I rarely recommend.
>
> Even the argument that you get to use all your nodes for searching
> doesn't really pertain since you can index on a node, then just copy
> the index to all your nodes, you could get by without even configuring
> master/slave. Or just, as you say, index to all your Solr nodes
> simultaneously.
>
> About the only downside is that you've got to create your Solr nodes
> independently, making sure the proper configurations are on each one
> etc, but even if those changed 2-3 times a year it's hardly onerous.
>
> You _are_ getting all the latest and greatest indexing and search
> improvements, all the SolrCloud stuff is built on top of exactly the
> Solr you'd get without using SolrCloud.
>
> And finally, there is certainly a learning curve to SolrCloud,
> particularly in this case the care and feeding of Zookeeper.
>
> The instant you need to have shards, the argument changes quite
> dramatically. The argument changes some under significant indexing
> loads. The argument totally changes if you need low latency. It
> doesn't sound like your situation is sensitive to any of these
> though
>
> Best,
> Erick
>
> On Apr 18, 2016 10:41 AM, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
> >
> > Nice - thanks Daniel.
> >
> > On Mon, Apr 18, 2016 at 11:38 AM, Davis, Daniel (NIH/NLM) [C] <
> > daniel.da...@nih.gov> wrote:
> >
> > > One thing I like about SolrCloud is that I don't have to configure
> > > Master/Slave replication in each "core" the same way to get them to
> > > replicate.
> > >
> > > The other thing I like about SolrCloud, which is largely theoretical at
> > > this point, is that I don't need to test changes to a collection's
> > > configuration by bringing up a whole new solr on a whole new server -
> > > SolrCloud already virtualizes this, and so I can make up a random
> > > collection name that doesn't conflict, and create the thing, and smoke
> test
> > > with it.   I know that standard practice is to bring up all new nodes,
> but
> > > I don't see why this is needed.
> > >
> > > -Original Message-
> > > From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
> > > Sent: Monday, April 18, 2016 1:23 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Verifying - SOLR Cloud replaces load balancer?
> > >
> > > So - my IT guy makes the case that we don't really need Zookeeper /
> Solr
> > > Cloud...
> > >
> > > He may be right - we're serving static data (changes to the collection
> > > occur only 2 or 3 times a year and are minor)
> > >
> > > We probably could have 3 or 4 Solr nodes running in non-Cloud mode --
> each
> > > configured the same way, behind a load balancer and do fine.
> > >
> > > I've got a Kafka server set up with the solr docs as topics.  It takes
> > > about 10 minutes to reload a "blank" Solr Server from the Kafka
> topic...
> > > If I target 3-4 SOLR servers from my microservice instead of one, it
> > > wouldn't take much longer than 10 minutes to concurrently reload all 3
> or 4
> > > Solr servers from scratch...
> > >
> > > I'm biased in terms of using the most recent functionality, but I'm
> aware
> > > that bias is not necessarily based on facts and want to do my due
> > > diligence...
> > >
> > > Aside from the obvious benefits of spreading work across nodes (which
> may
> > > not be a big deal in our application and which my IT guy proposes is
> more
> > > transparently handled with a load balancer he understands) are there
> any
> > > other considerations that would drive a choice for Solr Cloud
> (zookeeper
> > > etc)?
> > >
> > >
> > >
On Mon, Apr 18, 2016 at 9:26 AM, Tom Evans <tevans...@googlemail.com> wrote:

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread John Bickerstaff
Nice - thanks Daniel.

On Mon, Apr 18, 2016 at 11:38 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> One thing I like about SolrCloud is that I don't have to configure
> Master/Slave replication in each "core" the same way to get them to
> replicate.
>
> The other thing I like about SolrCloud, which is largely theoretical at
> this point, is that I don't need to test changes to a collection's
> configuration by bringing up a whole new solr on a whole new server -
> SolrCloud already virtualizes this, and so I can make up a random
> collection name that doesn't conflict, and create the thing, and smoke test
> with it.   I know that standard practice is to bring up all new nodes, but
> I don't see why this is needed.
>
> -Original Message-
> From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
> Sent: Monday, April 18, 2016 1:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Verifying - SOLR Cloud replaces load balancer?
>
> So - my IT guy makes the case that we don't really need Zookeeper / Solr
> Cloud...
>
> He may be right - we're serving static data (changes to the collection
> occur only 2 or 3 times a year and are minor)
>
> We probably could have 3 or 4 Solr nodes running in non-Cloud mode -- each
> configured the same way, behind a load balancer and do fine.
>
> I've got a Kafka server set up with the solr docs as topics.  It takes
> about 10 minutes to reload a "blank" Solr Server from the Kafka topic...
> If I target 3-4 SOLR servers from my microservice instead of one, it
> wouldn't take much longer than 10 minutes to concurrently reload all 3 or 4
> Solr servers from scratch...
>
> I'm biased in terms of using the most recent functionality, but I'm aware
> that bias is not necessarily based on facts and want to do my due
> diligence...
>
> Aside from the obvious benefits of spreading work across nodes (which may
> not be a big deal in our application and which my IT guy proposes is more
> transparently handled with a load balancer he understands) are there any
> other considerations that would drive a choice for Solr Cloud (zookeeper
> etc)?
>
>
>
> On Mon, Apr 18, 2016 at 9:26 AM, Tom Evans <tevans...@googlemail.com>
> wrote:
>
> > On Mon, Apr 18, 2016 at 3:52 PM, John Bickerstaff
> > <j...@johnbickerstaff.com> wrote:
> > > Thanks all - very helpful.
> > >
> > > @Shawn - your reply implies that even if I'm hitting the URL for a
> > > single endpoint via HTTP - the "balancing" will still occur across
> > > the Solr
> > Cloud
> > > (I understand the caveat about that single endpoint being a
> > > potential
> > point
> > > of failure).  I just want to verify that I'm interpreting your
> > > response correctly...
> > >
> > > (I have been asked to provide IT with a comprehensive list of
> > > options
> > prior
> > > to a design discussion - which is why I'm trying to get clear about
> > > the various options)
> > >
> > > In a nutshell, I think I understand the following:
> > >
> > > a. Even if hitting a single URL, the Solr Cloud will "balance"
> > > across all available nodes for searching
> > >   Caveat: That single URL represents a potential single
> > > point of failure and this should be taken into account
> > >
> > > b. SolrJ's CloudSolrClient API provides the ability to distribute
> > > load -- based on Zookeeper's "knowledge" of all available Solr
> instances.
> > >   Note: This is more robust than "a" due to the fact that it
> > > eliminates the "single point of failure"
> > >
> > > c.  Use of a load balancer hitting all known Solr instances will be
> > > fine
> > -
> > > although the search requests may not run on the Solr instance the
> > > load balancer targeted - due to "a" above.
> > >
> > > Corrections or refinements welcomed...
> >
> > With option a), although queries will be distributed across the
> > cluster, all queries will be going through that single node. Not only
> > is that a single point of failure, but you risk saturating the
> > inter-node network traffic, possibly resulting in lower QPS and higher
> > latency on your queries.
> >
> > With option b), as well as SolrJ, recent versions of pysolr have a
> > ZK-aware SolrCloud client that behaves in a similar way.
> >
> > With option c), you can use the preferLocalShards so that shards that
> > are local to the queried node are used in preference to distributed
> > shards. Depending on your shard/cluster topology, this can increase
> > performance if you are returning large amounts of data - many or large
> > fields or many documents.
> >
> > Cheers
> >
> > Tom
> >
>
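
To round out option (b) above, a minimal SolrJ sketch of the ZK-aware
client -- the ZooKeeper addresses and collection name are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ZkAwareQuery {
        public static void main(String[] args) throws Exception {
            // CloudSolrClient reads cluster state from ZooKeeper and spreads
            // requests across whichever Solr nodes are currently live.
            try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
                client.setDefaultCollection("documents"); // assumed collection name
                QueryResponse rsp = client.query(new SolrQuery("*:*"));
                System.out.println("hits: " + rsp.getResults().getNumFound());
            }
        }
    }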

