How to escape OR or any other keyword in Solr

2018-03-26 Thread RAUNAK AGRAWAL
I have to search for the state "OR" [short form for Oregon]. When I make
the query state:OR, I get a SolrException since the parser recognises it
as a keyword.

I have also tried quotes ("") and //OR; when doing so, Solr doesn't throw
an exception, but it also doesn't return any matching documents.

Kindly let me know the workaround for this issue.

Thanks
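
One common workaround (a hedged sketch, not something confirmed in this thread) is to bypass the standard query parser for that one clause: Solr's {!term} query parser takes its value verbatim, so reserved words like OR lose their operator meaning. The helper class and method names below are illustrative, not SolrJ API:

```java
// Sketch: build query strings that match the literal token "OR" without
// the standard parser treating it as a boolean operator.
// The {!term} parser is stock Solr; this helper class is illustrative only.
public class TermQueryEscaping {

    /** {!term} takes the value verbatim -- no query-syntax parsing at all. */
    static String termQuery(String field, String value) {
        return "{!term f=" + field + "}" + value;
    }

    /** Alternative: phrase-quote the value so OR/AND lose operator meaning. */
    static String quoted(String field, String value) {
        return field + ":\"" + value.replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(termQuery("state", "OR")); // {!term f=state}OR
        System.out.println(quoted("state", "OR"));    // state:"OR"
    }
}
```

Note that {!term} matches the raw indexed token, so it behaves as expected on a string field; if the quoted form returns no documents, the field's index-time analysis (e.g. lowercasing) is probably altering the token, and the indexed value should be checked first.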


Re: edit gc parameters in solr.in.sh or solr?

2018-03-26 Thread Bernd Fehling
Hi Walter,

May I give you the advice to _NOT_ set -XX:G1HeapRegionSize.
It is computed at JVM startup based on the heap size and available memory.
A wrongly chosen size can force even a huge machine with a 31GB heap and
157GB of RAM into OOM.
Guess how I figured that out; it took me about a week to locate.

Regards
Bernd

Am 26.03.2018 um 17:08 schrieb Walter Underwood:
> We use the G1 collector in Java 8u131 and it works well. We are running 
> 6.6.2. Our Solr instances do a LOT of allocation. We have long queries (25 
> terms average) and many unique queries.
> 
> SOLR_HEAP=8g
> # Use G1 GC  -- wunder 2017-01-23
> # Settings from https://wiki.apache.org/solr/ShawnHeisey
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Mar 26, 2018, at 1:22 AM, Derek Poh  wrote:
>>
>> Hi
>>
>> From your experience, would like to know if it is advisable to change the GC
>> parameters in solr.in.sh or in the solr file?
>> The documentation says to edit solr.in.sh, but I would like to know which
>> file you actually edit.
>>
>> I am using Solr 6.6.2 at the moment.
>>
>> Regards,
>> Derek
>>
>>


Re: Score different for different documents containing same value

2018-03-26 Thread Erick Erickson
add debug=true to the query and you'll see exactly how the scores are
calculated; that should give you a clue as to what's going on.

In particular, look at the parsed query and be sure that your query is
parsed as you expect. It should be, given how you specify the query, but
check as a sanity measure.

Is your setup sharded? If so, fire the query at each replica (add
&distrib=false) and see what the scores are.

If this is a very small corpus, a few deleted documents can skew the scores.

Try turning on distributed IDF (assuming your collection is sharded).
The stats on different shards can be different on a small corpus; it's
only when you get into significant numbers of docs that the stats even
out.

Oh, and a side note: to make the return order deterministic, I'd add a
secondary sort on id. It's not your problem at this point, but when all
the sort criteria are equal, the _internal_ Lucene doc ID is used to
break ties, and that can change after segments are merged. For future
reference.
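
The suggestions above can be combined into a single diagnostic request. A minimal sketch that only assembles the parameter string (debug, distrib, and sort are standard Solr parameters; the query value and helper class are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: assemble the diagnostic query suggested above -- score
// explanations, single-replica execution, and a deterministic
// secondary sort on id. The class and query value are illustrative.
public class DebugQueryParams {

    /** Join key=value pairs with '&', preserving insertion order. */
    static String buildQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "query_t:iphone");
        p.put("debug", "true");             // per-document score explanations
        p.put("distrib", "false");          // query only the replica you hit
        p.put("sort", "score+desc,id+asc"); // deterministic tie-break on id
        System.out.println("/select?" + buildQueryString(p));
    }
}
```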

Best,
Erick



On Mon, Mar 26, 2018 at 11:39 AM, bbarani  wrote:
> Hi,
>
> I was trying to query a field that has a specific term in it, and to my
> surprise the score was different for different documents even though the
> field I am searching on contained the exact same terms in all the
> documents.
>
> Any idea when this issue would come up?
>
> *Note:* All the documents contained the value 'iphone brown case' in query_t
> field and I am on SOLR 6.1
>
> *Query:*
> select?q=iphone+brown+case&omitHeader=false&fl=score,query_t,timestamp_tdt&sort=score%20desc&wt=xml&qf=query_t&defType=edismax&mm=100%25&rows=5
>
> 
> 
> true
> 0
> 9
> 
> 100%
> iphone brown case
> edismax
> false
> query_t
> score,query_t,timestamp_tdt
> getSuggestions
> score desc
> 5
> xml
> 1521045725381
> 
> 
> 
> 
> 
> iphone brown case
> 
> 2018-03-26T13:40:14.690Z
> *6.306856*
> 
> 
> 
> iphone brown case
> 
> 2018-03-26T13:40:14.690Z
> *4.8550515*
> 
> 
> 
> iphone brown case
> 
> 2018-03-26T13:40:14.690Z
> *4.8550515*
> 
> 
> 
> iphone brown case
> 
> 2018-03-26T13:40:14.690Z
> *4.8550515*
> 
> 
> 
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: edit gc parameters in solr.in.sh or solr?

2018-03-26 Thread Shawn Heisey

On 3/26/2018 6:41 PM, Derek Poh wrote:
On my installation, "solr.in.sh" is in the solr-6.6.2/bin directory. Is it
recommended to place the file in /etc/default?


Regarding the "solrfile", I was referring to the file "solr". Sorry 
for the typo.

Is the file "solr" normally not edited?


I've redirected my reply to this private message back to the list.

http://people.apache.org/~hossman/#private_q

If the active solr.in.sh file is not in /etc/default, that means that 
the service installer script was NOT used.  I strongly recommend using 
the service installer script on systems that will support it.


https://lucene.apache.org/solr/guide/6_6/taking-solr-to-production.html

Moving the location of that file manually to /etc/default might not 
actually work.  I have not reviewed the script recently enough to know 
for sure.  It *might* work.  My memory is fuzzy, but checking that 
location might be part of the bin/solr script already.  Whether that 
memory is correct or not, I still recommend running the service 
installer script.


The bin/solr script should not be edited unless you're fixing a bug in 
that script.  A lot of Solr's startup settings can be changed with 
solr.in.sh.  More settings are being added over time as Solr evolves.


Thanks,
Shawn



Re: Solr 4.9 - configs and collections

2018-03-26 Thread Shawn Heisey
On 3/26/2018 8:43 AM, Abhi Basu wrote:
> Running on MS HDInsight and Solr 4.9. What is the BKM for creation, update,
> and deletion of configurations and collections?

I have no idea what a BKM is.  I will cover the update of configuration
below.

> I do the following:
>
> 1. First I create the zk config:
> sudo zkcli.sh -cmd upconfig -zkhost zknode
> :2181
> -confdir /home/sshuser/ems-collection-49/conf/ -confname ems-collection

Exactly what you've got configured there for the zkhost parameter is
difficult to decipher because it looks like the hostname got replaced
with a URL by your mail client.  But I think you've only got one ZK
server there.  Usually there are at least three of them.  The command
actually only needs one, but the zkHost string usually has at least
three.  It's generally a good idea to use the same string for zkcli that
you use for Solr itself, so it works even when a server is down.

> 2. Then I create the collection:
> curl '
> http://headnode0:8983/solr/admin/collections?action=CREATE&name=ems-collection&numShards=2&replicationFactor=2&maxShardsPerNode=1
> '
>
> This works the first time. When I change the zk config, do I run the same
> command #1? Also, do I do a reload:

Yes, if you want to change an existing config and then make it active,
you re-upload the config and then reload any affected collection. 
Deleting and recreating the collection is not something you would want
to do unless you plan to completely rebuild it anyway -- deleting the
collection will also delete all the index data.  If that's what you
WANT, then deleting and recreating the collection is a good way to make
it happen.  Many config updates *do* require a reindex, and some changes
will also require completely deleting the index directories before
building it again.
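
The re-upload-then-reload workflow ends with a single Collections API call. A minimal sketch of building that request, reusing the host from the CREATE example above (RELOAD is a stock Collections API action; the helper class is illustrative):

```java
// Sketch: build the Collections API RELOAD request that activates a
// re-uploaded config. The host name mirrors the CREATE example above
// and is a placeholder; the helper class is illustrative.
public class ReloadUrl {

    static String reloadUrl(String hostPort, String collection) {
        return "http://" + hostPort
                + "/solr/admin/collections?action=RELOAD&name=" + collection;
    }

    public static void main(String[] args) {
        System.out.println(reloadUrl("headnode0:8983", "ems-collection"));
    }
}
```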

> Very familiar with CDH solrctl commands that make life easier by only
> having one command for this. Any help is appreciated.

If you're using CDH, you'll want to talk to Cloudera for help.  They
customize their Solr install to the point where they're the only ones
who know how to use it properly.

Thanks,
Shawn



Re: Default Index config

2018-03-26 Thread Shawn Heisey
On 3/26/2018 10:45 AM, mganeshs wrote:
> I haven't changed the solr config wrt index config, which means it's all
> commented in the solrconfig.xml.
>
> It's something like what I pasted before. But I would like to know what the
> default value of each of these is.

Default value of *what* exactly?  There are a LOT of possible settings
you could be asking about; it would take too much time to research them all
and give you every possible default.

I can't tell what you're referring to when you mention "what I pasted
before."  Later in your message I do see a mostly empty <indexConfig> tag.

If you're after defaults for indexConfig, the comments in Solr's example
config do a pretty good job of covering defaults.  I think that each
commented config section *has* the default setting already in it.  In a
download of the latest version, look for
server/solr/configsets/_default/conf/solrconfig.xml.

> Because, after upgrading to 6.5.1, our document size has also crossed 5GB in
> each of our collections, and updating documents is now taking time. So I
> would like to know whether we need to change any default configurations.

5GB for a *single* document?  That sounds like a document that probably
shouldn't get indexed, because it might cause all sorts of problems.

If 5GB is the size of the entire index, then I can say that while this
is not exactly a *small* index, there are a LOT of indexes in the wild
that are MUCH bigger.  I have such indexes in my Solr installs, and
while the performance of those indexes isn't lightning fast, it's pretty
good.

As your index size increases, Solr is going to slow down.  That's just
how things work and cannot be changed.  If the slowdown is extreme for a
small increase in size, that usually means that the system doesn't have
enough of some resource, usually memory.  Additional hardware or
hardware upgrades may be required.

If you describe what you are seeing, and how that differs from what you
EXPECT to be seeing, then we might be able to get somewhere.  More
information may be required to help.  If that's the case, we will ask
for additional information.

Thanks,
Shawn



Re: Solr 7 or 6 - stability and performance

2018-03-26 Thread Greg Roodt
We've been running 7.2.1 at work for a while now and it's been running very
well. We saw some performance improvements, but I would attribute most of
that to the newer instance types we used in the upgrade. Didn't see any
major performance regressions for our workload.

A couple of things to think about:
* You might want to wait for 7.3, since it is at RC now and it fixes an
annoying bug where reloading configuration doesn't propagate to PULL
replicas. This may or may not matter to you if you stay with NRT
replicas.
* If you are concerned with performance, try running some prod queries
against a test cluster. I think you'll be fine and probably should think
about any relevance changes that 7 introduces. There are things like "split
on whitespace" that are behavioural changes and may or may not matter to
you. For what it's worth, we kept the defaults, upgraded our analysis
chains to non-deprecated versions, and didn't have any problems.

On Tue, 27 Mar 2018 at 03:17, Walter Underwood 
wrote:

> If you are running 6.4.1, you will see a big speedup when going to a later
> version. The metrics code caused a serious performance problem.
>
> https://issues.apache.org/jira/browse/SOLR-10130
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Mar 25, 2018, at 8:55 PM, S G  wrote:
> >
> > Hi,
> >
> > Solr 7 has been out for about 6 months now. (Sep-2017 to Mar-2018)
> > We are planning some major upgrades from 6.2 and 6.4 versions of Solr
> and I
> > wanted to see how is Solr 7 looking in terms of stability and
> performance.
> > (Have seen http://lucene.apache.org/solr/news.html but some real users'
> > experience would be nice)
> >
> > 1) Has anyone encountered major stability issues that made them move back
> > to 6.x version?
> >
> > 2) Did anyone see more than 10% change in performance (good or bad)? I
> know
> > about https://issues.apache.org/jira/browse/SOLR-11078 and wish trie
> fields
> > were still kept in schema until point fields completely got over the
> > performance issue.
> >
> > Thanks
> > SG
>
>


Re: Why are cursor mark queries recommended over regular start, rows combination?

2018-03-26 Thread Webster Homer
Shawn,
Thanks. It's been a while now, but we did find issues with both cursorMark
AND start/rows. the effect was much more obvious with cursorMark.
We were able to address this by switching to use TLOG replicas. These give
consistent results. It's nice to know that the cursorMark problems were
related to relevancy retrieval order.

We found one major drawback with TLOG replicas, and that was that CDCR was
broken for TLOG replicas. There is a Jira on this, and it is being
addressed. NRT may have a use case, but I think that reproducible correct
results should trump performance every time. We use Solr as a search engine,
we almost always want to retrieve results in order of relevancy.

I think that we will phase out the use of NRT replicas in favor of TLOG
replicas.
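
Phasing out NRT replicas as described above is done at collection-creation time. A hedged sketch of building such a CREATE request (the nrtReplicas and tlogReplicas parameters exist from Solr 7.0 onward; the host, collection name, and helper class are placeholders):

```java
// Sketch: a Collections API CREATE call that provisions TLOG replicas
// instead of NRT ones. The parameter names are real (Solr 7.0+);
// everything else here is a placeholder.
public class TlogCreateUrl {

    static String createUrl(String hostPort, String name,
                            int shards, int tlogReplicas) {
        return "http://" + hostPort + "/solr/admin/collections?action=CREATE"
                + "&name=" + name
                + "&numShards=" + shards
                + "&nrtReplicas=0"          // no NRT replicas at all
                + "&tlogReplicas=" + tlogReplicas;
    }

    public static void main(String[] args) {
        System.out.println(createUrl("localhost:8983", "products", 2, 2));
    }
}
```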

On Fri, Mar 23, 2018 at 7:04 PM, Shawn Heisey  wrote:

> On 3/23/2018 3:47 PM, Webster Homer wrote:
> > Just FYI I had a project recently where I tried to use cursorMark in
> > Solrcloud and solr 7.2.0 and it was very unreliable. It couldn't even
> > return consistent numberFound values. I posted about it in this forum.
> > Using the start and rows arguments in SolrQuery did work reliably so I
> > abandoned cursorMark as just too buggy
> >
> > I had originally wanted to try using streaming expressions, but they
> don't
> > return results ordered by relevancy, a major limitation for a search
> > engine, in my opinion.
>
> The problems that can affect cursorMark are also problems when using
> start/rows pagination.
>
> You've mentioned relevancy ordering, so I think this is what you're
> running into:
>
> Trying to use relevancy ranking on SolrCloud with NRT replicas can break
> pagination.  The problem happens both with cursorMark and start/rows.
> NRT replicas in a SolrCloud index can have different numbers of deleted
> documents.  Even though deleted documents do not appear in search
> results, they ARE still part of the index, and can affect scoring.
> Since SolrCloud load balances requests across replicas, page 1 may use
> different replicas than page 2, and end up with different scoring, which
> can affect the order of results and change which page number they end up
> on.  Using TLOG or PULL replicas (available since 7.0) usually fixes
> that problem, because different replicas are 100% identical with those
> replica types.
>
> Changing the index in the middle of trying to page through results can
> also cause issues with pagination.
>
> Thanks,
> Shawn
>
>

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


Score different for different documents containing same value

2018-03-26 Thread bbarani
Hi,

I was trying to query a field that has a specific term in it, and to my
surprise the score was different for different documents even though the
field I am searching on contained the exact same terms in all the
documents.

Any idea when this issue would come up?

*Note:* All the documents contained the value 'iphone brown case' in query_t
field and I am on SOLR 6.1

*Query:*
select?q=iphone+brown+case&omitHeader=false&fl=score,query_t,timestamp_tdt&sort=score%20desc&wt=xml&qf=query_t&defType=edismax&mm=100%25&rows=5



true
0
9

100%
iphone brown case
edismax
false
query_t
score,query_t,timestamp_tdt
getSuggestions
score desc
5
xml
1521045725381





iphone brown case

2018-03-26T13:40:14.690Z
*6.306856*



iphone brown case

2018-03-26T13:40:14.690Z
*4.8550515*



iphone brown case

2018-03-26T13:40:14.690Z
*4.8550515*



iphone brown case

2018-03-26T13:40:14.690Z
*4.8550515*






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr on HDInsight to write to Active Data Lake

2018-03-26 Thread Abhi Basu
Yes, I copied the jars to all nodes and restarted the Solr service.




org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
CREATEing SolrCore 'ems-collection-700_shard1_replica2': Unable to create
core: ems-collection-700_shard1_replica2 Caused by: Class
org.apache.hadoop.fs.adl.HdiAdlFileSystem not found

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
CREATEing SolrCore 'ems-collection-700_shard2_replica1': Unable to create
core: ems-collection-700_shard2_replica1 Caused by:
org.apache.hadoop.fs.FileSystem: Provider
org.apache.hadoop.fs.azure.NativeAzureFileSystem not a subtype

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
CREATEing SolrCore 'ems-collection-700_shard2_replica2': Unable to create
core: ems-collection-700_shard2_replica2 Caused by: Class
org.apache.hadoop.fs.adl.HdiAdlFileSystem not found

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
CREATEing SolrCore 'ems-collection-700_shard1_replica1': Unable to create
core: ems-collection-700_shard1_replica1 Caused by:
org.apache.hadoop.fs.FileSystem: Provider
org.apache.hadoop.fs.azure.NativeAzureFileSystem not a subtype


Here is an excerpt from the logs:

ERROR - 2018-03-26 18:09:45.033; org.apache.solr.core.CoreContainer; Unable
to create core: ems-collection-700_shard2_replica1
org.apache.solr.common.SolrException: org.apache.hadoop.fs.FileSystem:
Provider org.apache.hadoop.fs.azure.NativeAzureFileSystem not a subtype
at org.apache.solr.core.SolrCore.(SolrCore.java:868)
at org.apache.solr.core.SolrCore.(SolrCore.java:643)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:556)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:569)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:198)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:187)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.ServiceConfigurationError:
org.apache.hadoop.fs.FileSystem: Provider
org.apache.hadoop.fs.azure.NativeAzureFileSystem not a subtype





On Mon, Mar 26, 2018 at 11:28 AM, Erick Erickson 
wrote:

> Several things:
>
> 1> I often start with an absolute path, knowing the exact relative
> path from where Solr starts can be confusing. If you've pathed
> properly and the jar file is in the path, it'll be found.
>
> 2> Are you sure HdiAdlFileSystem is in one of the jars?
>
> 3> did you restart the JVM?
>
> Best,
> Erick
>
> On Mon, Mar 26, 2018 at 6:49 AM, Abhi Bas

Re: solrj question

2018-03-26 Thread Shawn Heisey
On 3/26/2018 11:19 AM, Webster Homer wrote:
> You may say that the String in the constructor is "meant to be query
> syntax", but nothing in the Javadoc says anything about the expected syntax.
> Since there is also a method to set the query, it seemed reasonable to
> expect that it would take the output of the toString method. (or some other
> serialization method)

You're right that the javadoc is not very specific.  It says this:

Parameters:
    q - query string

In general in Solr, "query string" is understood to be something you
would put in the "q" parameter when you send a query.  Or maybe the "fq"
parameter.  The javadoc could definitely be improved.

The javadoc for the toString specifically used here is a little more
specific.  (SolrQuery inherits from SolrParams, and that's where the
toString method is defined):

https://lucene.apache.org/solr/6_6_0/solr-solrj/org/apache/solr/common/params/SolrParams.html#toString--

It says "so that the URL may be unambiguously pasted back into a browser."

> So how would a user play back logged queries? This seems like an important
> use case. I can parse the toString output, but it seems like the constructor
> should be able to take it.
> Failing that constructor/toString pair, I don't see any methods to
> serialize and deserialize the query.
> Being able to write the complete query to a log is important, but we also
> want to be able to read the log and submit the query to Solr. Being able to
> play back the logs allows us to troubleshoot search issues on our site. It
> also provides a way to create load tests.
>
> Yes I can and am going to create this functionality, it's not that
> complicated, but I don't think it's unreasonable to think that the existing
> API should handle it.

Yes, that would be a great capability to have.  But it hasn't been written
yet.  A method like "parseUrlString" on SolrQuery would be a good thing
to have.
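
A minimal sketch of that missing "parseUrlString" capability: turning a logged string such as "/select?defType=edismax&start=0&rows=25" back into a parameter map that could then be fed to SolrQuery.set(...). This is illustrative stdlib-only code, not existing SolrJ API:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: parse a logged query URL back into a parameter map.
// Illustrative code only -- not part of SolrJ.
public class LoggedQueryParser {

    static Map<String, String> parse(String logged) {
        Map<String, String> params = new LinkedHashMap<>();
        int qm = logged.indexOf('?');
        String query = qm >= 0 ? logged.substring(qm + 1) : logged;
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String val = eq >= 0
                    ? URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8)
                    : "";
            params.put(key, val); // note: repeated keys keep the last value
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p =
                parse("/select?defType=edismax&start=0&rows=25&q=iphone+brown+case");
        System.out.println(p);
        // {defType=edismax, start=0, rows=25, q=iphone brown case}
    }
}
```

Repeated parameters (e.g. several fq values) would need a list-valued map; this sketch deliberately keeps only the last value.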

Thanks,
Shawn



Re: solrj question

2018-03-26 Thread Webster Homer
You may say that the String in the constructor is "meant to be query
syntax", but nothing in the Javadoc says anything about the expected syntax.
Since there is also a method to set the query, it seemed reasonable to
expect that it would take the output of the toString method. (or some other
serialization method)
https://lucene.apache.org/solr/6_6_0/solr-solrj/org/apache/solr/client/solrj/SolrQuery.html#SolrQuery-java.lang.String-

So how would a user play back logged queries? This seems like an important
use case. I can parse the toString output, but it seems like the constructor
should be able to take it.
Failing that constructor/toString pair, I don't see any methods to
serialize and deserialize the query.
Being able to write the complete query to a log is important, but we also
want to be able to read the log and submit the query to Solr. Being able to
play back the logs allows us to troubleshoot search issues on our site. It
also provides a way to create load tests.

Yes I can and am going to create this functionality, it's not that
complicated, but I don't think it's unreasonable to think that the existing
API should handle it.

Thanks,


On Fri, Mar 23, 2018 at 6:44 PM, Shawn Heisey  wrote:

> On 3/23/2018 3:24 PM, Webster Homer wrote:
> > I see this in the output:
> > Lexical error at line 1, column 1759.  Encountered:  after :
> > "/select?defType=edismax&start=0&rows=25&...
> > It has basically the entire solr query which it obviously couldn't parse.
> >
> > solrQuery = new SolrQuery(log.getQuery());
>
> This isn't going to work.  The string in the constructor is expected to
> be query syntax -- so something like this:
>
> company:Google AND (city:"San Jose" OR state:WA)
>
> It has no idea what to do with a URL path and parameters.
>
> > Is there something I'm doing wrong, or is it that the SolrQuery class
> > cannot really take its toString output to make itself? Does it have a
> > different serialization method that could be used?
>
> I don't think there's any expectation that an object's toString() output
> can be used as input for anything.  This is the javadoc for
> Object.toString():
>
> https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#toString--
>
> The emphasis there is on human readability.  It is not intended for
> deserialization.  You *could* be looking at a toString() output like
> SolrQuery@1d44bcfa instead of something you can actually read.
>
> For the incomplete string shown in the error message you mentioned, you
> could do:
>
> SolrQuery q = new SolrQuery();
> q.setRequestHandler("/select");
> // The default handler is /select, so
> // the above is actually not necessary.
>
> q.set("defType", "edismax");
> q.set("start", "0");
> q.set("rows","25");
> // sugar method: q.setStart(0);
> // sugar method: q.setRows(25);
>
> Thanks,
> Shawn
>
>



Default Index config

2018-03-26 Thread mganeshs
Hi,

I haven't changed the solr config wrt index config, which means it's all
commented in the solrconfig.xml.

It's something like what I pasted before. But I would like to know what the
default value of each of these is.

Because, after upgrading to 6.5.1, our document size has also crossed 5GB in
each of our collections, and updating documents is now taking time. So I
would like to know whether we need to change any default configurations.


${solr.lock.type:native}

Advice...



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: querying vs. highlighting: complete freedom?

2018-03-26 Thread Erick Erickson
Arturas:

Thanks for the "atta boys", but I have to confess I poked a
developer's list and the person (David Smiley) who, you know, actually
understands the highlighting code replied, and I passed it on ;)

I have great respect for the SO forum, but I don't post to it since
there's only so much time in a day, so please feel free to put that
explanation over there.

As for the rest, I'll have to pass today; the aforementioned time
constraints are calling.

Best,
Erick

On Mon, Mar 26, 2018 at 12:12 AM, Arturas Mazeika  wrote:
> Hi Erick,
>
> Adding a field qualifier to the hl.q parameter solved the issue. My
> excitement is steaming over the roof! What a thorough answer: the
> explanation about the behavior of Solr, how it tries to interpret what I
> mean when I supply a keyword without the field qualifier. Very impressive.
> Would you care (re)posting this answer to Stack Overflow? If that is too
> much of a hassle, I'll do it in a couple of days myself on your behalf.
>
> I am impressed by how well, thoroughly, quickly and fully the question was
> answered.
>
> Steven's hint pushed me further in this direction: he suggested using the
> query part of Solr to filter and sort out the relevant answers in the 1st
> step, and in the 2nd step highlighting all the keywords using CTRL+F (in
> the browser or some alternative viewer). This brought me to the next
> question:
>
> How can one match query terms with the analyze-chained documents in an
> efficient and distributed manner? My current understanding how to achieve
> this is the following:
>
> 1. Get the list of ids (contents) of the documents that match the query
> 2. Use the http://localhost:8983/solr/#/trans/analysis to re-analyze the
> document and the query
> 3. Use the matching of the substrings from the original text to last
> filter/tokenizer/analyzer in the analyze-chain to map the terms of the query
> 4. Emulate CTRL+F highlighting
>
> Web Interface of Solr offers quite a bit to advance towards this goal. If
> one fires this request:
>
> * analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955) was a
> German-born theoretical physicist[5] who developed the theory of
> relativity, one of the two pillars of modern physics (alongside quantum
> mechanics).&
> * analysis.query=reletivity theory
>
> to one of the cores of solr, one gets the steps 1-3 done:
>
> http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml&analysis.showmatch=true&analysis.fieldvalue=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).&analysis.query=reletivity%20theory&analysis.fieldtype=text_en
>
> Questions:
>
> 1. Is there a way to "load-balance" this? In the above url, I need to
> specify a specific core. Is it possible to generalize it, so the core that
> receives the request is not necessarily the one that processes it? Or is this
> already distributed, in the sense that the receiving and processing cores
> are never the same?
>
> 2. The document was already analyze-chained. Is it possible to store this
> information so one does not need to re-analyze-chain it once more?
>
> Cheers
> Arturas
>
> On Fri, Mar 23, 2018 at 9:15 PM, Erick Erickson 
> wrote:
>
>> Arturas:
>>
>> Try to field-qualify your hl.q parameter. That looks like:
>>
>> hl.q=trans:Kundigung
>> or
>> hl.q=trans:Kündigung
>>
>> I saw the exact behavior you describe when I did _not_ specify the
>> field in the hl.q parameter, i.e.
>>
>> hl.q=Kundigung
>> or
>> hl.q=Kündigung
>>
>> didn't show all highlights.
>>
>> But when I did specify the field, it worked.
>>
>> Here's what I think is happening: Solr uses the default search
>> field when parsing an un-field-qualified query. I.e.
>>
>> q=something
>>
>> is parsed as
>>
>> q=default_search_field:something.
>>
>> The default field is controlled in solrconfig.xml with the "df"
>> parameter, you'll see entries like:
>> my_field
>>
>> Also when I changed the "df" parameter to the field I was highlighting
>> on, I didn't need to specify the field on the hl.q parameter.
>>
>> hl.q=Kundigung
>> or
>> hl.q=Kündigung
>>
>> The default  field is usually "text", which knows nothing about
>> the German-specific filters you've applied unless you changed it.
>>
>> So in the absence of a field-qualification for the hl.q parameter Solr
>> was parsing the query according to the analysis chain specifed
>> in your default field, and probably passed ü through without
>> transforming it. Since your indexing analysis chain for that field
>> folded ü to just plain u, it wasn't found or highlighted.
>>
>> On the surface, this does seem like something that should be
>> changed, I'll go ahead and ping the dev list.
>>
>> NOTE: I was trying this on Solr 7.1
>>
>> Best,
>> Erick
>>
>> On Fri, Mar 23, 2018 at 12:03 PM, Arturas Mazeika 
>> wrote:
>> > Hi Erick,
>

Re: Solr on HDInsight to write to Active Data Lake

2018-03-26 Thread Erick Erickson
Several things:

1> I often start with an absolute path; knowing the exact relative
path from where Solr starts can be confusing. If you've pathed
properly and the jar file is in the path, it'll be found.

2> Are you sure HdiAdlFileSystem is in one of the jars?

3> did you restart the JVM?

Best,
Erick
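On point 2>, a quick way to check is to scan the jars for the class itself. A minimal sketch (the lib/ext path is the one quoted below and is an assumption; requires `unzip`):

```shell
# Sketch: find which jar (if any) actually contains a given class.
find_class_jar() {
    dir="$1"
    class="$2"
    for j in "$dir"/*.jar; do
        [ -e "$j" ] || continue                  # directory may be empty or missing
        if unzip -l "$j" 2>/dev/null | grep -q "$class"; then
            echo "$j"
        fi
    done
    return 0
}

# Example: which jar under lib/ext provides the ADL filesystem class?
find_class_jar /usr/hdp/current/solr/example/lib/ext \
    "org/apache/hadoop/fs/adl/HdiAdlFileSystem.class"
```

If this prints nothing, the class is not in any jar in that directory and the CREATE will keep failing no matter how the lib directives are set.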

On Mon, Mar 26, 2018 at 6:49 AM, Abhi Basu <9000r...@gmail.com> wrote:
> Adding this to solrconfig.xml did not work. I put all the azure and hadoop
> jars in the ext folder.
>
> 
>
> Caused by: Class org.apache.hadoop.fs.adl.HdiAdlFileSystem not found
>
> Thanks,
>
> Abhi
>
> On Fri, Mar 23, 2018 at 7:40 PM, Abhi Basu <9000r...@gmail.com> wrote:
>
>> I'll try it out.
>>
>> Thanks
>>
>> Abhi
>>
>> On Fri, Mar 23, 2018, 6:22 PM Rick Leir  wrote:
>>
>>> Abhi
>>> Check your lib directives.
>>> https://lucene.apache.org/solr/guide/6_6/lib-directives-
>>> in-solrconfig.html#lib-directives-in-solrconfig
>>>
>>> I suspect your jars are not in a lib dir mentioned in solrconfig.xml
>>> Cheers -- Rick
>>>
>>> On March 23, 2018 11:12:17 AM EDT, Abhi Basu <9000r...@gmail.com> wrote:
>>> >MS Azure does not support Solr 4.9 on HDI, so I am posting here. I
>>> >would
>>> >like to write index collection data to HDFS (hosted on ADL).
>>> >
>>> >Note: I am able to get to ADL from hadoop fs command like, so hadoop is
>>> >configured correctly to get to ADL:
>>> >hadoop fs -ls adl://
>>> >
>>> >This is what I have done so far:
>>> >1. Copied all required jars to sol ext lib folder:
>>> >sudo cp -f /usr/hdp/current/hadoop-client/*.jar
>>> >/usr/hdp/current/solr/example/lib/ext
>>> >sudo cp -f /usr/hdp/current/hadoop-client/lib/*.jar
>>> >/usr/hdp/current/solr/example/lib/ext
>>> >sudo cp -f /usr/hdp/current/hadoop-hdfs-client/*.jar
>>> >/usr/hdp/current/solr/example/lib/ext
>>> >sudo cp -f /usr/hdp/current/hadoop-hdfs-client/lib/*.jar
>>> >/usr/hdp/current/solr/example/lib/ext
>>> >sudo cp -f
>>> >/usr/hdp/current/storm-client/contrib/storm-hbase/storm-hbase*.jar
>>> >/usr/hdp/current/solr/example/lib/ext
>>> >sudo cp -f /usr/hdp/current/phoenix-client/lib/phoenix*.jar
>>> >/usr/hdp/current/solr/example/lib/ext
>>> >sudo cp -f /usr/hdp/current/hbase-client/lib/hbase*.jar
>>> >/usr/hdp/current/solr/example/lib/ext
>>> >
>>> >This includes the Azure active data lake jars also.
>>> >
>>> >2. Edited my solr-config.xml file for my collection:
>>> >
>>> ><dataDir>${solr.core.name}/data/</dataDir>
>>> >
>>> ><directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
>>> >  <str name="solr.hdfs.home">adl://esodevdleus2.azuredatalakestore.net/clusters/esohadoopdeveus2/solr/</str>
>>> >  <str name="solr.hdfs.confdir">/usr/hdp/2.6.2.25-1/hadoop/conf</str>
>>> >  <bool name="solr.hdfs.blockcache.global">${solr.hdfs.blockcache.global:true}</bool>
>>> >  <bool name="solr.hdfs.blockcache.enabled">true</bool>
>>> >  <int name="solr.hdfs.blockcache.slab.count">1</int>
>>> >  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
>>> >  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
>>> >  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
>>> >  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
>>> >  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
>>> ></directoryFactory>
>>> >
>>> >
>>> >When this collection is deployed to solr, I see this error message:
>>> >
>>> >
>>> >
>>> >0
>>> >2189
>>> >
>>> >org.apache.solr.client.solrj.impl.HttpSolrServer$
>>> RemoteSolrException:Error
>>> >CREATEing SolrCore 'ems-collection_shard2_replica2':
>>> >Unable to create core: ems-collection_shard2_replica2 Caused by: Class
>>> >org.apache.hadoop.fs.adl.HdiAdlFileSystem not
>>> >foundorg.apache.solr.client.solrj.impl.HttpSolrServer$
>>> RemoteSolrException:Error
>>> >CREATEing SolrCore 'ems-collection_shard2_replica1': Unable to create
>>> >core: ems-collection_shard2_replica1 Caused by: Class
>>> >org.apache.hadoop.fs.adl.HdiAdlFileSystem not
>>> >foundorg.apache.solr.client.solrj.impl.HttpSolrServer$
>>> RemoteSolrException:Error
>>> >CREATEing SolrCore 'ems-collection_shard1_replica1': Unable to create
>>> >core: ems-collection_shard1_replica1 Caused by: Class
>>> >org.apache.hadoop.fs.adl.HdiAdlFileSystem not
>>> >foundorg.apache.solr.client.solrj.impl.HttpSolrServer$
>>> RemoteSolrException:Error
>>> >CREATEing SolrCore 'ems-collection_shard1_replica2': Unable to create
>>> >core: ems-collection_shard1_replica2 Caused by: Class
>>> >org.apache.hadoop.fs.adl.HdiAdlFileSystem not found
>>> >
>>> >
>>> >
>>> >
>>> >Has anyone done this and can help me out?
>>> >
>>> >Thanks,
>>> >
>>> >Abhi
>>> >
>>> >
>>> >--
>>> >Abhi Basu
>>>
>>> --
>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>
>>
>
>
> --
> Abhi Basu


Re: Solr 4.9 - configs and collections

2018-03-26 Thread Erick Erickson
Yes, use the same command to upload a config. Yes, you need to reload
a collection.

Lots of this functionality is in the newer bin/solr scripts; 4.9 is
3.5 years old.

Best,
Erick
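The cycle can be sketched as follows (hostnames and paths are the ones from this thread and are assumptions; shown as a dry run that only prints the commands):

```shell
# Dry-run sketch of the config-update cycle on 4.9: re-upload the edited
# config set, then RELOAD the collection. No delete/recreate is needed.
ZK_HOST="zknode:2181"                              # assumption: your ZK ensemble
SOLR_HOST="headnode0:8983"                         # assumption: any Solr node
CONF_DIR="/home/sshuser/ems-collection-49/conf/"
COLL="ems-collection"

# Step 1: same upconfig command as the initial upload (it overwrites the config)
UPLOAD_CMD="zkcli.sh -cmd upconfig -zkhost $ZK_HOST -confdir $CONF_DIR -confname $COLL"

# Step 2: reload so every core of the collection picks up the new config
RELOAD_CMD="curl 'http://$SOLR_HOST/solr/admin/collections?action=RELOAD&name=$COLL'"

echo "$UPLOAD_CMD"
echo "$RELOAD_CMD"
```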

On Mon, Mar 26, 2018 at 7:43 AM, Abhi Basu <9000r...@gmail.com> wrote:
> Running on MS HDInsight and Solr 4.9. What is the BKM for creation, update,
> and deletion of configurations and collections?
>
> I do the following:
>
> 1. First I create the zk config:
> sudo zkcli.sh -cmd upconfig -zkhost zknode:2181
> -confdir /home/sshuser/ems-collection-49/conf/ -confname ems-collection
>
> 2. Then I create the collection:
> curl '
> http://headnode0:8983/solr/admin/collections?action=CREATE&name=ems-collection&numShards=2&replicationFactor=2&maxShardsPerNode=1
> '
>
> This works the first time. When I change the zk config, do I run the same
> command #1? Also, do I do a reload:
>
> curl '
> http://headnode0:8983/solr/admin/collections?action=RELOAD&name=ems-collection
> '
>
> Or, do I need to delete and recreate the collection?
>
>
> Very familiar with CDH solrctl commands that make life easier by only
> having one command for this. Any help is appreciated.
>
> Thanks,
>
> Abhi
>
> --
> Abhi Basu


Re: Solr 7 or 6 - stability and performance

2018-03-26 Thread Walter Underwood
If you are running 6.4.1, you will see a big speedup when going to a later 
version. The metrics code caused a serious performance problem.

https://issues.apache.org/jira/browse/SOLR-10130

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 25, 2018, at 8:55 PM, S G  wrote:
> 
> Hi,
> 
> Solr 7 has been out for about 6 months now. (Sep-2017 to Mar-2018)
> We are planning some major upgrades from 6.2 and 6.4 versions of Solr and I
> wanted to see how is Solr 7 looking in terms of stability and performance.
> (Have seen http://lucene.apache.org/solr/news.html but some real users'
> experience would be nice)
> 
> 1) Has anyone encountered major stability issues that made them move back
> to 6.x version?
> 
> 2) Did anyone see more than 10% change in performance (good or bad)? I know
> about https://issues.apache.org/jira/browse/SOLR-11078 and wish trie fields
> were still kept in schema until point fields completely got over the
> performance issue.
> 
> Thanks
> SG



Re: Phrase search with Solr 7.2

2018-03-26 Thread Steven White
Please ignore this.  It was a user error.  I was pointing to the wrong
analyzer in my app's cfg file.

Steve

On Mon, Mar 26, 2018 at 10:17 AM, Steven White  wrote:

> Setting "sow=true" didn't make a difference.
>
> Here is what I'm using now: http://localhost:8983/
> solr/ccfts/select_test?q=%22record%20type%20session%22&
> wt=json&indent=true&sow=true&debugQuery=true
>
> And here is the output:
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
>   "q":"\"record type session\"",
>   "indent":"true",
>   "sow":"true",
>   "wt":"json",
>   "debugQuery":"true"}},
>   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>   },
>   "debug":{
> "rawquerystring":"\"record type session\"",
> "querystring":"\"record type session\"",
> "parsedquery":"+DisjunctionMaxQuery((CC_ALL_FIELDS_DATA:\"record type 
> session\")~1.0)",
> "parsedquery_toString":"+(CC_ALL_FIELDS_DATA:\"record type 
> session\")~1.0",
> "explain":{},
> "QParser":"ExtendedDismaxQParser",
> "altquerystring":null,
> "boost_queries":null,
> "parsed_boost_queries":[],
> "boostfuncs":null,
> "timing":{
>   "time":1.0,
>   "prepare":{
> "time":1.0,
> "query":{
>   "time":0.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":0.0}},
>   "process":{
> "time":0.0,
> "query":{
>   "time":0.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":0.0}
>
>
> How do I debug this?
>
> Steve
>
> On Mon, Mar 26, 2018 at 12:50 AM, Mikhail Khludnev 
> wrote:
>
>> Hello, Steven.
>>
>> Have you tried sow=true?
>> see
>> https://lucene.apache.org/solr/guide/7_2/the-extended-dismax
>> -query-parser.html
>>
>>
>> Anyway, you can start from debugQuery=true, then try to explore
>> explainOther, and get to Analysis page after all.
>>
>> On Mon, Mar 26, 2018 at 3:10 AM, Steven White 
>> wrote:
>>
>> > Hi everyone,
>> >
>> > I switched over from Solr 5.2.1 to 7.2.1 other than re-indexing my data
>> and
>> > schema design remain the same.
>> >
>> > The issue I see now is I'm getting 0 hits on phrase searches, why?
>> >
>> > Here is the query I'm sending that gives me 0 hits:
>> >
>> > http://localhost:8983/solr/ccfts/select_test?q=%22cat+
>> > dog%22&wt=json&indent=true
>> >
>> > But this query will give me hits:
>> >
>> > http://localhost:8983/solr/ccfts/select_test?q=cat+dog&wt=
>> json&indent=true
>> >
>> > Here is my schema:
>> >
>> > > > positionIncrementGap="100" autoGeneratePhraseQueries="true">
>> >
>> >   
>> >   > > synonyms="synonyms.txt" ignoreCase="true"/>
>> >   > > generateNumberParts="1" splitOnCaseChange="0" catenateWords="1"
>> > splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1"
>> > catenateAll="1" catenateNumbers="1"/>
>> >   > > ignoreCase="true"/>
>> >   
>> >   
>> >   > > protected="protwords.txt"/>
>> >   
>> >   
>> >
>> > 
>> >
>> > Here are my fields:
>> >
>> > > > required="false" stored="false"  multiValued="true" />
>> > >  indexed="true"
>> > required="true"  stored="true"   multiValued="false" />
>> > > > required="false" stored="false"  multiValued="true" />
>> > >  indexed="true"
>> > required="true"  stored="false"  multiValued="false" docValues="true" />
>> > >  indexed="true"
>> > required="true"  stored="false"  multiValued="false" docValues="true" />
>> > >  indexed="true"
>> > required="true"  stored="false"  multiValued="false" docValues="true" />
>> > >  indexed="true"
>> > required="true"  stored="true"   multiValued="false" docValues="true" />
>> >
>> > And here is my handler:
>> >
>> > {"requestHandler":{"/select_test":{
>> >   "class":"solr.SearchHandler",
>> >   "name":"/select_test",
>> >   "defaults":{
>> > "defType":"edismax",
>> > "echoParams":"explicit",
>> > "fl":"CC_UNIQUE_FIELD,CC_FILE_PATH,score",
>> > "indent":"true",
>> > "qf":"CC_ALL_FIELDS_DATA",
>> > "rows":"10",
>> > "tie":"1.0",
>> > "wt":"xml"
>> >
>> > What am I doing wrong?
>> >
>> > Steven
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
>


Re: edit gc parameters in solr.in.sh or solr?

2018-03-26 Thread Walter Underwood
We use the G1 collector in Java 8u131 and it works well. We are running 6.6.2. 
Our Solr instances do a LOT of allocation. We have long queries (25 terms 
average) and many unique queries.

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 26, 2018, at 1:22 AM, Derek Poh  wrote:
> 
> Hi
> 
> From your experience, would like to know if it is advisable to change the gc 
> parameters in solr.in.sh or solrfile?
> It is mentioned in the documentation to edit solr.in.sh but would like to know 
> which file you actually edit.
> 
> I am using Solr 6.6.2 at the moment.
> 
> Regards,
> Derek
> 
> 
> --
> CONFIDENTIALITY NOTICE 
> This e-mail (including any attachments) may contain confidential and/or 
> privileged information. If you are not the intended recipient or have 
> received this e-mail in error, please inform the sender immediately and 
> delete this e-mail (including any attachments) from your computer, and you 
> must not use, disclose to anyone else or copy this e-mail (including any 
> attachments), whether in whole or in part. 
> This e-mail and any reply to it may be monitored for security, legal, 
> regulatory compliance and/or other appropriate reasons.



Solr 4.9 - configs and collections

2018-03-26 Thread Abhi Basu
Running on MS HDInsight and Solr 4.9. What is the BKM for creation, update,
and deletion of configurations and collections?

I do the following:

1. First I create the zk config:
sudo zkcli.sh -cmd upconfig -zkhost zknode:2181
-confdir /home/sshuser/ems-collection-49/conf/ -confname ems-collection

2. Then I create the collection:
curl '
http://headnode0:8983/solr/admin/collections?action=CREATE&name=ems-collection&numShards=2&replicationFactor=2&maxShardsPerNode=1
'

This works the first time. When I change the zk config, do I run the same
command #1? Also, do I do a reload:

curl '
http://headnode0:8983/solr/admin/collections?action=RELOAD&name=ems-collection
'

Or, do I need to delete and recreate the collection?


Very familiar with CDH solrctl commands that make life easier by only
having one command for this. Any help is appreciated.

Thanks,

Abhi

-- 
Abhi Basu


Re: edit gc parameters in solr.in.sh or solr?

2018-03-26 Thread Shawn Heisey

On 3/26/2018 2:22 AM, Derek Poh wrote:
From your experience, would like to know if it is advisable to change 
the gc parameters in solr.in.sh or solrfile?
It is mentioned in the documentation to edit solr.in.sh but would like 
to know which file you actually edit.


You need a GC_TUNE variable in solr.in.sh.  The java commandline 
parameters specified there will replace the standard GC tuning 
parameters.  If recommendations are followed, this file will be found in 
/etc/default, and could have "solr" in the filename replaced with 
something different, specifically the name given to the installed 
service.  On my dev server, it is named "solr6.in.sh".
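As an illustration, a GC_TUNE override in that file might look like the fragment below (the flags are illustrative assumptions, not a recommendation; whatever you set here replaces Solr's built-in GC parameters wholesale):

```shell
# Fragment of /etc/default/solr.in.sh (or solr6.in.sh, matching the installed
# service name). GC_TUNE replaces the default GC flags entirely -- it does not
# extend them -- so list every flag you want to keep.
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:MaxGCPauseMillis=200 \
"
```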


What is the "solrfile" you have referenced?  I've not heard of this.

Thanks,
Shawn



Re: Phrase search with Solr 7.2

2018-03-26 Thread Steven White
Setting "sow=true" didn't make a difference.

Here is what I'm using now:
http://localhost:8983/solr/ccfts/select_test?q=%22record%20type%20session%22&wt=json&indent=true&sow=true&debugQuery=true

And here is the output:

{
  "responseHeader":{
"status":0,
"QTime":1,
"params":{
  "q":"\"record type session\"",
  "indent":"true",
  "sow":"true",
  "wt":"json",
  "debugQuery":"true"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "debug":{
"rawquerystring":"\"record type session\"",
"querystring":"\"record type session\"",
"parsedquery":"+DisjunctionMaxQuery((CC_ALL_FIELDS_DATA:\"record
type session\")~1.0)",
"parsedquery_toString":"+(CC_ALL_FIELDS_DATA:\"record type session\")~1.0",
"explain":{},
"QParser":"ExtendedDismaxQParser",
"altquerystring":null,
"boost_queries":null,
"parsed_boost_queries":[],
"boostfuncs":null,
"timing":{
  "time":1.0,
  "prepare":{
"time":1.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":0.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}


How do I debug this?

Steve

On Mon, Mar 26, 2018 at 12:50 AM, Mikhail Khludnev  wrote:

> Hello, Steven.
>
> Have you tried sow=true?
> see
> https://lucene.apache.org/solr/guide/7_2/the-extended-
> dismax-query-parser.html
>
>
> Anyway, you can start from debugQuery=true, then try to explore
> explainOther, and get to Analysis page after all.
>
> On Mon, Mar 26, 2018 at 3:10 AM, Steven White 
> wrote:
>
> > Hi everyone,
> >
> > I switched over from Solr 5.2.1 to 7.2.1 other than re-indexing my data
> and
> > schema design remain the same.
> >
> > The issue I see now is I'm getting 0 hits on phrase searches, why?
> >
> > Here is the query I'm sending that gives me 0 hits:
> >
> > http://localhost:8983/solr/ccfts/select_test?q=%22cat+
> > dog%22&wt=json&indent=true
> >
> > But this query will give me hits:
> >
> > http://localhost:8983/solr/ccfts/select_test?q=cat+dog&
> wt=json&indent=true
> >
> > Here is my schema:
> >
> >  > positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >
> >   
> >> synonyms="synonyms.txt" ignoreCase="true"/>
> >> generateNumberParts="1" splitOnCaseChange="0" catenateWords="1"
> > splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1"
> > catenateAll="1" catenateNumbers="1"/>
> >> ignoreCase="true"/>
> >   
> >   
> >> protected="protwords.txt"/>
> >   
> >   
> >
> > 
> >
> > Here are my fields:
> >
> >  > required="false" stored="false"  multiValued="true" />
> >  > required="true"  stored="true"   multiValued="false" />
> >  > required="false" stored="false"  multiValued="true" />
> >  > required="true"  stored="false"  multiValued="false" docValues="true" />
> >  > required="true"  stored="false"  multiValued="false" docValues="true" />
> >  > required="true"  stored="false"  multiValued="false" docValues="true" />
> >  > required="true"  stored="true"   multiValued="false" docValues="true" />
> >
> > And here is my handler:
> >
> > {"requestHandler":{"/select_test":{
> >   "class":"solr.SearchHandler",
> >   "name":"/select_test",
> >   "defaults":{
> > "defType":"edismax",
> > "echoParams":"explicit",
> > "fl":"CC_UNIQUE_FIELD,CC_FILE_PATH,score",
> > "indent":"true",
> > "qf":"CC_ALL_FIELDS_DATA",
> > "rows":"10",
> > "tie":"1.0",
> > "wt":"xml"
> >
> > What am I doing wrong?
> >
> > Steven
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Solr on HDInsight to write to Active Data Lake

2018-03-26 Thread Abhi Basu
Adding this to solrconfig.xml did not work. I put all the azure and hadoop
jars in the ext folder.



Caused by: Class org.apache.hadoop.fs.adl.HdiAdlFileSystem not found

Thanks,

Abhi

On Fri, Mar 23, 2018 at 7:40 PM, Abhi Basu <9000r...@gmail.com> wrote:

> I'll try it out.
>
> Thanks
>
> Abhi
>
> On Fri, Mar 23, 2018, 6:22 PM Rick Leir  wrote:
>
>> Abhi
>> Check your lib directives.
>> https://lucene.apache.org/solr/guide/6_6/lib-directives-
>> in-solrconfig.html#lib-directives-in-solrconfig
>>
>> I suspect your jars are not in a lib dir mentioned in solrconfig.xml
>> Cheers -- Rick
>>
>> On March 23, 2018 11:12:17 AM EDT, Abhi Basu <9000r...@gmail.com> wrote:
>> >MS Azure does not support Solr 4.9 on HDI, so I am posting here. I
>> >would
>> >like to write index collection data to HDFS (hosted on ADL).
>> >
>> >Note: I am able to get to ADL from hadoop fs command like, so hadoop is
>> >configured correctly to get to ADL:
>> >hadoop fs -ls adl://
>> >
>> >This is what I have done so far:
>> >1. Copied all required jars to sol ext lib folder:
>> >sudo cp -f /usr/hdp/current/hadoop-client/*.jar
>> >/usr/hdp/current/solr/example/lib/ext
>> >sudo cp -f /usr/hdp/current/hadoop-client/lib/*.jar
>> >/usr/hdp/current/solr/example/lib/ext
>> >sudo cp -f /usr/hdp/current/hadoop-hdfs-client/*.jar
>> >/usr/hdp/current/solr/example/lib/ext
>> >sudo cp -f /usr/hdp/current/hadoop-hdfs-client/lib/*.jar
>> >/usr/hdp/current/solr/example/lib/ext
>> >sudo cp -f
>> >/usr/hdp/current/storm-client/contrib/storm-hbase/storm-hbase*.jar
>> >/usr/hdp/current/solr/example/lib/ext
>> >sudo cp -f /usr/hdp/current/phoenix-client/lib/phoenix*.jar
>> >/usr/hdp/current/solr/example/lib/ext
>> >sudo cp -f /usr/hdp/current/hbase-client/lib/hbase*.jar
>> >/usr/hdp/current/solr/example/lib/ext
>> >
>> >This includes the Azure active data lake jars also.
>> >
>> >2. Edited my solr-config.xml file for my collection:
>> >
>> ><dataDir>${solr.core.name}/data/</dataDir>
>> >
>> ><directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
>> >  <str name="solr.hdfs.home">adl://esodevdleus2.azuredatalakestore.net/clusters/esohadoopdeveus2/solr/</str>
>> >  <str name="solr.hdfs.confdir">/usr/hdp/2.6.2.25-1/hadoop/conf</str>
>> >  <bool name="solr.hdfs.blockcache.global">${solr.hdfs.blockcache.global:true}</bool>
>> >  <bool name="solr.hdfs.blockcache.enabled">true</bool>
>> >  <int name="solr.hdfs.blockcache.slab.count">1</int>
>> >  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
>> >  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
>> >  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
>> >  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
>> >  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
>> ></directoryFactory>
>> >
>> >
>> >When this collection is deployed to solr, I see this error message:
>> >
>> >
>> >
>> >0
>> >2189
>> >
>> >org.apache.solr.client.solrj.impl.HttpSolrServer$
>> RemoteSolrException:Error
>> >CREATEing SolrCore 'ems-collection_shard2_replica2':
>> >Unable to create core: ems-collection_shard2_replica2 Caused by: Class
>> >org.apache.hadoop.fs.adl.HdiAdlFileSystem not
>> >foundorg.apache.solr.client.solrj.impl.HttpSolrServer$
>> RemoteSolrException:Error
>> >CREATEing SolrCore 'ems-collection_shard2_replica1': Unable to create
>> >core: ems-collection_shard2_replica1 Caused by: Class
>> >org.apache.hadoop.fs.adl.HdiAdlFileSystem not
>> >foundorg.apache.solr.client.solrj.impl.HttpSolrServer$
>> RemoteSolrException:Error
>> >CREATEing SolrCore 'ems-collection_shard1_replica1': Unable to create
>> >core: ems-collection_shard1_replica1 Caused by: Class
>> >org.apache.hadoop.fs.adl.HdiAdlFileSystem not
>> >foundorg.apache.solr.client.solrj.impl.HttpSolrServer$
>> RemoteSolrException:Error
>> >CREATEing SolrCore 'ems-collection_shard1_replica2': Unable to create
>> >core: ems-collection_shard1_replica2 Caused by: Class
>> >org.apache.hadoop.fs.adl.HdiAdlFileSystem not found
>> >
>> >
>> >
>> >
>> >Has anyone done this and can help me out?
>> >
>> >Thanks,
>> >
>> >Abhi
>> >
>> >
>> >--
>> >Abhi Basu
>>
>> --
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>
>


-- 
Abhi Basu


Indexing multi level Nested JSON using curl

2018-03-26 Thread Zheng Lin Edwin Yeo
Hi,

I'm trying to index the following JSON with 2 child levels using the
following curl command in cygwin:

curl 'http://localhost:8983/solr/collection1/update/json/docs?split=/|/orgs'
-H 'Content-type:application/json' -d '
{
  "id":"1",
  "name_s": "JoeSmith",
  "phone_s": 876876687,
  "orgs": [
{
  "name1_s" : "Microsoft",
  "city_s" : "Seattle",
  "zip_s" : 98052,
  "orgs":[{"name2_ss":"alan","phone2_ss":"123"},{"name2_ss":
"edwin","phone2_ss":"456"}]
},
{
  "name1_s" : "Apple",
  "city_s" : "Cupertino",
  "zip_s" : 95014,
  "orgs":[{"name2_ss":"alan","phone2_ss":"123"},{"name2_ss":
"edwin","phone2_ss":"456"}]
}
  ]
}'

However, after indexing, this is what is shown in Solr. The 2nd-level children
have been placed together under the 1st child as a multi-valued field, which is
wrong. If I set the field for the 2nd-level child to be a non-multi-valued
field, I get an error saying "multiple values encountered for non
multiValued field orgs2.name2_s:".

{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":41,
"params":{
  "q":"phone_s:876876687",
  "fl":"*,[child parentFilter=phone_s:876876687]",
  "sort":"id asc"}},
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"1",
"name_s":"JoeSmith",
"phone_s":"876876687",
"language_s":"en",
"_version_":1595632041779527680,
"_childDocuments_":[
{
  "name1_s":"Microsoft",
  "city_s":"Seattle",
  "zip_s":"98052",
  "orgs.name2_ss":["alan",
"edwin"],
  "orgs.phone2_ss":["123",
"456"],
  "_version_":1595632041779527680},
{
  "name1_s":"Apple",
  "city_s":"Cupertino",
  "zip_s":"95014",
  "orgs.name2_ss":["alan",
"edwin"],
  "orgs.phone2_ss":["123",
"456"],
  "_version_":1595632041779527680}]}]
  }}


How can we structure the curl command so that it accepts a child-of-child
relationship? We should not have to do any pre-processing of the JSON
to achieve that.

I'm using Solr 7.2.1.

Regards,
Edwin


edit gc parameters in solr.in.sh or solr?

2018-03-26 Thread Derek Poh

Hi

From your experience, would like to know if it is advisable to change 
the gc parameters in solr.in.sh or solrfile?
It is mentioned in the documentation to edit solr.in.sh but would like 
to know which file you actually edit.

I am using Solr 6.6.2 at the moment.

Regards,
Derek



Re: querying vs. highlighting: complete freedom?

2018-03-26 Thread Arturas Mazeika
Hi Erick,

Adding a field-qualifier to the hl.q parameter solved the issue. My
excitement is steaming over the roof! What a thorough answer: the
explanation about the behavior of solr, how it tries to interpret what I
mean when I supply a keyword without the field-qualifier. Very impressive.
Would you care (re)posting this answer to stackoverflow? If that is too
much of a hassle, I'll do this in a couple of days myself on your behalf.

I am impressed how well, thorough, fast and fully the question was answered.

Steven's hint pushed me further in this direction: he suggested using the
query part of solr to filter and sort out the relevant answers in the 1st
step, and in the 2nd step highlighting all the keywords using CTRL+F (in
the browser or some alternative viewer). This brought me to the next
question:

How can one match query terms with the analyze-chained documents in an
efficient and distributed manner? My current understanding of how to achieve
this is the following:

1. Get the list of ids (contents) of the documents that match the query
2. Use the http://localhost:8983/solr/#/trans/analysis to re-analyze the
document and the query
3. Use the matching of the substrings from the original text to last
filter/tokenizer/analyzer in the analyze-chain to map the terms of the query
4. Emulate CTRL+F highlighting

Web Interface of Solr offers quite a bit to advance towards this goal. If
one fires this request:

* analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955) was a
German-born theoretical physicist[5] who developed the theory of
relativity, one of the two pillars of modern physics (alongside quantum
mechanics).&
* analysis.query=reletivity theory

to one of the cores of solr, one gets the steps 1-3 done:

http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml&analysis.showmatch=true&analysis.fieldvalue=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).&analysis.query=reletivity%20theory&analysis.fieldtype=text_en
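For scripting steps 1-3, the same request can be assembled from parts; a minimal dry-run sketch (core name, field type, and values are the ones from the URL above, otherwise assumptions):

```shell
# Build the field-analysis request from parts, so core/field/query can vary.
CORE="trans_shard1_replica_n1"        # a concrete core must be addressed
FIELDTYPE="text_en"
QUERY="reletivity%20theory"           # already URL-encoded
FIELDVALUE="Albert%20Einstein%20developed%20the%20theory%20of%20relativity"

URL="http://localhost:8983/solr/$CORE/analysis/field?wt=xml&analysis.showmatch=true"
URL="$URL&analysis.fieldtype=$FIELDTYPE&analysis.query=$QUERY&analysis.fieldvalue=$FIELDVALUE"

echo "$URL"
# curl "$URL"    # run against a live node to get the analyzed terms back
```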

Questions:

1. Is there a way to "load-balance" this? In the above url, I need to
specify a specific core. Is it possible to generalize it, so the core that
receives the request is not necessarily the one that processes it? Or this
already is distributed in a sense that receiving core and processing cores
are never the same?

2. The document was already analyze-chained. Is it possible to store this
information so one does not need to re-analyze-chain it once more?

Cheers
Arturas

On Fri, Mar 23, 2018 at 9:15 PM, Erick Erickson 
wrote:

> Arturas:
>
> Try to field-qualify your hl.q parameter. That looks like:
>
> hl.q=trans:Kundigung
> or
> hl.q=trans:Kündigung
>
> I saw the exact behavior you describe when I did _not_ specify the
> field in the hl.q parameter, i.e.
>
> hl.q=Kundigung
> or
> hl.q=Kündigung
>
> didn't show all highlights.
>
> But when I did specify the field, it worked.
>
> Here's what I think is happening: Solr uses the default search
> field when parsing an un-field-qualified query. I.e.
>
> q=something
>
> is parsed as
>
> q=default_search_field:something.
>
> The default field is controlled in solrconfig.xml with the "df"
> parameter, you'll see entries like:
> my_field
>
> Also when I changed the "df" parameter to the field I was highlighting
> on, I didn't need to specify the field on the hl.q parameter.
>
> hl.q=Kundigung
> or
> hl.q=Kündigung
>
> The default  field is usually "text", which knows nothing about
> the German-specific filters you've applied unless you changed it.
>
> So in the absence of a field-qualification for the hl.q parameter Solr
> was parsing the query according to the analysis chain specifed
> in your default field, and probably passed ü through without
> transforming it. Since your indexing analysis chain for that field
> folded ü to just plain u, it wasn't found or highlighted.
>
> On the surface, this does seem like something that should be
> changed, I'll go ahead and ping the dev list.
>
> NOTE: I was trying this on Solr 7.1
>
> Best,
> Erick
>
> On Fri, Mar 23, 2018 at 12:03 PM, Arturas Mazeika 
> wrote:
> > Hi Erick,
> >
> > Thanks for the update and the infos. Your post brought quite a bit of
> > light into the picture and now I understand quite a bit more about what
> > you are saying. Your explanation makes sense and can be quite useful in
> > certain scenarios.
> >
> > What struck me from your description is that you are saying that the
> > analyzer-chain needs to be applied for the highlighting queries as well.
> > The tragedy is that I am not able to get this to work for a German
> > collection: if the query is set (no explicit highlighting query), the
> > highlighting is correct. It is also correct if I replace the umlauts with
> > the corresponding latin chars. Getting the analyzer chain f