Re: Difference in q.op param behavior between Solr 6.3 and Solr 8.5.2

2020-09-23 Thread Erik Hatcher
In 6.3 it did that?   It shouldn't have.  q and fq shouldn't share parameters.  
fq's themselves shouldn't, IMO, have global defaults.  fq's need to be stable 
and are often uniquely specified constraining query parsers 
({!terms/term/field, etc.}) or rely on basic Lucene query parser syntax, and 
they need to be able to rely stably on AND/OR.

Relevancy tuning on q and friends, tweaking those parameters, shouldn't affect 
fq's, to say it a little differently.

One can fq={!lucene q.op=AND}id:(1 2 3)
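Spelled out as a full request (hypothetical host and collection; a sketch that just assembles the URL with Python's stdlib):

```python
from urllib.parse import urlencode

# The fq carries its own q.op via local params, so relevancy tuning on q
# (q.op, mm, ...) never changes how this filter parses.
params = {
    "q": "user query here",
    "fq": "{!lucene q.op=AND}id:(1 2 3)",  # all three clauses must match
}
# Hypothetical endpoint -- substitute your own host/collection.
url = "http://localhost:8983/solr/mycoll/select?" + urlencode(params)
print(url)
```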

Erik


> On Sep 23, 2020, at 4:23 PM, gnandre  wrote:
> 
> Is there a way to set default operator as AND for fq parameter in Solr
> 8.5.2 now?
> 
> On Tue, Sep 22, 2020 at 7:44 PM gnandre  wrote:
> 
>> In 6.3, the q.op param used to affect both q and fq behavior. E.g., if
>> q.op is set to AND and fq is set to id:(1 2 3), no results show up, but
>> if it is set to OR then all 3 results show up. This no longer happens in
>> Solr 8.5.2.
>> 
>> Is this a bug? What does one need to do in Solr 8.5.2 to achieve the same
>> behavior, besides passing the operator directly in the fq param, i.e. id:(1 OR 2
>> OR 3)?
>> 



Re: Best field definition which is only use for filter query.

2020-07-22 Thread Erik Hatcher



> On Jul 22, 2020, at 08:52, raj.yadav  wrote:
> 
> Erik Hatcher-4 wrote
>> Wouldn’t a “string” field be as good, if not better, for this use case?
> 
> What is the rationale behind this type change to 'string'? How will it speed
> up search/filtering? Won't it increase the index size? Since, in general, the
> string type takes more storage than int (not sure what the case is in
> Lucene). 

You tell me? ;)   Easy enough to try in your environment, I imagine, in 
parallel in the same collection/index.  

As I understand it (with regard to Erick’s points), range queries aren’t being 
used here.  

Erik

> 
> Regards,
> Raj
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Best field definition which is only use for filter query.

2020-07-22 Thread Erik Hatcher
Wouldn’t a “string” field be as good, if not better, for this use case?

> On Jul 22, 2020, at 08:02, Erick Erickson  wrote:
> 
> fq clauses are just like the q clause except for two things:
> 1> no scoring is done
> 2> the entire result set _can_ be stored in the filterCache.
> 
> so if a value isn’t indexed, it can’t be used in either an fq or q clause.
> 
> The thread you reference is under the assumption (and this is the default in 
> some versions of Solr) that docValues=true. And yes, that will be very, very 
> slow. Think “table scan”.
> 
> Also, the default pint type is not as efficient for single-value searches 
> like this, the trie fields are better. Trie support will be kept until 
> there’s a good alternative for the single-value lookup with pint.
> 
> So for what you’re doing, I’d change to TrieInt, docValues=false, indexed=true. 
> If you have neither docValues=true nor indexed=true, the query won’t work at 
> all. You’ll have to adequately size your hardware if index size is a concern.
> 
> Best,
> Erick
> 
>> On Jul 22, 2020, at 7:18 AM, Raj Yadav  wrote:
>> 
>> Below is the sample document
>> 
>> {"fieldA": 1, "fieldB": "", "fieldC": "Sher", "fieldD": "random",
>> "rules":[203,7843,43,283,6603,83,513,5303,243,103,323,163,403,363,5333,2483,313,703,523,503,563,8543,1003,483,1083,2043,6523,603,963,683,5353,763,443,643,743,723,1123,843,1243,1663,1803,1403,1783,7563,3843,1843,1523,1203,1563,1703,1883,8913,1923,1323,5313,1623,1963,2033,2763,2623,2083,2123,2143,123,2183,2333,8183,7323,2323,7243,2313,2463,2423,2383,5833,2343,2503,2663,8263,3083,2683,2543,8313,2883,2923,3043,2703,3243,3123,2263,3003,2393,3203,3163,6243,3283,3443,3343,3403,1913,3323,3483,3603,3723,3763,8333,3563,863,3683,3643,3523,3803,8323,3883,4003,3923,4043,4173,1163,2963,1743,6593,4083,4103,4143,1363,3983,4183,4223,6623,4383,1443,4303,4263,4403,4423,4283,4343,5043,4923,4983,4993,6633,4503,5843,8073,4663]}
>> As you can see we have 5 fields and one of the field names is "rules".
>> Field Definition:
>> <field name="rules" type="pint" indexed="true" multiValued="true"/>
>> 
>> The only operation that we do on this field is filtering.
>> example: => fq=rules:203
>> 
>> Problems:
>> 1. The problem here is that for the `rules` field we have
>> marked indexed="true" and it is consuming a large percentage of the total index
>> size.
>> 2. Another problem is that a large chunk of our document update requests are
>> mainly for this (rules) field.
>> 
>> If I mark `indexed=false` for this field (by default the pint field type has
>> docValues=true):
>> <field name="rules" type="pint" indexed="false" multiValued="true"/>
>> Then the following thread suggests that the filter operation (which is also
>> a kind of search operation) will be very slow:
>> https://lucene.472066.n3.nabble.com/Facet-performance-problem-td4375925.html
>> 
>> Is there a way to not keep indexed=true for the `rules` field and still not
>> impact our search (filtering) performance? Or any other solution which
>> can help reduce our total index size and also not increase
>> search (filter) latency?
>> 
>> Regards,
>> Raj
> 
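The schema change Erick recommends above would look roughly like this (a sketch only; exact attributes such as stored are guesses, and TrieIntField is deprecated in 8.x):

```xml
<!-- Trie-based int: still the more efficient choice for single-value lookups -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="0"/>

<!-- indexed for fq lookups; docValues off to keep the index lean -->
<field name="rules" type="tint" indexed="true" stored="false"
       docValues="false" multiValued="true"/>
```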


Re: Solr heap Old generation grows and it is not recovered by G1GC

2020-07-14 Thread Erik Hatcher
What kind of statistics?   Are these stats that you could perhaps get from 
faceting or the stats component instead of gathering docs and accumulating 
stats yourself?



> On Jul 14, 2020, at 8:51 AM, Odysci  wrote:
> 
> Hi Erick,
> 
> I agree. The 300K docs in one search is an anomaly.
> But we do use 'fq' to return a large number of docs for the purposes of
> generating statistics for the whole index. We do use CursorMark extensively.
> Thanks!
> 
> Reinaldo
> 
> On Tue, Jul 14, 2020 at 8:55 AM Erick Erickson 
> wrote:
> 
>> I’d add that you’re abusing Solr horribly by returning 300K documents in a
>> single go.
>> 
>> Solr is built to return the top N docs where N is usually quite small, <
>> 100. If you allow
>> an unlimited number of docs to be returned, you’re simply kicking the can
>> down
>> the road, somebody will ask for 1,000,000 docs sometime and you’ll be back
>> where
>> you started.
>> 
>> I _strongly_ recommend you do one of two things for such large result sets:
>> 
>> 1> Use Streaming. Perhaps Streaming Expressions will do what you want
>>without you having to process all those docs on the client if you’re
>>doing some kind of analytics.
>> 
>> 2> if you really, truly need all 300K docs, try getting them in chunks
>> using CursorMark.
>> 
>> Best,
>> Erick
>> 
>>> On Jul 13, 2020, at 10:03 PM, Odysci  wrote:
>>> 
>>> Shawn,
>>> 
>>> thanks for the extra info.
>>> The OOM errors were indeed because of heap space. In my case most of the
>> GC
>>> calls were not full GC. Only when heap was really near the top, a full GC
>>> was done.
>>> I'll try out your suggestion of increasing the G1 heap region size. I've
>>> been using 4m, and from what you said, a 2m allocation would be
>> considered
>>> humongous. My test cases have a few allocations that are definitely
>> bigger
>>> than 2m (estimating based on the number of docs returned), but most of
>> them
>>> are not.
>>> 
>>> When I was using maxRamMB, the size used was "compatible" with the
>>> size values, assuming the avg 2K-byte docs that our index has.
>>> As far as I could tell in my runs, removing maxRamMB did change the GC
>>> behavior for the better. That is, now, heap goes up and down as expected,
>>> and before (with maxRamMB) it seemed to increase continuously.
>>> Thanks
>>> 
>>> Reinaldo
>>> 
>>> On Sun, Jul 12, 2020 at 1:02 AM Shawn Heisey 
>> wrote:
>>> 
 On 6/25/2020 2:08 PM, Odysci wrote:
> I have a solrcloud setup with 12GB heap and I've been trying to
>> optimize
 it
> to avoid OOM errors. My index has about 30million docs and about 80GB
> total, 2 shards, 2 replicas.
 
 Have you seen the full OutOfMemoryError exception text?  OOME can be
 caused by problems that are not actually memory-related.  Unless the
 error specifically mentions "heap space" we might be chasing the wrong
 thing here.
 
> When the queries return a smallish number of docs (say, below 1000),
>> the
> heap behavior seems "normal". Monitoring the gc log I see that young
> generation grows then when GC kicks in, it goes considerably down. And
 the
> old generation grows just a bit.
> 
> However, at some point i have a query that returns over 300K docs (for
>> a
> total size of approximately 1GB). At this very point the OLD generation
> size grows (almost by 2GB), and it remains high for all remaining time.
> Even as new queries are executed, the OLD generation size does not go
 down,
> despite multiple GC calls done afterwards.
 
 Assuming the OOME exceptions were indeed caused by running out of heap,
 then the following paragraphs will apply:
 
 G1 has this concept called "humongous allocations".  In order to reach
 this designation, a memory allocation must get to half of the G1 heap
 region size.  You have set this to 4 megabytes, so any allocation of 2
 megabytes or larger is humongous.  Humongous allocations bypass the new
 generation entirely and go directly into the old generation.  The max
 value that can be set for the G1 region size is 32MB.  If you increase
 the region size and the behavior changes, then humongous allocations
 could be something to investigate.
 
 In the versions of Java that I have used, humongous allocations can only
 be reclaimed as garbage by a full GC.  I do not know if Oracle has
 changed this so the smaller collections will do it or not.
 
 Were any of those multiple GCs a Full GC?  If they were, then there is
 probably little or no garbage to collect.  You've gotten a reply from
 "Zisis T." with some possible causes for this.  I do not have anything
 to add.
 
 I did not know about any problems with maxRamMB ... but if I were
 attempting to limit cache sizes, I would do so by the size values, not a
 specific RAM size.  The size values you have chosen (8192 and 16384)
 will most likely result in a total 
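The region-size experiment Shawn suggests amounts to a one-line change in solr.in.sh; an illustrative (not prescriptive) setting:

```shell
# solr.in.sh (illustrative values): with a 32m region, only allocations
# >= 16m count as "humongous" and bypass the young generation.
GC_TUNE="-XX:+UseG1GC -XX:G1HeapRegionSize=32m"
```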

Re: 'velocity' does not exist . Do an 'create-queryresponsewriter' , if you want to create it

2020-05-19 Thread Erik Hatcher
Need to also make sure the velocity writer and its dependencies are <lib>'d in 
solrconfig.xml

> On May 19, 2020, at 02:30, Prakhar Kumar  
> wrote:
> 
> Hello Team,
> 
> I am using Solr 8.5.0 and here is the full log for the error which I am
> getting:
> 
> SolrConfigHandler Error checking plugin :  =>
> org.apache.solr.common.SolrException: Error loading class
> 'solr.VelocityResponseWriter'
> @40005ec3702b3710a43c at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:570)
> @40005ec3702b3710a824 org.apache.solr.common.SolrException: Error
> loading class 'solr.VelocityResponseWriter'
> @40005ec3702b3710ac0c at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:570)
> ~[?:?]
> @40005ec3702b3710f25c at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:501)
> ~[?:?]
> @40005ec3702b3710f644 at
> org.apache.solr.core.SolrCore.createInstance(SolrCore.java:824) ~[?:?]
> @40005ec3702b3710f644 at
> org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:880) ~[?:?]
> @40005ec3702b3710fa2c at
> org.apache.solr.handler.SolrConfigHandler$Command.verifyClass(SolrConfigHandler.java:601)
> ~[?:?]
> @40005ec3702b371105e4 at
> org.apache.solr.handler.SolrConfigHandler$Command.updateNamedPlugin(SolrConfigHandler.java:565)
> ~[?:?]
> @40005ec3702b371105e4 at
> org.apache.solr.handler.SolrConfigHandler$Command.handleCommands(SolrConfigHandler.java:502)
> ~[?:?]
> @40005ec3702b3711196c at
> org.apache.solr.handler.SolrConfigHandler$Command.handlePOST(SolrConfigHandler.java:363)
> ~[?:?]
> @40005ec3702b3711196c at
> org.apache.solr.handler.SolrConfigHandler$Command.access$100(SolrConfigHandler.java:161)
> ~[?:?]
> @40005ec3702b37111d54 at
> org.apache.solr.handler.SolrConfigHandler.handleRequestBody(SolrConfigHandler.java:139)
> ~[?:?]
> @40005ec3702b3711213c at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
> ~[?:?]
> @40005ec3702b3711290c at
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2596) ~[?:?]
> @40005ec3702b37112cf4 at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802) ~[?:?]
> @40005ec3702b37112cf4 at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579) ~[?:?]
> @40005ec3702b371130dc at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)
> ~[?:?]
> @40005ec3702b37115404 at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)
> ~[?:?]
> @40005ec3702b371157ec at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> ~[jetty-servlet-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b371157ec at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> ~[jetty-servlet-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711678c at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711678c at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> ~[jetty-security-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b37116b74 at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b37117344 at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711772c at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b37117b14 at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b371182e4 at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b37119284 at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b37119284 at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> ~[jetty-servlet-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711966c at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711a224 at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711a60c at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
> ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
> @40005ec3702b3711a9f4 at
> 

Re: solr payloads performance

2020-05-11 Thread Erik Hatcher
Wei -

Here's some details on the various payload capabilities and short-comings: 
https://lucidworks.com/post/solr-payloads/

SOLR-10541 is the main functional constraint (range faceting over functions).
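For intuition, here's a small client-side sketch (hypothetical field values) of the lookup the payload() function performs over a delimited-payloads field:

```python
def payload_lookup(field_value: str, key: str) -> float:
    """Mimic, client-side, what Solr's payload() function does for a
    delimited-payloads field: terms look like 'store1|125.0',
    separated by whitespace."""
    for term in field_value.split():
        token, _, payload = term.partition("|")
        if token == key:
            return float(payload)
    return 0.0  # the default value when the key term is absent

# Hypothetical per-store prices, as in option 1 above.
prices = "store1|125.0 store2|220.0 store3|225.0"
print(payload_lookup(prices, "store2"))  # -> 220.0
```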

Erik

> On May 8, 2020, at 7:26 PM, Wei  wrote:
> 
> Hi everyone,
> 
> Have a question regarding typical  e-commerce scenario: each item may have
> different price in different store. suppose there are 10 million items and
> 1000 stores.
> 
> Option 1:  use solr payloads, each document have
> store_prices_payload:store1|price1 store2|price2  .
> store1000|price1000
> 
> Option 2: use dynamic fields and have 1000 fields in each document, i.e.
>   field1:  store1_price:  price1
>   field2:  store2_price:  price2
>   ...
>   field1000:  store1000_price: price1000
> 
> Option 2 doesn't look elegant,  but is there any performance benchmark on
> solr payloads? In terms of filtering, sorting or faceting, how would query
> performance compare between the two?
> 
> Thanks,
> Wei



Re: 'velocity' does not exist . Do an 'create-queryresponsewriter' , if you want to create it

2020-04-28 Thread Erik Hatcher
Try add-queryresponsewriter instead of "update" - it's not currently defined so 
there's nothing to update.   Add it first, but also make sure you've got the 
Velocity contrib and dependencies wired into your configset as well.
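Concretely, the Config API command would change from update- to add- along these lines (host and core name hypothetical; the body mirrors the one from the error below):

```shell
curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/mycore/config -d '{
  "add-queryresponsewriter": {
    "startup": "lazy",
    "name": "velocity",
    "class": "solr.VelocityResponseWriter",
    "template.base.dir": "",
    "solr.resource.loader.enabled": "true",
    "params.resource.loader.enabled": "true"
  }
}'
```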

Erik


> On Apr 28, 2020, at 9:15 AM, Prakhar Kumar  
> wrote:
> 
> Hello Team,
> 
> I am getting this weird error in Solr logs. Does anyone know how to prevent
> it from happening?
> 
> ERROR:[{
>"update-queryresponsewriter":{
>  "startup":"lazy",
>  "name":"velocity",
>  "class":"solr.VelocityResponseWriter",
>  "template.base.dir":"",
>  "solr.resource.loader.enabled":"true",
>  "params.resource.loader.enabled":"true"},
>"errorMessages":[" 'velocity' does not exist . Do an
> 'create-queryresponsewriter' , if you want to create it "]}]
> 
> 
> -- 
> Kind Regards,
> Prakhar Kumar
> Sr. Enterprise Software Engineer
> 
> *HotWax Systems*
> *Enterprise open source experts*
> cell: +91-89628-81820
> office: 0731-409-3684
> http://www.hotwaxsystems.com



Re: ResourceManager : unable to find resource 'custom.vm' in any resource loader.

2020-04-22 Thread Erik Hatcher
What's the full request that is logged?   You're using the Velocity response 
writer (wt=velocity) and a request is being made to render a custom.vm template 
(v.template=custom, or a template is #parse'ing("custom.vm")) that doesn't 
exist.

Erik

> On Apr 22, 2020, at 8:07 AM, Prakhar Kumar  
> wrote:
> 
> Hello Team,
> 
> I am getting this weird error in Solr logs.
> 
> null:java.io.IOException: Unable to find resource 'custom.vm'
>   at 
> org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:308)
>   at 
> org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:141)
>   at 
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:53)
>   at 
> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:727)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.eclipse.jetty.server.Server.handle(Server.java:497)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>   at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>   at java.lang.Thread.run(Thread.java:745)
> 
> 
> Could anyone please tell me how to fix this.
> 
> 
> -- 
> Kind Regards,
> Prakhar Kumar
> Sr. Enterprise Software Engineer
> 
> *HotWax Systems*
> *Enterprise open source experts*
> cell: +91-89628-81820
> office: 0731-409-3684
> http://www.hotwaxsystems.com



Re: Query is taking a time in Solr 6.1.0

2020-03-13 Thread Erik Hatcher
Looks like you have two, maybe three, wildcard/prefix clauses in there.  
Consider tokenizing differently so you can optimize the queries to not need 
wildcards - that's my first observation and suggestion. 

Erik 

> On Mar 13, 2020, at 05:56, vishal patel  wrote:
> 
> Some query is taking time in Solr 6.1.0.
> 
> 2020-03-12 11:05:36.752 INFO  (qtp1239731077-2513155) [c:documents s:shard1 
> r:core_node1 x:documents] o.a.s.c.S.Request [documents]  webapp=/solr 
> path=/select 
> params={df=summary=false=id=4=0=true=doc_ref+asc,id+desc==s3.test.com:8983/solr/documents|s3r1.test.com:8983/solr/documents=250=2=(doc_ref:((*n205*)+))+AND+(title:((*Distribution\+Board\+Schedule*)+))+AND+project_id:(2104616)+AND+is_active:true+AND+((isLatest:(true)+AND+isFolderActive:true+AND+isXref:false+AND+-document_type_id:(3+7)+AND+((is_public:true+OR+distribution_list:7249777+OR+folderadmin_list:7249777+OR+author_user_id:7249777)+AND+(((allowedUsers:(7249777)+OR+allowedRoles:(6368666)+OR+combinationUsers:(7249777))+AND+-blockedUsers:(7249777))+OR+(defaultAccess:(true)+AND+-blockedUsers:(7249777)+AND+-blockedRoles:(6368666)+OR+(isLatestRevPrivate:(true)+AND+allowedUsersForPvtRev:(7249777)+AND+-folderadmin_list:(7249777)))=true=1584011129462=true=javabin}
>  hits=0 status=0 QTime=7276.
> 
> Is there any way to reduce the query execution time (7276 ms)?


Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-23 Thread Erik Hatcher
It's a great idea.   And then index that file into a separate lean collection 
of just the suggestions, along with the weight as another field on those 
documents, to use for ranking them at query time with standard /select queries. 
 (this separate suggest collection would also have appropriate tokenization to 
match the partial words as the user types, like ngramming)

Erik
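The ngramming mentioned above typically means an edge n-gram analyzer on the suggest field's index side, along these lines (an illustrative field type, not one from the thread):

```xml
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index prefixes so partial words match as the user types -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```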


> On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld - audrey.lorberf...@ibm.com 
>  wrote:
> 
> David, 
> 
> Thank you, that is useful. So, would you recommend using a (clean) field over 
> an external dictionary file? We have lots of "top queries" and measure their 
> nDCG. A thought was to programmatically generate an external file where the 
> weight per query term (or phrase) == its nDCG. Bad idea?
> 
> Best,
> Audrey
> 
> On 1/20/20, 11:51 AM, "David Hastings"  wrote:
> 
>I've used this quite a bit. My biggest piece of advice is to choose a field
>that you know is clean, with well-defined terms/words. You don't want an
>autocomplete that has a massive dictionary; it will also make the
>start/reload times pretty slow
> 
>On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
>audrey.lorberf...@ibm.com  wrote:
> 
>> Hi All,
>> 
>> We plan to incorporate a query autocomplete functionality into our search
>> engine (like this:
>> https://lucene.apache.org/solr/guide/8_1/suggester.html
>> ). And I was wondering if anyone has personal experience with this
>> component and would like to share? Basically, we are just looking for some
>> best practices from more experienced Solr admins so that we have a starting
>> place to launch this in our beta.
>> 
>> Thank you!
>> 
>> Best,
>> Audrey
>> 
> 
> 



[CVE-2019-17558] Apache Solr RCE through VelocityResponseWriter

2019-12-30 Thread Erik Hatcher
[CVE-2019-17558] Apache Solr RCE through VelocityResponseWriter

Severity: High

Vendor: The Apache Software Foundation

Versions Affected: 5.0.0 to 8.3.1

Description:
The affected versions are vulnerable to a Remote Code Execution through the
VelocityResponseWriter.  A Velocity template can be provided through
Velocity templates in a configset `velocity/` directory or as a parameter.
A user defined configset could contain renderable, potentially malicious,
templates.  Parameter provided templates are disabled by default, but can
be enabled by setting `params.resource.loader.enabled` by defining a
response writer with that setting set to `true`.  Defining a response
writer requires configuration API access.

Solr 8.4 removed the params resource loader entirely, and only enables the
configset-provided template rendering when the configset is `trusted` (has
been uploaded by an authenticated user).

Mitigation: Ensure your network settings are configured so that only
trusted traffic
communicates with Solr, especially to the configuration APIs.

Credits: Github user `s00py`

References:
  * https://cwiki.apache.org/confluence/display/solr/SolrSecurity
  * https://issues.apache.org/jira/browse/SOLR-13971
  * https://issues.apache.org/jira/browse/SOLR-14025
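For auditing, the setting the advisory describes appears in solrconfig.xml roughly like this (illustrative; on affected versions, ensure the flag is false or absent):

```xml
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy">
  <!-- true enables rendering templates passed as request parameters: the RCE vector -->
  <str name="params.resource.loader.enabled">false</str>
</queryResponseWriter>
```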


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-24 Thread Erik Hatcher
add &echoParams=all and the request parameters will be in the response 
header. 

   Erik

> On Nov 22, 2019, at 13:27, rhys J  wrote:
> 
> I'm implementing an autocomplete search box for Solr.
> 
> I'm using JSON as my response style, and this is the jquery code.
> 
> 
> var url='http://10.40.10.14:8983/solr/'+core+'/select/?q='+queryField +
> 
> query+'=2.2=true=0=50=on=json=?=on_data';
> 
> jQuery_3_4_1.getJSON(url);
> 
> ___
> 
> on_data(data)
> {
> var docs = data.response.docs;
> jQuery_3_4_1.each(docs, function(i, item) {
> 
> var trLink = '<tr><td><a href="#" onclick="local_goto_dbtr(' + item.debtor_id + '); return true;"> '
> + item.debtor_id + '</a></td>';
> 
> trLink += '<td>' + item.name1 + '</td>';
> trLink += '<td>' + item.dl1 + '</td>';
> trLink += '</tr>';
> 
> jQuery_3_4_1('#resultsTable').prepend(jQuery_3_4_1(trLink));
> });
> 
> }
> 
> the jQuery_3_4_1 variable is replacing $ because I needed to have 2
> different versions of jQuery running in the same document.
> 
> I'd like to know if there's something I'm missing that will indicate which
> core I've used in Solr based on the response.
> 
> Thanks,
> 
> Rhys


Re: using fq means no results

2019-11-12 Thread Erik Hatcher
Adding bq in there makes it query-parser specific.  But I’m being pedantic, 
since most folks are using edismax, where that applies (along with a bunch of 
other params that would also deserve mention, like boost and bf).  q and fq, 
agreed, for the explanation.  bq is worth mentioning only if its specifics and 
siblings are described too :)

> On Nov 12, 2019, at 12:16, Walter Underwood  wrote:
> 
> I explain it this way:
> 
> * fq: filtering
> * q: filtering and scoring
> * bq: scoring
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>>> On Nov 12, 2019, at 9:08 AM, Erik Hatcher  wrote:
>>> 
>>> 
>>> 
>>>> On Nov 12, 2019, at 12:01 PM, rhys J  wrote:
>>> 
>>> On Tue, Nov 12, 2019 at 11:57 AM Erik Hatcher 
>>> wrote:
>>> 
>>>> fq is a filter query, and thus narrows the result set provided by the q
>>>> down to what also matches all specified fq's.
>>>> 
>>>> 
>>> So this can be used instead of scoring? Or alongside scoring?
>> 
>> That's right.   Only `q` (and its query-parser-associated params) is used 
>> for scoring.   fq's narrow the result set, but don't influence score.
>> 
>>Erik
>> 
> 


Re: using fq means no results

2019-11-12 Thread Erik Hatcher



> On Nov 12, 2019, at 12:01 PM, rhys J  wrote:
> 
> On Tue, Nov 12, 2019 at 11:57 AM Erik Hatcher 
> wrote:
> 
>> fq is a filter query, and thus narrows the result set provided by the q
>> down to what also matches all specified fq's.
>> 
>> 
> So this can be used instead of scoring? Or alongside scoring?

That's right.   Only `q` (and its query-parser-associated params) is used for 
scoring.   fq's narrow the result set, but don't influence score.

Erik



Re: using fq means no results

2019-11-12 Thread Erik Hatcher
fq is a filter query, and thus narrows the result set provided by the q down to 
what also matches all specified fq's.

You gave it a bare fq of "clt_ref_no", which literally looks for that string in 
your default field.   Looking at your q parameter, clt_ref_no looks like a 
field name, and your fq should probably also have a value for that field (say 
fq=clt_ref_no:owl-2924-8)

Use debug=true to see how your q and fq's are parsed, and that should 
shed some light on the issue.
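Putting that together, a request whose fq names both the field and a value might look like this (escaping details aside; a sketch based on the URLs quoted below):

```
http://10.40.10.14:8983/solr/debt/select?q=clt_ref_no:owl-2924-8&fq=clt_ref_no:owl-2924-8&debug=true
```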

Erik


> On Nov 12, 2019, at 11:33 AM, rhys J  wrote:
> 
> If I do this query in the browser:
> 
> http://10.40.10.14:8983/solr/debt/select?q=(clt_ref_no:+owl-2924-8)^=1.0+clt_ref_no:owl-2924-8
> 
> I get 84662 results.
> 
> If I do this query:
> 
> http://10.40.10.14:8983/solr/debt/select?q=(clt_ref_no:+owl-2924-8)^=1.0+clt_ref_no:owl-2924-8&fq=clt_ref_no
> 
> I get 0 results.
> 
> Why does using fq do this?
> 
> What am I missing in my query?
> 
> Thanks,
> 
> Rhys



Re: Good Open Source Front End for Solr

2019-11-07 Thread Erik Hatcher
Blacklight: http://projectblacklight.org/ 

;)



> On Nov 6, 2019, at 11:16 PM, Java Developer  wrote:
> 
> Hi,
> 
> What is the best open source front-end for Solr
> 
> Thanks



Re: Security Vulnerability Consultation

2019-11-01 Thread Erik Hatcher
Hi -

There are many "vulnerabilities" that can be enabled when one has 
administrative access to Solr, with this being one example.   The setting 
mentioned defaults to false, and requires admin access to enable.

The warning from the Solr Reference Guide is worth repeating here:

>> No Solr API, including the Admin UI, is designed to be exposed to 
>> non-trusted parties. 

Turning on authentication is the first step I'd recommend.

Erik


> On Oct 31, 2019, at 11:45 PM, Huawei PSIRT  wrote:
> 
> Dear,
> 
> 
> 
>This is Huawei PSIRT. We have learned that a security researcher
>  released an
> Apache Solr RCE suspected vulnerability on October 31, 2019.
> 
>The links are as follow:
> https://meterpreter.org/unpatch-apache-solr-remote-command-execution-vulnera
> bility-alert/
> 
> https://gist.github.com/s00py/a1ba36a3689fa13759ff910e179fc133
> 
> 
> 
> We want to confirm if the issue exists. If it exists, when will the
> patches be released ?
> 
> Looking forward to your reply. Thank you.
> 
> 
> 
> Best Regards,
> 
> Huawei PSIRT
> 



Re: Solr Paryload example

2019-10-21 Thread Erik Hatcher
Yes.   The decoding of a payload based on its schema type is what the payload() 
function does.   Your Payloader won't currently work well/legibly for fields 
encoded numerically:


https://github.com/o19s/payload-component/blob/master/src/main/java/com/o19s/payloads/Payloader.java#L130
 
<https://github.com/o19s/payload-component/blob/master/src/main/java/com/o19s/payloads/Payloader.java#L130>

I think that code could probably be slightly enhanced to leverage 
PayloadUtils.getPayloadDecoder(fieldType) and use bytes if the field type 
doesn't have a better decoder. 

Erik


> On Oct 21, 2019, at 2:55 PM, Eric Pugh  
> wrote:
> 
> Have you checked out
> https://github.com/o19s/payload-component
> 
> On Mon, Oct 21, 2019 at 2:47 PM Erik Hatcher  wrote:
> 
>> How about a single field, with terms like:
>> 
>>store1_USD|125.0 store2_EUR|220.0 store3_GBP|225.0
>> 
>> Would that do the trick?
>> 
>> And yeah, payload decoding is currently limited to float and int with the
>> built-in payload() function.   We'd need a new way to pull out
>> textual/bytes payloads - like maybe a DocTransformer?
>> 
>>Erik
>> 
>> 
>>> On Oct 21, 2019, at 9:59 AM, Vincenzo D'Amore 
>> wrote:
>>> 
>>> Hi Erick,
>>> 
>>> thanks for getting back to me. We started to use payloads because we have
>>> the classical per-store pricing problem.
>>> Thousands of stores across and different prices.
>>> Then we found the payloads very useful started to use it for many
>> reasons,
>>> like enabling/disabling the product for such store, save the stock
>>> availability, or save the other info like buy/sell price, discount rates,
>>> and so on.
>>> All those information are numbers, but stores can also be in different
>>> countries, I mean would be useful also have the currency and other
>>> attributes related to the store.
>>> 
>>> Thinking about an alternative for payloads maybe I could use the dynamic
>>> fields, well, I know it is ugly.
>>> 
>>> Consider this hypothetical case where I have two field payload :
>>> 
>>> payloadPrice: [
>>> "store1|125.0",
>>> "store2|220.0",
>>> "store3|225.0"
>>> ]
>>> 
>>> payloadCurrency: [
>>> "store1|USD",
>>> "store2|EUR",
>>> "store3|GBP"
>>> ]
>>> 
>>> with dynamic fields I could have different fields for each document.
>>> 
>>> currency_store1_s: "USD"
>>> currency_store2_s: "EUR"
>>> currency_store3_s: "GBP"
>>> 
>>> But how many dynamic fields like this can I have? more than thousands?
>>> 
>>> Again, I've just started to look at solr-ocrhighlighting github project
>> you
>>> suggested.
>>> Those seems have written their own payload object type where store ocr
>>> highlighting information.
>>> It seems interesting, I'll take a look immediately.
>>> 
>>> Thanks again for your time.
>>> 
>>> Best regards,
>>> Vincenzo
>>> 
>>> 
>>> On Mon, Oct 21, 2019 at 2:55 PM Erick Erickson 
>>> wrote:
>>> 
>>>> This is one of those situations where I know a client did it, but didn’t
>>>> see the code myself.
>>>> 
>>>> So I can’t help much.
>>>> 
>>>> Perhaps a good question at this point, though, is “why do you want to
>> add
>>>> string payloads anyway”?
>>>> 
>>>> This isn’t the client, but it might give you some pointers:
>>>> 
>>>> 
>>>> 
>> https://github.com/dbmdz/solr-ocrpayload-plugin/blob/master/src/main/java/de/digitalcollections/solr/plugin/components/ocrhighlighting/OcrHighlighting.java
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>>> On Oct 21, 2019, at 6:37 AM, Vincenzo D'Amore 
>>>> wrote:
>>>>> 
>>>>> Hi Erick,
>>>>> 
>>>>> It seems I've reached a dead-point, or at least it seems looking at the
>>>>> code, it seems I can't  easily add a custom decoder:
>>>>> 
>>>>> Looking at PayloadUtils class there is getPayloadDecoder method invoked
>>>> to
>>>>> return the PayloadDecoder :
>>>>> 
>>>>> public static PayloadDecoder getPayloadDecoder(FieldType fieldType) {
>>>>>  PayloadDecoder 

Re: Solr Paryload example

2019-10-21 Thread Erik Hatcher
How about a single field, with terms like:

store1_USD|125.0 store2_EUR|220.0 store3_GBP|225.0

Would that do the trick?

And yeah, payload decoding is currently limited to float and int with the 
built-in payload() function.   We'd need a new way to pull out textual/bytes 
payloads - like maybe a DocTransformer?
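
For what it's worth, the float encoding behind the numeric case is simple enough to sanity-check client-side. A rough Python sketch of what Lucene's PayloadHelper.encodeFloat/decodeFloat do (illustration only, not a Solr API):

```python
import struct

def encode_float_payload(value: float) -> bytes:
    """Mimic Lucene's PayloadHelper.encodeFloat: 4 big-endian IEEE-754 bytes."""
    return struct.pack(">f", value)

def decode_float_payload(payload: bytes, offset: int = 0) -> float:
    """Mimic PayloadHelper.decodeFloat: read 4 big-endian bytes at offset."""
    return struct.unpack_from(">f", payload, offset)[0]

# A term like "store1_USD|125.0" indexed with a delimited payloads float
# filter ends up with the bytes for 125.0 attached to the term "store1_USD".
term, _, weight = "store1_USD|125.0".partition("|")
payload = encode_float_payload(float(weight))
print(term, decode_float_payload(payload))  # store1_USD 125.0
```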

Erik


> On Oct 21, 2019, at 9:59 AM, Vincenzo D'Amore  wrote:
> 
> Hi Erick,
> 
> thanks for getting back to me. We started to use payloads because we have
> the classical per-store pricing problem.
> Thousands of stores across and different prices.
> Then we found the payloads very useful started to use it for many reasons,
> like enabling/disabling the product for such store, save the stock
> availability, or save the other info like buy/sell price, discount rates,
> and so on.
> All those information are numbers, but stores can also be in different
> countries, I mean would be useful also have the currency and other
> attributes related to the store.
> 
> Thinking about an alternative for payloads maybe I could use the dynamic
> fields, well, I know it is ugly.
> 
> Consider this hypothetical case where I have two field payload :
> 
> payloadPrice: [
> "store1|125.0",
> "store2|220.0",
> "store3|225.0"
> ]
> 
> payloadCurrency: [
> "store1|USD",
> "store2|EUR",
> "store3|GBP"
> ]
> 
> with dynamic fields I could have different fields for each document.
> 
> currency_store1_s: "USD"
> currency_store2_s: "EUR"
> currency_store3_s: "GBP"
> 
> But how many dynamic fields like this can I have? more than thousands?
> 
> Again, I've just started to look at solr-ocrhighlighting github project you
> suggested.
> Those seems have written their own payload object type where store ocr
> highlighting information.
> It seems interesting, I'll take a look immediately.
> 
> Thanks again for your time.
> 
> Best regards,
> Vincenzo
> 
> 
> On Mon, Oct 21, 2019 at 2:55 PM Erick Erickson 
> wrote:
> 
>> This is one of those situations where I know a client did it, but didn’t
>> see the code myself.
>> 
>> So I can’t help much.
>> 
>> Perhaps a good question at this point, though, is “why do you want to add
>> string payloads anyway”?
>> 
>> This isn’t the client, but it might give you some pointers:
>> 
>> 
>> https://github.com/dbmdz/solr-ocrpayload-plugin/blob/master/src/main/java/de/digitalcollections/solr/plugin/components/ocrhighlighting/OcrHighlighting.java
>> 
>> Best,
>> Erick
>> 
>>> On Oct 21, 2019, at 6:37 AM, Vincenzo D'Amore 
>> wrote:
>>> 
>>> Hi Erick,
>>> 
>>> It seems I've reached a dead-point, or at least it seems looking at the
>>> code, it seems I can't  easily add a custom decoder:
>>> 
>>> Looking at PayloadUtils class there is getPayloadDecoder method invoked
>> to
>>> return the PayloadDecoder :
>>> 
>>> public static PayloadDecoder getPayloadDecoder(FieldType fieldType) {
>>>   PayloadDecoder decoder = null;
>>> 
>>>   String encoder = getPayloadEncoder(fieldType);
>>> 
>>>   if ("integer".equals(encoder)) {
>>> decoder = (BytesRef payload) -> payload == null ? 1 :
>>> PayloadHelper.decodeInt(payload.bytes, payload.offset);
>>>   }
>>>   if ("float".equals(encoder)) {
>>> decoder = (BytesRef payload) -> payload == null ? 1 :
>>> PayloadHelper.decodeFloat(payload.bytes, payload.offset);
>>>   }
>>>   // encoder could be "identity" at this point, in the case of
>>> DelimitedTokenFilterFactory encoder="identity"
>>> 
>>>   // TODO: support pluggable payload decoders?
>>> 
>>>   return decoder;
>>> }
>>> 
>>> Any advice to work around this situation?
>>> 
>>> 
>>> On Mon, Oct 21, 2019 at 1:51 AM Erick Erickson 
>>> wrote:
>>> 
 You’d need to write one. Payloads are generally intended to hold
>> numerics
 you can then use in a function query to factor into the score…
 
 Best,
 Erick
 
> On Oct 20, 2019, at 4:57 PM, Vincenzo D'Amore 
 wrote:
> 
> Sorry, I just realized that I was wrong in how I'm using the payload
> function.
> Give that the payload function only handles a numeric (integer or
>> float)
> payload, could you suggest me an alternative function that handles
 strings?
> If not, should I write one?
> 
> On Sun, Oct 20, 2019 at 10:43 PM Vincenzo D'Amore 
> wrote:
> 
>> Hi all,
>> 
>> I'm trying to understand what I did wrong with a payload query that
>> returns
>> 
>> error: {
>> metadata: [ "error-class", "org.apache.solr.common.SolrException",
>> "root-error-class", "org.apache.solr.common.SolrException" ],
>> msg: "No payload decoder found for field: colorCode",
>> code: 400
>> }
>> 
>> I have reduced my problem in a little sample to show what happens to
>> me.
>> Basically I have a document with a couple of payload fields one
>> delimited_payloads_string and one delimited_payloads_integer
>> 
>> {
>> field_dps: "key|data",
>> field_dpi: "key|1",
>> }
>> 
>> When I execute this query solr 

Re: solr.HTMLStripCharFilterFactory issue

2019-09-02 Thread Erik Hatcher
Analysis has no effect on the stored (what you get back from fl) value.   The 
html stripping is happening behind the scenes on the indexed/searchable terms. 
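
If you need the stored value itself to come back tag-free, strip the markup on the client before indexing. A minimal sketch with Python's stdlib parser (illustrative only; HTMLStripCharFilter's exact behavior differs in edge cases):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text content, dropping all tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        return "".join(self.parts)

def strip_html(markup: str) -> str:
    p = TextExtractor()
    p.feed(markup)
    p.close()
    return p.text()

print(strip_html("<p>Hello <b>world</b></p>"))  # Hello world
```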

 Erik

> On Sep 2, 2019, at 09:30, Big Gosh  wrote:
> 
> Hi,
> 
> I've configured in solr 8.2.0 a field type as follows:
> 
>  positionIncrementGap="100" multiValued="true">
>  
>
>
> words="stopwords.txt" />
>
>
>  
>  
>
> words="stopwords.txt" />
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>
>  
>
> 
> I expected that the search returns the field stripped, instead HTML tags
> are still in the field.
> 
> Is this correct or I made a mistake in configuration
> 
> I'm quite sure in the past I used this approach to strip html from the text
> 
> Thanks in advance


Re: modify query response plugin

2019-08-06 Thread Erik Hatcher
I think you’re looking for the Solr Tagger, described here: 
https://lucidworks.com/post/solr-tagger-improving-relevancy/

> On Aug 6, 2019, at 16:04, Maria Muslea  wrote:
> 
> Hi,
> 
> I am trying to implement a plugin that will modify my query response. For
> example, I would like to execute a query that will return something like:
> 
> {...
> "description":"flights at LAX",
> "highlight":"airport;11;3"
> ...}
> This is information that I have in my document, so I can return it.
> 
> Now, I would like the plugin to intercept the result, do some processing on
> it, and return something like:
> 
> {...
> "description":"flights at LAX",
> "highlight":{
>   "concept":"airport",
>   "description":"flights at LAX"
> ...}
> 
> I looked at some RequestHandler implementations, but I can't find any
> sample code that would help me with this. Would this type of plugin be
> handled by a RequestHandler? Could you maybe point me to a sample plugin
> that does something similar?
> 
> I would really appreciate your help.
> 
> Thank you,
> Maria


Re: Ranking

2019-07-27 Thread Erik Hatcher
The details of the scoring can be seen by setting debugQuery=true 

Erik 

> On Jul 27, 2019, at 15:40, Steven White  wrote:
> 
> Hi everyone,
> 
> I have 2 files like so:
> 
> FA has the letter "i" only 2 times, and the file size is 54,246 bytes
> FB has the letter "i" 362 times and the file size is 9,953
> 
> When I search on the letter "i" FB is ranked lower which confuses me
> because I was under the impression the occurrences of the term in a
> document and the document size is a factor as such I was expecting FB to
> rank higher.  Did I get this right?  If not, what's causing FB to rank
> lower?
> 
> I'm on Solr 8.1
> 
> Thanks
> 
> Steven


Re: Boosting using Range

2019-05-31 Thread Erik Hatcher
Sachin - that’s a confusing name for a field that represents a price and not a 
“range”, but OK, use the first one with your field name: 
bq=price_range:[10 TO 25]

My bad below saying “boost” (takes a function, not a raw query).   Use “bq”, 
which takes a regular query. 

Erik

> On May 31, 2019, at 01:26, sachin gk  wrote:
> 
> Hi Erik,
> 
> We have indexed it as a double and has individual value Eg Price_Range: 10.
> 
>> On Thu, 30 May 2019 at 23:34, Erik Hatcher  wrote:
>> 
>> The simplest given your example, with edismax add boost=price:[10 TO 25]
>> 
>> Or you literally have a price_range field?   boost=price_range:10_25
>> (assuming that's how you indexed it).   What type of field is price_range?
>> What did you index into it?
>> 
>>Erik
>> 
>> 
>>> On May 30, 2019, at 1:24 PM, sachin gk  wrote:
>>> 
>>> Hi All,
>>> 
>>> I am trying to boost solr documents using the range attribute as
>> mentioned
>>> below.
>>> 
>>> price_range: 10 25  is it possible, if so how to form a query.
>>> 
>>> --
>>> Regards,
>>> Sachin
>> 
>> 
> 
> -- 
> Regards,
> Sachin


Re: Boosting using Range

2019-05-30 Thread Erik Hatcher
The simplest given your example, with edismax add boost=price:[10 TO 25]

Or you literally have a price_range field?   boost=price_range:10_25 
(assuming that's how you indexed it).   What type of field is price_range?   
What did you index into it?

Erik


> On May 30, 2019, at 1:24 PM, sachin gk  wrote:
> 
> Hi All,
> 
> I am trying to boost solr documents using the range attribute as mentioned
> below.
> 
> price_range: 10 25  is it possible, if so how to form a query.
> 
> -- 
> Regards,
> Sachin



Re: Retrieving docs in the same order as provided in the query

2019-05-09 Thread Erik Hatcher
So yeah, this constant score trick isn't meant for a "large" list, of course.

For bigger result sets, the ExternalFileField feature would be one way to go.   
Or maybe the QueryElevationComponent.   And I suppose also, the LTR feature 
could be abused for this?

Erik



> On May 9, 2019, at 9:35 AM, Atita Arora  wrote:
> 
> Sure,
> I can give this a shot! Hope it works out well for bigger resultsets too :)
> 
> Big Thanks, Erik :)
> 
> 
> 
> On Thu, May 9, 2019 at 3:20 PM Erik Hatcher  wrote:
> 
>> Atita -
>> 
>> You mean something like q=id:(X Y Z) to be able to order them arbitrarily?
>> 
>> Yes, you can use the constant score query syntax to set the score, e.g.:
>> 
>>   q=id:Z^=3 OR id:Y^=2 OR id:X^=1
>> 
>> Hope that helps.
>> 
>>Erik
>> 
>> 
>>> On May 9, 2019, at 8:55 AM, Atita Arora  wrote:
>>> 
>>> Hi,
>>> 
>>> Is there someway way to retrieve the docs in the same order as queried in
>>> the solr query?
>>> 
>>> I am aware of leveraging bq for this and have even tried overriding
>> custom
>>> similarity to achieve this but I am looking for something simpler.
>>> 
>>> Please enlighten me.
>>> 
>>> Best Regards,
>>> Atita
>> 
>> 



Re: Retrieving docs in the same order as provided in the query

2019-05-09 Thread Erik Hatcher
Atita -

You mean something like q=id:(X Y Z) to be able to order them arbitrarily?

Yes, you can use the constant score query syntax to set the score, e.g.:

   q=id:Z^=3 OR id:Y^=2 OR id:X^=1

Hope that helps.
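
For what it's worth, generating that clause from an arbitrary ordered list is a one-liner; a sketch (hypothetical helper, not a Solr API):

```python
def ordered_ids_query(ids, field="id"):
    """Build a constant-score clause whose scores preserve the input order."""
    n = len(ids)
    # Highest constant score for the first id, descending from there.
    return " OR ".join(
        f"{field}:{doc_id}^={n - i}" for i, doc_id in enumerate(ids)
    )

print(ordered_ids_query(["Z", "Y", "X"]))
# id:Z^=3 OR id:Y^=2 OR id:X^=1
```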

Erik


> On May 9, 2019, at 8:55 AM, Atita Arora  wrote:
> 
> Hi,
> 
> Is there someway way to retrieve the docs in the same order as queried in
> the solr query?
> 
> I am aware of leveraging bq for this and have even tried overriding custom
> similarity to achieve this but I am looking for something simpler.
> 
> Please enlighten me.
> 
> Best Regards,
> Atita



Re: Custom post filter with support for 'OR' queries

2019-05-05 Thread Erik Hatcher
Can you detail your actual querying need here?   You’re down into some trenches 
with PostFilter, which is designed purely as an AND-like filtering mechanism; 
ORing with it runs contrary to that, generally speaking.  

Let’s see the real data and the actual need, to work out the best way to tackle it.  

Also, with nested qparsers, the curly brackets need to fully enclose the 
query, not leave the expression outside the brackets ambiguously.   Use 
{!myparser v=$my_q} where my_q=the expression.  

Erik 

> On May 5, 2019, at 11:07, alexpusch  wrote:
> 
> Hi, 
> 
> I'm trying to write my own custom post filter. I'm following the following
> guide -
> http://qaware.blogspot.com/2014/11/how-to-write-postfilter-for-solr-49.html
> 
> My implementation works for a simple query:
> {!myFilter}query
> 
> But I need to perform OR queries in addition to my post filter:
> field:value OR {!myFilter}query
> 
> I'm getting the follow error: 
> java.lang.UnsupportedOperationException: Query {!cache=false cost=100} does
> not implement createWeight
> 
> As I only want this queryParser to only run on results on a post filter
> manner, I presume I do not need or can implement createWeight.
> 
> Can a post filter be applied like this? Or should I look for a different
> approach?
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Term Freq Vector with SOLR cell?

2019-05-01 Thread Erik Hatcher
q=doc_content?   Try q=id:"<your document id>"

Solr Cell and DIH are comparable (in that they are about getting content into 
Solr) but "unrelated" to TVRH.   TVRH is about inspecting indexed content, 
regardless of how it got in.

Erik


> On May 1, 2019, at 3:14 PM, Geoffrey Willis  
> wrote:
> 
> I am using Solr in a web app to extract text from .pdf, and docx files. I was 
> wondering if I can access the TermFreq and TermPosition vectors via the HTTP 
> interface exposed by Solr Cell. I’m posting/getting documents fine, I’ve 
> enabled the TV, TFV etc in the managed schema:
> 
>  stored="true" termPayloads="true" termPositions="true" termVectors="true”/>
> 
> And use a get request similar to :
> 
>   
> http://localhost:8983/solr/myCore/tvrh?q=doc_content=true=true=true=true=true
>  s=true=includes
> 
> When I look in the browser network tab, I see that the query went in as 
> expected with tv=true, tv.positions= true etc. But no Term Positions/Offsets 
> in the results. I’ve done similar using the Data Import Handler with java, 
> but looking for a web solution. Before I “Roll my own” Term Vector, thought 
> I’d see if it’s available from Solr Cell.



Re: bin/post command not working when run from crontab

2019-04-18 Thread Erik Hatcher
Jason - thanks for replying 

and I concur, it makes sense to open a JIRA for this.I'm glad there is 
an acceptable workaround, at least.

I recall doing a fair bit of trial and error, asking 'nix folk and 
stackoverflow how to handle this stdin situation and honing in on what's there 
now.   But it's obviously weirdly broken, sorry.  

Erik


> On Apr 14, 2019, at 8:30 AM, Jason Gerlowski  wrote:
> 
> Hi Carsten,
> 
> I think this is probably worth a jira.  I'm not familiar enough with
> bin/post to say definitively whether the behavior you mention is a
> bug, or whether it's "expected" in some odd sense.  But there's enough
> uncertainty that I think it's worth recording there.
> 
> Best,
> 
> Jason
> 
> On Fri, Apr 12, 2019 at 5:52 AM Carsten Agger  wrote:
>> 
>> Hi all
>> 
>> I posted the question below some time back, concerning the unusual
>> behaviour of bin/post if there is no stdin.
>> 
>> There has been no comments to that, and maybe bin/post is quaint in that
>> regard - I ended up changing my application to POST directly on the Web
>> endpoint instead.
>> 
>> But I do have one question, though: Should this be considered a bug, and
>> should I report it as such? Unfortunately I don't have the time to
>> prepare a proper fix myself.
>> 
>> Best
>> Carsten
>> 
>> On 3/27/19 7:55 AM, Carsten Agger wrote:
>>> I'm working with a script where I want to send a command to delete all
>>> elements in an index; notably,
>>> 
>>> 
>>> /opt/solr/bin/post -c  -d  
>>> "*:*"
>>> 
>>> 
>>> When run interactively, this works fine.
>>> 
>>> However, when run automatically as a cron job, it gives this interesting
>>> output:
>>> 
>>> 
>>> Unrecognized argument:   "*:*"
>>> 
>>> If this was intended to be a data file, it does not exist relative to /root
>>> 
>>> The culprit seems to be these lines, 143-148:
>>> 
>>> if [[ ! -t 0 ]]; then
>>>   MODE="stdin"
>>> else
>>>   # when no stdin exists and -d specified, the rest of the arguments
>>>   # are assumed to be strings to post as-is
>>>   MODE="args"
>>> 
>>> This code seems to be doing the opposite of what the comment says - it
>>> sets MODE="stdin" if stdin is NOT a terminal, but if it IS (i.e., there
>>> IS an stdin) it assumes the rest of the args can be posted as-is.
>>> 
>>> On the other hand, if the condition is reversed, my command will fail
>>> interactively but not when run as a cron job. Both options are, of
>>> course, unsatisfactory.
>>> 
>>> It /will/ actually work in both cases, if instead the command to delete
>>> the contents of the index is written as:
>>> 
>>> echo "*:*" |  /opt/solr/bin/post -c 
>>> departments -d
>>> 
>>> 
>>> I've seen this bug in SOLR 7.5.0 and 7.7.1. Should I report it as a bug
>>> or is there an easy explanation?
>>> 
>>> 
>>> Best
>>> 
>>> Carsten Agger
>>> 
>>> 
>> --
>> Carsten Agger
>> 
>> Chief Technologist
>> Magenta ApS
>> Skt. Johannes Allé 2
>> 8000 Århus C
>> 
>> Tlf  +45 5060 1476
>> http://www.magenta-aps.dk
>> carst...@magenta-aps.dk
>> 



Re: Understanding Performance of Function Query

2019-04-09 Thread Erik Hatcher
maybe something like q=

({!edismax  v=$q1} OR {!edismax  v=$q2} OR {!edismax ... v=$q3})

 and setting q1, q2, q3 as needed (or all to the same maybe with different qf’s 
and such)
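
Assembled as request parameters it'd look something like this sketch (the qf fields title/body/keywords are made up for illustration):

```python
from urllib.parse import urlencode

# One outer boolean query over three parameterized edismax subqueries,
# each with its own qf.
params = {
    "q": "({!edismax qf=title v=$q1} OR {!edismax qf=body v=$q2} "
         "OR {!edismax qf=keywords v=$q3})",
    "q1": "big brown bear",
    "q2": "big brown bear",
    "q3": "big brown bear",
}
query_string = urlencode(params)
print("q1=big+brown+bear" in query_string)  # True
```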

  Erik

> On Apr 9, 2019, at 09:12, sidharth228  wrote:
> 
> I did infact use "bf" parameter for individual edismax queries. 
> 
> However, the reason I can't condense these edismax queries into a single
> edismax query is because each of them uses different fields in "qf". 
> 
> Basically what I'm trying to do is this: each of these edismax queries (q1,
> q2, q3) has a logic, and scores docs using it. I am then trying to combine
> the scores (to get an overall score) from these scores later by summing
> them.
> 
> What options do I have of implementing this?
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Understanding Performance of Function Query

2019-04-09 Thread Erik Hatcher
Function queries in ‘q’ score EVERY DOCUMENT.   Use ‘bf’ or ‘boost’ for the 
function part, so it’s only computed on docs matching the main query.  
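
Said as parameters, roughly (the boost function and field names here are illustrative, not from your query):

```python
# Slow: the function IS the query, so it is evaluated for every doc in the index.
slow = {"q": "{!func}sum($q1,$q2,$q3)"}

# Faster: match first, then apply the function only to matching docs.
# (edismax's multiplicative "boost"; "bf" is the additive equivalent.)
fast = {
    "defType": "edismax",
    "q": "big brown bear",                     # the actual match
    "boost": "sum(popularity,recency_score)",  # illustrative function/fields
}
print(fast["boost"])
```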

Erik

> On Apr 9, 2019, at 03:29, Sidharth Negi  wrote:
> 
> Hi,
> 
> I'm working with "edismax" and "function-query" parsers in Solr and have
> difficulty in understanding whether the query time taken by
> "function-query" makes sense. The query I'm trying to optimize looks as
> follows:
> 
> q={!func sum($q1,$q2,$q3)} where q1,q2,q3 are edismax queries.
> 
> The QTime returned by edismax queries takes well under 50ms but it seems
> that function-query is the rate determining step since combined query above
> takes around 200-300ms. I also analyzed the performance of function query
> using only constants.
> 
> The QTime results for different q are as follows:
> 
>   -
> 
>   097ms for q={!func} sum(10,20)
>   -
> 
>   109ms for q={!func} sum(10,20,30)
>   -
> 
>   127ms for q={!func} sum(10,20,30,40)
>   -
> 
>   145ms for q={!func} sum(10,20,30,40,50)
> 
> Does this trend make sense? Are function-queries expected to be this slow?
> 
> What makes edismax queries so much faster?
> 
> What can I do to optimize my original query (which has edismax subqueries
> q1,q2,q3) to work under 100ms?
> 
> I originally posted this question
> 
> on
> StackOverflow with no success, so any help here would be appreciated.


Re: Behavior of Function Query

2019-03-19 Thread Erik Hatcher
Try adding fl=* into the request.   There’s an oddity with fl, iirc, where it 
can skip functions if * isn’t there (or maybe a concrete non-score field?)

   Erik

> On Mar 18, 2019, at 10:19, Ashish Bisht  wrote:
> 
> Please see the below requests and response
> 
> http://Sol:8983/solr/SCSpell/select?q="internet of things"&defType=edismax&qf=spellcontent&wt=json&rows=1&fl=score,internet_of_things:query({!edismax v='"internet of things"'}),instant_of_things:query({!edismax v='"instant of things"'})
> 
> 
> Response contains score from function query
> 
> "fl":"score,internet_of_things:query({!edismax v='\"internet of
> things\"'}),instant_of_things:query({!edismax v='\"instant of things\"'})",
>  "rows":"1",
>  "wt":"json"}},
>  "response":{"numFound":851,"start":0,"maxScore":7.6176834,"docs":[
>  {
>"score":7.6176834,
>   * "internet_of_things":7.6176834*}]
>  }}
> 
> 
> But if in the same request q is changed,it doesn't give score
> 
> http://Sol-1:8983/solr/SCSpell/select?q="wall street"&defType=edismax&qf=spellcontent&wt=json&rows=1&fl=score,internet_of_things:query({!edismax v='"internet of things"'}),instant_of_things:query({!edismax v='"instant of things"'})
> 
>   "q":"\"wall street\"",
>  "defType":"edismax",
>  "qf":"spellcontent",
>  "fl":"score,internet_of_things:query({!edismax v='\"internet of
> things\"'}),instant_of_things:query({!edismax v='\"instant of things\"'})",
>  "rows":"1",
>  "wt":"json"}},
>  "response":{"numFound":46,"start":0,"maxScore":15.670144,"docs":[
>  {
>"score":15.670144}]
>  }}
> 
> 
> Why score of function query is getting applied when q is a different.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Different behavior when using function queries

2019-03-18 Thread Erik Hatcher
If you have no documents in the results, there’s nothing to attach the function 
result to.`fl` is field list of fields to show in matched documents.   You 
have no matches documents. 

Erik

> On Mar 18, 2019, at 07:55, Ashish Bisht  wrote:
> 
> Can someone please explain the below behavior.For different q parameter
> function query response differs although function queries are same
> 
> http://:8983/solr/SCSpell/select?q="market place"&defType=edismax&qf=spellcontent&wt=json&rows=1&fl=internet_of_things:if(exists(query({!edismax v='"internet of things"'})),true,false),instant_of_things:if(exists(query({!edismax v='"instant of things"'})),true,false)
> 
> Response contains function query results
> 
> "response":{"numFound":80,"start":0,"docs":[
>  {
>"internet_of_things":false,
>"instant_of_things":false}]
>  }}
> 
> wheras for different q
> 
> http://:8983/solr/SCSpell/select?q="intent of things"&defType=edismax&qf=spellcontent&wt=json&rows=1&fl=internet_of_things:if(exists(query({!edismax v='"internet of things"'})),true,false),instant_of_things:if(exists(query({!edismax v='"instant of things"'})),true,false)
> 
> Response doesnot contain function query results
> 
> "response":{"numFound":0,"start":0,"docs":[]
>  }}
> 
> 
> From the results it looks like if the results of q doesn't yield result
> function queries don't work.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Is it possible to force solr show all facet values for the field with an enum type?

2019-01-06 Thread Erik Hatcher
How about fq=-field:[* TO *] as a way to see a count of docs that 
don’t have the field?

Erik 

> On Jan 5, 2019, at 04:45, Arvydas Silanskas  
> wrote:
> 
> Hello,
> I have an enum solr fieldtype. When I do a facet search, I want that all
> the enum values appear in the facet -- and setting field.mincount = 0 is
> not enough. It only works, if there exist a document with the matching
> value for the field, but it was filtered out by current query (and then I'm
> returned that facet value with the count 0). But can I make it to also
> return the values that literally none of the documents in the index have?
> The values, that only appear in the enum declaration xml.


Re: boost query

2018-12-07 Thread Erik Hatcher
Only way to know is to try!   ;)

You have a typo on “noika”.I’d use ‘bf’ instead of bq so as to specify the 
function without the _val_ stuff. 

  Erik

> On Dec 7, 2018, at 02:19, Midas A  wrote:
> 
> Thanks Erik.
> Please confirm
> if keyword = "nokia"
> bq=_val_:%22payload(vals_dpf,noika)%22&defType=edismax
> Will this query work for me?
> 
> 
> 
> 
> 
>> On Fri, Dec 7, 2018 at 12:12 PM Erik Hatcher  wrote:
>> 
>> This blog I wrote will help.   Let us know how it goes.
>> 
>> https://lucidworks.com/2017/09/14/solr-payloads/
>> 
>>   Erik
>> 
>>> On Dec 7, 2018, at 01:31, Midas A  wrote:
>>> 
>>> I have a field at my schema named  *val_dpf* . I want that *val_dpf*
>> should
>>> have payloaded values. i.e.
>>> 
>>> noika|0.46  mobile|0.37  samsung|0.19 redmi|0.22
>>> 
>>> When a user searches for a keyword i.e. nokia I want to add 0.46 to usual
>>> score. If user searches for samsung, 0.19 should be added .
>>> 
>>> how can i achieve this .
>> 


Re: boost query

2018-12-06 Thread Erik Hatcher
This blog I wrote will help.   Let us know how it goes.  

 https://lucidworks.com/2017/09/14/solr-payloads/
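
To illustrate what payload(val_dpf, keyword) does with such a field, here's a rough client-side simulation (not how Solr executes it, just the lookup semantics):

```python
def payload_weight(field_value: str, keyword: str, default: float = 0.0) -> float:
    """Look up the payload attached to `keyword`, as payload(field, keyword) would."""
    for token in field_value.split():
        term, _, weight = token.partition("|")
        if term == keyword:
            return float(weight)
    return default

# The indexed field value, with per-term weights after the delimiter.
vals_dpf = "noika|0.46 mobile|0.37 samsung|0.19 redmi|0.22"
print(payload_weight(vals_dpf, "samsung"))  # 0.19
print(payload_weight(vals_dpf, "iphone"))   # 0.0
```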

   Erik

> On Dec 7, 2018, at 01:31, Midas A  wrote:
> 
> I have a field at my schema named  *val_dpf* . I want that *val_dpf* should
> have payloaded values. i.e.
> 
> noika|0.46  mobile|0.37  samsung|0.19 redmi|0.22
> 
> When a user searches for a keyword i.e. nokia I want to add 0.46 to usual
> score. If user searches for samsung, 0.19 should be added .
> 
> how can i achieve this .


Re: Date Query Using Local Params

2018-09-10 Thread Erik Hatcher
When using the {!...} syntax, and combining it with other clauses, the 
expression parsed needs to come from a local-param `v` parameter (otherwise, 
without `v`, the parser eats the rest of the string after the closing curly 
bracket).  So you could do something like this:


q={!field f=collection_date_range op=Within v='[2013-07-08 TO 2013-07-09]'} 
OR {!field
f=collection_date_range op=Within v='[2013-07-21 TO 2013-07-25]'}

Or you could do this sort of thing, which allows the date ranges to be 
parameterized:

q={!field f=collection_date_range op=Within v=$range1} OR {!field
f=collection_date_range op=Within v=$range2}
  range1=[2013-07-08 TO 2013-07-09]
  range2=[2013-07-21 TO 2013-07-25]
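
If you're building those range strings programmatically, a small sketch (client-side, illustrative):

```python
from datetime import date

def solr_date_range(start: date, end: date) -> str:
    """Render an inclusive [start TO end] range in DateRangeField syntax."""
    return f"[{start.isoformat()} TO {end.isoformat()}]"

params = {
    "q": "{!field f=collection_date_range op=Within v=$range1} OR "
         "{!field f=collection_date_range op=Within v=$range2}",
    "range1": solr_date_range(date(2013, 7, 8), date(2013, 7, 9)),
    "range2": solr_date_range(date(2013, 7, 21), date(2013, 7, 25)),
}
print(params["range1"])  # [2013-07-08 TO 2013-07-09]
```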

Erik





> On Sep 10, 2018, at 3:59 PM, Antelmo Aguilar  wrote:
> 
> Hi Shawn,
> 
> Thank you.  So just to confirm, there is no way for me to use an OR
> operator with also using the "within" op parameter described in the bottom
> of this page?
> 
> https://lucene.apache.org/solr/guide/6_6/working-with-dates.html#WorkingwithDates-MoreDateRangeFieldDetails
> 
> I appreciate your resposne.
> 
> Best,
> Antelmo
> 
> On Mon, Sep 10, 2018 at 3:51 PM, Shawn Heisey  wrote:
> 
>> On 9/10/2018 1:21 PM, Antelmo Aguilar wrote:
>> 
>>> Hi,
>>> 
>>> I have a question.  I am trying to use the "within" op parameter in a Date
>>> Search.  This works like I would expect: {!field f=collection_date_range
>>> op=Within}[2013-07-08 TO 2013-07-09]
>>> 
>>> I would like to use an OR with the query though, something like this:
>>> {!field
>>> f=collection_date_range op=Within}[2013-07-08 TO 2013-07-09] OR {!field
>>> f=collection_date_range op=Within}[2013-07-21 TO 2013-07-25]
>>> 
>>> However, I tried different approaches and none of them worked.  Is there a
>>> way of doing something like this for querying dates using the "within" op
>>> parameter?
>>> 
>> 
>> I don't think the field parser can do this.  Also, usually it's not
>> possible to use localparams in a second query clause like that --
>> localparams must almost always be the very first thing in the "q"
>> parameter, or they will not be interpreted as localparams.  Use the
>> standard (lucene) parser without localparams.  The q parameter should look
>> like this:
>> 
>> collection_date_range:[2013-07-08 TO 2013-07-09] OR
>> collection_date_range:[2013-07-21 TO 2013-07-25]
>> 
>> If the default operator hasn't been changed (which would mean it is using
>> OR), then you could remove the "OR" from that.
>> 
>> Thanks,
>> Shawn
>> 
>> 



Re: How long does a query?q=field1:2312 should cost? exactly hit one document.

2018-09-03 Thread Erik Hatcher
Add debug=true and see where the time goes, in which components? 

Highlighting is my culprit guess.   Or faceting?

> On Sep 3, 2018, at 07:45, zhenyuan wei  wrote:
> 
> Hi ,
>   I am curious “How long does a  query q=field1:2312 cost ,   which
> exactly match only one document? ”,  Of course we just discuss  no
> queryResultCache with match in this situation.
>   In fact  my QTime is  150ms+, it is too long.


Re: Can I use RegEx function?

2018-07-23 Thread Erik Hatcher
this is best done at index-time.   (it seems like you're trying to avoid doing 
that though)
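
A sketch of that index-time approach: extract the KEY:VALUE pairs before sending the doc to Solr and index them as their own fields (the _ss dynamic-field suffix is the usual multivalued-string convention; adjust to your schema):

```python
import re

# Matches simple KEY:VALUE tokens embedded in free text.
PAIR = re.compile(r"\b([A-Za-z_]+):(\w+)")

def extract_pairs(text: str) -> dict:
    """Collect KEY:VALUE pairs found in free text, keyed for faceting."""
    pairs = {}
    for key, value in PAIR.findall(text):
        pairs.setdefault(key.lower() + "_ss", []).append(value)
    return pairs

doc_text = "shipment delayed STATUS:LATE CARRIER:DHL weight ok"
print(extract_pairs(doc_text))
# {'status_ss': ['LATE'], 'carrier_ss': ['DHL']}
```

Then facet on the extracted field (e.g. facet.field=status_ss) instead of regexing at query time.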



> On Jul 23, 2018, at 5:36 AM, Peter Sh  wrote:
> 
> I want to be able to parse "KEY:VALUE" pairs from my text and have a facet
> representing distribution of VALUES
> 
> On Mon, Jul 23, 2018 at 12:25 PM Markus Jelsma 
> wrote:
> 
>> Hello,
>> 
>> Neither fl nor facet.field support functions, but facet.query is analogous
>> to the latter. I do not understand what you need/want with fl and regex.
>> 
>> Regards,
>> Markus
>> 
>> 
>> 
>> -Original message-
>>> From:Peter Sh 
>>> Sent: Monday 23rd July 2018 11:21
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Can I use RegEx function?
>>> 
>>> Can I use it in "fl" and  "facet.field" as a function
>>> 
>>> On Mon, Jul 23, 2018 at 11:33 AM Markus Jelsma <
>> markus.jel...@openindex.io>
>>> wrote:
>>> 
 Hello,
 
 The usual faceting works for all queries, facet.query=q:field:/[a-z]+$/
 will probably work too, i would be really surprised if it didn't. Keep
>> in
 mind that my example doesn't work, the + needs to be URL encoded!
 
 Regards,
 Markus
 
 
 
 -Original message-
> From:Peter Sh 
> Sent: Monday 23rd July 2018 10:26
> To: solr-user@lucene.apache.org
> Subject: Re: Can I use RegEx function?
> 
> can it be used in facets?
> 
> On Mon, Jul 23, 2018, 11:24 Markus Jelsma <
>> markus.jel...@openindex.io>
> wrote:
> 
>> Hello,
>> 
>> It is not really obvious in documentation, but the standard query
 parser
>> supports regular expressions. Encapsulate your regex with forward
 slashes
>> /, q=field:/[a-z]+$/ will work.
>> 
>> Regards,
>> Markus
>> 
>> 
>> 
>> -Original message-
>>> From:Peter Sh 
>>> Sent: Monday 23rd July 2018 10:09
>>> To: solr-user@lucene.apache.org
>>> Subject: Can I use RegEx function?
>>> 
>>> I've got collection with a string or text field storing
>> free-text.
 I'd
>> like
>>> to use some RexEx function looking for patterns like "KEY:VALUE"
 from the
>>> text and use it for filtering and faceting.
>>> 
>> 
> 
 
>>> 
>> 



Re: How to avoid join queries

2018-06-13 Thread Erik Hatcher



> On Jun 13, 2018, at 4:24 PM, root23  wrote:

...

>  But i
> know use of join is discouraged in solr and i do not want to use it.

…

Why do you say that?   I, for one, find great power and joy using `{!join}`.
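For what it's worth, a typical {!join} usage is a single filter-query string; a sketch with placeholder field names (the join runs the inner query, collects `from` field values from its matches, and keeps docs whose `to` field holds one of them):

```python
def join_filter(from_field, to_field, inner_query):
    """Build a {!join} filter: run inner_query, collect from_field values
    from its matches, and keep docs whose to_field holds one of them."""
    return "{!join from=%s to=%s}%s" % (from_field, to_field, inner_query)

# Placeholder fields: keep child docs whose parent (matched by the inner
# query) links to them via parent_id -> id.
fq = join_filter("id", "parent_id", "type:approved_parent")
```

Passed as an fq, this constrains results without any client-side stitching of two result sets.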

Erik



Re: How to find out which search terms have matches in a search

2018-06-12 Thread Erik Hatcher
Derek -

One trick I like to do is try various forms of a query all in one go.   With 
facet=on, you can:

  facet.query=big brown bear
  facet.query=big brown
  facet.query=brown bear
  facet.query=big
  facet.query=brown
  facet.query=bear

The returned counts give you an indication of what queries matched docs in the 
result set, and which didn’t.   If you did this with q=*:* you’d see how each 
of those matched across the entire collection.   
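Interpreting the counts programmatically is a one-liner over the facet_queries section of the response. A sketch with a mocked response (the counts are made up; real ones would come back from Solr):

```python
def matched_variants(facet_counts):
    """Given the facet_queries section of a Solr response, return which
    query variants matched at least one document in the result set."""
    return [q for q, count in facet_counts.items() if count > 0]

# Mocked facet_queries section of a Solr JSON response for the
# facet.query variants above.
facet_queries = {
    "big brown bear": 0,
    "big brown": 3,
    "big": 12,
    "brown": 7,
    "bear": 0,
}

hits = matched_variants(facet_queries)
```

Here "bear" matched nothing in the result set, which is exactly the per-term signal Derek is after.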

Grouping and group.query could be used similarly.  

I’ve used facet.query to do some Venn diagramming of overlap of search results 
like that.   An oldie but a goodie: 
https://www.slideshare.net/lucenerevolution/hatcher-erik-rapid-prototyping-with-solr/12
 
<https://www.slideshare.net/lucenerevolution/hatcher-erik-rapid-prototyping-with-solr/12>

4.10.4?   woah

    Erik Hatcher
Senior Solutions Architect, Lucidworks.com


> On Jun 11, 2018, at 11:16 PM, Derek Poh  wrote:
> 
> Hi
> 
> How can I find out which search terms have matches in a search?
> 
> Eg.
> The search terms are "big brown bear". And only "big" and "brown" have matches 
> in the search result.
> Can Solr return this information that "big" and "brown" have matches in the 
> search result?
> I want to use this information to display on the search result page that "big" 
> and "brown" have matches.
> Something like "big brown bear".
> 
> Am using solr 4.10.4.
> 
> Derek
> 
> --
> CONFIDENTIALITY NOTICE 
> This e-mail (including any attachments) may contain confidential and/or 
> privileged information. If you are not the intended recipient or have 
> received this e-mail in error, please inform the sender immediately and 
> delete this e-mail (including any attachments) from your computer, and you 
> must not use, disclose to anyone else or copy this e-mail (including any 
> attachments), whether in whole or in part. 
> This e-mail and any reply to it may be monitored for security, legal, 
> regulatory compliance and/or other appropriate reasons.



Re: indexer used in solr

2018-06-11 Thread Erik Hatcher
Vivek -

Can you provide us specific examples of what you’re sending in (and how you are 
doing so) and how you are querying and what you expect?

Erik


> On Jun 11, 2018, at 7:34 AM, Vivek Singh  wrote:
> 
> HI Team ,
> I am new to solr ,wanted to know which indexer is used in apache solr by
> default . I am not getting good results.
> 
> -- 
> Regards,
> Vivek Singh.
> 9818214334



Re: sharding guidelines

2018-06-04 Thread Erik Hatcher
I’d say that 100M/shard is in the smallest doc use case possible, such as 
straight up log items with only a timestamp, id, and short message kind of 
thing.

In other contexts, big full text docs, 10M/shard is kind of a max.
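As rough arithmetic, those rules of thumb reduce to a ceiling division (the per-shard targets are the ballpark figures from this thread, not hard limits):

```python
import math

def shards_needed(total_docs, docs_per_shard):
    """Minimum shard count to stay under a per-shard document target."""
    return max(1, math.ceil(total_docs / docs_per_shard))

# Tiny log-style docs: ~100M docs/shard; big full-text docs: ~10M docs/shard.
small_doc_shards = shards_needed(500_000_000, 100_000_000)
big_doc_shards = shards_needed(500_000_000, 10_000_000)
```

The real answer still "gets really specific really fast" - doc size, query mix, and whether the index fits in RAM matter as much as the raw count.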

How many documents do you have in your collection?

Erik Hatcher
Senior Solutions Architect
Lucidworks.com



> On Jun 4, 2018, at 6:36 PM, Oakley, Craig (NIH/NLM/NCBI) [C] 
>  wrote:
> 
> I have a sharding question.
> 
> 
> 
> 
> 
> We have a collection (one shard, two replicas, currently running Solr6.6) 
> which sometimes becomes unresponsive on the non-leader node. It is 214 
> gigabytes, and we were wondering whether there is a rule of thumb how large 
> to allow a core to grow before sharding. I have a reference in my notes from 
> the 2015 Solr conference in Austin "baseline no more than 100 million 
> docs/shard" and "ideal shard-to-memory ratio, if at all possible index should 
> fit into RAM, but other than that it gets really specific really fast"; but 
> that was several versions ago, and so I wanted to ask whether these 
> suggestions have been recalculated.
> 
> Thanks



Re: Three Indexing Questions

2018-03-29 Thread Erik Hatcher
Terry -

You’re speaking of bin/post, looks like.   bin/post is _just_ a simple tool to 
provide some basic utility.   The fact that it can recurse a directory 
structure at all is an extra bonus that really isn’t about “Solr” per se, but 
about posting content into it.   

Frankly, (even as the author of bin/post) I don’t think bin/post for file 
system crawling is the rightest way to go.   Having Solr parse content (which 
bin/post sends into Solr’s /update/extract handler) itself is recommended for 
production/scale.

All caveats aside, and recommendations to upsize your file crawler noted… it’s 
just a bin/post shell script and a Java class called SimplePostTool - I’d 
encourage you to adapt what it does to your requirements so that it will send 
over .eml files, which apparently work when added manually (how did you test 
that?  curious about the details), and handle multiple directories.   It wasn’t 
designed for robust file crawls, but it is certainly there for you to adjust to 
your needs if it is close enough.   And of course, if you want to generalize 
the handling and contribute that back, then bin/post can improve!

In short: no, bin/post can’t do the things you’re asking of it, but there’s no 
reason it couldn’t be evolved to handle those things.

Erik


> 
> I note this message that's displayed when I begin indexing: "Entering
> auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> 
> Is there a way to get it to recurse through files with different
> extensions, for example, like .eml?  When I manually add all the
> subdirectory content, solr seems to parse the content very well,
> recognizing all the standard email metadata.  I just can't get it to do
> the indexing recursively.
> 
> Second question: if I want to index files from many different source
> directories, is there a way to specify these different sources in one
> command? (Right now I have to issue a separate indexing command for each
> directory - which means I have to sit around and wait till each is
> finished.)
> 
> Third question: I have a very large directory structure that includes a
> couple of subdirectories I'd like to exclude from indexing.  Is there a
> way to index recursively, but exclude specified directories?
> 



Re: query regarding Solr partial search

2018-03-27 Thread Erik Hatcher
This is as much about your schema as it is about your query parser usage.   
What’s parsed_query say in your debug=true output?   What query parser are you 
using?   If edismax, check qf/pf/mm settings, etc.

Erik


> On Mar 27, 2018, at 9:56 AM, Paul, Lulu  wrote:
> 
> Hi ,
> 
> Below is my SOLR configuration (schema.xml) for a keyword search field.
> 
>  stored="false" multiValued="true"/>
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>  positionIncrementGap="100">
>  
>
> words="stopwords.txt" />
>
>  
>  
>
> words="stopwords.txt" />
> ignoreCase="true" expand="true"/>
>
> preserveOriginal="true"/>
>  
> 
> 
> 
> · If I search for “Autograph full score”, Solr returns all items that 
> contains this string in exactly the same order.
> 
> · If I search for “full Autograph score”, Solr doesn’t return any 
> results.
> 
> The requirement is that regardless of the order of the string, Solr should 
> return all records which “CONTAIN” these 3 strings. Please advise how can 
> this be made possible?
> 
> Thanks & Regards,
> Lulu
> 
> 
> 
> **
> Experience the British Library online at www.bl.uk
> The British Library’s latest Annual Report and Accounts : 
> www.bl.uk/aboutus/annrep/index.html
> Help the British Library conserve the world's knowledge. Adopt a Book. 
> www.bl.uk/adoptabook
> The Library's St Pancras site is WiFi - enabled
> *
> The information contained in this e-mail is confidential and may be legally 
> privileged. It is intended for the addressee(s) only. If you are not the 
> intended recipient, please delete this e-mail and notify the 
> postmas...@bl.uk : The contents of this e-mail must 
> not be disclosed or copied without the sender's consent.
> The statements and opinions expressed in this message are those of the author 
> and do not necessarily reflect those of the British Library. The British 
> Library does not take any responsibility for the views of the author.
> *
> Think before you print



Re: Solr Expression Slow

2018-02-12 Thread Erik Hatcher
I suggest applying that logic at index time: build yourself a SORT_CRITERIA 
field and sort on that, rather than on that sophisticated function, which looks 
like it could collapse down to a single index-time field.
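Collapsing the function into an index-time field means computing the count once, at indexing time, and then sorting on a plain integer. A hedged sketch of that pre-computation (the semantics - count criteria whose filter flag is set but whose partner value is missing - are inferred from the expression in the question):

```python
def sort_criteria(doc, criteria_pairs):
    """Count criteria where the filter flag is present but the partner value
    is not -- mirroring sum(if(and(tf(...),if(tf(...),0,1)),1,0), ...).
    criteria_pairs: (filter_field, flag, partner_field, partner_value) tuples."""
    score = 0
    for filter_field, flag, partner_field, partner_value in criteria_pairs:
        has_flag = flag in doc.get(filter_field, [])
        has_partner = partner_value in doc.get(partner_field, [])
        if has_flag and not has_partner:
            score += 1
    return score

doc = {
    "CRITERIA1_FILTER": ["Y"],
    "PARTNER_CRITERIA1": [],     # flag set, partner missing -> counts
    "CRITERIA2_FILTER": ["Y"],
    "PARTNER_CRITERIA2": ["1"],  # flag set, partner present -> doesn't count
}
pairs = [
    ("CRITERIA1_FILTER", "Y", "PARTNER_CRITERIA1", "N"),
    ("CRITERIA2_FILTER", "Y", "PARTNER_CRITERIA2", "1"),
]
doc["SORT_CRITERIA"] = sort_criteria(doc, pairs)
```

The query then becomes just `sort=SORT_CRITERIA asc`, with no per-request function evaluation.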

Erik


> On Feb 12, 2018, at 10:32 AM, ~$alpha`  wrote:
> 
> In the below Solr query I am sorting based on the below expression.
> 
>> sum(if(and(tf(CRITERIA1_FILTER,Y),if(tf(PARTNER_CRITERIA1,N),0,1)),1,0),if(and(tf(CRITERIA2_FILTER,Y),if(tf(PARTNER_CRITERIA2,1),0,1)),1,0))
>> asc
> 
> I have mentioned PARTNER_CRITERIA1 and PARTNER_CRITERIA2 in the expression.
> I have similar10 expression like this.
> 
> I want to remove expression as it is slow,
> Is it better to use custom sorting for such case and if yes, can someone
> help how to do so?
> 
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Post-processing of Solr responses in Velocity templates by calling external Java methods

2018-01-25 Thread Erik Hatcher
Ravindra -

So you have documents that represent *lines*, but for each line document you 
want to render the 3 lines (documents) before and after.

Hmmm - tricky!   

Velocity itself isn’t going to help you here.   You’ll need to do additional 
searches to get the “context documents”.   

Given that you’re using VrW, I’d suggest an Ajaxy solution.   For each hit 
being rendered, the browser (not server-side Velocity) would use `line_no` and 
do a query, something like q=line_no:[<line_no - 3> TO <line_no + 3>], and 
render those as desired.
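The per-hit Ajax request could be as small as a templated range query plus a filter on the same file. A sketch (field names come from the question; the ±3 window is the one described in it):

```python
from urllib.parse import urlencode

def context_query(file_local_url, line_no, window=3):
    """Build Solr query params that fetch the lines around a hit, so the
    snippet can be assembled client-side."""
    params = {
        "q": "line_no:[%d TO %d]" % (line_no - window, line_no + window),
        "fq": 'file_local_url:"%s"' % file_local_url,
        "sort": "line_no asc",
        "fl": "line_no,line_text",
    }
    return urlencode(params)

qs = context_query("/data/file.txt", 42)
```

The fq on file_local_url keeps the window from pulling in same-numbered lines of other files.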

If you attempt this server-side, whew - honestly I wouldn’t do that within Solr 
myself - so I’m not sure what exactly to advise.

To your specific question - it is possible to add custom Velocity “tools” to 
the mix.   The test cases for this look like this:


https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/contrib/velocity/src/test/org/apache/solr/velocity/VelocityResponseWriterTest.java#L122-L144

(specifically note the $mytool.star(…) usage in that test method)

The configuration to plug in a custom tool looks like this:

   
https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/contrib/velocity/src/test-files/velocity/solr/collection1/conf/solrconfig.xml#L40-L54

Here’s how the tool itself is written, which comes out pretty lean and clean:

   
https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/contrib/velocity/src/test/org/apache/solr/velocity/MockTool.java

Again, though - I wouldn’t advise using Velocity trickery to do Solr searches 
internal to template rendering (even though you can).

Erik


> On Jan 24, 2018, at 9:50 AM, Ravindra Soni  wrote:
> 
> Hi,
> 
> *Quick context of the application first.*
> 
> I am currently using Solr in standalone mode to index thousands of
> text-like documents (not exactly textual documents).
> 
> A single document has a structure like following:
> 
>   - *id* - unique file id
>   - *line_text* - textual data
>   - *file_local_url* - file's directory location path
>   - *line_no* - line number of the line_text
> 
> *What I am trying to achieve?*
> 
> After I search on the line_text field, as a result I should see the
> matching text snippets. A text snippet can be the lines from (line_no - 3)
> to (line_no + 3). Currently for search front-end I am using Solr's in-built
> Velocity response writer and its templates.
> 
> *How I am thinking of doing this?*
> 
> I am using the default Velocity templates which are shipped with Solr. In
> the template file hit.vm, following is the way it currently fetches,
> processes and displays the responses:
> 
> #foreach( $fieldName in $doc.fieldNames )
> 
>
>  
>$esc.html($fieldName):
>  
> 
>  
>#field($fieldName)
>  
>#end
> 
> Now to get the snippet text I would like to define an external function
> somewhere in a Java class something of this form:
> 
> public String getSnippet(file_local_url, line_no)
> 
> which will return the snippet in a string format.
> 
> Now I want to consume this response in the velocity template. There I am
> thinking of something like this:
> 
> ## get the snippet string by calling the external ava function
> #set($snippet = $someClass.getSnippet(#field("file_local_url"),
> #field("line_no")))
> 
> ## print the snippet
>  snippet
> 
> (I am not sure if this is the correct syntax.)
> 
> *Questions:*
> 
>   1. What kind of file should contain someClass.getSnippet()? Java file?
>   Class file? A jar?
>   2. Where should I keep this file? How will velocity know where to find
>   this class?
>   3. Am I using the write syntax to call the method and use its results in
>   above Velocity template?
> 
> I am quite not getting the bigger picture yet (especially the question 2
> above).
> 
> Please provide some direction. Thanks.
> Ravi.
> 
> -- 
> 
> What if the cure for cancer is trapped inside the mind of someone who
> can't afford an education? - anonymous



Re: trivia question: why q=*:* doesn't return same result as q.alt=*:*

2018-01-07 Thread Erik Hatcher
I think what Erick meant to say ;) was 

defType=dismax does NOT do anything special with *:* other than treat it as 
plain text and do dismaxy things with it.   That’s exactly why there is q.alt 
for the dismax parser - so you can have your dismax and still match all docs by 
not having a q. 
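That dispatch - a present q parsed by dismax, an empty or absent q falling back to q.alt parsed by the lucene parser - can be modeled in a few lines (a toy model, not Solr's actual code):

```python
def effective_query(params):
    """Toy model of SearchHandler + dismax: a present q is parsed by dismax
    (which treats *:* as literal text); an absent/empty q falls back to
    q.alt, parsed by the lucene parser (where *:* means match-all)."""
    q = params.get("q", "").strip()
    if q:
        return ("dismax", q)  # *:* here is just punctuation to tokenize
    q_alt = params.get("q.alt", "").strip()
    if q_alt == "*:*":
        return ("lucene", "MatchAllDocsQuery")
    return ("lucene", q_alt)

r1 = effective_query({"q": "*:*"})
r2 = effective_query({"q.alt": "*:*"})
```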
 
   Erik

> On Jan 6, 2018, at 22:48, Erick Erickson  wrote:
> 
> As Chris explained, this is special:
> q=*:*
> in terms of scoring or anything of the like. It's just match-all-docs
> 
> It makes no sense to distribute *:* among "pf" fields. The whole point
> of pf is to influence scoring by providing a mechanism for boosting
> when words in some field(s) appear together for docs that _already_
> match the main clause. The fp fields may be totally
> unrelated to the qf fields. There's no reason to couple those together.
> 
> pf means "for docs that match the main query, add an additional boost
> if there are phrase matches in these fields". Whether the pf fields match
> a document has no influence on whether that doc is a hit, it only changes
> the score of docs that have been selected anyway because they matched
> the main clause.
> 
> Another way of saying the above is numFound won't change at all no
> matter whether there are matches on "pf" fields or not. Only the scores
> of those docs might change.
> 
> Since q=hello isn't match-all-docs, it does make sense to boost by "pf"
> field matches, even though it's just a single word. In that case it really
> means "boost docs matching the main clause if this word appears in
> the pf field".
> 
> On a different note, field names with hyphens aren't necessarily supported,
> so "name_shingle_zh-cn" may work, but there also may be edge cases
> where that causes problems. If there are, it's unlikely that fixing them
> will
> be a priority.
> 
> From the ref guide:
> 
> "The name of the field. Field names should consist of alphanumeric or
> underscore characters only and not start with a digit."
> 
> There has been talk at times of throwing warnings or errors if names violate
> this, but that'd break existing apps. It's one of those things that's we
> live
> with ;)
> 
> Best,
> Erick
> 
> On Sat, Jan 6, 2018 at 6:13 PM, Nawab Zada Asad Iqbal 
> wrote:
> 
>> Thanks everyone, that was a very informative thread.
>> 
>> One more curiosity: why are different set of fields being used based on the
>> query string:-
>> 
>> 
>> http://localhost:8983/solr/filesearch/select?fq=id:1193&q=*:*&debug=true
>> 
>> 
>>   - parsedquery: "+DisjunctionMaxQuery((user_email:*:* | user_name:*:* |
>>   tags:*:* | (name_shingle_zh-cn:, , name_shingle_zh-cn:, ,) |
>> id:*:*)~0.01)
>>   DisjunctionMaxQuery(((name_shingle_zh-cn:", , , ,"~100)^100.0 |
>>   tags:*:*)~0.01)",
>> 
>> 
>> 
>> I find it perplexing as the default values for qf and pf are very different
>> from above so I am not sure where these fields are coming from (although
>> they are all valid fields)
>> e.g. the following query uses my expected set of pf and qf.
>> 
>> http://localhost:8983/solr/filesearch/select?fq=id:1193&q=hello&debug=true
>> 
>> 
>> 
>>   - parsedquery: "+DisjunctionMaxQuery(((name_token:hello)^60.0 |
>>   user_email:hello | (name_combined:hello)^10.0 | (name_zh-cn:hello)^10.0
>> |
>>   name_shingle:hello | comments:hello | user_name:hello |
>> description:hello |
>>   file_content_zh-cn:hello | file_content_de:hello | tags:hello |
>>   file_content_it:hell | file_content_fr:hello | file_content_es:hell |
>>   file_content_en:hello | id:hello)~0.01)
>> DisjunctionMaxQuery((description:hello
>>   | (name_shingle:hello)^100.0 | comments:hello | tags:hello)~0.01)",
>> 
>> 
>> On Sat, Jan 6, 2018 at 12:05 PM, Chris Hostetter >> 
>> wrote:
>> 
>>> 
>>> : Yes, i am using dismax. But dismax allows *:* for q.alt ,which also
>> seems
>>> : like inconsistency.
>>> 
>>> dismax is a *parser* that affects how a single query string is parsed.
>>> 
>>> when you use defType=dismax, that only changes how the "q" param is
>>> parsed -- not any other query string params, like "fq" or "facet.query"
>>> (or "q.alt")
>>> 
>>> when you have a request like "defType=dismax==*:*" what you are
>>> saying, and what solr is doing, is...
>>> 
>>> * YOU: hey solr, use dismax as the default parser for the q param
>>> * SEARCHHANDLER: ok, if the "q" param does not use local params to
>>> override the parser, i will use dismax
>>> * SEARCHHANDLER: hey dismax qparser, go parse the string ""
>>> * DISMAXQP: that string is empty, so instead we should use q.alt
>>> * SEARCHHANDLER: ok, i will parse the q.alt param and use that query in
>>> place of the empty q param
>>> * SEARCHHANDLER: hey lucene qparser, the string "*:*" does not use local
>>> params to override the parser, please parse it
>>> * LUCENEQP: the string "*:*" is a MatchAllDocsQuery
>>> * SEARCHHANDLER: cool, i'll use that as my main query
>>> 
>>> 
>>> 
>>> -Hoss
>>> http://www.lucidworks.com/
>>> 
>> 


Re: Solr - how does faceting returned unstored values?

2018-01-05 Thread Erik Hatcher
Facets return the *indexed* value.   This is an important, ahem, facet to 
facets.   Field analysis matters, so tokenized fields will have tokenized 
facets.
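A toy model of why facet buckets can differ from the original text: facet counts are computed over the terms the analysis chain produced, and those terms live in the index (or docValues), so stored="false" doesn't matter. The whitespace/lowercase chain here is just an example:

```python
from collections import Counter

def analyze(text):
    # Example analysis chain: whitespace tokenize + lowercase.
    return text.lower().split()

docs = ["Big Brown Bear", "Brown Shoes"]

# Facet counts are computed over *indexed* terms, not the stored originals.
facets = Counter(term for doc in docs for term in analyze(doc))
```

A string (non-tokenized) field would instead yield whole-value buckets like "Big Brown Bear": 1.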

Erik


> On Jan 5, 2018, at 10:17 AM, ruby  wrote:
> 
> The Solr documentation states that the purpose of the stored attribute is to tell
> Solr to store the original text in the index somewhere.
> 
> If that is true, then how is Solr able to return original texts when we
> facet on fields which are not stored?
> 
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Solr - custom ordering

2018-01-05 Thread Erik Hatcher
Vineet -

Solr’s QueryElevationComponent can do this. 

Or you could use a query like:

   q=id:C^=300 id:B^=200 id:A^=100

The ^= is a constant score syntax, so you can assign a “score” to a clause (in 
this case a single document with a unique id).   
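Generalizing the constant-score trick to an ordered id list like the one in the question, the query string can be generated mechanically (a sketch; descending scores preserve the external order, and chunking a 15000-id list into manageable requests is left out):

```python
def ordered_id_query(ids, field="id", start=None):
    """Build a constant-score query that returns docs in the given order:
    earlier ids get higher fixed scores via the ^= syntax."""
    n = len(ids)
    base = start if start is not None else n * 100
    clauses = [
        "%s:%s^=%d" % (field, doc_id, base - i * 100)
        for i, doc_id in enumerate(ids)
    ]
    return " ".join(clauses)

q = ordered_id_query(["C", "B", "A"])
```

Sorting by score (the default) then yields C, B, A - the external system's order.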

Erik


> On Jan 4, 2018, at 11:47 PM, Vineet Mangla  wrote:
> 
> Hi,
>  
> We have a Solr cloud core where “jobid” is our primary key. We have a use 
> case where we have a list of 15000 jobids in a particular order in an 
> external system. We are calling solr with these 15000 jobids as filter query 
> and in result, we want all the jobids after filtering in the same order of 
> input. Is this possible in Solr?
>  
>  
>  
>  
> Thanks & Regards
> Vineet Mangla | Project Lead
>  
> 
> BOLD Technology Systems Pvt. Ltd.
> (formerly LiveCareer)
> Aykon Tower, Plot No. 4, Sector – 135, Noida-201301
> URL- - www.bold.com  | Cell: +91 (965) 088 0606



Re: Deliver static html content via solr

2018-01-05 Thread Erik Hatcher
Rick - fair enough, indeed.

However, for a “static” resource, no Velocity syntax or learning curve needed.  
 In fact, correcting myself, VelocityResponseWriter isn’t even part of the 
picture for serving a static resource. 

Have a look at example/files - 
https://github.com/apache/lucene-solr/tree/master/solr/example/files

The <head> of each page (from head.vm) pulls a “static” resource like this:

 

The /admin/file handler will serve the bytes of any resource in config.  
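So fetching any configured resource is just a GET against that handler. A sketch of building such a URL (the collection and file names are placeholders):

```python
from urllib.parse import urlencode

def admin_file_url(base, collection, path, content_type="text/html"):
    """URL for Solr's /admin/file handler, which serves the raw bytes of a
    resource from the collection's config directory."""
    query = urlencode({"file": path, "contentType": content_type})
    return "%s/solr/%s/admin/file?%s" % (base, collection, query)

url = admin_file_url("http://localhost:8983", "files", "velocity/head.vm")
```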

As for a separate front-end app - always recommended by me, to be sure, for 
real(!) applications; but for internal, one-off, quick-and-dirty, prototyping, 
showing-off, or handy-utility kinda things, I’m not opposed to doing the 
Simplest Possible Thing That Works.

As for security - VelocityResponseWriter doesn’t itself add any additional 
security concerns to Solr - it just transforms the Solr response into some 
textual (often HTML) format, instead of JSON or XML - so it itself isn’t a 
security concern.   What you need to do for Solr proper for security is a 
different story, but that is irrelevant to whether wt=velocity is in the mix.   
It can actually be handy to use wt=velocity from inside a real app - it has 
been used for generating e-mails in production systems, and for simply 
returning something formatted textually the way you want without an app 
template tier having to do so.   And Velocity, true to name, ain’t slow.

For more on /browse, VrW, and example/files usage of those, check out 
https://lucidworks.com/2015/12/08/browse-new-improved-solr-5/

Erik



> On Jan 5, 2018, at 4:19 AM, Rick Leir <rl...@leirtech.com> wrote:
> 
> Using Velocity, you can have some results-driven HTML served by Solr and all 
> your JS, CSS etc 'assets' served by Apache from /var/www/html. Warning: the 
> Velocity learning curve is steep and you still need a separate front-end web 
> app for security because Velocity is a templating output filter. Eric, please 
> correct me!
> 
> cheers -- Rick
> 
> 
> On 01/04/2018 11:45 AM, Erik Hatcher wrote:
>> All judgements aside on whether this is a preferred way to go, have a look 
>> at /browse and the VelocityResponseWriter (wt=velocity).  It can serve 
>> static resources.
>> 
>> I’ve built several prototypes this way that have been effective and business 
>> generating.
>> 
>>Erik
>> 
>>> On Jan 4, 2018, at 11:19, Matthias Geiger <matzschman...@gmail.com> wrote:
>>> 
>>> Hello,
>>> i have a web application that delivers static html content to the user.
>>> 
>>> I have been thinking about the possibility to deliver this content from
>>> solr instead of delivering it from the filesystem.
>>> This would prevent the "double" stored content (html files on file
>>> systems + additional solr cores)
>>> 
>>> Is this a viable approach or a no go?
>>> In case of a no go why do you think it is wrong
>>> 
>>> In case of the suggestion of a nosql database, what makes noSql superior to
>>> solr?
>>> 
>>> Regards and Thanks for your time
> 



Re: Personalized search parameters

2018-01-05 Thread Erik Hatcher
IMO you’re making this more complicated than it needs to be.

Forget for a moment where the user profile is stored.  Say user A likes 
turtles.  User B likes puppies.

User A queries, and this gets sent to Solr:  q=something=turtles
User B queries: q=something=puppies

I’d fetch the user preference details _before_ making the call to Solr, and 
augment the call to Solr with the user-specific boosting/parameters.

If you happen to store the user preferences in a Solr document, fetch that 
document in your application tier before making the call to Solr (and I’d 
suggest using a separate collection for user preferences).   Sure, that’s two 
Solr requests, but… no worries!   Fetching a single document from Solr that is 
likely cached anyway won’t be slow.   And if you account for implementation 
time in your effort, it’s a big win. :)
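In application code the two-step flow stays small: look up the profile, then bolt its boosts onto the real query. A sketch in which the preference document shape and the `interest` boost field are invented for illustration:

```python
def personalized_params(q, profile):
    """Augment a base query with per-user boost queries derived from a
    previously fetched preference document."""
    params = {"q": q, "defType": "edismax"}
    likes = profile.get("likes", [])
    if likes:
        # bq adds optional boosting clauses without changing what matches.
        params["bq"] = " ".join("interest:%s^2" % like for like in likes)
    return params

profile_a = {"user": "A", "likes": ["turtles"]}
params = personalized_params("something", profile_a)
```

User B's profile would produce the puppies boost instead - same query-building code, different sidecar lookup.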

For the record - this is the kind of thing we do in Lucidworks Fusion - sidecar 
collections, looking stuff up (preferences, recommendations, rules, etc etc) 
and augmenting the final/real Solr request.   Sounds kinda simplistic, and it 
is.  But the synergy of these simple things working together is Powerful Magic. 
  I’d hate to see you go down a really complicated and custom route to achieve 
what you’re asking, but I do empathize with the sentiment to roll all this 
together into a single Solr request hiding all the magic.   But simpler and 
straightforward is better than complex and custom if the end result is the same 
:)

Erik


> On Jan 5, 2018, at 6:10 AM, marco  wrote:
> 
> Hi, first of all I want to say that i'm a beginner with the whole Lucene/Solr
> environment.
> I'm trying to create a simple personalized search engine, and to do so i was
> thinking about adding a parameter user= to the uri of the query
> requests, that i would need during the scoring phase to rerank the result on
> based on the user profile (stored as a normal document).
> 
> My question is: how can i create a custom Similarity class that is able to
> retrieve a parameter passed during the request phase? I "know" from this 
> https://medium.com/@wkaichan/custom-query-parser-in-apache-solr-4634504bc5da
> 
>   
> that extending QParsePlugin I can access the request parameters, but how can
> i pass them during the whole chain of search operations so that they are
> accessible during the scoring phase?
> 
> Thank you for your help.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: trivia question: why q=*:* doesn't return same result as q.alt=*:*

2018-01-04 Thread Erik Hatcher
defType=???  Probably dismax.  It doesn’t do *:* like edismax or lucene.  

> On Jan 4, 2018, at 20:39, Nawab Zada Asad Iqbal  wrote:
> 
> Thanks Erik
> Here is the output,
> 
> http://localhost:8983/solr/filesearch/select?fq=id:1193&q.alt=*:*&debug=true
> 
> 
>   - parsedquery: "+MatchAllDocsQuery(*:*)",
> 
> 
> 
> http://localhost:8983/solr/filesearch/select?fq=id:1193&q=*:*&debug=true
> 
> 
>   - parsedquery: "+DisjunctionMaxQuery((user_email:*:* | user_name:*:* |
>   tags:*:* | (name_shingle_zh-cn:, , name_shingle_zh-cn:, ,) | id:*:*)~0.01)
>   DisjunctionMaxQuery(((name_shingle_zh-cn:", , , ,"~100)^100.0 |
>   tags:*:*)~0.01)",
> 
> 
> 
> I find it perplexing as the default values for qf and pf are very different
> from above so I am not sure where these fields are coming from (although
> they are all valid fields)
> e.g. the following query uses my expected set of pf and qf.
> 
> http://localhost:8983/solr/filesearch/select?fq=id:1193&q=hello&debug=true
> 
> 
> 
>   - parsedquery: "+DisjunctionMaxQuery(((name_token:hello)^60.0 |
>   user_email:hello | (name_combined:hello)^10.0 | (name_zh-cn:hello)^10.0 |
>   name_shingle:hello | comments:hello | user_name:hello | description:hello |
>   file_content_zh-cn:hello | file_content_de:hello | tags:hello |
>   file_content_it:hell | file_content_fr:hello | file_content_es:hell |
>   file_content_en:hello | id:hello)~0.01)
>   DisjunctionMaxQuery((description:hello | (name_shingle:hello)^100.0 |
>   comments:hello | tags:hello)~0.01)",
> 
> 
> 
> 
> 
> On Thu, Jan 4, 2018 at 5:22 PM, Erick Erickson 
> wrote:
> 
>> Hmm, seems odd. What happens when you attach &debug=query? I'm curious how
>> the parsed queries differ.
>> 
>>> On Jan 4, 2018 15:14, "Nawab Zada Asad Iqbal"  wrote:
>>> 
>>> Hi,
>>> 
>>> In my SearchHandler solrconfig, i have q.alt=*:* . This allows me to run
>>> queries which only have `fq` filters and no `q`.
>>> 
>>> If I remove q.alt from the solrconfig and specify `q=*:*` in the query
>>> parameters, it does not give any results. I also tried `q=*` but of no
>>> avail.
>>> 
>>> Is there some good reason for this behavior? Since I already know a work
>>> around, this question is only for my curiosity.
>>> 
>>> 
>>> Thanks
>>> Nawab
>>> 
>> 


Re: Deliver static html content via solr

2018-01-04 Thread Erik Hatcher
All judgements aside on whether this is a preferred way to go, have a look at 
/browse and the VelocityResponseWriter (wt=velocity).  It can serve static 
resources.

I’ve built several prototypes this way that have been effective and business 
generating.  

   Erik

> On Jan 4, 2018, at 11:19, Matthias Geiger  wrote:
> 
> Hello,
> i have a web application that delivers static html content to the user.
> 
> I have been thinking about the possibility to deliver this content from
> solr instead of delivering it from the filesystem.
> This would prevent the "double" stored content (html files on file
> systems + additional solr cores)
> 
> Is this a viable approach or a no go?
> In case of a no go why do you think it is wrong
> 
> In case of the suggestion of a nosql database, what makes noSql superior to
> solr?
> 
> Regards and Thanks for your time


Re: DIH XPathEntityProcessor XPath subset?

2018-01-03 Thread Erik Hatcher
Stefan -

If you pre-transform the XML, I’d personally recommend either transforming it 
into straight up Solr XML (docs/fields/values) or some other format or posting 
directly to Solr.   Avoid this DIH thing when things get complicated.
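Pre-transforming with a full XML library leaves Solr a trivial job. A sketch for the XML-RPC struct shape from this thread, using only the standard library (the field mapping is illustrative):

```python
import xml.etree.ElementTree as ET

RPC_XML = """<struct>
  <member><name>post_id</name><value><string>11809</string></value></member>
  <member><name>post_title</name><value><string>Some titel</string></value></member>
</struct>"""

def struct_to_solr_doc(struct_xml):
    """Turn an XML-RPC <struct> into a Solr <doc> element, one <field> per
    <member>, named after the member's <name>."""
    struct = ET.fromstring(struct_xml)
    doc = ET.Element("doc")
    for member in struct.findall("member"):
        field = ET.SubElement(doc, "field", name=member.findtext("name"))
        field.text = member.findtext("value/string")
    return doc

add = ET.Element("add")
add.append(struct_to_solr_doc(RPC_XML))
solr_xml = ET.tostring(add, encoding="unicode")
```

The resulting `<add><doc>...</doc></add>` document can be POSTed straight to /update, with full XPath support available during the transform.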

Erik

> On Jan 3, 2018, at 11:40 AM, Stefan Moises  wrote:
> 
> Hi there,
> 
> I'm trying to index a wordpress site using DIH XPathEntityProcessor... I've 
> read it only supports a subset of XPath, but I couldn't find any docs what 
> exactly is supported.
> 
> After some painful trial and error, I've found that xpath expressions like 
> the following don't work:
> 
>  xpath="/methodResponse/params/param/value/array/data/value/struct/member[name='post_title']/value/string"
>  />
> 
> I want to find elements like this ("the 'value' element after a 'member' 
> element with a name element 'post_title'"):
> 
> <struct>
>   <member>
>     <name>post_id</name>
>     <value><string>11809</string></value>
>   </member>
>   <member>
>     <name>post_title</name>
>     <value><string>Some titel</string></value>
>   </member>
> </struct>
> 
> Unfortunately that is the default output structure of Wordpress' XMLrpc calls.
> 
> My Xpath expression works e.g. when testing it with 
> https://www.freeformatter.com/xpath-tester.html but not if I try to index it 
> with Solr any ideas? Or do I have to pre-transform the XML myself to 
> match XPathEntityProcessors limited abilites?
> 
> Thanks in advance,
> 
> Stefan
> 
> -- 
> --
> 
> Stefan Moises
> Manager Research & Development
> shoptimax GmbH
> Ulmenstraße 52 H
> 90443 Nürnberg
> Tel.: 0911/25566-0
> Fax: 0911/25566-29
> moi...@shoptimax.de
> http://www.shoptimax.de
> 
> Geschäftsführung: Friedrich Schreieck
> Ust.-IdNr.: DE 814340642
> Amtsgericht Nürnberg HRB 21703
>  
> 



Re: does the payload_check query parser have support for simple query parser operators?

2017-11-30 Thread Erik Hatcher
No, it doesn’t.   The payload parsers currently do simple tokenization, with no 
special syntax supported.  

 Erik

> On Nov 30, 2017, at 02:41, John Anonymous  wrote:
> 
> I would like to use wildcards and fuzzy search with the payload_check query
> parser. Are these supported?
> 
> {!payload_check f=text payloads='NOUN'}apple~1
> 
> {!payload_check f=text payloads='NOUN'}app*
> 
> Thanks


Re: Analyse Fieldname API

2017-11-15 Thread Erik Hatcher
Turn on your browser's developer tools and check out the HTTP requests made 
behind the scenes of that page.   

Yes!   ;)
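The request behind that screen goes to the field analysis handler. A sketch of reproducing it directly (the core and field names are placeholders):

```python
from urllib.parse import urlencode

def analysis_url(base, core, field_name, value, query=None):
    """URL for Solr's /analysis/field handler -- the endpoint the admin
    Analysis screen calls -- returning per-stage token output as JSON."""
    params = {
        "analysis.fieldname": field_name,
        "analysis.fieldvalue": value,
        "wt": "json",
    }
    if query is not None:
        params["analysis.query"] = query  # also show query-time analysis
    return "%s/solr/%s/analysis/field?%s" % (base, core, urlencode(params))

url = analysis_url("http://localhost:8983", "corename", "title", "Running Shoes")
```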

> On Nov 15, 2017, at 07:19, kumar gaurav  wrote:
> 
> Hi
> 
> Solr has panel to Analyse Fieldname i.e.
> 
> http://localhost:8983/solr/#/corename/analysis
> 
> I need an API which will return analysis information in JSON format like
> search handler .
> 
> Someone ! Is there any API regarding the same ?
> 
> Thanks in advance :)


Re: tf function query

2017-10-05 Thread Erik Hatcher
How about the query() function?  Just be clever about the query you specify ;)
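For instance, the entire multi-term query can be handed to query() through a parameter reference, giving a per-document score usable in sort or fl. A hedged sketch of the request parameters (the field names and boosts are made up):

```python
from urllib.parse import urlencode

# Hypothetical request: rank by a function of the full query's score.
# query($qq) evaluates the query held in the qq parameter per document.
params = {
    "q": "*:*",
    "qq": "{!edismax qf='title^2 body'}multi term input",
    "sort": "query($qq) desc",
    "fl": "id,score,q_score:query($qq)",
}
request = urlencode(params)
```

The same query($qq) value can be combined with other functions (sum, product, …) to build the custom ranking function without edismax's boost machinery.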

> On Oct 5, 2017, at 06:14, Dmitry Kan  wrote:
> 
> Hi,
> 
> According to
> https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions
> 
> tf(field, term) requires a term as a second parameter. Is there a
> possibility to pass in an entire input query (multiterm and boolean) to the
> function?
> 
> The context here is that we don't use edismax parser to apply multifield
> boosts, but instead use a custom ranking function.
> 
> Would appreciate any thoughts,
> 
> Dmitry
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: https://semanticanalyzer.info
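
One way to read Erik's hint: query($param, default) scores each document against an arbitrary Lucene query, so a multi-term or boolean "tf-like" signal can live in its own parameter and be referenced from a boost or bf function. The parameter name boostq below is invented for this sketch:

```python
from urllib.parse import urlencode

# Sketch of wrapping a full boolean query in the query() function
# instead of tf(field, term); "boostq" is an arbitrary parameter name.
params = {
    "q": "laptop",
    "defType": "edismax",
    "boost": "query($boostq,0)",            # 0 = contribution when boostq misses
    "boostq": "title:(apache AND lucene)",  # any multi-term/boolean query
}
qs = urlencode(params)
print(qs)
```

With a custom ranking function instead of edismax, the same query($boostq,0) expression can be embedded wherever a function is accepted.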


Re: Solr fields for Microsoft files, image files, PDF, text files

2017-09-25 Thread Erik Hatcher
Phillip - You may be interested to start with the example/files that ships with 
Solr.   It is specifically designed as a configuration (and UI!) that deals 
with indexing rich files with a bit more than other examples - it pulls out 
acronyms, e-mail addresses, and URLs from text, as well as what you’ve asked 
about, mapping content types to more friendly human types (“image” instead of 
the whole gamut of image/* content-types).

Erik

> On Sep 24, 2017, at 10:55 PM, Phillip Wu  wrote:
> 
> 
> Hi,
> I'm starting out with Solr on a Windows box.
> 
> I want to index the following documents:
> doc;docx
> xls;xlsx
> ppt
> vsd
> 
> pdf
> txt
> 
> gif;jpeg;tiff
> 
> I understand that Solr uses Apache Tika to read these file types and return an 
> XML stream back to Solr.
> For Tika image processing, I've loaded Tesseract.
> 
> To be able to search the documents, I need to define "fields" in a file 
> called meta-schema.
> 
> How do I get a list of all valid field names based on the file type? For 
> example *.doc, what "fields" exist so I choose what to store?
> 
> I'm assuming that for example, *.doc files there is metadata put into the 
> file by Microsoft Word eg.author,date and "free form" text.
> 
> So where is the list of valid fields per file type?
> 
> Also how do I search the "free form" text for a word/pattern in the Solr 
> search tool?
> 
> 
> 
> 



Re: Boost by Integer value on top of query

2017-07-20 Thread Erik Hatcher
If you’re using edismax, adding boost parameters 
`boost=num_employees&boost=num_locations` should incorporate those integers 
into the scores.  Just try one at a time at first - you’ll likely want to wrap 
it into a single function, along the lines of something like 
`boost=mul(num_employees,num_locations)` 

Erik



> On Jul 20, 2017, at 6:35 AM, marotosg  wrote:
> 
> Hi,
> 
> I have a use where I need to boost documents based on two integer values.
> Basically I need to retrieve companies using specific criteria like Company
> name, nationality etc. 
> On top of that query I need to boost the most important ones which are
> suppose to be the ones with higher number of employees or locations around
> the world.
> 
> These are two integer fields on my Solr index. My question here is
> How can I boost  the companies with a higher number of employees or
> locations?
> 
> Thanks,
> Sergio MAroto
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Boost-by-Integer-value-on-top-of-query-tp4346948.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple Field Search on Solr

2017-07-10 Thread Erik Hatcher
I recommend first understanding the Solr API, and the parameters you need to 
add the capabilities with just the /select API.   Once you are familiar with 
that, you can then learn what’s needed and apply that to the HTML and 
JavaScript.   While the /browse UI is fairly straightforward, there’s a fair  
bit of HTML, JavaScript, and Solr know-how needed to do what you’re asking.

A first step would be to try using `fq` instead of appending to `q` for things 
you want to “AND" to the query that aren’t relevancy related.

Erik

> On Jul 10, 2017, at 6:20 AM, Clare Lee  wrote:
> 
> Hello,
> 
> My name is Clare Lee and I'm working on Apache Solr-6.6.0, Solritas right
> now and I'm not able to do something I want to do. Could you help me with
> this?
> 
> I want to be able to search solr with multiple fields. With the basic
> configurations(I'm using the core techproducts and just changing the data),
> I can search like this [image: enter image description here]
> 
> 
> but I want to search like this[image: enter image description here]
> 
> 
> I want to know which file I have to look into and how I should change the
> code to do so.
> 
> I can put the space to put the additional information by copying and
> pasting this in the query_form.vm file.
> 
> 
> 
> 
> 
> but this doesn't AND the values that I put in.
> 
> I was told that I should look where the action file is(code below), but I
> cannot reach that location.
> 
>   method="GET">
> 
> 
>  
>Name:
>
> 
> 
> The below code is relevant, but I don't know how to change it. (from
> head.vm)
> 
> 
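
Erik's first step — AND-ing the extra form fields via fq rather than appending to q — can be sketched as one fq per input. fq is repeatable, so the parameters are built as a sequence of pairs rather than a dict; the field names here are placeholders:

```python
from urllib.parse import urlencode

# One fq per constraining form field; repeated keys need a list of
# pairs, not a dict. "name", "category", "price" are placeholders.
params = [
    ("q", 'name:"Samsung"'),       # relevancy-bearing part of the search
    ("fq", "category:Mobile"),     # each fq is ANDed with the main query
    ("fq", "price:[100 TO 500]"),  # and cached independently by Solr
    ("wt", "json"),
]
qs = urlencode(params)
print(qs)
```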

Re: Slowly running OOM due to Query instances?!

2017-07-07 Thread Erik Hatcher
With generated Query objects, one has to be really careful with .equals and .hashCode 
implementations.  That may not be applicable here, but something that has 
bitten me with caching.   Note that there were fixes made in Solr 6.6 with 
PayloadScoreQuery in this regard.   See LUCENE-7808 and LUCENE-7481

Erik


> On Jul 7, 2017, at 7:01 AM, Markus Jelsma  wrote:
> 
> Hello,
> 
> This morning i spotted our QTime suddenly go up. This has been going on for a 
> few hours by now and coincides with a serious increase in heap consumption. 
> No node ran out of memory so far but either that is going to happen soon, or 
> the nodes become unusable in another manner.
> 
> I restarted one of the Solr instances and launched VisualVM at it, and some 
> other nodes that use to much heap. Starting the memory sampler, something was 
> obvious straight away.
> 
> The nodes consuming too much heap all have a serious amount of *Query, and 
> BooleanClause instances, PayloadScoreQuery, TermQuery, BoostQuery, 
> BooleanQuery, SpanTermQuery and so forth. Lots of Builder and Term instances 
> too, very distinct from the node that was just freshly restarted.
> 
> Another peculiarity, some nodes have exactly 65536 instances of TermQuery 
> and/or BoostQuery, probably unrelated but not something i would have expected 
> to see anyway.
> 
> So, what's up? We do have a custom query parser extending EdismaxQParser, it 
> transliterates dates and creates payload and span queries. I may be doing 
> something wrong but i don't know, i have made and used a variety of QParsers, 
> for many years but this is new. Any hints on where to look, what to watch out 
> for? 
> 
> Many thanks!
> Markus
> 
> Xmx 800m, 8 GB RAM, SSD
> 2 shards, three replica's
> replica size ~17 GB, 2.2 million docs/replica



Re: Boosting Documents using the field Value

2017-06-24 Thread Erik Hatcher
With dismax use bf=domain_ct. You can also use boost=domain_ct with edismax. 

> On Jun 23, 2017, at 23:01, govind nitk  wrote:
> 
> Hi Solr,
> 
> My Index Data:
> 
> id name category domain domain_ct
> 1 Banana Fruits Home > Fruits > Banana 2
> 2 Orange Fruits Home > Fruits > Orange 4
> 3 Samsung Mobile Electronics > Mobile > Samsung 3
> 
> 
> I am able to retrieve the documents with dismax parser with the weights
> mentioned as below.
> 
> http://localhost:8983/solr/my_index/select?defType=dismax=on=fruits=category
> ^0.9=name^0.7=json
> 
> 
> Is it possible to retrieve the documents with weight taken from the indexed
> field like:
> 
> http://localhost:8983/solr/my_index/select?defType=dismax=on=fruits=category
> ^domain_ct=name^domain_ct=json
> 
> Is this possible to give weight from an indexed field ? Am I doing
> something wrong?
> Is there any other way of doing this?
> 
> 
> Regards


Re: Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-19 Thread Erik Hatcher
Ziyuan -

You may be interested in the example/files that ships with Solr too.  It’s got 
schema and config and even UI for file indexing and searching.   Check out the 
README.txt under example/files in your Solr install.

Erik

> On Jun 19, 2017, at 6:52 AM, ZiYuan  wrote:
> 
> Hi Erick,
> 
> thanks very much for the explanations! Clarification for question 2: more
> specifically I cannot see the field content in the returned JSON, with the
> the same definitions as in the post
> 
> :
> 
> 
>  stored="false"/>
> 
> 
> Is it so that Tika does not fill these two fields automatically and I have
> to write some client code to fill them?
> 
> Best regards,
> Ziyuan
> 
> 
> On Sun, Jun 18, 2017 at 8:07 PM, Erick Erickson 
> wrote:
> 
>> 1> Yes, you can use your single definition. The author identifies the
>> "text" field as a catch-all. Somewhere in the schema there'll be a
>> copyField directive copying (perhaps) many different fields to the
>> "text" field. That permits simple searches against a single field
>> rather than, say, using edismax to search across multiple separate
>> fields.
>> 
>> 2> The link you referenced is for Data Import Handler, which is much
>> different than just posting files to Solr. See
>> ExtractingRequestHandler:
>> https://cwiki.apache.org/confluence/display/solr/
>> Uploading+Data+with+Solr+Cell+using+Apache+Tika.
>> There are ways to map meta-data fields from the doc into specific
>> fields matching your schema. Be a little careful here. There is no
>> standard across different types of docs as to what meta-data field is
>> included. PDF might have a "last_edited" field. Word might have a
>> "last_modified" field where the two mean the same thing. Here's a link
>> to a SolrJ program that'll dump all the fields:
>> https://lucidworks.com/2012/02/14/indexing-with-solrj/. You can easily
>> hack out the DB bits.
>> 
>> BTW, once you get more familiar with processing, I strongly recommend
>> you do the document processing on the client, the reasons are outlined
>> in that article.
>> 
>> bq: even I define the fields as he said I cannot see them in the
>> search results as keys in JSON
>> are the fields set as stored="true"? They must be to be returned in
>> requests (skipping the docValues discussion here).
>> 
>> 3> Yes, the text field is a concatenation of all the other ones.
>> Because it has stored=false, you can only search it, you cannot
>> highlight or view. Fields you highlight must have stored=true BTW.
>> 
>> Whether or not you can highlight "Trevor Hastie" depends an a lot of
>> things, most particularly whether that text is ever actually in a
>> field in your index. Just because there's no guarantee that the name
>> of the file is indexed in a searchable/highlightable way.
>> 
>> And the query q=id:Trevor Hastie won't do what you think. It'll be parsed
>> as
>> id:Trevor _text_:Hastie
>> _text_ is the default field, look for a "df" parameter in your request
>> handler in solrconfig.xml (usually "/select" or "/query").
>> 
>> On Sat, Jun 17, 2017 at 3:04 PM, ZiYuan  wrote:
>>> Hi,
>>> 
>>> I am new to Solr and I need to implement a full-text search of some PDF
>>> files. The indexing part works out of the box by using bin/post. I can
>> see
>>> search results in the admin UI given some queries, though without the
>>> matched texts and the context.
>>> 
>>> Now I am reading this post
>>> > hilight-matched-text-inside-documents-indexed-with-solr-plus-tika/>
>>> for the highlighting part. It is for an older version of Solr when
>> managed
>>> schema was not available. Before fully understand what it is doing I have
>>> some questions:
>>> 
>>> 1. He defined two fields:
>>> 
>>> >> multiValued="false"/>
>>> >> multiValued="true"/>
>>> 
>>> But why are there two fields needed? Can I define a field
>>> 
>>> >> multiValued="true"/>
>>> 
>>> to capture the full text?
>>> 
>>> 2. How are the fields filled? I don't see relevant information in
>>> TikaEntityProcessor's documentation
>>> > apache/solr/handler/dataimport/TikaEntityProcessor.html#
>> fields.inherited.from.class.org.apache.solr.handler.
>> dataimport.EntityProcessorBase>.
>>> The current text extractor should already be Tika (I can see
>>> 
>>> "x_parsed_by":
>>> ["org.apache.tika.parser.DefaultParser","org.apache.
>> tika.parser.pdf.PDFParser"]
>>> 
>>> in the returned JSON of some query). But even I define the fields as he
>>> said I cannot see them in the search results as keys in JSON.
>>> 
>>> 3. The _text_ field seems a concatenation of other fields, does it
>> contain
>>> the full text? Though it does not seem to be accessible by default.
>>> 
>>> To be brief, using The Elements of Statistical 

Re: CSV output

2017-06-15 Thread Erik Hatcher
Is it the proxy affecting the output?   What do you get going directly to 
Solr's endpoint?

   Erik

> On Jun 14, 2017, at 22:13, Phil Scadden  wrote:
> 
> If I try
> /getsolr? 
> fl=id,title,datasource,score=true=9000=unified=Wainui-1=AND=csv
> 
> The response I get is:
> id,title,datasource,scoreW:\PR_Reports\OCR\PR869.pdf,,Petroleum 
> Reports,8.233313W:\PR_Reports\OCR\PR3440.pdf,,Petroleum 
> Reports,8.217836W:\PR_Reports\OCR\PR4313.pdf,,Petroleum 
> Reports,8.206703W:\PR_Reports\OCR\PR3906.pdf,,Petroleum 
> Reports,8.185147W:\PR_Reports\OCR\PR1592.pdf,,Petroleum 
> Reports,8.167614W:\PR_Reports\OCR\PR998.pdf,,Petroleum 
> Reports,8.161142W:\PR_Reports\OCR\PR2457.pdf,,Petroleum 
> Reports,8.155497W:\PR_Reports\OCR\PR2433.pdf,,Petroleum 
> Reports,8.152924W:\PR_Reports\OCR\PR1184.pdf,,Petroleum 
> Reports,8.124402W:\PR_Reports\OCR\PR3551.pdf,,Petroleum Reports,8.124402
> 
> ie no newline separators at all (Solr 6.5.1) (/getsolr is api that proxy to 
> the solr server).
> Changing it to
> /getsolr?csv.newline=%0A=id,title,datasource,score=true=9000=unified=Wainui-1=AND=csv
> 
> Makes no difference. What I am doing wrong here? Is there another way to 
> specify csv parameters? It says default is \n but I am not seeing that.
> 
> Notice: This email and any attachments are confidential and may not be used, 
> published or redistributed without the prior written consent of the Institute 
> of Geological and Nuclear Sciences Limited (GNS Science). If received in 
> error please destroy and immediately notify GNS Science. Do not copy or 
> disclose the contents.


Re: Odd Boolean Query behavior in SOLR 3.6

2017-06-13 Thread Erik Hatcher
Inner purely negative queries match nothing.  A query is about matching, and 
skipping over things that don’t match.  The fix, when using (-something), is to 
write (*:* -something) to match everything and then skip the negative clause items.

In your example, try fq=((*:* -documentTypeId:3) AND companyId:29096)

Erik

> On Jun 13, 2017, at 3:15 AM, abhi Abhishek  wrote:
> 
> Hi Everyone,
> 
>I have hit a weird behavior of Boolean Query, when I am
> running the query with below param’s  it’s not behaving as expected. can
> you please help me understand the behavior here?
> 
> 
> 
> q=*:*=((-documentTypeId:3)+AND+companyId:29096)=2.2=0=10=on=true
> 
> èReturns 0 matches
> 
> filter_queries: ((-documentTypeId:3) AND companyId:29096)
> 
> parsed_filter_queries: +(-documentTypeId:3) +companyId:29096
> 
> 
> 
> q=*:*=(-documentTypeId:3+AND+companyId:29096)=2.2=0=10=on=true
> 
> è returns 1600 matches
> 
> filter_queries:(-documentTypeId:3 AND companyId:29096)
> 
> parsed_filter_queries:-documentTypeId:3 +companyId:29096
> 
> 
> 
> Can you please help me understand what am I missing here?
> 
> 
> Thanks in Advance.
> 
> 
> Thanks & Best Regards,
> 
> Abhishek
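
The behavior difference can be modeled with plain set algebra: each clause contributes the set of documents it matches, and a parenthesized sub-query containing only a negative clause starts from the empty set, so AND-ing it with anything yields nothing. Prepending *:* gives the subtraction a full set to work from. The documents below are invented:

```python
# Toy model of Lucene boolean matching over doc ids 1..5.
all_docs = {1, 2, 3, 4, 5}
type3 = {2, 4}        # docs matching documentTypeId:3
company = {2, 3, 5}   # docs matching companyId:29096

# fq=((-documentTypeId:3) AND companyId:29096)
# The inner clause has no positive part, so it matches nothing.
inner = set() - type3
print(inner & company)        # set() -> zero results

# fq=((*:* -documentTypeId:3) AND companyId:29096)
inner_fixed = all_docs - type3
print(inner_fixed & company)  # {3, 5}
```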



Re: Phrase Query only forward direction

2017-06-12 Thread Erik Hatcher
Understood.   If you need ordered, “sloppy” (some distance) phrases, you could 
OR in a {!complexphrase} query.

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Something like:

q=({!edismax … ps=0 v=$qq}) OR {!complexphrase df=nameSearch v=$qq}

where qq=12345 masitha

Erik


> On Jun 12, 2017, at 9:57 AM, Aman Deep Singh <amandeep.coo...@gmail.com> 
> wrote:
> 
> Yes Erik I can use ps=0 but, my problem is that I want phrase which have
> same sequence and they can be present with in some distance
> E.g.
> If I have document masitha xyz 12345
> I want that to be boosted since the sequence is in order .That's why I have
> use ps=5
> Thanks,
> Aman Deep Singh
> 
> On 12-Jun-2017 5:44 PM, "Erik Hatcher" <erik.hatc...@gmail.com> wrote:
> 
> Using ps=5 causes the phrase matching to be unordered matching.   You’ll
> have to set ps=0, if using edismax, to get exact order phrase matches.
> 
>Erik
> 
> 
>> On Jun 12, 2017, at 1:09 AM, Aman Deep Singh <amandeep.coo...@gmail.com>
> wrote:
>> 
>> Hi,
>> I'm using a phrase query ,but it was applying the phrase boost to the
> query
>> where terms are in reverse order also ,which i don't want.Is their any way
>> to avoid the phrase boost for reverse order and apply boost only in case
> of
>> terms are in same sequence
>> 
>> Solr version 6.5.1
>> 
>> e.g.
>> http://localhost:8983/solr/l4_collection/select?debugQuery=o
> n=edismax=score,nameSearch=on=100%25&
> pf=nameSearch=12345%20masitha=nameSearch=xml=5
>> 
>> 
>> while my document has value
>> 
>> in the debug query it is applying boost as
>> 23.28365 = sum of:
>> 15.112219 = sum of:
>> 9.669338 = weight(nameSearch:12345 in 0) [SchemaSimilarity], result of:
>> 9.669338 = score(doc=0,freq=1.0 = termFreq=1.0
>> ), product of:
>> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>> + 0.5)) from:
>> 2.0 = docFreq
>> 5197.0 = docCount
>> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b +
> b
>> * fieldLength / avgFieldLength)) from:
>> 1.0 = termFreq=1.0
>> 1.2 = parameter k1
>> 0.75 = parameter b
>> 5.2576485 = avgFieldLength
>> 2.56 = fieldLength
>> 5.44288 = weight(nameSearch:masitha in 0) [SchemaSimilarity], result of:
>> 5.44288 = score(doc=0,freq=1.0 = termFreq=1.0
>> ), product of:
>> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>> + 0.5)) from:
>> 70.0 = docFreq
>> 5197.0 = docCount
>> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b +
> b
>> * fieldLength / avgFieldLength)) from:
>> 1.0 = termFreq=1.0
>> 1.2 = parameter k1
>> 0.75 = parameter b
>> 5.2576485 = avgFieldLength
>> 2.56 = fieldLength
>> 8.171431 = weight(*nameSearch:"12345 masitha"~5 *in 0) [SchemaSimilarity],
>> result of:
>> 8.171431 = score(doc=0,freq=0.3334 = phraseFreq=0.3334
>> ), product of:
>> 11.940155 = idf(), sum of:
>> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>> + 0.5)) from:
>> 2.0 = docFreq
>> 5197.0 = docCount
>> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
>> + 0.5)) from:
>> 70.0 = docFreq
>> 5197.0 = docCount
>> 0.6843655 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b +
> b
>> * fieldLength / avgFieldLength)) from:
>> 0.3334 = phraseFreq=0.3334
>> 1.2 = parameter k1
>> 0.75 = parameter b
>> 5.2576485 = avgFieldLength
>> 2.56 = fieldLength
>> 
>> Thanks,
>> Aman Deep Singh



Re: Phrase Query only forward direction

2017-06-12 Thread Erik Hatcher
Using ps=5 causes the phrase matching to be unordered.   You’ll have 
to set ps=0, if using edismax, to get exact-order phrase matches.

Erik


> On Jun 12, 2017, at 1:09 AM, Aman Deep Singh  
> wrote:
> 
> Hi,
> I'm using a phrase query ,but it was applying the phrase boost to the query
> where terms are in reverse order also ,which i don't want.Is their any way
> to avoid the phrase boost for reverse order and apply boost only in case of
> terms are in same sequence
> 
> Solr version 6.5.1
> 
> e.g.
> http://localhost:8983/solr/l4_collection/select?debugQuery=on=edismax=score,nameSearch=on=100%25=nameSearch=12345%20masitha=nameSearch=xml=5
> 
> 
> while my document has value masitha 12345
> 
> in the debug query it is applying boost as
> 23.28365 = sum of:
> 15.112219 = sum of:
> 9.669338 = weight(nameSearch:12345 in 0) [SchemaSimilarity], result of:
> 9.669338 = score(doc=0,freq=1.0 = termFreq=1.0
> ), product of:
> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 2.0 = docFreq
> 5197.0 = docCount
> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
> * fieldLength / avgFieldLength)) from:
> 1.0 = termFreq=1.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 5.2576485 = avgFieldLength
> 2.56 = fieldLength
> 5.44288 = weight(nameSearch:masitha in 0) [SchemaSimilarity], result of:
> 5.44288 = score(doc=0,freq=1.0 = termFreq=1.0
> ), product of:
> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 70.0 = docFreq
> 5197.0 = docCount
> 1.2656635 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
> * fieldLength / avgFieldLength)) from:
> 1.0 = termFreq=1.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 5.2576485 = avgFieldLength
> 2.56 = fieldLength
> 8.171431 = weight(*nameSearch:"12345 masitha"~5 *in 0) [SchemaSimilarity],
> result of:
> 8.171431 = score(doc=0,freq=0.3334 = phraseFreq=0.3334
> ), product of:
> 11.940155 = idf(), sum of:
> 7.6397386 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 2.0 = docFreq
> 5197.0 = docCount
> 4.3004165 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:
> 70.0 = docFreq
> 5197.0 = docCount
> 0.6843655 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
> * fieldLength / avgFieldLength)) from:
> 0.3334 = phraseFreq=0.3334
> 1.2 = parameter k1
> 0.75 = parameter b
> 5.2576485 = avgFieldLength
> 2.56 = fieldLength
> 
> Thanks,
> Aman Deep Singh



Re: Proximity Search using edismax parser.

2017-06-12 Thread Erik Hatcher
Adding debugQuery=true to your search requests will give you the parsing details, 
so you can see how edismax interprets the query string and parameters to turn 
it into the underlying dismax and phrase queries.

Erik

> On Jun 12, 2017, at 3:22 AM, abhi Abhishek  wrote:
> 
> Hi All,
>  How does proximity Query work in SOLR.
> 
> Example if i am running a query like below, for the field containing the
> text “India registered a historical test match win against the arch rival
> Pakistan here in Lords, England on Sunday”
> 
> Query: “Test match India Pakistan” ~ 10
> 
>I am interested in understanding the intermediate steps
> involved here to understand the search behavior and determine how results
> are being matched to the search phrase.
> 
> Thanks in Advance,
> 
> Abhishek



Re: I want "john smi" to find "john smith" in my custom "fullname_s" field

2017-06-06 Thread Erik Hatcher
Nick - try escaping the space, so that your query is q=fullname_s:john\ smi* 

However, whitespace and escaping is problematic.  There is a handy prefix query 
parser, so this would work on a string field with spaces:

q={!prefix f=fullname_s}john smi

note no trailing asterisk on that one.   Even better, IMO, is to separate the 
query string from the query parser:

q={!prefix f=fullname_s v=$qq}=john smi

Erik



Amrit - the issue with your example below is that q=fullname_s:john smi* parses 
“john” against fullname_s and “smi” as a prefix query against the default 
field, not likely fullname_s.   Check your parsed query to see exactly how it 
parsed.It works for you because… magic!   (copyField * => _text_)




> On Jun 6, 2017, at 5:14 AM, Amrit Sarkar  wrote:
> 
> Nick,
> 
> "string" is a primitive data-type and the entire value of a field is
> indexed as single token. The regex matching happens against the tokens for
> text fields and against the full content for string fields. So once a piece
> of text is tokenized, there is no way to perform a regex query across word
> boundaries.
> 
> fullname_s:john smi* is working for me.
> 
> {
>  "responseHeader":{
>"zkConnected":true,
>"status":0,
>"QTime":16,
>"params":{
>  "q":"fullname_s:john smi*",
>  "indent":"on",
>  "wt":"json"}},
>  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
>  {
>"id":"1",
>"fullname_s":"john smith",
>"_version_":1569446064473243648}]
>  }}
> 
> I am on Solr 6.5.0. What version you are on?
> 
> 
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> 
> On Tue, Jun 6, 2017 at 1:30 PM, Nick Way 
> wrote:
> 
>> Hi - I have a Solr collection with a custom field "fullname_s" (a string).
>> 
>> I want "john smi" to find "john smith" (I lower-cased the names upon
>> indexing them)
>> 
>> I have tried
>> 
>> fullname_s:"john smi*"
>> fullname_s:john smi*
>> fullname_s:"john smi?"
>> fullname_s:john smi?
>> 
>> 
>> but nothing gives the expected result - am I missing something? I spent
>> hours on this one point yesterday so if anyone can please point me in the
>> right direction I'd be really grateful.
>> 
>> I'm using Solr with Adobe Coldfusion by the way but I think the principles
>> are the same.
>> 
>> Thank you!
>> 
>> Nick
>> 
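
The SolrJ utility Erik mentions is ClientUtils.escapeQueryChars; outside Java the same idea is a few lines. The escaped character set below is my reading of the SolrJ source and should be double-checked against the version you run:

```python
def escape_query_chars(s: str) -> str:
    """Backslash-escape characters the Lucene query parser treats
    specially, including whitespace (ports SolrJ's
    ClientUtils.escapeQueryChars)."""
    special = set('\\+-!():^[]"{}~*?|&;/')
    out = []
    for ch in s:
        if ch in special or ch.isspace():
            out.append('\\')
        out.append(ch)
    return ''.join(out)

# "john smi" becomes a single escaped token usable as q=fullname_s:john\ smi*
print(escape_query_chars("john smi"))  # john\ smi
print(escape_query_chars("a+b (c)"))   # a\+b\ \(c\)
```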



Re: Velocity UI with Analyzing Infix Suggester?

2017-06-06 Thread Erik Hatcher
Walter -

I’ve done several one-off demos that have incorporated as-you-type Ajax actions 
into /browse.   The first one I did was “instant search” (not suggest) and left 
that sitting over at my “instant_search” branch - of svn(!).  See the top two 
commits listed here: 
https://github.com/erikhatcher/lucene-solr-svn/commits/instant_search

Lately I’ve been building typeahead solutions using a separate collection 
rather than the Suggester component and wiring that into /browse with just this 
sort of thing:

$(function() { $('#search_box').bind("keyup", load_results); });

where load_results() does this:

  $('#results').load(…url with q=…)

It’s awesome to hear you use wt=velocity - made my day!   And by “in 6.5.1” you 
mean it is in the way old techproducts configset where it uses an ancient 
jquery.autocomplete feature.  You could probably adapt that bit 
straightforwardly to another endpoint and adjusting the `extraParams` in there 
appropriately.  The trick used here is that the response from /terms is simply 
a single suggestion per line in plain text, by way of using wt=velocity with 
v.template=suggest:

#foreach($t in $response.response.terms.name)
  $t.key
#end

Adjust that template to deal with your suggester end-point response so that it 
writes out one per line as plain text and you’re there.Happy to help 
further if you run into any issues.

And yes, it’d be nice if this got built-in more modernly into the out of the 
box /browse.  If you want to open a JIRA and hack through it together I’m game.

Erik


> On Jun 5, 2017, at 4:14 PM, Walter Underwood  wrote:
> 
> Does anyone have the new suggester working in the Velocity browse UI? In 
> 6.5.1, it uses the terms component.
> 
> I could probably figure out how to do that in Velocity, but if someone has 
> already done that, it would be great.
> 
> We use the Velocity UI as an internal exploration and diagnostic search page.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 



Re: Long string in fq value parameter, more than 2000000 chars

2017-05-27 Thread Erik Hatcher
Another technique to consider is {!join}.  Index the cross ref id "sets" to 
another core and use a short and sweet join, if there are stable sets of id's. 

   Erik

> On May 27, 2017, at 11:39, Alexandre Rafalovitch  wrote:
> 
> On top of Shawn's analysis, I am also wondering how often those FQ
> queries are reused. Because they and the matching documents are
> getting cached, so there might be quite a bit of space taken with that
> too.
> 
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
> 
> 
>> On 27 May 2017 at 11:32, Shawn Heisey  wrote:
>>> On 5/27/2017 9:05 AM, Shawn Heisey wrote:
 On 5/27/2017 7:14 AM, Daniel Angelov wrote:
 I would like to ask, what could be the memory/cpu impact, if the fq
 parameter in many of the queries is a long string (fq={!terms
 f=...}..., ) around 2000000 chars. Most of the queries are like:
 "q={!frange l=Timestamp1 u=Timestamp2}... + some others criteria".
 This is with SolrCloud 4.1, on 10 hosts, 3 collections, summary in
 all collections are around 1000 docs. The queries are over all 3
 collections.
>> 
>> Followup after a little more thought:
>> 
>> If we assume that the terms in your filter query are a generous 15
>> characters each (plus a comma), that means there are in the ballpark of
>> 125 thousand of them in a two million byte filter query.  If they're
>> smaller, then there would be more.  Considering 56 bytes of overhead for
>> each one, there's at least another 7 million bytes of memory for 125000
>> terms when the terms parser divides that filter into multiple String
>> objects, plus memory required for the data in each of those small
>> strings, which will be just a little bit less than the original four
>> million bytes, because it will exclude the commas.  A fair amount of
>> garbage will probably also be generated in order to parse the filter ...
>> and then once the query is done, the 15 megabytes (or more) of memory
>> for the strings will also be garbage.  This is going to repeat for every
>> shard.
>> 
>> I haven't even discussed what happens for memory requirements on the
>> Lucene frange parser, because I don't have any idea what those are, and
>> you didn't describe the function you're using.  I also don't know how
>> much memory Lucene is going to require in order to execute a terms
>> filter with at least 125K terms.  I don't imagine it's going to be small.
>> 
>> Thanks,
>> Shawn
>> 


Re: knowing which fields were successfully hit

2017-05-16 Thread Erik Hatcher
Is this the equivalent of facet.query’s?   or maybe rather, group.query?

Erik



> On May 16, 2017, at 1:16 PM, Dorian Hoxha  wrote:
> 
> Something like elasticsearch named-queries, right
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html
> ?
> 
> 
> On Tue, May 16, 2017 at 7:10 PM, John Blythe  wrote:
> 
>> sorry for the confusion. as in i received results due to matches on field x
>> vs. field y.
>> 
>> i've gone w a highlighting solution for now. the fact that it requires
>> field storage isn't yet prohibitive for me, so can serve well for now. open
>> to any alternative approaches all the same
>> 
>> thanks-
>> 
>> --
>> *John Blythe*
>> Product Manager & Lead Developer
>> 
>> 251.605.3071 | j...@curvolabs.com
>> www.curvolabs.com
>> 
>> 58 Adams Ave
>> Evansville, IN 47713
>> 
>> On Tue, May 16, 2017 at 11:37 AM, David Hastings <
>> hastings.recurs...@gmail.com> wrote:
>> 
>>> what do you mean "hit?" As in the user clicked it?
>>> 
>>> On Tue, May 16, 2017 at 11:35 AM, John Blythe 
>> wrote:
>>> 
 hey all. i'm sending data out that could represent a purchased item or
>> a
 competitive alternative. when the results are returned i'm needing to
>>> know
 which of the two were hit so i can serve up the *other*.
 
 i can make a blunt instrument in the application layer to simply look
>>> for a
 match between the queried terms and the resulting fields, but the
>> problem
 of fuzzy matching and some of the special analysis being done to get
>> the
 hits will be for naught.
 
 cursory googling landed me at a similar discussion that suggested using
>>> hit
 highlighting or retrieving the debuggers explain data to sort through.
 
 is there another, more efficient means or are these the two tools in
>> the
 toolbox?
 
 thanks!
 
>>> 
>> 



Re: Solr Features 6.5.1 v/s 6.1

2017-05-12 Thread Erik Hatcher
Sweta -

There’s been an enormous number of changes between 6.1 and 6.5.1.  See CHANGES: 

 
https://github.com/apache/lucene-solr/blob/master/solr/CHANGES.txt#L439-L1796 


wow, huh?

And yes, there have been dramatic improvements (Solr 6.5+) in multi-word 
synonym handling, see Steve’s blog here for details: 

   
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
 


As for your other questions, not quite sure exactly what you mean on those.  
What features/improvements are you looking for specifically here?

Erik


> On May 12, 2017, at 8:39 AM, Sweta Parekh  wrote:
> 
> Hi Team,
> Can you please help me with new features, enhancements and improvements on 
> Solr 6.5.1 v/s 6.1 as we are planning to upgrade the version.
> * Has there been major improvement in multi-term / phrase synonyms 
> and match mode
> 
> * Can we perform secondary search using different mm to find better 
> results like auto relax mm
> 
> * Any new update in results exclusion, elevation etc..
> 
> 
> Regards,
> Sweta Parekh
> Search / CRO - Associate Program Manager
> Digital Marketing Services
> sweta.par...@clerx.com
> Extn: 284887 | Mobile: +(91) 9004667625
> eClerx Services Limited [www.eClerx.com]
> 



Re: Dynamic facets during runtime

2017-05-12 Thread Erik Hatcher
Use "appends" instead of "defaults". 
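
For instance (a sketch; the handler name and facet fields here are placeholders, not from the question), params listed under "appends" are added on top of whatever the client sends, rather than acting as overridable defaults:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- appends params are merged into every request; the client
       cannot override or remove them, unlike defaults -->
  <lst name="appends">
    <str name="facet.field">category</str>
    <str name="facet.field">brand</str>
  </lst>
</requestHandler>
```

So the admin-screen facets could live in appends while users add extra facet.field params at query time.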

> On May 11, 2017, at 23:23, Jeyaprakash Singarayar  
> wrote:
> 
> Hi,
> 
> Our application has a facet select admin screen UI that would allow the
> users to add/update/delete the facets that has to be returned from Solr.
> 
> Right now we have the facet fields defined in the defaults of
> requestHandler.
> 
> So if a user wanted a new facet, I know sending that newly selected facet
> with the query would override the list in the solrconfig.xml
> 
> If there any better way rather than making all the facets sent through
> querytime.
> 
> Thanks,
> Jeyaprakash


Re: Automatic conversion to Range Query

2017-05-07 Thread Erik Hatcher
Fair enough indeed.   And as you've experienced, that other functionality 
includes syntax that needs escaping.   If you're using SolrJ then there's a 
utility method to escape characters.  
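
That method is ClientUtils.escapeQueryChars in SolrJ. If you're not on Java, the same idea is easy to replicate; here's a rough Python sketch (the special-character set below is my recollection of what the Lucene query parser treats specially, so double-check it against your Solr version):

```python
# Characters the Lucene query parser treats as special (approximately the
# set handled by SolrJ's ClientUtils.escapeQueryChars - verify against
# your version). Escaping them makes the parser match them literally.
SPECIAL = set('\\+-!():^[]"{}~*?|&;/ ')

def escape_query_chars(s: str) -> str:
    """Backslash-escape query-parser special characters in s."""
    return ''.join('\\' + c if c in SPECIAL else c for c in s)

print(escape_query_chars('id:[1 2 3]'))  # id\:\[1\ 2\ 3\]
```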

Erik

> On May 6, 2017, at 20:53, Aman Deep Singh <amandeep.coo...@gmail.com> wrote:
> 
> Hi Erik,
> We can't use dismax as we are using the other functionality of edismax
> parser
> 
> On 07-May-2017 12:13 AM, "Erik Hatcher" <erik.hatc...@gmail.com> wrote:
> 
> What about dismax instead of edismax?It might do the righter thing here
> without escaping.
> 
>>> On May 6, 2017, at 12:57, Shawn Heisey <apa...@elyograg.org> wrote:
>>> 
>>> On 5/6/2017 7:09 AM, Aman Deep Singh wrote:
>>> After escaping the square bracket the query is working fine, Is their
>>> any way in the parser to avoid the automatic conversion if not proper
>>> query will be passed like in my case even though I haven't passed
>>> proper range query (with keyword TO).
>> 
>> If you use characters special to the query parser but don't want them
>> acted on by the query parser, then they need to be escaped.  That's just
>> how things work, and it's not going to change.
>> 
>> Thanks,
>> Shawn
>> 


Re: Automatic conversion to Range Query

2017-05-06 Thread Erik Hatcher
What about dismax instead of edismax?It might do the righter thing here 
without escaping. 

> On May 6, 2017, at 12:57, Shawn Heisey  wrote:
> 
>> On 5/6/2017 7:09 AM, Aman Deep Singh wrote:
>> After escaping the square bracket the query is working fine, Is their
>> any way in the parser to avoid the automatic conversion if not proper
>> query will be passed like in my case even though I haven't passed
>> proper range query (with keyword TO).
> 
> If you use characters special to the query parser but don't want them
> acted on by the query parser, then they need to be escaped.  That's just
> how things work, and it's not going to change.
> 
> Thanks,
> Shawn
> 


Re: Import Handler using shell scripts

2017-04-28 Thread Erik Hatcher
Yes, via the HTTP API (via curl or other tool).  See the commands and URL 
examples here: 
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-DataImportHandlerCommands
 



> On Apr 28, 2017, at 2:14 PM, Vijay Kokatnur  wrote:
> 
> Is it possible to call dataimport handler from a shell script?  I have not
> found any documentation regarding this. Any pointers?
> 
> -- 
> Best,
> Vijay



Re: Modify solr score

2017-04-22 Thread Erik Hatcher
This may be suggesting a solution that is too experimental or using the wrong 
hammer for the job, but to me it sounds like you could use “payloads” for this 
type of ranking of a term’s relationship to a document.   

See SOLR-1485 for the recent work I’ve been doing (and aim to get committed 
soon).   You could index documents in this way:

   id, weighted_terms_dpf
   1, A|5.0 B|95.0
    2, A|88.7 B|0.1

And then search for “A” and use the 88.7 value to factor into the score or 
sorting.  

Erik



> On Apr 21, 2017, at 12:35 PM, tstusr  wrote:
> 
> Since we report the score, we think there will be some relation between them.
> As far as we know scoring (and then ranking) are calculated based on tf-idf.
> 
> What we want to do is to make a qualitative ranking, it means, according to
> one topic we will tag documents as "very related", "fairly related" or "poor
> related". So, we select some documents completely unrelated to a topic.
> 
> On a very related document we found a ratio of ~2% of words that reports
> ~0.85 of score (what we think is related to ranking). On a test document we
> found a ratio of less than 0.01% and the score is heigher than the first
> one. What we expect is that documents not related (those ones with less
> ratio) report lower scores so we can then use them as minimum and create the
> scale.
> 
> We came with multiply (of affect in some way) the default rank solr provide
> us with the ratio of documents so unrelated documents will be penalized
> while those with higher ratio values will be overrated.
> 
> Greetings, and thanks for your help.
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Modify-solr-score-tp4331300p4331315.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filter if Field Exists

2017-04-17 Thread Erik Hatcher
If you need to do an inner purely negative clause, it must be OR’d with *:* -
queries are about matching, not excluding.  There’s a shortcut in Solr to allow a
top-level purely negative clause as a convenience, but when it gets nested it
needs the explicit *:* pairing.

Those queries below don’t quite do what you’re asking for though.

Erik



> On Apr 17, 2017, at 7:18 AM, Furkan KAMACI  wrote:
> 
> Btw, what is the difference between
> 
> +name:test +(type:research (*:* -type:[* TO *]))
> 
> and
> 
> +name:test +(type:research -type:[* TO *])
> 
> On Mon, Apr 17, 2017 at 1:33 PM, Furkan KAMACI 
> wrote:
> 
>> Actually, amount of documents which have 'type' field is relatively too
>> small across all documents at index.
>> 
>> On Mon, Apr 17, 2017 at 7:08 AM, Alexandre Rafalovitch >> wrote:
>> 
>>> What about setting a default value for the field? That is probably
>>> faster than negative search clauses?
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and
>>> experienced
>>> 
>>> 
>>> On 16 April 2017 at 23:58, Mikhail Khludnev  wrote:
 +name:test +(type:research (*:* -type:[* TO *]))
 
 On Sun, Apr 16, 2017 at 11:47 PM, Furkan KAMACI  Hi,
> 
> I have a schema like:
> 
> name,
> department,
> type
> 
> type is an optional field. Some documents don't have that field. Let's
> assume I have these:
> 
> Doc 1:
> name: test
> type: research
> 
> Doc 2:
> name: test
> type: developer
> 
> Doc 3:
> name: test
> 
> I want to search name: test and type:research if type field exists
>>> (result
> will be Doc 1 and Doc 3).
> 
> How can I do that?
> 
> Kind Regards,
> Furkan KAMACI
> 
 
 
 
 --
 Sincerely yours
 Mikhail Khludnev
>>> 
>> 
>> 



Re: Filter if Field Exists

2017-04-17 Thread Erik Hatcher
Too many ‘+’’s in there, I think.  

I think the query you want is this, and let’s be precise about the query parser 
here too in case that’s getting in the way and split this stuff up into 
separate reusable clauses:

?qq=test
&q_no_type=({!field f=name v=$qq} -type:*)
&q_research=({!field f=name v=$qq} type:research)
&q={!lucene}${q_no_type} OR ${q_research}

Sorry, that’s just how my brain thinks, above - in splitting this stuff out.   
But in one line it’s really:

   q=(name:test -type:*) OR (name:test AND type:research)
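
A quick way to sanity-check that boolean logic against the three example docs (just a Python simulation of the clause, not Solr itself):

```python
# The three docs from the question; doc 3 has no 'type' field at all.
docs = [
    {'id': 1, 'name': 'test', 'type': 'research'},
    {'id': 2, 'name': 'test', 'type': 'developer'},
    {'id': 3, 'name': 'test'},
]

def matches(doc):
    # (name:test -type:*) OR (name:test AND type:research)
    no_type = doc.get('name') == 'test' and 'type' not in doc
    research = doc.get('name') == 'test' and doc.get('type') == 'research'
    return no_type or research

print([d['id'] for d in docs if matches(d)])  # [1, 3]
```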

Erik



> On Apr 17, 2017, at 7:22 AM, Furkan KAMACI  wrote:
> 
> On the other hand, that query does not do what I want.
> 
> On Mon, Apr 17, 2017 at 2:18 PM, Furkan KAMACI 
> wrote:
> 
>> Btw, what is the difference between
>> 
>> +name:test +(type:research (*:* -type:[* TO *]))
>> 
>> and
>> 
>> +name:test +(type:research -type:[* TO *])
>> 
>> On Mon, Apr 17, 2017 at 1:33 PM, Furkan KAMACI 
>> wrote:
>> 
>>> Actually, amount of documents which have 'type' field is relatively too
>>> small across all documents at index.
>>> 
>>> On Mon, Apr 17, 2017 at 7:08 AM, Alexandre Rafalovitch <
>>> arafa...@gmail.com> wrote:
>>> 
 What about setting a default value for the field? That is probably
 faster than negative search clauses?
 
 Regards,
   Alex.
 
 http://www.solr-start.com/ - Resources for Solr users, new and
 experienced
 
 
 On 16 April 2017 at 23:58, Mikhail Khludnev  wrote:
> +name:test +(type:research (*:* -type:[* TO *]))
> 
> On Sun, Apr 16, 2017 at 11:47 PM, Furkan KAMACI <
 furkankam...@gmail.com>
> wrote:
> 
>> Hi,
>> 
>> I have a schema like:
>> 
>> name,
>> department,
>> type
>> 
>> type is an optional field. Some documents don't have that field. Let's
>> assume I have these:
>> 
>> Doc 1:
>> name: test
>> type: research
>> 
>> Doc 2:
>> name: test
>> type: developer
>> 
>> Doc 3:
>> name: test
>> 
>> I want to search name: test and type:research if type field exists
 (result
>> will be Doc 1 and Doc 3).
>> 
>> How can I do that?
>> 
>> Kind Regards,
>> Furkan KAMACI
>> 
> 
> 
> 
> --
> Sincerely yours
> Mikhail Khludnev
 
>>> 
>>> 
>> 



Re: Solr/ Velocity dont show full field value

2017-04-11 Thread Erik Hatcher
#field() is defined in _macros.vm as this monstrosity:

# TODO: make this parameterized fully, no context sensitivity
#macro(field $f)
  #if($response.response.highlighting.get($docId).get($f).get(0))
#set($pad = "")
  #foreach($v in $response.response.highlighting.get($docId).get($f))
$pad$v##  #TODO: $esc.html() or maybe make that optional?
#set($pad = " ... ")
  #end
  #else
$esc.html($display.list($doc.getFieldValues($f), ", "))
  #end
#end

Basically that’s saying if there is highlighting returned for the specified 
field, then render it, otherwise render the full field value.  
$doc.getFieldValue() won’t ever work with highlighting - it’s the raw returned 
field value (or empty, potentially) - highlighting has to be looked up 
separately and that’s what the #field() macro tries to do - make it look a bit 
more seamless and slick, to just do #field(“field_name”).  But it does rely on 
highlighting working - so try the json or xml response until you get the 
highlighting configured as needed.
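
In plainer terms, the macro’s fallback behavior is roughly this (a Python sketch of what the Velocity above does, with made-up sample data):

```python
def render_field(highlighting, doc, doc_id, f):
    """Highlight snippets for field f if present, else the raw stored value(s)."""
    snippets = highlighting.get(doc_id, {}).get(f)
    if snippets:
        return ' ... '.join(snippets)       # highlighted fragments, elided
    return ', '.join(doc.get(f, []))        # fall back to the full field value

# Hypothetical response data, shaped like Solr's highlighting section.
hl = {'42': {'LONG_TEXT': ['a <b>match</b> here']}}
doc = {'LONG_TEXT': ['the full stored text of the field']}

print(render_field(hl, doc, '42', 'LONG_TEXT'))  # a <b>match</b> here
print(render_field({}, doc, '42', 'LONG_TEXT'))  # the full stored text of the field
```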

Erik


> On Apr 11, 2017, at 6:14 AM, Hamso  wrote:
> 
> Hey guys,
> I have a problem:
> 
> In Velocity:
> 
> *Beschreibung:*#field('LONG_TEXT')
> 
> In Solr the field "LONG_TEXT" dont show everything only the first ~90-110
> characters.
> But if I set "$doc.getFieldValue('LONG_TEXT')" in the Velocity file, then he
> show me everything whats inside in the field "LONG_TEXT".
> But there is one problem, if I use "$doc.getFieldValue('LONG_TEXT')" instead
> of #field('LONG_TEXT'), the highlight doesnt work.
> Can someone please help me, why #field('LONG_TEXT') doesnt show everthing
> whats inside the field, or why highlighting with
> "$doc.getFieldValue('LONG_TEXT')" doesnt work.
> 
> Schema.xml:
> 
>   />
> 
> positionIncrementGap="100">
>
>  
>   ignoreCase="true"/>
>  
>   maxGramSize="500"/>
> 
>
>  
>   ignoreCase="true"/>
>   ignoreCase="true" synonyms="synonyms.txt"/>
>  
>
> 
> 
> solrconfig only in /browse:
> 
>   on
>   LONG_TEXT
>   true
>   html
>   b
>   /b
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Velocity-dont-show-full-field-value-tp4329290.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Pagination bug? when sorting by a field (not unique field)

2017-03-29 Thread Erik Hatcher
Certainly not intended behavior.  Can you show us a way to replicate the issue?


> On Mar 29, 2017, at 8:35 AM, Pablo Anzorena  wrote:
> 
> Hey,
> 
> I was paginating the results of a query and noticed that some documents
> were repeated across pagination buckets of 100 rows.
> When I sort by the unique field there is no repeated document but when I
> sort by another field then repeated documents appear.
> I assume is a bug and it's not the intended behaviour, right?
> 
> Solr version:5.2.1
> 
> Regards,
> Pablo.



Re: I want to contribute custom made NLP based solr filters but dont know how.

2017-03-07 Thread Erik Hatcher
Nice use of the VelocityResponseWriter :)

(and looks like, at quick glance, several other goodies under there too) 

Erik


> On Mar 5, 2017, at 7:40 AM, Avtar Singh Mehra  wrote:
> 
> Hello everyone,
> I have developed project called WiseOwl which is basically a fact based
> question answering system which can be accessed at :
> https://github.com/asmehra95/wiseowl
> 
> In the process of making the project work i have developed pluggable solr
> filters optimised for solr 6.3.0.
> I would like to donate them to solr.
> 1. *WiseOwlStanford Filter* :It uses StanfordCoreNLP to tag named entities
> and it also normalises Dates during indexing or searching. DEmonstration
> screenshots are available on the github profile. But i don't know how to
> donate them.
> 
> If there is a way then please let me know. As it may be useful for anyone
> doing natural language processing.



Re: Using parameter values in a sort

2017-03-01 Thread Erik Hatcher
FYI - I recalled, and located, a solr-user thread from 2015 with subject 
“Parameter Substitution” with this same issue.   Tricky issue - overloaded `${` 
usage and interpretation time.

Erik

> On Mar 1, 2017, at 1:41 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> 
> I just found a workaround, check out this trick:
> 
> ${sort_field:${sort_field}} desc
> 
> when the core is loaded, it looks for a system property “sort_field”, doesn’t 
> find it, and defaults the value to ${sort_field} and voila:
> 
>    /browse?q=*:*&wt=xml&sort_field=id
> 
>   Erik
> 
> 
>> On Mar 1, 2017, at 1:14 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>> 
>> Walter -
>> 
>> Apologies for not trying this sooner first-hand.   I’m used to passing in 
>> all the params (even the dynamic ${…} ones) in the request these days, not 
>> so much putting them into request handler definitions.
>> 
>> I finally tried it with a default (master/trunk) with modifying the /browse 
>> handler with this:
>> 
>> <requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">
>>   <lst name="defaults">
>>     <str name="echoParams">explicit</str>
>>     <str name="sort">${sort_field} desc</str>
>>   </lst>
>> </requestHandler>
>> 
>> And get this startup error:
>> 
>> Caused by: org.apache.solr.common.SolrException: No system property or 
>> default value specified for sort_field value:${sort_field} desc
>>  at 
>> org.apache.solr.util.PropertiesUtil.substituteProperty(PropertiesUtil.java:65)
>>  at org.apache.solr.util.DOMUtil.substituteProperties(DOMUtil.java:303)
>> 
>> *sigh* and sorry for leading you astray.   Definitely a Solr bug.
>> 
>> However, this technique does work when everything is in the params:
>> 
>>   /select?q=*:*&sort=${sort_field}%20desc&sort_field=id
>> 
>> @Yonik or others - is this a known/filed issue?Workarounds or escaping 
>> that could make it work?
>> 
>>  Erik
>> 
>> 
>>> On Feb 27, 2017, at 10:39 PM, Walter Underwood <wun...@wunderwood.org> 
>>> wrote:
>>> 
>>> No, I tried that before adding the default.
>>> 
>>> But the solrconfig.xml is rejected before there is a request, so this is 
>>> not about requests. I did try “scores” in the requests, but of course it 
>>> didn’t work because the solrconfig.xml was not loaded. I did not turn off 
>>> parameter substitutions. This is a pretty vanilla solrconfig.xml.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
>>>> On Feb 27, 2017, at 6:44 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>>> 
>>>> `scores` (plural), you’ve got this below:   
>>>> 
>>>> Remove that, and like my previous e-mail, and use `scores` (plural) from 
>>>> the request and _should_ work?
>>>> 
>>>>Erik
>>>> 
>>>>> On Feb 27, 2017, at 9:42 PM, Walter Underwood <wun...@wunderwood.org> 
>>>>> wrote:
>>>>> 
>>>>> I’ve passed in a score parameter, but the solrconfig.xml is rejected 
>>>>> before any requests.
>>>>> 
>>>>> Pretty ready to give up. The documentation around function queries and 
>>>>> params is not working for me, though I’ve been using Solr for ten years. 
>>>>> I have figured out a lot of systems. This is impenetrable.
>>>>> 
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/  (my blog)
>>>>> 
>>>>> 
>>>>>> On Feb 27, 2017, at 6:35 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>>>>> 
>>>>>> You have an empty “scores” parameter in there.  You’re not showing your 
>>>>>> full search request, but did you provide that in the request?   Have you 
>>>>>> perhaps turned off parameter substitutions?
>>>>>> 
>>>>>>  Erik
>>>>>> 
>>>>>>> On Feb 27, 2017, at 9:26 PM, Walter Underwood <wun...@wunderwood.org> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> With this in the config…
>>>>>>> 
>>>>>>> 
>>>>>>>  
>>>>>>> 
>>>>>>> edismax
>>>>>>> 0
>>>>>>> false
>>>>>>> id,
>>>>

Re: Using parameter values in a sort

2017-03-01 Thread Erik Hatcher
I just found a workaround, check out this trick:

 ${sort_field:${sort_field}} desc

when the core is loaded, it looks for a system property “sort_field”, doesn’t 
find it, and defaults the value to ${sort_field} and voila:

/browse?q=*:*&wt=xml&sort_field=id

Erik


> On Mar 1, 2017, at 1:14 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> 
> Walter -
> 
> Apologies for not trying this sooner first-hand.   I’m used to passing in all 
> the params (even the dynamic ${…} ones) in the request these days, not so 
> much putting them into request handler definitions.
> 
> I finally tried it with a default (master/trunk) with modifying the /browse 
> handler with this:
> 
>   <requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">
>     <lst name="defaults">
>       <str name="echoParams">explicit</str>
>       <str name="sort">${sort_field} desc</str>
>     </lst>
>   </requestHandler>
> 
> And get this startup error:
> 
> Caused by: org.apache.solr.common.SolrException: No system property or 
> default value specified for sort_field value:${sort_field} desc
>   at 
> org.apache.solr.util.PropertiesUtil.substituteProperty(PropertiesUtil.java:65)
>   at org.apache.solr.util.DOMUtil.substituteProperties(DOMUtil.java:303)
> 
> *sigh* and sorry for leading you astray.   Definitely a Solr bug.
> 
> However, this technique does work when everything is in the params:
> 
>    /select?q=*:*&sort=${sort_field}%20desc&sort_field=id
> 
> @Yonik or others - is this a known/filed issue?Workarounds or escaping 
> that could make it work?
> 
>   Erik
> 
> 
>> On Feb 27, 2017, at 10:39 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> 
>> No, I tried that before adding the default.
>> 
>> But the solrconfig.xml is rejected before there is a request, so this is not 
>> about requests. I did try “scores” in the requests, but of course it didn’t 
>> work because the solrconfig.xml was not loaded. I did not turn off parameter 
>> substitutions. This is a pretty vanilla solrconfig.xml.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Feb 27, 2017, at 6:44 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>> 
>>> `scores` (plural), you’ve got this below:   
>>> 
>>> Remove that, and like my previous e-mail, and use `scores` (plural) from 
>>> the request and _should_ work?
>>> 
>>> Erik
>>> 
>>>> On Feb 27, 2017, at 9:42 PM, Walter Underwood <wun...@wunderwood.org> 
>>>> wrote:
>>>> 
>>>> I’ve passed in a score parameter, but the solrconfig.xml is rejected 
>>>> before any requests.
>>>> 
>>>> Pretty ready to give up. The documentation around function queries and 
>>>> params is not working for me, though I’ve been using Solr for ten years. I 
>>>> have figured out a lot of systems. This is impenetrable.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>> 
>>>>> On Feb 27, 2017, at 6:35 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>>>> 
>>>>> You have an empty “scores” parameter in there.  You’re not showing your 
>>>>> full search request, but did you provide that in the request?   Have you 
>>>>> perhaps turned off parameter substitutions?
>>>>> 
>>>>>   Erik
>>>>> 
>>>>>> On Feb 27, 2017, at 9:26 PM, Walter Underwood <wun...@wunderwood.org> 
>>>>>> wrote:
>>>>>> 
>>>>>> With this in the config…
>>>>>> 
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>>  edismax
>>>>>>  0
>>>>>>  false
>>>>>>  id,
>>>>>>  image_thumb_large, image_thumb_medium, image_thumb_small,
>>>>>>  image_thumb_xlarge, uri, user_id, last_name, first_name,
>>>>>>  name, school, major, graduation_year, tutor_profile_id,
>>>>>>  positive_reviews, negative_reviews, gender, about_experience,
>>>>>>  about_extracurricular, time_approved
>>>>>>  *:*
>>>>>>  about_experience  about_extracurricular School  
>>>>>> Major
>>>>>>  
>>>>>>  >>>>> name="sort">sum(interaction_responsiveness_score,profile_completeness_score,school_score,us_

Re: Using parameter values in a sort

2017-03-01 Thread Erik Hatcher
Walter -

Apologies for not trying this sooner first-hand.   I’m used to passing in all 
the params (even the dynamic ${…} ones) in the request these days, not so much 
putting them into request handler definitions.

I finally tried it with a default (master/trunk) with modifying the /browse 
handler with this:

  <requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="sort">${sort_field} desc</str>
    </lst>
  </requestHandler>

And get this startup error:

Caused by: org.apache.solr.common.SolrException: No system property or default 
value specified for sort_field value:${sort_field} desc
at 
org.apache.solr.util.PropertiesUtil.substituteProperty(PropertiesUtil.java:65)
at org.apache.solr.util.DOMUtil.substituteProperties(DOMUtil.java:303)

*sigh* and sorry for leading you astray.   Definitely a Solr bug.

However, this technique does work when everything is in the params:

/select?q=*:*&sort=${sort_field}%20desc&sort_field=id

@Yonik or others - is this a known/filed issue?Workarounds or escaping that 
could make it work?

Erik


> On Feb 27, 2017, at 10:39 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> No, I tried that before adding the default.
> 
> But the solrconfig.xml is rejected before there is a request, so this is not 
> about requests. I did try “scores” in the requests, but of course it didn’t 
> work because the solrconfig.xml was not loaded. I did not turn off parameter 
> substitutions. This is a pretty vanilla solrconfig.xml.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Feb 27, 2017, at 6:44 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>> 
>> `scores` (plural), you’ve got this below:   
>> 
>> Remove that, and like my previous e-mail, and use `scores` (plural) from the 
>> request and _should_ work?
>> 
>>  Erik
>> 
>>> On Feb 27, 2017, at 9:42 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>> 
>>> I’ve passed in a score parameter, but the solrconfig.xml is rejected before 
>>> any requests.
>>> 
>>> Pretty ready to give up. The documentation around function queries and 
>>> params is not working for me, though I’ve been using Solr for ten years. I 
>>> have figured out a lot of systems. This is impenetrable.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
>>>> On Feb 27, 2017, at 6:35 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>>> 
>>>> You have an empty “scores” parameter in there.  You’re not showing your 
>>>> full search request, but did you provide that in the request?   Have you 
>>>> perhaps turned off parameter substitutions?
>>>> 
>>>>Erik
>>>> 
>>>>> On Feb 27, 2017, at 9:26 PM, Walter Underwood <wun...@wunderwood.org> 
>>>>> wrote:
>>>>> 
>>>>> With this in the config…
>>>>> 
>>>>> 
>>>>>  
>>>>> 
>>>>>   edismax
>>>>>   0
>>>>>   false
>>>>>   id,
>>>>>   image_thumb_large, image_thumb_medium, image_thumb_small,
>>>>>   image_thumb_xlarge, uri, user_id, last_name, first_name,
>>>>>   name, school, major, graduation_year, tutor_profile_id,
>>>>>   positive_reviews, negative_reviews, gender, about_experience,
>>>>>   about_extracurricular, time_approved
>>>>>   *:*
>>>>>   about_experience  about_extracurricular School  
>>>>> Major
>>>>>   
>>>>>   >>>> name="sort">sum(interaction_responsiveness_score,profile_completeness_score,school_score,us_tax_id_score,highlight_score,${scores})
>>>>>  desc
>>>>>   log(sum(1,max(positive_reviews,0)))
>>>>>   0.1
>>>>> 
>>>>> 
>>>>> 
>>>>> I see this… [Solr 6.3.0]
>>>>> 
>>>>> org.apache.solr.common.SolrException: Unable to reload core 
>>>>> [tutors_shard1_replica11]
>>>>>   at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:950)
>>>>>   at 
>>>>> org.apache.solr.core.SolrCore.lambda$getConfListener$6(SolrCore.java:2708)
>>>>>   at 
>>>>> org.apache.solr.cloud.ZkController.lambda$fireEventListeners$4(ZkController.java:2448)
>>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>> Caused by: org.apache.solr.common.SolrException: Could n

Re: Using parameter values in a sort

2017-02-27 Thread Erik Hatcher
It looks like you’ve got score/scores mismatching going on.   

I feel your frustration through e-mail.   It’s voodoo for sure, but hopefully 
it’s just typos that are getting  you now.  I’d simplify it down to an example 
like I provided without mixing in any other variables or params (even for us 
seasoned pros, simplification is often the root of problem solving in 
frustrating times; get back to what works) and let’s see the request (and 
echoParams=all output when the request is successful).

Erik


> On Feb 27, 2017, at 9:48 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> I added that line because I was getting an error about it being undefined.
> 
>   <str name="scores"></str>
> 
> At this point, I’m just doing random shit hoping it will work. There is not 
> enough documentation to use this.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Feb 27, 2017, at 6:44 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>> 
>> `scores` (plural), you’ve got this below:   
>> 
>> Remove that, and like my previous e-mail, and use `scores` (plural) from the 
>> request and _should_ work?
>> 
>>  Erik
>> 
>>> On Feb 27, 2017, at 9:42 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>> 
>>> I’ve passed in a score parameter, but the solrconfig.xml is rejected before 
>>> any requests.
>>> 
>>> Pretty ready to give up. The documentation around function queries and 
>>> params is not working for me, though I’ve been using Solr for ten years. I 
>>> have figured out a lot of systems. This is impenetrable.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
>>>> On Feb 27, 2017, at 6:35 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>>> 
>>>> You have an empty “scores” parameter in there.  You’re not showing your 
>>>> full search request, but did you provide that in the request?   Have you 
>>>> perhaps turned off parameter substitutions?
>>>> 
>>>>Erik
>>>> 
>>>>> On Feb 27, 2017, at 9:26 PM, Walter Underwood <wun...@wunderwood.org> 
>>>>> wrote:
>>>>> 
>>>>> With this in the config…
>>>>> 
>>>>> 
>>>>>  
>>>>> 
>>>>>   edismax
>>>>>   0
>>>>>   false
>>>>>   id,
>>>>>   image_thumb_large, image_thumb_medium, image_thumb_small,
>>>>>   image_thumb_xlarge, uri, user_id, last_name, first_name,
>>>>>   name, school, major, graduation_year, tutor_profile_id,
>>>>>   positive_reviews, negative_reviews, gender, about_experience,
>>>>>   about_extracurricular, time_approved
>>>>>   *:*
>>>>>   about_experience  about_extracurricular School  
>>>>> Major
>>>>>   
>>>>>   >>>> name="sort">sum(interaction_responsiveness_score,profile_completeness_score,school_score,us_tax_id_score,highlight_score,${scores})
>>>>>  desc
>>>>>   log(sum(1,max(positive_reviews,0)))
>>>>>   0.1
>>>>> 
>>>>> 
>>>>> 
>>>>> I see this… [Solr 6.3.0]
>>>>> 
>>>>> org.apache.solr.common.SolrException: Unable to reload core 
>>>>> [tutors_shard1_replica11]
>>>>>   at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:950)
>>>>>   at 
>>>>> org.apache.solr.core.SolrCore.lambda$getConfListener$6(SolrCore.java:2708)
>>>>>   at 
>>>>> org.apache.solr.cloud.ZkController.lambda$fireEventListeners$4(ZkController.java:2448)
>>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>> Caused by: org.apache.solr.common.SolrException: Could not load conf for 
>>>>> core tutors_shard1_replica11: Error loading solr config from 
>>>>> solrconfig.xml
>>>>>   at 
>>>>> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:85)
>>>>>   at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:942)
>>>>>   ... 3 more
>>>>> Caused by: org.apache.solr.common.SolrException: Error loading solr 
>>>>> config from solrconfig.xml
>>>>>   at 
>>>>> org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:

Re: Using parameter values in a sort

2017-02-27 Thread Erik Hatcher

> On Feb 27, 2017, at 9:42 PM, Walter Underwood  wrote:
> Pretty ready to give up. The documentation around function queries and params 
> is not working for me, though I’ve been using Solr for ten years. I have 
> figured out a lot of systems. This is impenetrable.

Here’s how I (try to) explain it in Solr training:

There’s plain uncurly bracketed parameter substitution.  This is where a 
feature (say local params, or function queries) is designed to indirect an 
exact parameter and pull it in:

/select?q=*:*&fq={!terms f=id v=$id_list}&id_list=1,2,3

And then there’s macro substitution, with curly bracketed syntax, and this 
is an in place string substitution allowing gluing of things together:

  /select?q=*:*&sort=${sort_field_name} desc&sort_field_name=price

When in doubt, the curly bracketed syntax actually should do the trick, but 
there’s always the fun aspect of whitespace and escaping and quoting and such, 
so it is tricky business.   A browser, some trial-and-error, and hopefully some 
tips and tricks like these that emerge online will be helpful in demystifying 
this and making it more accessible and useful/usable.
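
Conceptually the curly form is just string substitution, with an optional default after a colon. A rough Python sketch of the idea (my approximation, not Solr's actual implementation, and it doesn't handle the nested-default trick):

```python
import re

def expand_macros(template, params):
    """Expand ${name} / ${name:default} occurrences against a params dict."""
    def repl(m):
        name, _, default = m.group(1).partition(':')
        return str(params.get(name, default))
    return re.sub(r'\$\{([^}]*)\}', repl, template)

print(expand_macros('${sort_field} desc', {'sort_field': 'price'}))  # price desc
print(expand_macros('${sort_field:score} desc', {}))                 # score desc
```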

Erik



Re: Using parameter values in a sort

2017-02-27 Thread Erik Hatcher
`scores` (plural), you’ve got this below:   

Remove that, and like my previous e-mail, and use `scores` (plural) from the 
request and _should_ work?

Erik

> On Feb 27, 2017, at 9:42 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> I’ve passed in a score parameter, but the solrconfig.xml is rejected before 
> any requests.
> 
> Pretty ready to give up. The documentation around function queries and params 
> is not working for me, though I’ve been using Solr for ten years. I have 
> figured out a lot of systems. This is impenetrable.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Feb 27, 2017, at 6:35 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>> 
>> You have an empty “scores” parameter in there.  You’re not showing your full 
>> search request, but did you provide that in the request?   Have you perhaps 
>> turned off parameter substitutions?
>> 
>>  Erik
>> 
>>> On Feb 27, 2017, at 9:26 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>> 
>>> With this in the config…
>>> 
>>> 
>>>
>>>   
>>> edismax
>>> 0
>>> false
>>> id,
>>> image_thumb_large, image_thumb_medium, image_thumb_small,
>>> image_thumb_xlarge, uri, user_id, last_name, first_name,
>>> name, school, major, graduation_year, tutor_profile_id,
>>> positive_reviews, negative_reviews, gender, about_experience,
>>> about_extracurricular, time_approved
>>> *:*
>>> about_experience  about_extracurricular School  
>>> Major
>>> 
>>> <str name="sort">sum(interaction_responsiveness_score,profile_completeness_score,school_score,us_tax_id_score,highlight_score,${scores})
>>>  desc</str>
>>> log(sum(1,max(positive_reviews,0)))
>>> 0.1
>>>   
>>> 
>>> 
>>> I see this… [Solr 6.3.0]
>>> 
>>> org.apache.solr.common.SolrException: Unable to reload core 
>>> [tutors_shard1_replica11]
>>> at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:950)
>>> at 
>>> org.apache.solr.core.SolrCore.lambda$getConfListener$6(SolrCore.java:2708)
>>> at 
>>> org.apache.solr.cloud.ZkController.lambda$fireEventListeners$4(ZkController.java:2448)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Caused by: org.apache.solr.common.SolrException: Could not load conf for 
>>> core tutors_shard1_replica11: Error loading solr config from solrconfig.xml
>>> at 
>>> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:85)
>>> at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:942)
>>> ... 3 more
>>> Caused by: org.apache.solr.common.SolrException: Error loading solr config 
>>> from solrconfig.xml
>>> at 
>>> org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:187)
>>> at 
>>> org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:97)
>>> at 
>>> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:77)
>>>     ... 4 more
>>> Caused by: org.apache.solr.common.SolrException: No system property or 
>>> default value specified for scores 
>>> value:sum(interaction_responsiveness_score,profile_completeness_score,school_score,us_tax_id_score,highlight_score,${scores})
>>>  desc
>>> at 
>>> org.apache.solr.util.PropertiesUtil.substituteProperty(PropertiesUtil.java:65)
>>> at org.apache.solr.util.DOMUtil.substituteProperties(DOMUtil.java:298)
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
>>>> On Feb 27, 2017, at 6:17 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>>> 
>>>> Walter -
>>>> 
>>>> How about this, for the latter part of your request:
>>>> 
>>>> /handler?features=a,b,c
>>>> 
>>>>  with &sort=sum(${features}) desc
>>>> 
>>>> That ought to do the trick.   At first I thought the #foreach nature of 
>>>> the list of features was prohibitive, but since you’re literally plugging 
>>>> in the exact string value and it’s used as a comma-separated list then 
>>>> this should work.
>>>> 
>>>> But with the just a list of subject 

Re: Using parameter values in a sort

2017-02-27 Thread Erik Hatcher
And by turning off parameter substitutions I meant disable `expandMacros` - 
https://cwiki.apache.org/confluence/display/solr/Parameter+Substitution 
<https://cwiki.apache.org/confluence/display/solr/Parameter+Substitution>

Likely you haven’t, and this feature should work.   I’d remove `scores` from 
your config (unless you’re going to provide a valid default, maybe by moving 
where you put “desc” in the parameters) and provide that as a mandatory (or 
optional, depending on how you arrange the params) `scores` param.


    /select?q=*:*&scores=feature_a_1,feature_b_2
      &sort=sum(${scores}) desc
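The `sum(${scores})` pattern above still requires the client to know the per-subject field names. That composition can just as easily happen client-side before the request is sent; a rough sketch using the field-naming scheme from Walter's example (the helper names are mine):

```python
def subject_features(subject_ids):
    """Expand subject ids into the parameterized feature field names
    from the example (feature_a_4, feature_b_4, ...)."""
    return ["feature_%s_%d" % (suffix, sid)
            for sid in subject_ids
            for suffix in ("a", "b", "c")]

def build_sort(feature_fields, base_fields=("feature_x", "feature_y")):
    """Build the sort clause: sum the base and per-subject feature fields."""
    return "sum(%s) desc" % ",".join(list(base_fields) + list(feature_fields))

sort_param = build_sort(subject_features([4, 186]))
print(sort_param)
# sum(feature_x,feature_y,feature_a_4,feature_b_4,feature_c_4,feature_a_186,feature_b_186,feature_c_186) desc
```

Doing it client-side sidesteps the solrconfig.xml property-substitution trap entirely, at the cost of the client knowing the naming convention.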




> On Feb 27, 2017, at 9:35 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> 
> You have an empty “scores” parameter in there.  You’re not showing your full 
> search request, but did you provide that in the request?   Have you perhaps 
> turned off parameter substitutions?
> 
>   Erik
> 
> 
> 
> 
>> On Feb 27, 2017, at 9:26 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> 
>> With this in the config…
>> 
>> 
>> 
>>
>>  edismax
>>  0
>>  false
>>  id,
>>  image_thumb_large, image_thumb_medium, image_thumb_small,
>>  image_thumb_xlarge, uri, user_id, last_name, first_name,
>>  name, school, major, graduation_year, tutor_profile_id,
>>  positive_reviews, negative_reviews, gender, about_experience,
>>  about_extracurricular, time_approved
>>  *:*
>>  about_experience  about_extracurricular School  
>> Major
>>  
>> <str name="sort">sum(interaction_responsiveness_score,profile_completeness_score,school_score,us_tax_id_score,highlight_score,${scores})
>>  desc</str>
>>  log(sum(1,max(positive_reviews,0)))
>>  0.1
>>
>> 
>> 
>> I see this… [Solr 6.3.0]
>> 
>> org.apache.solr.common.SolrException: Unable to reload core 
>> [tutors_shard1_replica11]
>>  at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:950)
>>  at 
>> org.apache.solr.core.SolrCore.lambda$getConfListener$6(SolrCore.java:2708)
>>  at 
>> org.apache.solr.cloud.ZkController.lambda$fireEventListeners$4(ZkController.java:2448)
>>  at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.solr.common.SolrException: Could not load conf for 
>> core tutors_shard1_replica11: Error loading solr config from solrconfig.xml
>>  at 
>> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:85)
>>  at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:942)
>>  ... 3 more
>> Caused by: org.apache.solr.common.SolrException: Error loading solr config 
>> from solrconfig.xml
>>  at 
>> org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:187)
>>  at 
>> org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:97)
>>  at 
>> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:77)
>>  ... 4 more
>> Caused by: org.apache.solr.common.SolrException: No system property or 
>> default value specified for scores 
>> value:sum(interaction_responsiveness_score,profile_completeness_score,school_score,us_tax_id_score,highlight_score,${scores})
>>  desc
>>  at 
>> org.apache.solr.util.PropertiesUtil.substituteProperty(PropertiesUtil.java:65)
>>  at org.apache.solr.util.DOMUtil.substituteProperties(DOMUtil.java:298)
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Feb 27, 2017, at 6:17 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>>> 
>>> Walter -
>>> 
>>> How about this, for the latter part of your request:
>>> 
>>> /handler?features=a,b,c
>>> 
>>>   with &sort=sum(${features}) desc
>>> 
>>> That ought to do the trick.   At first I thought the #foreach nature of the 
>>> list of features was prohibitive, but since you’re literally plugging in 
>>> the exact string value and it’s used as a comma-separated list then this 
>>> should work.
>>> 
>>> But with the just a list of subject ID’s I think you’re in custom 
>>> development now (or a JavaScript stage in a Fusion query pipeline ;) in 
>>> building a SearchComponent that takes a `features` parameter and builds the 
>>> sort param as needed from that.
>>> 
>>> Erik
>>> 
>>> 
>>> 
>>>> On Feb 27, 2017, at 7:17 PM, Walter Underwood <wun...@wunderwood.

Re: Using parameter values in a sort

2017-02-27 Thread Erik Hatcher
You have an empty “scores” parameter in there.  You’re not showing your full 
search request, but did you provide that in the request?   Have you perhaps 
turned off parameter substitutions?

Erik




> On Feb 27, 2017, at 9:26 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> With this in the config…
> 
> 
>  
> 
>   edismax
>   0
>   false
>   id,
>   image_thumb_large, image_thumb_medium, image_thumb_small,
>   image_thumb_xlarge, uri, user_id, last_name, first_name,
>   name, school, major, graduation_year, tutor_profile_id,
>   positive_reviews, negative_reviews, gender, about_experience,
>   about_extracurricular, time_approved
>   *:*
>   about_experience  about_extracurricular School  
> Major
>   
> <str name="sort">sum(interaction_responsiveness_score,profile_completeness_score,school_score,us_tax_id_score,highlight_score,${scores})
>  desc</str>
>   log(sum(1,max(positive_reviews,0)))
>   0.1
> 
>  
> 
> I see this… [Solr 6.3.0]
> 
> org.apache.solr.common.SolrException: Unable to reload core 
> [tutors_shard1_replica11]
>   at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:950)
>   at 
> org.apache.solr.core.SolrCore.lambda$getConfListener$6(SolrCore.java:2708)
>   at 
> org.apache.solr.cloud.ZkController.lambda$fireEventListeners$4(ZkController.java:2448)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: Could not load conf for core 
> tutors_shard1_replica11: Error loading solr config from solrconfig.xml
>   at 
> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:85)
>   at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:942)
>   ... 3 more
> Caused by: org.apache.solr.common.SolrException: Error loading solr config 
> from solrconfig.xml
>   at 
> org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:187)
>   at 
> org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:97)
>   at 
> org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:77)
>   ... 4 more
> Caused by: org.apache.solr.common.SolrException: No system property or 
> default value specified for scores 
> value:sum(interaction_responsiveness_score,profile_completeness_score,school_score,us_tax_id_score,highlight_score,${scores})
>  desc
>   at 
> org.apache.solr.util.PropertiesUtil.substituteProperty(PropertiesUtil.java:65)
>   at org.apache.solr.util.DOMUtil.substituteProperties(DOMUtil.java:298)
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Feb 27, 2017, at 6:17 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>> 
>> Walter -
>> 
>> How about this, for the latter part of your request:
>> 
>>  /handler?features=a,b,c
>> 
>>    with &sort=sum(${features}) desc
>> 
>> That ought to do the trick.   At first I thought the #foreach nature of the 
>> list of features was prohibitive, but since you’re literally plugging in the 
>> exact string value and it’s used as a comma-separated list then this should 
>> work.
>> 
>> But with the just a list of subject ID’s I think you’re in custom 
>> development now (or a JavaScript stage in a Fusion query pipeline ;) in 
>> building a SearchComponent that takes a `features` parameter and builds the 
>> sort param as needed from that.
>> 
>>  Erik
>> 
>> 
>> 
>>> On Feb 27, 2017, at 7:17 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>> 
>>> We have documents with parameterized features. For a school subject 
>>> (calculus, accounting), we have three sets of features. So for subject=4 
>>> and subject=186, we have:
>>> 
>>> feature_a_4: 0.9
>>> feature_b_4: 1.6
>>> feature_c_4: 8.2
>>> feature_a_186: 3.0
>>> feature_b_186: 2.1
>>> feature_c_186: 99.2
>>> 
>>> I’d like to pass in the subject IDs and make a function query (for sorting) 
>>> from those, ending up with
>>> 
>>> sum(feature_x, feature_y, feature_a_4, feature_b_4, feature_c_4, 
>>> feature_a_186, feature_b_186, feature c_186) desc
>>> 
>>> That would be used for the sort parameter.
>>> 
>>> Failing that, it would be nice so just pass in the parameterized portion, 
>>> like this:
>>> 
>>> /handler?features=feature_a_4,feature_b_4,feature_c_4,feature_a_186,feature_b_186,feature
>>>  c_186
>>> 
>>> Right now, I can’t even make a solrconfig.xml that will load. I’ve read 
>>> everything I can find on params and function queries.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
>> 
> 



Re: Using parameter values in a sort

2017-02-27 Thread Erik Hatcher
Walter -

How about this, for the latter part of your request:

   /handler?features=a,b,c

  with &sort=sum(${features}) desc

That ought to do the trick.   At first I thought the #foreach nature of the 
list of features was prohibitive, but since you’re literally plugging in the 
exact string value and it’s used as a comma-separated list then this should 
work.

But with the just a list of subject ID’s I think you’re in custom development 
now (or a JavaScript stage in a Fusion query pipeline ;) in building a 
SearchComponent that takes a `features` parameter and builds the sort param as 
needed from that.

Erik



> On Feb 27, 2017, at 7:17 PM, Walter Underwood  wrote:
> 
> We have documents with parameterized features. For a school subject 
> (calculus, accounting), we have three sets of features. So for subject=4 and 
> subject=186, we have:
> 
> feature_a_4: 0.9
> feature_b_4: 1.6
> feature_c_4: 8.2
> feature_a_186: 3.0
> feature_b_186: 2.1
> feature_c_186: 99.2
> 
> I’d like to pass in the subject IDs and make a function query (for sorting) 
> from those, ending up with
> 
> sum(feature_x, feature_y, feature_a_4, feature_b_4, feature_c_4, 
> feature_a_186, feature_b_186, feature c_186) desc
> 
> That would be used for the sort parameter.
> 
> Failing that, it would be nice so just pass in the parameterized portion, 
> like this:
> 
> /handler?features=feature_a_4,feature_b_4,feature_c_4,feature_a_186,feature_b_186,feature
>  c_186
> 
> Right now, I can’t even make a solrconfig.xml that will load. I’ve read 
> everything I can find on params and function queries.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 



Re: Issues with uniqueKey != id?

2017-02-06 Thread Erik Hatcher
Personally I'd leave it as "id" - and adjust your other domain specific field 
name to something else.  Why?   Keep Solr and other potential tools from having 
issues.  I don't know exactly what may break, but I'd rather keep things 
straightforward. 

   Erik
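As a rough sketch of the compound-key construction Susheel describes in the thread quoted below (the field names, separator, and `docId` target are illustrative assumptions; pick a separator that can never occur inside the parts):

```python
def compound_key(*parts, sep="-"):
    """Join identifying fields into a single uniqueKey value; the
    separator must never occur inside the parts themselves."""
    strs = [str(p) for p in parts]
    if any(sep in s for s in strs):
        raise ValueError("separator collides with a key part")
    return sep.join(strs)

doc = {"organization_id": "acme", "employee_id": 1042}
doc["docId"] = compound_key(doc["organization_id"], doc["employee_id"])
print(doc["docId"])  # acme-1042
```

The collision check matters: without it, ("a-b", "c") and ("a", "b-c") would map to the same key and silently overwrite each other's documents on update.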

> On Feb 6, 2017, at 02:33, Matthias X Falkenberg  wrote:
> 
> Hi Susheel,
> 
> My question is about the name of the "uniqueKey" field rather than the 
> composition of its values. By default, Solr uses a field with the name 
> "id". For reasons of ambiguity with the applications in my environment, I 
> am considering to change the field name to, for example, "docId". Is that 
> what you have also done for your compound keys?
> 
> One important aspect to consider after using a "uniqueKey" with a 
> different name is 
> http://lucene.apache.org/solr/6_3_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html
> : "This class assumes the id field for your documents is called 'id' - if 
> this is not the case, you must set the right name with 
> setIdField(String)."
> 
> I am wondering whether there are more details or pitfalls that I should be 
> aware of?
> 
> Mit freundlichen Grüßen / Kind regards,
> 
> Matthias Falkenberg
> 
> Team Lead - IBM Digital Experience Development
> IBM Watson Content Hub, IBM WebSphere Portal, IBM Web Content Manager
> IBM Deutschland Research & Development GmbH / Vorsitzende des 
> Aufsichtsrats: Martina Koederitz
> Geschäftsführung: Dirk Wittkopp
> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, 
> HRB 243294
> 
> 
> 
> From:   Susheel Kumar 
> To: solr-user@lucene.apache.org
> Date:   05-02-17 03:21 AM
> Subject:Re: Issues with uniqueKey != id?
> 
> 
> 
> Hello,
> 
> So far in my experience I haven't come across a scenario where a unique key/id 
> is
> not required.  Most of the time, I have used a combination of a few fields
> as aggregate
> or compound keys  (e.g.
> organization_id + employee_id etc.).  The reason it makes sense to have
> some form of unique key is twofold:
> a) if there is no unique key, it kind of becomes impossible to update any
> existing records since you can't uniquely identify them which means your
> index will keep growing
> b)  If no unique key then when you return search results, you wouldn't 
> have
> anything to relate with other/external system
> 
> Sometimes you may have time-series data, in which case a timestamp or a
> combination of timestamp / other fields may make sense, but yes, a unique key
> is not mandatory.
> 
> Thanks,
> Susheel
> 
> On Fri, Feb 3, 2017 at 11:49 AM, Matthias X Falkenberg 
> 
> wrote:
> 
>> Howdy,
>> 
>> In the Solr Wiki I stumbled upon a somewhat vague statement on the
>> uniqueKey:
>> 
>>> https://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field
>>> It shouldn't matter whether you rename this to something else (and
>> change the  value), but occasionally it has in the past. We
>> recommend that you just leave this definition alone.
>> 
>> I'd be very grateful for any positive or negative experiences with
>> "uniqueKey" not being set to "id" - especially if your experiences are
>> related to Solr 6.2.1+.
>> 
>> Many thanks,
>> 
>> Matthias Falkenberg
>> 
>> IBM Deutschland Research & Development GmbH / Vorsitzende des
>> Aufsichtsrats: Martina Koederitz
>> Geschäftsführung: Dirk Wittkopp
>> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht 
> Stuttgart,
>> HRB 243294
>> 
>> 
> 
> 
> 
> 


Re: Search for ISBN-like identifiers

2017-01-05 Thread Erik Hatcher
Sebastian -

There’s some precedent out there for ISBN’s.  Bill Dueber and the 
UMICH/code4lib folks have done amazing work, check it out here -

https://github.com/mlibrary/umich_solr_library_filters 
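A common complementary approach, roughly what such filters do, is to normalize ISBNs to a hyphen-free form on both the index and query side, so that '978-3-8052-5094-8' and '9783805250948' reduce to the same term. A minimal sketch of that normalization plus an ISBN-13 check-digit test (my own helpers, not the library's code):

```python
import re

def normalize_isbn(raw):
    """Strip hyphens/whitespace and uppercase (for the 'X' check
    digit that ISBN-10s can end with)."""
    return re.sub(r"[\s-]", "", raw).upper()

def isbn13_check_ok(isbn13):
    """Validate the ISBN-13 check digit (weights alternate 1 and 3)."""
    digits = [int(c) for c in isbn13]
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]

print(normalize_isbn("978-3-8052-5094-8"))                    # 9783805250948
print(isbn13_check_ok(normalize_isbn("978-3-8052-5094-8")))   # True
```

Applying the same normalization at index and query time (for example via a pattern-replace char filter, or before the document ever reaches Solr) avoids the tokenizer splitting hyphenated ISBNs into five fragments in the first place.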


  - Erik


> On Jan 5, 2017, at 5:08 AM, Sebastian Riemer  wrote:
> 
> Hi folks,
> 
> 
> TL;DR: Is there an easy way, to copy ISBNs with hyphens to the general text 
> field, respectively configure the analyser on that field, so that a search 
> for the hyphenated ISBN returns exactly the matching document?
> 
> Long version:
> I've defined a field "text" of type "text_general", where I copy all my other 
> fields to, to be able to do a "quick search" where I set q=text
> 
> The definition of the type text_general is like this:
> 
> 
> 
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> 
> I now face the problem, that searching for a book with 
> text:978-3-8052-5094-8* does not return the single result I expect. However 
> searching for text:9783805250948* instead returns a result. Note, that I am 
> adding a wildcard at the end automatically, to further broaden the resultset. 
> Note also, that it does not seem to matter whether I put backslashes in front 
> of the hyphen or not (to be exact, when sending via SolrJ from my 
> application, I put in the backslashes, but I don't see a difference when 
> using SolrAdmin as I guess SolrAdmin automatically inserts backslashes if 
> needed?)
> 
> When storing ISBNs, I do store them twice, once with hyphens 
> (978-3-8052-5094-8) and once without (9783805250948). A pure phrase search on 
> both those values return also the single document.
> 
> I learned that the StandardTokenizer splits up values from fields at index 
> time, and I've also learned that I can use the solrAdmin analysis and the 
> debugQuery to help understand what is going on. From the analysis screen I 
> see, that given the value 9783805250948 at index-time and 9783805250948* 
> query-time both leads to an unchanged value 9783805250948 at the end.
> When given the value 978-3-8052-5094-8 for "Field Value (Index)" and 
> 978-3-8052-5094-8* for "Field Value (Query)"  I can see how the ISBN is 
> tokenized into 5 parts. Again, the values match on both sides (Index and 
> Query).
> 
> How does the left side correlate with the right side? My guess: The left side 
> means, "Values stored in field text will be tokenized while indexing as show 
> here on the left". The right side means, "When querying on the field text, 
> I'll tokenize the entered value like this, and see if I find something on the 
> index" Is this correct?
> 
> Another question: when querying and investigating the single document in 
> solrAdmin, the contents I see In the column text represents the _stored_ 
> value of the field text, right?
> And am I correct that this actually has nothing to do, with what is actually 
> stored in  the index for searching?
> 
> When storing the value 978-3-8052-5094-8, are only the tokenized values 
> stored for search, or is the "whole word" also stored? Is there a way to 
> actually see all the values which are stored for search?
> When searching text:" 978-3-8052-5094-8" I get the single result, so I guess 
> the value as a whole must also be stored in the index for searching?
> 
> One more thing which confuses me:
> Searching for text: 978-3-8052-5094-8 gives me 72 results, because it leads 
> to searching for "parsedquery_toString":"text:978 text:3 text:8052 text:5094 
> text:8",
> but searching for text: 978-3-8052-5094-8* gives me 0 results, this leads to 
> "parsedquery_toString":"text:978-3-8052-5094-8*",
> 
> Why is the appended wildcard changing the behaviour so radically? I'd rather 
> expect to get something like "parsedquery_toString":"text:978 text:3 
> text:8052 text:5094 text:8*",  and thus even more results.
> 
> Btw. I've found and read an interesting blog about storing ISBNs and alikes 
> here: 
> http://robotlibrarian.billdueber.com/2012/03/solr-field-type-for-numericish-ids/
>  However, I already store my ISBN also in a separate field, of type string, 
> which works fine when I use this field for searching.
> 
> Best regards, sorry for the enormously long question and thank you for 
> listening.
> 
> Sebastian



Re: Solr ACL Plugin Windows

2017-01-04 Thread Erik Hatcher
Thanks, Mike, for emphasizing that point.   I put that point in the blog post 
as well - the recommended approach if it's sufficient for sure.  
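The role-decomposition approach Mike outlines in the quoted message below can be sketched as follows (the role names, permission table, and `privileges` field are illustrative assumptions):

```python
def acl_filter(role_permissions, user_roles, field="privileges"):
    """Decompose the user's roles into permissions and build the fq:
    a document must carry every required permission to match."""
    perms = sorted({p for role in user_roles
                      for p in role_permissions.get(role, [])})
    return "%s:(%s)" % (field, " AND ".join(perms))

role_permissions = {"engineer": ["DEVELOPER"], "ops": ["DEVOPS"]}
print(acl_filter(role_permissions, ["engineer", "ops"]))
# privileges:(DEVELOPER AND DEVOPS)
```

The resulting string is appended to the request as an fq, so it is cached and composed with the main query without affecting relevance scoring.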

Erik

> On Jan 4, 2017, at 07:36, Mike Thomsen  wrote:
> 
> I didn't see a real Java project there, but the directions to compile on
> Linux are almost always applicable to Windows with Java. If you find a
> project that says it uses Ant or Maven, all you need to do is download Ant
> or Maven, the Java Development Kit and put both of them on the windows
> path. Then it's either "ant package" (IIRC most of the time) or "mvn
> install" from within the folder that has the project.
> 
> FWIW, creating a simple ACL doesn't even require a custom plugin. This is
> roughly how you would do it w/ an application that your team has written
> that works with solr:
> 
> 1. Add a multivalue string field called ACL or privileges
> 2. Write something for your app that can pull a list of
> attributes/privileges from a database for the current user.
> 3. Append a filter query to the query that matches those attributes. Ex:
> 
> fq=privileges:(DEVELOPER AND DEVOPS)
> 
> 
> If you are using a role-based system that bundles groups of permissions
> into a role, all you need to do is decompose the role into a list of
> permissions for the user and put all of the required permissions into that
> multivalue field.
> 
> Mike
> 
>> On Wed, Jan 4, 2017 at 2:55 AM,  wrote:
>> 
>> I am searching a SOLR ACL Plugin, i found this
>> https://lucidworks.com/blog/2015/05/15/custom-security-filtering-solr-5/
>> 
>> but i don't know how i can compile the java into a jar - all the infos i
>> found were about how to compile it on linux - but this doesn't help.
>> 
>> I am running solr version 6.3.0 on windows Server 2003
>> 
>> So i am searching for infos about compiling a plugin under windows.
>> 
>> Thanxs in advance :D
>> 
>> 
>> This message was sent using IMP, the Internet Messaging Program.
>> 
>> 


Re: How to solve?

2016-12-28 Thread Erik Hatcher
I'll have to not be mobile and thumbing a reply to give a concrete example but 
you'll need to use the nested query parsing facility to make a boolean AND 
query of two geofilts or bboxes, each with local params.   
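For reference in the meantime, the great-circle test that each such filter performs per stored point/radius pair is the haversine distance check; a standalone sketch (the earth radius constant and helper names are my own, not Solr's code):

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two lat/lon points, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def within(lat, lon, center, radius_km):
    return great_circle_km(lat, lon, center[0], center[1]) <= radius_km

# Bill's example: is pt=39,-104 inside each stored circle?
print(within(39, -104, (39, -107), 10))   # False (~260 km apart)
print(within(39, -104, (39, -108), 50))   # False (~346 km apart)
```

In Solr the same check would be expressed as one geofilt per stored circle, combined into a boolean query as described above.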

   Erik

> On Dec 28, 2016, at 02:12, William Bell  wrote:
> 
> We are entering entries into SOLR like the following, and we want to see if
> my pt matches any of these radiuses.
> 
> 1. Red, pt=39,-107, radius=10km
> 2. Blue, pt=39,-108, radius=50km
> 
> I want to run a SOLR select with a pt=39,-104 and see if it is within 10km
> of point 1, and 50km of point 2?
> 
> Usually I know you can :
> 
> http://localhost:8983/select?q=*:*&pt=39,-104&sfield=solr_geohash&d= ??
> 
> One idea was to use bbox and find the N,S,E,W pt for point 1 and point 2.
> But this is not idea, we want to use Great Circle.
> 
> Thoughts?
> 
> 
> -- 
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: Easy way to preserve Solr Admin form input

2016-12-27 Thread Erik Hatcher
How's /browse fare for you?   What params are you adjusting regularly?

> On Dec 27, 2016, at 06:09, Sebastian Riemer  wrote:
> 
> Hi,
> 
> is there an easy way to preserve the query data I input in SolrAdmin?
> 
> E.g. when debugging a query, I often have the desire to reopen the current 
> query in solrAdmin in a new browser tab to make slight adaptations to the query 
> without losing the original query.  What happens instead is the form is 
> opened blank in the new tab and I have to manually copy/paste the entered 
> form values.
> 
> This is not such a big problem, when I only use the "Raw Query Parameters" 
> field, but editing something in that tiny input is a real pain ...
> 
> I wonder how others come around this?
> 
> Sebastian
> 


Re: prefix query help

2016-12-08 Thread Erik Hatcher
It’s hard to tell how _exact_ to be here, but if you’re indexing those strings 
and your queries are literally always YYYY-MM, then do the truncation of the 
actual data into that format or via analysis techniques to index only the 
YYYY-MM piece of the incoming string.  

But given what you’ve got so far, using what the prefix examples I provided 
below, your two queries would be this:

    q={!prefix f=metatag.date v='2016-06'}

and

    q=({!prefix f=metatag.date v='2016-06'} OR {!prefix f=metatag.date 
v='2014-04'} )

Does that work for you?

It really should work to do this q=metatag.date:(2016-06* OR 2014-04*) as 
you’ve got it, but you said that sort of thing wasn’t working (debug output would 
help suss that issue out).

If you did index those strings cleaner as YYYY-MM to accommodate the types of 
query you’ve shown then you could do q=metatag.date:(2016-06 OR 2014-04), or 
q={!terms f=metatag.date}2016-06,2014-04
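Building such chained {!prefix} clauses programmatically is straightforward; a sketch (the helper names are mine, and the single-quote escaping is a simplification):

```python
def prefix_clause(field, prefix):
    """One {!prefix} local-params clause; naive single-quote escaping."""
    return "{!prefix f=%s v='%s'}" % (field, prefix.replace("'", "\\'"))

def any_prefix_query(field, prefixes):
    """OR together one {!prefix} clause per wanted year-month."""
    return "(%s)" % " OR ".join(prefix_clause(field, p) for p in prefixes)

q = any_prefix_query("metatag.date", ["2016-01", "2016-07", "2016-10"])
print(q)
# ({!prefix f=metatag.date v='2016-01'} OR {!prefix f=metatag.date v='2016-07'} OR {!prefix f=metatag.date v='2016-10'})
```

The leading paren keeps the lucene query parser in charge of the whole expression, per the explanation above.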

Erik




> On Dec 8, 2016, at 11:34 AM, KRIS MUSSHORN <mussho...@comcast.net> wrote:
> 
> yes I did attach rather than paste sorry. 
>   
> Ok heres an actual, truncated, example of the metatag.date field contents in 
> solr. 
> NONE-NN-NN is the default setting. 
>   
> doc 1 
> " metatag.date ": [ 
>   "2016-06-15T14:51:04Z" ,
>   "2016-06-15T14:51:04Z" 
> ] 
>   
> doc 2 
> " metatag.date ": [ 
>   "2016-06-15" 
> ] 
> doc 3 
> " metatag.date ": [ 
>   "NONE-NN-NN" 
> ] 
> doc 4 
> " metatag.date ": [ 
>   "yyyy-mm-dd" 
> ] 
>   
> doc 5 
> " metatag.date ": [ 
>   "2016-07-06" 
> ] 
> 
> doc 6 
> " metatag.date ": [ 
>   "2014-04-15T14:51:06Z" , 
>   "2014-04-15T14:51:06Z" 
> ] 
>   
> q=2016-06 should return docs 2 and 1 
> q=2016-06 OR 2014-04 should return docs 1, 2 and 6 
>   
> yes I know it's wonky but it's what I have to deal with until the content is 
> cleaned up. 
> I can't use a date type.. that would make my life too easy. 
>   
> TIA again 
> Kris 
> 
> - Original Message -
> 
> From: "Erik Hatcher" <erik.hatc...@gmail.com> 
> To: solr-user@lucene.apache.org 
> Sent: Thursday, December 8, 2016 12:36:26 PM 
> Subject: Re: prefix query help 
> 
> Kris - 
> 
> To chain multiple prefix queries together: 
> 
> q=({!prefix f=field1 v='prefix1'} {!prefix f=field2 v='prefix2'}) 
> 
> The leading paren is needed to ensure it’s being parsed with the lucene 
> qparser (be sure not to have defType set, or a variant would be needed) and 
> that allows multiple {!…} expressions to be parsed.  The outside-the-curlys 
> value for the prefix shouldn’t be attempted with multiples, so the `v` is the 
> way to go, either inline or $referenced. 
> 
> If you do have defType set, say to edismax, then do something like this 
> instead: 
> q={!lucene v=$prefixed_queries} 
> &prefixed_queries={!prefix f=field1 v='prefix1'} {!prefix f=field2 
> v='prefix2'} 
>    // I don’t think parens are needed with prefixed_queries, but maybe.  
>  
> 
> &debug=query (or &debug=true) is your friend - see how things are parsed.  I 
> presume in your example that didn’t work that the dash didn’t work as you 
> expected?   or… not sure.  What’s the parsed_query output in debug on that 
> one? 
> 
> Erik 
> 
> p.s. did you really just send a Word doc to the list that could have been 
> inlined in text?  :)   
> 
> 
> 
>> On Dec 8, 2016, at 7:18 AM, KRIS MUSSHORN <mussho...@comcast.net> wrote: 
>> 
>> Im indexing data from Nutch into SOLR 5.4.1. 
>> I've got a date metatag that I have to store as text type because the data 
>> stinks. 
>> It's stored in SOLR as field metatag.date. 
>> At the source the dates are formatted (when they are entered correctly ) as 
>> YYYY-MM-DD 
>>   
>> q=metatag.date:2016-01* does not produce the correct results and returns 
>> undesirable matches, 2016-05-01 etc as an example. 
>> q={!prefix f=metatag.date}2016-01 gives me exactly what I want for one 
>> month/year. 
>>   
>> My question is how do I chain n prefix queries together? 
>> i.e. 
>> I want all docs where metatag.date prefix is 2016-01 or 2016-07 or 2016-10 
>>   
>> TIA, 
>> Kris 
>>   
> 
> 



Re: prefix query help

2016-12-08 Thread Erik Hatcher
Kris -

To chain multiple prefix queries together:

q=({!prefix f=field1 v='prefix1'} {!prefix f=field2 v='prefix2'})

The leading paren is needed to ensure it’s being parsed with the lucene qparser 
(be sure not to have defType set, or a variant would be needed) and that allows 
multiple {!…} expressions to be parsed.  The outside-the-curlys value for the 
prefix shouldn’t be attempted with multiples, so the `v` is the way to go, 
either inline or $referenced.

If you do have defType set, say to edismax, then do something like this instead:
q={!lucene v=$prefixed_queries}
&prefixed_queries={!prefix f=field1 v='prefix1'} {!prefix f=field2 
v='prefix2'} 
   // I don’t think parens are needed with prefixed_queries, but maybe.  

&debug=query (or &debug=true) is your friend - see how things are parsed.  I 
presume in your example that didn’t work that the dash didn’t work as you 
expected?   or… not sure.  What’s the parsed_query output in debug on that one?

Erik

p.s. did you really just send a Word doc to the list that could have been 
inlined in text?  :)  



> On Dec 8, 2016, at 7:18 AM, KRIS MUSSHORN  wrote:
> 
> Im indexing data from Nutch into SOLR 5.4.1. 
> I've got a date metatag that I have to store as text type because the data 
> stinks. 
> It's stored in SOLR as field metatag.date. 
> At the source the dates are formatted (when they are entered correctly ) as 
> YYYY-MM-DD 
>   
> q=metatag.date:2016-01* does not produce the correct results and returns 
> undesirable matches, 2016-05-01 etc as an example. 
> q={!prefix f=metatag.date}2016-01 gives me exactly what I want for one 
> month/year. 
>   
> My question is how do I chain n prefix queries together? 
> i.e. 
> I want all docs where metatag.date prefix is 2016-01 or 2016-07 or 2016-10 
>   
> TIA, 
> Kris 
>   



  1   2   3   4   5   6   7   8   9   10   >