Re: Sync failure after shard leader election when adding new replica.

2015-05-26 Thread Erick Erickson
Please, please, please do _not_ try to use core discovery to add new
replicas by manually editing stuff.

bq: and my deployment tools create an empty core on newly provisioned machines.

This is a really bad idea (as you have discovered).  Basically, your
deployment tools have to do everything right to get this to play
nice with SolrCloud. Your core names can't conflict. You have to
spell all the parameters in core.properties right. Etc. There are
endless places to go wrong. And this is all done for you (and tested
with unit tests) via the Collections API.

Assuming that in your scenario you started machine2 before machine1,
how would Solr have any clue that machine1 would _ever_ come back
up? It'll do the best it can and try to elect a leader, but there's
only one machine to choose from... and it's sorely out of date.

Absolutely use the Collections API to add replicas to running
SolrCloud clusters. And adding a replica via the Collections API
_will_ use core discovery, as in it'll cause a core.properties file to
be written on the node in question, populate it with all the necessary
parameters, initiate a sync from the (running) leader, and put itself
into the query rotation automatically when the sync is done, etc. All
without you
1 having to try to figure all this out yourself
2 taking the collection offline
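
For reference, a hedged sketch of such a Collections API call (the host and
port here are assumptions; the collection and shard names are taken from the
logs below):

http://machine2:11000/solr/admin/collections?action=ADDREPLICA&collection=project&shard=shard1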

Best,
Erick

On Tue, May 26, 2015 at 2:46 PM, Michael Roberts mrobe...@tableau.com wrote:
 Hi,

 I have a SolrCloud setup, running 4.10.3. The setup consists of several 
 cores, each with a single shard and initially each shard has a single replica 
 (so, basically, one machine). I am using core discovery, and my deployment 
 tools create an empty core on newly provisioned machines.

 The scenario that I am testing is, Machine 1 is running and writes are 
 occurring from my application to Solr. At some point, I stop Machine 1, and 
 reconfigure my application to add Machine 2. Both machines are then started.

 What I would expect to happen at this point is that Machine 2 cannot become 
 leader because it is behind compared to Machine 1. Machine 2 would then 
 restore from Machine 1.

 However, looking at the logs, I am seeing Machine 2 get elected leader and 
 fail the PeerSync:

 2015-05-24 17:20:25.983 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
 org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to 
 continue.
 2015-05-24 17:20:25.983 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
 org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader - 
 try and sync
 2015-05-24 17:20:25.997 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
 org.apache.solr.update.PeerSync - PeerSync: core=project 
 url=http://10.32.132.64:11000/solr START 
 replicas=[http://jchar-1:11000/solr/project/] nUpdates=100
 2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
 org.apache.solr.update.PeerSync - PeerSync: core=project 
 url=http://10.32.132.64:11000/solr DONE.  We have no versions.  sync failed.
 2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
 org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we 
 have no versions - we can't sync in that case - we were active before, so 
 become leader anyway
 2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
 org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader: 
 http://10.32.132.64:11000/solr/project/ shard1

 What is the expected behavior here? What’s the best practice for adding a new 
 replica? Should I have the SolrCloud running and do it via the Collections 
 API or can I continue to use core discovery?

 Thanks.




Re: Solr relevancy score in percentage

2015-05-26 Thread Zheng Lin Edwin Yeo
Thank you everyone for your comments and recommendations. Will consider all
these points in my implementation.

Regards,
Edwin

On 27 May 2015 at 05:15, Walter Underwood wun...@wunderwood.org wrote:

 On May 26, 2015, at 7:10 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com
 wrote:

  We want the user to see how relevant the result is with respect to the
  search query entered, and not how good the results are.

 That is the meaning of the score from a probabilistic-model search engine.
 Solr is not a probabilistic engine, it is a vector space engine. The scores
 are fundamentally different. Treating a vector space score as a probability
 of relevance will not work.

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)




Re: Removing characters like '\n \n' from indexing

2015-05-26 Thread Zheng Lin Edwin Yeo
I'm using ExtractingRequestHandler to do the indexing. Do I have to
implement the UpdateProcessor method at the ExtractingRequestHandler or as
a separate method?

Regards,
Edwin

On 26 May 2015 at 23:42, Alessandro Benedetti benedetti.ale...@gmail.com
wrote:

 I think this is still on topic.
 Assuming we are using the extract update handler, I think the update
 processor approach still applies.
 But is it not possible to strip them directly with some extract request
 handler param?
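
 (For reference, a chain defined in solrconfig.xml can be selected per
 request via the update.chain parameter. A hedged sketch, assuming a chain
 named strip-newlines exists:

 curl "http://localhost:8983/solr/collection1/update/extract?update.chain=strip-newlines&literal.id=doc1&commit=true" -F "myfile=@doc.pdf" )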


 2015-05-26 16:33 GMT+01:00 Jack Krupansky jack.krupan...@gmail.com:

  Neither - it removes the characters before indexing. The distinction is
  that if you remove them during indexing they will still appear in the
  stored field values even if they are removed from the indexed values, but
  by removing them before indexing, they will not appear in the stored
 field
  values. Again, the distinction is between indexed field values and stored
  field values.
 
  -- Jack Krupansky
 
  On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo 
  edwinye...@gmail.com
  wrote:
 
   It is showing up in the search results. Just to confirm, does this
   UpdateProcessor method remove the characters during indexing or only
  after
   indexing has been done?
  
   Regards,
   Edwin
  
   On 26 May 2015 at 21:30, Upayavira u...@odoko.co.uk wrote:
  
   
   
On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
 Hi,

 Is there a way to remove special characters like \n during indexing of
 the rich text documents?

 I have quite a lot of leading \n \n in front of my indexed content of rich
 text documents due to the spaces and empty lines in the original
 documents, and it's causing the content to be flooded with '\n \n' at the
 start before the actual content comes in. This causes the content to look
 ugly, and also takes up unnecessary bandwidth in the system.
   
Where is this showing up?
   
 If it is in search results, you must use an UpdateProcessor, as these
 run before fields are stored (e.g. RegexReplaceProcessorFactory).

 If you are concerned about facet results, then you can do it in an
 analysis chain, for example with a PatternReplaceFilterFactory.
   
Upayavira
   
  
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England



Re: docValues: Can we apply synonym

2015-05-26 Thread Aman Tandon
Yes it could be :)

Anyway thanks for helping.

With Regards
Aman Tandon

On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 I should investigate that, as synonyms are usually an analysis-stage concern.
 A simple way is to replace the word with all its synonyms (including the
 original word), but simply using this kind of processor will change the
 token positions and offsets, modifying the actual content of the document.

 "I am from Bombay" will become "I am from Bombay Mumbai", which can be
 annoying.
 So a cleverer approach must be investigated.

 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  Okay So how could I do it with UpdateProcessors?
 
  With Regards
  Aman Tandon
 
  On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti 
  benedetti.ale...@gmail.com wrote:
 
   mmm this is different !
   Without any customisation, right now you could :
   - use docValues to provide exact value facets.
   - Then you can use a copy field, with the proper analysis, to search
   when a user clicks on a filter !
  
   So you will see in your facets :
   Mumbai(3)
   Bombay(2)
  
   And when clicking you see 5 results.
   A little bit misleading for the users …
  
   On the other hand, if you want to apply the synonyms in the indexing
   pipeline ( because docValues fields can not be analysed), I think
   you should play with UpdateProcessors.
  
   Cheers
  
   2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com:
  
We are interested in using docValues for better memory utilization
 and
speed.
   
Currently we are faceting the search results on *city. *In city we
 have
also added the synonym for cities like mumbai, bombay (These are
 Indian
cities). So that result of mumbai is also eligible when somebody will
applying filter of bombay on search results.
   
I need this functionality to apply with docValues enabled field.
   
With Regards
Aman Tandon
   
On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:
   
 I checked in the Documentation to be sure, but apparently :

  DocValues are only available for specific field types. The types chosen
  determine the underlying Lucene docValue type that will be used. The
  available Solr field types are:

  - StrField and UUIDField.
    - If the field is single-valued (i.e., multi-valued is false), Lucene
      will use the SORTED type.
    - If the field is multi-valued, Lucene will use the SORTED_SET type.
  - Any Trie* numeric fields and EnumField.
    - If the field is single-valued (i.e., multi-valued is false), Lucene
      will use the NUMERIC type.
    - If the field is multi-valued, Lucene will use the SORTED_SET type.


  This means you should not analyse a field where DocValues is enabled.
  Can you explain your use case to us? Why are you interested in synonyms
  at the DocValues level?

 Cheers

 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk:

  To my understanding, docValues are just an uninverted index. That is, it
  contains the terms that are generated at the end of an analysis chain.
  Therefore, you simply enable docValues and include the
  SynonymFilterFactory in your analysis.
 
  Is that enough, or are you struggling with some other issue?
 
  Upayavira
 
  On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
   Hi,
  
   We have some field *city* in which the docValues are enabled. We need
   to add the synonym in that field so how could we do it?
  
   With Regards
   Aman Tandon
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England

   
  
  
  
   --
   --
  
   Benedetti Alessandro
   Visiting card : http://about.me/alessandro_benedetti
  
   Tyger, tyger burning bright
   In the forests of the night,
   What immortal hand or eye
   Could frame thy fearful symmetry?
  
   William Blake - Songs of Experience -1794 England
  
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England



Re: Help/Guidance Needed : To reload kstem protword hash without full core reload

2015-05-26 Thread Aman Tandon
Thank you so much Ahmet :)

With Regards
Aman Tandon

On Wed, May 27, 2015 at 1:29 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Aman,

 Start with creating a JIRA account and vote for/watch that issue.
 Post on the issue to see if there is still interest in it.
 Declare that you will volunteer and ask kindly for guidance.
 The creator of the issue or one of the watchers may respond.
 Try to digest the ideas discussed on the issue. Raise yours. Collaborate.
 Don't get discouraged if nobody responds; please remember that committers
 are busy people.

 If you have implemented something you want to share, upload a patch:
 https://wiki.apache.org/solr/HowToContribute

 Good luck,
 Ahmet



 On Tuesday, May 26, 2015 7:47 PM, Aman Tandon amantandon...@gmail.com
 wrote:
 Hi Ahmet,

 Can you please guide me on contributing to this *issue*? I haven't done this
 before.

 So I need to know... what I should learn and how I should start... what
 IDE or whatever else you think a novice needs to know. I will be
 thankful to you :)

 With Regards
 Aman Tandon


 On Tue, May 19, 2015 at 8:10 PM, Aman Tandon amantandon...@gmail.com
 wrote:

  That link you provided is exactly I want to do. Thanks Ahmet.
 
  With Regards
  Aman Tandon
 
  On Tue, May 19, 2015 at 5:06 PM, Ahmet Arslan iori...@yahoo.com.invalid
 
  wrote:
 
  Hi Aman,
 
  Changing protected words without reindexing makes little or no sense.
  Regarding protected words, the trend is to use
  solr.KeywordMarkerFilterFactory.
 
  Instead, I suggest you work on a more general issue:
  https://issues.apache.org/jira/browse/SOLR-1307
  Ahmet
 
 
  On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com
  wrote:
  Please help, or am I not being clear here?
 
  With Regards
  Aman Tandon
 
 
  On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com
  wrote:
 
   Hi,
  
   *Problem Statement: *I want to reload a hash of protwords created by the
   kstem filter without reloading the whole index core.
  
   *My Thought: *I am thinking of reloading the hash by passing a parameter
   like *r=1* to the analysis URL request (to somehow pass the parameter via
   the URL). And I am thinking that by changing IndexSchema.java I might be
   able to pass this parameter through my analyzer chain to KStemFilter, in
   which I will call the initializeDictionary function to rebuild the
   protwords hash from the file if *r=1*, instead of making a full core
   reload request.
  
   Please guide me. I know the question might be stupid; the thought came to
   my mind and I want to share it and ask for suggestions here. Is it
   possible or not, and how can I achieve it?
  
   I will be thankful for guidance.
  
   With Regards
   Aman Tandon
  
 
 
 
 
 
 



Re: Removing characters like '\n \n' from indexing

2015-05-26 Thread Zheng Lin Edwin Yeo
I tried to follow the example here
https://wiki.apache.org/solr/UpdateRequestProcessor, by putting
the updateRequestProcessorChain in my solrconfig.xml

But I'm getting the following error when I tried to reload the core.

Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.CustomUpdateRequestProcessorFactory'

Is there anything I might have missed? I'm using Solr 5.1.


Regards,
Edwin


On 27 May 2015 at 10:13, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

 I'm using ExtractingRequestHandler to do the indexing. Do I have to
 implement the UpdateProcessor method at the ExtractingRequestHandler or
 as a separate method?

 Regards,
 Edwin

 On 26 May 2015 at 23:42, Alessandro Benedetti benedetti.ale...@gmail.com
 wrote:

 I think this is still on topic.
 Assuming we are using the extract update handler, I think the update
 processor approach still applies.
 But is it not possible to strip them directly with some extract request
 handler param?


 2015-05-26 16:33 GMT+01:00 Jack Krupansky jack.krupan...@gmail.com:

  Neither - it removes the characters before indexing. The distinction is
  that if you remove them during indexing they will still appear in the
  stored field values even if they are removed from the indexed values,
 but
  by removing them before indexing, they will not appear in the stored
 field
  values. Again, the distinction is between indexed field values and
 stored
  field values.
 
  -- Jack Krupansky
 
  On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo 
  edwinye...@gmail.com
  wrote:
 
   It is showing up in the search results. Just to confirm, does this
   UpdateProcessor method remove the characters during indexing or only
  after
   indexing has been done?
  
   Regards,
   Edwin
  
   On 26 May 2015 at 21:30, Upayavira u...@odoko.co.uk wrote:
  
   
   
On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
 Hi,

 Is there a way to remove special characters like \n during indexing of
 the rich text documents?

 I have quite a lot of leading \n \n in front of my indexed content of rich
 text documents due to the spaces and empty lines in the original
 documents, and it's causing the content to be flooded with '\n \n' at the
 start before the actual content comes in. This causes the content to look
 ugly, and also takes up unnecessary bandwidth in the system.
   
Where is this showing up?
   
 If it is in search results, you must use an UpdateProcessor, as these
 run before fields are stored (e.g. RegexReplaceProcessorFactory).

 If you are concerned about facet results, then you can do it in an
 analysis chain, for example with a PatternReplaceFilterFactory.
   
Upayavira
   
  
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England





Re: Index optimize runs in background.

2015-05-26 Thread Modassar Ather
Our index has almost 100M documents running on a SolrCloud of 5 shards, and
each shard has an index size of about 170+GB (for the record, we are not
using stored fields - our documents are pretty large). We perform a full
indexing every weekend and during the week there are no updates made to the
index. Most of the queries that we run are pretty complex, with hundreds of
terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts, etc.,
and take many minutes to execute. A difference of 10-20% is also a big
advantage for us.

We have been optimizing the index after indexing for years and it has
worked well for us. Every once in a while, we upgrade Solr to the latest
version and try without optimizing so that we can save the many hours it
takes to optimize such a huge index, but we find the optimized index works
better for us.

Erick, I was indexing documents today and saw the optimize happening in the
background.

On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com
wrote:

 No results yet. I finished the test harness last night (not really a
 unit test, a stand-alone program that endlessly adds stuff and tests
 that every commit returns the correct number of docs).

 8,000 cycles later there aren't any problems reported.

 Siiigh.


 On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com
 wrote:
  Hi,
 
  Erick you mentioned about a unit test to test the optimize running in
  background. Kindly share your findings if any.
 
  Thanks,
  Modassar
 
  On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com
 
  wrote:
 
  Thanks everybody for your replies.
 
  I have noticed the optimization running in background every time I
  indexed. This is 5 node cluster with solr-5.1.0 and uses the
  CloudSolrClient. Kindly share your findings on this issue.
 
  Our index has almost 100M documents running on SolrCloud. We have been
  optimizing the index after indexing for years and it has worked well for
  us.
 
  Thanks,
  Modassar
 
  On Fri, May 22, 2015 at 11:55 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  Actually, I've recently seen very similar behavior in Solr 4.10.3, but
  involving hard commits with openSearcher=true, see:
  https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
  reproduce this at will, sii.
 
  A unit test should be very simple to write though, maybe I can get to
 it
  today.
 
  Erick
 
 
 
  On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote:
  
  
   On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
   On 5/21/2015 6:21 AM, Modassar Ather wrote:
I am using Solr-5.1.0. I have an indexer class which invokes
cloudSolrClient.optimize(true, true, 1). My indexer exits after
 the
invocation of optimize and the optimization keeps on running in
 the
background.
Kindly let me know if it is per design and how can I make my
 indexer
  to
wait until the optimization is over. Is there a
  configuration/parameter I
need to set for the same.
   
Please note that the same indexer with
  cloudSolrServer.optimize(true, true,
1) on Solr-4.10 used to wait till the optimize was over before
  exiting.
  
   This is very odd, because I could not get HttpSolrServer to
 optimize in
   the background, even when that was what I wanted.
  
   I wondered if maybe the Cloud object behaves differently with
 regard to
   blocking until an optimize is finished ... except that there is no
 code
   for optimizing in CloudSolrClient at all ... so I don't know where
 the
   different behavior would actually be happening.
  
   A more important question is, why are you optimising? Generally it
 isn't
   recommended anymore as it reduces the natural distribution of
 documents
   amongst segments and makes future merges more costly.
  
   Upayavira
 
 
 



Re: Solr relevancy score in percentage

2015-05-26 Thread Ahmet Arslan
Hi Edwin,

Somehow, it is not recommended to display the relevancy score in percentage:
https://wiki.apache.org/lucene-java/ScoresAsPercentages

Ahmet



On Tuesday, May 26, 2015 8:34 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com 
wrote:
Hi,

Would like to check: does the new version of Solr allow displaying the
relevancy score as a percentage?
I understand that the older version is not able to, and the only way is to
take the highest score and use that as 100%, then calculate the other
percentages from that number (for example, if the max score is 10 and the
next result has a score of 5, you would do (5 / 10) * 100 = 50%).

Is there a better way to do this now? I'm using Solr 5.1


Regards,
Edwin


Re: Setting system property

2015-05-26 Thread Clemens Wyss DEV
For my EmbeddedSolr-mode I do
...
System.setProperty( "solr.allow.unsafe.resourceloading", "true" );
...
which works fine.

For the remote-mode, i.e. Solr/jetty server I put 
SOLR_OPTS="$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true"
into solr.in.sh. Unfortunately this setting/option does not seem to be applied. 
When I try to do an xi:include in solrconfig.xml (or schema.xml) I am getting 
security exceptions...

Any advice?

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Wednesday, 13 May 2015 16:57
To: solr-user@lucene.apache.org
Subject: Re: Setting system property

Clemens -

For this particular property, it is only accessed as a system property
directly, so it must be set at JVM startup and cannot be set any other way.

Erik

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com




 On May 13, 2015, at 3:49 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 
 I'd like to make use of solr.allow.unsafe.resourceloading=true.
 Is the commandline -Dsolr.allow.unsafe.resourceloading=true the only way 
 to inject/set this property or can it be done (e.g.) in solr.xml ?
 
 Thx
 Clemens



Re: Solr relevancy score in percentage

2015-05-26 Thread Zheng Lin Edwin Yeo
Hi Arslan,

Thank you for the link. Does that mean it is not advisable to show anything
related to the relevancy score, even though the default sorting of the
results is by relevancy score? Showing the raw relevancy score does not
make any sense to the user either, since they won't understand what it
means.


Regards,
Edwin



On 26 May 2015 at 14:16, Ahmet Arslan iori...@yahoo.com.invalid wrote:

 Hi Edwin,

 Somehow, it is not recommended to display the relevancy score in
 percentage:
 https://wiki.apache.org/lucene-java/ScoresAsPercentages

 Ahmet



 On Tuesday, May 26, 2015 8:34 AM, Zheng Lin Edwin Yeo 
 edwinye...@gmail.com wrote:
 Hi,

 Would like to check: does the new version of Solr allow displaying the
 relevancy score as a percentage?
 I understand that the older version is not able to, and the only way is to
 take the highest score and use that as 100%, then calculate the other
 percentages from that number (for example, if the max score is 10 and the
 next result has a score of 5, you would do (5 / 10) * 100 = 50%).

 Is there a better way to do this now? I'm using Solr 5.1


 Regards,
 Edwin



Re: Index of Hit in MultiValue fields

2015-05-26 Thread Upayavira
The result that Solr returns is the document, not anything beneath, so
no, you cannot do this.

You could use highlighting, or you could parse the output of explain
(debug.explain.structured=true will help) to identify which field
triggered the match. Alternatively, you could use block joins: make a
parent doc and each of your colours a child doc, then you could return
which doc matched. You could use the ExpandComponent to retrieve details
of the parent doc (http://heliosearch.org/expand-block-join/).
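
As a rough sketch of the block-join option (the type field and its values
are assumptions): index each colour as its own child document, e.g. with
type:colour and a colour field. Then

  q=colour:Blue AND type:colour

returns the matching child document itself, while

  q={!parent which="type:parent"}colour:Blue

returns its parent. Since the matching child comes back as a document, you
can store its position as a field on the child and read it directly.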

Dunno if any of that helps.

Upayavira

On Tue, May 26, 2015, at 08:33 AM, Rodolfo Zitellini wrote:
 Dear List,
 In my schema I have a couple of multivalued fields and I would need to
 retrieve the index of the value that generated a match. For example, let's
 suppose I have a text field like this with three values:
 
 MyField:
 [0] Red
 [1] Blue
 [2] Green
 
 Searching for Blue gets me the document, but I would also need the index
 (1) in the multivalued field. I tried using the highlighter, but it is a
 bit hackish to then calculate the index. Is it possible without resorting
 to highlighting?
 Thanks!
 Rodolfo


Re: YAJar

2015-05-26 Thread Upayavira
Why is your app tied that closely to Solr? I can understand if you are
talking about SolrJ, but in normal usage you run your application in
a different JVM from Solr.

Upayavira

On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
 I am stuck in Yet Another Jarmageddon of Solr. This is a basic question. I
 noticed Solr 5.0 is using Guava 14.0.1. My app needs Guava 18.0. What is
 the pattern to override a jar version deployed into Jetty?
 
 I am using maven, and solr is being started the old way
 
 java -jar start.jar
 -Dsolr.solr.home=...
 -Djetty.home=...
 
 I tried to edit Jetty's start.config (then run java
 -DSTART=/my/dir/start.config
 -jar start.jar) but got nowhere...
 
 any help would be much appreciated
 
 Peyman


Re: Solr relevancy score in percentage

2015-05-26 Thread Upayavira
Correct. The relevancy score simply states that we think result #1 is
more relevant than result #2. It doesn't say that #1 is relevant.

The score doesn't have any validity across queries either, as, for
example, a different number of query terms will cause the score to
change.

Upayavira

On Tue, May 26, 2015, at 08:57 AM, Zheng Lin Edwin Yeo wrote:
 Hi Arslan,
 
  Thank you for the link. Does that mean it is not advisable to show anything
  related to the relevancy score, even though the default sorting of
  the results is by relevancy score? Showing the raw relevancy score
  does not make any sense to the user either, since they won't understand
  what it means.
 
 
 Regards,
 Edwin
 
 
 
 On 26 May 2015 at 14:16, Ahmet Arslan iori...@yahoo.com.invalid wrote:
 
  Hi Edwin,
 
  Somehow, it is not recommended to display the relevancy score in
  percentage:
  https://wiki.apache.org/lucene-java/ScoresAsPercentages
 
  Ahmet
 
 
 
  On Tuesday, May 26, 2015 8:34 AM, Zheng Lin Edwin Yeo 
  edwinye...@gmail.com wrote:
  Hi,
 
  Would like to check: does the new version of Solr allow displaying the
  relevancy score as a percentage?
  I understand that the older version is not able to, and the only way is to
  take the highest score and use that as 100%, then calculate the other
  percentages from that number (for example, if the max score is 10 and the
  next result has a score of 5, you would do (5 / 10) * 100 = 50%).
 
  Is there a better way to do this now? I'm using Solr 5.1
 
 
  Regards,
  Edwin
 


Index of Hit in MultiValue fields

2015-05-26 Thread Rodolfo Zitellini
Dear List,
In my schema I have a couple of multivalued fields and I would need to
retrieve the index of the value that generated a match. For example, let's
suppose I have a text field like this with three values:

MyField:
[0] Red
[1] Blue
[2] Green

Searching for Blue gets me the document, but I would also need the index
(1) in the multivalued field. I tried using the highlighter, but it is a bit
hackish to then calculate the index. Is it possible without resorting to
highlighting?
Thanks!
Rodolfo


Re: Index optimize runs in background.

2015-05-26 Thread Modassar Ather
Hi,

Erick you mentioned about a unit test to test the optimize running in
background. Kindly share your findings if any.

Thanks,
Modassar

On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com
wrote:

 Thanks everybody for your replies.

 I have noticed the optimization running in background every time I
 indexed. This is 5 node cluster with solr-5.1.0 and uses the
 CloudSolrClient. Kindly share your findings on this issue.

 Our index has almost 100M documents running on SolrCloud. We have been
 optimizing the index after indexing for years and it has worked well for
 us.

 Thanks,
 Modassar

 On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Actually, I've recently seen very similar behavior in Solr 4.10.3, but
 involving hard commits with openSearcher=true, see:
 https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
 reproduce this at will, sii.

 A unit test should be very simple to write though, maybe I can get to it
 today.

 Erick



 On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote:
 
 
  On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
  On 5/21/2015 6:21 AM, Modassar Ather wrote:
   I am using Solr-5.1.0. I have an indexer class which invokes
   cloudSolrClient.optimize(true, true, 1). My indexer exits after the
   invocation of optimize and the optimization keeps on running in the
   background.
   Kindly let me know if it is per design and how can I make my indexer
 to
   wait until the optimization is over. Is there a
 configuration/parameter I
   need to set for the same.
  
   Please note that the same indexer with
 cloudSolrServer.optimize(true, true,
   1) on Solr-4.10 used to wait till the optimize was over before
 exiting.
 
  This is very odd, because I could not get HttpSolrServer to optimize in
  the background, even when that was what I wanted.
 
  I wondered if maybe the Cloud object behaves differently with regard to
  blocking until an optimize is finished ... except that there is no code
  for optimizing in CloudSolrClient at all ... so I don't know where the
  different behavior would actually be happening.
 
  A more important question is, why are you optimising? Generally it isn't
  recommended anymore as it reduces the natural distribution of documents
  amongst segments and makes future merges more costly.
 
  Upayavira





docValues: Can we apply synonym

2015-05-26 Thread Aman Tandon
Hi,

We have a field *city* on which docValues are enabled. We need to
add synonyms to that field, so how could we do it?

With Regards
Aman Tandon


Re: YAJar

2015-05-26 Thread Robust Links
i have custom search components.

On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:

 Why is your app tied that closely to Solr? I can understand if you are
 talking about SolrJ, but normal usage you use a different application in
 a different JVM from Solr.

 Upayavira

 On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
  I am stuck in Yet Another Jarmagedon of SOLR. this is a basic question. i
  noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What is
  the pattern to override a jar version uploaded into jetty?
 
  I am using maven, and solr is being started the old way
 
  java -jar start.jar
  -Dsolr.solr.home=...
  -Djetty.home=...
 
  I tried to edit jetty's start.config (then run java
  -DSTART=/my/dir/start.config
  -jar start.jar) but got no where...
 
  any help would be much appreciated
 
  Peyman



Re: Solr relevancy score in percentage

2015-05-26 Thread Daniel Collins
The question is more: why do you want your users to see the scores?

If they want to affect ranking, what you want is the ability to run
the same query with different boosting and see the difference (2 result
sets), then see if the new ordering is better or worse.  What the
actual/raw score is is irrelevant to that; what is important is the ordering.
If you want to show how good your results are, then as the link shows,
that is very difficult to measure (and very subjective!).

On 26 May 2015 at 09:37, Upayavira u...@odoko.co.uk wrote:

 Correct. The relevancy score simply states that we think result #1 is
 more relevant than result #2. It doesn't say that #1 is relevant.

 The score doesn't have any validity across queries either, as, for
 example, a different number of query terms will cause the score to
 change.

 Upayavira

 On Tue, May 26, 2015, at 08:57 AM, Zheng Lin Edwin Yeo wrote:
  Hi Arslan,
 
  Thank you for the link. Does that mean it is not advisable to show anything
  related to the relevancy score, even though the default sorting of
  the results is by relevancy score? Showing the raw relevancy score
  does not make any sense to the user either, since they won't understand
  what it means.
 
 
  Regards,
  Edwin
 
 
 
  On 26 May 2015 at 14:16, Ahmet Arslan iori...@yahoo.com.invalid wrote:
 
   Hi Edwin,
  
   Somehow, it is not recommended to display the relevancy score in
   percentage:
   https://wiki.apache.org/lucene-java/ScoresAsPercentages
  
   Ahmet
  
  
  
   On Tuesday, May 26, 2015 8:34 AM, Zheng Lin Edwin Yeo 
   edwinye...@gmail.com wrote:
   Hi,
  
   Would like to check: does the new version of Solr allow displaying the
   relevancy score as a percentage?
   I understand that the older version is not able to, and the only way is
   to take the highest score and use that as 100%, then calculate the other
   percentages from that number (for example, if the max score is 10 and the
   next result has a score of 5, you would do (5 / 10) * 100 = 50%).
  
   Is there a better way to do this now? I'm using Solr 5.1
  
  
   Regards,
   Edwin
  



Re: YAJar

2015-05-26 Thread Daniel Collins
I guess this is one reason why the whole WAR approach is being removed!
Solr should be a black box that you talk to and get responses from.  What
it depends on and how it is deployed should be irrelevant to you.

If you are wanting to override the version of guava that Solr uses, then
you'd have to rebuild Solr (can be done with maven) and manually update the
pom.xml to use guava 18.0, but why would you? You need to test Solr
completely (in case any guava bugs affect Solr), deal with any build issues
that arise (if guava changes any APIs), and cause yourself a world of pain,
for what gain?


On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:

 i have custom search components.

 On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:

  Why is your app tied that closely to Solr? I can understand if you are
  talking about SolrJ, but normal usage you use a different application in
  a different JVM from Solr.
 
  Upayavira
 
  On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
   I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
 question. i
   noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What
 is
   the pattern to override a jar version uploaded into jetty?
  
   I am using maven, and solr is being started the old way
  
   java -jar start.jar
   -Dsolr.solr.home=...
   -Djetty.home=...
  
   I tried to edit jetty's start.config (then run java
   -DSTART=/my/dir/start.config
   -jar start.jar) but got no where...
  
   any help would be much appreciated
  
   Peyman
 



Re: docValues: Can we apply synonym

2015-05-26 Thread Upayavira
To my understanding, docValues are just an uninverted index. That is, it
contains the terms that are generated at the end of an analysis chain.
Therefore, you simply enable docValues and include the
SynonymFilterFactory in your analysis.

Is that enough, or are you struggling with some other issue?

Upayavira

On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
 Hi,
 
 We have a field *city* on which docValues are enabled. We need to
 add synonyms to that field, so how could we do it?
 
 With Regards
 Aman Tandon


Re: Index optimize runs in background.

2015-05-26 Thread Upayavira
Modassar,

Are you saying that the reason you are optimising is because you have
been doing it for years? If this is the only reason, you should stop
doing it immediately. 

The one scenario in which optimisation still makes some sense is when
you reindex every night and optimise straight after. This will leave you
with a single segment which will search faster.
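
A minimal SolrJ sketch of that pattern (Solr 5.x API; the zkHost and
collection name are assumptions):

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class NightlyReindex {
  public static void main(String[] args) throws SolrServerException, IOException {
    try (CloudSolrClient client = new CloudSolrClient("zkhost:2181")) {
      client.setDefaultCollection("myindex");
      // ... run the full reindex here ...
      client.commit();
      // waitFlush=true, waitSearcher=true, maxSegments=1
      client.optimize(true, true, 1);
    }
  }
}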

However, if you are doing a lot of indexing, especially with
deletes/updates, you will have merged your content into a single segment
which will later need to be merged. That merge will be costly as it will
involve copying the entire content of your large segment, which will
impact performance.

Before Solr 3.6, optimisation was necessary and recommended. At that
point (or a little before) the TieredMergePolicy became the default, and
this made optimisation generally unnecessary.

Upayavira

On Mon, May 25, 2015, at 07:17 AM, Modassar Ather wrote:
 Thanks everybody for your replies.
 
 I have noticed the optimization running in background every time I
 indexed.
 This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient.
 Kindly
 share your findings on this issue.
 
 Our index has almost 100M documents running on SolrCloud. We have been
 optimizing the index after indexing for years and it has worked well for
 us.
 
 Thanks,
 Modassar
 
 On Fri, May 22, 2015 at 11:55 PM, Erick Erickson
 erickerick...@gmail.com
 wrote:
 
  Actually, I've recently seen very similar behavior in Solr 4.10.3, but
  involving hard commits with openSearcher=true, see:
  https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
  reproduce this at will, sii.
 
  A unit test should be very simple to write though, maybe I can get to it
  today.
 
  Erick
 
 
 
  On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote:
  
  
   On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
   On 5/21/2015 6:21 AM, Modassar Ather wrote:
I am using Solr-5.1.0. I have an indexer class which invokes
cloudSolrClient.optimize(true, true, 1). My indexer exits after the
invocation of optimize and the optimization keeps on running in the
background.
Kindly let me know if it is per design and how can I make my indexer
  to
wait until the optimization is over. Is there a
  configuration/parameter I
need to set for the same.
   
Please note that the same indexer with cloudSolrServer.optimize(true,
  true,
1) on Solr-4.10 used to wait till the optimize was over before
  exiting.
  
   This is very odd, because I could not get HttpSolrServer to optimize in
   the background, even when that was what I wanted.
  
   I wondered if maybe the Cloud object behaves differently with regard to
   blocking until an optimize is finished ... except that there is no code
   for optimizing in CloudSolrClient at all ... so I don't know where the
   different behavior would actually be happening.
  
   A more important question is, why are you optimising? Generally it isn't
   recommended anymore as it reduces the natural distribution of documents
   amongst segments and makes future merges more costly.
  
   Upayavira
 


Re: YAJar

2015-05-26 Thread Upayavira
No, not really. Creating your own components that extend Solr is quite
acceptable - they can live in the Solr Home lib directory outside of the
war.
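
For example, a minimal sketch (the path is an assumption) in solrconfig.xml:

  <lib dir="/var/solr/home/lib" regex=".*\.jar" />

or simply drop the jars into the core's own <instanceDir>/lib directory,
which Solr picks up automatically.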

But really, if you are coding within Solr, you really need to use the
libraries that Solr uses. Or... create a JIRA ticket and help to upgrade
Solr to the next version of the library you need. Or explain here what
you are trying to do so folks can help you find another way to achieve the
same.

Upayavira

On Tue, May 26, 2015, at 01:00 PM, Daniel Collins wrote:
 I guess this is one reason why the whole WAR approach is being removed!
 Solr should be a black-box that you talk to, and get responses from. 
 What
 it depends on and how it is deployed, should be irrelevant to you.
 
 If you are wanting to override the version of guava that Solr uses, then
 you'd have to rebuild Solr (can be done with maven) and manually update
 the
 pom.xml to use guava 18.0, but why would you? You need to test Solr
 completely (in case any guava bugs affect Solr), deal with any build
 issues
 that arise (if guava changes any APIs), and cause yourself a world of
 pain,
 for what gain?
 
 
 On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:
 
  i have custom search components.
 
  On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
 
   Why is your app tied that closely to Solr? I can understand if you are
   talking about SolrJ, but normal usage you use a different application in
   a different JVM from Solr.
  
   Upayavira
  
   On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
  question. i
noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What
  is
the pattern to override a jar version uploaded into jetty?
   
I am using maven, and solr is being started the old way
   
java -jar start.jar
-Dsolr.solr.home=...
-Djetty.home=...
   
I tried to edit jetty's start.config (then run java
-DSTART=/my/dir/start.config
-jar start.jar) but got no where...
   
any help would be much appreciated
   
Peyman
  
 


Removing characters like '\n \n' from indexing

2015-05-26 Thread Zheng Lin Edwin Yeo
Hi,

Is there a way to remove special characters like \n during indexing of
rich text documents?

I have quite a lot of leading \n \n in front of my indexed content of rich
text documents due to the spaces and empty lines in the original
documents, and it's causing the content to be flooded with '\n \n' at the
start before the actual content comes in. This causes the content to look
ugly, and also takes up unnecessary bandwidth in the system.


Regards,
Edwin


Re: Removing characters like '\n \n' from indexing

2015-05-26 Thread Upayavira


On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
 Hi,
 
 Is there a way to remove special characters like \n during indexing of
 rich text documents?
 
 I have quite a lot of leading \n \n in front of my indexed content of rich
 text documents due to the spaces and empty lines in the original
 documents, and it's causing the content to be flooded with '\n \n' at the
 start before the actual content comes in. This causes the content to look
 ugly, and also takes up unnecessary bandwidth in the system.

Where is this showing up?

If it is in search results, you must use an UpdateProcessor, as these
run before fields are stored (e.g. RegexReplaceProcessorFactory).

If you are concerned about facet results, then you can do it in an
analysis chain, for example with a PatternReplaceFilterFactory.
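
A minimal sketch of such a chain (the chain name and the field name
"content" are assumptions):

<updateRequestProcessorChain name="strip-newlines">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <str name="pattern">\s+</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Reference it from your update handler's defaults with
<str name="update.chain">strip-newlines</str>.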

Upayavira


Re: When is too many fields in qf is too many?

2015-05-26 Thread Doug Turnbull
How you have tie is fine. Setting tie to 1 might give you reasonable
results. You could easily still have one field whose scores are just always
an order of magnitude or two higher, but try it out!

BTW, anything you put in the URL can also be put into a request handler.
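
(For example, these two are equivalent, with field names assumed:

  /select?q=apple&defType=edismax&qf=F1 F2 F3&tie=1.0

or the <float name="tie">1.0</float> line inside the handler's defaults
shown below.)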

If you ever just want to have a 15 minute conversation via hangout, happy
to chat with you :) Might be fun to think through your prob together.

-Doug

On Tue, May 26, 2015 at 1:42 PM, Steven White swhite4...@gmail.com wrote:

 Hi Doug,

 I'm back to this topic.  Unfortunately, due to my DB structure and business
 needs, I will not be able to search against a single field (i.e., using
 copyField).  Thus, I have to use a list of fields via qf.  Given this, I
 see you said above to use tie=1.0; will that, more or less, address the
 scoring issue?  Should tie=1.0 be set on the request handler like so:

   <requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">20</int>
       <str name="defType">edismax</str>
       <str name="qf">F1 F2 F3 F4 ... ... ...</str>
       <float name="tie">1.0</float>
       <str name="fl">_UNIQUE_FIELD_,score</str>
       <str name="wt">xml</str>
       <str name="indent">true</str>
     </lst>
   </requestHandler>

 Or must tie be passed as part of the URL?

 Thanks

 Steve


 On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull 
 dturnb...@opensourceconnections.com wrote:

  Yeah, a copyField into one could be a good space/time tradeoff. It can be
  more manageable to use an "all" field for both relevancy and performance,
  if you can handle the duplication of data.
 
  You could set tie=1.0, which effectively sums all the matches instead of
  picking the best match. You'll still have cases where one field's score
  might just happen to be far off of another, and thus dominating the
  summation. But something easy to try if you want to keep playing with
  dismax.
 
  -Doug
 
  On Wed, May 20, 2015 at 2:56 PM, Steven White swhite4...@gmail.com
  wrote:
 
   Hi Doug,
  
   Your blog write up on relevancy is very interesting, I didn't know
 this.
   Looks like I have to go back to my drawing board and figure out an
   alternative solution: somehow get those group-based-fields data into a
   single field using copyField.
  
   Thanks
  
   Steve
  
   On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull 
   dturnb...@opensourceconnections.com wrote:
  
Steven,
   
I'd be concerned about your relevance with that many qf fields. Dismax
takes a "winner takes all" point of view to search. Field scores can vary
by an order of magnitude (or even two) despite the attempts of query
normalization. You can read more here
   
   
  
 
 http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/
   
I'm about to win the blasphemer merit badge, but ad-hoc all-field-like
searching over many fields is actually a good use case for Elasticsearch's
cross field queries.
   
   
  
 
 https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html
   
   
  
 
 http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/
   
It wouldn't be hard (and actually a great feature for the project) to get
the Lucene query associated with cross field search into Solr. You could
easily write a plugin to integrate it into a query parser:
   
   
  
 
 https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java
   
Hope that helps
-Doug
--
*Doug Turnbull **| *Search Relevance Consultant | OpenSource
  Connections,
LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Relevant Search http://manning.com/turnbull from Manning
Publications
This e-mail and all contents, including attachments, is considered to
  be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.
On Wed, May 20, 2015 at 8:27 AM, Steven White swhite4...@gmail.com
wrote:
   
 Hi everyone,

 My solution requires that users in group-A can only search against a set
 of fields-A and users in group-B can only search against a set of fields-B,
 etc.  There can be several groups, as many as 100 or even more.  To meet
 this need, I build my search by passing in the list of fields via qf.  What
 goes into qf can be large: as many as 1500 fields, and each field name
 averages 15 characters long; in effect the data passed via qf will be
 over 20K characters.

 Given the above, besides the fact that a search for apple translates
 to 20K characters passing over the network, what else within Solr and
 Lucene should I be worried about, if any?  Will I hit some kind of a limit?
 Will each search now require more CPU cycles?  

Re: docValues: Can we apply synonym

2015-05-26 Thread Alessandro Benedetti
mmm this is different !
Without any customisation, right now you could :
- use docValues to provide exact value facets.
- Then you can use a copy field, with the proper analysis, to search when a
user clicks on a filter !

So you will see in your facets :
Mumbai(3)
Bombay(2)

And when clicking you see 5 results.
A little bit misleading for the users …

On the other hand, if you want to apply the synonyms in the indexing
pipeline ( because docValues fields can not be analysed), I think
you should play with UpdateProcessors.

Cheers
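
A rough Java sketch of that idea (the class, field name and synonym map are
assumptions; it normalises each city to one canonical value before it
reaches the docValues field, so the facet shows a single entry):

import java.io.IOException;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class CitySynonymProcessor extends UpdateRequestProcessor {
  private final Map<String, String> canonical; // e.g. "bombay" -> "mumbai"

  public CitySynonymProcessor(Map<String, String> canonical,
                              UpdateRequestProcessor next) {
    super(next);
    this.canonical = canonical;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object city = doc.getFieldValue("city");
    if (city != null) {
      String mapped = canonical.get(city.toString().toLowerCase());
      if (mapped != null) {
        doc.setField("city", mapped); // one canonical value reaches docValues
      }
    }
    super.processAdd(cmd);
  }
}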

2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com:

 We are interested in using docValues for better memory utilization and
 speed.

 Currently we are faceting the search results on *city*. In city we have
 also added synonyms for cities like mumbai, bombay (these are Indian
 cities), so that a result for mumbai is also eligible when somebody
 applies a filter of bombay on the search results.

 I need this functionality to apply with docValues enabled field.

 With Regards
 Aman Tandon

 On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:

  I checked in the Documentation to be sure, but apparently :
 
  DocValues are only available for specific field types. The types chosen
  determine the underlying Lucene docValue type that will be used. The
  available Solr field types are:
 
 - StrField and UUIDField.
   - If the field is single-valued (i.e., multi-valued is false), Lucene
     will use the SORTED type.
   - If the field is multi-valued, Lucene will use the SORTED_SET type.
 - Any Trie* numeric fields and EnumField.
   - If the field is single-valued (i.e., multi-valued is false), Lucene
     will use the NUMERIC type.
   - If the field is multi-valued, Lucene will use the SORTED_SET type.
 
 
 This means you should not analyse a field where DocValues is enabled.
 Can you explain your use case to us? Why are you interested in synonyms
 at the DocValues level?
 
  Cheers
 
  2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk:
 
   To my understanding, docValues are just an uninverted index. That is,
 it
   contains the terms that are generated at the end of an analysis chain.
   Therefore, you simply enable docValues and include the
   SynonymFilterFactory in your analysis.
  
   Is that enough, or are you struggling with some other issue?
  
   Upayavira
  
   On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
Hi,
   
We have some field *city* in which the docValues are enabled. We need
  to
add the synonym in that field so how could we do it?
   
With Regards
Aman Tandon
  
 
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: SolrCloud 4.8 - Transaction log size over 1GB

2015-05-26 Thread Vincenzo D'Amore
Thanks Erick for your willingness and patience,

if I understood well, when autoCommit runs with openSearcher=true, at the
first commit (soft or hard) all new documents automatically become available
for search. But when openSearcher=false, the commit "will flush recent index
changes to stable storage, but does not cause a new searcher to be opened to
make those changes visible"
https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-autoCommit
.

So, it is not clear what this stable storage is, where it is, and when the
new documents will be visible.
Only when my code commits at the very end of the indexing process?

Does it mean, let me say, that when openSearcher=false we have an implicit
commit done by the SolrCloud autoCommit, not visible to the world, and an
explicit commit done by clients, visible to the world?




On Tue, May 26, 2015 at 2:55 AM, Erick Erickson erickerick...@gmail.com
wrote:

 The design is that the latest successfully flushed tlog file is kept
 for peer sync in SolrCloud mode. When a replica comes up, there's a
 chance that it's not very many docs behind. So, if possible, some of
 the docs are taken from the leader's tlog and replayed to the follower
 that's just been started. If the follower is too far out of sync, a
 full old-style replication is done. So there will always be a tlog
 file (and occasionally more than one if they're very small) kept
 around, even on successful commit. It doesn't matter if you have
 leaders and replicas or not, that's still the process that's followed.

 Please re-read the link I sent earlier. There's absolutely no reason
 your tlog files have to be so big! Really, set your autoCommit to, say,
 15 seconds and 10 docs and set openSearcher=false in your
 solrconfig.xml file, and the tlog file that's kept around will be much
 smaller and still available for peer sync.
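
 A sketch of that setting (the maxDocs threshold here is an assumption):

 <autoCommit>
   <maxTime>15000</maxTime>
   <maxDocs>10000</maxDocs>
   <openSearcher>false</openSearcher>
 </autoCommit>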

 And if you really don't care about tlogs at all, just take this bit
 out of your solrconfig.xml:

 <updateLog>
   <str name="dir">${solr.ulog.dir:}</str>
   <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:256}</int>
 </updateLog>



 Best,
 Erick

 On Mon, May 25, 2015 at 4:40 PM, Vincenzo D'Amore v.dam...@gmail.com
 wrote:
  Hi Erick,
 
  I have tried the indexing code I have a few times; this is the behaviour I
  have observed:
 
  When an indexing process starts, even if one or more tlog files exist, a
  new tlog file is created and all the new documents are stored there.
  When the indexing process ends and does a hard commit, the older tlog files
  are removed but the new one (the latest) remains.
 
  As far as I can see, since my indexing process loads a few million
  documents every time, at the end of the process the latest tlog file
  persists with all these documents in it.
  So I have such big tlog files. Now the question is, why does the latest
  tlog file persist even though the code has done a hard commit?
  When a hard commit is done successfully, why should we keep the latest
  tlog file?
 
 
 
  On Mon, May 25, 2015 at 7:24 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  OK, assuming you're not doing any commits at all until the very end,
  then the tlog contains all the docs for the _entire_ run. The article
  really doesn't care whether the commits come from the solrconfig.xml
  or SolrJ client or curl. The tlog simply is not truncated until a hard
  commit happens, no matter where it comes from.
 
  So here's what I'd do:
  1 set autoCommit in your solrconfig.xml with openSearcher=false for
  every minute. Then the problem will probably go away.
  or
  2 periodically issue a hard commit (openSearcher=false) from the
 client.
 
  Of the two, I _strongly_ recommend 1 as it's more graceful when
  there are multiple clients.
 
  Best,
  Erick
 
  On Mon, May 25, 2015 at 4:45 AM, Vincenzo D'Amore v.dam...@gmail.com
  wrote:
   Hi Erick, thanks for your support.
  
   Reading the post I realised that my scenario does not use the
   autoCommit configuration; right now we don't have autoCommit in our
   solrconfig.xml.
  
   We need the docs to be searchable only after the indexing process
   finishes, and all the documents are committed only at the end of the
   indexing process.
  
   Now I don't understand why the tlog files are so big, given that we
   do a hard commit at the end of every indexing run.
  
  
  
  
   On Sun, May 24, 2015 at 5:49 PM, Erick Erickson 
 erickerick...@gmail.com
  
   wrote:
  
   Vincenzo:
  
   Here's perhaps more than you want to know about hard commits, soft
   commits and transaction logs:
  
  
  
 
 http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
  
   Best,
   Erick
  
   On Sun, May 24, 2015 at 12:04 AM, Vincenzo D'Amore 
 v.dam...@gmail.com
   wrote:
Thanks Shawn for your prompt support.
   
Best regards,
Vincenzo
   
On Sun, May 24, 2015 at 6:45 AM, Shawn Heisey apa...@elyograg.org
 
   wrote:
   
On 5/23/2015 9:41 PM, Vincenzo D'Amore wrote:
 Thanks Shawn,

 may be this is a silly question, but I looked 

Re: Removing characters like '\n \n' from indexing

2015-05-26 Thread Jack Krupansky
Neither - it removes the characters before indexing. The distinction is
that if you remove them during indexing they will still appear in the
stored field values even if they are removed from the indexed values, but
by removing them before indexing, they will not appear in the stored field
values. Again, the distinction is between indexed field values and stored
field values.
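
For example, a minimal update processor chain along those lines in
solrconfig.xml, selected per request with update.chain=strip-newlines
(the chain name, field name, and pattern here are only illustrative):

<updateRequestProcessorChain name="strip-newlines">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <str name="pattern">\s*\n\s*</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>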

-- Jack Krupansky

On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com
wrote:

 It is showing up in the search results. Just to confirm, does this
 UpdateProcessor method remove the characters during indexing or only after
 indexing has been done?

 Regards,
 Edwin

 On 26 May 2015 at 21:30, Upayavira u...@odoko.co.uk wrote:

 
 
  On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
   Hi,
  
   Is there a way to remove special characters like \n during indexing
   of rich text documents?
  
   I have quite a lot of leading \n \n in front of my indexed content of
   rich text documents, due to the spaces and empty lines in the original
   documents, and it's causing the content to be flooded with '\n \n' at
   the start before the actual content comes in. This makes the content
   look ugly, and also takes up unnecessary bandwidth in the system.
 
  Where is this showing up?
 
  If it is in search results, you must use an UpdateProcessor, as these
  happen before fields are stored (E.g. RegexpReplaceProcessorFactory).
 
  If you are concerned about facet results, then you can do it in an
  analysis chain, for example with a RegexpFilterFactory.
 
  Upayavira
 



Re: Running Solr 5.1.0 as a Service on Windows

2015-05-26 Thread Will Miller
I am using NSSM to start zookeeper as a service on windows (and for Solr too).

In NSSM I configured it to just point to
E:\zookeeper-3.4.6\bin\zkServer.cmd.

As long as you can run that from the command line to validate that you have 
modified all of the zookeeper config files correctly, NSSM should have no 
problem starting up zookeeper.
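
From an elevated command prompt the setup boils down to something like
this (paths and service names are illustrative; -f keeps Solr in the
foreground so NSSM can supervise the process):

nssm install zookeeper E:\zookeeper-3.4.6\bin\zkServer.cmd
nssm install solr5 C:\solr-5.1.0\bin\solr.cmd start -f -p 8983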



Will Miller
Development Manager, eCommerce Services | Online Technology
462 Seventh Avenue, New York, NY, 10018
Office: 212.502.9323 | Cell: 317.653.0614
wmil...@fbbrands.com | www.fbbrands.com


From: Upayavira u...@odoko.co.uk
Sent: Monday, May 25, 2015 4:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Running Solr 5.1.0 as a Service on Windows

Zookeeper is just Java, so there's no reason why it can't be started in
Windows.

However, the startup scripts for Zookeeper on Windows are pathetic, so
you are much more on your own than you are on Linux.

There may be folks here who can answer your question (e.g. with Windows
specific startup scripts), or you might consider asking on the Zookeeper
mailing lists directly: https://zookeeper.apache.org/lists.html

Upayavira

On Mon, May 25, 2015, at 10:34 AM, Zheng Lin Edwin Yeo wrote:
 I've managed to get Solr started as a Windows service after
 re-configuring the startup script, as I had previously missed some of
 the custom configurations there.

 However, I still couldn't get zookeeper to start the same way. Are
 we able to use NSSM to start up zookeeper as a Microsoft Windows
 service too?


 Regards,
 Edwin



 On 25 May 2015 at 12:16, Zheng Lin Edwin Yeo edwinye...@gmail.com
 wrote:

  Hi,
 
  Has anyone tried to run Solr 5.1.0 as a Microsoft Windows service?
 
  I've tried to follow the steps from this website
  http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/, which
  uses NSSM.
 
  However, when I tried to start the service from the Component Services in
  the Windows Control Panel Administrative tools, I get the following message:
  Windows could not start the Solr5 service on Local Computer. The service
  did not return an error. This could be an internal Windows error or an
  internal service error.
 
  Is this the correct way to set it up, or are there other methods?
 
 
  Regards,
  Edwin
 
 


Re: Index of Hit in MultiValue fields

2015-05-26 Thread Alessandro Benedetti
We had a similar problem: when searching we wanted to return the doc, and
for the multi-valued field we wanted to show only the value that matched
the search.
This was used for an advanced auto-suggestion feature.

As Upaya specified, highlighting was the right solution for us,
managing in the UI only the unit of information coming from the
highlighting.

Cheers

2015-05-26 9:42 GMT+01:00 Upayavira u...@odoko.co.uk:

 The result that Solr returns is the document, not anything beneath, so
 no, you cannot do this.

 You could use highlighting, or you could parse the output of explain
 (debug.explain.structured=true will help) to identify which field
 triggered the match. Alternatively, you could use block joins. Make a
 parent doc and each of your colours as child docs, then you could return
 which doc matched. You could use the ExpandComponent to retrieve details
 of the parent doc (http://heliosearch.org/expand-block-join/)
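 
 For example, with highlighting (core and field names illustrative):
 
 http://localhost:8983/solr/mycore/select?q=MyField:Blue&hl=true&hl.fl=MyField
 
 The highlighting section of the response shows which stored value
 matched, which you can then look up in the returned multi-valued field
 to compute its index.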

 Dunno if any of that helps.

 Upayavira

 On Tue, May 26, 2015, at 08:33 AM, Rodolfo Zitellini wrote:
  Dear List,
  In my schema I have a couple multi value fields and I would need to
   retrieve
  the index of which one generated a match. For example let's suppose I
  have
  a text field like this with three values:
 
  MyField:
  [0] Red
  [1] Blue
  [2] Green
 
  Searching for Blue gets me the document, but I would also need the
  index
  (1) in the multi value field. I tried using the highlighter but it is a
  bit
  hackish then to calculate the index. Is it possible without resorting to
  highlighting?
  Thanks!
  Rodolfo




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


AW: Solr 5.1 ignores SOLR_JAVA_MEM setting

2015-05-26 Thread Clemens Wyss DEV
Thx. When will 5.2 approximately be released?

-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com]
Sent: Tuesday, 26 May 2015 17:50
To: solr-user@lucene.apache.org
Subject: Re: Solr 5.1 ignores SOLR_JAVA_MEM setting

Yes, same bug. Fixed in 5.2

On Tue, May 26, 2015 at 9:15 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 I also noticed that (see my post this morning) ...
 SOLR_OPTS=$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true
 ...
 Is not taken into consideration (anymore). Same bug?


 -Original Message-
 From: Ere Maijala [mailto:ere.maij...@helsinki.fi]
 Sent: Wednesday, 15 April 2015 09:25
 To: solr-user
 Subject: Solr 5.1 ignores SOLR_JAVA_MEM setting

 Folks, just a quick heads-up that apparently Solr 5.1 introduced a change in 
 bin/solr that overrides SOLR_JAVA_MEM setting from solr.in.sh or environment. 
 I just filed https://issues.apache.org/jira/browse/SOLR-7392. The problem can 
 be circumvented by using SOLR_HEAP setting, e.g. SOLR_HEAP=32G, but it's 
 not mentioned in solr.in.sh by default.
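 
 The workaround is a single line in solr.in.sh (the value is illustrative):
 
 SOLR_HEAP="32g"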

 --Ere

 --
 Ere Maijala
 Kansalliskirjasto / The National Library of Finland


Re: Help/Guidance Needed : To reload kstem protword hash without full core reload

2015-05-26 Thread Aman Tandon
Hi Ahmet,

Can you please guide me on contributing to this *issue*? I haven't done
this before.

So I need to know what I should learn and how I should start: what IDE,
or whatever else you think a novice needs to know. I will be
thankful to you :)

With Regards
Aman Tandon

On Tue, May 19, 2015 at 8:10 PM, Aman Tandon amantandon...@gmail.com
wrote:

 That link you provided is exactly I want to do. Thanks Ahmet.

 With Regards
 Aman Tandon

 On Tue, May 19, 2015 at 5:06 PM, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:

 Hi Aman,

 Changing protected words without reindexing makes little or no sense.
 Regarding protected words, the trend is to use
 solr.KeywordMarkerFilterFactory.

 Instead I suggest you to work on a more general issue:
 https://issues.apache.org/jira/browse/SOLR-1307
 Ahmet


 On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com
 wrote:
 Please help, or am I not being clear here?

 With Regards
 Aman Tandon


 On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com
 wrote:

  Hi,
 
  *Problem Statement: *I want to reload the hash of protwords created by
  the kstem filter without reloading the whole index core.
 
  *My Thought: *I am thinking of reloading the hash by passing a parameter
  like *r=1* to the analysis url request (to somehow pass the parameter
  via the url). And I am thinking that by changing IndexSchema.java I
  might be able to pass this parameter through my analyzer chain to
  KStemFilter, in which I would call the initializeDictionary function to
  rebuild the protwords hash from the file if *r=1*, instead of making a
  full core reload request.
 
  Please guide me; I know the question might be stupid, but the thought
  came to my mind and I want to share it and ask for suggestions here. Is
  it possible or not, and how can I achieve it?
 
  I will be thankful for guidance.
 
  With Regards
  Aman Tandon
 









Re: Running Solr 5.1.0 as a Service on Windows

2015-05-26 Thread Timothy Potter
Hi Edwin,

Are there changes you recommend to bin/solr.cmd to make it easier to
work with NSSM? If so, please file a JIRA as I'd like to help make
that process easier.

Thanks.
Tim

On Mon, May 25, 2015 at 3:34 AM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 I've managed to get Solr started as a Windows service after
 re-configuring the startup script, as I had previously missed some of
 the custom configurations there.

 However, I still couldn't get zookeeper to start the same way. Are
 we able to use NSSM to start up zookeeper as a Microsoft Windows
 service too?


 Regards,
 Edwin



 On 25 May 2015 at 12:16, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

 Hi,

 Has anyone tried to run Solr 5.1.0 as a Microsoft Windows service?

 I've tried to follow the steps from this website
 http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/, which
 uses NSSM.

 However, when I tried to start the service from the Component Services in
 the Windows Control Panel Administrative tools, I get the following message:
 Windows could not start the Solr5 service on Local Computer. The service
 did not return an error. This could be an internal Windows error or an
 internal service error.

 Is this the correct way to set it up, or are there other methods?


 Regards,
 Edwin




Re: docValues: Can we apply synonym

2015-05-26 Thread Aman Tandon
Okay, so how could I do it with UpdateProcessors?

With Regards
Aman Tandon

On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 mmm this is different !
 Without any customisation, right now you could :
 - use docValues to provide exact-value facets.
 - Then use a copy field, with the proper analysis, to search when a
 user clicks on a filter !

 So you will see in your facets :
 Mumbai(3)
 Bombay(2)

 And when clicking you see 5 results.
 A little bit misleading for the users …
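 
 In schema.xml that combination is roughly (field and type names are
 assumptions):
 
 <field name="city" type="string" indexed="true" stored="true"
        docValues="true"/>
 <field name="city_search" type="text_general" indexed="true"
        stored="false"/>
 <copyField source="city" dest="city_search"/>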

 On the other hand, if you want to apply the synonyms earlier, in the
 indexing pipeline (because docValues fields cannot be analysed), I think
 you should play with UpdateProcessors.

 Cheers

 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  We are interested in using docValues for better memory utilization and
  speed.
 
  Currently we are faceting the search results on *city*. In city we have
  also added synonyms for cities like mumbai, bombay (these are Indian
  cities), so that results for mumbai are also eligible when somebody
  applies a filter of bombay on the search results.
 
  I need this functionality to apply with docValues enabled field.
 
  With Regards
  Aman Tandon
 
  On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti 
  benedetti.ale...@gmail.com wrote:
 
   I checked in the Documentation to be sure, but apparently :
  
   DocValues are only available for specific field types. The types chosen
   determine the underlying Lucene docValue type that will be used. The
   available Solr field types are:
  
   - StrField and UUIDField:
     - If the field is single-valued (i.e., multi-valued is false),
       Lucene will use the SORTED type.
     - If the field is multi-valued, Lucene will use the SORTED_SET type.
   - Any Trie* numeric fields and EnumField:
     - If the field is single-valued (i.e., multi-valued is false),
       Lucene will use the NUMERIC type.
     - If the field is multi-valued, Lucene will use the SORTED_SET type.
  
   This means you should not analyse a field where DocValues is enabled.
   Can you explain your use case? Why are you interested in synonyms at
   the DocValues level?
  
   Cheers
  
   2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk:
  
    To my understanding, docValues are just an uninverted index. That is,
    it contains the terms that are generated at the end of an analysis
    chain. Therefore, you simply enable docValues and include the
    SynonymFilterFactory in your analysis.
   
Is that enough, or are you struggling with some other issue?
   
Upayavira
   
On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
 Hi,

 We have a field *city* on which docValues is enabled. We need to
 add synonyms to that field, so how could we do it?

 With Regards
 Aman Tandon
   
  
  
  
   --
   --
  
   Benedetti Alessandro
   Visiting card : http://about.me/alessandro_benedetti
  
   Tyger, tyger burning bright
   In the forests of the night,
   What immortal hand or eye
   Could frame thy fearful symmetry?
  
   William Blake - Songs of Experience -1794 England
  
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England



Re: Solr relevancy score in percentage

2015-05-26 Thread Alessandro Benedetti
Honestly, the only case where the score as a percentage could make sense
is for the More Like This.
In that case Solr should provide that feature, as we know for certain
that a 100% similarity score means a copy of the seed document.

If I am right, because the MLT implementation does not take care of the
identity score, we are getting weird scores there as well.
Maybe that is the only place where I would prefer a percentage.

Cheers

2015-05-26 16:23 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 Currently I take the score that I get from Solr, divide it by the
 maxScore, and multiply it by 100 to get the percentage. All of this is
 done in the code for the UI. The user will only see the percentage and
 will not know anything about the score. Since the score by itself is
 meaningless, I don't think I should display a score like 1.7 or
 0.2 in the UI, which could further confuse the user and raise a lot
 more questions.

 Regards,
 Edwin



 On 26 May 2015 at 23:07, Shawn Heisey apa...@elyograg.org wrote:

  On 5/26/2015 8:10 AM, Zheng Lin Edwin Yeo wrote:
   We want the user to see how relevant the result is with respect to the
   search query entered, and not how good the results are.
   But I suspect a problem is that the 1st record will always be 100%,
   regardless of what is the score, as the 1st record score will always be
   equals to the maxScore.
 
  If you want to give your users *something* then simply display the score
  that you get from Solr.  I recommend that you DON'T give them maxScore,
  because they will be tempted to make the percentage calculation
  themselves to try and find meaning where there is none.  A clever user
  will be able to figure out maxScore for themselves simply by sorting on
  relevance and looking at the score on the top doc.
 
  When you get questions about what the number means, and you *WILL* get
  those questions, you can tell them that the number itself is meaningless
  and what matters is how the scores within a single result compare to
  each other -- exactly what you have been told here.
 
  Thanks,
  Shawn
 
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Index optimize runs in background.

2015-05-26 Thread Erick Erickson
No results yet. I finished the test harness last night (not really a
unit test, a stand-alone program that endlessly adds stuff and tests
that every commit returns the correct number of docs).

8,000 cycles later there aren't any problems reported.

Siiigh.


On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote:
 Hi,

 Erick, you mentioned a unit test to test the optimize running in the
 background. Kindly share your findings, if any.

 Thanks,
 Modassar

 On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com
 wrote:

 Thanks everybody for your replies.

 I have noticed the optimization running in the background every time I
 indexed. This is a 5-node cluster with solr-5.1.0 and uses the
 CloudSolrClient. Kindly share your findings on this issue.

 Our index has almost 100M documents running on SolrCloud. We have been
 optimizing the index after indexing for years and it has worked well for
 us.

 Thanks,
 Modassar

 On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Actually, I've recently seen very similar behavior in Solr 4.10.3, but
 involving hard commits openSearcher=true, see:
 https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
 reproduce this at will, sii.

 A unit test should be very simple to write though, maybe I can get to it
 today.

 Erick



 On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote:
 
 
  On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
  On 5/21/2015 6:21 AM, Modassar Ather wrote:
   I am using Solr-5.1.0. I have an indexer class which invokes
   cloudSolrClient.optimize(true, true, 1). My indexer exits after the
   invocation of optimize and the optimization keeps on running in the
   background.
   Kindly let me know if it is per design and how can I make my indexer
 to
   wait until the optimization is over. Is there a
 configuration/parameter I
   need to set for the same.
  
   Please note that the same indexer with
 cloudSolrServer.optimize(true, true,
   1) on Solr-4.10 used to wait till the optimize was over before
 exiting.
 
  This is very odd, because I could not get HttpSolrServer to optimize in
  the background, even when that was what I wanted.
 
  I wondered if maybe the Cloud object behaves differently with regard to
  blocking until an optimize is finished ... except that there is no code
  for optimizing in CloudSolrClient at all ... so I don't know where the
  different behavior would actually be happening.
 
  A more important question is, why are you optimising? Generally it isn't
  recommended anymore as it reduces the natural distribution of documents
  amongst segments and makes future merges more costly.
 
  Upayavira





Re: Removing characters like '\n \n' from indexing

2015-05-26 Thread Alessandro Benedetti
I think this is still on topic.
Assuming we are using the extracting request handler, I think the update
processor approach still applies.
But is it not possible to strip them directly with some extract request
handler param?


2015-05-26 16:33 GMT+01:00 Jack Krupansky jack.krupan...@gmail.com:

 Neither - it removes the characters before indexing. The distinction is
 that if you remove them during indexing they will still appear in the
 stored field values even if they are removed from the indexed values, but
 by removing them before indexing, they will not appear in the stored field
 values. Again, the distinction is between indexed field values and stored
 field values.

 -- Jack Krupansky

 On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo 
 edwinye...@gmail.com
 wrote:

  It is showing up in the search results. Just to confirm, does this
  UpdateProcessor method remove the characters during indexing or only
 after
  indexing has been done?
 
  Regards,
  Edwin
 
  On 26 May 2015 at 21:30, Upayavira u...@odoko.co.uk wrote:
 
  
  
   On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
Hi,
   
    Is there a way to remove special characters like \n during indexing
    of rich text documents?
   
    I have quite a lot of leading \n \n in front of my indexed content of
    rich text documents, due to the spaces and empty lines in the original
    documents, and it's causing the content to be flooded with '\n \n' at
    the start before the actual content comes in. This makes the content
    look ugly, and also takes up unnecessary bandwidth in the system.
  
   Where is this showing up?
  
   If it is in search results, you must use an UpdateProcessor, as these
   happen before fields are stored (E.g. RegexpReplaceProcessorFactory).
  
   If you are concerned about facet results, then you can do it in an
   analysis chain, for example with a RegexpFilterFactory.
  
   Upayavira
  
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: docValues: Can we apply synonym

2015-05-26 Thread Alessandro Benedetti
I checked in the Documentation to be sure, but apparently :

DocValues are only available for specific field types. The types chosen
determine the underlying Lucene docValue type that will be used. The
available Solr field types are:

   - StrField and UUIDField:
     - If the field is single-valued (i.e., multi-valued is false),
       Lucene will use the SORTED type.
     - If the field is multi-valued, Lucene will use the SORTED_SET type.
   - Any Trie* numeric fields and EnumField:
     - If the field is single-valued (i.e., multi-valued is false),
       Lucene will use the NUMERIC type.
     - If the field is multi-valued, Lucene will use the SORTED_SET type.

This means you should not analyse a field where DocValues is enabled.
Can you explain your use case? Why are you interested in synonyms at the
DocValues level?

Cheers

2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk:

 To my understanding, docValues are just an uninverted index. That is, it
 contains the terms that are generated at the end of an analysis chain.
 Therefore, you simply enable docValues and include the
 SynonymFilterFactory in your analysis.

 Is that enough, or are you struggling with some other issue?

 Upayavira

 On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
  Hi,
 
  We have some field *city* in which the docValues are enabled. We need to
  add the synonym in that field so how could we do it?
 
  With Regards
  Aman Tandon




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: docValues: Can we apply synonym

2015-05-26 Thread Alessandro Benedetti
I should investigate that, as synonyms are usually an analysis-stage
concern.
A simple way is to replace the word with all its synonyms (including the
original word), but simply using this kind of processor will change the
token positions and offsets, modifying the actual content of the document.

"I am from Bombay" will become "I am from Bombay Mumbai", which can be
annoying.
So a clever approach must be investigated.
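
For the faceting case, one clever-enough approach is to canonicalise
rather than expand: map every synonym to a single value at index time,
before docValues are written. A minimal sketch of a custom processor
(the class name, field name, and city mapping are assumptions; this is
not a stock Solr processor):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class CitySynonymProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object city = doc.getFieldValue("city");
        // Rewrite the synonym to the canonical form before indexing.
        if (city != null && "bombay".equalsIgnoreCase(city.toString())) {
          doc.setField("city", "Mumbai");
        }
        super.processAdd(cmd);
      }
    };
  }
}

Registered in an updateRequestProcessorChain ahead of
RunUpdateProcessorFactory, this keeps the facet counts merged without
touching the token stream at all.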

2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com:

 Okay, so how could I do it with UpdateProcessors?

 With Regards
 Aman Tandon

 On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:

  mmm this is different !
  Without any customisation, right now you could :
  - use docValues to provide exact-value facets.
  - Then use a copy field, with the proper analysis, to search
  when a user clicks on a filter !
 
  So you will see in your facets :
  Mumbai(3)
  Bombay(2)
 
  And when clicking you see 5 results.
  A little bit misleading for the users …
 
  On the other hand, if you want to apply the synonyms earlier, in the
  indexing pipeline (because docValues fields cannot be analysed), I
  think you should play with UpdateProcessors.
 
  Cheers
 
  2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com:
 
   We are interested in using docValues for better memory utilization and
   speed.
  
   Currently we are faceting the search results on *city*. In city we have
   also added synonyms for cities like mumbai, bombay (these are Indian
   cities), so that results for mumbai are also eligible when somebody
   applies a filter of bombay on the search results.
  
   I need this functionality to apply with docValues enabled field.
  
   With Regards
   Aman Tandon
  
   On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti 
   benedetti.ale...@gmail.com wrote:
  
I checked in the Documentation to be sure, but apparently :
   
    DocValues are only available for specific field types. The types
    chosen determine the underlying Lucene docValue type that will be
    used. The available Solr field types are:
   
    - StrField and UUIDField:
      - If the field is single-valued (i.e., multi-valued is false),
        Lucene will use the SORTED type.
      - If the field is multi-valued, Lucene will use the SORTED_SET type.
    - Any Trie* numeric fields and EnumField:
      - If the field is single-valued (i.e., multi-valued is false),
        Lucene will use the NUMERIC type.
      - If the field is multi-valued, Lucene will use the SORTED_SET type.
   
    This means you should not analyse a field where DocValues is enabled.
    Can you explain your use case? Why are you interested in synonyms at
    the DocValues level?
   
Cheers
   
2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk:
   
 To my understanding, docValues are just an uninverted index. That is,
 it contains the terms that are generated at the end of an analysis
 chain. Therefore, you simply enable docValues and include the
 SynonymFilterFactory in your analysis.

 Is that enough, or are you struggling with some other issue?

 Upayavira

 On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
  Hi,
 
  We have a field *city* on which docValues is enabled. We need to
  add synonyms to that field, so how could we do it?
 
  With Regards
  Aman Tandon

   
   
   
--
--
   
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti
   
Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?
   
William Blake - Songs of Experience -1794 England
   
  
 
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Help/Guidance Needed : To reload kstem protword hash without full core reload

2015-05-26 Thread Ahmet Arslan
Hi Aman,

Start by creating a jira account and vote for/watch that issue.
Post on the issue to see if there is still interest in it.
Declare that you are volunteering and ask kindly for guidance.
The creator of the issue or one of the watchers may respond.
Try to digest the ideas discussed on the issue. Raise yours. Collaborate.
Don't get discouraged if nobody responds; please remember that committers
are busy people.

If you have implemented something you want to share, upload a patch:
https://wiki.apache.org/solr/HowToContribute

Good luck,
Ahmet



On Tuesday, May 26, 2015 7:47 PM, Aman Tandon amantandon...@gmail.com wrote:
Hi Ahmet,

Can you please guide me on contributing to this *issue*? I haven't done
this before.

So I need to know what I should learn and how I should start: what IDE,
or whatever else you think a novice needs to know. I will be
thankful to you :)

With Regards
Aman Tandon


On Tue, May 19, 2015 at 8:10 PM, Aman Tandon amantandon...@gmail.com
wrote:

 That link you provided is exactly I want to do. Thanks Ahmet.

 With Regards
 Aman Tandon

 On Tue, May 19, 2015 at 5:06 PM, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:

 Hi Aman,

 Changing protected words without reindexing makes little or no sense.
 Regarding protected words, the trend is to use
 solr.KeywordMarkerFilterFactory.

 Instead I suggest you to work on a more general issue:
 https://issues.apache.org/jira/browse/SOLR-1307
 Ahmet


 On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com
 wrote:
 Please help, or am I not being clear here?

 With Regards
 Aman Tandon


 On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com
 wrote:

  Hi,
 
  *Problem Statement: *I want to reload the hash of protwords created by
  the kstem filter without reloading the whole index core.
 
  *My Thought: *I am thinking of reloading the hash by passing a parameter
  like *r=1* to the analysis url request (to somehow pass the parameter
  via the url). And I am thinking that by changing IndexSchema.java I
  might be able to pass this parameter through my analyzer chain to
  KStemFilter, in which I would call the initializeDictionary function to
  rebuild the protwords hash from the file if *r=1*, instead of making a
  full core reload request.
 
  Please guide me; I know the question might be stupid, but the thought
  came to my mind and I want to share it and ask for suggestions here. Is
  it possible or not, and how can I achieve it?
 
  I will be thankful for guidance.
 
  With Regards
  Aman Tandon
 









Re: No results for MoreLikeThis

2015-05-26 Thread John Blythe
Good call.

I'd previously attempted to use one of my fields, however, and it didn't
work. I then thought maybe broadening it to list anything could help. I'd
tried using the interestingTerms parameter as well.

Just for the sake of double checking before replying to your message,
though, I changed fl once more to the field I was hoping to find items
related to. I had a typo, though, and it worked. Instead of 'descript2' I
used 'descript' and voila. 'descript' is the indexed field, descript2 is a
copyField that uses a different analyzer (the one I'm actually using for
querying). I guess it only takes non-copy (and maybe non-dynamic?) fields
into account?

Thanks for any more information on that field specific approach/issue!

-- 
*John Blythe*
Product Manager  Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, May 26, 2015 at 4:16 PM, Upayavira u...@odoko.co.uk wrote:

 I doubt mlt.fl=* will work. Provide it with specific field names that
 should be used for the comparison.

 Upayavira

 On Tue, May 26, 2015, at 08:17 PM, John Blythe wrote:
  hi all,
 
  running a query like this, but am getting no results from the mlt
  handler:
  http://localhost:8983/solr/parts/select?q=mfgname2%3A+Acme
 
 Corp+descript2%3A+(SCREW+3.5X50MM)start=0rows=1fl=*%2C+scorewt=jsonindent=truemlt=truemlt.fl=*mlt.mintf=1mlt.mindf=1mlt.minwl=1
 
  been googling around without any luck as of yet. i have the
  requestHandler
  added to solrconfig.xml:
   <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
 
  and confirm it is loaded in the Plugins/Stats area of the solr admin
  interface.
 
  i've tried adding minimum word length, term frequency, etc. per a post or
  two i ran across where people had similar issues resolved by doing so,
  but
  it didn't help any.
 
  i'm not getting any errors, what puzzle piece am i missing in my
  configuration or query building?
 
  thanks!
 
  - john



Re: Problem with numeric math types and the dataimport handler

2015-05-26 Thread Shawn Heisey
On 5/20/2015 12:06 AM, Shalin Shekhar Mangar wrote:
 Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I
 fixed in 4.10. Can you try a newer release?

Looks like that didn't fix it.

I applied the patch on SOLR-6165 to the lucene_solr_4_9_1 tag, built a
new war, and when it was done, restarted Solr with that war.  The
solr-impl version in the dashboard is now


4.9-SNAPSHOT 1680667 - solr - 2015-05-20 14:23:11

After some importing with DIH and a Solr restart, this is the most
recent error in the log:

WARN  - 2015-05-26 14:28:09.289;
org.apache.solr.update.UpdateLog$LogReplayer; REYPLAY_ERR: IOException
reading log org.apache.solr.common.SolrException: ERROR:
[doc=usatphotos084190] Error adding field
'did'='java.math.BigInteger:1214221' msg=For input string:
java.math.BigInteger:1214221

Looks like we'll need a new issue.  I'm not in a position right now to
try a newer Solr version than 4.9.1.

Thanks,
Shawn



RE: NPE when faceting with MLT Query from upgrade to Solr 5.1.0

2015-05-26 Thread Jeroen Steggink
I have added a patch which should fix the problem.
https://issues.apache.org/jira/browse/SOLR-7559

Please review.

Cheers,
Jeroen

-Original Message-
From: Jeroen Steggink [mailto:jeroen.stegg...@contentstrategy.nl] 
Sent: Tuesday, 26 May 2015 21:45
To: solr-user@lucene.apache.org
Subject: RE: NPE when faceting with MLT Query from upgrade to Solr 5.1.0

Hi Tim,

I just ran into the exact same problem.
I see you created a bug in JIRA. I will check what is causing this and try and 
fix it.

https://issues.apache.org/jira/browse/SOLR-7559

Jeroen

-Original Message-
From: Tim H [mailto:th98...@gmail.com] 
Sent: Monday, 18 May 2015 17:28
To: solr-user@lucene.apache.org
Subject: NPE when faceting with MLT Query from upgrade to Solr 5.1.0

Hi everyone,

Recently I upgraded to solr 5.1.0.  When trying to generate facets using the 
more like this handler, I now get a a NullPointerException.  I never got this 
exception while using Solr 4.10.0 Details are below:

Stack Trace:
at
org.apache.solr.request.SimpleFacets.getHeatmapCounts(SimpleFacets.java:1555)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:284)
at
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:233)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


Query:
qt=/mlt
q=id:545dbb57b54c2403f286050e546dcdcab54cf2d074e5a2f7
mlt.mindf=5
mlt.mintf=1
mlt.minwl=3
mlt.boost=true
fq=storeid:546dcdcab54cf2d074e5a2f7
mlt.fl=overview_mlt,abstract_mlt,description_mlt,company_profile_mlt,bio_mlt
mlt.interestingTerms=details
fl=conceptid,score
sort=score desc
start=0
rows=2
facet=true
facet.field=tags
facet.field=locations
facet.mincount=1
facet.method=enum
facet.limit=-1
facet.sort=count

Schema.xml(relevant parts):
   <field name="tags" type="string" indexed="true" stored="true"
          multiValued="true" />

   <field name="locations" type="string" indexed="true" stored="true"
          multiValued="true" />

   <dynamicField name="*_mlt" stored="true" indexed="true"
          type="text_general" termVectors="true" multiValued="true" />


solrconfig.xml(relevant parts):
  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  </requestHandler>


Re: No results for MoreLikeThis

2015-05-26 Thread Upayavira
I doubt mlt.fl=* will work. Provide it with specific field names that
should be used for the comparison.
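
For example, against the MoreLikeThisHandler registered at /mlt (the id
and field names here are only illustrative):

http://localhost:8983/solr/parts/mlt?q=id:12345&mlt.fl=descript,mfgname2&mlt.mintf=1&mlt.mindf=1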

Upayavira

On Tue, May 26, 2015, at 08:17 PM, John Blythe wrote:
 hi all,
 
 running a query like this, but am getting no results from the mlt
 handler:
 http://localhost:8983/solr/parts/select?q=mfgname2%3A+Acme
 Corp+descript2%3A+(SCREW+3.5X50MM)start=0rows=1fl=*%2C+scorewt=jsonindent=truemlt=truemlt.fl=*mlt.mintf=1mlt.mindf=1mlt.minwl=1
 
 been googling around without any luck as of yet. i have the
 requestHandler
 added to solrconfig.xml:
 <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
 
 and confirm it is loaded in the Plugins/Stats area of the solr admin
 interface.
 
 i've tried adding minimum word length, term frequency, etc. per a post or
 two i ran across where people had similar issues resolved by doing so,
 but
 it didn't help any.
 
 i'm not getting any errors, what puzzle piece am i missing in my
 configuration or query building?
 
 thanks!
 
 - john


Re: No results for MoreLikeThis

2015-05-26 Thread John Blythe
I just checked my schema.xml and think that the issue results from the
stored property being set to false on descript2 and true on descript.


-- 
*John Blythe*
Product Manager  Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, May 26, 2015 at 4:22 PM, John Blythe j...@curvolabs.com wrote:

 Good call.

 I'd previously attempted to use one of my fields, however, and it didn't
 work. I then thought maybe broadening it to list anything could help. I'd
 tried using the interestingTerms parameter as well.

 Just for the sake of double checking before replying to your message,
 though, I changed fl once more to the field I was hoping to find items
 related to. I had a typo, though, and it worked. Instead of 'descript2' I
 used 'descript' and voila. 'descript' is the indexed field, descript2 is a
 copyField that uses a different analyzer (the one I'm actually using for
 querying). I guess it only takes non-copy (and maybe non-dynamic?) fields
 into account?

 Thanks for any more information on that field specific approach/issue!

 --
 *John Blythe*
 Product Manager  Lead Developer

 251.605.3071 | j...@curvolabs.com
 www.curvolabs.com

 58 Adams Ave
 Evansville, IN 47713

 On Tue, May 26, 2015 at 4:16 PM, Upayavira u...@odoko.co.uk wrote:

 I doubt mlt.fl=* will work. Provide it with specific field names that
 should be used for the comparison.

 Upayavira

 On Tue, May 26, 2015, at 08:17 PM, John Blythe wrote:
  hi all,
 
  running a query like this, but am getting no results from the mlt
  handler:
  http://localhost:8983/solr/parts/select?q=mfgname2%3A+Acme
 
 Corp+descript2%3A+(SCREW+3.5X50MM)start=0rows=1fl=*%2C+scorewt=jsonindent=truemlt=truemlt.fl=*mlt.mintf=1mlt.mindf=1mlt.minwl=1
 
  been googling around without any luck as of yet. i have the
  requestHandler
  added to solrconfig.xml:
   <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
 
  and confirm it is loaded in the Plugins/Stats area of the solr admin
  interface.
 
  i've tried adding minimum word length, term frequency, etc. per a post
 or
  two i ran across where people had similar issues resolved by doing so,
  but
  it didn't help any.
 
  i'm not getting any errors, what puzzle piece am i missing in my
  configuration or query building?
 
  thanks!
 
  - john





Re: [solr 5.1] Looking for full text + collation search field

2015-05-26 Thread Ahmet Arslan
Hi Bjorn,

Not 100% sure but, ICUFoldingFilter may suit for you. 
It also removes diacritics.

ahmet



On Thursday, May 21, 2015 3:20 PM, Björn Keil greifenschwi...@yahoo.de wrote:
Thanks for the advice. I have tried the field type and it seems to do what it 
is supposed to in combination with a lower case filter.

However, that raises another slight problem:

German umlauts are supposed to be treated slightly differently for the
purpose of searching than for sorting. For sorting, a normal
ICUCollationField with standard rules should suffice*; for the purpose of
searching I cannot just replace an ü with a u: ü is supposed to equal ue
or, in terms of RuleBasedCollators, there is a secondary difference.

The rules for the collator include:

 & ue , ü
 & ae , ä
 & oe , ö
 & ss , ß

(again, that applies to searching *only*; for the sorting, the rule & a , ä
would apply, which is implied in the default rules.)

I can of course program a filter that does these rudimentary replacements
myself, at best after the lower case filter but before the ASCIIFoldingFilter;
I am just wondering if there isn't some way to use collation keys for full
text search.




* even though Latin script and specifically German is my primary concern, I 
want some rudimentary support for all European languages, including ones that 
use Cyrillic and Greek script, special symbols in Icelandic that are not 
strictly Latin and ligatures like Æ, which collation keys could easily 
provide.
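
If I do end up doing those replacements myself, I imagine it would look
roughly like this, with a char filter ahead of the tokenizer (the field
type and mapping file names are invented for illustration):

<fieldType name="text_de_search" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-german.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

where mapping-german.txt contains lines such as:

"ü" => "ue"
"Ü" => "Ue"
"ä" => "ae"
"ö" => "oe"
"ß" => "ss"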






Ahmet Arslan iori...@yahoo.com.INVALID wrote at 22:10 on Wednesday, 20 May 2015:
Hi Bjorn,

solr.ICUCollationField is useful for *sorting*, and you cannot sort on 
tokenized fields.

Your example looks like diacritics insensitive search. 
Please see : ASCIIFoldingFilterFactory

Ahmet



On Wednesday, May 20, 2015 2:53 PM, Björn Keil deeph...@web.de wrote:
Hello,

might anyone suggest a field type with which I may do both a full text
search (i.e. there is an analyzer including a tokenizer) and apply a
collation?

An example for what I want to do:
There is a field composer for which I passed the value Dvořák, Antonín.

I want the following queries to match:
composer:(antonín dvořák)
composer:dvorak
composer:dvorak, antonin

the latter case is possible using a solr.ICUCollationField, but that
type does not support an Analyzer and consequently no tokenizer, thus,
it is not helpful.

Unlike former versions of solr there do not seem to be
CollationKeyFilters which you may hang into the analyzer of a
solr.TextField... so I am a bit at a loss how I get *both* a tokenizer
and a collation at the same time.

Thanks for help,
Björn


Re: Solr relevancy score in percentage

2015-05-26 Thread Shawn Heisey
On 5/26/2015 8:10 AM, Zheng Lin Edwin Yeo wrote:
 We want the user to see how relevant the result is with respect to the
 search query entered, and not how good the results are.
 But I suspect a problem is that the 1st record will always be 100%,
 regardless of what is the score, as the 1st record score will always be
 equals to the maxScore.

If you want to give your users *something* then simply display the score
that you get from Solr.  I recommend that you DON'T give them maxScore,
because they will be tempted to make the percentage calculation
themselves to try and find meaning where there is none.  A clever user
will be able to figure out maxScore for themselves simply by sorting on
relevance and looking at the score on the top doc.

When you get questions about what the number means, and you *WILL* get
those questions, you can tell them that the number itself is meaningless
and what matters is how the scores within a single result compare to
each other -- exactly what you have been told here.

Thanks,
Shawn



Different behavior (bug?) for RegExTransformer in Solr5

2015-05-26 Thread Carrie Coy
I'm experimenting with Solr5 (5.1.0 1672403 - timpotter - 2015-04-09 
10:37:54).  In my custom DIH, I use a RegExTransformer to load several 
columns, which may or may not be present.  If present, the regexp 
matches and the data loads correctly in both Solr4 and 5. If not present 
and the regexp fails, the column is empty in Solr 4.   But in Solr5 it 
contains the original string to be matched.


In other words, in Solr 5.1.0, if the 'replaceWith' value is empty,
'replaceWith' appears to revert to the original string.


Example:

Column 'data' contains:   column1:xxx,column3:yyy

DIH regexp:
<field column="column1" regex="^.*column1:(.*?),.*$"
       replaceWith="$1" sourceColName="data" />
<field column="column2" regex="^.*column2:(.*?),.*$"
       replaceWith="$1" sourceColName="data" />
<field column="column3" regex="^.*column3:(.*?),.*$"
       replaceWith="$1" sourceColName="data" />


solr4:
column1: xxx
column2:
column3: yyy

solr5:
column1:xxx
column2: column1:xxx,column3:yyy
column3: yyy


No results for MoreLikeThis

2015-05-26 Thread John Blythe
hi all,

running a query like this, but am getting no results from the mlt handler:
http://localhost:8983/solr/parts/select?q=mfgname2%3A+Acme
Corp+descript2%3A+(SCREW+3.5X50MM)start=0rows=1fl=*%2C+scorewt=jsonindent=truemlt=truemlt.fl=*mlt.mintf=1mlt.mindf=1mlt.minwl=1

been googling around without any luck as of yet. i have the requestHandler
added to solrconfig.xml:
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />

and confirm it is loaded in the Plugins/Stats area of the solr admin
interface.

i've tried adding minimum word length, term frequency, etc. per a post or
two i ran across where people had similar issues resolved by doing so, but
it didn't help any.

i'm not getting any errors, what puzzle piece am i missing in my
configuration or query building?

thanks!

- john


RE: NPE when faceting with MLT Query from upgrade to Solr 5.1.0

2015-05-26 Thread Jeroen Steggink
Hi Tim,

I just ran into the exact same problem.
I see you created a bug in JIRA. I will check what is causing this and try and 
fix it.

https://issues.apache.org/jira/browse/SOLR-7559

Jeroen

-Original Message-
From: Tim H [mailto:th98...@gmail.com] 
Sent: Monday, 18 May 2015 17:28
To: solr-user@lucene.apache.org
Subject: NPE when faceting with MLT Query from upgrade to Solr 5.1.0

Hi everyone,

Recently I upgraded to solr 5.1.0.  When trying to generate facets using the 
more like this handler, I now get a a NullPointerException.  I never got this 
exception while using Solr 4.10.0 Details are below:

Stack Trace:
at
org.apache.solr.request.SimpleFacets.getHeatmapCounts(SimpleFacets.java:1555)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:284)
at
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:233)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


Query:
qt=/mlt
q=id:545dbb57b54c2403f286050e546dcdcab54cf2d074e5a2f7
mlt.mindf=5
mlt.mintf=1
mlt.minwl=3
mlt.boost=true
fq=storeid:546dcdcab54cf2d074e5a2f7
mlt.fl=overview_mlt,abstract_mlt,description_mlt,company_profile_mlt,bio_mlt
mlt.interestingTerms=details
fl=conceptid,score
sort=score desc
start=0
rows=2
facet=true
facet.field=tags
facet.field=locations
facet.mincount=1
facet.method=enum
facet.limit=-1
facet.sort=count

Schema.xml(relevant parts):
   <field name="tags" type="string" indexed="true" stored="true"
          multiValued="true" />

   <field name="locations" type="string" indexed="true" stored="true"
          multiValued="true" />

   <dynamicField name="*_mlt" stored="true" indexed="true"
          type="text_general" termVectors="true" multiValued="true" />


solrconfig.xml(relevant parts):
  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  </requestHandler>


Re: When is too many fields in qf is too many?

2015-05-26 Thread Steven White
Thanks Doug.  I might have to take you up on the hangout offer.  Let me
refine the requirement further, and if I still see the need, I will let
you know.

Steve

On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull 
dturnb...@opensourceconnections.com wrote:

 How you have tie is fine. Setting tie to 1 might give you reasonable
 results. You could easily still have scores that are just always an order
 of magnitude or two higher, but try it out!

 BTW, anything you put in the URL can also be put into a request handler.

 If you ever just want to have a 15 minute conversation via hangout, happy
 to chat with you :) Might be fun to think through your prob together.

 -Doug

 On Tue, May 26, 2015 at 1:42 PM, Steven White swhite4...@gmail.com
 wrote:

  Hi Doug,
 
  I'm back to this topic.  Unfortunately, due to my DB structure and
  business need, I will not be able to search against a single field
  (i.e. using copyField).  Thus, I have to use a list of fields via qf.
  Given this, I see you said above to use tie=1.0; will that, more or
  less, address this scoring issue?  Should tie=1.0 be set on the request
  handler like so:
 
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <str name="defType">edismax</str>
      <str name="qf">F1 F2 F3 F4 ... ... ...</str>
      <float name="tie">1.0</float>
      <str name="fl">_UNIQUE_FIELD_,score</str>
      <str name="wt">xml</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>
 
  Or must tie be passed as part of the URL?
 
  Thanks
 
  Steve
 
 
  On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull 
  dturnb...@opensourceconnections.com wrote:
 
   Yeah, a copyField into one could be a good space/time tradeoff. It can
   be more manageable to use an all field for both relevancy and
   performance, if you can handle the duplication of data.
  
   You could set tie=1.0, which effectively sums all the matches instead
   of picking the best match. You'll still have cases where one field's
   score might just happen to be far off of another, and thus dominating
   the summation. But it's something easy to try if you want to keep
   playing with dismax.
  
   -Doug
  
   On Wed, May 20, 2015 at 2:56 PM, Steven White swhite4...@gmail.com
   wrote:
  
Hi Doug,
   
    Your blog write-up on relevancy is very interesting; I didn't know
    this. Looks like I have to go back to my drawing board and figure
    out an alternative solution: somehow get that group-based-field data
    into a single field using copyField.
   
Thanks
   
Steve
   
On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull 
dturnb...@opensourceconnections.com wrote:
   
 Steven,

 I'd be concerned about your relevance with that many qf fields.
  Dismax
  takes a "winner takes all" point of view to search. Field scores
 can
   vary
 by an order of magnitude (or even two) despite the attempts of
 query
 normalization. You can read more here


   
  
 
 http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/

  I'm about to win the blasphemer merit badge, but ad-hoc
 all-field
   like
 searching over many fields is actually a good use case for
Elasticsearch's
 cross field queries.


   
  
 
 https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html


   
  
 
 http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/

 It wouldn't be hard (and actually a great feature for the project)
 to
   get
 the Lucene query associated with cross field search into Solr. You
   could
 easily write a plugin to integrate it into a query parser:


   
  
 
 https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java

 Hope that helps
 -Doug
 --
 *Doug Turnbull **| *Search Relevance Consultant | OpenSource
   Connections,
 LLC | 240.476.9983 | http://www.opensourceconnections.com
 Author: Relevant Search http://manning.com/turnbull from Manning
 Publications
 This e-mail and all contents, including attachments, is considered
 to
   be
 Company Confidential unless explicitly stated otherwise, regardless
 of whether attachments are marked as such.
 On Wed, May 20, 2015 at 8:27 AM, Steven White 
 swhite4...@gmail.com
 wrote:

  Hi everyone,
 
  My solution requires that users in group-A can only search
 against
  a
set
 of
  fields-A and users in group-B can only search against a set of
fields-B,
  etc.  There can be several groups, as many as 100 or even more.  To
  meet
 this
  need, I build my search by passing in the list of fields via
 qf.
What
  goes into qf can be large: as many as 1500 fields and each
 field
   name

Re: Solr 5.1 ignores SOLR_JAVA_MEM setting

2015-05-26 Thread Erick Erickson
Probably in the next week or so. The branch has been cut, the release
is being put together/tested/finalized with the usual process.

On Tue, May 26, 2015 at 9:37 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 Thx. When will 5.2 approximately be released?

 -Ursprüngliche Nachricht-
 Von: Timothy Potter [mailto:thelabd...@gmail.com]
 Gesendet: Dienstag, 26. Mai 2015 17:50
 An: solr-user@lucene.apache.org
 Betreff: Re: Solr 5.1 ignores SOLR_JAVA_MEM setting

 Yes, same bug. Fixed in 5.2

 On Tue, May 26, 2015 at 9:15 AM, Clemens Wyss DEV clemens...@mysign.ch 
 wrote:
 I also noticed that (see my post this morning) ...
 SOLR_OPTS=$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true
 ...
 Is not taken into consideration (anymore). Same bug?


 -Ursprüngliche Nachricht-
 Von: Ere Maijala [mailto:ere.maij...@helsinki.fi]
 Gesendet: Mittwoch, 15. April 2015 09:25
 An: solr-user
 Betreff: Solr 5.1 ignores SOLR_JAVA_MEM setting

 Folks, just a quick heads-up that apparently Solr 5.1 introduced a change in 
 bin/solr that overrides SOLR_JAVA_MEM setting from solr.in.sh or 
 environment. I just filed https://issues.apache.org/jira/browse/SOLR-7392. 
 The problem can be circumvented by using SOLR_HEAP setting, e.g. 
 SOLR_HEAP=32G, but it's not mentioned in solr.in.sh by default.

 --Ere

 --
 Ere Maijala
 Kansalliskirjasto / The National Library of Finland


Re: Problem with numeric math types and the dataimport handler

2015-05-26 Thread Shawn Heisey
On 5/26/2015 2:37 PM, Shawn Heisey wrote:
 On 5/20/2015 12:06 AM, Shalin Shekhar Mangar wrote:
 Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I
 fixed in 4.10. Can you try a newer release?
 Looks like that didn't fix it.

 I applied the patch on SOLR-6165 to the lucene_solr_4_9_1 tag, built a
 new war, and when it was done, restarted Solr with that war.  The
 solr-impl version in the dashboard is now

 4.9-SNAPSHOT 1680667 - solr - 2015-05-20 14:23:11

 After some importing with DIH and a Solr restart, this is the most
 recent error in the log:

 WARN  - 2015-05-26 14:28:09.289;
 org.apache.solr.update.UpdateLog$LogReplayer; REYPLAY_ERR: IOException
 reading log org.apache.solr.common.SolrException: ERROR:
 [doc=usatphotos084190] Error adding field
 'did'='java.math.BigInteger:1214221' msg=For input string:
 java.math.BigInteger:1214221

 Looks like we'll need a new issue.  I'm not in a position right now to
 try a newer Solr version than 4.9.1.

Given the way that I use Solr, this is honestly not really a major
problem for me.  Within five minutes or so after DIH is done, my
transaction logs will only contain data indexed via SolrJ, so this
problem will be gone.

The reason I think it's worth fixing, assuming it's still a problem in
5.2: There are people that use DIH *exclusively* for indexing, and for
those people, this could become a real problem, because tlog replay
won't work.

Thanks,
Shawn



Re: SolrCloud 4.8 - Transaction log size over 1GB

2015-05-26 Thread Erick Erickson
Right, autoCommit (in solrconfig.xml) will
1) close the current Lucene segment and open a new one
2) close the tlog and start a new one.

Those actions are independent of whether openSearcher=true or false.
If (and only if) openSearcher=true, then the commits will be
immediately visible to a query.

So then it's up to you to issue either a soft commit (or hard commit
with openSearcher=true) at
some point for the docs to be visible.

bq: Does it mean, let me say, that when openSearcher=false we have implicit
commit done by solrCloud autoCommit not visible to world and explicit
commit done by clients visible to world?

Exactly. Now, this all assumes that you want all your recent indexing
to be visible at once. If you don't care whether documents become
visible while you're indexing but before the whole thing is done,
then:
1) set autoCommit with openSearcher=false to some fairly short
interval, say 1 minute.
2) set autoSoftCommit to some longer interval (say 5 minutes), something
like the snippet below.
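
For example, in the updateHandler section of solrconfig.xml (times are
illustrative, in milliseconds):

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>300000</maxTime>
</autoSoftCommit>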

Now you don't have to do anything at all. Don't commit from the
client. Just wait 5 minutes after the indexing is done before
expecting to see _all_ the docs from your indexing run.

Do note one quirk though. Let's claim you're doing autoCommits with
openSearcher=false. If you restart Solr, then those changes _will_
become visible.

Best,
Erick

On Tue, May 26, 2015 at 9:33 AM, Vincenzo D'Amore v.dam...@gmail.com wrote:
 Thanks Erick for your willingness and patience,

 if I understood well, when autoCommit has openSearcher=true, at the first
 commit (soft or hard) all new documents will automatically be available for search.
 But when openSearcher=false, the commit will flush recent index changes to
 stable storage, but does not cause a new searcher to be opened to make
 those changes visible
 https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-autoCommit
 .

 So, it is not clear what this stable storage is, where it is, and when the new
 documents will be visible.
 Only when my code commits at the very end of the indexing process?

 Does it mean, let me say, that when openSearcher=false we have implicit
 commit done by solrCloud autoCommit not visible to world and explicit
 commit done by clients visible to world?




 On Tue, May 26, 2015 at 2:55 AM, Erick Erickson erickerick...@gmail.com
 wrote:

 The design is that the latest successfully flushed tlog file is kept
 for peer sync in SolrCloud mode. When a replica comes up, there's a
 chance that it's not very many docs behind. So, if possible, some of
 the docs are taken from the leader's tlog and replayed to the follower
 that's just been started. If the follower is too far out of sync, a
 full old-style replication is done. So there will always be a tlog
 file (and occasionally more than one if they're very small) kept
 around, even on successful commit. It doesn't matter if you have
 leaders and replicas or not, that's still the process that's followed.

 Please re-read the link I sent earlier. There's absolutely no reason
 your tlog files have to be so big! Really, set your autoCommit to, say,
 15 seconds and 10 docs and set openSearcher=false in your
 solrconfig.xml file and the tlog file that's kept around will be much
 smaller, and it'll be available for peer sync.

 And if you really don't care about tlogs at all, just take this bit
 out of your solrconfig.xml

 <updateLog>
   <str name="dir">${solr.ulog.dir:}</str>
   <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:256}</int>
 </updateLog>



 Best,
 Erick

 On Mon, May 25, 2015 at 4:40 PM, Vincenzo D'Amore v.dam...@gmail.com
 wrote:
  Hi Erick,
 
  I have tried my indexing code a few times; this is the behaviour I have
  observed:

  When an indexing process starts, even if one or more tlog files exist, a
  new tlog file is created and all the new documents are stored there.
  When the indexing process ends and does a hard commit, the older tlog files
  are removed but the new one (the latest) remains.

  As far as I can see, since my indexing process loads a few million
  documents every time, at the end of the process the latest tlog file
  persists with all these documents in it.
  So I have such big tlog files. Now the question is, why does the latest
  tlog file persist even if the code has done a hard commit?
  When a hard commit is done successfully, why should we keep the latest
  tlog file?
 
 
 
  On Mon, May 25, 2015 at 7:24 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  OK, assuming you're not doing any commits at all until the very end,
  then the tlog contains all the docs for the _entire_ run. The article
  really doesn't care whether the commits come from the solrconfig.xml
  or SolrJ client or curl. The tlog simply is not truncated until a hard
  commit happens, no matter where it comes from.
 
  So here's what I'd do:
  1) set autoCommit in your solrconfig.xml with openSearcher=false for
  every minute. Then the problem will probably go away.
  or
  2) periodically issue a hard 

Re: No results for MoreLikeThis

2015-05-26 Thread Upayavira
If the source document is in your index (i.e. not passed in via
stream.body) then the fields used will either need to be stored or have
term vectors enabled. The latter is more performant.
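
For example, in schema.xml (field name and type are illustrative):

  <field name="descript" type="text_general" indexed="true" stored="true"
         termVectors="true"/>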

Upayavira

On Tue, May 26, 2015, at 09:24 PM, John Blythe wrote:
 Just checked my schema.xml and think that the issue results from the
 stored property being set to false on descript2 and true on descript.
 
 
 -- 
 *John Blythe*
 Product Manager  Lead Developer
 
 251.605.3071 | j...@curvolabs.com
 www.curvolabs.com
 
 58 Adams Ave
 Evansville, IN 47713
 
 On Tue, May 26, 2015 at 4:22 PM, John Blythe j...@curvolabs.com wrote:
 
  Good call.
 
  I'd previously attempted to use one of my fields, however, and it didn't
  work. I then thought maybe broadening it to list anything could help. I'd
  tried using the interestingTerms parameter as well.
 
  Just for the sake of double checking before replying to your message,
  though, I changed fl once more to the field I was hoping to find items
  related to. I had a typo, though, and it worked. Instead of 'descript2' I
  used 'descript' and voila. 'descript' is the indexed field, descript2 is a
  copyField that uses a different analyzer (the one I'm actually using for
  querying). I guess it only takes non-copy (and maybe non-dynamic?) fields
  into account?
 
  Thanks for any more information on that field specific approach/issue!
 
  --
  *John Blythe*
  Product Manager  Lead Developer
 
  251.605.3071 | j...@curvolabs.com
  www.curvolabs.com
 
  58 Adams Ave
  Evansville, IN 47713
 
  On Tue, May 26, 2015 at 4:16 PM, Upayavira u...@odoko.co.uk wrote:
 
  I doubt mlt.fl=* will work. Provide it with specific field names that
  should be used for the comparison.
 
  Upayavira
 
  On Tue, May 26, 2015, at 08:17 PM, John Blythe wrote:
   hi all,
  
   running a query like this, but am getting no results from the mlt
   handler:
   http://localhost:8983/solr/parts/select?q=mfgname2%3A+Acme+Corp+descript2%3A+(SCREW+3.5X50MM)&start=0&rows=1&fl=*%2C+score&wt=json&indent=true&mlt=true&mlt.fl=*&mlt.mintf=1&mlt.mindf=1&mlt.minwl=1
  
   been googling around without any luck as of yet. i have the
   requestHandler
   added to solrconfig.xml:
    <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
  
   and confirm it is loaded in the Plugins/Stats area of the solr admin
   interface.
  
   i've tried adding minimum word length, term frequency, etc. per a post
  or
   two i ran across where people had similar issues resolved by doing so,
   but
   it didn't help any.
  
   i'm not getting any errors, what puzzle piece am i missing in my
   configuration or query building?
  
   thanks!
  
   - john
 
 
 


Re: Solr relevancy score in percentage

2015-05-26 Thread Walter Underwood
On May 26, 2015, at 7:10 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

 We want the user to see how relevant the result is with respect to the
 search query entered, and not how good the results are.

That is the meaning of the score from a probabilistic model search engine. Solr 
is not a probabilistic engine, it is a vector space engine. The scores are 
fundamentally different.  Treating it as a probability of relevance will not 
work.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Sync failure after shard leader election when adding new replica.

2015-05-26 Thread Michael Roberts
Hi,

I have a SolrCloud setup, running 4.10.3. The setup consists of several cores, 
each with a single shard and initially each shard has a single replica (so, 
basically, one machine). I am using core discovery, and my deployment tools 
create an empty core on newly provisioned machines.

The scenario that I am testing is, Machine 1 is running and writes are 
occurring from my application to Solr. At some point, I stop Machine 1, and 
reconfigure my application to add Machine 2. Both machines are then started.

What I would expect to happen at this point, is Machine 2 cannot become leader 
because it is behind compared to Machine 1. Machine 2 would then restore from 
Machine 1.

However, looking at the logs, I am seeing Machine 2 become elected leader and 
fail the PeerSync:

2015-05-24 17:20:25.983 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to 
continue.
2015-05-24 17:20:25.983 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader - 
try and sync
2015-05-24 17:20:25.997 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
org.apache.solr.update.PeerSync - PeerSync: core=project 
url=http://10.32.132.64:11000/solr START 
replicas=[http://jchar-1:11000/solr/project/] nUpdates=100
2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
org.apache.solr.update.PeerSync - PeerSync: core=project 
url=http://10.32.132.64:11000/solr DONE.  We have no versions.  sync failed.
2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we have 
no versions - we can't sync in that case - we were active before, so become 
leader anyway
2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader: 
http://10.32.132.64:11000/solr/project/ shard1

What is the expected behavior here? What’s the best practice for adding a new 
replica? Should I have the SolrCloud running and do it via the Collections API 
or can I continue to use core discovery?

Thanks.




Re: YAJar

2015-05-26 Thread François Schiettecatte
Run whatever tests you want with 14.0.1, replace it with 18.0, rerun the tests 
and compare.

François

 On May 26, 2015, at 10:25 AM, Robust Links pey...@robustlinks.com wrote:
 
 by dumping you mean recompiling solr with guava 18?
 
 On Tue, May 26, 2015 at 10:22 AM, François Schiettecatte 
 fschietteca...@gmail.com wrote:
 
 Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did a
 while ago and it worked fine for me.
 
 François
 
 On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com
 wrote:
 
 i have a minhash logic that uses guava 18.0 method that is not in guava
 14.0.1. This minhash logic is a separate maven project. I'm including it
 in
  my project via maven. The code is being used as a search component on the
 set of results. The logic goes through the search results and deletes
 duplicates. here is the solrconfig.xml
 
  <requestHandler name="/select" class="solr.SearchHandler" default="true">
    <arr name="last-components">
      <str>tvComponent</str>
      <str>terms</str>
      <str>minHashDedup</str>
    </arr>
  </requestHandler>

  <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
    <str name="MAX_COMPARISONS">5</str>
  </searchComponent>
 
 DedupSearchHits class is the one implementing the minhash (hence using
 guava 18). I start solr via the solr.in.sh script. The error I am
 getting
 is:
 
 
 Caused by: java.lang.NoSuchMethodError:
 
 com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
 
 at com.xyz.incrementToken(MinHashTokenFilter.java:54)
 
 at com.xyz.MinHash.calculate(MinHash.java:131)
 
 at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
 
 at
 com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
 
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
 
 at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
 
 at org.apache.solr.core.SolrCore.init(SolrCore.java:813)
 
 
  What is the best design to solve this problem? I understand the point of
 modularity but how can i include logic into solr that does result
 processing without loading that jar into solr?
 
 thank you
 
 
 On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com
 wrote:
 
 I guess this is one reason why the whole WAR approach is being removed!
 Solr should be a black-box that you talk to, and get responses from.
 What
 it depends on and how it is deployed, should be irrelevant to you.
 
 If you are wanting to override the version of guava that Solr uses, then
 you'd have to rebuild Solr (can be done with maven) and manually update
 the
 pom.xml to use guava 18.0, but why would you? You need to test Solr
 completely (in case any guava bugs affect Solr), deal with any build
 issues
 that arise (if guava changes any APIs), and cause yourself a world of
 pain,
 for what gain?
 
 
 On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:
 
 i have custom search components.
 
 On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
 
 Why is your app tied that closely to Solr? I can understand if you are
 talking about SolrJ, but normal usage you use a different application
 in
 a different JVM from Solr.
 
 Upayavira
 
 On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
 I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
 question. i
 noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What
 is
 the pattern to override a jar version uploaded into jetty?
 
 I am using maven, and solr is being started the old way
 
 java -jar start.jar
 -Dsolr.solr.home=...
 -Djetty.home=...
 
 I tried to edit jetty's start.config (then run java
 -DSTART=/my/dir/start.config
 -jar start.jar) but got no where...
 
 any help would be much appreciated
 
 Peyman
 
 
 
 
 



Re: YAJar

2015-05-26 Thread François Schiettecatte
What I am suggesting is that you set up a stand-alone version of solr with 
14.0.1 and run some sort of test suite similar to what you would normally use 
solr for in your app. Then replace the guava jar and re-run the tests. If all 
works well, and I suspect it will because it did for me, then you can use 18.0. 
Simple really.

François

 On May 26, 2015, at 10:30 AM, Robust Links pey...@robustlinks.com wrote:
 
 i can't run 14.0.1. that is the problem. 14 does not have the interfaces i
 need
 
 On Tue, May 26, 2015 at 10:28 AM, François Schiettecatte 
 fschietteca...@gmail.com wrote:
 
 Run whatever tests you want with 14.0.1, replace it with 18.0, rerun the
 tests and compare.
 
 François
 
 On May 26, 2015, at 10:25 AM, Robust Links pey...@robustlinks.com
 wrote:
 
 by dumping you mean recompiling solr with guava 18?
 
 On Tue, May 26, 2015 at 10:22 AM, François Schiettecatte 
 fschietteca...@gmail.com wrote:
 
 Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did a
 while ago and it worked fine for me.
 
 François
 
 On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com
 wrote:
 
 i have a minhash logic that uses guava 18.0 method that is not in guava
 14.0.1. This minhash logic is a separate maven project. I'm including
 it
 in
  my project via maven. The code is being used as a search component on
 the
 set of results. The logic goes through the search results and deletes
 duplicates. here is the solrconfig.xml
 
  <requestHandler name="/select" class="solr.SearchHandler" default="true">
    <arr name="last-components">
      <str>tvComponent</str>
      <str>terms</str>
      <str>minHashDedup</str>
    </arr>
  </requestHandler>

  <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
    <str name="MAX_COMPARISONS">5</str>
  </searchComponent>
 
 DedupSearchHits class is the one implementing the minhash (hence using
 guava 18). I start solr via the solr.in.sh script. The error I am
 getting
 is:
 
 
 Caused by: java.lang.NoSuchMethodError:
 
 
 com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
 
 at com.xyz.incrementToken(MinHashTokenFilter.java:54)
 
 at com.xyz.MinHash.calculate(MinHash.java:131)
 
 at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
 
 at
 com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
 
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
 
 at
 org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
 
 at org.apache.solr.core.SolrCore.init(SolrCore.java:813)
 
 
  What is the best design to solve this problem? I understand the point of
 modularity but how can i include logic into solr that does result
 processing without loading that jar into solr?
 
 thank you
 
 
 On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com
 
 wrote:
 
 I guess this is one reason why the whole WAR approach is being
 removed!
 Solr should be a black-box that you talk to, and get responses from.
 What
 it depends on and how it is deployed, should be irrelevant to you.
 
 If you are wanting to override the version of guava that Solr uses,
 then
 you'd have to rebuild Solr (can be done with maven) and manually
 update
 the
 pom.xml to use guava 18.0, but why would you? You need to test Solr
 completely (in case any guava bugs affect Solr), deal with any build
 issues
 that arise (if guava changes any APIs), and cause yourself a world of
 pain,
 for what gain?
 
 
 On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:
 
 i have custom search components.
 
 On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
 
 Why is your app tied that closely to Solr? I can understand if you
 are
 talking about SolrJ, but normal usage you use a different
 application
 in
 a different JVM from Solr.
 
 Upayavira
 
 On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
 I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
 question. i
 noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0.
 What
 is
 the pattern to override a jar version uploaded into jetty?
 
 I am using maven, and solr is being started the old way
 
 java -jar start.jar
 -Dsolr.solr.home=...
 -Djetty.home=...
 
 I tried to edit jetty's start.config (then run java
 -DSTART=/my/dir/start.config
 -jar start.jar) but got no where...
 
 any help would be much appreciated
 
 Peyman
 
 
 
 
 
 
 



Re: Solr relevancy score in percentage

2015-05-26 Thread Zheng Lin Edwin Yeo
We want the user to see how relevant the result is with respect to the
search query entered, and not how good the results are.
But I suspect a problem is that the 1st record will always be 100%,
regardless of what the score is, as the 1st record's score will always be
equal to the maxScore.

Regards,
Edwin


On 26 May 2015 at 19:36, Daniel Collins danwcoll...@gmail.com wrote:

 The question is more why do you want your users to see the scores?

 If they are wanting to affect ranking, what you want is the ability to run
 the same query with different boosting and see the difference (2 result
 sets), then see if the new ordering is better or worse.  What the
 actual/raw score is, is irrelevant to that; what is important is the ordering.
 If you want to show how good your results are, then as the link shows,
 that is very difficult to measure (and very subjective!)

 On 26 May 2015 at 09:37, Upayavira u...@odoko.co.uk wrote:

  Correct. The relevancy score simply states that we think result #1 is
  more relevant than result #2. It doesn't say that #1 is relevant.
 
  The score doesn't have any validity across queries either, as, for
  example, a different number of query terms will cause the score to
  change.
 
  Upayavira
 
  On Tue, May 26, 2015, at 08:57 AM, Zheng Lin Edwin Yeo wrote:
   Hi Arslan,
  
   Thank you for the link. That means it is not advisable to show anything
   that's related to the relevancy score, even though the default sorting of
   the result is by relevancy score? Showing the raw relevancy score
   does not make any sense to the user, since they won't understand what it
   means either.
  
  
   Regards,
   Edwin
  
  
  
   On 26 May 2015 at 14:16, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:
  
Hi Edwin,
   
Somehow, it is not recommended to display the relevancy score in
percentage:
https://wiki.apache.org/lucene-java/ScoresAsPercentages
   
Ahmet
   
   
   
On Tuesday, May 26, 2015 8:34 AM, Zheng Lin Edwin Yeo 
edwinye...@gmail.com wrote:
Hi,
   
Would like to check, does the new version of Solr allow this function of
displaying the relevancy score as a percentage?
I understand from the older version that it is not able to, and the only
way is to take the highest score and use that as 100%, and calculate other
percentages from that number (for example if the max score is 10 and the
next result has a score of 5, you would do (5 / 10) * 100 = 50%)
   
Is there a better way to do this now? I'm using Solr 5.1
   
   
Regards,
Edwin
   
 



Re: YAJar

2015-05-26 Thread Upayavira
I'm not aware of a way you can do this, other than upgrading the Guava
in Solr itself.

Or rather, you'd need to create your own classloader and load your own
instance of Guava using that rather than the default classloader. That's
possible, but would be rather ugly and complex.
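
The core of that idea, as a standalone sketch (the jar path is illustrative;
wiring this into a Solr plugin cleanly is where it gets ugly):

  import java.lang.reflect.Method;
  import java.net.URL;
  import java.net.URLClassLoader;

  public class IsolatedGuavaDemo {
      public static void main(String[] args) throws Exception {
          // Parent = null: this loader sees only the JDK bootstrap classes
          // plus the given jar, so Solr's guava 14.0.1 is never consulted.
          URL[] jars = { new URL("file:/path/to/guava-18.0.jar") };
          try (URLClassLoader isolated = new URLClassLoader(jars, null)) {
              Class<?> hashing = isolated.loadClass("com.google.common.hash.Hashing");
              Method murmur = hashing.getMethod("murmur3_128");
              Object hashFunction = murmur.invoke(null); // static, so null receiver
              System.out.println(hashFunction);
          }
      }
  }

Every call across that boundary then goes through reflection (or an
interface loaded by a common parent), which is the ugly and complex part.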

I'd say research what's required to upgrade the Guava in Solr.

Upayavira

On Tue, May 26, 2015, at 03:11 PM, Robust Links wrote:
 i have a minhash logic that uses guava 18.0 method that is not in guava
 14.0.1. This minhash logic is a separate maven project. I'm including it
 in
  my project via maven. The code is being used as a search component on the
 set of results. The logic goes through the search results and deletes
 duplicates. here is the solrconfig.xml
 
  <requestHandler name="/select" class="solr.SearchHandler" default="true">
    <arr name="last-components">
      <str>tvComponent</str>
      <str>terms</str>
      <str>minHashDedup</str>
    </arr>
  </requestHandler>

  <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
    <str name="MAX_COMPARISONS">5</str>
  </searchComponent>
 
 DedupSearchHits class is the one implementing the minhash (hence using
 guava 18). I start solr via the solr.in.sh script. The error I am getting
 is:
 
 
 Caused by: java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
 
 at com.xyz.incrementToken(MinHashTokenFilter.java:54)
 
 at com.xyz.MinHash.calculate(MinHash.java:131)
 
 at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
 
 at
 com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
 
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
 
 at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
 
 at org.apache.solr.core.SolrCore.init(SolrCore.java:813)
 
 
  What is the best design to solve this problem? I understand the point of
 modularity but how can i include logic into solr that does result
 processing without loading that jar into solr?
 
 thank you
 
 
 On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com
 wrote:
 
  I guess this is one reason why the whole WAR approach is being removed!
  Solr should be a black-box that you talk to, and get responses from.  What
  it depends on and how it is deployed, should be irrelevant to you.
 
  If you are wanting to override the version of guava that Solr uses, then
  you'd have to rebuild Solr (can be done with maven) and manually update the
  pom.xml to use guava 18.0, but why would you? You need to test Solr
  completely (in case any guava bugs affect Solr), deal with any build issues
  that arise (if guava changes any APIs), and cause yourself a world of pain,
  for what gain?
 
 
  On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:
 
   i have custom search components.
  
   On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
  
Why is your app tied that closely to Solr? I can understand if you are
talking about SolrJ, but normal usage you use a different application
  in
a different JVM from Solr.
   
Upayavira
   
On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
 I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
   question. i
 noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What
   is
 the pattern to override a jar version uploaded into jetty?

 I am using maven, and solr is being started the old way

 java -jar start.jar
 -Dsolr.solr.home=...
 -Djetty.home=...

 I tried to edit jetty's start.config (then run java
 -DSTART=/my/dir/start.config
 -jar start.jar) but got no where...

 any help would be much appreciated

 Peyman
   
  
 


Re: YAJar

2015-05-26 Thread François Schiettecatte
Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did a while ago 
and it worked fine for me.

François

 On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com wrote:
 
 i have a minhash logic that uses guava 18.0 method that is not in guava
 14.0.1. This minhash logic is a separate maven project. I'm including it in
  my project via maven. The code is being used as a search component on the
 set of results. The logic goes through the search results and deletes
 duplicates. here is the solrconfig.xml
 
 <requestHandler name="/select" class="solr.SearchHandler" default="true">
   <arr name="last-components">
     <str>tvComponent</str>
     <str>terms</str>
     <str>minHashDedup</str>
   </arr>
 </requestHandler>

 <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
   <str name="MAX_COMPARISONS">5</str>
 </searchComponent>
 
 DedupSearchHits class is the one implementing the minhash (hence using
 guava 18). I start solr via the solr.in.sh script. The error I am getting
 is:
 
 
 Caused by: java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
 
 at com.xyz.incrementToken(MinHashTokenFilter.java:54)
 
 at com.xyz.MinHash.calculate(MinHash.java:131)
 
 at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
 
 at com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
 
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
 
 at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
 
 at org.apache.solr.core.SolrCore.init(SolrCore.java:813)
 
 
  What is the best design to solve this problem? I understand the point of
 modularity but how can i include logic into solr that does result
 processing without loading that jar into solr?
 
 thank you
 
 
 On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com
 wrote:
 
 I guess this is one reason why the whole WAR approach is being removed!
 Solr should be a black-box that you talk to, and get responses from.  What
 it depends on and how it is deployed, should be irrelevant to you.
 
 If you are wanting to override the version of guava that Solr uses, then
 you'd have to rebuild Solr (can be done with maven) and manually update the
 pom.xml to use guava 18.0, but why would you? You need to test Solr
 completely (in case any guava bugs affect Solr), deal with any build issues
 that arise (if guava changes any APIs), and cause yourself a world of pain,
 for what gain?
 
 
 On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:
 
 i have custom search components.
 
 On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
 
 Why is your app tied that closely to Solr? I can understand if you are
 talking about SolrJ, but normal usage you use a different application
 in
 a different JVM from Solr.
 
 Upayavira
 
 On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
 I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
 question. i
 noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What
 is
 the pattern to override a jar version uploaded into jetty?
 
 I am using maven, and solr is being started the old way
 
 java -jar start.jar
 -Dsolr.solr.home=...
 -Djetty.home=...
 
 I tried to edit jetty's start.config (then run java
 -DSTART=/my/dir/start.config
 -jar start.jar) but got no where...
 
 any help would be much appreciated
 
 Peyman
 
 
 



Re: Removing characters like '\n \n' from indexing

2015-05-26 Thread Zheng Lin Edwin Yeo
It is showing up in the search results. Just to confirm, does this
UpdateProcessor method remove the characters during indexing or only after
indexing has been done?

Regards,
Edwin

On 26 May 2015 at 21:30, Upayavira u...@odoko.co.uk wrote:



 On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
  Hi,
 
  Is there a way to remove special characters like \n during indexing of
  rich text documents?

  I have quite a lot of leading \n \n in front of my indexed content of rich
  text documents, due to the spaces and empty lines in the original
  documents, and it's causing the content to be flooded with '\n \n' at the
  start before the actual content comes in. This causes the content to look
  ugly, and also takes up unnecessary bandwidth in the system.

 Where is this showing up?

 If it is in search results, you must use an UpdateProcessor, as these
 happen before fields are stored (e.g. RegexReplaceProcessorFactory).

 If you are concerned about facet results, then you can do it in an
 analysis chain, for example with a PatternReplaceFilterFactory.
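
 A minimal sketch of such a chain in solrconfig.xml (field name and pattern
 are illustrative; this collapses runs of whitespace, including the leading
 \n \n, to a single space):

   <updateRequestProcessorChain name="strip-whitespace">
     <processor class="solr.RegexReplaceProcessorFactory">
       <str name="fieldName">content</str>
       <str name="pattern">\s+</str>
       <str name="replacement"> </str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory"/>
     <processor class="solr.RunUpdateProcessorFactory"/>
   </updateRequestProcessorChain>

 Reference it from your update handler (or per request) via update.chain.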

 Upayavira



Re: Solr relevancy score in percentage

2015-05-26 Thread Zheng Lin Edwin Yeo
Currently I've taken the score that I get from Solr, divided it by the
maxScore, and multiplied it by 100 to get the percentage. All this is done
in the code for the UI. The user will only see the percentage and will
not know anything about the score. Since the score by itself is
meaningless, I don't think I should display a score like 1.7 or
0.2 on the UI, which could further confuse the user and raise a lot more
questions.
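
(So e.g. a score of 4.2 against a maxScore of 8.4 displays as
(4.2 / 8.4) * 100 = 50%.)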

Regards,
Edwin



On 26 May 2015 at 23:07, Shawn Heisey apa...@elyograg.org wrote:

 On 5/26/2015 8:10 AM, Zheng Lin Edwin Yeo wrote:
  We want the user to see how relevant the result is with respect to the
  search query entered, and not how good the results are.
  But I suspect a problem is that the 1st record will always be 100%,
  regardless of what is the score, as the 1st record score will always be
  equals to the maxScore.

 If you want to give your users *something* then simply display the score
 that you get from Solr.  I recommend that you DON'T give them maxScore,
 because they will be tempted to make the percentage calculation
 themselves to try and find meaning where there is none.  A clever user
 will be able to figure out maxScore for themselves simply by sorting on
 relevance and looking at the score on the top doc.

 When you get questions about what the number means, and you *WILL* get
 those questions, you can tell them that the number itself is meaningless
 and what matters is how the scores within a single result compare to
 each other -- exactly what you have been told here.

 Thanks,
 Shawn




Re: YAJar

2015-05-26 Thread Robust Links
by dumping you mean recompiling solr with guava 18?

On Tue, May 26, 2015 at 10:22 AM, François Schiettecatte 
fschietteca...@gmail.com wrote:

 Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did a
 while ago and it worked fine for me.

 François

  On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com
 wrote:
 
  i have a minhash logic that uses guava 18.0 method that is not in guava
  14.0.1. This minhash logic is a separate maven project. I'm including it
 in
   my project via maven. The code is being used as a search component on the
  set of results. The logic goes through the search results and deletes
  duplicates. here is the solrconfig.xml
 
  <requestHandler name="/select" class="solr.SearchHandler" default="true">
    <arr name="last-components">
      <str>tvComponent</str>
      <str>terms</str>
      <str>minHashDedup</str>
    </arr>
  </requestHandler>

  <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
    <str name="MAX_COMPARISONS">5</str>
  </searchComponent>
 
  DedupSearchHits class is the one implementing the minhash (hence using
  guava 18). I start solr via the solr.in.sh script. The error I am
 getting
  is:
 
 
  Caused by: java.lang.NoSuchMethodError:
 
 com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
 
  at com.xyz.incrementToken(MinHashTokenFilter.java:54)
 
  at com.xyz.MinHash.calculate(MinHash.java:131)
 
  at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
 
  at
 com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
 
  at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
 
  at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
 
  at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
 
  at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
 
  at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
 
  at org.apache.solr.core.SolrCore.init(SolrCore.java:813)
 
 
   What is the best design to solve this problem? I understand the point of
  modularity but how can i include logic into solr that does result
  processing without loading that jar into solr?
 
  thank you
 
 
  On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com
  wrote:
 
  I guess this is one reason why the whole WAR approach is being removed!
  Solr should be a black-box that you talk to, and get responses from.
 What
  it depends on and how it is deployed, should be irrelevant to you.
 
  If you are wanting to override the version of guava that Solr uses, then
  you'd have to rebuild Solr (can be done with maven) and manually update
 the
  pom.xml to use guava 18.0, but why would you? You need to test Solr
  completely (in case any guava bugs affect Solr), deal with any build
 issues
  that arise (if guava changes any APIs), and cause yourself a world of
 pain,
  for what gain?
 
 
  On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:
 
  i have custom search components.
 
  On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
 
  Why is your app tied that closely to Solr? I can understand if you are
  talking about SolrJ, but normal usage you use a different application
  in
  a different JVM from Solr.
 
  Upayavira
 
  On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
  I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
  question. i
  noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What
  is
  the pattern to override a jar version uploaded into jetty?
 
  I am using maven, and solr is being started the old way
 
  java -jar start.jar
  -Dsolr.solr.home=...
  -Djetty.home=...
 
  I tried to edit jetty's start.config (then run java
  -DSTART=/my/dir/start.config
  -jar start.jar) but got no where...
 
  any help would be much appreciated
 
  Peyman
 
 
 




Re: YAJar

2015-05-26 Thread Robust Links
i can't run 14.0.1. that is the problem. 14 does not have the interfaces i
need

On Tue, May 26, 2015 at 10:28 AM, François Schiettecatte 
fschietteca...@gmail.com wrote:

 Run whatever tests you want with 14.0.1, replace it with 18.0, rerun the
 tests and compare.

 François

  On May 26, 2015, at 10:25 AM, Robust Links pey...@robustlinks.com
 wrote:
 
  by dumping you mean recompiling solr with guava 18?
 
  On Tue, May 26, 2015 at 10:22 AM, François Schiettecatte 
  fschietteca...@gmail.com wrote:
 
  Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did a
  while ago and it worked fine for me.
 
  François
 
  On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com
  wrote:
 
  i have a minhash logic that uses guava 18.0 method that is not in guava
  14.0.1. This minhash logic is a separate maven project. I'm including
 it
  in
   my project via maven. The code is being used as a search component on
 the
  set of results. The logic goes through the search results and deletes
  duplicates. here is the solrconfig.xml
 
  <requestHandler name="/select" class="solr.SearchHandler" default="true">
    <arr name="last-components">
      <str>tvComponent</str>
      <str>terms</str>
      <str>minHashDedup</str>
    </arr>
  </requestHandler>

  <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
    <str name="MAX_COMPARISONS">5</str>
  </searchComponent>
 
  DedupSearchHits class is the one implementing the minhash (hence using
  guava 18). I start solr via the solr.in.sh script. The error I am
  getting
  is:
 
 
  Caused by: java.lang.NoSuchMethodError:
 
 
 com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
 
  at com.xyz.incrementToken(MinHashTokenFilter.java:54)
 
  at com.xyz.MinHash.calculate(MinHash.java:131)
 
  at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
 
  at
  com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
 
  at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
 
  at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
 
  at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
 
  at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
 
  at
 org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
 
  at org.apache.solr.core.SolrCore.init(SolrCore.java:813)
 
 
   What is the best design to solve this problem? I understand the point of
  modularity but how can i include logic into solr that does result
  processing without loading that jar into solr?
 
  thank you
 
 
  On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com
 
  wrote:
 
  I guess this is one reason why the whole WAR approach is being
 removed!
  Solr should be a black-box that you talk to, and get responses from.
  What
  it depends on and how it is deployed, should be irrelevant to you.
 
  If you are wanting to override the version of guava that Solr uses,
 then
  you'd have to rebuild Solr (can be done with maven) and manually
 update
  the
  pom.xml to use guava 18.0, but why would you? You need to test Solr
  completely (in case any guava bugs affect Solr), deal with any build
  issues
  that arise (if guava changes any APIs), and cause yourself a world of
  pain,
  for what gain?
 
 
  On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:
 
  i have custom search components.
 
  On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
 
  Why is your app tied that closely to Solr? I can understand if you
 are
  talking about SolrJ, but normal usage you use a different
 application
  in
  a different JVM from Solr.
 
  Upayavira
 
  On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
  I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
  question. i
  noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0.
 What
  is
  the pattern to override a jar version uploaded into jetty?
 
  I am using maven, and solr is being started the old way
 
  java -jar start.jar
  -Dsolr.solr.home=...
  -Djetty.home=...
 
  I tried to edit jetty's start.config (then run java
  -DSTART=/my/dir/start.config
  -jar start.jar) but got no where...
 
  any help would be much appreciated
 
  Peyman
 
 
 
 
 




Re: Index optimize runs in background.

2015-05-26 Thread Shawn Heisey
On 5/26/2015 6:29 AM, Upayavira wrote:
 Are you saying that the reason you are optimising is because you have
 been doing it for years? If this is the only reason, you should stop
 doing it immediately. 

 The one scenario in which optimisation still makes some sense is when
 you reindex every night and optimise straight after. This will leave you
 with a single segment which will search faster.

 However, if you are doing a lot of indexing, especially with
 deletes/updates, you will have merged your content into a single segment
 which will later need to be merged. That merge will be costly as it will
 involve copying the entire content of your large segment, which will
 impact performance.

 Before Solr 3.6, optimisation was necessary and recommended. At that
 point (or a little before) the TieredMergePolicy became the default, and
 this made optimisation generally unnecessary.

In general, I concur with this advice about optimizing.  Historically,
optimize was done for increased performance.  In older versions, an
unoptimized index performed *MUCH* worse than an index with a single
segment.  This is no longer the case today, mostly due to so many Lucene
features working on a per-segment basis.  A single segment does perform
faster, but the difference is much smaller than it used to be.

A full optimize on a large index requires a LOT of CPU and I/O resources
-- while the optimize is underway, performance is not very good.

There are, however, still times when running optimize is appropriate:

1) The index is mostly static, not receiving very frequent updates.
2) There is a large percentage of deleted documents in the index.

With modern Lucene/Solr and these use cases, the reasons for optimizing
are still performance-related, but the only time you should do an
optimize is when the benefit outweighs the cost.

For the 1) use case, the index will likely remain mostly-optimized for a
long period of time after the optimize is done, so the resources
required for the optimize are worth spending.

For the 2) use case, optimizing will reduce the size of the index
significantly, so general performance gets better.  That makes the cost
worthwhile.
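
(For reference, an explicit optimize can be triggered with something like

  curl 'http://localhost:8983/solr/corename/update?optimize=true'

where the host and core name are whatever your setup uses.)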

Thanks,
Shawn



AW: Solr 5.1 ignores SOLR_JAVA_MEM setting

2015-05-26 Thread Clemens Wyss DEV
I also noticed that (see my post this morning)
...
SOLR_OPTS=$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true
...
Is not taken into consideration (anymore). Same bug?


-Ursprüngliche Nachricht-
Von: Ere Maijala [mailto:ere.maij...@helsinki.fi] 
Gesendet: Mittwoch, 15. April 2015 09:25
An: solr-user
Betreff: Solr 5.1 ignores SOLR_JAVA_MEM setting

Folks, just a quick heads-up that apparently Solr 5.1 introduced a change in 
bin/solr that overrides SOLR_JAVA_MEM setting from solr.in.sh or environment. I 
just filed https://issues.apache.org/jira/browse/SOLR-7392. The problem can be 
circumvented by using SOLR_HEAP setting, e.g. SOLR_HEAP=32G, but it's not 
mentioned in solr.in.sh by default.

--Ere

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: YAJar

2015-05-26 Thread Robust Links
i have a minhash logic that uses guava 18.0 method that is not in guava
14.0.1. This minhash logic is a separate maven project. I'm including it in
my project via maven. The code is being used as a search component on the
set of results. The logic goes through the search results and deletes
duplicates. here is the solrconfig.xml

<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <arr name="last-components">
    <str>tvComponent</str>
    <str>terms</str>
    <str>minHashDedup</str>
  </arr>
</requestHandler>

<searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
  <str name="MAX_COMPARISONS">5</str>
</searchComponent>

DedupSearchHits class is the one implementing the minhash (hence using
guava 18). I start solr via the solr.in.sh script. The error I am getting
is:


Caused by: java.lang.NoSuchMethodError:
com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;

at com.xyz.incrementToken(MinHashTokenFilter.java:54)

at com.xyz.MinHash.calculate(MinHash.java:131)

at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)

at com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)

at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)

at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)

at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)

at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)

at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)

at org.apache.solr.core.SolrCore.init(SolrCore.java:813)


What is the best design to solve this problem? I understand the point of
modularity but how can i include logic into solr that does result
processing without loading that jar into solr?

thank you


On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com
wrote:

 I guess this is one reason why the whole WAR approach is being removed!
 Solr should be a black-box that you talk to, and get responses from.  What
 it depends on and how it is deployed, should be irrelevant to you.

 If you are wanting to override the version of guava that Solr uses, then
 you'd have to rebuild Solr (can be done with maven) and manually update the
 pom.xml to use guava 18.0, but why would you? You need to test Solr
 completely (in case any guava bugs affect Solr), deal with any build issues
 that arise (if guava changes any APIs), and cause yourself a world of pain,
 for what gain?


 On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:

  i have custom search components.
 
  On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
 
   Why is your app tied that closely to Solr? I can understand if you are
   talking about SolrJ, but normal usage you use a different application
 in
   a different JVM from Solr.
  
   Upayavira
  
   On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
  question. i
noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What
  is
the pattern to override a jar version uploaded into jetty?
   
I am using maven, and solr is being started the old way
   
java -jar start.jar
-Dsolr.solr.home=...
-Djetty.home=...
   
I tried to edit jetty's start.config (then run java
-DSTART=/my/dir/start.config
-jar start.jar) but got no where...
   
any help would be much appreciated
   
Peyman
  
 



Re: Index optimize runs in background.

2015-05-26 Thread Alessandro Benedetti
I completely agree with Upayavira and Shawn.
Modassar, can you explain to us how often you index?
Have you ever played with the mergeFactor?
I hardly think you need to optimise at all.
Simply tuning the mergeFactor should solve all your issues.
I assume you were optimising only to have fast search, weren't you?

Cheers

2015-05-26 16:07 GMT+01:00 Shawn Heisey apa...@elyograg.org:

 On 5/26/2015 6:29 AM, Upayavira wrote:
  Are you saying that the reason you are optimising is because you have
  been doing it for years? If this is the only reason, you should stop
  doing it immediately.
 
  The one scenario in which optimisation still makes some sense is when
  you reindex every night and optimise straight after. This will leave you
  with a single segment which will search faster.
 
  However, if you are doing a lot of indexing, especially with
  deletes/updates, you will have merged your content into a single segment
  which will later need to be merged. That merge will be costly as it will
  involve copying the entire content of your large segment, which will
  impact performance.
 
   Before Solr 3.6, optimisation was necessary and recommended. At that
  point (or a little before) the TieredMergePolicy became the default, and
  this made optimisation generally unnecessary.

 In general, I concur with this advice about optimizing.  Historically,
 optimize was done for increased performance.  In older versions, an
 unoptimized index performed *MUCH* worse than an index with a single
 segment.  This is no longer the case today, mostly due to so many Lucene
 features working on a per-segment basis.  A single segment does perform
 faster, but the difference is much smaller than it used to be.

 A full optimize on a large index requires a LOT of CPU and I/O resources
 -- while the optimize is underway, performance is not very good.

 There are, however, still times when running optimize is appropriate:

 1) The index is mostly static, not receiving very frequent updates.
 2) There is a large percentage of deleted documents in the index.

 With modern Lucene/Solr and these use cases, the reasons for optimizing
 are still performance-related, but the only time you should do an
 optimize is when the benefit outweighs the cost.

 For the 1) use case, the index will likely remain mostly-optimized for a
 long period of time after the optimize is done, so the resources
 required for the optimize are worth spending.

 For the 2) use case, optimizing will reduce the size of the index
 significantly, so general performance gets better.  That makes the cost
 worthwhile.

 Thanks,
 Shawn




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


solr date functions and object creation

2015-05-26 Thread Jacob Graves
Hello

I have a weird SOLR problem with object creation from a date function query 
against a TrieDate field in my index called ds.

This boost function

   bf=min(div(ms(NOW/HOUR,ds),60480),26)

causes many millions of FunctionQuery objects to be created in memory. When I 
change it to

   bf=min(abs(div(ms(NOW/HOUR,ds),60480)),26)

the extra objects aren't created; the only change is that I added abs().
I've checked that every document has the field ds populated, and the dates it 
contains are all in the past.
Any ideas why? The extra memory usage has caused stability problems.

Thanks


Re: docValues: Can we apply synonym

2015-05-26 Thread Aman Tandon
We are interested in using docValues for better memory utilization and
speed.

Currently we are faceting the search results on *city*. In city we have
also added synonyms for cities like mumbai, bombay (these are Indian
cities), so that results for mumbai are also eligible when somebody
applies a filter of bombay on the search results.
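
(i.e. a synonyms.txt mapping along the lines of

  mumbai, bombay

using the comma-separated equivalent-synonyms format)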

I need this functionality to work with a docValues-enabled field.

With Regards
Aman Tandon

On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 I checked in the Documentation to be sure, but apparently :

 DocValues are only available for specific field types. The types chosen
 determine the underlying Lucene docValue type that will be used. The
 available Solr field types are:

 - StrField and UUIDField.
   - If the field is single-valued (i.e., multi-valued is false), Lucene
     will use the SORTED type.
   - If the field is multi-valued, Lucene will use the SORTED_SET type.
 - Any Trie* numeric fields and EnumField.
   - If the field is single-valued (i.e., multi-valued is false), Lucene
     will use the NUMERIC type.
   - If the field is multi-valued, Lucene will use the SORTED_SET type.


 This means you should not analyse a field where DocValues is enabled.
 Can you explain your use case? Why are you interested in synonyms at the
 DocValues level?
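
 As a side note, a common pattern (a minimal schema sketch; field and type
 names are illustrative, and text_syn is assumed to be an analysed type
 whose chain includes SynonymFilterFactory) is to facet on the unanalysed
 docValues field and to search/filter on an analysed copy of it:

     <field name="city" type="string" indexed="true" stored="true" docValues="true"/>
     <field name="city_syn" type="text_syn" indexed="true" stored="false"/>
     <copyField source="city" dest="city_syn"/>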

 Cheers

 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk:

  To my understanding, docValues are just an uninverted index. That is, they
  contain the terms that are generated at the end of an analysis chain.
  Therefore, you simply enable docValues and include the
  SynonymFilterFactory in your analysis.
 
  Is that enough, or are you struggling with some other issue?
 
  Upayavira
 
  On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
   Hi,
  
    We have some field *city* in which docValues are enabled. We need to
    add synonyms to that field, so how can we do it?
  
   With Regards
   Aman Tandon
 






Re: Solr 5.1 ignores SOLR_JAVA_MEM setting

2015-05-26 Thread Timothy Potter
Yes, same bug. Fixed in 5.2

On Tue, May 26, 2015 at 9:15 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 I also noticed that (see my post this morning)
 ...
 SOLR_OPTS="$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true"
 ...
 is not taken into consideration (anymore). Same bug?


 -----Original Message-----
 From: Ere Maijala [mailto:ere.maij...@helsinki.fi]
 Sent: Wednesday, 15 April 2015 09:25
 To: solr-user
 Subject: Solr 5.1 ignores SOLR_JAVA_MEM setting

 Folks, just a quick heads-up that apparently Solr 5.1 introduced a change in 
 bin/solr that overrides SOLR_JAVA_MEM setting from solr.in.sh or environment. 
 I just filed https://issues.apache.org/jira/browse/SOLR-7392. The problem can 
 be circumvented by using SOLR_HEAP setting, e.g. SOLR_HEAP=32G, but it's 
 not mentioned in solr.in.sh by default.
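
 For example (heap size illustrative), the workaround is a single line in
 solr.in.sh:

     SOLR_HEAP="32g"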

 --Ere

 --
 Ere Maijala
 Kansalliskirjasto / The National Library of Finland


Re: Solr relevancy score in percentage

2015-05-26 Thread Erick Erickson
This is one of those things that is, IMO, strictly a feel-good thing
that's sometimes insisted upon by the product manager, and all the
information in the world about why this is really meaningless falls
on deaf ears.

If you simply have no choice (a position I've been in because it wasn't
worth the argument), you can do the star thing. That is, display 5 stars
for percentages between 80-100, 4 stars for 60-80, etc., and not display
the percentages or raw scores at all.
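
A minimal sketch of that star mapping (hypothetical helper; assumes you
already have the document's score and the result set's maxScore):

    // Map score/maxScore onto a 1-5 star display value:
    // 80-100% -> 5 stars, 60-80% -> 4 stars, and so on down to 1.
    static int stars(double score, double maxScore) {
        double pct = 100.0 * score / maxScore;
        return Math.min(5, (int) (pct / 20) + 1);
    }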

But as others have said, it really isn't providing any additional
information, and IMO it is misleading the user...

Best,
Erick

On Tue, May 26, 2015 at 8:31 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:
 Honestly, the only case where the score as a percentage could make sense is
 for More Like This.
 In that case Solr should provide that feature, as we know for certain that
 a 100% similarity score means a copy of the seed document.

 If I am right, because the MLT implementation does not take care of the
 identity score, we are getting weird scores there as well.
 Maybe that is the only place where I would prefer a percentage.

 Cheers

 2015-05-26 16:23 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 Currently I take the score that I get from Solr, divide it by the
 maxScore, and multiply it by 100 to get the percentage. All of this is done
 in the code for the UI. The user will only see the percentage and will
 not know anything about the score. Since the score by itself is
 meaningless, I don't think I should display a score like 1.7 or
 0.2 on the UI, which could further confuse the user and raise a lot more
 questions.

 Regards,
 Edwin



 On 26 May 2015 at 23:07, Shawn Heisey apa...@elyograg.org wrote:

  On 5/26/2015 8:10 AM, Zheng Lin Edwin Yeo wrote:
   We want the user to see how relevant the result is with respect to the
   search query entered, and not how good the results are.
   But I suspect a problem is that the 1st record will always be 100%,
   regardless of what the score is, as the 1st record's score will always
   be equal to the maxScore.
 
  If you want to give your users *something* then simply display the score
  that you get from Solr.  I recommend that you DON'T give them maxScore,
  because they will be tempted to make the percentage calculation
  themselves to try and find meaning where there is none.  A clever user
  will be able to figure out maxScore for themselves simply by sorting on
  relevance and looking at the score on the top doc.
 
  When you get questions about what the number means, and you *WILL* get
  those questions, you can tell them that the number itself is meaningless
  and what matters is how the scores within a single result set compare to
  each other -- exactly what you have been told here.
 
  Thanks,
  Shawn
 
 






Re: When is too many fields in qf is too many?

2015-05-26 Thread Steven White
Hi Doug,

I'm back to this topic.  Unfortunately, due to my DB structure and business
needs, I will not be able to search against a single field (i.e., using
copyField).  Thus, I have to use a list of fields via qf.  Given this, I
see you said above to use tie=1.0; will that, more or less, address this
scoring issue?  Should tie=1.0 be set on the request handler like so:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <str name="defType">edismax</str>
      <str name="qf">F1 F2 F3 F4 ... ... ...</str>
      <float name="tie">1.0</float>
      <str name="fl">_UNIQUE_FIELD_,score</str>
      <str name="wt">xml</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>

Or must tie be passed as part of the URL?

Thanks

Steve


On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull 
dturnb...@opensourceconnections.com wrote:

 Yeah, a copyField into one field could be a good space/time tradeoff. It can
 be more manageable to use an all field for both relevancy and performance,
 if you can handle the duplication of data.

 You could set tie=1.0, which effectively sums all the matches instead of
 picking the best match. You'll still have cases where one field's score
 might just happen to be far off of another's, and thus dominate the
 summation. But it's something easy to try if you want to keep playing with
 dismax.
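
 For example (host, collection, and field names illustrative), tie can also
 be passed per-request rather than in the handler defaults:

     http://localhost:8983/solr/collection1/select?q=apple&defType=edismax&qf=F1+F2+F3&tie=1.0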

 -Doug

 On Wed, May 20, 2015 at 2:56 PM, Steven White swhite4...@gmail.com
 wrote:

  Hi Doug,
 
  Your blog write-up on relevancy is very interesting; I didn't know this.
  Looks like I have to go back to my drawing board and figure out an
  alternative solution: somehow get the data for those group-based fields
  into a single field using copyField.
 
  Thanks
 
  Steve
 
  On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull 
  dturnb...@opensourceconnections.com wrote:
 
   Steven,
  
    I'd be concerned about your relevance with that many qf fields. Dismax
    takes a winner-takes-all point of view to search. Field scores can vary
    by an order of magnitude (or even two) despite the attempts of query
    normalization. You can read more here:

  http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/
  
    I'm about to win the blasphemer merit badge, but ad-hoc all-field-like
    searching over many fields is actually a good use case for
    Elasticsearch's cross field queries:

  https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html

  http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/
  
    It wouldn't be hard (and actually a great feature for the project) to get
    the Lucene query associated with cross field search into Solr. You could
    easily write a plugin to integrate it into a query parser:

  https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java
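
    A very rough sketch of what such a plugin could look like (hypothetical
    class; assumes the BlendedTermQuery from the Elasticsearch source above
    is on the classpath, its factory signature may differ by version, and
    the per-field analysis step is omitted for brevity):

        import org.apache.lucene.index.Term;
        import org.apache.lucene.queries.BlendedTermQuery;
        import org.apache.lucene.search.Query;
        import org.apache.solr.common.params.SolrParams;
        import org.apache.solr.common.util.NamedList;
        import org.apache.solr.request.SolrQueryRequest;
        import org.apache.solr.search.QParser;
        import org.apache.solr.search.QParserPlugin;
        import org.apache.solr.search.SyntaxError;

        public class BlendedQParserPlugin extends QParserPlugin {
          @Override
          public void init(NamedList args) {}

          @Override
          public QParser createParser(String qstr, SolrParams localParams,
                                      SolrParams params, SolrQueryRequest req) {
            return new QParser(qstr, localParams, params, req) {
              @Override
              public Query parse() throws SyntaxError {
                // One term per qf field; term statistics are blended
                // across fields instead of scored per-field.
                String[] fields = params.get("qf", "").split("\\s+");
                Term[] terms = new Term[fields.length];
                for (int i = 0; i < fields.length; i++) {
                  terms[i] = new Term(fields[i], qstr);
                }
                return BlendedTermQuery.dismaxBlendedQuery(terms, 0.1f);
              }
            };
          }
        }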
  
   Hope that helps
   -Doug
   --
    Doug Turnbull | Search Relevance Consultant | OpenSource Connections,
    LLC | 240.476.9983 | http://www.opensourceconnections.com
    Author: Relevant Search (http://manning.com/turnbull) from Manning
    Publications
    This e-mail and all contents, including attachments, is considered to be
    Company Confidential unless explicitly stated otherwise, regardless of
    whether attachments are marked as such.
   On Wed, May 20, 2015 at 8:27 AM, Steven White swhite4...@gmail.com
   wrote:
  
     Hi everyone,

     My solution requires that users in group-A can only search against a set
     of fields-A and users in group-B can only search against a set of
     fields-B, etc.  There can be several groups, as many as 100 or even
     more.  To meet this need, I build my search by passing in the list of
     fields via qf.  What goes into qf can be large: as many as 1500 fields,
     and each field name averages 15 characters, so the data passed via qf
     will be over 20K characters.

     Given the above, besides the fact that a search for apple translates to
     20K characters passing over the network, what else within Solr and
     Lucene should I be worried about, if anything?  Will I hit some kind of
     a limit?  Will each search now require more CPU cycles?  Memory?  Etc.

     If the network traffic becomes an issue, my alternative solution is to
     create a /select handler for each group and in that handler list the
     fields under qf, as sketched below.
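
     (Handler and field names below are illustrative, assuming the edismax
     parser as in the handler shown earlier:)

         <requestHandler name="/select-groupA" class="solr.SearchHandler">
           <lst name="defaults">
             <str name="defType">edismax</str>
             <str name="qf">F1 F2 F3 ...</str>
           </lst>
         </requestHandler>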
   
     I have considered creating pseudo-fields for each group and then using
     copyField into that group.  During search, I can then use qf against
     that one field.  Unfortunately, this is not ideal for my solution
     because the fields that go into each group change dynamically (at
     least once a month), and when they do change, I have to re-index
     everything (this I have to avoid) to sync that group-field.