Re: Occasionally getting error in solr suggester component.

2015-06-01 Thread Clemens Wyss DEV
Lucene 5.1:
I am (also) facing 
java.lang.IllegalStateException: suggester was not built

At the very moment no new documents seem to be added to the index/core. Will a 
reboot sanitize the index/core?

I (still) have 
<str name="buildOnCommit">true</str>

How can I tell Solr to periodically update the suggestions? If not possible per 
configuration (in solrconfig.xml), what is the preferred approach through SolrJ?

Thx
Clemens

-Original Message-
From: Michael Sokolov [mailto:msoko...@safaribooksonline.com] 
Sent: Thursday, 15 January 2015 19:52
To: solr-user@lucene.apache.org
Subject: Re: Occasionally getting error in solr suggester component.

That sounds like a good approach to me.  Of course it depends how often you 
commit, and what your tolerance is for delay in having suggestions appear, but 
it sounds as if you have a good understanding of the tradeoffs there.

-Mike

On 1/15/15 10:31 AM, Dhanesh Radhakrishnan wrote:
 Hi,
 From Solr 4.7 onwards, the implementation of this suggester has 
 changed. The old SpellChecker-based search component is replaced with 
 a new suggester that utilizes the Lucene suggester module. The latest Solr 
 download is preconfigured with this new suggester. I'm using Solr 4.10, 
 and suggestions are based on the /suggest query instead of /spell.
 So what I did was change to <str name="buildOnCommit">false</str>. 
 It's not good to rebuild the index on every commit; instead, I would 
 like to build the index at a certain time interval, say every 1 hour.
 The lookup data will be built only when requested by the URL parameter 
 suggest.build=true

 http://localhost:8983/solr/ha/suggest?suggest.build=true

 So this will rebuild the index again and the changes will be reflected in 
 the suggester.

 There are certain pros and cons to this.
 The issue is that the changes are reflected only at a certain time 
 interval, here 1 hour. The advantage is that we can avoid rebuilding the 
 index on every commit or optimize.

 Is this the right way, or is there anything I missed?

 Regards
 dhanesh s.r




 On Thu, Jan 15, 2015 at 3:20 AM, Michael Sokolov  
 msoko...@safaribooksonline.com wrote:

 did you build the spellcheck index using spellcheck.build as 
 described
 here: https://cwiki.apache.org/confluence/display/solr/Spell+Checking ?

 -Mike


 On 01/14/2015 07:19 AM, Dhanesh Radhakrishnan wrote:

 Hi,
 Thanks for the reply.
 As you mentioned in the previous mail, I changed buildOnCommit=false 
 in solrConfig.
 After that change, suggestions are not working.
 Solr 4.7 introduced a new approach based on a dedicated 
 SuggestComponent. I'm using that component to build suggestions, and the 
 lookup implementation is AnalyzingInfixLookupFactory.
 Is there any workaround?




 On Wed, Jan 14, 2015 at 12:47 AM, Michael Sokolov  
 msoko...@safaribooksonline.com wrote:

   I think you are probably getting bitten by one of the issues 
 addressed in
 LUCENE-5889

 I would recommend against using buildOnCommit=true - with a large 
 index this can be a performance-killer.  Instead, build the index 
 yourself using the Solr spellchecker support 
 (spellcheck.build=true)

 -Mike


 On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote:

   Hi all,
 I am experiencing a problem in the Solr SuggestComponent. Occasionally the 
 solr suggester component throws an error like

 Solr failed:
 {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was
 not built","trace":"java.lang.IllegalStateException: suggester was not built
 	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:368)
 	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:342)
 	at org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)
 	at org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:199)
 	at org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:234)
 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
 	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
 	at org.apache.catalina.core.StandardContextValve.invoke(
 

Synonyms within FQ

2015-06-01 Thread John Blythe
morning everyone,

i'm attempting to find related documents based on a manufacturer's
competitor. as such i'm querying against the 'description' field with
manufacturer1's product description but running a filter query with
manufacturer2's name against the 'mfgname' field.

one of the ways that we help boost our document finding is with a synonym
dictionary for manufacturer names. many of the larger players have multiple
divisions, have absorbed smaller companies, etc. so we need all of their
potential names to map to our record.

i may be wrong, but from my initial testing it doesn't seem to be applying
to a fq. is there any way of doing this?

thanks-


Re: Synonyms within FQ

2015-06-01 Thread John Blythe
after further investigation it looks like the synonym i was testing against
was only associated with one of their multiple divisions (despite being the
most common name for them!). it looks like this may clear the issue up, but
thanks anyway!

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, Jun 1, 2015 at 8:33 AM, John Blythe j...@curvolabs.com wrote:

 morning everyone,

 i'm attempting to find related documents based on a manufacturer's
 competitor. as such i'm querying against the 'description' field with
 manufacturer1's product description but running a filter query with
 manufacturer2's name against the 'mfgname' field.

 one of the ways that we help boost our document finding is with a synonym
 dictionary for manufacturer names. many of the larger players have multiple
 divisions, have absorbed smaller companies, etc. so we need all of their
 potential names to map to our record.

 i may be wrong, but from my initial testing it doesn't seem to be applying
 to a fq. is there any way of doing this?

 thanks-



UI Admin - and stored=false fields

2015-06-01 Thread Sznajder ForMailingList
Hi

I am indexing some content into the text field.
In schema.xml, the text field is defined as:


   <field name="text" type="text_general" indexed="true" stored="false"
multiValued="true"/>


However, when I am looking to the documents via the UI
http://localhost:8983/solr/#/sec_600b/query

I see the text field content in the returned documents.

Am I making a mistake? Or is this behavior (i.e. non-stored fields being
displayed in the admin UI) expected?

thanks!

Benjamin.


Re: Number of clustering labels to show

2015-06-01 Thread Alessandro Benedetti
Just to clarify the initial mail: carrot.fragSize has nothing to do
with the number of clusters produced.

When you choose to work with a field summary (you will work only on snippets
from the original content, produced by highlighting the query in the
content), fragSize specifies the size of these fragments.

From the Carrot2 documentation:

carrot.produceSummary

When true, the carrot.snippet
https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet field (if
no snippet field, then the carrot.title
https://wiki.apache.org/solr/ClusteringComponent#carrot.title field) will
be highlighted and the highlighted text will be used for clustering.
Highlighting is recommended when the snippet field contains a lot of
content. Highlighting can also increase the quality of clustering because
the clustered content will get an additional query-specific context.
carrot.fragSize

The frag size to use for highlighting. Meaningful only when
carrot.produceSummary
https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary is
true. If not specified, the default highlighting fragsize (hl.fragsize)
will be used. If that isn't specified, then 100.
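To change the number of clusters, then, you pass the clustering algorithm's own parameter at query time rather than carrot.fragSize. A hedged example request against a clustering handler (the handler path, collection name, and value 20 are illustrative assumptions):

```
http://localhost:8983/solr/collection1/clustering?q=solr&rows=100&LingoClusteringAlgorithm.desiredClusterCountBase=20
```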


Cheers

2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 Thank you Stanislaw for the links. Will read them up to better understand
 how the algorithm works.

 Regards,
 Edwin

 On 29 May 2015 at 17:22, Stanislaw Osinski 
 stanislaw.osin...@carrotsearch.com wrote:

  Hi,
 
  The number of clusters primarily depends on the parameters of the
 specific
  clustering algorithm. If you're using the default Lingo algorithm, the
  number of clusters is governed by
  the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a
 look
  at the documentation (
 
 
 https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings
  )
  for some more details (the Tweaking at Query-Time section shows how to
  pass the specific parameters at request time). A complete overview of the
  Lingo clustering algorithm parameters is here:
  http://doc.carrot2.org/#section.component.lingo.
 
  Stanislaw
 
  --
  Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
  http://carrotsearch.com
 
  On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo 
 edwinye...@gmail.com
  
  wrote:
 
   Hi,
  
    I'm trying to increase the number of cluster results shown during
   the
    search. I tried to set carrot.fragSize=20 but only 15 cluster labels are
    shown. Even when I tried to set carrot.fragSize=5, 15 labels are still
  shown.
   
    Is this the correct way to do this? I understand that setting it to 20
    might not necessarily mean 20 labels will be shown, as the setting is a
    maximum. But when I set it to 5, shouldn't it reduce the number
  of
    labels to 5?
  
   I'm using Solr 5.1.
  
  
   Regards,
   Edwin
  
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Deleting Fields

2015-06-01 Thread Charlie Hull

On 30/05/2015 00:30, Shawn Heisey wrote:

On 5/29/2015 5:08 PM, Joseph Obernberger wrote:

Hi All - I have a lot of fields to delete, but noticed that once I
started deleting them, I quickly ran out of heap space.  Is
delete-field a memory intensive operation?  Should I delete one field,
wait a while, then delete the next?


I'm not aware of a way to delete a field.  I may have a different
definition of what a field is than you do, though.

Solr lets you delete entire documents, but deleting a field from the
entire index would involve re-indexing every document in the index,
excluding that field.

Can you be more specific about exactly what you are doing, what you are
seeing, and what you want to see instead?

Also, please be aware of this:

http://people.apache.org/~hossman/#threadhijack

Thanks,
Shawn


Here's a rather old post on how we did something similar:
http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Best strategy for logging security

2015-06-01 Thread Vishal Swaroop
It would be great if you could provide your valuable inputs on a strategy for
logging & security...


Thanks a lot in advance...



Logging :

- Is there a way to implement logging for each core separately?

- What would be the best strategy to log every query's details (like source
IP, search query, etc.)? At some point we will need monthly reports for
analysis.



Securing SOLR :

- We need to implement SOLR security from client as well as server side...
requests will be performed via web app as well as other server side apps
e.g. curl...

Please suggest about the best approach we can follow... link to any
documentation will also help.



Environment : SOLR 4.7 configured on Tomcat 7  (Linux)


Re: UI Admin - and stored=false fields

2015-06-01 Thread Erick Erickson
That's the whole point of having a true/false option for stored.
Stored=true implies that those fields are available for display to
the user in results lists. stored=false and they're not.

Best,
Erick

On Mon, Jun 1, 2015 at 4:34 AM, Sznajder ForMailingList
bs4mailingl...@gmail.com wrote:
 Hi

 I am indexing some content under text field.
 In the schema.xml text field is defined as :


    <field name="text" type="text_general" indexed="true" stored="false"
 multiValued="true"/>


 However, when I am looking to the documents via the UI
 http://localhost:8983/solr/#/sec_600b/query

 I see the text field content in the returned documents.

 Do I make a mistake? Or this behavior (i.e. no-stored fields are displayed
 in admin ui) is expected?

 thanks!

 Benjamin.


Re: Synonyms within FQ

2015-06-01 Thread John Blythe
Thanks Erick!

On Mon, Jun 1, 2015 at 11:29 AM, Erick Erickson erickerick...@gmail.com
wrote:

 For future reference, fq clauses are parsed just like the q clause;
 they can be arbitrarily complex.
 Best,
 Erick
 On Mon, Jun 1, 2015 at 5:52 AM, John Blythe j...@curvolabs.com wrote:
 after further investigation it looks like the synonym i was testing against
 was only associated with one of their multiple divisions (despite being the
 most common name for them!). it looks like this may clear the issue up, but
 thanks anyway!

 --
 *John Blythe*
 Product Manager & Lead Developer

 251.605.3071 | j...@curvolabs.com
 www.curvolabs.com

 58 Adams Ave
 Evansville, IN 47713

 On Mon, Jun 1, 2015 at 8:33 AM, John Blythe j...@curvolabs.com wrote:

 morning everyone,

 i'm attempting to find related documents based on a manufacturer's
 competitor. as such i'm querying against the 'description' field with
 manufacturer1's product description but running a filter query with
 manufacturer2's name against the 'mfgname' field.

 one of the ways that we help boost our document finding is with a synonym
 dictionary for manufacturer names. many of the larger players have multiple
 divisions, have absorbed smaller companies, etc. so we need all of their
 potential names to map to our record.

 i may be wrong, but from my initial testing it doesn't seem to be applying
 to a fq. is there any way of doing this?

 thanks-


Re: Occasionally getting error in solr suggester component.

2015-06-01 Thread Erick Erickson
Attach suggest.build=true or suggest.buildAll=true to any request
to the suggester to rebuild.
OR
add
buildOnStartup, buildOnCommit, or buildOnOptimize to the definition
in solrConfig.
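For reference, those build triggers live on the suggester definition in solrconfig.xml. A hedged sketch follows — the names, field, and dictionary choices echo the AnalyzingInfixLookupFactory setup mentioned earlier in the thread, but every value here is illustrative, not taken from anyone's actual config:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <!-- Build triggers: enable with care on large indexes (see below). -->
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```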

BUT:
building can be a _very_ expensive operation. For document-based
indexes, the build process
reads through _all_ of the _stored_ documents in your index, and that
can take many minutes so
I recommend against these options for a large index, and strongly
recommend you test these
with a large corpus.

Best,
Erick
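Clemens also asked what the preferred approach through SolrJ would be. Below is a minimal client-side sketch of Erick's first option — periodically sending a request with suggest.build=true. It is hedged: it uses plain HTTP via the Python standard library rather than SolrJ, and the base URL, core name, and handler path are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def build_suggest_url(base_url, extra_params=None):
    # Append suggest.build=true to force a rebuild of the suggester's lookup data.
    params = {"suggest.build": "true"}
    if extra_params:
        params.update(extra_params)
    return base_url + "?" + urlencode(params)

def rebuild_suggester(base_url):
    # Fire the request; only the side effect (the rebuild) matters here.
    with urlopen(build_suggest_url(base_url)) as resp:
        return resp.status

# Usage sketch (would run indefinitely, rebuilding once per hour):
#   import sched, time
#   s = sched.scheduler(time.time, time.sleep)
#   def tick():
#       rebuild_suggester("http://localhost:8983/solr/ha/suggest")
#       s.enter(3600, 1, tick)
#   s.enter(0, 1, tick)
#   s.run()
```

The same pattern applies from SolrJ: issue a query to the suggest handler with the suggest.build=true parameter set, on whatever schedule suits your commit frequency.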



On Mon, Jun 1, 2015 at 4:01 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 Lucene 5.1:
 I am (also) facing
 java.lang.IllegalStateException: suggester was not built

 At the very moment no new documents seem to be added to the index/core. Will 
 a reboot sanitize the index/core?

 I (still) have
 <str name="buildOnCommit">true</str>

 How can I tell Solr to periodically update the suggestions? If not possible 
 per configuration (in solrconfig.xml), what is the preferred approach through 
 SolrJ?

 Thx
 Clemens

 -Original Message-
 From: Michael Sokolov [mailto:msoko...@safaribooksonline.com]
 Sent: Thursday, 15 January 2015 19:52
 To: solr-user@lucene.apache.org
 Subject: Re: Occasionally getting error in solr suggester component.

 That sounds like a good approach to me.  Of course it depends how often you 
 commit, and what your tolerance is for delay in having suggestions appear, 
 but it sounds as if you have a good understanding of the tradeoffs there.

 -Mike

 On 1/15/15 10:31 AM, Dhanesh Radhakrishnan wrote:
 Hi,
 From Solr 4.7 onwards, the implementation of this suggester has
 changed. The old SpellChecker-based search component is replaced with
 a new suggester that utilizes the Lucene suggester module. The latest Solr
 download is preconfigured with this new suggester. I'm using Solr 4.10,
 and suggestions are based on the /suggest query instead of /spell.
 So what I did was change to <str name="buildOnCommit">false</str>.
 It's not good to rebuild the index on every commit; instead, I would
 like to build the index at a certain time interval, say every 1 hour.
 The lookup data will be built only when requested by the URL parameter
 suggest.build=true

  http://localhost:8983/solr/ha/suggest?suggest.build=true

  So this will rebuild the index again and the changes will be reflected in
  the suggester.

  There are certain pros and cons to this.
  The issue is that the changes are reflected only at a certain time
  interval, here 1 hour. The advantage is that we can avoid rebuilding the
  index on every commit or optimize.

  Is this the right way, or is there anything I missed?

 Regards
 dhanesh s.r




 On Thu, Jan 15, 2015 at 3:20 AM, Michael Sokolov 
 msoko...@safaribooksonline.com wrote:

 did you build the spellcheck index using spellcheck.build as
 described
 here: https://cwiki.apache.org/confluence/display/solr/Spell+Checking ?

 -Mike


 On 01/14/2015 07:19 AM, Dhanesh Radhakrishnan wrote:

 Hi,
 Thanks for the reply.
  As you mentioned in the previous mail, I changed buildOnCommit=false
  in solrConfig.
  After that change, suggestions are not working.
  Solr 4.7 introduced a new approach based on a dedicated
  SuggestComponent. I'm using that component to build suggestions, and the
  lookup implementation is AnalyzingInfixLookupFactory.
  Is there any workaround?




 On Wed, Jan 14, 2015 at 12:47 AM, Michael Sokolov 
 msoko...@safaribooksonline.com wrote:

   I think you are probably getting bitten by one of the issues
 addressed in
 LUCENE-5889

 I would recommend against using buildOnCommit=true - with a large
 index this can be a performance-killer.  Instead, build the index
 yourself using the Solr spellchecker support
 (spellcheck.build=true)

 -Mike


 On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote:

   Hi all,
  I am experiencing a problem in the Solr SuggestComponent. Occasionally the
  solr suggester component throws an error like

 Solr failed:
  {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was
  not built","trace":"java.lang.IllegalStateException: suggester was not built
  	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:368)
  	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:342)
  	at org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)
  	at org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:199)
  	at org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:234)
  	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
  	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
  	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
  	at org.apache.solr.servlet.SolrDispatchFilter.execute(
 

Re: Sorting in Solr

2015-06-01 Thread Shawn Heisey
On 6/1/2015 9:29 AM, Steven White wrote:
 I need to be able to sort in Solr.  Obviously, I need to do this in a way
 sorting won't cause OOM when a result may contain 1000's of hits if not
 millions.  Can you guide me on how I can do this?  Is there a way to tell
 Solr to sort the top N results (discarding everything else) or must such
 sorting be done on the client side?

Solr supports sorting.

https://wiki.apache.org/solr/CommonQueryParameters#sort

https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

I think we may have an omission from the docs -- docValues can also be
used for sorting, and may also offer a performance advantage.
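Concretely, Solr returns only the top `rows` documents of the fully sorted result set, so the "top N" behavior comes for free and no client-side sorting is needed. A hedged example of the request parameters (the field names here are illustrative, not from this thread):

```
q=*:*&sort=price desc,id asc&rows=10&start=0
```

Fields used for sorting generally need to be single-valued and either indexed or docValues-enabled.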

Thanks,
Shawn



Re: Synonyms within FQ

2015-06-01 Thread Erick Erickson
For future reference, fq clauses are parsed just like the q clause;
they can be arbitrarily complex.

Best,
Erick

On Mon, Jun 1, 2015 at 5:52 AM, John Blythe j...@curvolabs.com wrote:
 after further investigation it looks like the synonym i was testing against
 was only associated with one of their multiple divisions (despite being the
 most common name for them!). it looks like this may clear the issue up, but
 thanks anyway!

 --
 *John Blythe*
 Product Manager & Lead Developer

 251.605.3071 | j...@curvolabs.com
 www.curvolabs.com

 58 Adams Ave
 Evansville, IN 47713

 On Mon, Jun 1, 2015 at 8:33 AM, John Blythe j...@curvolabs.com wrote:

 morning everyone,

 i'm attempting to find related documents based on a manufacturer's
 competitor. as such i'm querying against the 'description' field with
 manufacturer1's product description but running a filter query with
 manufacturer2's name against the 'mfgname' field.

 one of the ways that we help boost our document finding is with a synonym
 dictionary for manufacturer names. many of the larger players have multiple
 divisions, have absorbed smaller companies, etc. so we need all of their
 potential names to map to our record.

 i may be wrong, but from my initial testing it doesn't seem to be applying
 to a fq. is there any way of doing this?

 thanks-



Sorting in Solr

2015-06-01 Thread Steven White
Hi everyone,

I need to be able to sort in Solr.  Obviously, I need to do this in a way
sorting won't cause OOM when a result may contain 1000's of hits if not
millions.  Can you guide me on how I can do this?  Is there a way to tell
Solr to sort the top N results (discarding everything else) or must such
sorting be done on the client side?

Thanks in advance

Steve


Re: UI Admin - and stored=false fields

2015-06-01 Thread Erik Hatcher
Did you happen to change the field type definition without reindexing?  (it 
requires reindexing to “unstore” them if they were originally stored)

If you’re seeing a field value in a document result (not facets, those are 
driven by indexed terms) when stored=“false” then something is wrong and I’d 
guess it’s because of the field definition changing as mentioned.


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com http://www.lucidworks.com/




 On Jun 1, 2015, at 11:27 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 That's the whole point of having a true/false option for stored.
 Stored=true implies that those fields are available for display to
 the user in results lists. stored=false and they're not.
 
 Best,
 Erick
 
 On Mon, Jun 1, 2015 at 4:34 AM, Sznajder ForMailingList
 bs4mailingl...@gmail.com wrote:
 Hi
 
 I am indexing some content under text field.
 In the schema.xml text field is defined as :
 
 
    <field name="text" type="text_general" indexed="true" stored="false"
  multiValued="true"/>
 
 
 However, when I am looking to the documents via the UI
 http://localhost:8983/solr/#/sec_600b/query
 
 I see the text field content in the returned documents.
 
 Do I make a mistake? Or this behavior (i.e. no-stored fields are displayed
 in admin ui) is expected?
 
 thanks!
 
 Benjamin.



Re: UI Admin - and stored=false fields

2015-06-01 Thread Erick Erickson
Reset, pay attention to Erik. I didn't read it all the way through.

Erick

On Mon, Jun 1, 2015 at 8:27 AM, Erick Erickson erickerick...@gmail.com wrote:
 That's the whole point of having a true/false option for stored.
 Stored=true implies that those fields are available for display to
 the user in results lists. stored=false and they're not.

 Best,
 Erick

 On Mon, Jun 1, 2015 at 4:34 AM, Sznajder ForMailingList
 bs4mailingl...@gmail.com wrote:
 Hi

 I am indexing some content under text field.
 In the schema.xml text field is defined as :


    <field name="text" type="text_general" indexed="true" stored="false"
  multiValued="true"/>


 However, when I am looking to the documents via the UI
 http://localhost:8983/solr/#/sec_600b/query

 I see the text field content in the returned documents.

 Do I make a mistake? Or this behavior (i.e. no-stored fields are displayed
 in admin ui) is expected?

 thanks!

 Benjamin.


Re: Sorting in Solr

2015-06-01 Thread Erick Erickson
Steve:

Surprisingly, the number of hits is completely irrelevant to the
memory requirements for sorting. The base memory size is, AFAIK, an
array of maxDoc ints (you can find maxDoc on the admin screen).
There's some additional overhead, but that's the base size. If you use
DocValues, much of the overhead is kept in the MMapDirectory space,
IIRC.
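That point can be made concrete with a little arithmetic: the base cost scales with maxDoc, not with the number of hits. A hedged back-of-the-envelope sketch — the 420-million-doc figure is borrowed from the "Deleting Fields" thread purely for illustration, and real per-field overhead will be higher:

```python
def base_sort_memory_bytes(max_doc, bytes_per_entry=4):
    """Rough base memory for sorting on one field: an array of maxDoc
    ints (4 bytes each), ignoring per-field and JVM overhead."""
    return max_doc * bytes_per_entry

# The array is sized by maxDoc, so 10 hits cost the same as 10 million hits.
needed = base_sort_memory_bytes(420_000_000)
print(needed)  # 1680000000, i.e. roughly 1.56 GiB
```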

Best,
Erick


On Mon, Jun 1, 2015 at 8:41 AM, Shawn Heisey apa...@elyograg.org wrote:
 On 6/1/2015 9:29 AM, Steven White wrote:
  I need to be able to sort in Solr.  Obviously, I need to do this in a way
  sorting won't cause OOM when a result may contain 1000's of hits if not
  millions.  Can you guide me on how I can do this?  Is there a way to tell
  Solr to sort the top N results (discarding everything else) or must such
  sorting be done on the client side?

 Solr supports sorting.

 https://wiki.apache.org/solr/CommonQueryParameters#sort

 https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

 I think we may have an omission from the docs -- docValues can also be
 used for sorting, and may also offer a performance advantage.

 Thanks,
 Shawn



Re: fq and defType

2015-06-01 Thread Shawn Heisey
On 6/1/2015 10:44 AM, david.dav...@correo.aeat.es wrote:
 I need to parse some complicated queries that only work properly with the 
 edismax query parser, in the q and fq parameters. I am testing with 
 defType=edismax, but it seems that this clause only affects the q 
 parameter. Is there any way to apply edismax to the fq parameter?

fq={!edismax}querystring

The other edismax parameters on your request (qf, etc) apply to those
filter queries just like they would for the q parameter.
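A hedged illustration of that local-params syntax (the field name and terms are invented for the example); parameters such as qf can also be overridden per-filter inside the braces:

```
q=portable speaker&fq={!edismax qf=mfgname}acme
```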

Thanks,
Shawn



fq and defType

2015-06-01 Thread david . davila
Hello,

I need to parse some complicated queries that only work properly with the 
edismax query parser, in the q and fq parameters. I am testing with 
defType=edismax, but it seems that this clause only affects the q 
parameter. Is there any way to apply edismax to the fq parameter?

Thank you very much, 


David Dávila Atienza
DIT
Telephone: 915828763
Extension: 36763

Re: Deleting Fields

2015-06-01 Thread Joseph Obernberger

Hi - we are using 64bit OS and 64bit JVM.  The JVM settings are currently:
-
-DSTOP.KEY=solrrocks
-DSTOP.PORT=8100
-Dhost=helios
-Djava.net.preferIPv4Stack=true
-Djetty.port=9100
-DnumShards=27
-Dsolr.clustering.enabled=true
-Dsolr.install.dir=/opt/solr
-Dsolr.lock.type=hdfs
-Dsolr.solr.home=/opt/solr/server/solr
-Duser.timezone=UTC
-DzkClientTimeout=15000
-DzkHost=eris.querymasters.com:2181,daphnis.querymasters.com:2181,triton.querymasters.com:2181,oberon.querymasters.com:2181,portia.querymasters.com:2181,puck.querymasters.com:2181/solr5
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseLargePages
-XX:+UseParNewGC
-XX:CMSFullGCsBeforeCompaction=1
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:CMSTriggerPermRatio=80
-XX:ConcGCThreads=8
-XX:MaxDirectMemorySize=26g
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 9100 /opt/solr/server/logs
-XX:ParallelGCThreads=8
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-Xloggc:/opt/solr/server/logs/solr_gc.log
-Xms8g
-Xmx16g
-Xss256k
-verbose:gc
-

At the time of the OOM error, Xmx was set to 10g.  OS limits, for 
the most part, are 'factory' Scientific Linux 6.6.  I didn't see any 
messages in the log about too many open files.  Thank you for the tips!


-Joe

On 5/31/2015 4:24 AM, Tomasz Borek wrote:

Joseph,

You are doing a memory intensive operation and perhaps an IO intensive
operation at once. That makes your C-heap run out of memory or hit a thread
limit (thus first problem, java.lang.OutOfMemoryError: unable to create new
native thread) and later you're also hitting the problem of Java heap being
full or - more precisely - GC being unable to free enough space there even
with collection, to allocate new object that you want allocated (thus
second throw: java.lang.OutOfMemoryError: Java heap space).

Important is:
- whether your OS is 32-bit or 64-bit
- whether your JVM is 32-bit or 64-bit
- what are your OS limits on thread creation and have you touched them or
changed them (1st problem)
- how do you start JVM (Xms, Xmx, Xss, dumps, direct memory, permgen size -
both problems)

What you can do to solve your problems differs depending on what exactly
causes them, but in general:

NATIVE:
1) Either your operation causes many threads to spawn and you hit your
thread limit (OS limits how many threads process can create) - have a
threaddump and see
2) Or your op causes many thread creation and the memory settings you start
JVM with plus 32/64 bits of OS and JVM make it impossible for the C-heap to
have this much memory thus you're hitting the OOM error - adjust settings,
move to 64-bit architectures, add RAM while on 64-bit (32-bit really chokes
you down, less than 4GB is available for you for EVERYTHING: Java heap,
PermGen space AND C-Heap)

Usually, with such thread-greedy operation, it's also nice to look at the
code and see if perhaps one can optimize thread creation/management.

OOM on Java heap:
Add crash on memory dump parameter to your JVM and walk dominator tree or
see histogram to actually tell what's eating your heap space. MAT is a good
tool for this, unless your heap is like 150GB, then Netbeans may help or
see Alexey Ragozin's work, I think he forked the code for NetBeans heap
analyzer and made some adjustments specially for such cases. Light Google
search and here it is: http://blog.ragozin.info/


pozdrawiam,
LAFK

2015-05-30 20:48 GMT+02:00 Erick Erickson erickerick...@gmail.com:


Faceting on very high cardinality fields can use up memory, no doubt
about that. I think the entire delete question was a red herring, but
you know that already ;)

So I think you can forget about the delete stuff. Although do note
that if you do re-index your old documents, the new version won't have
the field, and as segments are merged the deleted documents will have
all their resources reclaimed, effectively deleting the field from the
old docs So you could gradually re-index your corpus and get this
stuff out of there.

Best,
Erick

On Sat, May 30, 2015 at 5:18 AM, Joseph Obernberger
j...@lovehorsepower.com wrote:

Thank you Erick.  I was thinking that it actually went through and removed
the index data; thank you for the clarification.  What happened was I had
some bad data that created a lot of fields (some 8000).  I was getting some
errors adding new fields where Solr could not talk to ZooKeeper, and I
thought it may be because there are so many fields.  The index size is some
420 million docs.
I'm hesitant to try to re-create it, as when the shards crash, they leave a
write.lock file in HDFS, and I need to manually 

Re: fq and defType

2015-06-01 Thread david . davila
Thank you!

David



De: Shawn Heisey apa...@elyograg.org
Para:   solr-user@lucene.apache.org, 
Fecha:  01/06/2015 18:53
Asunto: Re: fq and defType



On 6/1/2015 10:44 AM, david.dav...@correo.aeat.es wrote:
 I need to parse some complicated queries that only work properly with 
the 
 edismax query parser, in the q and fq parameters. I am testing with 
 defType=edismax, but it seems this only affects the q 
 parameter. Is there any way to use edismax for the fq parameter?

fq={!edismax}querystring

The other edismax parameters on your request (qf, etc) apply to those
filter queries just like they would for the q parameter.

Thanks,
Shawn
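To make the {!edismax} local-params syntax concrete, here is a small sketch
(the core name and field names are hypothetical) of how such a request's
parameters would be URL-encoded:

```python
from urllib.parse import urlencode

# Hypothetical core/fields; the point is the {!edismax} prefix on fq,
# which switches that filter query to the edismax parser.
params = {
    "q": "laptop bag",
    "defType": "edismax",          # applies edismax to q
    "qf": "title^2 description",
    "fq": "{!edismax}category:(bags OR cases)",  # applies edismax to this fq
}
query_string = urlencode(params)
print("/solr/mycore/select?" + query_string)
```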




Chef recipes for Solr

2015-06-01 Thread Walter Underwood
Anyone have Chef recipes they like for deploying Solr? 

I’d especially appreciate one for uploading the configs directly to a Zookeeper 
ensemble.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Best strategy for logging & security

2015-06-01 Thread Rajesh Hazari
Logging :

Just use logstash to parse your logs for all collections, with the logstash
forwarder (lumberjack) on the Solr replicas in your SolrCloud sending the
log events to your central logstash server, which then sends them back to
Solr (either the same or a different instance) into a separate collection.

The default log4j.properties that ships with the Solr distribution can log
the core name with each query log.

Security:
suggest you to go through this wiki
https://wiki.apache.org/solr/SolrSecurity

*Thanks,*
*Rajesh,*
*(mobile) : 8328789519.*

On Mon, Jun 1, 2015 at 11:20 AM, Vishal Swaroop vishal@gmail.com
wrote:

 It will be great if you can provide your valuable input on a strategy for
 logging & security...


 Thanks a lot in advance...



 Logging :

 - Is there a way to implement logging for each core separately?

 - What will be the best strategy for logging every query's details (like
 source IP, search query, etc.)? At some point we will need monthly reports
 for analysis.



 Securing SOLR :

 - We need to implement SOLR security from client as well as server side...
 requests will be performed via web app as well as other server side apps
 e.g. curl...

 Please suggest about the best approach we can follow... link to any
 documentation will also help.



 Environment : SOLR 4.7 configured on Tomcat 7  (Linux)



Multiple Word Synonyms with Autophrasing

2015-06-01 Thread Chris Morley
Hello everyone @ solr-user,
  
 At Wayfair, I have implemented multiple-word synonyms in a clean and 
efficient way, in conjunction with a slightly modified version of 
LucidWorks' Autophrasing plugin, by also tacking on a modified version of 
edismax.  It is not released or in use on our public website yet, but it 
will be very soon.  While it is not ready to officially open source yet, I 
know some people out there are anxious to implement this type of thing.  
Please feel free to contact me if you are interested in learning how 
to accomplish this, in theory, on your own.  Note that while this may 
have some concepts in common with Named Entity Recognition implementations, 
I think it really is a completely different thing.  I get a lot of spam, so 
if you please, would you write me privately with your questions, with the 
subject line MWSwA, so I can easily compile everyone's questions about this. 
 I will respond to everyone at some point soon with some beta documentation 
or possibly with an invitation to a private GitHub repo or something so that 
you can review an example.
  
 Thanks!
 -Chris.
  



Re: Best strategy for logging & security

2015-06-01 Thread Vishal Swaroop
Thanks Rajesh... just trying to figure out if *logstash* is open source and
free?

On Mon, Jun 1, 2015 at 2:13 PM, Rajesh Hazari rajeshhaz...@gmail.com
wrote:

 Logging :

 Just use logstash to a parse your logs for all collection and  logstash
 forwarder and lumberjack at your solr replicas in your solr cloud to send
 the log events to you central logstash server and send it to back to solr
 (either the same or different instance) to a different collection.

 The default log4j.properties that comes with solr dist can log core name
 with each query log.

 Security:
 suggest you to go through this wiki
 https://wiki.apache.org/solr/SolrSecurity

 *Thanks,*
 *Rajesh,*
 *(mobile) : 8328789519.*

 On Mon, Jun 1, 2015 at 11:20 AM, Vishal Swaroop vishal@gmail.com
 wrote:

  It will be great if you can provide your valuable inputs on strategy for
  logging  security...
 
 
  Thanks a lot in advance...
 
 
 
  Logging :
 
  - Is there a way to implement logging for each cores separately.
 
  - What will be the best strategy to log every query details (like source
  IP, search query, etc.) at some point we will need monthly reports for
  analysis.
 
 
 
  Securing SOLR :
 
  - We need to implement SOLR security from client as well as server
 side...
  requests will be performed via web app as well as other server side apps
  e.g. curl...
 
  Please suggest about the best approach we can follow... link to any
  documentation will also help.
 
 
 
  Environment : SOLR 4.7 configured on Tomcat 7  (Linux)
 



Re: fq and defType

2015-06-01 Thread Mikhail Khludnev
fq={!edismax}you are welcome


On Mon, Jun 1, 2015 at 6:44 PM, david.dav...@correo.aeat.es wrote:

 Hello,

 I need to parse some complicated queries that only work properly with the
 edismax query parser, in the q and fq parameters. I am testing with
 defType=edismax, but it seems this only affects the q
 parameter. Is there any way to use edismax for the fq parameter?

 Thank you very much,


 David Dávila Atienza
 DIT
 Teléfono: 915828763
 Extensión: 36763




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: solr uima and opennlp

2015-06-01 Thread Tommaso Teofili
yeah, I think you'd rather post it to d...@uima.apache.org .

Regards,
Tommaso

2015-05-28 15:19 GMT+02:00 hossmaa andreea.hossm...@gmail.com:

 Hi Tommaso

 Thanks for the quick reply! I have another question about using the
 Dictionary Annotator, but I guess it's better to post it separately.

 Cheers
 Andreea



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873p4208348.html
 Sent from the Solr - User mailing list archive at Nabble.com.



SolrCloud 5.1 startup looking for standalone config

2015-06-01 Thread tuxedomoon
I followed these steps and I am unable to launch in cloud mode.

1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3

2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2

3. uploaded a configset to zk1  (solr home is /volume/solr/data)
---
/opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost 
zk1:2181  -confname mycollection_cloud_conf -solrhome /volume/solr/data
-confdir  /home/ec2-user/mycollection/conf


4. on s1, added these params to solr.in.sh
---
ZK_HOST=zk1:2181,zk2:2181,zk3:2181
SOLR_HOST=s1
ZK_CLIENT_TIMEOUT=15000
SOLR_OPTS=$SOLR_OPTS -DnumShards=2


5. on s1 created core directory and file

/volume/solr/data/mycollection/core.properties (name=mycollection)


6. repeated steps 4,5 for s2 minus the numShards param


Starting the service on s1 gives me

mycollection:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Could not load conf for core mycollection: Error loading solr config from
/volume/solr/data/mycollection/conf/solrconfig.xml 

but aren't the config files supposed to be in Zookeeper?  

Tux


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-5-1-startup-looking-for-standalone-config-tp4209118.html
Sent from the Solr - User mailing list archive at Nabble.com.


Derive suggestions across multiple fields

2015-06-01 Thread Zheng Lin Edwin Yeo
Hi,

Does anyone know if we can derive suggestions across multiple fields?

I tried to set something like this in the field parameter of my suggest
searchComponent in solrconfig.xml, but nothing is returned. It only works
when I set a single field, not multiple fields.

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
      <!-- the indexed field to derive suggestions from -->
      <str name="field">Content, Summary</str>
      <float name="threshold">0.005</float>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

I'm using solr 5.1.

Regards,
Edwin
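One commonly suggested way around this limitation (a sketch only - the
suggester's field parameter takes a single field, and the field/type names
below are assumptions) is to copy both source fields into one combined
field in schema.xml and point the suggester at that:

```xml
<!-- Sketch: combined field for suggestions; names are assumptions -->
<field name="suggestText" type="text_general" indexed="true"
       stored="false" multiValued="true"/>
<copyField source="Content" dest="suggestText"/>
<copyField source="Summary" dest="suggestText"/>
```

The suggester's field parameter would then be set to suggestText, and the
combined field rebuilt by re-indexing.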


Re: Chef recipes for Solr

2015-06-01 Thread Walter Underwood
That sounds great. Someone else here will be making the recipes, so I’ll put 
him in touch with you.

As always, this is a really helpful list.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Jun 1, 2015, at 10:20 PM, Upayavira u...@odoko.co.uk wrote:

 
 I have many. My SolrCloud code has the app push configs to zookeeper.
 
 I am afk at the mo. Feel free to bug me about it!
 
 Upayavira
 
 On Mon, Jun 1, 2015, at 07:29 PM, Walter Underwood wrote:
 Anyone have Chef recipes they like for deploying Solr? 
 
 I’d especially appreciate one for uploading the configs directly to a
 Zookeeper ensemble.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
 



Re: SolrCloud 5.1 startup looking for standalone config

2015-06-01 Thread Erick Erickson
bq: but aren't the config files supposed to be in Zookeeper

Yes, but you haven't done anything to tell Solr that the nodes you've
created are part of SolrCloud!

You're confusing, I think, core discovery with creating collections.
Basically you were pretty much OK up until step 5 (although I'm not at
all sure that SOLR_HOST is doing you any good, and certainly setting
numShards in SOLR_OPTS isn't a good idea - what happens if you want to
create a collection with 5 shards?)

You don't need to create any directories on your Solr nodes, that'll
be done for you automatically by the collection creation command from
the Collections API. So I'd down the nodes and nuke the directories
you created by hand and bring the nodes back up. It's probably not
necessary to take the nodes down, but I tend to be paranoid about
that.

Then just create the collection via the Collections API CREATE
command, see: 
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1

You can use curl or a browser to issue something like this to any
active Solr node, Solr will do the rest:
http://some_solr_node:port/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=my_collection_cloud_conf&...

I believe it's _possible_ to carefully construct the core.properties
files on all the Solr instances, but unless you know _exactly_ what's
going on under the covers it'll lead to endless tail-chasing. You can
control which nodes the collection ends up on with the createNodeSet
parameter etc

Best,
Erick

On Mon, Jun 1, 2015 at 4:37 PM, tuxedomoon dancolem...@yahoo.com wrote:
 I followed these steps and I am unable to launch in cloud mode.

 1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3

 2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2

 3. uploaded a configset to zk1  (solr home is /volume/solr/data)
 ---
 /opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost
 zk1:2181  -confname mycollection_cloud_conf -solrhome /volume/solr/data
 -confdir  /home/ec2-user/mycollection/conf


 4. on s1, added these params to solr.in.sh
 ---
 ZK_HOST=zk1:2181,zk2:2181,zk3:2181
 SOLR_HOST=s1
 ZK_CLIENT_TIMEOUT=15000
 SOLR_OPTS=$SOLR_OPTS -DnumShards=2


 5. on s1 created core directory and file
 
 /volume/solr/data/mycollection/core.properties (name=mycollection)


 6. repeated steps 4,5 for s2 minus the numShards param


 Starting the service on s1 gives me

 mycollection:
 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
 Could not load conf for core mycollection: Error loading solr config from
 /volume/solr/data/mycollection/conf/solrconfig.xml

 but aren't the config files supposed to be in Zookeeper?

 Tux











 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-5-1-startup-looking-for-standalone-config-tp4209118.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Number of clustering labels to show

2015-06-01 Thread Zheng Lin Edwin Yeo
Thank you so much Alessandro.

But I do not find any difference in the quality of the clustering results
when I change hl.fragsize, even though I've set carrot.produceSummary to
true.


Regards,
Edwin


On 1 June 2015 at 17:31, Alessandro Benedetti benedetti.ale...@gmail.com
wrote:

 Just to clarify the initial mail: carrot.fragSize has nothing to do
 with the number of clusters produced.

 When you choose to work with a field summary (you will work only on
 snippets of the original content - snippets produced by highlighting the
 query in the content), fragSize specifies the size of those fragments.

 From Carrot documentation :

 carrot.produceSummary

 When true, the carrot.snippet field
 (https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet)
 (if no snippet field, then the carrot.title field,
 https://wiki.apache.org/solr/ClusteringComponent#carrot.title) will
 be highlighted and the highlighted text will be used for clustering.
 Highlighting is recommended when the snippet field contains a lot of
 content. Highlighting can also increase the quality of clustering because
 the clustered content will get an additional query-specific context.

 carrot.fragSize

 The frag size to use for highlighting. Meaningful only when
 carrot.produceSummary is true
 (https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary).
 If not specified, the default highlighting fragsize (hl.fragsize)
 will be used. If that isn't specified, then 100.


 Cheers

 2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

  Thank you Stanislaw for the links. Will read them up to better understand
  how the algorithm works.
 
  Regards,
  Edwin
 
  On 29 May 2015 at 17:22, Stanislaw Osinski 
  stanislaw.osin...@carrotsearch.com wrote:
 
   Hi,
  
   The number of clusters primarily depends on the parameters of the
  specific
   clustering algorithm. If you're using the default Lingo algorithm, the
   number of clusters is governed by
   the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a
  look
   at the documentation (
  
  
 
 https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings
   )
   for some more details (the Tweaking at Query-Time section shows how
 to
   pass the specific parameters at request time). A complete overview of
 the
   Lingo clustering algorithm parameters is here:
   http://doc.carrot2.org/#section.component.lingo.
  
   Stanislaw
  
   --
   Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
   http://carrotsearch.com
  
   On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo 
  edwinye...@gmail.com
   
   wrote:
  
Hi,
   
 I'm trying to increase the number of cluster results shown during the
 search. I tried to set carrot.fragSize=20, but only 15 cluster labels are
 shown. Even when I tried to set carrot.fragSize=5, 15 labels are still
 shown.

 Is this the correct way to do this? I understand that setting it to 20
 might not necessarily mean 20 labels will be shown, as the setting is a
 maximum. But when I set it to 5, shouldn't it reduce the number of
 labels to 5?
   
I'm using Solr 5.1.
   
   
Regards,
Edwin
   
  
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England
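Pulling the thread's advice together - produceSummary/fragSize control the
snippets, while (per Stanislaw's note) desiredClusterCountBase governs the
cluster count - a clustering request's parameters might be assembled like
this (a sketch; the handler path, core name, and query are assumptions):

```python
from urllib.parse import urlencode

# Hypothetical handler/query; parameter names come from the thread and
# the Solr/Carrot2 documentation quoted in it.
params = {
    "q": "solar panels",
    "clustering": "true",
    "carrot.produceSummary": "true",      # cluster on highlighted snippets
    "carrot.fragSize": "100",             # snippet size, NOT cluster count
    # the Lingo knob that actually influences how many clusters come back:
    "LingoClusteringAlgorithm.desiredClusterCountBase": "20",
}
print("/solr/mycore/clustering?" + urlencode(params))
```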



Re: Chef recipes for Solr

2015-06-01 Thread Upayavira

I have many. My SolrCloud code has the app push configs to zookeeper.

I am afk at the mo. Feel free to bug me about it!

Upayavira

On Mon, Jun 1, 2015, at 07:29 PM, Walter Underwood wrote:
 Anyone have Chef recipes they like for deploying Solr? 
 
 I’d especially appreciate one for uploading the configs directly to a
 Zookeeper ensemble.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)