Re: Occasionally getting error in solr suggester component.

2015-06-01 Thread Clemens Wyss DEV
Lucene 5.1:
I am (also) facing 
java.lang.IllegalStateException: suggester was not built

At the very moment no new documents seem to be added to the index/core. Will a 
reboot sanitize the index/core?

I (still) have 
<str name="buildOnCommit">true</str>

How can I tell Solr to periodically update the suggestions? If not possible per 
configuration (in solrconfig.xml), what is the preferred approach through SolrJ?

Thx
Clemens

-Original Message-
From: Michael Sokolov [mailto:msoko...@safaribooksonline.com] 
Sent: Thursday, 15 January 2015 19:52
To: solr-user@lucene.apache.org
Subject: Re: Occasionally getting error in solr suggester component.

That sounds like a good approach to me.  Of course it depends how often you 
commit, and what your tolerance is for delay in having suggestions appear, but 
it sounds as if you have a good understanding of the tradeoffs there.

-Mike

On 1/15/15 10:31 AM, Dhanesh Radhakrishnan wrote:
 Hi,
 From Solr 4.7 onwards, the implementation of this suggester has 
 changed. The old SpellChecker-based search component is replaced with 
 a new suggester that utilizes the Lucene suggester module. The latest Solr 
 download is preconfigured with this new suggester. I'm using Solr 4.10, 
 and suggestions are based on the /suggest query instead of /spell.
 So what I did was change to <str name="buildOnCommit">false</str>. 
 It's not good to rebuild the index on every commit; instead, I would 
 like to build the index at a certain time interval, say every 1 hour.
 The lookup data will be built only when requested by the URL parameter 
 suggest.build=true

 http://localhost:8983/solr/ha/suggest?suggest.build=true

 So this will rebuild the index again and the changes will be reflected in 
 the suggester.

 There are certain pros and cons to this.
 The issue is that the changes are reflected only at a certain time 
 interval, here 1 hour. The advantage is that we can avoid rebuilding the 
 index on every commit or optimize.

 Is this the right way, or is there anything I missed?

 Regards
 dhanesh s.r




 On Thu, Jan 15, 2015 at 3:20 AM, Michael Sokolov  
 msoko...@safaribooksonline.com wrote:

 did you build the spellcheck index using spellcheck.build as 
 described
 here: https://cwiki.apache.org/confluence/display/solr/Spell+Checking ?

 -Mike


 On 01/14/2015 07:19 AM, Dhanesh Radhakrishnan wrote:

 Hi,
 Thanks for the reply.
 As you mentioned in the previous mail, I changed buildOnCommit=false 
 in solrConfig.
 After that change, suggestions are not working.
 Solr 4.7 introduced a new approach based on a dedicated 
 SuggestComponent. I'm using that component to build suggestions, and the 
 lookup implementation is AnalyzingInfixLookupFactory.
 Is there any workaround?




 On Wed, Jan 14, 2015 at 12:47 AM, Michael Sokolov  
 msoko...@safaribooksonline.com wrote:

   I think you are probably getting bitten by one of the issues 
 addressed in
 LUCENE-5889

 I would recommend against using buildOnCommit=true - with a large 
 index this can be a performance-killer.  Instead, build the index 
 yourself using the Solr spellchecker support 
 (spellcheck.build=true)

 -Mike


 On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote:

   Hi all,
 I am experiencing a problem in the Solr SuggestComponent. Occasionally the 
 solr suggester component throws an error like

 Solr failed:
 {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was
 not built","trace":"java.lang.IllegalStateException: suggester was not built
 	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:368)
 	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:342)
 	at org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)
 	at org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:199)
 	at org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:234)
 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
 	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
 	at org.apache.catalina.core.StandardContextValve.invoke(
 

Synonyms within FQ

2015-06-01 Thread John Blythe
morning everyone,

i'm attempting to find related documents based on a manufacturer's
competitor. as such i'm querying against the 'description' field with
manufacturer1's product description but running a filter query with
manufacturer2's name against the 'mfgname' field.

one of the ways that we help boost our document finding is with a synonym
dictionary for manufacturer names. many of the larger players have multiple
divisions, have absorbed smaller companies, etc. so we need all of their
potential names to map to our record.

i may be wrong, but from my initial testing it doesn't seem to be applying
to a fq. is there any way of doing this?

thanks-


Re: Synonyms within FQ

2015-06-01 Thread John Blythe
after further investigation it looks like the synonym i was testing against
was only associated with one of their multiple divisions (despite being the
most common name for them!). it looks like this may clear the issue up, but
thanks anyway!

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, Jun 1, 2015 at 8:33 AM, John Blythe j...@curvolabs.com wrote:

 morning everyone,

 i'm attempting to find related documents based on a manufacturer's
 competitor. as such i'm querying against the 'description' field with
 manufacturer1's product description but running a filter query with
 manufacturer2's name against the 'mfgname' field.

 one of the ways that we help boost our document finding is with a synonym
 dictionary for manufacturer names. many of the larger players have multiple
 divisions, have absorbed smaller companies, etc. so we need all of their
 potential names to map to our record.

 i may be wrong, but from my initial testing it doesn't seem to be applying
 to a fq. is there any way of doing this?

 thanks-



UI Admin - and stored=false fields

2015-06-01 Thread Sznajder ForMailingList
Hi

I am indexing some content into the text field.
In schema.xml, the text field is defined as:


   <field name="text" type="text_general" indexed="true" stored="false"
multiValued="true"/>


However, when I am looking to the documents via the UI
http://localhost:8983/solr/#/sec_600b/query

I see the text field content in the returned documents.

Am I making a mistake? Or is this behavior (i.e. non-stored fields being
displayed in the admin UI) expected?

thanks!

Benjamin.


Re: Number of clustering labels to show

2015-06-01 Thread Alessandro Benedetti
Just to clarify the initial mail: carrot.fragSize has nothing to do
with the number of clusters produced.

When you choose to work with a field summary (you will work only on snippets
from the original content, produced by highlighting the query in the
content), fragSize specifies the size of these fragments.

From the Carrot2 documentation:

carrot.produceSummary

When true, the carrot.snippet
https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet field (if
no snippet field, then the carrot.title
https://wiki.apache.org/solr/ClusteringComponent#carrot.title field) will
be highlighted and the highlighted text will be used for clustering.
Highlighting is recommended when the snippet field contains a lot of
content. Highlighting can also increase the quality of clustering because
the clustered content will get an additional query-specific context.
carrot.fragSize

The frag size to use for highlighting. Meaningful only when
carrot.produceSummary
https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary is
true. If not specified, the default highlighting fragsize (hl.fragsize)
will be used. If that isn't specified, then 100.
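To change the number of clusters, then, you pass the clustering algorithm's own parameter at query time rather than carrot.fragSize. A hedged example request against a clustering handler (the handler path, collection name, and value 20 are illustrative assumptions):

```
http://localhost:8983/solr/collection1/clustering?q=solr&rows=100&LingoClusteringAlgorithm.desiredClusterCountBase=20
```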


Cheers

2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 Thank you Stanislaw for the links. Will read them up to better understand
 how the algorithm works.

 Regards,
 Edwin

 On 29 May 2015 at 17:22, Stanislaw Osinski 
 stanislaw.osin...@carrotsearch.com wrote:

  Hi,
 
  The number of clusters primarily depends on the parameters of the
 specific
  clustering algorithm. If you're using the default Lingo algorithm, the
  number of clusters is governed by
  the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a
 look
  at the documentation (
 
 
 https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings
  )
  for some more details (the Tweaking at Query-Time section shows how to
  pass the specific parameters at request time). A complete overview of the
  Lingo clustering algorithm parameters is here:
  http://doc.carrot2.org/#section.component.lingo.
 
  Stanislaw
 
  --
  Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
  http://carrotsearch.com
 
  On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo 
 edwinye...@gmail.com
  
  wrote:
 
   Hi,
  
    I'm trying to increase the number of cluster results shown during
   the
    search. I tried to set carrot.fragSize=20 but only 15 cluster labels are
    shown. Even when I tried to set carrot.fragSize=5, 15 labels are still
  shown.
   
    Is this the correct way to do this? I understand that setting it to 20
    might not necessarily mean 20 labels will be shown, as the setting is a
    maximum. But when I set it to 5, shouldn't it reduce the number
  of
    labels to 5?
  
   I'm using Solr 5.1.
  
  
   Regards,
   Edwin
  
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Deleting Fields

2015-06-01 Thread Charlie Hull

On 30/05/2015 00:30, Shawn Heisey wrote:

On 5/29/2015 5:08 PM, Joseph Obernberger wrote:

Hi All - I have a lot of fields to delete, but noticed that once I
started deleting them, I quickly ran out of heap space.  Is
delete-field a memory intensive operation?  Should I delete one field,
wait a while, then delete the next?


I'm not aware of a way to delete a field.  I may have a different
definition of what a field is than you do, though.

Solr lets you delete entire documents, but deleting a field from the
entire index would involve re-indexing every document in the index,
excluding that field.

Can you be more specific about exactly what you are doing, what you are
seeing, and what you want to see instead?

Also, please be aware of this:

http://people.apache.org/~hossman/#threadhijack

Thanks,
Shawn


Here's a rather old post on how we did something similar:
http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Best strategy for logging security

2015-06-01 Thread Vishal Swaroop
It would be great if you could provide your valuable inputs on a strategy for
logging & security...


Thanks a lot in advance...



Logging :

- Is there a way to implement logging for each core separately?

- What would be the best strategy to log every query's details (like source
IP, search query, etc.)? At some point we will need monthly reports for
analysis.



Securing SOLR :

- We need to implement SOLR security from client as well as server side...
requests will be performed via web app as well as other server side apps
e.g. curl...

Please suggest about the best approach we can follow... link to any
documentation will also help.



Environment : SOLR 4.7 configured on Tomcat 7  (Linux)


Re: UI Admin - and stored=false fields

2015-06-01 Thread Erick Erickson
That's the whole point of having a true/false option for stored.
Stored=true implies that those fields are available for display to
the user in results lists. stored=false and they're not.

Best,
Erick

On Mon, Jun 1, 2015 at 4:34 AM, Sznajder ForMailingList
bs4mailingl...@gmail.com wrote:
 Hi

 I am indexing some content under text field.
 In the schema.xml text field is defined as :


    <field name="text" type="text_general" indexed="true" stored="false"
 multiValued="true"/>


 However, when I am looking to the documents via the UI
 http://localhost:8983/solr/#/sec_600b/query

 I see the text field content in the returned documents.

 Do I make a mistake? Or this behavior (i.e. no-stored fields are displayed
 in admin ui) is expected?

 thanks!

 Benjamin.


Re: Synonyms within FQ

2015-06-01 Thread John Blythe
Thanks Erick!

On Mon, Jun 1, 2015 at 11:29 AM, Erick Erickson erickerick...@gmail.com
wrote:

 For future reference, fq clauses are parsed just like the q clause;
 they can be arbitrarily complex.
 Best,
 Erick
 On Mon, Jun 1, 2015 at 5:52 AM, John Blythe j...@curvolabs.com wrote:
 after further investigation it looks like the synonym i was testing against
 was only associated with one of their multiple divisions (despite being the
 most common name for them!). it looks like this may clear the issue up, but
 thanks anyway!

 --
 *John Blythe*
 Product Manager & Lead Developer

 251.605.3071 | j...@curvolabs.com
 www.curvolabs.com

 58 Adams Ave
 Evansville, IN 47713

 On Mon, Jun 1, 2015 at 8:33 AM, John Blythe j...@curvolabs.com wrote:

 morning everyone,

 i'm attempting to find related documents based on a manufacturer's
 competitor. as such i'm querying against the 'description' field with
 manufacturer1's product description but running a filter query with
 manufacturer2's name against the 'mfgname' field.

 one of the ways that we help boost our document finding is with a synonym
 dictionary for manufacturer names. many of the larger players have multiple
 divisions, have absorbed smaller companies, etc. so we need all of their
 potential names to map to our record.

 i may be wrong, but from my initial testing it doesn't seem to be applying
 to a fq. is there any way of doing this?

 thanks-


Re: Occasionally getting error in solr suggester component.

2015-06-01 Thread Erick Erickson
Attach suggest.build=true or suggest.buildAll=true to any request
to the suggester to rebuild.
OR
add
buildOnStartup, buildOnCommit, or buildOnOptimize to the definition
in solrConfig.
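For reference, those build triggers live on the suggester definition in solrconfig.xml. A hedged sketch follows — the names, field, and dictionary choices echo the AnalyzingInfixLookupFactory setup mentioned earlier in the thread, but every value here is illustrative, not taken from anyone's actual config:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <!-- Build triggers: enable with care on large indexes (see below). -->
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```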

BUT:
building can be a _very_ expensive operation. For document-based
indexes, the build process
reads through _all_ of the _stored_ documents in your index, and that
can take many minutes so
I recommend against these options for a large index, and strongly
recommend you test these
with a large corpus.

Best,
Erick
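Clemens also asked what the preferred approach through SolrJ would be. Below is a minimal client-side sketch of Erick's first option — periodically sending a request with suggest.build=true. It is hedged: it uses plain HTTP via the Python standard library rather than SolrJ, and the base URL, core name, and handler path are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def build_suggest_url(base_url, extra_params=None):
    # Append suggest.build=true to force a rebuild of the suggester's lookup data.
    params = {"suggest.build": "true"}
    if extra_params:
        params.update(extra_params)
    return base_url + "?" + urlencode(params)

def rebuild_suggester(base_url):
    # Fire the request; only the side effect (the rebuild) matters here.
    with urlopen(build_suggest_url(base_url)) as resp:
        return resp.status

# Usage sketch (would run indefinitely, rebuilding once per hour):
#   import sched, time
#   s = sched.scheduler(time.time, time.sleep)
#   def tick():
#       rebuild_suggester("http://localhost:8983/solr/ha/suggest")
#       s.enter(3600, 1, tick)
#   s.enter(0, 1, tick)
#   s.run()
```

The same pattern applies from SolrJ: issue a query to the suggest handler with the suggest.build=true parameter set, on whatever schedule suits your commit frequency.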



On Mon, Jun 1, 2015 at 4:01 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 Lucene 5.1:
 I am (also) facing
 java.lang.IllegalStateException: suggester was not built

 At the very moment no new documents seem to be added to the index/core. Will 
 a reboot sanitize the index/core?

 I (still) have
 <str name="buildOnCommit">true</str>

 How can I tell Solr to periodically update the suggestions? If not possible 
 per configuration (in solrconfig.xml), what is the preferred approach through 
 SolrJ?

 Thx
 Clemens

 -Original Message-
 From: Michael Sokolov [mailto:msoko...@safaribooksonline.com]
 Sent: Thursday, 15 January 2015 19:52
 To: solr-user@lucene.apache.org
 Subject: Re: Occasionally getting error in solr suggester component.

 That sounds like a good approach to me.  Of course it depends how often you 
 commit, and what your tolerance is for delay in having suggestions appear, 
 but it sounds as if you have a good understanding of the tradeoffs there.

 -Mike

 On 1/15/15 10:31 AM, Dhanesh Radhakrishnan wrote:
 Hi,
 From Solr 4.7 onwards, the implementation of this suggester has
 changed. The old SpellChecker-based search component is replaced with
 a new suggester that utilizes the Lucene suggester module. The latest Solr
 download is preconfigured with this new suggester. I'm using Solr 4.10,
 and suggestions are based on the /suggest query instead of /spell.
 So what I did was change to <str name="buildOnCommit">false</str>.
 It's not good to rebuild the index on every commit; instead, I would
 like to build the index at a certain time interval, say every 1 hour.
 The lookup data will be built only when requested by the URL parameter
 suggest.build=true

  http://localhost:8983/solr/ha/suggest?suggest.build=true

  So this will rebuild the index again and the changes will be reflected in
  the suggester.

  There are certain pros and cons to this.
  The issue is that the changes are reflected only at a certain time
  interval, here 1 hour. The advantage is that we can avoid rebuilding the
  index on every commit or optimize.

  Is this the right way, or is there anything I missed?

 Regards
 dhanesh s.r




 On Thu, Jan 15, 2015 at 3:20 AM, Michael Sokolov 
 msoko...@safaribooksonline.com wrote:

 did you build the spellcheck index using spellcheck.build as
 described
 here: https://cwiki.apache.org/confluence/display/solr/Spell+Checking ?

 -Mike


 On 01/14/2015 07:19 AM, Dhanesh Radhakrishnan wrote:

 Hi,
 Thanks for the reply.
  As you mentioned in the previous mail, I changed buildOnCommit=false
  in solrConfig.
  After that change, suggestions are not working.
  Solr 4.7 introduced a new approach based on a dedicated
  SuggestComponent. I'm using that component to build suggestions, and the
  lookup implementation is AnalyzingInfixLookupFactory.
  Is there any workaround?




 On Wed, Jan 14, 2015 at 12:47 AM, Michael Sokolov 
 msoko...@safaribooksonline.com wrote:

   I think you are probably getting bitten by one of the issues
 addressed in
 LUCENE-5889

 I would recommend against using buildOnCommit=true - with a large
 index this can be a performance-killer.  Instead, build the index
 yourself using the Solr spellchecker support
 (spellcheck.build=true)

 -Mike


 On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote:

   Hi all,
  I am experiencing a problem in the Solr SuggestComponent. Occasionally the
  solr suggester component throws an error like

 Solr failed:
  {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was
  not built","trace":"java.lang.IllegalStateException: suggester was not built
  	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:368)
  	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:342)
  	at org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)
  	at org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:199)
  	at org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:234)
  	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
  	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
  	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
  	at org.apache.solr.servlet.SolrDispatchFilter.execute(
 

Re: Sorting in Solr

2015-06-01 Thread Shawn Heisey
On 6/1/2015 9:29 AM, Steven White wrote:
 I need to be able to sort in Solr.  Obviously, I need to do this in a way
 sorting won't cause OOM when a result may contain 1000's of hits if not
 millions.  Can you guide me on how I can do this?  Is there a way to tell
 Solr to sort the top N results (discarding everything else) or must such
 sorting be done on the client side?

Solr supports sorting.

https://wiki.apache.org/solr/CommonQueryParameters#sort

https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

I think we may have an omission from the docs -- docValues can also be
used for sorting, and may also offer a performance advantage.
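Concretely, Solr returns only the top `rows` documents of the fully sorted result set, so the "top N" behavior comes for free and no client-side sorting is needed. A hedged example of the request parameters (the field names here are illustrative, not from this thread):

```
q=*:*&sort=price desc,id asc&rows=10&start=0
```

Fields used for sorting generally need to be single-valued and either indexed or docValues-enabled.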

Thanks,
Shawn



Re: Synonyms within FQ

2015-06-01 Thread Erick Erickson
For future reference, fq clauses are parsed just like the q clause;
they can be arbitrarily complex.

Best,
Erick

On Mon, Jun 1, 2015 at 5:52 AM, John Blythe j...@curvolabs.com wrote:
 after further investigation it looks like the synonym i was testing against
 was only associated with one of their multiple divisions (despite being the
 most common name for them!). it looks like this may clear the issue up, but
 thanks anyway!

 --
 *John Blythe*
 Product Manager & Lead Developer

 251.605.3071 | j...@curvolabs.com
 www.curvolabs.com

 58 Adams Ave
 Evansville, IN 47713

 On Mon, Jun 1, 2015 at 8:33 AM, John Blythe j...@curvolabs.com wrote:

 morning everyone,

 i'm attempting to find related documents based on a manufacturer's
 competitor. as such i'm querying against the 'description' field with
 manufacturer1's product description but running a filter query with
 manufacturer2's name against the 'mfgname' field.

 one of the ways that we help boost our document finding is with a synonym
 dictionary for manufacturer names. many of the larger players have multiple
 divisions, have absorbed smaller companies, etc. so we need all of their
 potential names to map to our record.

 i may be wrong, but from my initial testing it doesn't seem to be applying
 to a fq. is there any way of doing this?

 thanks-



Sorting in Solr

2015-06-01 Thread Steven White
Hi everyone,

I need to be able to sort in Solr.  Obviously, I need to do this in a way
sorting won't cause OOM when a result may contain 1000's of hits if not
millions.  Can you guide me on how I can do this?  Is there a way to tell
Solr to sort the top N results (discarding everything else) or must such
sorting be done on the client side?

Thanks in advance

Steve


Re: UI Admin - and stored=false fields

2015-06-01 Thread Erik Hatcher
Did you happen to change the field type definition without reindexing?  (it 
requires reindexing to “unstore” them if they were originally stored)

If you’re seeing a field value in a document result (not facets, those are 
driven by indexed terms) when stored=“false” then something is wrong and I’d 
guess it’s because of the field definition changing as mentioned.


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com http://www.lucidworks.com/




 On Jun 1, 2015, at 11:27 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 That's the whole point of having a true/false option for stored.
 Stored=true implies that those fields are available for display to
 the user in results lists. stored=false and they're not.
 
 Best,
 Erick
 
 On Mon, Jun 1, 2015 at 4:34 AM, Sznajder ForMailingList
 bs4mailingl...@gmail.com wrote:
 Hi
 
 I am indexing some content under text field.
 In the schema.xml text field is defined as :
 
 
    <field name="text" type="text_general" indexed="true" stored="false"
  multiValued="true"/>
 
 
 However, when I am looking to the documents via the UI
 http://localhost:8983/solr/#/sec_600b/query
 
 I see the text field content in the returned documents.
 
 Do I make a mistake? Or this behavior (i.e. no-stored fields are displayed
 in admin ui) is expected?
 
 thanks!
 
 Benjamin.



Re: UI Admin - and stored=false fields

2015-06-01 Thread Erick Erickson
Reset, pay attention to Erik. I didn't read it all the way through.

Erick

On Mon, Jun 1, 2015 at 8:27 AM, Erick Erickson erickerick...@gmail.com wrote:
 That's the whole point of having a true/false option for stored.
 Stored=true implies that those fields are available for display to
 the user in results lists. stored=false and they're not.

 Best,
 Erick

 On Mon, Jun 1, 2015 at 4:34 AM, Sznajder ForMailingList
 bs4mailingl...@gmail.com wrote:
 Hi

 I am indexing some content under text field.
 In the schema.xml text field is defined as :


    <field name="text" type="text_general" indexed="true" stored="false"
  multiValued="true"/>


 However, when I am looking to the documents via the UI
 http://localhost:8983/solr/#/sec_600b/query

 I see the text field content in the returned documents.

 Do I make a mistake? Or this behavior (i.e. no-stored fields are displayed
 in admin ui) is expected?

 thanks!

 Benjamin.


Re: Sorting in Solr

2015-06-01 Thread Erick Erickson
Steve:

Surprisingly, the number of hits is completely irrelevant to the
memory requirements for sorting. The base memory size is, AFAIK, an
array of maxDoc ints (you can find maxDoc on the admin screen).
There's some additional overhead, but that's the base size. If you use
DocValues, much of the overhead is kept in the MMapDirectory space,
IIRC.
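That point can be made concrete with a little arithmetic: the base cost scales with maxDoc, not with the number of hits. A hedged back-of-the-envelope sketch — the 420-million-doc figure is borrowed from the "Deleting Fields" thread purely for illustration, and real per-field overhead will be higher:

```python
def base_sort_memory_bytes(max_doc, bytes_per_entry=4):
    """Rough base memory for sorting on one field: an array of maxDoc
    ints (4 bytes each), ignoring per-field and JVM overhead."""
    return max_doc * bytes_per_entry

# The array is sized by maxDoc, so 10 hits cost the same as 10 million hits.
needed = base_sort_memory_bytes(420_000_000)
print(needed)  # 1680000000, i.e. roughly 1.56 GiB
```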

Best,
Erick


On Mon, Jun 1, 2015 at 8:41 AM, Shawn Heisey apa...@elyograg.org wrote:
 On 6/1/2015 9:29 AM, Steven White wrote:
  I need to be able to sort in Solr.  Obviously, I need to do this in a way
  sorting won't cause OOM when a result may contain 1000's of hits if not
  millions.  Can you guide me on how I can do this?  Is there a way to tell
  Solr to sort the top N results (discarding everything else) or must such
  sorting be done on the client side?

 Solr supports sorting.

 https://wiki.apache.org/solr/CommonQueryParameters#sort

 https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

 I think we may have an omission from the docs -- docValues can also be
 used for sorting, and may also offer a performance advantage.

 Thanks,
 Shawn



Re: fq and defType

2015-06-01 Thread Shawn Heisey
On 6/1/2015 10:44 AM, david.dav...@correo.aeat.es wrote:
 I need to parse some complicated queries that only work properly with the 
 edismax query parser, in the q and fq parameters. I am testing with 
 defType=edismax, but it seems that this clause only affects the q 
 parameter. Is there any way to apply edismax to the fq parameter?

fq={!edismax}querystring

The other edismax parameters on your request (qf, etc) apply to those
filter queries just like they would for the q parameter.
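A hedged illustration of that local-params syntax (the field name and terms are invented for the example); parameters such as qf can also be overridden per-filter inside the braces:

```
q=portable speaker&fq={!edismax qf=mfgname}acme
```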

Thanks,
Shawn



fq and defType

2015-06-01 Thread david . davila
Hello,

I need to parse some complicated queries that only work properly with the 
edismax query parser, in the q and fq parameters. I am testing with 
defType=edismax, but it seems that this clause only affects the q 
parameter. Is there any way to apply edismax to the fq parameter?

Thank you very much, 


David Dávila Atienza
DIT
Telephone: 915828763
Extension: 36763

Re: Deleting Fields

2015-06-01 Thread Joseph Obernberger

Hi - we are using 64bit OS and 64bit JVM.  The JVM settings are currently:
-
-DSTOP.KEY=solrrocks
-DSTOP.PORT=8100
-Dhost=helios
-Djava.net.preferIPv4Stack=true
-Djetty.port=9100
-DnumShards=27
-Dsolr.clustering.enabled=true
-Dsolr.install.dir=/opt/solr
-Dsolr.lock.type=hdfs
-Dsolr.solr.home=/opt/solr/server/solr
-Duser.timezone=UTC
-DzkClientTimeout=15000
-DzkHost=eris.querymasters.com:2181,daphnis.querymasters.com:2181,triton.querymasters.com:2181,oberon.querymasters.com:2181,portia.querymasters.com:2181,puck.querymasters.com:2181/solr5
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseLargePages
-XX:+UseParNewGC
-XX:CMSFullGCsBeforeCompaction=1
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:CMSTriggerPermRatio=80
-XX:ConcGCThreads=8
-XX:MaxDirectMemorySize=26g
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 9100 /opt/solr/server/logs
-XX:ParallelGCThreads=8
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-Xloggc:/opt/solr/server/logs/solr_gc.log
-Xms8g
-Xmx16g
-Xss256k
-verbose:gc
-

At the time of the OOM error, Xmx was set to 10g.  OS limits, for 
the most part, are 'factory' Scientific Linux 6.6.  I didn't see any 
messages in the log about too many open files.  Thank you for the tips!


-Joe

On 5/31/2015 4:24 AM, Tomasz Borek wrote:

Joseph,

You are doing a memory intensive operation and perhaps an IO intensive
operation at once. That makes your C-heap run out of memory or hit a thread
limit (thus first problem, java.lang.OutOfMemoryError: unable to create new
native thread) and later you're also hitting the problem of Java heap being
full or - more precisely - GC being unable to free enough space there even
with collection, to allocate new object that you want allocated (thus
second throw: java.lang.OutOfMemoryError: Java heap space).

Important is:
- whether your OS is 32-bit or 64-bit
- whether your JVM is 32-bit or 64-bit
- what are your OS limits on thread creation and have you touched them or
changed them (1st problem)
- how do you start JVM (Xms, Xmx, Xss, dumps, direct memory, permgen size -
both problems)

What you can do to solve your problems differs depending on what exactly
causes them, but in general:

NATIVE:
1) Either your operation causes many threads to spawn and you hit your
thread limit (OS limits how many threads process can create) - have a
threaddump and see
2) Or your op causes many thread creation and the memory settings you start
JVM with plus 32/64 bits of OS and JVM make it impossible for the C-heap to
have this much memory thus you're hitting the OOM error - adjust settings,
move to 64-bit architectures, add RAM while on 64-bit (32-bit really chokes
you down, less than 4GB is available for you for EVERYTHING: Java heap,
PermGen space AND C-Heap)

Usually, with such thread-greedy operation, it's also nice to look at the
code and see if perhaps one can optimize thread creation/management.

OOM on Java heap:
Add crash on memory dump parameter to your JVM and walk dominator tree or
see histogram to actually tell what's eating your heap space. MAT is a good
tool for this, unless your heap is like 150GB, then Netbeans may help or
see Alexey Ragozin's work, I think he forked the code for NetBeans heap
analyzer and made some adjustments specially for such cases. Light Google
search and here it is: http://blog.ragozin.info/


pozdrawiam,
LAFK

2015-05-30 20:48 GMT+02:00 Erick Erickson erickerick...@gmail.com:


Faceting on very high cardinality fields can use up memory, no doubt
about that. I think the entire delete question was a red herring, but
you know that already ;)

So I think you can forget about the delete stuff. Although do note
that if you do re-index your old documents, the new version won't have
the field, and as segments are merged the deleted documents will have
all their resources reclaimed, effectively deleting the field from the
old docs So you could gradually re-index your corpus and get this
stuff out of there.

Best,
Erick

On Sat, May 30, 2015 at 5:18 AM, Joseph Obernberger
j...@lovehorsepower.com wrote:

Thank you Erick.  I was thinking that it actually went through and removed
the index data; thank you for the clarification.  What happened was I had
some bad data that created a lot of fields (some 8000).  I was getting some
errors adding new fields where Solr could not talk to ZooKeeper, and I
thought it may be because there are so many fields.  The index size is some
420 million docs.
I'm hesitant to try to re-create it, as when the shards crash, they leave a
write.lock file in HDFS, and I need to manually 

Re: fq and defType

2015-06-01 Thread david . davila
Thank you!

David



De: Shawn Heisey apa...@elyograg.org
Para:   solr-user@lucene.apache.org, 
Fecha:  01/06/2015 18:53
Asunto: Re: fq and defType



On 6/1/2015 10:44 AM, david.dav...@correo.aeat.es wrote:
 I need to parse some complicated queries that only work properly with 
the 
 edismax query parser, in the q and fq parameters. I am testing with 
 defType=edismax, but it seems this only affects the q 
 parameter. Is there any way to use edismax for the fq parameter?

fq={!edismax}querystring

The other edismax parameters on your request (qf, etc) apply to those
filter queries just like they would for the q parameter.

Thanks,
Shawn
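To make the {!edismax} local-params syntax concrete, here is a small sketch
(the core name and field names are hypothetical) of how such a request's
parameters would be URL-encoded:

```python
from urllib.parse import urlencode

# Hypothetical core/fields; the point is the {!edismax} prefix on fq,
# which switches that filter query to the edismax parser.
params = {
    "q": "laptop bag",
    "defType": "edismax",          # applies edismax to q
    "qf": "title^2 description",
    "fq": "{!edismax}category:(bags OR cases)",  # applies edismax to this fq
}
query_string = urlencode(params)
print("/solr/mycore/select?" + query_string)
```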




Chef recipes for Solr

2015-06-01 Thread Walter Underwood
Anyone have Chef recipes they like for deploying Solr? 

I’d especially appreciate one for uploading the configs directly to a Zookeeper 
ensemble.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Best strategy for logging & security

2015-06-01 Thread Rajesh Hazari
Logging :

Just use logstash to parse your logs for all collections, with the logstash
forwarder (lumberjack) on the Solr replicas in your SolrCloud sending the
log events to your central logstash server, which then sends them back to
Solr (either the same or a different instance) into a separate collection.

The default log4j.properties that ships with the Solr distribution can log
the core name with each query log.

Security:
suggest you to go through this wiki
https://wiki.apache.org/solr/SolrSecurity

*Thanks,*
*Rajesh,*
*(mobile) : 8328789519.*

On Mon, Jun 1, 2015 at 11:20 AM, Vishal Swaroop vishal@gmail.com
wrote:

 It will be great if you can provide your valuable input on a strategy for
 logging & security...


 Thanks a lot in advance...



 Logging :

 - Is there a way to implement logging for each core separately?

 - What will be the best strategy for logging every query's details (like
 source IP, search query, etc.)? At some point we will need monthly reports
 for analysis.



 Securing SOLR :

 - We need to implement SOLR security from client as well as server side...
 requests will be performed via web app as well as other server side apps
 e.g. curl...

 Please suggest about the best approach we can follow... link to any
 documentation will also help.



 Environment : SOLR 4.7 configured on Tomcat 7  (Linux)



Multiple Word Synonyms with Autophrasing

2015-06-01 Thread Chris Morley
Hello everyone @ solr-user,
  
 At Wayfair, I have implemented multiple-word synonyms in a clean and 
efficient way, in conjunction with a slightly modified version of 
LucidWorks' Autophrasing plugin, by also tacking on a modified version of 
edismax.  It is not released or in use on our public website yet, but it 
will be very soon.  While it is not ready to officially open source yet, I 
know some people out there are anxious to implement this type of thing.  
Please feel free to contact me if you are interested in learning how 
to accomplish this, in theory, on your own.  Note that while this may 
have some concepts in common with Named Entity Recognition implementations, 
I think it really is a completely different thing.  I get a lot of spam, so 
if you please, would you write me privately with your questions, with the 
subject line MWSwA, so I can easily compile everyone's questions about this. 
 I will respond to everyone at some point soon with some beta documentation 
or possibly with an invitation to a private GitHub repo or something so that 
you can review an example.
  
 Thanks!
 -Chris.
  



Re: Best strategy for logging & security

2015-06-01 Thread Vishal Swaroop
Thanks Rajesh... just trying to figure out if *logstash* is open source and
free?

On Mon, Jun 1, 2015 at 2:13 PM, Rajesh Hazari rajeshhaz...@gmail.com
wrote:

 Logging :

 Just use logstash to a parse your logs for all collection and  logstash
 forwarder and lumberjack at your solr replicas in your solr cloud to send
 the log events to you central logstash server and send it to back to solr
 (either the same or different instance) to a different collection.

 The default log4j.properties that comes with solr dist can log core name
 with each query log.

 Security:
 suggest you to go through this wiki
 https://wiki.apache.org/solr/SolrSecurity

 *Thanks,*
 *Rajesh,*
 *(mobile) : 8328789519.*

 On Mon, Jun 1, 2015 at 11:20 AM, Vishal Swaroop vishal@gmail.com
 wrote:

  It will be great if you can provide your valuable inputs on strategy for
  logging  security...
 
 
  Thanks a lot in advance...
 
 
 
  Logging :
 
  - Is there a way to implement logging for each cores separately.
 
  - What will be the best strategy to log every query details (like source
  IP, search query, etc.) at some point we will need monthly reports for
  analysis.
 
 
 
  Securing SOLR :
 
  - We need to implement SOLR security from client as well as server
 side...
  requests will be performed via web app as well as other server side apps
  e.g. curl...
 
  Please suggest about the best approach we can follow... link to any
  documentation will also help.
 
 
 
  Environment : SOLR 4.7 configured on Tomcat 7  (Linux)
 



Re: fq and defType

2015-06-01 Thread Mikhail Khludnev
fq={!edismax}you are welcome


On Mon, Jun 1, 2015 at 6:44 PM, david.dav...@correo.aeat.es wrote:

 Hello,

 I need to parse some complicated queries that only work properly with the
 edismax query parser, in the q and fq parameters. I am testing with
 defType=edismax, but it seems this only affects the q
 parameter. Is there any way to use edismax for the fq parameter?

 Thank you very much,


 David Dávila Atienza
 DIT
 Teléfono: 915828763
 Extensión: 36763




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: solr uima and opennlp

2015-06-01 Thread Tommaso Teofili
yeah, I think you'd rather post it to d...@uima.apache.org .

Regards,
Tommaso

2015-05-28 15:19 GMT+02:00 hossmaa andreea.hossm...@gmail.com:

 Hi Tommaso

 Thanks for the quick reply! I have another question about using the
 Dictionary Annotator, but I guess it's better to post it separately.

 Cheers
 Andreea



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873p4208348.html
 Sent from the Solr - User mailing list archive at Nabble.com.



SolrCloud 5.1 startup looking for standalone config

2015-06-01 Thread tuxedomoon
I followed these steps and I am unable to launch in cloud mode.

1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3

2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2

3. uploaded a configset to zk1  (solr home is /volume/solr/data)
---
/opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost 
zk1:2181  -confname mycollection_cloud_conf -solrhome /volume/solr/data
-confdir  /home/ec2-user/mycollection/conf


4. on s1, added these params to solr.in.sh
---
ZK_HOST=zk1:2181,zk2:2181,zk3:2181
SOLR_HOST=s1
ZK_CLIENT_TIMEOUT=15000
SOLR_OPTS=$SOLR_OPTS -DnumShards=2


5. on s1 created core directory and file

/volume/solr/data/mycollection/core.properties (name=mycollection)


6. repeated steps 4,5 for s2 minus the numShards param


Starting the service on s1 gives me

mycollection:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Could not load conf for core mycollection: Error loading solr config from
/volume/solr/data/mycollection/conf/solrconfig.xml 

but aren't the config files supposed to be in Zookeeper?  

Tux


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-5-1-startup-looking-for-standalone-config-tp4209118.html
Sent from the Solr - User mailing list archive at Nabble.com.


Derive suggestions across multiple fields

2015-06-01 Thread Zheng Lin Edwin Yeo
Hi,

Does anyone know if we can derive suggestions across multiple fields?

I tried to set something like this in the field parameter of my suggest
searchComponent in solrconfig.xml, but nothing is returned. It only works
when I set a single field, not multiple fields.

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
      <!-- the indexed field to derive suggestions from -->
      <str name="field">Content, Summary</str>
      <float name="threshold">0.005</float>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

I'm using solr 5.1.

Regards,
Edwin
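One commonly suggested way around this limitation (a sketch only - the
suggester's field parameter takes a single field, and the field/type names
below are assumptions) is to copy both source fields into one combined
field in schema.xml and point the suggester at that:

```xml
<!-- Sketch: combined field for suggestions; names are assumptions -->
<field name="suggestText" type="text_general" indexed="true"
       stored="false" multiValued="true"/>
<copyField source="Content" dest="suggestText"/>
<copyField source="Summary" dest="suggestText"/>
```

The suggester's field parameter would then be set to suggestText, and the
combined field rebuilt by re-indexing.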


Re: Chef recipes for Solr

2015-06-01 Thread Walter Underwood
That sounds great. Someone else here will be making the recipes, so I’ll put 
him in touch with you.

As always, this is a really helpful list.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Jun 1, 2015, at 10:20 PM, Upayavira u...@odoko.co.uk wrote:

 
 I have many. My SolrCloud code has the app push configs to zookeeper.
 
 I am afk at the mo. Feel free to bug me about it!
 
 Upayavira
 
 On Mon, Jun 1, 2015, at 07:29 PM, Walter Underwood wrote:
 Anyone have Chef recipes they like for deploying Solr? 
 
 I’d especially appreciate one for uploading the configs directly to a
 Zookeeper ensemble.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
 



Re: SolrCloud 5.1 startup looking for standalone config

2015-06-01 Thread Erick Erickson
bq: but aren't the config files supposed to be in Zookeeper

Yes, but you haven't done anything to tell Solr that the nodes you've
created are part of SolrCloud!

You're confusing, I think, core discovery with creating collections.
Basically you were pretty much OK up until step 5 (although I'm not at
all sure that SOLR_HOST is doing you any good, and certainly setting
numShards in SOLR_OPTS isn't a good idea - what happens if you want to
create a collection with 5 shards?)

You don't need to create any directories on your Solr nodes, that'll
be done for you automatically by the collection creation command from
the Collections API. So I'd down the nodes and nuke the directories
you created by hand and bring the nodes back up. It's probably not
necessary to take the nodes down, but I tend to be paranoid about
that.

Then just create the collection via the Collections API CREATE
command, see: 
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1

You can use curl or a browser to issue something like this to any
active Solr node, Solr will do the rest:
http://some_solr_node:port/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=my_collection_cloud_conf&...

I believe it's _possible_ to carefully construct the core.properties
files on all the Solr instances, but unless you know _exactly_ what's
going on under the covers it'll lead to endless tail-chasing. You can
control which nodes the collection ends up on with the createNodeSet
parameter etc

Best,
Erick

On Mon, Jun 1, 2015 at 4:37 PM, tuxedomoon dancolem...@yahoo.com wrote:
 I followed these steps and I am unable to launch in cloud mode.

 1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3

 2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2

 3. uploaded a configset to zk1  (solr home is /volume/solr/data)
 ---
 /opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost
 zk1:2181  -confname mycollection_cloud_conf -solrhome /volume/solr/data
 -confdir  /home/ec2-user/mycollection/conf


 4. on s1, added these params to solr.in.sh
 ---
 ZK_HOST=zk1:2181,zk2:2181,zk3:2181
 SOLR_HOST=s1
 ZK_CLIENT_TIMEOUT=15000
 SOLR_OPTS=$SOLR_OPTS -DnumShards=2


 5. on s1 created core directory and file
 
 /volume/solr/data/mycollection/core.properties (name=mycollection)


 6. repeated steps 4,5 for s2 minus the numShards param


 Starting the service on s1 gives me

 mycollection:
 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
 Could not load conf for core mycollection: Error loading solr config from
 /volume/solr/data/mycollection/conf/solrconfig.xml

 but aren't the config files supposed to be in Zookeeper?

 Tux











 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrCloud-5-1-startup-looking-for-standalone-config-tp4209118.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Number of clustering labels to show

2015-06-01 Thread Zheng Lin Edwin Yeo
Thank you so much Alessandro.

But I do not find any difference in the quality of the clustering results
when I change hl.fragsize, even though I've set carrot.produceSummary to
true.


Regards,
Edwin


On 1 June 2015 at 17:31, Alessandro Benedetti benedetti.ale...@gmail.com
wrote:

 Just to clarify the initial mail: carrot.fragSize has nothing to do
 with the number of clusters produced.

 When you choose to work with a field summary (you will work only on
 snippets of the original content - snippets produced by highlighting the
 query in the content), fragSize specifies the size of those fragments.

 From Carrot documentation :

 carrot.produceSummary

 When true, the carrot.snippet field
 (https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet)
 (if no snippet field, then the carrot.title field,
 https://wiki.apache.org/solr/ClusteringComponent#carrot.title) will
 be highlighted and the highlighted text will be used for clustering.
 Highlighting is recommended when the snippet field contains a lot of
 content. Highlighting can also increase the quality of clustering because
 the clustered content will get an additional query-specific context.

 carrot.fragSize

 The frag size to use for highlighting. Meaningful only when
 carrot.produceSummary is true
 (https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary).
 If not specified, the default highlighting fragsize (hl.fragsize)
 will be used. If that isn't specified, then 100.


 Cheers

 2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

  Thank you Stanislaw for the links. Will read them up to better understand
  how the algorithm works.
 
  Regards,
  Edwin
 
  On 29 May 2015 at 17:22, Stanislaw Osinski 
  stanislaw.osin...@carrotsearch.com wrote:
 
   Hi,
  
   The number of clusters primarily depends on the parameters of the
  specific
   clustering algorithm. If you're using the default Lingo algorithm, the
   number of clusters is governed by
   the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a
  look
   at the documentation (
  
  
 
 https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings
   )
   for some more details (the Tweaking at Query-Time section shows how
 to
   pass the specific parameters at request time). A complete overview of
 the
   Lingo clustering algorithm parameters is here:
   http://doc.carrot2.org/#section.component.lingo.
  
   Stanislaw
  
   --
   Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
   http://carrotsearch.com
  
   On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo 
  edwinye...@gmail.com
   
   wrote:
  
Hi,
   
 I'm trying to increase the number of cluster results shown during the
 search. I tried to set carrot.fragSize=20, but only 15 cluster labels are
 shown. Even when I tried to set carrot.fragSize=5, 15 labels are still
 shown.

 Is this the correct way to do this? I understand that setting it to 20
 might not necessarily mean 20 labels will be shown, as the setting is a
 maximum. But when I set it to 5, shouldn't it reduce the number of
 labels to 5?
   
I'm using Solr 5.1.
   
   
Regards,
Edwin
   
  
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England
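Pulling the thread's advice together - produceSummary/fragSize control the
snippets, while (per Stanislaw's note) desiredClusterCountBase governs the
cluster count - a clustering request's parameters might be assembled like
this (a sketch; the handler path, core name, and query are assumptions):

```python
from urllib.parse import urlencode

# Hypothetical handler/query; parameter names come from the thread and
# the Solr/Carrot2 documentation quoted in it.
params = {
    "q": "solar panels",
    "clustering": "true",
    "carrot.produceSummary": "true",      # cluster on highlighted snippets
    "carrot.fragSize": "100",             # snippet size, NOT cluster count
    # the Lingo knob that actually influences how many clusters come back:
    "LingoClusteringAlgorithm.desiredClusterCountBase": "20",
}
print("/solr/mycore/clustering?" + urlencode(params))
```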



Re: Chef recipes for Solr

2015-06-01 Thread Upayavira

I have many. My SolrCloud code has the app push configs to zookeeper.

I am afk at the mo. Feel free to bug me about it!

Upayavira

On Mon, Jun 1, 2015, at 07:29 PM, Walter Underwood wrote:
 Anyone have Chef recipes they like for deploying Solr? 
 
 I’d especially appreciate one for uploading the configs directly to a
 Zookeeper ensemble.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)