RE: FW: NRTCachingDirectory threads stuck

2015-02-23 Thread Moshe Recanati
Thank you.



Regards,
Moshe Recanati
SVP Engineering
Office + 972-73-2617564
Mobile  + 972-52-6194481
Skype    :  recanati

More at:  www.kmslh.com | LinkedIn | FB


-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Sunday, February 22, 2015 6:16 PM
To: solr-user
Subject: Re: FW: NRTCachingDirectory threads stuck

On Sun, Feb 22, 2015 at 1:54 PM, Moshe Recanati mos...@kmslh.com wrote:

 Hi Mikhail,
 Thank you.
 1. Regarding jetty threads - how can I reduce them?


https://wiki.eclipse.org/Jetty/Howto/High_Load#Thread_Pool
Note that you'll get a 503 (or a similar error) when the pool size is exceeded.
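For reference, capping the pool in Jetty's etc/jetty.xml looks roughly like the sketch below (element and class names follow the Jetty wiki page above; the numbers are only illustrative):

```xml
<!-- etc/jetty.xml: illustrative thread pool cap -->
<Configure id="Server" class="org.eclipse.jetty.server.Server">
  <Set name="ThreadPool">
    <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
      <Set name="minThreads">8</Set>
      <!-- start near the number of CPU cores, as suggested above -->
      <Set name="maxThreads">32</Set>
    </New>
  </Set>
</Configure>
```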


 2. Is it related to the fact we're running Solr 4.0 in parallel on 
 this machine?


Are their index dirs different? Regardless, running two servers on the same
machine leads to resource contention. What does `top` say?



 Thank you


 Regards,
 Moshe Recanati
 SVP Engineering
 Office + 972-73-2617564
 Mobile  + 972-52-6194481
 Skype:  recanati

 More at:  www.kmslh.com | LinkedIn | FB


 -Original Message-
 From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
 Sent: Sunday, February 22, 2015 11:18 AM
 To: solr-user
 Subject: Re: FW: NRTCachingDirectory threads stuck

 Hello,

 I checked 20020.tdump. From the update perspective it's OK: I see a
 single thread that committed and is awaiting the opening of a searcher.
 There are a few very bad signs:
 - there are many threads executing search requests in parallel, roughly
 a hundred of them. This is a dead end. Consider limiting the number of
 Jetty threads, starting from the number of cores available;
 - the heap is full, which is a no-go for Java. Either increase it,
 reduce the load, or make sure there are no leaks;
 - I see many threads executing Luke handler code; that might be a
 misconfiguration, or the regular approach for Solr replication. I'm not
 sure here.


 On Sun, Feb 22, 2015 at 9:57 AM, Moshe Recanati mos...@kmslh.com wrote:

   Hi,
 
  I saw message rejected because of attachment.
 
  I uploaded data to drive
 
 
  https://drive.google.com/file/d/0B0GR0M-lL5QHVDNjZlUwVTR2QTQ/view?usp=sharing
 
 
 
  Moshe
 
 
 
  *From:* Moshe Recanati [mailto:mos...@kmslh.com]
  *Sent:* Sunday, February 22, 2015 8:37 AM
  *To:* solr-user@lucene.apache.org
  *Subject:* RE: NRTCachingDirectory threads stuck
 
 
 
  *From:* Moshe Recanati
  *Sent:* Sunday, February 22, 2015 8:34 AM
  *To:* solr-user@lucene.apache.org
  *Subject:* NRTCachingDirectory threads stuck
 
 
 
  Hi,
 
  We're running two Solr servers on the same machine.

  One is Solr 4.0 and the second is Solr 4.7.1.

  In Solr 4.7.1 we see very strange behavior: while indexing documents we
  get a memory spike from 1 GB to 4 GB in a couple of minutes, and a huge
  number of threads stuck in the

  NRTCachingDirectory.openInput method.
 
 
 
  Thread dump and GC log attached.
 
 
 
  Are you familiar with this behavior? What can be the trigger for this?
 
 
 
  Thank you,
 
 
 
 
 
  *Regards,*
 
  *Moshe Recanati*
 
  *SVP Engineering*
 
  Office + 972-73-2617564
 
  Mobile  + 972-52-6194481
 
  Skype:  recanati
  [image: KMS2]
  http://finance.yahoo.com/news/kms-lighthouse-named-gartner-cool-121000184.html
 
  More at:  www.kmslh.com | LinkedIn
  http://www.linkedin.com/company/kms-lighthouse | FB 
  https://www.facebook.com/pages/KMS-lighthouse/123774257810917
 
 
 
 
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com




--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


highlighting the boolean query

2015-02-23 Thread Dmitry Kan
Hello!

In Solr 4.3.1 there seems to be some inconsistency in the highlighting of a
boolean query:

a OR (b c) OR d

This returns a proper hit, which shows that only d was included in the
document score calculation.

But the highlighter returns both d and c in <em> tags.

Is this a known issue of the standard highlighter? Can it be mitigated?


-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: Question on CloudSolrServer API

2015-02-23 Thread Shalin Shekhar Mangar
By default the max connections is set to 128 and max connections per host
is 32. You can configure an HttpClient as per your needs and pass it as a
parameter to CloudSolrServer's constructor.
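A sketch of doing that in SolrJ 4.x; the class and property names below are from the 4.x API as I recall them, and the pool limits and ZooKeeper addresses are illustrative:

```java
// Build an HttpClient with explicit pool limits and hand it to
// CloudSolrServer via an LBHttpSolrServer (SolrJ 4.x API sketch).
ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 256);          // default 128
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 64);  // default 32
HttpClient httpClient = HttpClientUtil.createClient(params);

LBHttpSolrServer lb = new LBHttpSolrServer(httpClient);
CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181", lb);
server.setDefaultCollection("collection1");
// Reuse this one instance for all requests; it is thread-safe.
```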

On Mon, Feb 23, 2015 at 3:49 PM, Manohar Sripada manohar...@gmail.com
wrote:

 Thanks for the response. How do I control the number of pooled
 connections in the SolrJ client? Also, what are the default values for
 the maximum number of connections?

 - Thanks

 On Thu, Feb 19, 2015 at 6:09 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

  No, you should reuse the same CloudSolrServer instance for all requests.
  It is a thread-safe object. You could also create a static/common
  HttpClient instance and pass it to the constructor of CloudSolrServer,
  but even if you don't, it will create one internally and use it for all
  requests so that connections can be pooled.
  On 19-Feb-2015 1:44 pm, Manohar Sripada manohar...@gmail.com wrote:
 
   Hi All,
  
   I am using CloudSolrServer API of SolrJ library from my application to
   query Solr. Here, I am creating a new connection to Solr for every
 search
   that I am doing. Once I got the results I am closing the connection.
  
   Is this the correct way? How does Solr create connections internally?
  Does
   it maintain a pool of connections (if so how to configure it)?
  
   Thanks,
   Manohar
  
 




-- 
Regards,
Shalin Shekhar Mangar.


Re: Question on CloudSolrServer API

2015-02-23 Thread Manohar Sripada
Thanks for the response. How do I control the number of pooled connections
in the SolrJ client? Also, what are the default values for the maximum
number of connections?

- Thanks

On Thu, Feb 19, 2015 at 6:09 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 No, you should reuse the same CloudSolrServer instance for all requests. It
 is a thread-safe object. You could also create a static/common HttpClient
 instance and pass it to the constructor of CloudSolrServer, but even if you
 don't, it will create one internally and use it for all requests so that
 connections can be pooled.
 On 19-Feb-2015 1:44 pm, Manohar Sripada manohar...@gmail.com wrote:

  Hi All,
 
  I am using CloudSolrServer API of SolrJ library from my application to
  query Solr. Here, I am creating a new connection to Solr for every search
  that I am doing. Once I got the results I am closing the connection.
 
  Is this the correct way? How does Solr create connections internally?
 Does
  it maintain a pool of connections (if so how to configure it)?
 
  Thanks,
  Manohar
 



CollationKeyFilterFactory stops suggestions and collations

2015-02-23 Thread Nitin Solanki
Hello all,
  I am working on collations. In the Solr docs I found that Unicode
collation makes searching fast. But after applying
CollationKeyFilterFactory in schema.xml, both suggestions and collations
stop. Please check the configurations below and help me.

*Schema.xml:*

<fieldType name="textSpell" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language=""
            strength="primary"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language=""
            strength="primary"/>
  </analyzer>
</fieldType>
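One thing that stands out in the field type above: the language attribute is empty. The UnicodeCollation wiki page shows it carrying a locale code, so a filled-in variant (the "en" value here is only an example) would look like:

```xml
<filter class="solr.CollationKeyFilterFactory" language="en"
        strength="primary"/>
```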


Solrconfig.xml:

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <!-- Solr will use suggestions from both the 'default' spellchecker
         and from the 'wordbreak' spellchecker and combine them.
         collations (re-written queries) can include a combination of
         corrections from both spellcheckers -->
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">10</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">100</str>
    <str name="spellcheck.maxCollationTries">1000</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
    <!-- <str>suggest</str> -->
    <!-- <str>query</str> -->
  </arr>
</requestHandler>


Atomic Update while having fields with attribute stored=true in schema

2015-02-23 Thread Rahul Bhooteshwar
Hi,
I have around 50 fields in my schema, of which 20 are stored="true" and
the rest are stored="false".
For partial updates (atomic updates), it is mentioned in many places that
the fields in the schema should have stored="true". I have also tried an
atomic update on documents having fields with stored="false" and
indexed="true", and it didn't work (my whole document vanished from Solr,
or at least I am unable to search it now). That is although I didn't
change the existing values of the fields having stored="false".

Which means I have to change all my fields to stored="true" if I want to
use atomic updates, right?
Will it affect the performance of Solr? If yes, what is the best practice
to reduce the performance degradation as much as possible? Thanks in
advance.

Thanks and Regards,
Rahul Bhooteshwar
Enterprise Software Engineer
HotWax Systems http://www.hotwaxsystems.com - The global leader in
innovative enterprise commerce solutions powered by Apache OFBiz.
ApacheCon US 2014 Silver Sponsor
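For context, an atomic update goes through the normal update endpoint; a minimal sketch follows (the URL, core name, and field names here are made up for illustration):

```shell
# Set one field; Solr re-reads all other stored fields and re-writes the
# whole document behind the scenes, which is why stored="true" matters.
curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "doc1", "price": {"set": 19.95}}]'
```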


Re: Atomic Update while having fields with attribute stored=true in schema

2015-02-23 Thread Yago Riveiro
Fields with stored=true have the downside of disk space: your index will
grow.

Maybe updating the whole document can be an option...









—
/Yago Riveiro

On Mon, Feb 23, 2015 at 1:02 PM, Rahul Bhooteshwar
rahul.bhootesh...@hotwaxsystems.com wrote:

 Hi Yago Riveiro,
 Thanks for your quick reply. I am using Solr for faceted search via
 *SolrJ*. I am using facet queries and filter queries. I am new to Solr,
 so I would like to know the best practice for handling such scenarios.
 Thanks and Regards,
 Rahul Bhooteshwar
 Enterprise Software Engineer
 HotWax Systems http://www.hotwaxsystems.com - The global leader in
 innovative enterprise commerce solutions powered by Apache OFBiz.
 ApacheCon US 2014 Silver Sponsor
 On Mon, Feb 23, 2015 at 5:42 PM, Yago Riveiro yago.rive...@gmail.com
 wrote:
 "Which means I have to change all my fields to stored="true" if I want to
 use atomic updates, right?"




 Yes, and re-index all your data.




 "Will it affect the performance of Solr?"




 What type of queries are you doing now?


 —
 /Yago Riveiro

 On Mon, Feb 23, 2015 at 12:05 PM, Rahul Bhooteshwar
 rahul.bhootesh...@hotwaxsystems.com wrote:

  Hi,
  I have around 50 fields in my schema, of which 20 are stored="true"
  and the rest are stored="false".
  For partial updates (atomic updates), it is mentioned in many places
  that the fields in the schema should have stored="true". I have also
  tried an atomic update on documents having fields with stored="false"
  and indexed="true", and it didn't work (my whole document vanished
  from Solr, or at least I am unable to search it now). That is although
  I didn't change the existing values of the fields having
  stored="false".
  Which means I have to change all my fields to stored="true" if I want
  to use atomic updates, right?
  Will it affect the performance of Solr? If yes, what is the best
  practice to reduce the performance degradation as much as possible?
  Thanks in advance.
  Thanks and Regards,
  Rahul Bhooteshwar
  Enterprise Software Engineer
  HotWax Systems http://www.hotwaxsystems.com - The global leader in
  innovative enterprise commerce solutions powered by Apache OFBiz.
  ApacheCon US 2014 Silver Sponsor


Re: Atomic Update while having fields with attribute stored=true in schema

2015-02-23 Thread Yago Riveiro
"Which means I have to change all my fields to stored="true" if I want to
use atomic updates, right?"




Yes, and re-index all your data.




"Will it affect the performance of Solr?"




What type of queries are you doing now?


—
/Yago Riveiro

On Mon, Feb 23, 2015 at 12:05 PM, Rahul Bhooteshwar
rahul.bhootesh...@hotwaxsystems.com wrote:

 Hi,
 I have around 50 fields in my schema, of which 20 are stored="true" and
 the rest are stored="false".
 For partial updates (atomic updates), it is mentioned in many places
 that the fields in the schema should have stored="true". I have also
 tried an atomic update on documents having fields with stored="false"
 and indexed="true", and it didn't work (my whole document vanished from
 Solr, or at least I am unable to search it now). That is although I
 didn't change the existing values of the fields having stored="false".
 Which means I have to change all my fields to stored="true" if I want to
 use atomic updates, right?
 Will it affect the performance of Solr? If yes, what is the best
 practice to reduce the performance degradation as much as possible?
 Thanks in advance.
 Thanks and Regards,
 Rahul Bhooteshwar
 Enterprise Software Engineer
 HotWax Systems http://www.hotwaxsystems.com - The global leader in
 innovative enterprise commerce solutions powered by Apache OFBiz.
 ApacheCon US 2014 Silver Sponsor

Re: Solr 4.x to Solr 5 = org.noggit.JSONParser$ParseException

2015-02-23 Thread Alan Woodward
I think this means you've got an older version of noggit around.  You need 
version 0.6.

Alan Woodward
www.flax.co.uk
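If the project is built with Maven (an assumption; adjust to your build tool), pinning noggit would look like:

```xml
<dependency>
  <groupId>org.noggit</groupId>
  <artifactId>noggit</artifactId>
  <version>0.6</version>
</dependency>
```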


On 23 Feb 2015, at 13:00, Clemens Wyss DEV wrote:

 Just about to upgrade to Solr5. My UnitTests fail:
 13:50:41.178 [main] ERROR org.apache.solr.core.CoreContainer - Error creating 
 core [1-de_CH]: null
 java.lang.ExceptionInInitializerError: null
   at 
 org.apache.solr.core.SolrConfig.getConfigOverlay(SolrConfig.java:359) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.SolrConfig.getOverlay(SolrConfig.java:808) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at 
 org.apache.solr.core.SolrConfig.getSubstituteProperties(SolrConfig.java:798) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.Config.init(Config.java:152) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.Config.init(Config.java:92) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.SolrConfig.init(SolrConfig.java:180) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at 
 org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:158) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at 
 org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:80)
  ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at 
 org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:61) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:511) 
 [solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488) 
 [solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at 
 ch.mysign.search.solr.EmbeddedSolrMode.prepareCore(EmbeddedSolrMode.java:51) 
 [target/:na]
 ...
   at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
  [.cp/:na]
 Caused by: org.noggit.JSONParser$ParseException: Expected string: 
 char=u,position=2 BEFORE='{ u' AFTER='pdateHandler : { autoCo'
   at org.noggit.JSONParser.err(JSONParser.java:223) ~[noggit.jar:na]
   at org.noggit.JSONParser.nextEvent(JSONParser.java:671) ~[noggit.jar:na]
   at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:123) 
 ~[noggit.jar:na]
   at org.apache.solr.core.ConfigOverlay.clinit(ConfigOverlay.java:213) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   ... 56 common frames omitted
 
  Looks like the exception occurs in the ConfigOverlay static block, line 213:
 editable_prop_map =  (Map)new ObjectBuilder(new JSONParser(new StringReader(
  MAPPING))).getObject();
 
 What is happening?



Re: CollationKeyFilterFactory stops suggestions and collations

2015-02-23 Thread Nitin Solanki
Hi all,
I have found that I need to use Unicode collation, which requires
*lucene-collation-2.9.1.jar*. I am using Solr 4.10.2 and have downloaded
lucene-collation-2.9.1.jar. Where do I have to put it, or is it already
built into Solr?
If it is already in Solr, then why are suggestions and collations not
coming back? Any help, please?


On Mon, Feb 23, 2015 at 4:43 PM, Nitin Solanki nitinml...@gmail.com wrote:

 Hello all,
   I am working on collations. In the Solr docs I found that Unicode
 collation makes searching fast. But after applying
 CollationKeyFilterFactory in schema.xml, both suggestions and collations
 stop. Please check the configurations below and help me.

 *Schema.xml:*

 <fieldType name="textSpell" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.CollationKeyFilterFactory" language=""
             strength="primary"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.CollationKeyFilterFactory" language=""
             strength="primary"/>
   </analyzer>
 </fieldType>


 Solrconfig.xml:

 <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
   <lst name="defaults">
     <str name="df">gram_ci</str>
     <!-- Solr will use suggestions from both the 'default' spellchecker
          and from the 'wordbreak' spellchecker and combine them.
          collations (re-written queries) can include a combination of
          corrections from both spellcheckers -->
     <str name="spellcheck.dictionary">default</str>
     <str name="spellcheck">on</str>
     <str name="spellcheck.extendedResults">true</str>
     <str name="spellcheck.count">25</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.maxResultsForSuggest">10</str>
     <str name="spellcheck.alternativeTermCount">25</str>
     <str name="spellcheck.collate">true</str>
     <str name="spellcheck.maxCollations">100</str>
     <str name="spellcheck.maxCollationTries">1000</str>
     <str name="spellcheck.collateExtendedResults">true</str>
   </lst>
   <arr name="last-components">
     <str>spellcheck</str>
     <!-- <str>suggest</str> -->
     <!-- <str>query</str> -->
   </arr>
 </requestHandler>



Solr 4.x to Solr 5 = org.noggit.JSONParser$ParseException

2015-02-23 Thread Clemens Wyss DEV
Just about to upgrade to Solr5. My UnitTests fail:
13:50:41.178 [main] ERROR org.apache.solr.core.CoreContainer - Error creating 
core [1-de_CH]: null
java.lang.ExceptionInInitializerError: null
at 
org.apache.solr.core.SolrConfig.getConfigOverlay(SolrConfig.java:359) 
~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at org.apache.solr.core.SolrConfig.getOverlay(SolrConfig.java:808) 
~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at 
org.apache.solr.core.SolrConfig.getSubstituteProperties(SolrConfig.java:798) 
~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at org.apache.solr.core.Config.init(Config.java:152) 
~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at org.apache.solr.core.Config.init(Config.java:92) 
~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at org.apache.solr.core.SolrConfig.init(SolrConfig.java:180) 
~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at 
org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:158) 
~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at 
org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:80)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at 
org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:61) 
~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:511) 
[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488) 
[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
at 
ch.mysign.search.solr.EmbeddedSolrMode.prepareCore(EmbeddedSolrMode.java:51) 
[target/:na]
...
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
 [.cp/:na]
Caused by: org.noggit.JSONParser$ParseException: Expected string: 
char=u,position=2 BEFORE='{ u' AFTER='pdateHandler : { autoCo'
at org.noggit.JSONParser.err(JSONParser.java:223) ~[noggit.jar:na]
at org.noggit.JSONParser.nextEvent(JSONParser.java:671) ~[noggit.jar:na]
at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:123) 
~[noggit.jar:na]
at org.apache.solr.core.ConfigOverlay.clinit(ConfigOverlay.java:213) 
~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
... 56 common frames omitted

Looks like the exception occurs in the ConfigOverlay static block, line 213:
editable_prop_map =  (Map)new ObjectBuilder(new JSONParser(new StringReader(
  MAPPING))).getObject();

What is happening?


Re: Atomic Update while having fields with attribute stored=true in schema

2015-02-23 Thread Rahul Bhooteshwar
Hi Yago Riveiro,
Thanks for your quick reply. I am using Solr for faceted search via
*SolrJ*. I am using facet queries and filter queries. I am new to Solr, so
I would like to know the best practice for handling such scenarios.

Thanks and Regards,
Rahul Bhooteshwar
Enterprise Software Engineer
HotWax Systems http://www.hotwaxsystems.com - The global leader in
innovative enterprise commerce solutions powered by Apache OFBiz.
ApacheCon US 2014 Silver Sponsor

On Mon, Feb 23, 2015 at 5:42 PM, Yago Riveiro yago.rive...@gmail.com
wrote:

 "Which means I have to change all my fields to stored="true" if I want
 to use atomic updates, right?"




 Yes, and re-index all your data.




 "Will it affect the performance of Solr?"




 What type of queries are you doing now?


 —
 /Yago Riveiro

 On Mon, Feb 23, 2015 at 12:05 PM, Rahul Bhooteshwar
 rahul.bhootesh...@hotwaxsystems.com wrote:

  Hi,
  I have around 50 fields in my schema, of which 20 are stored="true"
  and the rest are stored="false".
  For partial updates (atomic updates), it is mentioned in many places
  that the fields in the schema should have stored="true". I have also
  tried an atomic update on documents having fields with stored="false"
  and indexed="true", and it didn't work (my whole document vanished
  from Solr, or at least I am unable to search it now). That is although
  I didn't change the existing values of the fields having
  stored="false".
  Which means I have to change all my fields to stored="true" if I want
  to use atomic updates, right?
  Will it affect the performance of Solr? If yes, what is the best
  practice to reduce the performance degradation as much as possible?
  Thanks in advance.
  Thanks and Regards,
  Rahul Bhooteshwar
  Enterprise Software Engineer
  HotWax Systems http://www.hotwaxsystems.com - The global leader in
  innovative enterprise commerce solutions powered by Apache OFBiz.
  ApacheCon US 2014 Silver Sponsor



Re: Solr 4.x to Solr 5 = org.noggit.JSONParser$ParseException

2015-02-23 Thread Noble Paul
This code is executed every time Solr is initialized and it is unlikely
that it is a bug.
Are you using an older version of noggit.jar by any chance?


On Mon, Feb 23, 2015 at 6:30 PM, Clemens Wyss DEV clemens...@mysign.ch
wrote:

 Just about to upgrade to Solr5. My UnitTests fail:
 13:50:41.178 [main] ERROR org.apache.solr.core.CoreContainer - Error
 creating core [1-de_CH]: null
 java.lang.ExceptionInInitializerError: null
 at
 org.apache.solr.core.SolrConfig.getConfigOverlay(SolrConfig.java:359)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at org.apache.solr.core.SolrConfig.getOverlay(SolrConfig.java:808)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at
 org.apache.solr.core.SolrConfig.getSubstituteProperties(SolrConfig.java:798)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at org.apache.solr.core.Config.init(Config.java:152)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at org.apache.solr.core.Config.init(Config.java:92)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at org.apache.solr.core.SolrConfig.init(SolrConfig.java:180)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at
 org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:158)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at
 org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:80)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at
 org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:61)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at
 org.apache.solr.core.CoreContainer.create(CoreContainer.java:511)
 [solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at
 org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
 [solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 at
 ch.mysign.search.solr.EmbeddedSolrMode.prepareCore(EmbeddedSolrMode.java:51)
 [target/:na]
 ...
 at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
 [.cp/:na]
 Caused by: org.noggit.JSONParser$ParseException: Expected string:
 char=u,position=2 BEFORE='{ u' AFTER='pdateHandler : { autoCo'
 at org.noggit.JSONParser.err(JSONParser.java:223) ~[noggit.jar:na]
 at org.noggit.JSONParser.nextEvent(JSONParser.java:671)
 ~[noggit.jar:na]
 at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:123)
 ~[noggit.jar:na]
 at
 org.apache.solr.core.ConfigOverlay.clinit(ConfigOverlay.java:213)
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
 ... 56 common frames omitted

 Looks like the exception occurs in the ConfigOverlay static block, line 213:
 editable_prop_map =  (Map)new ObjectBuilder(new JSONParser(new
 StringReader(
   MAPPING))).getObject();

 What is happening?




-- 
-
Noble Paul


Stop solr query

2015-02-23 Thread Moshe Recanati
Hi,
Recently there were scenarios in which queries that users sent to Solr got
stuck and increased our Solr heap.
Is there any option to kill, or time out via an external command, a query
that hasn't returned from Solr?

Thank you,
Regards,
Moshe Recanati
SVP Engineering
Office + 972-73-2617564
Mobile  + 972-52-6194481
Skype:  recanati
[KMS2]http://finance.yahoo.com/news/kms-lighthouse-named-gartner-cool-121000184.html
More at:  www.kmslh.comhttp://www.kmslh.com/ | 
LinkedInhttp://www.linkedin.com/company/kms-lighthouse | 
FBhttps://www.facebook.com/pages/KMS-lighthouse/123774257810917




incorrect Java version reported in solr dashboard

2015-02-23 Thread SolrUser1543
I have upgraded the Java version from 1.7 to 1.8 on a Linux server.
After the upgrade, if I run `java -version` I can see that it really
changed to the new one.

But when I run Solr, it still reports the old version in the dashboard's
JVM section.

What could be the reason? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/incorrect-Java-version-reported-in-solr-dashboard-tp4188236.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: incorrect Java version reported in solr dashboard

2015-02-23 Thread Michael Della Bitta
You're probably launching Solr using the older version of Java somehow. You
should make sure your PATH and JAVA_HOME variables point at your Java 8
install from the point of view of the script or configuration that launches
Solr.
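A minimal sketch, assuming a typical Linux layout (the JDK path below is an example; substitute the actual location of your Java 8 install):

```shell
# Make the shell (or startup script) that launches Solr see the new JDK.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export PATH="$JAVA_HOME/bin:$PATH"
```

Afterwards, run `java -version` in that same shell to confirm it now reports 1.8, then restart Solr from it.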

Hope that helps.

Michael Della Bitta

Senior Software Engineer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Mon, Feb 23, 2015 at 9:19 AM, SolrUser1543 osta...@gmail.com wrote:

 I have upgraded the Java version from 1.7 to 1.8 on a Linux server.
 After the upgrade, if I run `java -version` I can see that it really
 changed to the new one.

 But when I run Solr, it still reports the old version in the dashboard's
 JVM section.

 What could be the reason?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/incorrect-Java-version-reported-in-solr-dashboard-tp4188236.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Used CollationKeyFilterFactory, Seems not to be working

2015-02-23 Thread Ahmet Arslan
Hi Nitin,


How can you pass an empty value to the language attribute?
Is this intentional?

What is your intention in using that filter with the suggestion
functionality?

Ahmet

On Monday, February 23, 2015 5:03 PM, Nitin Solanki nitinml...@gmail.com 
wrote:



Hi,
  I have integrated CollationKeyFilterFactory in schema.xml and re-indexed
the data.

<filter class="solr.CollationKeyFilterFactory" language=""
        strength="primary"/>

I need to use this because I want to build collations fast.
Referred link: http://wiki.apache.org/solr/UnicodeCollation

But it stops both suggestions and collations. *Why?*

I have also tested *CollationKeyFilterFactory* in the Solr admin Analysis
page. There, CKF shows some Chinese-looking output.

*Please, any help?*


Re: Stop solr query

2015-02-23 Thread Shawn Heisey
On 2/23/2015 7:23 AM, Moshe Recanati wrote:
 Recently there were scenarios in which queries that users sent to
 Solr got stuck and increased our Solr heap.

 Is there any option to kill, or time out via an external command, a
 query that hasn't returned from Solr?


The best thing you can do is examine all user input and stop such
queries before they execute, especially if they are the kind of query
that will cause your heap to grow out of control.

The timeAllowed parameter can abort a query that takes too long in
certain phases of the query.  In recent months, Solr has been modified
so that timeAllowed will take effect during more query phases.  It is
not a perfect solution, but it can be better than nothing.

http://wiki.apache.org/solr/CommonQueryParameters#timeAllowed
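As an illustration (URL and collection name are made up), the parameter is simply added to the request:

```shell
# Abort work after ~2 seconds in the query phases that honor timeAllowed;
# partial results may be returned and flagged in the response header.
curl 'http://localhost:8983/solr/collection1/select?q=*:*&timeAllowed=2000'
```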

Be aware that sometimes legitimate queries will be slow, and using
timeAllowed may cause those queries to fail.

Thanks,
Shawn



[ANNOUNCE] Luke 4.10.3 released

2015-02-23 Thread Dmitry Kan
Hello,

Luke 4.10.3 has been released. Download it here:

https://github.com/DmitryKey/luke/releases/tag/luke-4.10.3

The release has been tested against the solr-4.10.3 based index.

Issues fixed in this release: #13
https://github.com/DmitryKey/luke/pull/13
Apache License 2.0 abbreviation changed from ASL 2.0 to ALv2

Thanks to respective contributors!


P.S. waiting for lucene 5.0 artifacts to hit public maven repositories for
the next major release of luke.

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Used CollationKeyFilterFactory, Seems not to be working

2015-02-23 Thread Nitin Solanki
Hi,
  I have integrated CollationKeyFilterFactory in schema.xml and re-indexed
the data.

<filter class="solr.CollationKeyFilterFactory" language=""
        strength="primary"/>

I need to use this because I want to build collations fast.
Referred link: http://wiki.apache.org/solr/UnicodeCollation

But it stops both suggestions and collations. *Why?*

I have also tested *CollationKeyFilterFactory* in the Solr admin Analysis
page. There, CKF shows some Chinese-looking output.

*Please, any help?*


AW: Solr 4.x to Solr 5 = org.noggit.JSONParser$ParseException

2015-02-23 Thread Clemens Wyss DEV
Bingo!  thx for the hint

-Ursprüngliche Nachricht-
Von: Alan Woodward [mailto:a...@flax.co.uk] 
Gesendet: Montag, 23. Februar 2015 15:00
An: solr-user@lucene.apache.org
Betreff: Re: Solr 4.x to Solr 5 = org.noggit.JSONParser$ParseException

I think this means you've got an older version of noggit around.  You need 
version 0.6.

Alan Woodward
www.flax.co.uk


On 23 Feb 2015, at 13:00, Clemens Wyss DEV wrote:

 Just about to upgrade to Solr5. My UnitTests fail:
 13:50:41.178 [main] ERROR org.apache.solr.core.CoreContainer - Error 
 creating core [1-de_CH]: null
 java.lang.ExceptionInInitializerError: null
   at 
 org.apache.solr.core.SolrConfig.getConfigOverlay(SolrConfig.java:359) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.SolrConfig.getOverlay(SolrConfig.java:808) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at 
 org.apache.solr.core.SolrConfig.getSubstituteProperties(SolrConfig.java:798) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.Config.init(Config.java:152) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.Config.init(Config.java:92) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.SolrConfig.init(SolrConfig.java:180) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at 
 org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:158) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at 
 org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:80)
  ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at 
 org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:61) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:511) 
 [solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488) 
 [solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   at 
 ch.mysign.search.solr.EmbeddedSolrMode.prepareCore(EmbeddedSolrMode.java:51) 
 [target/:na] ...
   at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
  [.cp/:na] Caused by: org.noggit.JSONParser$ParseException: Expected string: 
 char=u,position=2 BEFORE='{ u' AFTER='pdateHandler : { autoCo'
   at org.noggit.JSONParser.err(JSONParser.java:223) ~[noggit.jar:na]
   at org.noggit.JSONParser.nextEvent(JSONParser.java:671) ~[noggit.jar:na]
   at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:123) 
 ~[noggit.jar:na]
   at org.apache.solr.core.ConfigOverlay.clinit(ConfigOverlay.java:213) 
 ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
   ... 56 common frames omitted
 
 Looks like the exception occurs in the ConfigOverlay static block, line 213:
 editable_prop_map = (Map) new ObjectBuilder(new JSONParser(new StringReader(
  MAPPING))).getObject();
 
 What is happening?



Re: Strange search behaviour when upgrading to 4.10.3

2015-02-23 Thread Rishi Easwaran
Thanks Shawn.
Just ran the analysis between 4.6 and 4.10; there seems to be only one
difference between the outputs: the positionLength value is set in 4.10. Does
that mean anything?

Version 4.10 (SF):

  text=message  raw_bytes=[6d 65 73 73 61 67 65]  start=0  end=7
  positionLength=1  type=ALNUM  position=1

Version 4.6 (SF):

  text=message  raw_bytes=[6d 65 73 73 61 67 65]  type=ALNUM  start=0
  end=7  position=1

Thanks,
Rishi.


 

-Original Message-
From: Shawn Heisey apa...@elyograg.org
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Feb 20, 2015 6:51 pm
Subject: Re: Strange search behaviour when upgrading to 4.10.3


On 2/20/2015 4:24 PM, Rishi Easwaran wrote:
 Also, the tokenizer we use is very similar to the following.
 ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java
 ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex


 From the looks of it the text is being indexed as a single token and not 
broken across whitespace. 

I can't claim to know how analyzer code works.  I did manage to see the
code, but it doesn't mean much to me.

I would suggest using the analysis tab in the Solr admin interface.  On
that page, select the field or fieldType, set the verbose flag and
type the actual field contents into the index side of the page.  When
you click the Analyze Values button, it will show you what Solr does
with the input at index time.

Do you still have access to any machines (dev or otherwise) running the
old version with the custom component? If so, do the same things on the
analysis page for that version that you did on the new version, and see
whether it does something different.  If it does do something different,
then you will need to track down the problem in the code for your custom
analyzer.

Thanks,
Shawn


 


Is Solr best for did you mean functionality just like Google?

2015-02-23 Thread Nitin Solanki
Hello,
  I have run into a hard problem. I want to implement spell/query
correction. I have 49 GB of indexed data on which I have applied the
spellchecker. I want to do the same as Google: *did you mean*.
*Example* - If a user types a question/query that is misspelled or
mistyped, I need to give them a suggestion, like "Did you mean ...".
Is Solr a good fit for this?


Warm Regards,
Nitin Solanki


Re: Collations are not working fine.

2015-02-23 Thread Nitin Solanki
Hi Charles,
 How did you patch the suggester to get frequency information in
the spellcheck response?
That sounds very useful; I would like to do the same.


On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles 
charles.reit...@tiaa-cref.org wrote:

 I have been working with collations the last couple days and I kept adding
 the collation-related parameters until it started working for me.   It
 seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>.

 But, I am using the Suggester with the WFSTLookupFactory.

 Also, I needed to patch the suggester to get frequency information in the
 spellcheck response.

 -Original Message-
 From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
 Sent: Friday, February 13, 2015 3:48 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Collations are not working fine.

 Hi Nitin,

 Can you try with the below config? We have these configs and they seem to be
 working for us.

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">text_general</str>


   <lst name="spellchecker">
 <str name="name">wordbreak</str>
 <str name="classname">solr.WordBreakSolrSpellChecker</str>
 <str name="field">textSpell</str>
 <str name="combineWords">true</str>
 <str name="breakWords">false</str>
 <int name="maxChanges">5</int>
   </lst>

<lst name="spellchecker">
 <str name="name">default</str>
 <str name="field">textSpell</str>
 <str name="classname">solr.IndexBasedSpellChecker</str>
 <str name="spellcheckIndexDir">./spellchecker</str>
 <str name="accuracy">0.75</str>
 <float name="thresholdTokenFrequency">0.01</float>
 <str name="buildOnCommit">true</str>
 <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>


   </searchComponent>



 <str name="spellcheck">true</str>
 <str name="spellcheck.dictionary">default</str>
 <str name="spellcheck.dictionary">wordbreak</str>
 <int name="spellcheck.count">5</int>
 <str name="spellcheck.alternativeTermCount">15</str>
 <str name="spellcheck.collate">true</str>
 <str name="spellcheck.onlyMorePopular">false</str>
 <str name="spellcheck.extendedResults">true</str>
 <str name="spellcheck.maxCollations">100</str>
 <str name="spellcheck.collateParam.mm">100%</str>
 <str name="spellcheck.collateParam.q.op">AND</str>
 <str name="spellcheck.maxCollationTries">1000</str>


 *Rajesh.*

 On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com
 
 wrote:

  Nitin,
 
  Can you post the full spellcheck response when you query:
 
  q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
 
  James Dyer
  Ingram Content Group
 
 
  -Original Message-
  From: Nitin Solanki [mailto:nitinml...@gmail.com]
  Sent: Friday, February 13, 2015 1:05 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Collations are not working fine.
 
  Hi James Dyer,
    I did the same as you told me and used
  WordBreakSolrSpellChecker instead of shingles. But collations
  are still not coming or working.
  For instance, I tried to get a collation of "gone with the wind" by
  searching "gone wthh thes wint" on field=gram_ci but didn't succeed.
  Even so, I am getting the suggestions of wthh as *with*, thes as *the*,
 wint as *wind*.
  Also, I have documents in which "gone with the wind" occurs 167
  times. I don't know whether I am missing something or not.
  Please check my solr configuration below:
 
  *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes
  wint"&wt=json&indent=true&shards.qt=/spell
 
  *solrconfig.xml:*
 
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
  </searchComponent>

  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>

Re: syntax for increasing java memory

2015-02-23 Thread Walter Underwood
That depends on the JVM you are using. For the Oracle JVMs, use this to get a 
list of extended options:

java -X

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Feb 23, 2015, at 8:21 AM, Kevin Laurie superinterstel...@gmail.com wrote:

 Hi Guys,
 I am a newbie on Solr and I am just using it for dovecot sake.
 Could you help advise the correct syntax to increase java heap size using
 the  -xmx option(or advise some easy-to-read literature for configuring) ?
 Much appreciate if you could help. I just need this to sort out the problem
 with my Dovecot FTS.
 Thanks
 Kevin



Re: syntax for increasing java memory

2015-02-23 Thread Kevin Laurie
Hi Walter
Got it.
java -Xmx1024m -jar start.jar
Thanks
Kevin
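The ordering is the key detail here: the `java` launcher treats everything after the jar file name as arguments to the application (Jetty, in this case), not to the JVM, so a trailing `-Xmx` is silently ignored. A sketch of the distinction (heap sizes are illustrative):

```shell
# JVM flags must come before -jar; anything after "start.jar" is handed
# to the application, not the JVM.
correct='java -Xms512m -Xmx1024m -jar start.jar'
wrong='java -jar start.jar -Xmx1024m'   # -Xmx is ignored in this position
echo "$correct"
```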

On Tue, Feb 24, 2015 at 1:00 AM, Kevin Laurie superinterstel...@gmail.com
wrote:

 Hi Walter,

 I am running :-
 Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_65 24.65-b04)

 I tried running with this command:-

 java -jar start.jar -Xmx1024m
 WARNING: System properties and/or JVM args set.  Consider using --dry-run
 or --exec
 0[main] INFO  org.eclipse.jetty.server.Server  ? jetty-8.1.10.v20130312
 61   [main] INFO  org.eclipse.jetty.deploy.providers.ScanningAppProvider
 ? Deployment monitor /opt/solr/contexts at interval 0

 Still getting 500m.

 Any advice? Will check java -X out.


 On Tue, Feb 24, 2015 at 12:49 AM, Walter Underwood wun...@wunderwood.org
 wrote:

 That depends on the JVM you are using. For the Oracle JVMs, use this to
 get a list of extended options:

 java -X

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)


 On Feb 23, 2015, at 8:21 AM, Kevin Laurie superinterstel...@gmail.com
 wrote:

  Hi Guys,
  I am a newbie on Solr and I am just using it for dovecot sake.
  Could you help advise the correct syntax to increase java heap size
 using
  the  -xmx option(or advise some easy-to-read literature for
 configuring) ?
  Much appreciate if you could help. I just need this to sort out the
 problem
  with my Dovecot FTS.
  Thanks
  Kevin





Re: Used CollationKeyFilterFactory, Seems not to be working

2015-02-23 Thread Ahmet Arslan
Hi Nitin,

I think that token filter factory has nothing to do with
collations in the spellchecker domain. A single term appearing in different
domains is causing confusion.


solr.CollationKeyFilterFactory is mainly intended for locale-sensitive sorting.
For example, I used the type below to fix a sorting problem with Turkish strings.

<fieldType name="collatedTURKISH" class="solr.CollationField" language="tr"/>

Ahmet

 




On Monday, February 23, 2015 6:18 PM, Nitin Solanki nitinml...@gmail.com 
wrote:
Hi Ahmet,
 language="" means that it is used for any language -
"simply define the language as the empty string" for most languages.

*Intention:* I am working on spell/question correction. Just like Google, I
want to do the same "did you mean".

Using the spellchecker, I get both suggestions and collations. But the
collations are not coming as I expected. The reason is
spellcheck.maxCollationTries: if I set
spellcheck.maxCollationTries=10 then it gives about 10 results.
Sometimes the expected collation doesn't come within those 10 collations. So I
increased the value to 16000 and the results come, but it takes around 15 sec.
on 49 GB of indexed data. That is the worst case. Then, somewhere in Solr, I
found *unicode collation*, which says it builds collations fast.
Is it fast? Or am I doing something wrong with collations?


On Mon, Feb 23, 2015 at 9:12 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Nitin,


 How can you pass empty value to the language attribute?
 Is this intentional?

 What is your intention to use that filter with suggestion functionality?

 Ahmet

 On Monday, February 23, 2015 5:03 PM, Nitin Solanki nitinml...@gmail.com
 wrote:



  Hi,
    I have integrated CollationKeyFilterFactory in schema.xml and re-indexed
  the data.

  *<filter class="solr.CollationKeyFilterFactory" language=""
  strength="primary"/>*

  I need to use this because I want to build collations fast.
  Referred link: http://wiki.apache.org/solr/UnicodeCollation

  But it stops both suggestions and collations. *Why?*

  I have also tested *CollationKeyFilterFactory* in the Solr admin analysis
  page. There, CKF shows some Chinese-looking output.

  *Please, any help?*



Re: Collations are not working fine.

2015-02-23 Thread Rajesh Hazari
Hi,

we have used the spellcheck component with the below configs to get the best
collation (exact collation) when a query has either a single term or multiple
terms.

As Charles mentioned above, we do have a check on getOriginalFrequency()
for each term in our service before we send the spellcheck response to the
client; this may not be the case for you. Hope this helps.

<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
 will be overridden by parameters in the request
  -->
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">100</int>
<str name="df">textSpell</str>
 <str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<int name="spellcheck.count">5</int>
*<str name="spellcheck.alternativeTermCount">15</str>*
*<str name="spellcheck.collate">true</str>*
*<str name="spellcheck.onlyMorePopular">false</str>*
*<str name="spellcheck.extendedResults">true</str>*
*<str name="spellcheck.maxCollations">100</str>*
*<str name="spellcheck.collateParam.mm">100%</str>*
*<str name="spellcheck.collateParam.q.op">AND</str>*
*<str name="spellcheck.maxCollationTries">1000</str>*
<str name="q.op">OR</str>
.
.
..   </lst> </requestHandler>
.
.
.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

 <lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">solr.WordBreakSolrSpellChecker</str>
<str name="field">textSpell</str>
<str name="combineWords">true</str>
<str name="breakWords">false</str>
<int name="maxChanges">5</int>
  </lst>

   <lst name="spellchecker">
<str name="name">default</str>
<str name="field">textSpell</str>
<str name="classname">solr.IndexBasedSpellChecker</str>
<!-- <str name="classname">solr.DirectSolrSpellChecker</str> -->
<str name="spellcheckIndexDir">./spellchecker</str>
<!-- <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str> -->
<str name="accuracy">0.75</str>
<float name="thresholdTokenFrequency">0.01</float>
<str name="buildOnCommit">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
 </lst>


  </searchComponent>



*Rajesh**.*

On Fri, Feb 20, 2015 at 8:42 AM, Nitin Solanki nitinml...@gmail.com wrote:

 How to get only the best collations whose hits are more and need to sort
 them?

 On Wed, Feb 18, 2015 at 3:53 AM, Reitzel, Charles 
 charles.reit...@tiaa-cref.org wrote:

  Hi Nitin,
 
  I was trying many different options for a couple different queries.   In
  fact, I have collations working ok now with the Suggester and WFSTLookup.
   The problem may have been due to a different dictionary and/or lookup
  implementation and the specific options I was sending.
 
  In general, we're using spellcheck for search suggestions.   The
 Suggester
  component (vs. Suggester spellcheck implementation), doesn't handle all
 of
  our cases.  But we can get things working using the spellcheck interface.
  What gives us particular troubles are the cases where a term may be valid
  by itself, but also be the start of longer words.
 
  The specific terms are acronyms specific to our business.   But I'll
  attempt to show generic examples.
 
  E.g. a partial term like fo can expand to fox, fog, etc. and a full
 term
  like brown can also expand to something like brownstone.   And, yes, the
  collation brownstone fox is nonsense.  But assume, for the sake of
  argument, it appears in our documents somewhere.
 
  For multiple term query with a spelling error (or partially typed term):
  brown fo
 
  We get collations in order of hits, descending like ...
  brown fox,
  brown fog,
  brownstone fox.
 
  So far, so good.
 
  For a single term query, brown, we get a single suggestion, brownstone
 and
  no collations.
 
  So, we don't know to keep the term brown!
 
  At this point, we need spellcheck.extendedResults=true and look at the
  origFreq value in the suggested corrections.  Unfortunately, the
 Suggester
  (spellcheck dictionary) does not populate the original frequency
  information.  And, without this information, the SpellCheckComponent
 cannot
  format the extended results.
 
  However, with a simple change to Suggester.java, it was easy to get the
  needed frequency information use it to make a sound decision to keep or
  drop the input term.   But I'd be much obliged if there is a better way
 to
  go about it.
 
  Configs below.
 
  Thanks,
  Charlie
 
  <!-- SpellCheck component -->
    <searchComponent class="solr.SpellCheckComponent" name="suggestSC">
      <lst name="spellchecker">
        <str name="name">suggestDictionary</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
        <str name="field">text_all</str>
        <float name="threshold">0.0001</float>
        <str name="exactMatchFirst">true</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>

  <!-- Request Handler -->
  <requestHandler name="/tcSuggest" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="title">Search

syntax for increasing java memory

2015-02-23 Thread Kevin Laurie
Hi Guys,
 I am a newbie on Solr and I am just using it for Dovecot's sake.
Could you help advise the correct syntax to increase the Java heap size using
the -Xmx option (or suggest some easy-to-read literature on configuring it)?
Much appreciated if you could help. I just need this to sort out the problem
with my Dovecot FTS.
Thanks
Kevin


Re: highlighting the boolean query

2015-02-23 Thread Dmitry Kan
Erick,

nope, we are using std lucene qparser with some customizations, that do not
affect the boolean query parsing logic.

Should we try some other highlighter?

On Mon, Feb 23, 2015 at 6:57 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Are you using edismax?

 On Mon, Feb 23, 2015 at 3:28 AM, Dmitry Kan solrexp...@gmail.com wrote:
  Hello!
 
  In Solr 4.3.1 there seems to be some inconsistency with the highlighting
 of
  the boolean query:
 
  a OR (b c) OR d
 
  This returns a proper hit, which shows that only d was included into the
  document score calculation.
 
  But the highlighter returns both d and c in <em> tags.
 
  Is this a known issue of the standard highlighter? Can it be mitigated?
 
 
  --
  Dmitry Kan
  Luke Toolbox: http://github.com/DmitryKey/luke
  Blog: http://dmitrykan.blogspot.com
  Twitter: http://twitter.com/dmitrykan
  SemanticAnalyzer: www.semanticanalyzer.info




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: highlighting the boolean query

2015-02-23 Thread Erick Erickson
Are you using edismax?

On Mon, Feb 23, 2015 at 3:28 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Hello!

 In Solr 4.3.1 there seems to be some inconsistency with the highlighting of
 the boolean query:

 a OR (b c) OR d

 This returns a proper hit, which shows that only d was included into the
 document score calculation.

 But the highlighter returns both d and c in <em> tags.

 Is this a known issue of the standard highlighter? Can it be mitigated?


 --
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info


Re: Used CollationKeyFilterFactory, Seems not to be working

2015-02-23 Thread Nitin Solanki
Hi Ahmet,
 language="" means that it is used for any language -
"simply define the language as the empty string" for most languages.

*Intention:* I am working on spell/question correction. Just like Google, I
want to do the same "did you mean".

Using the spellchecker, I get both suggestions and collations. But the
collations are not coming as I expected. The reason is
spellcheck.maxCollationTries: if I set
spellcheck.maxCollationTries=10 then it gives about 10 results.
Sometimes the expected collation doesn't come within those 10 collations. So I
increased the value to 16000 and the results come, but it takes around 15 sec.
on 49 GB of indexed data. That is the worst case. Then, somewhere in Solr, I
found *unicode collation*, which says it builds collations fast.
Is it fast? Or am I doing something wrong with collations?

On Mon, Feb 23, 2015 at 9:12 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Nitin,


 How can you pass empty value to the language attribute?
 Is this intentional?

 What is your intention to use that filter with suggestion functionality?

 Ahmet

 On Monday, February 23, 2015 5:03 PM, Nitin Solanki nitinml...@gmail.com
 wrote:



  Hi,
    I have integrated CollationKeyFilterFactory in schema.xml and re-indexed
  the data.

  *<filter class="solr.CollationKeyFilterFactory" language=""
  strength="primary"/>*

  I need to use this because I want to build collations fast.
  Referred link: http://wiki.apache.org/solr/UnicodeCollation

  But it stops both suggestions and collations. *Why?*

  I have also tested *CollationKeyFilterFactory* in the Solr admin analysis
  page. There, CKF shows some Chinese-looking output.

  *Please, any help?*



Optimize maxSegments=2 not working right with Solr 4.10.2

2015-02-23 Thread Tom Burton-West
Hello,

We normally run an optimize with maxSegments=2  after our daily indexing.
This has worked without problem on Solr 3.6.  We recently moved to Solr
4.10.2 and on several shards the optimize completed with no errors in the
logs, but left more than 2 segments.

We send this xml to Solr
<optimize maxSegments="2"/>
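Over HTTP, the command is posted to the update handler; a sketch (host, port, and core name here are placeholders, not taken from this thread):

```shell
# Sketch: issue the forced merge to the update handler as an XML body.
cmd="curl 'http://localhost:8983/solr/core/update' -H 'Content-Type: text/xml' --data-binary '<optimize maxSegments=\"2\"/>'"
echo "$cmd"
```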

I've attached a copy of the indexwriter log for one of the shards where
there were 4 segments rather than the requested number (i.e., there should
have been only 2) at the end of the optimize.  It looks like a
merge was done down to two segments and then somehow another process
flushed some postings to disk, creating two more segments.  Then there are
messages about 2 of the remaining 4 segments being too big. (See below.)

What we expected is that the remainng 2 small segments (about 40MB) would
get merged with the smaller of the two large segments, i.e. with the 56GB
segment, since we gave the argument maxSegments=2.   This didn't happen.


Any suggestions about how to troubleshoot this issue would be appreciated.

Tom

---
Excerpt from indexwriter log:

[TMP][http-8091-Processor5]: findForcedMerges maxSegmentCount=2  ...
...
[IW][Lucene Merge Thread #0]: merge time 3842310 msec for 65236 docs
...
[TMP][http-8091-Processor5]: findMerges: 4 segments
 [TMP][http-8091-Processor5]:   seg=_1fzb(4.10.2):C1081559/24089:delGen=9
size=672402.066 MB [skip: too large]
 [TMP][http-8091-Processor5]:   seg=_1gj2(4.10.2):C65236/2:delGen=1
size=56179.245 MB [skip: too large]
 [TMP][http-8091-Processor5]:   seg=_1gj0(4.10.2):C16 size=44.280 MB
 [TMP][http-8091-Processor5]:   seg=_1gj1(4.10.2):C8 size=40.442 MB
 [TMP][http-8091-Processor5]:   allowedSegmentCount=3 vs count=4 (eligible
count=2) tooBigCount=2


build-1.iw.2015-02-23.txt.gz
Description: GNU Zip compressed data


Re: syntax for increasing java memory

2015-02-23 Thread Kevin Laurie
Hi Walter,

I am running :-
Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_65 24.65-b04)

I tried running with this command:-

java -jar start.jar -Xmx1024m
WARNING: System properties and/or JVM args set.  Consider using --dry-run
or --exec
0[main] INFO  org.eclipse.jetty.server.Server  ? jetty-8.1.10.v20130312
61   [main] INFO  org.eclipse.jetty.deploy.providers.ScanningAppProvider  ?
Deployment monitor /opt/solr/contexts at interval 0

Still getting 500m.

Any advice? Will check java -X out.


On Tue, Feb 24, 2015 at 12:49 AM, Walter Underwood wun...@wunderwood.org
wrote:

 That depends on the JVM you are using. For the Oracle JVMs, use this to
 get a list of extended options:

 java -X

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)


 On Feb 23, 2015, at 8:21 AM, Kevin Laurie superinterstel...@gmail.com
 wrote:

  Hi Guys,
  I am a newbie on Solr and I am just using it for dovecot sake.
  Could you help advise the correct syntax to increase java heap size using
  the  -xmx option(or advise some easy-to-read literature for configuring)
 ?
  Much appreciate if you could help. I just need this to sort out the
 problem
  with my Dovecot FTS.
  Thanks
  Kevin




RE: Collations are not working fine.

2015-02-23 Thread Reitzel, Charles
I filed issue SOLR-7144 with the patch attached.   It's probably best to get 
some feedback from developers.  It may not be the right approach, etc.

Also, spellcheck.maxCollationTries > 0 is the parameter needed to get collation 
results that respect the current filter queries, etc.

Set spellcheck.maxCollations > 1 to get multiple collation results.   However, 
if the original query has only a single term, there will be no collation 
results.   Thus, for single term queries, you need to look at the original 
frequency information to determine if the original term is valid or not.   
There may be spellcheck suggestions even for terms with origFreq > 0.

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Monday, February 23, 2015 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi Charles,
 How did you patch the suggester to get frequency information in the 
spellcheck response?
That sounds very useful; I would like to do the same.


On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles  
charles.reit...@tiaa-cref.org wrote:

 I have been working with collations the last couple days and I kept adding
 the collation-related parameters until it started working for me.   It
 seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>.

 But, I am using the Suggester with the WFSTLookupFactory.

 Also, I needed to patch the suggester to get frequency information in 
 the spellcheck response.

 -Original Message-
 From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
 Sent: Friday, February 13, 2015 3:48 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Collations are not working fine.

 Hi Nitin,

 Can you try with the below config? We have these configs and they seem to be
 working for us.

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">text_general</str>


   <lst name="spellchecker">
 <str name="name">wordbreak</str>
 <str name="classname">solr.WordBreakSolrSpellChecker</str>
 <str name="field">textSpell</str>
 <str name="combineWords">true</str>
 <str name="breakWords">false</str>
 <int name="maxChanges">5</int>
   </lst>

<lst name="spellchecker">
 <str name="name">default</str>
 <str name="field">textSpell</str>
 <str name="classname">solr.IndexBasedSpellChecker</str>
 <str name="spellcheckIndexDir">./spellchecker</str>
 <str name="accuracy">0.75</str>
 <float name="thresholdTokenFrequency">0.01</float>
 <str name="buildOnCommit">true</str>
 <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>


   </searchComponent>



 <str name="spellcheck">true</str>
 <str name="spellcheck.dictionary">default</str>
 <str name="spellcheck.dictionary">wordbreak</str>
 <int name="spellcheck.count">5</int>
 <str name="spellcheck.alternativeTermCount">15</str>
 <str name="spellcheck.collate">true</str>
 <str name="spellcheck.onlyMorePopular">false</str>
 <str name="spellcheck.extendedResults">true</str>
 <str name="spellcheck.maxCollations">100</str>
 <str name="spellcheck.collateParam.mm">100%</str>
 <str name="spellcheck.collateParam.q.op">AND</str>
 <str name="spellcheck.maxCollationTries">1000</str>


 *Rajesh.*

 On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James 
 james.d...@ingramcontent.com
 
 wrote:

  Nitin,
 
  Can you post the full spellcheck response when you query:
 
  q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
 
  James Dyer
  Ingram Content Group
 
 
  -Original Message-
  From: Nitin Solanki [mailto:nitinml...@gmail.com]
  Sent: Friday, February 13, 2015 1:05 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Collations are not working fine.
 
  Hi James Dyer,
    I did the same as you told me and used 
  WordBreakSolrSpellChecker instead of shingles. But collations 
  are still not coming or working.
  For instance, I tried to get a collation of "gone with the wind" by 
  searching "gone wthh thes wint" on field=gram_ci but didn't succeed.
  Even so, I am getting the suggestions of wthh as *with*, thes as *the*,
 wint as *wind*.
  Also, I have documents in which "gone with the wind" occurs 167 
  times. I don't know whether I am missing something or not.
  Please check my solr configuration below:
 
  *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes 
  wint"&wt=json&indent=true&shards.qt=/spell
 
  *solrconfig.xml:*
 
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str

Re: Suggestion on distinct/ group by for a field ?

2015-02-23 Thread Erick Erickson
Maybe pivot facets will do what you need? See:

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Pivot(DecisionTree)Faceting
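A pivot-facet request for the category/name case quoted below might look like this sketch (host, port, and core name are taken from the quoted URL; the rest is illustrative):

```shell
# Sketch: facet.pivot nests name counts under each category, giving a
# per-category list of distinct names with their counts.
url='http://localhost:8081/solr/core_test/select?q=*:*&rows=0&facet=true&facet.pivot=category,name'
echo "curl '$url'"
```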

Best,
Erick

On Mon, Feb 23, 2015 at 11:31 AM, Vishal Swaroop vishal@gmail.com wrote:
 Please suggest on how to get the distinct count for a field (name).

 Summary : I have data indexed in the following format
 category name value
 Cat1 A 1
 Cat1 A 2
 Cat1 B 3
 Cat1 B 4

 I tried getting the distinct name count... but it returns 4 records
 instead of 2 (i.e. A, B):
 http://localhost:8081/solr/core_test/select?q=category:Cat1&fl=category,name&wt=json&indent=true&facet.mincount=1&facet=true

 In Oracle I can easily get the distinct count using group by:
 select c.cat, count(distinct i.name) from category c, itemname i, value v
 where v.item_id = i.id and i.cat_id = c.id and c.cat = 'Cat1' group by
 c.cat
 Result:
 Cat1 2

 Thanks

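Besides pivot facets, a plain field facet on name gives the distinct values (with their counts) directly, and the number of facet buckets is the distinct count. A sketch against the same core as above:

```
http://localhost:8081/solr/core_test/select?q=category:Cat1&rows=0&wt=json&indent=true&facet=true&facet.field=name&facet.mincount=1
```

With the sample data this should return only the buckets A and B under facet_fields.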

Basic Multilingual search capability

2015-02-23 Thread Rishi Easwaran
Hi All,

For our use case we don't really need to do a lot of manipulation of incoming 
text during index time. At most, removal of common stop words and tokenizing 
emails/filenames etc. if possible. We get text documents from our end users, which can 
be in any language (sometimes a combination) and we cannot determine the language 
of the incoming text. Language detection at index time is not necessary.

Which analyzer is recommended to achieve basic multilingual search capability 
for a use case like this?
I have read a bunch of posts about using a combination of StandardTokenizer or 
ICUTokenizer, LowerCaseFilter and ReversedWildcardFilter factory, but I am looking for 
ideas, suggestions, and best practices.

http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236
http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923
https://issues.apache.org/jira/browse/SOLR-6492  

 
Thanks,
Rishi.
 


Re: highlighting the boolean query

2015-02-23 Thread Erick Erickson
Highlighting is such a pain...

what does the parsed query look like? If the default operator is OR,
then this seems correct as both 'd' and 'c' appear in the doc. So
I'm a bit puzzled by your statement that c didn't contribute to the score.

If the parsed query is, indeed
a +b +c d

then it does look like something with the highlighter. Whether other
highlighters are better for this case... no clue ;(

Best,
Erick

On Mon, Feb 23, 2015 at 9:36 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Erick,

 nope, we are using std lucene qparser with some customizations, that do not
 affect the boolean query parsing logic.

 Should we try some other highlighter?

 On Mon, Feb 23, 2015 at 6:57 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Are you using edismax?

 On Mon, Feb 23, 2015 at 3:28 AM, Dmitry Kan solrexp...@gmail.com wrote:
  Hello!
 
   In Solr 4.3.1 there seems to be some inconsistency with the highlighting of
   the boolean query:
 
  a OR (b c) OR d
 
  This returns a proper hit, which shows that only d was included into the
  document score calculation.
 
   But the highlighter returns both d and c in <em> tags.
 
  Is this a known issue of the standard highlighter? Can it be mitigated?
 
 
  --
  Dmitry Kan
  Luke Toolbox: http://github.com/DmitryKey/luke
  Blog: http://dmitrykan.blogspot.com
  Twitter: http://twitter.com/dmitrykan
  SemanticAnalyzer: www.semanticanalyzer.info




 --
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info

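One mitigation worth trying while the cause is investigated: the standard highlighter honors an hl.q parameter that overrides q for highlighting only, so the clause that actually scored the document can be highlighted on its own. A sketch (the field name is illustrative):

```
q=a OR (b c) OR d&hl=true&hl.fl=content&hl.q=d
```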

Suggestion on distinct/ group by for a field ?

2015-02-23 Thread Vishal Swaroop
Please suggest on how to get the distinct count for a field (name).

Summary : I have data indexed in the following format
category name value
Cat1 A 1
Cat1 A 2
Cat1 B 3
Cat1 B 4

I tried getting the distinct name count... but it returns 4 records
instead of 2 (i.e. A, B):
http://localhost:8081/solr/core_test/select?q=category:Cat1&fl=category,name&wt=json&indent=true&facet.mincount=1&facet=true

In Oracle I can easily get the distinct count using group by:
select c.cat, count(distinct i.name) from category c, itemname i, value v
where v.item_id = i.id and i.cat_id = c.id and c.cat = 'Cat1' group by
c.cat
Result:
Cat1 2

Thanks


SolrCloud 4.10.3 Security

2015-02-23 Thread mihaela olteanu
Hello,
Does anyone know why Basic authentication has not yet been released for 
SolrCloud as described on the wiki page: 
https://wiki.apache.org/solr/SolrSecurity? Is there any plan in the near future 
for closing this issue: https://issues.apache.org/jira/browse/SOLR-4470 ?
Isn't there already a very basic implementation that could be released?
Thanks a lot!
Mihaela

 

more like this and term vectors

2015-02-23 Thread Scott C. Cote
Is there a way to configure the More Like This query handler and also receive 
the corresponding term vectors (tf-idf)?

I tried creating a “search component” for the term vectors and adding it to 
the mlt handler, but that did not work.

Here is what I tried:

 <searchComponent name="tvComponent"
     class="org.apache.solr.handler.component.TermVectorComponent"/>

 <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
   <lst name="defaults">
     <str name="mlt.fl">filteredText</str>
     <str name="mlt.mintf">1</str>
     <str name="mlt.mindf">1</str>
     <str name="mlt.interestingTerms">list</str>
     <bool name="tv">true</bool>
   </lst> 
   <arr name="last-components">
     <str>tvComponent</str>
   </arr>
 </requestHandler>

Now I realize that I could turn on the debug parameter, but that does not 
contain all of the tf-idf information (at least not like the tv component provides).

Thanks,

SCott

Re: more like this and term vectors

2015-02-23 Thread Jack Krupansky
It's never helpful when you merely say that it did not work - detail the
symptom, please.

Post both the query and the response. As well as the field and type
definitions for the fields for which you expected term vectors - no term
vectors are enabled by default.

-- Jack Krupansky

On Mon, Feb 23, 2015 at 2:48 PM, Scott C. Cote scottcc...@yahoo.com.invalid
 wrote:

 Is there a way to configure the More Like This query handler and also
 receive the corresponding term vectors (tf-idf)?

 I tried creating a “search component” for the term vectors and adding
 it to the mlt handler, but that did not work.

 Here is what I tried:

 <searchComponent name="tvComponent"
     class="org.apache.solr.handler.component.TermVectorComponent"/>

 <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
   <lst name="defaults">
     <str name="mlt.fl">filteredText</str>
     <str name="mlt.mintf">1</str>
     <str name="mlt.mindf">1</str>
     <str name="mlt.interestingTerms">list</str>
     <bool name="tv">true</bool>
   </lst>
   <arr name="last-components">
     <str>tvComponent</str>
   </arr>
 </requestHandler>

 Now I realize that I could turn on the debug parameter, but that does not
 contain all of the tf-idf information (at least not like the tv component
 provides).

 Thanks,

 SCott
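To make Jack's point below concrete: TermVectorComponent can only return vectors for fields that actually store them, which is off by default. A sketch for the field used in the /mlt config above (the type name is an assumption), after which a full re-index is needed:

```xml
<field name="filteredText" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```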


Re: Basic Multilingual search capability

2015-02-23 Thread Alexandre Rafalovitch
Which languages are you expecting to deal with? Multilingual support
is a complex issue. Even if you think you don't need much, it is
usually a lot more complex than expected, especially around relevancy.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 February 2015 at 16:19, Rishi Easwaran rishi.easwa...@aol.com wrote:
 Hi All,

 For our use case we don't really need to do a lot of manipulation of incoming 
 text during index time. At most, removal of common stop words and tokenizing 
 emails/filenames etc. if possible. We get text documents from our end users, 
 which can be in any language (sometimes a combination) and we cannot determine 
 the language of the incoming text. Language detection at index time is not 
 necessary.

 Which analyzer is recommended to achieve basic multilingual search capability 
 for a use case like this?
 I have read a bunch of posts about using a combination of StandardTokenizer or 
 ICUTokenizer, LowerCaseFilter and ReversedWildcardFilter factory, but I am
 looking for ideas, suggestions, and best practices.

 http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236
 http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923
 https://issues.apache.org/jira/browse/SOLR-6492


 Thanks,
 Rishi.



Error instantiating class: 'org.apache.lucene.collation.CollationKeyFilterFactory'

2015-02-23 Thread Nitin Solanki
Hi,
   I am using the Collation Key Filter. After adding it to schema.xml, I get an error.

*Schema.xml*
<field name="gram" type="textSpell" indexed="true" stored="true"
required="true" multiValued="false"/>

</fieldType><fieldType name="textSpell" class="solr.TextField"
positionIncrementGap="100">
   <analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.CollationKeyFilterFactory" language=""
strength="primary"/>
   </analyzer>
   <analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.CollationKeyFilterFactory" language=""
strength="primary"/>
   </analyzer>
</fieldType>


*It throws this error:*

Problem accessing /solr/. Reason:

{msg=SolrCore 'collection1' is not available due to init failure:
Could not load conf for core collection1: Plugin init failure for
[schema.xml] fieldType textSpell: Plugin init failure for
[schema.xml] analyzer/filter: Error instantiating class:
'org.apache.lucene.collation.CollationKeyFilterFactory'. Schema file
is /configs/myconf/schema.xml,trace=org.apache.solr.common.SolrException:
SolrCore 'collection1' is not available due to init failure: Could not
load conf for core collection1: Plugin init failure for [schema.xml]
fieldType textSpell: Plugin init failure for [schema.xml]
analyzer/filter: Error instantiating class:
'org.apache.lucene.collation.CollationKeyFilterFactory'. Schema file
is /configs/myconf/schema.xml
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:745)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
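If the goal is just locale-insensitive keys, a commonly suggested alternative that avoids the analyzer filter-factory loading path entirely is the dedicated solr.CollationField type (the Solr collation documentation shows language="" selecting the root locale). Note that it treats the whole field value as a single token, so it suits sorting and exact matching rather than a tokenized spellcheck field:

```xml
<fieldType name="textSpellCollated" class="solr.CollationField"
           language="" strength="primary"/>
```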


Geo Aggregations and Search Alerts in Solr

2015-02-23 Thread Richard Gibbs
Hi There,

I am in the process of choosing a search technology for one of my projects
and I was looking into Solr and Elasticsearch.

Two features that I am most interested in are geo aggregations (for map
clustering) and search alerts. Elasticsearch seems to have these two
features built in.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/geo-aggs.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html

I couldn't find relevant documentation for Solr and therefore I am not sure
whether these features are readily available in Solr. Can you please let me
know whether they are available? If not, are there ways to achieve the same
with Solr?

Thank you.


Query: no result returned if use AND OR operators

2015-02-23 Thread arthur.hk.c...@gmail.com
Hi,

My Solr is 4.10.2

When I use the web UI to run a simple query: "1"+AND+"2"


1) from the log, I can see the hits=8
7629109 [qtp1702388274-16] INFO  org.apache.solr.core.SolrCore  – [infocast] 
webapp=/solr path=/clustering 
params={q=1+AND+2&wt=velocity&v.template=cluster_results} hits=8 
status=0 QTime=21 

However, from the query page, it returns
2) 0 results found in 5 ms Page 0 of 0
  0 results found. Page 0 of 0


3) If I use the Admin page to run the query, I get 3 results back:

{
  "responseHeader": {
    "status": 0,
    "QTime": 5,
    "params": {
      "indent": "true",
      "q": "\"1\" AND \"2\"",
      "_": "1424761089223",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "title": [ ….

Very strange to me, please help!

Regards

Re: Basic Multilingual search capability

2015-02-23 Thread Walter Underwood
It isn’t just complicated, it can be impossible.

Do you have content in Chinese or Japanese? Those languages (and some others) 
do not separate words with spaces. You cannot even do word search without a 
language-specific, dictionary-based parser.

German is space separated, except many noun compounds are not space-separated.

Do you have Finnish content? Entire prepositional phrases turn into word 
endings.

Do you have Arabic content? That is even harder.

If all your content is in space-separated languages that are not heavily 
inflected, you can kind of do OK with a language-insensitive approach. But it 
hits the wall pretty fast.

One thing that does work pretty well is trademarked names (LaserJet, Coke, 
etc). Those are spelled the same in all languages and usually not inflected.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Feb 23, 2015, at 8:00 PM, Rishi Easwaran rishi.easwa...@aol.com wrote:

 Hi Alex,
 
 There is no specific language list.
 For example: the documents that need to be indexed are emails or any 
 messages for a global customer base. The messages back and forth could be in 
 any language or a mix of languages.

 I understand relevancy, stemming, etc. become extremely complicated with 
 multilingual support, but our first goal is to be able to tokenize and 
 provide basic search capability for any language. Ex: when a document 
 contains hello or здравствуйте, the analyzer creates tokens and provides 
 exact-match search results.

 Now it would be great if it had the capability to tokenize email addresses 
 (ex: he...@aol.com - I think StandardTokenizer already does this) and filenames 
 (здравствуйте.pdf), but maybe we can use filters to accomplish that. 
 
 Thanks,
 Rishi.
 
 -Original Message-
 From: Alexandre Rafalovitch arafa...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Mon, Feb 23, 2015 5:49 pm
 Subject: Re: Basic Multilingual search capability
 
 
 Which languages are you expecting to deal with? Multilingual support
 is a complex issue. Even if you think you don't need much, it is
 usually a lot more complex than expected, especially around relevancy.
 
 Regards,
   Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/
 
 
 On 23 February 2015 at 16:19, Rishi Easwaran rishi.easwa...@aol.com wrote:
 Hi All,
 
 For our use case we don't really need to do a lot of manipulation of incoming 
 text during index time. At most, removal of common stop words and tokenizing 
 emails/filenames etc. if possible. We get text documents from our end users, 
 which can be in any language (sometimes a combination) and we cannot determine 
 the language of the incoming text. Language detection at index time is not 
 necessary.

 Which analyzer is recommended to achieve basic multilingual search capability 
 for a use case like this?
 I have read a bunch of posts about using a combination of StandardTokenizer or 
 ICUTokenizer, LowerCaseFilter and ReversedWildcardFilter factory, but I am 
 looking for ideas, suggestions, and best practices.
 
 http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236
 http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923
 https://issues.apache.org/jira/browse/SOLR-6492
 
 
 Thanks,
 Rishi.
 
 
 



Re: Special character and wildcard matching

2015-02-23 Thread Jack Krupansky
Is it really a string field - as opposed to a text field? Show us the field
and field type.

Besides, if it really were a raw name, wouldn't that be a capital B?

-- Jack Krupansky

On Mon, Feb 23, 2015 at 6:52 PM, Arun Rangarajan arunrangara...@gmail.com
wrote:

 I have a string field raw_name like this in my document:

  {"raw_name": "beyoncé"}

 (Notice that the last character is a special character.)

 When I issue this wildcard query:

 q=raw_name:beyonce*

 i.e. with the last character simply being the ASCII 'e', Solr returns me
 the above document.

 How do I prevent this?



Re: Special character and wildcard matching

2015-02-23 Thread Jack Krupansky
But how is that lowercasing occurring? I mean, solr.StrField doesn't do
that.

Some containers default to automatically mapping accented characters, so
that the accented e would then get indexed as a normal e, and then your
wildcard would match it, and an accented e in a query would get mapped as
well and then match the normal e in the index. What does your query
response look like?

This blog post explains that problem:
http://bensch.be/tomcat-solr-and-special-characters

Note that you could make your string field a text field with the keyword
tokenizer and then filter it for lower case, such as when the user query
might have a capital B. String field is most appropriate when the field
really is 100% raw.


-- Jack Krupansky

On Mon, Feb 23, 2015 at 7:37 PM, Arun Rangarajan arunrangara...@gmail.com
wrote:

 Yes, it is a string field and not a text field.

 <fieldType name="string" class="solr.StrField" sortMissingLast="true"
 omitNorms="true"/>
 <field name="raw_name" type="string" indexed="true" stored="true" />

 Lower-casing is done for case-insensitive matching.

 On Mon, Feb 23, 2015 at 4:01 PM, Jack Krupansky jack.krupan...@gmail.com
 wrote:

  Is it really a string field - as opposed to a text field? Show us the
 field
  and field type.
 
  Besides, if it really were a raw name, wouldn't that be a capital B?
 
  -- Jack Krupansky
 
  On Mon, Feb 23, 2015 at 6:52 PM, Arun Rangarajan 
 arunrangara...@gmail.com
  
  wrote:
 
   I have a string field raw_name like this in my document:
  
    {"raw_name": "beyoncé"}
  
   (Notice that the last character is a special character.)
  
   When I issue this wildcard query:
  
   q=raw_name:beyonce*
  
   i.e. with the last character simply being the ASCII 'e', Solr returns
 me
   the above document.
  
   How do I prevent this?
  
 

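Jack's suggestion can be sketched as a lowercased keyword-tokenized type (names are illustrative). Without an ASCIIFoldingFilterFactory in the chain, the accented é is indexed as-is, so beyonce* should no longer match beyoncé; adding the folding filter would deliberately make them match in both directions:

```xml
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- keep the whole value as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- case-insensitive matching, as in the original lower-cased string field -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="raw_name" type="string_ci" indexed="true" stored="true"/>
```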


Re: Basic Multilingual search capability

2015-02-23 Thread Rishi Easwaran
Hi Alex,

There is no specific language list.
For example: the documents that need to be indexed are emails or any messages 
for a global customer base. The messages back and forth could be in any 
language or a mix of languages.
 
I understand relevancy, stemming, etc. become extremely complicated with 
multilingual support, but our first goal is to be able to tokenize and provide 
basic search capability for any language. Ex: when a document contains hello 
or здравствуйте, the analyzer creates tokens and provides exact-match search 
results.

Now it would be great if it had the capability to tokenize email addresses 
(ex: he...@aol.com - I think StandardTokenizer already does this) and filenames 
(здравствуйте.pdf), but maybe we can use filters to accomplish that. 

Thanks,
Rishi.
 
 
-Original Message-
From: Alexandre Rafalovitch arafa...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Feb 23, 2015 5:49 pm
Subject: Re: Basic Multilingual search capability


Which languages are you expecting to deal with? Multilingual support
is a complex issue. Even if you think you don't need much, it is
usually a lot more complex than expected, especially around relevancy.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 February 2015 at 16:19, Rishi Easwaran rishi.easwa...@aol.com wrote:
 Hi All,

 For our use case we don't really need to do a lot of manipulation of incoming 
text during index time. At most, removal of common stop words and tokenizing 
emails/filenames etc. if possible. We get text documents from our end users, 
which can be in any language (sometimes a combination) and we cannot determine 
the language of the incoming text. Language detection at index time is not 
necessary.

 Which analyzer is recommended to achieve basic multilingual search capability 
for a use case like this?
 I have read a bunch of posts about using a combination of StandardTokenizer or 
ICUTokenizer, LowerCaseFilter and ReversedWildcardFilter factory, but I am 
looking for ideas, suggestions, and best practices.

 http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236
 http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923
 https://issues.apache.org/jira/browse/SOLR-6492


 Thanks,
 Rishi.


 


Special character and wildcard matching

2015-02-23 Thread Arun Rangarajan
I have a string field raw_name like this in my document:

{"raw_name": "beyoncé"}

(Notice that the last character is a special character.)

When I issue this wildcard query:

q=raw_name:beyonce*

i.e. with the last character simply being the ASCII 'e', Solr returns me
the above document.

How do I prevent this?


Re: Special character and wildcard matching

2015-02-23 Thread Arun Rangarajan
Yes, it is a string field and not a text field.

<fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>
<field name="raw_name" type="string" indexed="true" stored="true" />

Lower-casing is done for case-insensitive matching.

On Mon, Feb 23, 2015 at 4:01 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 Is it really a string field - as opposed to a text field? Show us the field
 and field type.

 Besides, if it really were a raw name, wouldn't that be a capital B?

 -- Jack Krupansky

 On Mon, Feb 23, 2015 at 6:52 PM, Arun Rangarajan arunrangara...@gmail.com
 
 wrote:

  I have a string field raw_name like this in my document:
 
   {"raw_name": "beyoncé"}
 
  (Notice that the last character is a special character.)
 
  When I issue this wildcard query:
 
  q=raw_name:beyonce*
 
  i.e. with the last character simply being the ASCII 'e', Solr returns me
  the above document.
 
  How do I prevent this?
 



apache solr - dovecot - some search fields works some dont

2015-02-23 Thread Kevin Laurie
Hi,
I finally understand how Solr works (somewhat). It's a bit complicated as I am
new to the whole concept, but I understand it as a search engine. I am using
Solr with dovecot,
and I found out that some search fields from the inbox work and others don't.
For example, if I search To and From, Apache Solr processes the query in
its log and gives me an output; however, if I search something in the
Body, it stalls with no output.
I am guessing this is some schema.xml problem. Could you advise?
Oh, I already addressed the Java heap size problem.
I have underlined the syntax that shows it.
I am guessing it's only the body search that fails, and it might be
schema.xml related.



*3374412 [qtp1728413448-16] INFO  org.apache.solr.core.SolrCore  ?
[collection1] webapp=/solr path=/select
params={sort=uid+asc&fl=uid,score&q=subject:dave+OR+from:dave+OR+to:dave&fq=%2Bbox:ac553604f7314b54e6233555fc1a+%2Buser:b...@email.net&rows=107161}
hits=571 status=0 QTime=706 *
3379438 [qtp1728413448-18] INFO  org.apache.solr.servlet.
SolrDispatchFilter  ? [admin] webapp=null path=/admin/info/logging
params={_=1424714397078&since=1424711021771&wt=json} status=0 QTime=0
3389791 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714407453&since=1424711021771&wt=json} status=0 QTime=1
3400172 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714417834&since=1424711021771&wt=json} status=0 QTime=1
3410544 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714428205&since=1424711021771&wt=json} status=0 QTime=0
3420895 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714438558&since=1424711021771&wt=json} status=0 QTime=0
3431247 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714448908&since=1424711021771&wt=json} status=0 QTime=1
3441671 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714459334&since=1424711021771&wt=json} status=0 QTime=1
3452017 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714469679&since=1424711021771&wt=json} status=0 QTime=1
3462363 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714480026&since=1424711021771&wt=json} status=0 QTime=0
3472707 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714490369&since=1424711021771&wt=json} status=0 QTime=0
3483139 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714500802&since=1424711021771&wt=json} status=0 QTime=1
3493590 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714511246&since=1424711021771&wt=json} status=0 QTime=0
3504027 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714521691&since=1424711021771&wt=json} status=0 QTime=0
3514477 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714532137&since=1424711021771&wt=json} status=0 QTime=1
3524933 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714542598&since=1424711021771&wt=json} status=0 QTime=0
3535288 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714552951&since=1424711021771&wt=json} status=0 QTime=0
3545634 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714563290&since=1424711021771&wt=json} status=0 QTime=0
3556077 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714573714&since=1424711021771&wt=json} status=0 QTime=0
3566496 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714584157&since=1424711021771&wt=json} status=0 QTime=1
3576937 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714594601&since=1424711021771&wt=json} status=0 QTime=0
3587273 [qtp1728413448-18] INFO
org.apache.solr.servlet.SolrDispatchFilter  ? [admin] webapp=null
path=/admin/info/logging
params={_=1424714604939&since=1424711021771&wt=json} status=0 

snapinstaller does not start newSearcher

2015-02-23 Thread alxsss
Hello,

I am using the latest Solr (solr trunk). I run snapinstaller and see that it 
copies the snapshot to the index folder, but the changes are not picked up.

The logs in the slave after running snapinstaller are:

44302 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
44303 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – No 
uncommitted changes. Skipping IW.commit.
44304 [qtp1312571113-14] INFO  org.apache.solr.core.SolrCore  – 
SolrIndexSearcher has not changed - not re-opening: 
org.apache.solr.search.SolrIndexSearcher
44305 [qtp1312571113-14] INFO  org.apache.solr.update.UpdateHandler  – 
end_commit_flush
44305 [qtp1312571113-14] INFO  
org.apache.solr.update.processor.LogUpdateProcessor  – [product] webapp=/solr 
path=/update params={} {commit=} 0 57

Restarting Solr gives:

 Error creating core [product]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
... 9 more

Any idea what causes this issue?

Thanks in advance.
Alex.



Re: Basic Multilingual search capability

2015-02-23 Thread Rishi Easwaran
Hi Wunder,

Yes, we do expect incoming documents to contain Chinese/Japanese/Arabic 
content.

From what you have mentioned, it looks like we need to auto-detect the incoming 
content language and tokenize/filter after that.
But I thought the ICU tokenizer had the capability to do that 
(https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-ICUTokenizer):
"This tokenizer processes multilingual text and tokenizes it appropriately 
based on its script attribute."
Or am I missing something? 

Thanks,
Rishi.

 

 

-Original Message-
From: Walter Underwood wun...@wunderwood.org
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Feb 23, 2015 11:17 pm
Subject: Re: Basic Multilingual search capability


It isn’t just complicated, it can be impossible.

Do you have content in Chinese or Japanese? Those languages (and some others) 
do 
not separate words with spaces. You cannot even do word search without a 
language-specific, dictionary-based parser.

German is space separated, except many noun compounds are not space-separated.

Do you have Finnish content? Entire prepositional phrases turn into word 
endings.

Do you have Arabic content? That is even harder.

If all your content is in space-separated languages that are not heavily 
inflected, you can kind of do OK with a language-insensitive approach. But it 
hits the wall pretty fast.

One thing that does work pretty well is trademarked names (LaserJet, Coke, 
etc). 
Those are spelled the same in all languages and usually not inflected.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Feb 23, 2015, at 8:00 PM, Rishi Easwaran rishi.easwa...@aol.com wrote:

 Hi Alex,
 
 There is no specific language list.  
 For example: the documents that need to be indexed are emails or any 
 messages for a global customer base. The messages back and forth could be 
 in any language or a mix of languages.
 
 I understand relevancy, stemming, etc. become extremely complicated with 
multilingual support, but our first goal is to be able to tokenize and provide 
basic search capability for any language. E.g. when the document contains hello 
or здравствуйте, the analyzer creates tokens and provides exact-match search 
results.
 
 Now it would be great if it had the capability to tokenize email addresses 
(e.g. he...@aol.com; I think StandardTokenizer already does this) and filenames 
(здравствуйте.pdf), but maybe we can use filters to accomplish that. 
 
 Thanks,
 Rishi.
 
 -Original Message-
 From: Alexandre Rafalovitch arafa...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Mon, Feb 23, 2015 5:49 pm
 Subject: Re: Basic Multilingual search capability
 
 
 Which languages are you expecting to deal with? Multilingual support
 is a complex issue. Even if you think you don't need much, it is
 usually a lot more complex than expected, especially around relevancy.
 
 Regards,
   Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/
 
 
 On 23 February 2015 at 16:19, Rishi Easwaran rishi.easwa...@aol.com wrote:
 Hi All,
 
 For our use case we don't really need to do a lot of manipulation of 
 incoming text during index time. At most removal of common stop words, and 
 tokenizing emails/filenames etc. if possible. We get text documents from 
 our end users, which can be in any language (sometimes a combination), and 
 we cannot determine the language of the incoming text. Language detection 
 at index time is not necessary.
 
 Which analyzer is recommended to achieve basic multilingual search 
 capability for a use case like this?
 I have read a bunch of posts about using a combination of StandardTokenizer 
 or ICUTokenizer, LowerCaseFilterFactory, and ReversedWildcardFilterFactory, 
 but am looking for ideas, suggestions, and best practices.
 
 http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236
 http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923
 https://issues.apache.org/jira/browse/SOLR-6492
 
 
 Thanks,
 Rishi.
 
 
 


 


Setting Up an External ZooKeeper Ensemble

2015-02-23 Thread CKReddy Bhimavarapu
Hi,
  I did follow all the steps in
[https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble]
but I am still getting this error:
Waiting to see Solr listening on port 8983 [-]  Still not seeing Solr
listening on 8983 after 30 seconds!
WARN  - 2015-02-24 05:50:19.161;
org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null,
unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
WARN  - 2015-02-24 05:50:20.262;
org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null,
unexpected error, closing socket connection and attempting reconnect


Where am I going wrong?

-- 
ckreddybh. chaitu...@gmail.com
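(For comparison with the guide referenced above, a minimal three-node setup looks roughly like the sketch below; hostnames and paths are placeholders. A "Session 0x0 ... Connection refused" warning usually means ZooKeeper itself is not running, or the -z addresses do not match zoo.cfg's clientPort.)

```shell
# conf/zoo.cfg on each ZooKeeper node (placeholder hosts and paths):
#   tickTime=2000
#   initLimit=10
#   syncLimit=5
#   dataDir=/var/lib/zookeeper     # must contain a "myid" file (1, 2, or 3)
#   clientPort=2181
#   server.1=zk1.example.com:2888:3888
#   server.2=zk2.example.com:2888:3888
#   server.3=zk3.example.com:2888:3888

# start each ZooKeeper node, then verify it answers before starting Solr
bin/zkServer.sh start
echo ruok | nc zk1.example.com 2181    # a healthy node replies "imok"

# point Solr at the ensemble, using the clientPort addresses
bin/solr start -c -z "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
```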


Re: Basic Multilingual search capability

2015-02-23 Thread Trey Grainger
Hi Rishi,

I don't generally recommend a language-insensitive approach except for
really simple multilingual use cases (for most of the reasons Walter
mentioned), but the ICUTokenizer is probably the best bet you're going to
have if you really want to go that route and only need exact-match on the
tokens that are parsed. It won't work that well for all languages (CJK
languages, for example), but it will work fine for many.

It is also possible to handle multi-lingual content in a more intelligent
(i.e. per-language configuration) way in your search index, of course.
There are three primary strategies (i.e. ways that actually work in the
real world) to do this:
1) create a separate field for each language and search across all of them
at query time
2) create a separate core per language-combination and search across all of
them at query time
3) invoke multiple language-specific analyzers within a single field's
analyzer and index/query using one or more of those languages' analyzers
for each document/query.
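As a rough sketch of the first strategy (the field names here are illustrative; text_en, text_de, and text_ja are language-specific field types like those shipped in Solr's example schema):

```xml
<!-- Illustrative: one field per language, each with its own analysis chain. -->
<field name="content_en" type="text_en" indexed="true" stored="true"/>
<field name="content_de" type="text_de" indexed="true" stored="true"/>
<field name="content_ja" type="text_ja" indexed="true" stored="true"/>
```

A query then searches all of them at once, e.g.
q=hello&defType=edismax&qf=content_en content_de content_ja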

These are listed in ascending order of complexity, and each can be valid
based upon your use case. For at least the first and third cases, you can
use index-time language detection to map to the appropriate
fields/analyzers if you are otherwise unaware of the languages of the
content from your application layer. The third option requires custom code
(included in the large Multilingual Search chapter of Solr in Action
http://solrinaction.com and soon to be contributed back to Solr via
SOLR-6492 https://issues.apache.org/jira/browse/SOLR-6492), but it
enables you to index an arbitrarily large number of languages into the same
field if needed, while preserving language-specific analysis for each
language.
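The index-time language detection mentioned above can be done with Solr's langid update processor; a sketch (field names are illustrative, and the langid contrib must be on the classpath):

```xml
<!-- Sketch: detect the language of the "content" field; with langid.map,
     the text is routed into per-language fields such as content_en. -->
<updateRequestProcessorChain name="langid">
  <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">content</str>
    <str name="langid.langField">language_s</str>
    <bool name="langid.map">true</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```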

I presented in detail on the above strategies at Lucene/Solr Revolution
last November, so you may consider checking out the presentation and/or
slides to assess whether one of these strategies will work for your use case:
http://www.treygrainger.com/posts/presentations/semantic-multilingual-strategies-in-lucenesolr/

For the record, I'd highly recommend going with the first strategy (a
separate field per language) if you can, as it is certainly the simplest of
the approaches (albeit the one that scales the least well after you add
more than a few languages to your queries). If you want to stay simple and
stick with the ICUTokenizer then it will work to a point, but some of the
problems Walter mentioned may eventually bite you if you are supporting
certain groups of languages.

All the best,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Recommendations @ CareerBuilder

On Mon, Feb 23, 2015 at 11:14 PM, Walter Underwood wun...@wunderwood.org
wrote:

 It isn’t just complicated, it can be impossible.

 Do you have content in Chinese or Japanese? Those languages (and some
 others) do not separate words with spaces. You cannot even do word search
 without a language-specific, dictionary-based parser.

 German is space separated, except many noun compounds are not
 space-separated.

 Do you have Finnish content? Entire prepositional phrases turn into word
 endings.

 Do you have Arabic content? That is even harder.

 If all your content is in space-separated languages that are not heavily
 inflected, you can kind of do OK with a language-insensitive approach. But
 it hits the wall pretty fast.

 One thing that does work pretty well is trademarked names (LaserJet, Coke,
 etc). Those are spelled the same in all languages and usually not inflected.

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)

 On Feb 23, 2015, at 8:00 PM, Rishi Easwaran rishi.easwa...@aol.com
 wrote:

  Hi Alex,
 
  There is no specific language list.
  For example: the documents that needs to be indexed are emails or any
 messages for a global customer base. The messages back and forth could be
 in any language or mix of languages.
 
  I understand relevancy, stemming etc becomes extremely complicated with
 multilingual support, but our first goal is to be able to tokenize and
 provide basic search capability for any language. Ex: When the document
 contains hello or здравствуйте, the analyzer creates tokens and provides
 exact match search results.
 
  Now it would be great if it had capability to tokenize email addresses
 (ex:he...@aol.com- i think standardTokenizer already does this),
 filenames (здравствуйте.pdf), but maybe we can use filters to accomplish
 that.
 
  Thanks,
  Rishi.
 
  -Original Message-
  From: Alexandre Rafalovitch arafa...@gmail.com
  To: solr-user solr-user@lucene.apache.org
  Sent: Mon, Feb 23, 2015 5:49 pm
  Subject: Re: Basic Multilingual search capability
 
 
  Which languages are you expecting to deal with? Multilingual support
  is a complex issue. Even if you think you don't need much, it is
  usually a lot more complex than expected, especially around relevancy.
 
  Regards,
Alex.
  
  Sign up for my Solr resources