Re: Boosting the score using edismax for a non empty and non indexed field.

2014-12-09 Thread Erik Hatcher
Boosting needs to be done against an indexed field.  But rather than 
indexing the URL value itself, consider indexing a new boolean hasImage 
field set to true.  There is no need to index the false values at all. 

   Erik  
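
As a rough illustration of the suggestion above, the request could be
assembled like this in Python. It assumes a boolean hasImage field has been
added to the schema as indexed and populated at index time; the boost factor
of 10 and the sample query text are arbitrary placeholders, not values from
this thread.

```python
from urllib.parse import urlencode


def build_boosted_query(user_query, boost=10):
    """Build edismax request parameters that favour documents with an image.

    Assumption: the schema has an indexed boolean field named hasImage that
    is set to true only for documents that carry an imageURL.
    """
    params = {
        "defType": "edismax",
        "q": user_query,
        # bq adds an optional clause: matching documents score higher,
        # non-matching documents are still returned.
        "bq": f"hasImage:true^{boost}",
    }
    return urlencode(params)


print(build_boosted_query("red shoes"))
```

Because bq only adds to the score, documents without an image still match;
they simply sort lower when results are ordered by score.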


 On Dec 8, 2014, at 02:45, S.L simpleliving...@gmail.com wrote:
 
 Hi All,
 
 I have a situation where I need to boost the score of a document if a field
 (imageURL) in that document is non-empty. I am using edismax, so I know that
 the bq parameter would solve the problem. However, the imageURL field that I
 am trying to boost on is not indexed (stored = true and indexed = false). Can
 I use the bq parameter on a non-indexed field, or should I change the schema
 to make this an indexed field and then re-index?
 
 My use case is that documents with an imageURL should appear before those
 without one when results are sorted by score in descending order. The
 imageURL field is sometimes present and sometimes not, which is why I am
 looking at boosting the score of the documents that have it.
 
 Thanks, any help and suggestions are much appreciated!


Re: Length norm not functioning in solr queries.

2014-12-09 Thread S.L
Hi,

Thanks, Mikhail. I looked at the explain output, and this is what I see for
the two documents in question. They have identical scores even though
document 2 has a shorter productName field, and I do not see any
lengthNorm-related information in the explain output.

Also, I am not exactly clear on what I should be looking for in the API.

Search query:
q=iphone+4s+16gb&qf=productName&mm=1&pf=productName&ps=1&pf2=productName&pf3=productName&stopwords=true&lowercaseOperators=true

productName: Details about Apple iPhone 4s 16GB Smartphone ATT Factory
Unlocked

   - 100% 10.649221 sum of the following:
      - 10.58% 1.1270299 sum of the following:
         - 2.1% 0.22383358 productName:iphon
         - 3.47% 0.36922288 productName:4 s
         - 5.01% 0.53397346 productName:16 gb
      - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1
      - 27.79% 2.959255 sum of the following:
         - 10.97% 1.1680154 productName:iphon 4 s~1
         - 16.82% 1.7912396 productName:4 s 16 gb~1
      - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1


productName: Apple iPhone 4S 16GB for Net10, No Contract, White

   - 100% 10.649221 sum of the following:
      - 10.58% 1.1270299 sum of the following:
         - 2.1% 0.22383358 productName:iphon
         - 3.47% 0.36922288 productName:4 s
         - 5.01% 0.53397346 productName:16 gb
      - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1
      - 27.79% 2.959255 sum of the following:
         - 10.97% 1.1680154 productName:iphon 4 s~1
         - 16.82% 1.7912396 productName:4 s 16 gb~1
      - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1




On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 It's worth looking at the explain output to check the individual scoring
 values. The most likely suspect, though, is the precision lost when float
 norms are stored as byte values. See the javadoc for
 DefaultSimilarity.encodeNormValue(float).


 On Mon, Dec 8, 2014 at 5:49 PM, S.L simpleliving...@gmail.com wrote:

  I have two documents, doc1 and doc2, and each has a field called phoneName.
 
  doc1: phoneName: Details about  Apple iPhone 4s - 16GB - White (Verizon)
  Smartphone Factory Unlocked
 
  doc2: phoneName: Apple iPhone 4S 16GB for Net10, No Contract, White
 
  If I search for
 
  q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true
 
  doc1 and doc2 get identical scores of 9.961212. Since the phoneName field
  in doc2 is shorter, I would expect doc2 to score higher.
 
  The phoneName field is defined as follows. Nowhere do I specify
  omitNorms=true, yet the length norm does not seem to be applied at all.
  Can someone tell me what the issue is?
 
  <field name="phoneName" type="text_en_splitting" indexed="true"
         stored="true" required="true" />
  <fieldType name="text_en_splitting" class="solr.TextField"
             positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory"
              synonyms="index_synonyms.txt" ignoreCase="true"
              expand="false"/> -->
      <!-- Case insensitive stop word removal. add
           enablePositionIncrements=true in both the index and query
           analyzers to leave a 'gap' for more accurate phrase queries. -->
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="lang/stopwords_en.txt"
              enablePositionIncrements="true" />
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="1" catenateAll="0"
              splitOnCaseChange="1" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.KeywordMarkerFilterFactory"
              protected="protwords.txt" />
      <filter class="solr.PorterStemFilterFactory" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="solr.SynonymFilterFactory"
              synonyms="synonyms.txt" ignoreCase="true" expand="true" />
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="lang/stopwords_en.txt"
              enablePositionIncrements="true" />
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="0" catenateNumbers="0" catenateAll="0"
              splitOnCaseChange="1" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter 

Re: Length norm not functioning in solr queries.

2014-12-09 Thread Ahmet Arslan
Hi,

The default length norm is not the best option for differentiating very 
short documents, such as product names.
Please see: 
http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec

I suggest creating an additional integer field that holds the number of 
tokens. You can populate it via an update processor and then penalise 
(using function queries) according to that field. This gives you more 
fine-grained and flexible control.
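
To make the function-query idea a little more concrete, here is a small
Python sketch. The field name productNameTokenCount and the constants passed
to recip are made up for illustration, and the update processor that would
populate the field is not shown. The sketch uses the edismax boost parameter
(a multiplicative boost) with Solr's recip(x,m,a,b) function, which evaluates
to a / (m*x + b).

```python
from urllib.parse import urlencode


def recip(x, m, a, b):
    """Local mirror of Solr's recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)


def build_length_penalised_query(user_query,
                                 count_field="productNameTokenCount"):
    """Build edismax parameters that scale the score down for longer names.

    Assumption: count_field is a hypothetical indexed integer field that
    holds the number of tokens in productName for each document.
    """
    params = {
        "defType": "edismax",
        "q": user_query,
        "qf": "productName",
        # edismax multiplies the relevance score by this function's value.
        "boost": f"recip({count_field},1,10,10)",
    }
    return urlencode(params)


# How the multiplier shrinks as the token count grows.
for tokens in (8, 10, 12):
    print(tokens, round(recip(tokens, 1, 10, 10), 4))

print(build_length_penalised_query("iphone 4s 16gb"))
```

Unlike the byte-encoded length norm, this multiplier differs for every token
count, so an 8-token and a 10-token product name no longer tie. The constants
would need tuning against real data.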

Ahmet




Re: Length norm not functioning in solr queries.

2014-12-09 Thread Mikhail Khludnev
I wonder why your explain output is so brief. Mine looks like this:

<str>
0.4500489 = (MATCH) weight(text:inc in 17) [DefaultSimilarity], result of:
  0.4500489 = fieldWeight in 17, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = termFreq=1.0
    2.880313 = idf(docFreq=8, maxDocs=59)
    0.15625 = fieldNorm(doc=17)</str>
<str>
0.4500489 = (MATCH) weight(text:inc in 27) [DefaultSimilarity], result of:
  0.4500489 = fieldWeight in 27, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = termFreq=1.0
    2.880313 = idf(docFreq=8, maxDocs=59)
    0.15625 = fieldNorm(doc=27)</str>

Here we can see the fieldNorm factors. These two docs are rather different,
yet their norm factors are equal.

 Also I am not exactly clear on what needs to be looked in the API ?

Because there you can see exactly how precision is lost when the float
field norm is stored in a byte.
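
The precision loss can be reproduced outside Solr. The Python sketch below
re-implements, from memory, the 8-bit float encoding that
DefaultSimilarity.encodeNormValue is understood to use in Lucene 4.x
(SmallFloat.floatToByte315), and applies it to the usual length norm
1/sqrt(numTerms). Treat it as an approximation for illustration: it ignores
index-time boosts and overlap-token discounting, and the token counts are
arbitrary examples rather than the real counts for the documents in this
thread.

```python
import math
import struct


def float_to_byte315(f):
    """Encode a float into one byte, keeping only the top mantissa bits."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits
    small = bits >> 21            # drop the low 21 bits of the mantissa
    fzero = (63 - 15) << 3
    if small <= fzero:
        return 1 if bits > 0 else 0   # underflow: smallest value, or zero
    if small >= fzero + 0x100:
        return 255                    # overflow: largest representable value
    return small - fzero


def byte315_to_float(b):
    """Decode a byte produced by float_to_byte315 back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_length_norm(num_terms):
    """Length norm 1/sqrt(numTerms) after a round trip through one byte."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


for n in (1, 8, 10, 12, 29, 40):
    print(n, round(1.0 / math.sqrt(n), 6), stored_length_norm(n))
```

Under this encoding, fields of 8 and 10 terms end up with the same stored
norm, as does every length from 29 to 40 terms (the 0.15625 seen in the
explain output above). Two product names of similar length can therefore
score identically even though norms are enabled.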




AW: AW: Keeping capitalization in suggestions?

2014-12-09 Thread Clemens Wyss DEV
Thanks for all the insightful links.
I tried 
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr but 
that approach returns search results instead of term suggestions.

At the moment I have a solution based on 
http://wiki.apache.org/solr/TermsComponent, but I might want 
multi-term suggestions (and fuzziness). 
I would therefore be very interested in how AnalyzingInfixLookupFactory (or 
any other suggest component) could
a) return case-sensitive suggestions (i.e. as indexed/stored)
b) allow case-insensitive suggestion lookup

Is anybody else doing what I'd like to do?

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] 
Sent: Monday, 8 December 2014 19:25
To: solr-user@lucene.apache.org
Subject: Re: AW: Keeping capitalization in suggestions?

Hi Clemens,

There are a number of ways to implement autocomplete/suggest. Some of them 
pull data from indexed terms, so the suggestions will be lowercased. Others 
pull data from stored values, so capitalisation is preserved.

Here are some great resources on this topic.

https://lucidworks.com/blog/auto-suggest-from-popular-queries-using-edgengrams/
http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Ahmet


On Monday, December 8, 2014 5:43 PM, Clemens Wyss DEV clemens...@mysign.ch 
wrote:

Although I am now using AnalyzingInfixSuggester, I still get either one or 
the other.

When the lowercase filter is active I always get suggestions, BUT they are 
lowercased (e.g. chamäleon).
When the lowercase filter is not active I only get suggestions when querying 
Chamä.

my solrconfig.xml
...
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <str name="wt">json</str>
    <str name="indent">false</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggestDictionary</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">false</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
...
<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggestDictionary</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">org.apache.solr.spelling.suggest.DocumentDictionaryFactory</str>
    <str name="field">suggest</str>
    <str name="buildOnCommit">true</str>
    <str name="storeDir">suggester</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="minPrefixChars">4</str>
  </lst>
</searchComponent>
...

my schema.xml
...
<field indexed="true" multiValued="true" name="suggest" stored="false" type="text_suggest"/>
...
<fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
  </analyzer>
</fieldType>
...


-Original Message-
From: Michael Sokolov [mailto:msoko...@safaribooksonline.com]
Sent: Thursday, 4 December 2014 14:05
To: solr-user@lucene.apache.org
Subject: Re: Keeping capitalization in suggestions?

Have a look at AnalyzingInfixSuggester - it does what you want.

-Mike

On 12/4/14 3:05 AM, Clemens Wyss DEV wrote:
 When I index a text such as Chamäleon and look for suggestions for chamä 
 and/or Chamä, I'd expect to get Chamäleon (capitalised).
 But what happens is:

 If the lowercase filter (see (1) below) is set:
 chamä returns chamäleon
 Chamä does not match

 If the lowercase filter (1) is not set:
 Chamä returns Chamäleon
 chamä does not match

 I guess the lowercase filter should not be active, but then how do I get 
 matches even if the search term is lowercased?

 Context:
 schema.xml
 ...
 <fieldType class="solr.TextField" name="text_de" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
     <filter class="solr.GermanLightStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" 

Re: SOLR shards stay down forever

2014-12-09 Thread Erick Erickson
How big is your transaction log? If you don't do a hard commit
(openSearcher = true or false doesn't matter), the tlog can grow,
and upon restart the tlog gets replayed. I've seen tlogs in the
tens-of-gigabytes range, which can take a long time to replay.
In the meantime, new updates are written to, you guessed it,
the tlog.

So check the tlog size. If it's big, be sure you have indexing turned
off and be very patient (as in hours in some cases). To avoid this,
make sure to do hard commits while indexing.
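
One quick way to check the tlog sizes is a few lines of Python like the
sketch below. The data directories are taken from the log lines quoted
further down in this thread; the layout (a tlog subdirectory holding files
named tlog.NNN) is inferred from those same paths, so adjust it if your
cores are laid out differently.

```python
from pathlib import Path


def tlog_sizes(data_dirs):
    """Return {tlog directory: total bytes of its tlog.* files}.

    Directories without a tlog subdirectory are skipped.
    """
    sizes = {}
    for data_dir in data_dirs:
        tlog_dir = Path(data_dir) / "tlog"
        if not tlog_dir.is_dir():
            continue
        sizes[str(tlog_dir)] = sum(
            f.stat().st_size for f in tlog_dir.glob("tlog.*") if f.is_file()
        )
    return sizes


if __name__ == "__main__":
    # Core data directories as they appear in the quoted log output.
    dirs = ["/opt/data/data", "/opt/data2/data", "/opt/data3/data",
            "/opt/data4/data", "/opt/data5/data"]
    for path, size in sorted(tlog_sizes(dirs).items()):
        print(f"{path}: {size / 2**30:.2f} GiB")
```

If any of these come back in the gigabyte range, a long replay on restart is
the likely explanation for the shards staying down.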

Here's a long blog on the topic:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

and if this is irrelevant, then I'm not quite sure what's going on.

Best,
Erick

On Tue, Dec 9, 2014 at 12:48 AM, Norgorn lsunnyd...@mail.ru wrote:
 I'm using SOLR 4.10.1 in cloud mode with 3 instances, 5 shards per instance
 without replication.
 I restarted one SOLR and now all shards from that instance are down, but
 there are no errors in logs.
 All I see is

 09.12.2014, 11:13:40  WARN  UpdateLog  Starting log replay tlog{file=/opt/data4/data/tlog/tlog.297 refcount=2} active=false starting pos=0
 09.12.2014, 11:13:40  WARN  UpdateLog  Starting log replay tlog{file=/opt/data5/data/tlog/tlog.297 refcount=2} active=false starting pos=0
 09.12.2014, 11:13:40  WARN  UpdateLog  Starting log replay tlog{file=/opt/data/data/tlog/tlog.298 refcount=2} active=false starting pos=0
 09.12.2014, 11:13:40  WARN  UpdateLog  Starting log replay tlog{file=/opt/data3/data/tlog/tlog.298 refcount=2} active=false starting pos=0
 09.12.2014, 11:13:40  WARN  UpdateLog  Starting log replay tlog{file=/opt/data2/data/tlog/tlog.299 refcount=2} active=false starting pos=0

 SOLR with down shards tries to open new searcher, and I see something like
 this in output:

 INFO  org.apache.solr.servlet.SolrDispatchFilter  – [admin] webapp=null
 path=/admin/info/system params={_=1418106009371&wt=json} status=0 QTime=314
 4020344 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore  –
 [collection_shard5_replica1] Registered new searcher
 Searcher@4dcb568c[vk_hbase_shard5_replica1]
 main{StandardDirectoryReader(segments_85:4012:nrt _4l(4.10.1):C8880876
 _88(4.10.1):C8730658 _im(4.10.1):C8773208 _pa(4.10.1):C8435426
 _cy(4.10.1):C9802246 _fc(4.10.1):C9046837 _sc(4.10.1):C7806921
 _m7(4.10.1):C9362895 _zy(4.10.1):C8808455 _w0(4.10.1):C8384542
 _ui(4.10.1):C164859 _1dd(4.10.1):C7764232 _13a(4.10.1):C8240288
 _16n(4.10.1):C8839542 _19w(4.10.1):C1071719 _172(4.10.1):C200551
 _1av(4.10.1):C9141784 _1if(4.10.1):C997348 _1eh(4.10.1):C174190
 _1hb(4.10.1):C9050675 _1dl(4.10.1):C64 _1j9(4.10.1):C119759
 _1fw(4.10.1):C795323 _1gn(4.10.1):C4922 _1ht(4.10.1):C984261
 _1hh(4.10.1):C966986 _1iz(4.10.1):C953605 _1ip(4.10.1):C994842
 _1i6(4.10.1):C75701 _1i7(4.10.1):C4011 _1id(4.10.1):C17581
 _1iy(4.10.1):C75483 _1j1(4.10.1):C102710 _1jj(4.10.1):C1030895
 _1j5(4.10.1):C90936 _1jc(4.10.1):C79955 _1jd(4.10.1):C6312
 _1jh(4.10.1):C96957 _1ji(4.10.1):C71555 _1jk(4.10.1):C3270
 _1jl(4.10.1):C107854 _1jm(4.10.1):C107286 _1jn(4.10.1):C94250
 _1jo(4.10.1):C98851 _1jp(4.10.1):C88492)}

 But all shards remain in the down state.

 I tried to stop SOLR-1 (the one with problems), delete the shards with the
 DELETESHARD command and then start it again, but that didn't help.







Re: AW: AW: Keeping capitalization in suggestions?

2014-12-09 Thread Michael Sokolov

Clemens --

  What I do (see the suggestions of book titles on $EMPLOYER's web 
site) is define a field with no analysis (type=keyword, using 
KeywordAnalyzer) and build the suggestions from that.  Then tell 
AnalyzingInfixSuggester (AIS) to use an analyzer internally to pick out 
words from it (StandardAnalyzer, or WhitespaceAnalyzer with 
LowerCaseFilter, however you want the matching to work in the suggester).  
It will return the terms from the source field.
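
The idea can be pictured with a toy model in Python. This is not how
AnalyzingInfixSuggester is implemented; it only mimics the behaviour
described above: the source strings are kept exactly as supplied, matching
runs against a lowercased, tokenized view of them, and the untouched source
string is what gets returned. The class name and sample titles are invented
for the example.

```python
import re


class ToyInfixSuggester:
    """Match on analyzed tokens, return the original (unanalyzed) text."""

    def __init__(self, titles):
        # Keep each source string untouched; index only its analyzed tokens.
        self._entries = [(title, self._analyze(title)) for title in titles]

    @staticmethod
    def _analyze(text):
        # Stand-in for an analyzer chain: split on word characters, lowercase.
        return [tok.lower() for tok in re.findall(r"\w+", text)]

    def suggest(self, prefix, limit=5):
        needle = prefix.lower()  # the lookup text is analyzed the same way
        hits = [title for title, tokens in self._entries
                if any(tok.startswith(needle) for tok in tokens)]
        return hits[:limit]


suggester = ToyInfixSuggester(["Chamäleon", "Das grüne Chamäleon", "Kamel"])
print(suggester.suggest("chamä"))
print(suggester.suggest("Chamä"))
```

Both lookups return the capitalised titles because case folding happens only
on the matching side, never on the text that is handed back.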


You didn't show the definition of your suggest field. I expect it 
is analyzed, right?  Just don't do that.


-Mike

Re: Clearing SolrCaches

2014-12-09 Thread Shawn Heisey
On 12/8/2014 11:10 PM, Manohar Sripada wrote:
 How to edit the configuration that is linked to a collection?? I am using
 SolrCloud and I upload my config to Zookeeper. So, if I modify and upload
 the config, will that not impact the latest collection as well, if I don't
 reload the latest collection?

Yes, you would need to change the config and re-upload it to zookeeper
before you do the reload.  Any collections linked to that config will
use the new config on reload or when a new collection linked to that
config is created.

The scenario I have described probably requires that you have more than
one config in zookeeper - one that has the caches and warming configured
for production, and one that doesn't, plus any other configs you might
require.  You could re-link the old collection to the config with no
caches before you reload it.
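
For the reload step, a minimal Python sketch that builds the Collections API
call is shown here. The base URL and collection name are placeholders.
Uploading the changed config and re-linking the collection to it are separate
steps done with the ZooKeeper tooling that ships with Solr (for example the
zkcli script's upconfig and linkconfig commands) and are not covered by this
snippet.

```python
from urllib.parse import urlencode


def reload_collection_url(base_url, collection):
    """Build the Collections API URL that reloads one collection.

    base_url is assumed to look like http://host:8983/solr (a placeholder).
    """
    query = urlencode({"action": "RELOAD", "name": collection, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


print(reload_collection_url("http://localhost:8983/solr", "old_collection"))
```

Only the collection named in the request is reloaded; other collections
linked to the same config keep running with what they already loaded until
they are reloaded themselves.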

Thanks,
Shawn



Comparing Solr Elasticsearch performance

2014-12-09 Thread Charlie Hull

Hi all,

We've been working on a study of any performance differences between 
Solr and Elasticsearch and we've also published the code we used - 
here's the background with links to Github 
http://www.flax.co.uk/blog/2014/12/09/comparing-solr-and-elasticsearch-heres-the-code-we-used/ 



Cheers

Charlie
Flax


Re: Comparing Solr Elasticsearch performance

2014-12-09 Thread Alexandre Rafalovitch
I guess when you said you did not tune instances, you really really
meant it. The Solr one looks like an example one with all the config
files and Carrot enabled, etc.

I was hoping for a bit more TodoMVC style. I guess that's for the next
lull in the client work. Still, great to have it out there.

Regards,
   Alex.
P.s. I also realized that my presentation on Solr vs. Elasticsearch
now exists in two places on Slideshare. One under my own account and
one under LucidWorks one. Hmm.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 9 December 2014 at 12:09, Charlie Hull char...@flax.co.uk wrote:
 Hi all,

 We've been working on a study of any performance differences between Solr
 and Elasticsearch and we've also published the code we used - here's the
 background with links to Github
 http://www.flax.co.uk/blog/2014/12/09/comparing-solr-and-elasticsearch-heres-the-code-we-used/

 Cheers

 Charlie
 Flax


RE: How to stop Solr tokenising search terms with spaces

2014-12-09 Thread Dinesh Babu

But my requirement is for A* B* to be treated as "A* B*". A* OR B* won't meet 
my requirement. We have chosen the NGram solution and it is working for our 
requirement at the moment. Thanks for your input and help, Yonik.

Regards,
Dinesh Babu.


-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 08 December 2014 17:58
To: solr-user@lucene.apache.org
Subject: Re: How to stop Solr tokenising search terms with spaces

On Mon, Dec 8, 2014 at 12:01 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 debug output tells a lot.  Looks like in the last two examples that the 
 second part (Viewpoint*) is NOT parsed with the complex phrase parser - the 
 whitespace thwarts it.

Actually, it looks like it is, but you're not telling the complex phrase parser 
to put the two clauses in a phrase.  You need the quotes.

Even for complexphrase parser
A* B*  is the same as A* OR B*

-Yonik
http://heliosearch.org - native code faceting, facet functions, sub-facets, 
off-heap data





facet.mincount=0 returns facet values with 0 counts for q=* query

2014-12-09 Thread Abhishek Sharma
Hi,

Can anyone help me understand what it means to have facet results like
this?

  values: [
    4th of july flags,
    0,
    angela moore,
    0,
    anklets,
    0,
    applique flags,
    0,
    army national guard,
    0,
    bangles,
    0,
    beatriz ball
  ]

This is for a q=* query with facet.mincount=0.

What do the results signify? Under what conditions can we have a facet
count of 0 for a q=* query?


Re: facet.mincount=0 returns facet values with 0 counts for q=* query

2014-12-09 Thread Chris Hostetter

in general, a facet count of 0 means the term is in the index but does not 
match any of the docs in the result set.

if you are doing a query that matches all docs, and seeing facet values 
with a count of 0, that means the *term* is still in the index, but the 
documents that contained those terms have been deleted.

the terms themselves will be deleted if/when the segments containing them 
get merged away and the deleted documents are expunged.
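
A minimal sketch of one practical consequence, assuming the zero-count terms are simply unwanted in the response: setting facet.mincount=1 hides them without waiting for a merge. The host, the core name (products) and the field name (category) below are hypothetical and not taken from the thread.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, mincount=1):
    """Build a select URL that facets on `field` and hides terms below `mincount`."""
    params = [
        ("q", "*:*"),                  # match all documents
        ("rows", 0),                   # only the facet counts are of interest
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", mincount),  # 1 drops terms left behind by deleted docs
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core and field names, for illustration only.
url = facet_query_url("http://localhost:8983/solr/products", "category")
print(url)
```

With mincount left at 0 the same request would still list terms whose documents have all been deleted, which is the behaviour described above.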



: Date: Tue, 9 Dec 2014 23:24:54 +0530
: From: Abhishek Sharma abhishe...@unbxd.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: facet.mincount=0 returns facet values with 0 counts for q=* query
: 
: Hi,
: 
: Can any one help me understand what does it mean to have facet results like
: this -
: 
:   values: [
: 4th of july flags,
: 0,
: angela moore,
: 0,
: anklets,
: 0,
: applique flags,
: 0,
: army national guard,
: 0,
: bangles,
: 0,
: beatriz ball
:   ]
: 
: for a *q=** query with
: 
: *facet.mincount=0?*
: What do the* results signify? *In what condition can we have *facet count
: as 0* for *q=** query?
: 

-Hoss
http://www.lucidworks.com/


Re: How to stop Solr tokenising search terms with spaces

2014-12-09 Thread Yonik Seeley
On Tue, Dec 9, 2014 at 12:49 PM, Dinesh Babu dinesh.b...@pb.com wrote:

 But my requirement is A* B*  to be A* B* . A* OR B*won't meet my requirement.

The syntax is what it is...  With the complexphrase parser, if you
want a phrase, you need to surround the clauses with double quotes:
"A* B*"
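
A small sketch of what such a request could look like when built from client code. The field name (name) is a hypothetical placeholder; only the {!complexphrase} parser and the double quotes come from the discussion above.

```python
from urllib.parse import urlencode


def complexphrase_params(field, *prefixes):
    """Wrap the wildcard clauses in double quotes so they are parsed as one phrase."""
    phrase = " ".join(prefix + "*" for prefix in prefixes)
    return {"q": '{!complexphrase}%s:"%s"' % (field, phrase)}


# "name" is an assumed field; A and B stand in for the two prefixes.
params = complexphrase_params("name", "A", "B")
print(params["q"])        # the raw query string
print(urlencode(params))  # the same query, URL-encoded for an HTTP request
```

Without the quotes the two clauses are parsed separately, which is the A* OR B* behaviour mentioned earlier in the thread.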

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


 We have chosen the NGram solution and it is working for our requirement
 at the moment. Thanks for your input and help Yonik

 Regards,
 Dinesh Babu.


 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: 08 December 2014 17:58
 To: solr-user@lucene.apache.org
 Subject: Re: How to stop Solr tokenising search terms with spaces

 On Mon, Dec 8, 2014 at 12:01 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 debug output tells a lot.  Looks like in the last two examples that the 
 second part (Viewpoint*) is NOT parsed with the complex phrase parser - the 
 whitespace thwarts it.

 Actually, it looks like it is, but you're not telling the complex phrase 
 parser to put the two clauses in a phrase.  You need the quotes.

 Even for complexphrase parser
 A* B*  is the same as A* OR B*

 -Yonik
 http://heliosearch.org - native code faceting, facet functions, sub-facets, 
 off-heap data


Re: Comparing Solr Elasticsearch performance

2014-12-09 Thread Charlie Hull
Yes of course, starting with an OOTB configuration seemed sensible and
obviously there is scope for tuning. It occurs to me that a comparison
between tuned and OOTB Solr would also be interesting. We do sometimes find
Solr configs that are barely modified example files!

Cheers

Charlie

--
Charlie Hull
www.flax.co.uk
On Dec 9, 2014 5:22 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 I guess when you said you did not tune instances, you really really
 meant it. The Solr one looks like an example one with all the config
 files and Carrot enabled, etc.

 I was hoping for a bit more TodoMVC style. I guess that's for the next
 lull in the client work. Still, great to have it out there.

 Regards,
Alex.
 P.s. I also realized that my presentation on Solr vs. Elasticsearch
 now exists in two places on Slideshare. One under my own account and
 one under LucidWorks one. Hmm.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 9 December 2014 at 12:09, Charlie Hull char...@flax.co.uk wrote:
  Hi all,
 
  We've been working on a study of any performance differences between Solr
  and Elasticsearch and we've also published the code we used - here's the
  background with links to Github
 
 http://www.flax.co.uk/blog/2014/12/09/comparing-solr-and-elasticsearch-heres-the-code-we-used/
 
  Cheers
 
  Charlie
  Flax



Q: Does anybody asks/answer Solr questions on Stack Overflow? Why?

2014-12-09 Thread Alexandre Rafalovitch
Hello,

This is an informal survey trying to understand the community
participation patterns.

Most of the non-interactive Solr information-gathering activity is
happening on Google/Bing/DDG/Yandex/etc. That's probably very common,
though I'd love to see Google Analytics stats from websites with large
collection of Solr articles. I'd be happy to share mine in exchange.

Most of the interactive Solr discussion activity happens on this list.
Which is great. There are real experts hanging around and popping out
of shadows when the need comes. I owe my first book's success to this
community's willingness to answer my incomprehensible questions.

But there is also Stack Overflow, where some people ask questions and
an even smaller number of people answer. I answer SO questions
(2^8 as of today), but don't ask there.

But I am curious about other people's experiences with SO. Do you ask
questions in that forum? Do you answer? Why? How do you compare that
support channel with this one? Did you migrate from one to another?
Private replies are welcome, though I suspect this topic might be
interesting for public discussion too.

Regards,
   Alex.
P.s. This is related to my next Solr book, if somebody is really
confused right now.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


RE: Does anybody asks/answer Solr questions on Stack Overflow? Why?

2014-12-09 Thread Toke Eskildsen
Alexandre Rafalovitch [arafa...@gmail.com] wrote:
 But I am curious about other peoples' experiences with SO. Do you ask
 questions in that forum? Do you answer? Why? How do you compare that
 support channel with this one? Did you migrate from one to another?

I have answered a few questions on StackOverflow, but do not consider myself an 
active member. I see SO as strict question-answer. I prefer the more open 
dialogue form of the mailing list and accept the extra noise. "Look what I 
made" is a fine conversation starter that I see no room for on SO.

- Toke Eskildsen


Re: Q: Does anybody asks/answer Solr questions on Stack Overflow? Why?

2014-12-09 Thread Michael Sokolov
Alex, I spent some time answering questions there, but ultimately 
got turned off by the competitive nature of it. I wanted to increase my 
score -- fun! But if you are not watching it all the time, the questions 
go by very fast, and you lose your edge.  The typical pattern seems to 
be: so-so answer gets selected as correct, and then the really good 
thoughtful answer comes along later. Not to say I don't value it, but it 
seems to demand a kind of intensity of effort that I didn't have time 
for -- it's much easier to find good answers there than it is to find 
good questions.


-Mike

On 12/09/2014 02:03 PM, Alexandre Rafalovitch wrote:

Hello,

This is an informal survey trying to understand the community
participation patterns.

Most of the non-interactive Solr information-gathering activity is
happening on Google/Bing/DDG/Yandex/etc. That's probably very common,
though I'd love to see Google Analytics stats from websites with large
collection of Solr articles. I'd be happy to share mine in exchange.

Most of the interactive Solr discussion activity happens on this list.
Which is great. There are real experts hanging around and popping out
of shadows when the need comes. I owe my first book's success to this
community's willingness to answer my incomprehensible questions.

But there is also Stack Overflow. Which some people ask questions at
and - even smaller number of people - answer. I answer SO questions
(2^8 as of today), but don't ask there.

But I am curious about other peoples' experiences with SO. Do you ask
questions in that forum? Do you answer? Why? How do you compare that
support channel with this one? Did you migrate from one to another?
Private replies are welcome, though I suspect this topic might be
interesting for public discussion too.

Regards,
Alex.
P.s. This is related to my next Solr book, if somebody is really
confused right now.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853




Re: Q: Does anybody asks/answer Solr questions on Stack Overflow? Why?

2014-12-09 Thread Chris Hostetter

: But I am curious about other peoples' experiences with SO. Do you ask
: questions in that forum? Do you answer? Why? How do you compare that
: support channel with this one? Did you migrate from one to another?
: Private replies are welcome, though I suspect this topic might be
: interesting for public discussion too.

There aren't enough hours in the day for me to read/respond to every 
question/discussion that i want to help out with on the solr-user@lucene 
mailing list -- i'm certainly not going to sacrifice any of the time i 
have available to help out in this official community/support discussion 
@apache to create content for a 3rd party company that generates ad 
revenue based on the answers people provide for free, and might not be 
around next week/month/year to keep an archive for future users.

(don't even get me started on quora and their bullshit "you can't even 
*read* the content foolish people wrote for us for free w/o giving us 
access to your social graph")


-Hoss
http://www.lucidworks.com/


Re: Q: Does anybody asks/answer Solr questions on Stack Overflow? Why?

2014-12-09 Thread Alexandre Rafalovitch
On 9 December 2014 at 16:05, Chris Hostetter hossman_luc...@fucit.org wrote:
 (don't even get me started on quora and their bullshit you can't even
 *read* the content foolish people wrote for us for free w/o giving us
 access to your social graph)
+2 on Quora annoyance. It's even worse than that, because they don't
seem to organize content by recency, just by some sort of interest order
useful for themselves. I only comment on Quora - very occasionally -
to drive the clueless to better forums through the links.

Regards,
   Alex.


Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: CLUSTERSTATUS timeout

2014-12-09 Thread Shalin Shekhar Mangar
Hi Jonathan,

That shouldn't happen. The API returns the answer from the Overseer node
(via ZK) and should return immediately. The API will time out after 180s if
somehow it cannot get a response from Overseer. I don't see why it would
time out. What's the read timeout on your monitoring system?

About your shards getting marked as down, do you send commits explicitly?
If yes, you may be running into SOLR-6530 which is fixed in 4.10.2.
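
As a rough sketch of the monitoring side, assuming the check is a small script rather than a dedicated tool: the client read timeout is set above the 180s server-side limit, and the JSON is scanned for replicas whose state is not "active". The host in the URL and the sample data are hypothetical, and the traversal assumes the usual cluster/collections/shards/replicas layout of the CLUSTERSTATUS response.

```python
import json
from urllib.request import urlopen

# Hypothetical host; adjust to the node your monitor talks to.
CLUSTERSTATUS_URL = (
    "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"
)


def fetch_cluster_status(url=CLUSTERSTATUS_URL, timeout=200):
    """Fetch CLUSTERSTATUS with a client timeout above the 180s server-side limit."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def inactive_replicas(status):
    """Return (collection, shard, replica, state) for every replica not marked active."""
    found = []
    collections = status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                state = replica.get("state")
                if state != "active":
                    found.append((coll_name, shard_name, replica_name, state))
    return found


# Made-up response fragment, used instead of a live call.
sample = {
    "cluster": {
        "collections": {
            "collection1": {
                "shards": {
                    "shard1": {
                        "state": "active",
                        "replicas": {
                            "core_node1": {"state": "active"},
                            "core_node2": {"state": "down"},
                        },
                    }
                }
            }
        }
    }
}
print(inactive_replicas(sample))
```

In a real check, fetch_cluster_status() would supply the dictionary, and a timeout there would be reported separately from a down replica.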

On Mon, Dec 8, 2014 at 9:45 PM, Hutchins, Jonathan jhutch...@webmd.net
wrote:

 We are currently running Solr 4.10.0 in production.  We have run into an
 issue where we cannot have our monitoring system hit the CLUSTERSTATUS api
 command every five minutes (or even as long as every hour) without getting
 a significant number of timeouts on the command.  Does this command return
 a timeout message after a period of time (and we aren’t waiting long
 enough) or will it hang indefinitely?  This is a major problem for us
 because we are finding that shards will randomly get marked down, turn red
 in the web console, but there is no indication that they are down except if
 we run that command.  Please help!

 Thanks!

 - Jonathan




-- 
Regards,
Shalin Shekhar Mangar.


# of daily/weekly/monthly Solr downloads?

2014-12-09 Thread Otis Gospodnetic
Hi,

Does anyone know the number of daily/weekly/monthly Solr downloads?

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: Get matched Term in join query

2014-12-09 Thread Peter Sturge
Hi,

Your question is a good one - I have added an option to search through
results and filter that way, but it's not ideal, as very often there are
10,000 or even millions of hits, with only 20 results per page returned.

I've realized I've run into the classic 'Terms-can't-be-filtered' issue. To
filter Terms would, in the worst case, mean looking up a great many items.
For now, I'm going with the TermsComponent added to the standard
searchhandler. The drawback is that you get back all terms that match the
terms.regex, even those not necessarily in the results.
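
A minimal sketch of the request parameters such a setup might use, assuming the TermsComponent is registered on the handler being queried. The field name raw is taken from the original question; the regular expression and the limit are made-up values.

```python
from urllib.parse import urlencode


def terms_params(field, regex, limit=20):
    """Parameters asking the TermsComponent for indexed terms matching `regex`."""
    return [
        ("terms", "true"),       # switch the component on for this request
        ("terms.fl", field),     # field whose indexed terms are listed
        ("terms.regex", regex),  # only terms matching this pattern are returned
        ("terms.limit", limit),
        ("wt", "json"),
    ]


# "cont.*" is an illustrative pattern, not one from the thread.
query_string = urlencode(terms_params("raw", "cont.*"))
print(query_string)
```

As noted above, the terms come from the whole index for that field, not just from the documents in the current result set.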

Many thanks,
Peter



On Tue, Dec 9, 2014 at 7:32 AM, Mikhail Khludnev mkhlud...@griddynamics.com
 wrote:

 Hello Peter,

 Let's narrow down, or at least pin down, the problem definition. I understand
 that dealing with a cross-core join is mandatory. Is that right?
 Then, do you need facets (from the whole result set) or just snippets (from
 the result page only)?
 On 09.12.2014 at 1:23, Peter Sturge peter.stu...@gmail.com
 wrote:

  Hi Forum,
 
  Is it possible for a Solr query to return the term(s) that matched a
  particular field/query?
 
  For example, let's say there's a field like this:
  raw=This is a raw text field that happens to contain some text that's also
  in the action field value...
 
  And another field in a different index like this:
  action=contain
 
  And they are tokenized on whitespace.
 
  If my query is:
  q={!join from=action to=raw fromIndex=TheActionIndex}*
 
  If 'action' was in the same index, it would be ok, but
  the problem is the match in 'TheActionIndex' isn't returned as it's in a
  different index.
 
  The query returns matching raw documents, but not *which* term was matched
  to cause it to be returned.
  I've tried the highlighting trick, but that doesn't work here - it returns
  highlighting on all terms.
  It would be great to get these back as facets, but getting them back at all
  would be great.
 
  Is it possible to have the query return which term(s) from 'raw' actually
  matched the value in 'action'?
  Maybe an extended TermsComponent to add only matched terms to the response
  payload or similar?
 
  Many thanks,
  Peter
 



Re: Disappearance of post.jar from the new tutorial

2014-12-09 Thread Chris Hostetter

: Subject: Re: Disappearance of post.jar from the new tutorial
: 
: I removed reference to it as the same class is in solr-core's JAR. 
: 
: The idea is to hide the details behind bin/post and before end of year 
: (before 5.0 release at least) to get that taken care of.

This doesn't make any sense to me.

If/When there is a bin/post available that serves as a suitable 
replacement for post.jar, then i'm all in favor of encouraging its use 
and referring to it in the tutorial.

Until that happens however, it seems absurd to me that the tutorial would 
tell people they should jump through hoops of setting their CLASSPATH 
(which has to be done differently on different OSs) and then explicitly 
spell out the fully qualified classname of SimplePostTool when post.jar is 
sitting right there (right next to the example docs) ready & waiting to be 
used.
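
To make the contrast concrete, here is a small sketch that only builds the two command lines and does not run them. The jar path is a placeholder, since the exact file name depends on the Solr version; org.apache.solr.util.SimplePostTool is the fully qualified class name referred to above.

```python
def post_jar_command(files):
    """The short form: run the standalone post.jar from the example docs directory."""
    return ["java", "-jar", "post.jar"] + list(files)


def classpath_command(files, core_jar="dist/solr-core-X.Y.Z.jar"):
    """The long form: put the solr-core jar on the classpath and name the class.

    `core_jar` is a placeholder path, not a real file name.
    """
    return [
        "java",
        "-classpath",
        core_jar,
        "org.apache.solr.util.SimplePostTool",
    ] + list(files)


docs = ["monitor.xml", "money.xml"]
print(" ".join(post_jar_command(docs)))
print(" ".join(classpath_command(docs)))
```

The second form needs a correct path to the jar, which is where the per-OS differences come in.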


-Hoss
http://www.lucidworks.com/


Re: SOLR shards stay down forever

2014-12-09 Thread Norgorn
The problem is that hard commit is on, with max uncommitted docs = 500,000.
And the tlog size is just about 200 MB per shard, which doesn't seem too big to me.

The reason for my panic is that one shard in my old collection is
down forever, without any unusual entries in the logs. I tried different magic
(deleting the shard, checking the index with Luke, restarting SOLR), but nothing
helped. The tlog is empty, there were no updates, but it's just down.

And sometimes other shards go down if I restart SOLR; some of them stay
down until they are re-created (deleting the shard while its SOLR is stopped and
then starting SOLR).
I just don't understand some of the logic with shards in SOLR: why are they down
after a restart if there weren't any updates to the index?

Anyway, thank you, now all new shards are active, so forgive me for my panic.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-shards-stay-down-forever-tp4173284p4173452.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AW: AW: Keeping capitalization in suggestions?

2014-12-09 Thread Ryan Yacyshyn
Hi Clemens,

I recently added typeahead functionality to something I'm playing with and
I used the EdgeNGramFilterFactory to help. I just tried this out after
adding a doc with Chamäleon in my title.

I was able to get Chamäleon, with a capital C, returned when I searched for
chama, Chama, chamã, and Chamã.

Here's what I have in my files:

-
solrconfig.xml:

<requestHandler name="/suggest_movie" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="defType">edismax</str>
    <str name="rows">10</str>
    <!-- keeping the response as lean as possible so not returning header info.. -->
    <str name="omitHeader">true</str>
    <!-- only returning 'title', and I want that key to be called 'value' in the response.. -->
    <str name="fl">value:title</str>
    <!-- boosting title to show on top if exact match with query.. -->
    <str name="qf">title^10 suggest_ngram</str>
  </lst>
</requestHandler>

-
schema.xml:

<fieldType name="text_suggest_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" />
    <!-- create edge n-grams of each term when indexing, not when querying.. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" />
  </analyzer>
</fieldType>

...

<field name="suggest_ngram" type="text_suggest_ngram" indexed="true" stored="false" />

...

<copyField source="title" dest="suggest_ngram" />

-
request:

http://localhost:8983/solr/movies/suggest_movie?q=chama

-
response:

{
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "value": "Chamäleon"
      }
    ]
  }
}
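
To see why the lowercase and accented variants all match while the stored title keeps its capital C, here is a rough simulation of the index-time and query-time chains. It is only an approximation: stripping combining marks stands in for ASCIIFoldingFilter, whitespace splitting stands in for the tokenizer, and the stop word and possessive filters are left out.

```python
import unicodedata


def normalize(token):
    """Approximate LowerCaseFilter + ASCIIFoldingFilter: lowercase, strip diacritics."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=2, max_gram=10):
    """Prefixes of `token` between min_gram and max_gram characters long."""
    longest = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, longest + 1)}


def matches(title, query):
    """True if every normalized query word equals an indexed edge n-gram of the title."""
    indexed = set()
    for word in title.split():
        indexed |= edge_ngrams(normalize(word))  # n-grams are built at index time only
    return all(normalize(word) in indexed for word in query.split())


for q in ["chama", "Chama", "chamã", "Chamã", "leon"]:
    print(q, matches("Chamäleon", q))
```

The match happens on the analyzed suggest_ngram field, while the response is built from the stored title, so the original capitalization survives.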

Hope this helps?

Ryan




On Tue Dec 09 2014 at 7:21:02 AM Michael Sokolov 
msoko...@safaribooksonline.com wrote:

 Clemens --

 what I do (see suggestions of titles of books on $EMPLOYER's web
 site) is to define a field with no analysis (type=keyword, use
 KeywordAnalyzer) and build the suggestions from that.  Then tell AIS to
 use an analyzer internally to pick out words from that (StandardAnalyzer,
 or WhitespaceAnalyzer, with LowerCaseFilter - however you want the
 matching to work in the suggester).  It will return the terms from the
 source field.

 You didn't show the definition of your suggest field - I expect it
 must be analyzed, right?  Just don't do that.

 -Mike

 On 12/09/2014 08:58 AM, Clemens Wyss DEV wrote:
  Thanks for all the insightful links.
  I tried http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr
  but that approach returns search results instead of term suggestions.
 
  I have (at the moment) a solution based on
  http://wiki.apache.org/solr/TermsComponent . But I might want multi-term
  suggestions (and fuzziness).
  Therefore I'd be very much interested in how AnalyzingInfixLookupFactory
  (or any other suggest component) could be made to
  a) return case-sensitive suggestions (i.e. as-indexed/stored)
  b) allow case-insensitive suggestion lookup
  ?
  Is anybody else doing what I'd like to do?
 
  -Original Message-
  From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
  Sent: Monday, 8 December 2014 19:25
  To: solr-user@lucene.apache.org
  Subject: Re: AW: Keeping capitalization in suggestions?
 
  Hi Clemens,
 
  There are a number of ways to implement auto complete/suggest. Some of
  them pull data from indexed terms, therefore they will be lowercased. Some
  pull data from stored values, therefore capitalisation is preserved.
 
  Here are great resources on this topic.
 
  https://lucidworks.com/blog/auto-suggest-from-popular-queries-using-edgengrams/
  http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
  http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
 
  Ahmet
 
 
  On Monday, December 8, 2014 5:43 PM, Clemens Wyss DEV 
 clemens...@mysign.ch wrote:
 
  Although I'm making use of AnalyzingInfixSuggester, I'm still getting
  either/or.
 
  When the lowercase filter is active I always get suggestions, BUT they are
  lowercased (i.e. chamäleon).
  When the lowercase filter is not active I only get suggestions when querying
  Chamä
 
  my solrconfig.xml
  ...
   requestHandler class=org.apache.solr.handler.component.SearchHandler
 name=/suggest
   lst name=defaults
   str name=echoParamsnone/str
   str name=wtjson/str
   str name=indentfalse/str
   str 

Solr Composite Unique key from existing fields in schema

2014-12-09 Thread Rajesh Panneerselvam
Hi,
I'm using DIH to index my entities. I'm facing an issue with delta-import. 
I've declared multiple entities in one data-config.xml. The entities have 
different primary keys. Now, if I want to delta-import, how should I specify the 
uniqueKey in schema.xml?
My data-config structure is like this:
<document>
  <entity></entity>
  <entity></entity>
  <entity></entity>
</document>


Thanks
Rajesh
[Aspire Systems]

This e-mail message and any attachments are for the sole use of the intended 
recipient(s) and may contain proprietary, confidential, trade secret or 
privileged information. Any unauthorized review, use, disclosure or 
distribution is prohibited and may be a violation of law. If you are not the 
intended recipient, please contact the sender by reply e-mail and destroy all 
copies of the original message.


Re: Solr Composite Unique key from existing fields in schema

2014-12-09 Thread Ahmet Arslan
Hi,

I once used the TemplateTransformer to generate a unique id across entities.

http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer
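
The idea behind that approach, sketched outside of DIH: prefix each entity's own primary key with the entity name, so keys from different entities cannot collide in the single uniqueKey field. The entity names and key values below are invented for illustration; in DIH itself the same pattern would be expressed as a template on the id field.

```python
def composite_id(entity_name, primary_key):
    """Combine the entity name and its primary key into one index-wide unique id."""
    return "%s-%s" % (entity_name, primary_key)


# Hypothetical rows from two different entities that share the key value 42.
rows = [("product", 42), ("customer", 42), ("order", 7)]
ids = [composite_id(entity, pk) for entity, pk in rows]
print(ids)
```

Because the id is derived from values present in every row, a delta-import that touches the same row again produces the same id and overwrites the earlier document.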




On Wednesday, December 10, 2014 8:51 AM, Rajesh Panneerselvam 
rajesh.panneersel...@aspiresys.com wrote:
Hi,
I'm using DIH to index my entities. I'm facing an issue while delta-import. 
I've declared multiple entities in one data-config.xml. The entities will have 
different primary key. Now if I want to delta-import how should I mention the 
UniqueKey in schema.xml.
My data-config structure is like this
document
  entity/entity
  entity/entity
  entity/entity
/document


Thanks
Rajesh


RE: # of daily/weekly/monthly Solr downloads?

2014-12-09 Thread Alexey Kozhemiakin
Hi, according to slide #3 it's 250,000+ monthly downloads.

http://www.slideshare.net/anshumg/ease-of-use-in-apache-solr

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Wednesday, December 10, 2014 01:25
To: solr-user@lucene.apache.org
Subject: # of daily/weekly/monthly Solr downloads?

Hi,

Does anyone know the number of daily/weekly/monthly Solr downloads?

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/