Re: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-18 Thread Nitin Solanki
Hi James,
 How do I see the suggestions from
spellcheck.alternativeTermCount?

On Wed, Feb 18, 2015 at 11:09 AM, Nitin Solanki nitinml...@gmail.com
wrote:

 Thanks James,
   I tried the same thing with
  spellcheck.count=10&spellcheck.alternativeTermCount=5, and I got 5
  suggestions for both "life" and "hope", rather than the behavior you
  described (*The spellchecker will try to return you up to 10 suggestions
  for hope, but only up to 5 suggestions for life.*)


 On Wed, Feb 18, 2015 at 1:10 AM, Dyer, James james.d...@ingramcontent.com
  wrote:

 Here is an example to illustrate what I mean...

  - query q=text:(life AND hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
  - suppose at least one document in your dictionary field has "life" in it
  - also suppose zero documents in your dictionary field have "hope" in them
  - The spellchecker will try to return you up to 10 suggestions for
  "hope", but only up to 5 suggestions for "life"

 James Dyer
 Ingram Content Group


 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Tuesday, February 17, 2015 11:35 AM
 To: solr-user@lucene.apache.org
 Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

 Hi James,
  How can you say that count doesn't use the
  index/dictionary? Then where do the suggestions come from?

 On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James 
 james.d...@ingramcontent.com
 wrote:

  See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count
 and
  the following section, for details.
 
   Briefly, count is the # of suggestions it will return for terms that are
   *not* in your index/dictionary.  alternativeTermCount is the # of
   alternatives you want returned for terms that *are* in your dictionary.
   You can set them to the same value, unless you want fewer suggestions
   when the term is in the dictionary.
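
   A minimal sketch of wiring both parameters as handler defaults in
   solrconfig.xml (the /spell handler name and the values here are
   illustrative, not taken from this thread):

     <requestHandler name="/spell" class="solr.SearchHandler">
       <lst name="defaults">
         <str name="spellcheck">true</str>
         <str name="spellcheck.count">10</str>
         <str name="spellcheck.alternativeTermCount">5</str>
       </lst>
       <arr name="last-components">
         <str>spellcheck</str>
       </arr>
     </requestHandler>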
 
  James Dyer
  Ingram Content Group
 
  -Original Message-
  From: Nitin Solanki [mailto:nitinml...@gmail.com]
  Sent: Tuesday, February 17, 2015 5:27 AM
  To: solr-user@lucene.apache.org
  Subject: spellcheck.count v/s spellcheck.alternativeTermCount
 
  Hello Everyone,
    I am confused about spellcheck.count and
   spellcheck.alternativeTermCount in Solr. Any help with the details?
 





Why collations are coming even I set the value of spellcheck.count to zero(0)

2015-02-18 Thread Nitin Solanki
Hi Everyone,
I have set the value of spellcheck.count = 0 and
spellcheck.alternativeTermCount = 0. Even so, collations are returned when
I search any misspelled query. Why?
I have also set the value of spellcheck.maxCollations = 100 and
spellcheck.maxCollationTries = 100. As far as I know, collations are built
on suggestions. So do I have a misunderstanding about collations, or is
there some other configuration issue? Any help please?


Re: Solrcloud sizing

2015-02-18 Thread Dominique Bejean
Hi Toke,

Thank you for your response.

Here are some clarifications.


 - The same term will occur several times for a given field (from 10
   to 100,000)

Do you mean that any term is only present in a limited number (up to
about 100K) of documents, or do you mean that some documents have fields
with content like "foo bar foo foo zoo foo..."?

If any term is only present in a maximum of 100K documents, or 100K/15
billion ≈ 0.0007% of the full document count, then you have a lot of unique
terms.

I ask because a low number of unique terms probably means that searches
will result in a lot of hits, which can be heavy when we're talking
billions. Or to ask more directly: how many hits do you expect a typical
search to match, and how many will you return?


I mean that string fields will in general contain one term (color, brand,
type), but depending on the field, the same value can occur in 10 to
100,000 documents.

So a filter query on one field can return from 10 to 100,000
documents (of course, we will paginate).

The query date scope will often:
* be the previous day
* or be the last few days
* or be issued successively with the same filter query but a changing date
frame (last day, last week, the week before, ...)

Regards

Dominique


2015-02-18 10:35 GMT+01:00 Toke Eskildsen t...@statsbiblioteket.dk:

 On Wed, 2015-02-18 at 01:40 +0100, Dominique Bejean wrote:

 (I reordered the requirements)

   - Collection size : 15 billion documents
   - Document size is nearly 300 bytes
  - 1 billion documents indexed = 5GB index size
  - Collection update : 8 million new documents / day + 8 million
 deleted documents / day
   - Updates occur during the night without queries
  - Document fields are mainly string, including one date field

 That does not sound too scary. Back of the envelope: If you can handle
 500 updates/second (which I would guess would be easy with such small
 documents), the update phase would be done in 4 hours.

  - The same term will occur several times for a given field (from 10
to 100,000)

 Do you mean that any term is only present in a limited number (up to
 about 100K) of documents, or do you mean that some documents have fields
 with content like "foo bar foo foo zoo foo..."?

 If any term is only present in a maximum of 100K documents, or 100K/15
 billion ≈ 0.0007% of the full document count, then you have a lot of unique
 terms.

 I ask because a low number of unique terms probably means that searches
 will result in a lot of hits, which can be heavy when we're talking
 billions. Or to ask more directly: how many hits do you expect a typical
 search to match, and how many will you return?

 - Queries occur during the day without updates
 - Query will use a date period and a filter query on one or more
 fields

 The initial call with date range filtering can be a bit heavy. Will you
 have a lot of requests for unique date ranges or will they typically be
 re-used (and thus nearly free)?

 - 10,000 queries / minute
 - expected response time < 500ms

 As Erick says, do prototype. With 5GB/1b documents, you could run a
 fairly telling prototype off a desktop or a beefy desktop for that
 matter. Just remember to test with 2 or more shards as there is a
 performance penalty for the switch from single- to multi-shard.


 As a yardstick, our setup has 7 billion documents (22TB index) with a
 fair bit of freetext: https://sbdevel.wordpress.com/net-archive-search/

 At one point we tried hammering it with simple queries (personal
 security numbers) for simple result sets (no faceting) and got a
 throughput of about 50 documents/sec.

 - no ssd drives

 No need with such tiny (measured in bytes) indexes. Just go with the
 oft-given advice of ensuring enough RAM for full disk cache.

  So, what is your advice about :
 
  # of shards : 15 billion documents -> 16 shards ?

 Standard advice is to make smaller shards for lower latency, but as you
 will likely be CPU bound with a small index (in bytes) and a high query
 rate, that probably won't help your throughput.


 - Toke Eskildsen, State and University Library, Denmark





Re: Solrcloud sizing

2015-02-18 Thread Toke Eskildsen
On Wed, 2015-02-18 at 01:40 +0100, Dominique Bejean wrote:

(I reordered the requirements)

  - Collection size : 15 billion documents
  - Document size is nearly 300 bytes
 - 1 billion documents indexed = 5GB index size
 - Collection update : 8 million new documents / day + 8 million
deleted documents / day
  - Updates occur during the night without queries
 - Document fields are mainly string, including one date field

That does not sound too scary. Back of the envelope: If you can handle
500 updates/second (which I would guess would be easy with such small
documents), the update phase would be done in 4 hours.
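
(Back of that envelope: 8,000,000 new docs / 500 docs/s = 16,000 s, or about
4.5 hours.)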

 - The same term will occur several times for a given field (from 10
   to 100,000)

Do you mean that any term is only present in a limited number (up to
about 100K) of documents, or do you mean that some documents have fields
with content like "foo bar foo foo zoo foo..."?

If any term is only present in a maximum of 100K documents, or 100K/15
billion ≈ 0.0007% of the full document count, then you have a lot of unique
terms.

I ask because a low number of unique terms probably means that searches
will result in a lot of hits, which can be heavy when we're talking
billions. Or to ask more directly: how many hits do you expect a typical
search to match, and how many will you return?

- Queries occur during the day without updates
- Query will use a date period and a filter query on one or more fields

The initial call with date range filtering can be a bit heavy. Will you
have a lot of requests for unique date ranges or will they typically be
re-used (and thus nearly free)?

- 10,000 queries / minute
- expected response time < 500ms

As Erick says, do prototype. With 5GB/1b documents, you could run a
fairly telling prototype off a desktop or a beefy desktop for that
matter. Just remember to test with 2 or more shards as there is a
performance penalty for the switch from single- to multi-shard.


As a yardstick, our setup has 7 billion documents (22TB index) with a
fair bit of freetext: https://sbdevel.wordpress.com/net-archive-search/

At one point we tried hammering it with simple queries (personal
security numbers) for simple result sets (no faceting) and got a
throughput of about 50 documents/sec.

- no ssd drives

No need with such tiny (measured in bytes) indexes. Just go with the
oft-given advice of ensuring enough RAM for full disk cache.

 So, what is your advice about :
 
 # of shards : 15 billion documents -> 16 shards ?

Standard advice is to make smaller shards for lower latency, but as you
will likely be CPU bound with a small index (in bytes) and a high query
rate, that probably won't help your throughput.


- Toke Eskildsen, State and University Library, Denmark




Internal document format for Solr 4.10.2

2015-02-18 Thread dinesh naik
Hi,
Is there a way to read the internal document once Solr has done the indexing?

Also, is it possible to store this internal document in XML format?

-- 
Best Regards,
Dinesh Naik


How to place whole indexed data on cache

2015-02-18 Thread Nitin Solanki
Hi,
 How can I place the whole indexed data in cache so that, when I search
any query, I get responses, suggestions, and collations rapidly?
And also, how can I view which documents are in cache, and how can I verify it?


Re: Solrcloud, no puts anymore and tons of “update?update.distrib=TOLEADER”

2015-02-18 Thread Shawn Heisey
On 2/18/2015 9:22 AM, Abdelali AHBIB wrote:
 With the Collections API. There were still some config files in
 /solr/config/Xunused_collection; I deleted those manually as well.

 2015-02-18 16:16 GMT+00:00 Dominique Bejean dominique.bej...@eolya.fr:

 When you say "I renamed some cores, cleaned other unused ones that we don't
 need anymore etc", how did you do this?
 With the Cores or Collections API, or by deleting the cores' directories in
 Solr Home?

The collections API cannot rename anything, so I suspect that you went
into the Core Admin section of the admin UI and clicked the "rename"
button, or otherwise used the CoreAdmin API rename functionality via
http.  If you do anything in the Core Admin screen other than adding
cores or optimizing, bad things WILL happen to SolrCloud.  Incorrectly
using the "add core" functionality can also cause problems, but it's
less likely.

Unless you completely understand how SolrCloud works internally, the
only safe way to change things in SolrCloud is with the Collections API
and the zkcli script, and none of that functionality is currently
exposed via the admin interface.  You have to make all those
requests manually.
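
For example, pushing a changed config and reloading a collection the safe way
might look like this (the ZooKeeper host, config path, and collection name are
assumptions, not from this thread):

  ./zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/conf -confname myconf
  curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"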

Thanks,
Shawn



Re: solr output in custom json format

2015-02-18 Thread meena.sri...@mathworks.com
Sorry, I left out the essential part: this should be done without parsing
the JSON output. I was looking into SolrJ's
QueryResponse.getBeans(Syndrome.class), but how do I embed the highlighting
snippet inside each Syndrome object itself?

Thanks
meena
 





solr output in custom json format

2015-02-18 Thread meena.sri...@mathworks.com
I need to create a custom JSON format of the Solr output for a specific UI. I
was wondering if there is a way to embed the highlighting portion inside the
docs themselves.

Thanks
Meena





Re: solr output in custom json format

2015-02-18 Thread Erik Hatcher
I think what ideally is needed here is an implementation for this open issue:

   https://issues.apache.org/jira/browse/SOLR-3479
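
Until that gets implemented, a common client-side workaround is to stitch the
highlighting map into the beans yourself. A minimal SolrJ sketch, assuming a
hypothetical Syndrome bean with getId() and setSnippets(List<String>) and a
highlighted field named "text" (all assumptions, not from this thread):

  import java.util.List;
  import java.util.Map;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;

  SolrQuery query = new SolrQuery("text:whatever");
  query.setHighlight(true);                        // ask Solr for highlighting
  QueryResponse rsp = server.query(query);         // server is a SolrServer
  List<Syndrome> beans = rsp.getBeans(Syndrome.class);
  // highlighting results are keyed by uniqueKey value, then by field name
  Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
  for (Syndrome s : beans) {
      Map<String, List<String>> perDoc = hl.get(s.getId());
      if (perDoc != null) {
          s.setSnippets(perDoc.get("text"));       // attach snippets to the bean
      }
  }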


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com




 On Feb 18, 2015, at 12:51 PM, meena.sri...@mathworks.com wrote:
 
  Sorry, I left out the essential part: this should be done without parsing
  the JSON output. I was looking into SolrJ's
  QueryResponse.getBeans(Syndrome.class), but how do I embed the highlighting
  snippet inside each Syndrome object itself?
 
 Thanks
 meena
 
 
 
 



RE: How to achieve lemmatization for english words in Solr 4.10.2

2015-02-18 Thread Dinesh Naik
Hi Jack,
We are looking for something like this:
if you search for the text "go", we should also get other forms of the word,
like "going", "gone", and "goes".

This is not being achieved via stemming.



-Original Message-
From: Jack Krupansky jack.krupan...@gmail.com
Sent: 18-02-2015 21:50
To: solr-user@lucene.apache.org
Subject: Re: How to achieve lemmatization for english words in Solr 4.10.2

Please provide a few examples that illustrate your requirements.
Specifically, requirements that are not met by the existing Solr stemming
filters. What is your specific goal?

-- Jack Krupansky

On Wed, Feb 18, 2015 at 10:50 AM, dinesh naik dineshkumarn...@gmail.com
wrote:

 Hi,
 Is there a way to achieve lemmatization in Solr? The stemming option is not
 meeting the requirement.

 --
 Best Regards,
 Dinesh Naik



Re: Solr suggest is related to second letter, not to initial letter

2015-02-18 Thread Volkan Altan
Yes, I did it. But it doesn’t work.

New Example;

TSTLookup

doc 1 : shoe adidas 2 hiking
doc 2 : galaxy samsung s5 phone
doc 3 : shakeology sample packets

http://localhost:8983/solr/solr/suggest?q=samsung+hi

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="samsung">
        <int name="numFound">2</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <arr name="suggestion">
          <str>samsung s5</str>
          <str>samsung s5 phone</str>
        </arr>
      </lst>
      <lst name="hi">
        <int name="numFound">1</int>
        <int name="startOffset">8</int>
        <int name="endOffset">10</int>
        <arr name="suggestion">
          <str>hiking</str>
        </arr>
      </lst>
      <lst name="collation">
        <str name="collationQuery">(samsung s5) hiking</str>
        <int name="hits">0</int>
        <lst name="misspellingsAndCorrections">
          <str name="samsung">samsung s5</str>
          <str name="hi">hiking</str>
        </lst>
      </lst>
      <lst name="collation">
        <str name="collationQuery">(samsung s5 phone) hiking</str>
        <int name="hits">0</int>
        <lst name="misspellingsAndCorrections">
          <str name="samsung">samsung s5 phone</str>
          <str name="hi">hiking</str>
        </lst>
      </lst>
      <lst name="collation">
        <str name="collationQuery">samsung hiking</str>
        <int name="hits">0</int>
        <lst name="misspellingsAndCorrections">
          <str name="samsung">samsung</str>
          <str name="hi">hiking</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>

<field name="suggestions" type="suggest_term" indexed="true" multiValued="true"
       stored="false" omitNorms="true"/>
<fieldType name="suggest_term" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-PunctuationToSpace.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="4" outputUnigrams="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-PunctuationToSpace.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="0"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="4" outputUnigrams="true"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggestions</str>  <!-- the indexed field to derive suggestions from -->
    <float name="threshold">0.1</float>
    <str name="buildOnCommit">true</str>
  </lst>
  <str name="queryAnalyzerFieldType">suggest_term</str>
</searchComponent>
<!-- auto-complete -->
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.build">false</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollations">10</str>
    <str name="spellcheck.maxCollationTries">100</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

---

FreeTextLookupFactory

doc 1 : shoe adidas 2 hiking
doc 2 : galaxy samsung s5 phone
doc 3 : shakeology sample packets

http://localhost:8983/solr/solr/suggest?q=samsung+hi

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="samsung">
        <int name="numFound">9</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <arr name="suggestion">
          <str>samsung s5</str>
          <str>samsung s5 phone</str>
          <str>samsung s5 phone galaxy</str>
          <str>samsung s5 phone samsung</str>
          <str>samsung s5 phone samsung samsungs5phone</str>
          <str>samsung s5 phone samsung samsungs5phone s5</str>
          <str>samsung s5 

Re: Solrcloud, no puts anymore and tons of “update?update.distrib=TOLEADER”

2015-02-18 Thread Abdelali AHBIB
Thank you Dominique and Shawn. Now I see that clusterstate.json does not
reflect the current number of cores in shard2; there are duplicated cores in
all collections, like this. How can I edit clusterstate.json?

[image: inline image 1]

2015-02-18 16:54 GMT+00:00 Shawn Heisey apa...@elyograg.org:

 On 2/18/2015 9:22 AM, Abdelali AHBIB wrote:
  With the Collections API. There were still some config files in
  /solr/config/Xunused_collection; I deleted those manually as well.
 
  2015-02-18 16:16 GMT+00:00 Dominique Bejean dominique.bej...@eolya.fr:
 
  When you say "I renamed some cores, cleaned other unused ones that we don't
  need anymore etc", how did you do this?
  With the Cores or Collections API, or by deleting the cores' directories in
  Solr Home?

 The collections API cannot rename anything, so I suspect that you went
 into the Core Admin section of the admin UI and clicked the "rename"
 button, or otherwise used the CoreAdmin API rename functionality via
 http.  If you do anything in the Core Admin screen other than adding
 cores or optimizing, bad things WILL happen to SolrCloud.  Incorrectly
 using the "add core" functionality can also cause problems, but it's
 less likely.

 Unless you completely understand how SolrCloud works internally, the
 only safe way to change things in SolrCloud is with the Collections API
 and the zkcli script, and none of that functionality is currently
 exposed via the admin interface.  You have to make all those
 requests manually.

 Thanks,
 Shawn




-- 
Cordialement,
Abdelali AHBIB


Re: How to place whole indexed data on cache

2015-02-18 Thread Dominique Bejean
Hi,

As Shawn said, install enough memory that all free direct (non-heap) memory
can be used as disk cache.
Use at most 40% of the available memory for the heap (the Xmx JVM
parameter), but never more than 32 GB.

And prevent your server from swapping.
For most Linux systems, this is configured using the /etc/sysctl.conf value:
vm.swappiness = 1
This prevents swapping under normal circumstances, but still allows the OS
to swap in emergency memory situations.
A swappiness of 1 is better than 0, since on some kernel versions a
swappiness of 0 can invoke the OOM killer.

http://askubuntu.com/questions/103915/how-do-i-configure-swappiness
http://unix.stackexchange.com/questions/88693/why-is-swappiness-set-to-60-by-default
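
A minimal sketch of applying this (typical Linux commands; verify the exact
procedure for your distribution):

  sudo sysctl -w vm.swappiness=1                             # apply immediately
  echo 'vm.swappiness = 1' | sudo tee -a /etc/sysctl.conf    # persist across reboots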

Dominique
http://www.eolya.fr/


2015-02-18 14:39 GMT+01:00 Shawn Heisey apa...@elyograg.org:

 On 2/18/2015 4:20 AM, Nitin Solanki wrote:
   How can I place the whole indexed data in cache so that, when I search
  any query, I get responses, suggestions, and collations rapidly?
  And also, how can I view which documents are in cache, and how can I verify
  it?

 Simply install enough extra memory in your machine for the entire index
 to fit in RAM that is not being used by programs ... and then do NOT
 allocate that extra memory to any program.

 The operating system will automatically do the caching for you as part
 of normal operation, no config required.

 https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

 Relevant articles referenced by that wiki page:

 http://en.wikipedia.org/wiki/Page_cache
 http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

 Thanks,
 Shawn




Can Solr meet my requirements?

2015-02-18 Thread Cupidvogel
I went through the Solr documentation, and it seemed pretty good. However, I
have a different requirement. I have this scenario: I will provide a list of
words, each corresponding to a particular position. Say this array of tuples,
where each tuple consists of the word and its position (the positions may not
be contiguous; they are obtained simply by adding up the number of
characters in each word) - [{Hello,0}, {I, 12}, {am, 21}, {Cupidvogel,
25}...]. This is what I can input to Solr (of course I can modify the data
format to suit Solr). While searching,

1. I may search for a whole word from one of the tuples, say 'Hello', in
which case I want Solr to return the corresponding numbers in the tuples
where the word is present. - [0].

2. I search for a substring of one of the words, say 'e'. In which case I
want Solr to return all instances of tuples and the positions where 'e' is a
substring of the word. - [0,25...].

3. I want to search for a string consisting of multiple words that are present
contiguously in the array of tuples, and if found, I want Solr to return the
number corresponding to the first word of each such match. So if I search 'I
am', I must get [12...].

4. The same as above, but the first word can be a suffix, and the last word
can be a prefix like ' ello I am Cupidvo', in which case I want [0...] (
'ello' belongs to 'Hello' and 'Cupidvo' belongs to 'Cupidvogel': ''Hello',
'I', 'am' and 'Cupidvogel' are contiguous entries in the array )

Can Solr provide such capability?





RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-18 Thread Dyer, James
It will try to give you suggestions up to the number you specify, but if fewer 
are available it will not give you any more.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:40 PM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Thanks James,
  I tried the same thing with
spellcheck.count=10&spellcheck.alternativeTermCount=5, and I got 5
suggestions for both "life" and "hope", rather than the behavior you described
(*The spellchecker will try to return you up to 10 suggestions for hope, but
only up to 5 suggestions for life.*)


On Wed, Feb 18, 2015 at 1:10 AM, Dyer, James james.d...@ingramcontent.com
wrote:

 Here is an example to illustrate what I mean...

 - query q=text:(life AND hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
 - suppose at least one document in your dictionary field has "life" in it
 - also suppose zero documents in your dictionary field have "hope" in them
 - The spellchecker will try to return you up to 10 suggestions for "hope",
 but only up to 5 suggestions for "life"

 James Dyer
 Ingram Content Group


 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Tuesday, February 17, 2015 11:35 AM
 To: solr-user@lucene.apache.org
 Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

 Hi James,
 How can you say that count doesn't use the
 index/dictionary? Then where do the suggestions come from?

 On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James 
 james.d...@ingramcontent.com
 wrote:

  See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
  the following section, for details.
 
  Briefly, count is the # of suggestions it will return for terms that are
  *not* in your index/dictionary.  alternativeTermCount is the # of
  alternatives you want returned for terms that *are* in your dictionary.
  You can set them to the same value, unless you want fewer suggestions
  when the term is in the dictionary.
 
  James Dyer
  Ingram Content Group
 
  -Original Message-
  From: Nitin Solanki [mailto:nitinml...@gmail.com]
  Sent: Tuesday, February 17, 2015 5:27 AM
  To: solr-user@lucene.apache.org
  Subject: spellcheck.count v/s spellcheck.alternativeTermCount
 
  Hello Everyone,
    I am confused about spellcheck.count and
  spellcheck.alternativeTermCount in Solr. Any help with the details?
 



RE: Why collations are coming even I set the value of spellcheck.count to zero(0)

2015-02-18 Thread Dyer, James
I think when you set count/alternativeTermCount to zero, the defaults (10?) 
are used instead.  Instead of setting these to zero, just use 
spellcheck=false.  These 2 parameters control suggestions, not collations.

To turn off collations, set spellcheck.collate=false.  Also, I wouldn't set
maxCollationTries as high as 100, as it could (sometimes) potentially check
100 possible collations against the index, and that would be very slow.
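
For example, a request along these lines keeps suggestions but suppresses
collations (the misspelled query text is illustrative):

  q=whatevr&spellcheck=true&spellcheck.collate=false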

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Wednesday, February 18, 2015 2:37 AM
To: solr-user@lucene.apache.org
Subject: Why collations are coming even I set the value of spellcheck.count to 
zero(0)

Hi Everyone,
I have set the value of spellcheck.count = 0 and
spellcheck.alternativeTermCount = 0. Even so, collations are returned when
I search any misspelled query. Why?
I have also set the value of spellcheck.maxCollations = 100 and
spellcheck.maxCollationTries = 100. As far as I know, collations are built
on suggestions. So do I have a misunderstanding about collations, or is
there some other configuration issue? Any help please?


Re: Solrcloud, no puts anymore and tons of “update?update.distrib=TOLEADER”

2015-02-18 Thread Dominique Bejean
Hi,

When you say "I renamed some cores, cleaned other unused ones that we don't
need anymore etc", how did you do this?
With the Cores or Collections API, or by deleting the cores' directories in
Solr Home?

Dominique
http://www.eolya.fr


2015-02-18 17:04 GMT+01:00 Abdelali AHBIB alifar...@gmail.com:

 Hello,

  We use SolrCloud with two shards (no replication for now); ZooKeeper is in
  a separate machine. It worked well until yesterday, when I renamed some
  cores, cleaned other unused ones that we don't need anymore, etc... Then I
  got tons of these errors when I try to put docs into my core "sellers", so
  I can't add docs anymore:

   10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
  HTTP/1.1  S  5265 ms  0 KB  0 KB
   10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
  HTTP/1.1  S  5253 ms  0 KB  0 KB
   10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
  HTTP/1.1  S  5277 ms  0 KB  0


 in zookeeper I see some exceptions :

 EndOfStreamException: Unable to read additional data from client
 sessionid 0x14b98fd8e86, likely client has closed socket
 at
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
 at
 org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
 at java.lang.Thread.run(Thread.java:745)

 I don't see any relation between deleting some cores and these errors, but
 in the admin GUI I see the shards list display changed (shard2 has 2
 machines?! it's configured with just one): [image: enter image description
 here]

 I sent the cores' configs to ZooKeeper using zkcli, I reloaded the cores, and
 I restarted Tomcat, all without any result. Any advice?

 Tomcat 7, Solr 4.10




 http://stackoverflow.com/questions/28584066/solrcloud-no-puts-anymore-and-tons-of-updateupdate-distrib-toleader

 --
 Cordialement,
 Abdelali AHBIB



 --
 Cordialement,
 Abdelali AHBIB



Re: How to achieve lemmatization for english words in Solr 4.10.2

2015-02-18 Thread Jack Krupansky
Please provide a few examples that illustrate your requirements.
Specifically, requirements that are not met by the existing Solr stemming
filters. What is your specific goal?

-- Jack Krupansky

On Wed, Feb 18, 2015 at 10:50 AM, dinesh naik dineshkumarn...@gmail.com
wrote:

 Hi,
  Is there a way to achieve lemmatization in Solr? The stemming option is not
 meeting the requirement.

 --
 Best Regards,
 Dinesh Naik



Re: Solrcloud, no puts anymore and tons of “update?update.distrib=TOLEADER”

2015-02-18 Thread Abdelali AHBIB
With the Collections API. There were still some config files in
/solr/config/Xunused_collection; I deleted those manually as well.

2015-02-18 16:16 GMT+00:00 Dominique Bejean dominique.bej...@eolya.fr:

 Hi,

  When you say "I renamed some cores, cleaned other unused ones that we don't
  need anymore etc", how did you do this?
  With the Cores or Collections API, or by deleting the cores' directories in
  Solr Home?

 Dominique
 http://www.eolya.fr


 2015-02-18 17:04 GMT+01:00 Abdelali AHBIB alifar...@gmail.com:

  Hello,
 
   We use SolrCloud with two shards (no replication for now); ZooKeeper is in
   a separate machine. It worked well until yesterday, when I renamed some
   cores, cleaned other unused ones that we don't need anymore, etc... Then I
   got tons of these errors when I try to put docs into my core "sellers", so
   I can't add docs anymore:
 
    10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
   HTTP/1.1  S  5265 ms  0 KB  0 KB
    10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
   HTTP/1.1  S  5253 ms  0 KB  0 KB
    10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
   HTTP/1.1  S  5277 ms  0 KB  0
 
 
  in zookeeper I see some exceptions :
 
  EndOfStreamException: Unable to read additional data from client
  sessionid 0x14b98fd8e86, likely client has closed socket
  at
  org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
  at
 
 org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
  at java.lang.Thread.run(Thread.java:745)
 
   I don't see any relation between deleting some cores and these errors, but
   in the admin GUI I see the shards list display changed (shard2 has 2
   machines?! it's configured with just one): [image: enter image description
   here]
  
   I sent the cores' configs to ZooKeeper using zkcli, I reloaded the cores,
   and I restarted Tomcat, all without any result. Any advice?
 
  Tomcat 7, Solr 4.10
 
 
 
 
 
 http://stackoverflow.com/questions/28584066/solrcloud-no-puts-anymore-and-tons-of-updateupdate-distrib-toleader
 
  --
  Cordialement,
  Abdelali AHBIB
 
 
 
  --
  Cordialement,
  Abdelali AHBIB
 




-- 
Cordialement,
Abdelali AHBIB


Re: Confirm Solr index corruption

2015-02-18 Thread Otis Gospodnetic
Hi,

It sounds like Solr simply could not index some docs.  The index is not
corrupt, it's just that indexing was failing while disk was full.  You'll
need to re-send/re-add/re-index the missing docs (or simply all of them if
you don't know which ones are missing).

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Feb 18, 2015 at 1:52 AM, Thomas Mathew mothas.tho...@gmail.com
wrote:

 Hi All,

 I use Solr 4.4.0 in a master-slave configuration. Last week, the master
 server ran out of disk (logs got too big too quickly due to a bug in our
 system). Because of this, we weren't able to add new docs to an index. The
 first thing I did was to delete a few old log files to free up disk space
 (later I moved the other logs to free up disk). The index is working fine
 even after this fiasco.

 The next day, a colleague of mine pointed out that we may be missing a few
 documents in the index. I suspect the above scenario may have broken the
 index. I ran checkIndex against this index. It didn't report any
 corruption, though.

 Right now, the index has about 25k docs. I haven't optimized this index in
 a while, and there are about 4000 deleted-docs. How can I confirm if we
 lost anything? If we've lost docs, is there a way to recover them?

 Thanks in advance!!

 Regards
 Thomas



Re: Solr suggest is related to second letter, not to initial letter

2015-02-18 Thread Michael Sokolov


On 02/17/2015 03:46 AM, Volkan Altan wrote:

First of all thank you for your answer.
You're welcome - thanks for sending a more complete example of your 
problem and expected behavior.


I don’t want to use KeywordTokenizer, because as long as the compound words
written by the user are available in any document, I am able to get a
result. I just don’t want “q=galaxy + samsung” to appear, because it is an
inappropriate suggestion and it doesn’t work.

Many Thanks Ahead of Time!

Did you try the other suggestions in my earlier reply?

-Mike



Solrcloud with map-reduce indexing and document routing

2015-02-18 Thread Dominique Bejean
Hi,

I never used map-reduce indexing.

My understanding is that map-reduce tasks generate one or more Solr
indices, then the golive tool is used to merge these indices, at the
core level, into one or more shards (the shards' leaders) in a SolrCloud
collection. After the merge occurs on the leaders, the replicas are
automatically synchronized.

My question is: what about document routing?
If I want to route documents to shards according to one field of my
documents, how can this work if the merge is done at the core level?

Thank you.

Dominique


Re: Internal document format for Solr 4.10.2

2015-02-18 Thread Dmitry Kan
Hello!

You can try luke's export feature:

https://github.com/DmitryKey/luke/wiki/Exporting-index-to-xml

On Wed, Feb 18, 2015 at 12:57 PM, dinesh naik dineshkumarn...@gmail.com
wrote:

 Hi,
  Is there a way to read the internal document once Solr has done the indexing?

  Also, is it possible to store this internal document in XML format?

 --
 Best Regards,
 Dinesh Naik




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: How to place whole indexed data on cache

2015-02-18 Thread Shawn Heisey
On 2/18/2015 4:20 AM, Nitin Solanki wrote:
  How can I place the whole indexed data in cache so that, when I search
 any query, I get responses, suggestions, and collations rapidly?
 And also, how can I view which documents are in cache, and how can I verify it?

Simply install enough extra memory in your machine for the entire index
to fit in RAM that is not being used by programs ... and then do NOT
allocate that extra memory to any program.

The operating system will automatically do the caching for you as part
of normal operation, no config required.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Relevant articles referenced by that wiki page:

http://en.wikipedia.org/wiki/Page_cache
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Thanks,
Shawn



How to achieve lemmatization for english words in Solr 4.10.2

2015-02-18 Thread dinesh naik
Hi,
Is there a way to achieve lemmatization in Solr? The stemming option is not
meeting the requirement.

-- 
Best Regards,
Dinesh Naik


Re: Divide 4 Nodes into 100 nodes in Solr Cloud

2015-02-18 Thread Shawn Heisey
On 2/18/2015 8:17 AM, Nitin Solanki wrote:
 I have created 4 nodes having 8 shards. Now, I want to divide those
 4 nodes into 100 nodes without any failure or re-indexing the data. Any
 help please?

I think your only real option within a strict interpretation of your
requirements is shard splitting.  You will probably have to do it
several times, and the resulting core names could get very ugly.

https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud#ShardsandIndexingDatainSolrCloud-ShardSplitting

Reindexing is a LOT cleaner and is likely to work better.  If you build
a new collection sharded the way you want across all the new nodes, you
can delete the old collection and set up an alias pointing the old name
at the new collection, no need to change any applications, as long as
they use the collection name rather than the actual core names.  The
delete and alias might take long enough that there would be a few
seconds of downtime, but that's probably all you'd see.  Both indexing
and queries would work with the alias.

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DeleteaCollection

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection
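
Hypothetical curl sketches of the two approaches (host, collection, and shard
names are illustrative, not from this thread):

  # split one shard of the existing collection in place
  curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1"

  # or: point the old name at a freshly built, resharded collection
  curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=oldcollection&collections=newcollection"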

Thanks,
Shawn



Re: Divide 4 Nodes into 100 nodes in Solr Cloud

2015-02-18 Thread Yago Riveiro
No,




The SPLIT operation doesn’t destroy the data. When the SPLIT operation is finished,
the PARENT shard is deactivated and you can remove it.




More info: 
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3




—
/Yago Riveiro

On Wed, Feb 18, 2015 at 3:39 PM, Nitin Solanki nitinml...@gmail.com
wrote:

 Okay. It will destroy/harm my indexed data, right?
 On Wed, Feb 18, 2015 at 9:01 PM, Yago Riveiro yago.rive...@gmail.com
 wrote:
 You can try the SPLIT command


 —
 /Yago Riveiro

 On Wed, Feb 18, 2015 at 3:19 PM, Nitin Solanki nitinml...@gmail.com
 wrote:

  Hi,
   I have created 4 nodes having 8 shards. Now, I want to divide those
   4 nodes into 100 nodes without any failure or re-indexing the data. Any
   help please?


Fwd: Solrcloud, no puts anymore and tons of “update?update.distrib=TOLEADER”

2015-02-18 Thread Abdelali AHBIB
Hello,

We use SolrCloud with two shards (no replication for now); ZooKeeper is in
a separate machine. It worked well until yesterday, when I renamed some
cores, cleaned other unused ones that we don't need anymore, etc... Then I
got tons of these errors when I try to put docs into my core "sellers", so
I can't add docs anymore:

  10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
  HTTP/1.1  S  5265 ms  0 KB  0 KB
  10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
  HTTP/1.1  S  5253 ms  0 KB  0 KB
  10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
  HTTP/1.1  S  5277 ms  0 KB  0


in zookeeper I see some exceptions :

EndOfStreamException: Unable to read additional data from client
sessionid 0x14b98fd8e86, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)

I don't see any relation between deleting some cores and these errors, but
in the admin GUI I see the shards list display changed (shard2 has 2
machines?! it's configured with just one): [image: enter image description
here]

I sent the cores' configs to ZooKeeper using zkcli, I reloaded the cores, and I
restarted Tomcat, all without any result. Any advice?

Tomcat 7, Solr 4.10



http://stackoverflow.com/questions/28584066/solrcloud-no-puts-anymore-and-tons-of-updateupdate-distrib-toleader

-- 
Cordialement,
Abdelali AHBIB



-- 
Cordialement,
Abdelali AHBIB


SOLR, Coveo, and Oracle questions

2015-02-18 Thread Eric E
Hello,

I have some basic questions for the group.  I would appreciate any advice
you can give me.

We have an Oracle RAC database that has a number of schemas on it.  Various
things query the structured data stored in these schemas, 10s of
thousands of times per day.  Two of these schemas in particular we wish to
replicate into a Lucene-based enterprise search environment.  This in total
will cover about 750 million records that span a few dozen tables in each
schema.

As I said, we have structured data here.  No documents, no BLOBS and no
CLOBS.

We have front-end search tools already in place that provide the user with
filtering options that, when submitted by the user, run carefully optimized
(and indexed) dynamically-built queries against the data in these schemas.
These are on all the common fields you'd imagine when searching for user
data--name, etc.

As I said, we would like to replicate this data into a Lucene-based
environment and revise the front-end that formerly was hitting Oracle to
instead call RESTful APIs that we develop, to go against this new
Lucene-based tier.  We want to retain all of the previous faceted search
abilities, and not have anything run more slowly than before.  In addition
to retaining all of the existing filtering options for the user, we would
like to introduce an additional single google-style search window that
really takes advantage of the fancy new indexes and features that SOLR can
offer.

At this point I am trying to understand how the indexes work for structured
data in Lucene, and whether this is feasible to basically take all search
processing and querying out of Oracle.  Most of the Lucene/SOLR
documentation that I've found puts emphasis on all the great things it can
do with unstructured data, and does not address what it can do with
structured data.  Can anyone please give me some feedback or point me in
the right direction?

Second question-- members of our group are trying to sway us to using Coveo
Enterprise Search, another Lucene-based tool, rather than SOLR.  I realize
this is a SOLR forum, and I was hoping someone might be able to give some
insight as to why SOLR (or Coveo) would be the better option.  We will be
using custom written interfaces, so SOLR's out of the box search interfaces
would not be helpful to us.

Thanks for your time,
Eric


Re: How to achieve lemmatization for english words in Solr 4.10.2

2015-02-18 Thread Ahmet Arslan
Hi Dinesh,

solr.KStemFilterFactory is dictionary-based, i.e., the outputs it produces
are valid/legitimate English words. That may be what you need, if by
"lemmatizer" you mean mapping terms to dictionary entries.
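
A minimal sketch of an analyzer chain that uses it (the field type name is
illustrative):

  <fieldType name="text_kstem" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KStemFilterFactory"/>
    </analyzer>
  </fieldType>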

Ahmet



On Wednesday, February 18, 2015 5:51 PM, dinesh naik 
dineshkumarn...@gmail.com wrote:
Hi,
Is there a way to achieve lemmatization in Solr? The stemming option is not
meeting the requirement.

-- 
Best Regards,
Dinesh Naik


Re: Divide 4 Nodes into 100 nodes in Solr Cloud

2015-02-18 Thread Yago Riveiro
You can try the SPLIT command


—
/Yago Riveiro

On Wed, Feb 18, 2015 at 3:19 PM, Nitin Solanki nitinml...@gmail.com
wrote:

 Hi,
 I have created 4 nodes having 8 shards. Now, I want to divide those
 4 nodes into 100 nodes without any failure or re-indexing the data. Any
 help please?

Re: Divide 4 Nodes into 100 nodes in Solr Cloud

2015-02-18 Thread Nitin Solanki
Okay. It will destroy/harm my indexed data, right?

On Wed, Feb 18, 2015 at 9:01 PM, Yago Riveiro yago.rive...@gmail.com
wrote:

 You can try the SPLIT command


 —
 /Yago Riveiro

 On Wed, Feb 18, 2015 at 3:19 PM, Nitin Solanki nitinml...@gmail.com
 wrote:

  Hi,
   I have created 4 nodes having 8 shards. Now, I want to divide those
   4 nodes into 100 nodes without any failure or re-indexing the data. Any
   help please?



Best platform for hosting Solr

2015-02-18 Thread Ganesh.Yadav
Guys,

1.   Can anyone suggest the best platform to host Solr on - a Unix or a
Windows server?

2.   All I will be doing is importing lots of PDF documents into Solr. I
believe Solr will automatically build the schema for the imported documents.

3.   Can someone suggest the max size limit on a PDF document that can be
imported into Solr?

4.   What implementation would make it faster?



Thanks

Ganesh




Get nearby suggestions for any phrase searching

2015-02-18 Thread Nitin Solanki
Hello,
I want to retrieve only the top five suggestions for any
phrase/query search. How do I do that?
Assume I search like ?q="the bark night"; then I need a suggestion/
collation like "the dark knight".
How do I get nearby suggestions/terms for the phrase?


Divide 4 Nodes into 100 nodes in Solr Cloud

2015-02-18 Thread Nitin Solanki
Hi,
I have created 4 nodes having 8 shards. Now, I want to divide those
4 nodes into 100 nodes without any failure or re-indexing the data. Any
help please?


Re: Solrcloud, no puts anymore and tons of “update?update.distrib=TOLEADER”

2015-02-18 Thread Abdelali AHBIB
Sorry, no rename operation happened, just deletes (manually, from the Solr
home and config) and also duplicating a core manually (the duplicated core is
the same core that doesn't have a problem); then I ran the zkcli upconfig
command.

2015-02-18 16:22 GMT+00:00 Abdelali AHBIB alifar...@gmail.com:

  With the Collections API. There were still some config files in
  /solr/config/Xunused_collection; I deleted those manually as well.

 2015-02-18 16:16 GMT+00:00 Dominique Bejean dominique.bej...@eolya.fr:

 Hi,

  When you say "I renamed some cores, cleaned other unused ones that we don't
  need anymore etc", how did you do this?
  With the Cores or Collections API, or by deleting the cores' directories in
  Solr Home?

 Dominique
 http://www.eolya.fr


 2015-02-18 17:04 GMT+01:00 Abdelali AHBIB alifar...@gmail.com:

  Hello,
 
   We use SolrCloud with two shards (no replication for now); ZooKeeper is in
   a separate machine. It worked well until yesterday, when I renamed some
   cores, cleaned other unused ones that we don't need anymore, etc... Then I
   got tons of these errors when I try to put docs into my core "sellers", so
   I can't add docs anymore:
 
    10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
   HTTP/1.1  S  5265 ms  0 KB  0 KB
    10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
   HTTP/1.1  S  5253 ms  0 KB  0 KB
    10.213.0.124  10.213.0.124  solrcloud-2  POST
  /solr/Sellers_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolrcloud-2%3A8080%2Fsolr%2FSellers_shard1_replica1%2F&wt=javabin&version=2
   HTTP/1.1  S  5277 ms  0 KB  0
 
 
  in zookeeper I see some exceptions :
 
  EndOfStreamException: Unable to read additional data from client
  sessionid 0x14b98fd8e86, likely client has closed socket
  at
  org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
  at
 
 org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
  at java.lang.Thread.run(Thread.java:745)
 
   I don't see any relation between deleting some cores and these errors, but
   in the admin GUI I see the shards list display changed (shard2 has 2
   machines?! it's configured with just one): [image: enter image description
   here]
  
   I sent the cores' configs to ZooKeeper using zkcli, I reloaded the cores,
   and I restarted Tomcat, all without any result. Any advice?
 
  Tomcat 7, Solr 4.10
 
 
 
 
 
 http://stackoverflow.com/questions/28584066/solrcloud-no-puts-anymore-and-tons-of-updateupdate-distrib-toleader
 
  --
  Cordialement,
  Abdelali AHBIB
 
 
 
  --
  Cordialement,
  Abdelali AHBIB
 




 --
 Cordialement,
 Abdelali AHBIB




-- 
Cordialement,
Abdelali AHBIB


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-18 Thread Simon Cheng
Great help and thanks to you, Alex.


On Wed, Feb 18, 2015 at 2:48 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

  Like I mentioned before, you could use the string type if you just want
  the title as it is. Or you can use a custom type to normalize the indexed
 value, as long as you end up with a single token.

 So, if you want to strip leading A/An/The, you can use
 KeywordTokenizer, combined with whatever post-processing you need. I
 would suggest LowerCase filter and perhaps Regex filter to strip off
 those leading articles. You may need to iterate a couple of times on
 that specific chain.

 The good news is that you can just make a couple of type definitions
 with different values/order, reload the index (from Cores screen of
 the Web Admin UI) and run some of your sample titles through those
 different definitions without having to reindex in the Analysis
 screen.
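
  A minimal sketch of such a sort-field type (the regex and the type name are
  illustrative; iterate on the pattern against your data):

    <fieldType name="title_sort" class="solr.TextField" sortMissingLast="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="^(a |an |the )" replacement="" replace="first"/>
        <filter class="solr.TrimFilterFactory"/>
      </analyzer>
    </fieldType>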

 Regards,
   Alex.

 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/

 On 17 February 2015 at 22:36, Simon Cheng simonwhch...@gmail.com wrote:
  Hi Alex,
 
   It's okay after I added a new field s_title to the schema and
   re-indexed:
  
   <field name="s_title" type="string" indexed="true" stored="false"
          multiValued="false"/>
   <copyField source="title" dest="s_title"/>
  
   But how can I ignore the articles ("A", "An", "The") in the sorting? As
   you can see from the example below:



Re: Collections API - HTTP verbs

2015-02-18 Thread Mark Miller
Perhaps try quotes around the URL you are providing to curl. It's not
complaining about the HTTP method - Solr has historically always taken
simple GETs over HTTP; for good or bad, you pretty much only POST
documents / updates.

It's saying the name param is required and not found; since you are trying
to specify the name, I'm guessing something about the command is not
working. You might also try just putting it in a browser URL bar.
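
A quoted version of that command would look like this (without the quotes,
the shell treats each & as a background operator, so the name and val
parameters never reach Solr):

  curl "http://hostname:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https"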

- Mark

On Wed Feb 18 2015 at 8:56:26 PM Hrishikesh Gadre gadre.s...@gmail.com
wrote:

 Hi,

 Can we please document which HTTP method is supposed to be used with each
 of these APIs?

 https://cwiki.apache.org/confluence/display/solr/Collections+API

 I am trying to invoke following API

 curl http://hostname:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https

 This request is failing due to following error,

 2015-02-18 17:29:39,965 INFO org.apache.solr.servlet.SolrDispatchFilter:
 [admin] webapp=null path=/admin/collections params={action=CLUSTERPROP}
 status=400 QTime=20

 org.apache.solr.core.SolrCore: org.apache.solr.common.SolrException:
 Missing required parameter: name

 at org.apache.solr.common.params.RequiredSolrParams.get(RequiredSolrParams.java:49)

 at org.apache.solr.common.params.RequiredSolrParams.check(RequiredSolrParams.java:153)

 at org.apache.solr.handler.admin.CollectionsHandler.handleProp(CollectionsHandler.java:238)

 at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:200)

 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

 at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:770)

 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:271)

 I am using Solr 4.10.3 version.

 Thanks

 Hrishikesh



Collections API - HTTP verbs

2015-02-18 Thread Hrishikesh Gadre
Hi,

Can we please document which HTTP method is supposed to be used with each
of these APIs?

https://cwiki.apache.org/confluence/display/solr/Collections+API

I am trying to invoke following API

curl http://hostname:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https

This request is failing due to following error,

2015-02-18 17:29:39,965 INFO org.apache.solr.servlet.SolrDispatchFilter:
[admin] webapp=null path=/admin/collections params={action=CLUSTERPROP}
status=400 QTime=20

org.apache.solr.core.SolrCore: org.apache.solr.common.SolrException:
Missing required parameter: name

at org.apache.solr.common.params.RequiredSolrParams.get(RequiredSolrParams.java:49)

at org.apache.solr.common.params.RequiredSolrParams.check(RequiredSolrParams.java:153)

at org.apache.solr.handler.admin.CollectionsHandler.handleProp(CollectionsHandler.java:238)

at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:200)

at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:770)

at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:271)

I am using Solr 4.10.3 version.

Thanks

Hrishikesh


Re: ApacheCon 2015 at Austin, TX

2015-02-18 Thread CP Mishra
Dmitry, that would be great.

CP

On Thu, Feb 12, 2015 at 5:35 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Hi,

 Looks like I'll be there. So if you want to discuss luke / lucene / solr,
 will be happy to de-virtualize.

 Dmitry

 On Mon, Jan 12, 2015 at 6:32 PM, CP Mishra mishr...@gmail.com wrote:

  Hi,
 
  I am planning to attend ApacheCon 2015 at Austin, TX (Apr 13-16th) and
  wondering if there will be lucene/solr sessions in it.
 
  Anyone else planning to attend?
 
  Thanks,
  CP
 



 --
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info



Re: Boosting by calculated distance buckets

2015-02-18 Thread sraav
David,

I just subscribed to the solr list... let's see if that will allow me to
post this.

I will write a custom ValueSource.  I tried the map function that you
suggested; it works, but it is not so great on performance.

I will try referencing the function query as a sort instead of bq... maybe it
will perform better. Either way, ValueSource seems like a better bet overall.

Thank you so much!!
Raav


