Re: maxWarmingSearchers in Solr 4.

2013-04-04 Thread Dotan Cohen
On Wed, Apr 3, 2013 at 7:55 PM, Shawn Heisey s...@elyograg.org wrote: In situations where I don't want to change the default value, I prefer to leave config elements out of the solrconfig. It makes the config smaller, and it also makes it so that I will automatically see benefits from the

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
Hi Otis, then what is the difference between add and update? And how do we update or add documents in Solr (I see that there is just one update handler)? 2013/4/4 Otis Gospodnetic otis.gospodne...@gmail.com I don't recall what Nutch does, so it's hard to tell. In Solr (Lucene, really), you

Re: Query parser cuts last letter from search term.

2013-04-04 Thread vsl
The problem was connected with filter order. WordDelimiterFilter should be put before others. Thanks for your help.

Re: do SearchComponents have access to response contents

2013-04-04 Thread xavier jmlucjav
A custom QueryResponseWriter... this makes sense, thanks Jack. On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky j...@basetechnology.com wrote: The search components can see the response as a NamedList, but it is only when SolrDispatchFilter calls the QueryResponseWriter that XML or JSON or

Re: Zookeeper dataimport.properties node

2013-04-04 Thread Tim Vaillancourt
If it's in your SolrCloud-based collection's config, it won't be on disk, only in Zookeeper. What I did was use the XInclude feature to include a file with my dataimport handler properties, so I'm assuming you're doing the same. Use a relative path to the config dir in Zookeeper, i.e.: no

Re: solre scores remains same for exact match and nearly exact match

2013-04-04 Thread Andre Bois-Crettez
On 04/03/2013 07:22 AM, amit wrote: Below is my query http://localhost:8983/solr/select/?q=subject:session management in php&fq=category:[*%20TO%20*]&fl=category,score,subject You specify that you want session to appear in field subject, but the other tokens only match to the default search

Re: Question on Exact Matches - edismax

2013-04-04 Thread Sandeep Mestry
Hi Jan, Thanks for your reply. I have defined string_ci like below: <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true" compressThreshold="10"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter

Spell check component does not return any suggestions

2013-04-04 Thread vsl
Hi, I configured an index-based spell check component and an unexpected problem occurs. CASE 1: I added two documents with the following content: 1. handbuch 2. hanbuch The suggestions are returned for both terms: e.g. handbuch -> hanbuch and hanbuch -> handbuch. Comment: Works as expected. CASE 2:

Re: Spell check component does not return any suggestions

2013-04-04 Thread Eoghan Ó Carragáin
Hi, I think you need to use the alternativeTermCount parameter ( http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount) to return suggestions for terms which occur less often than the user-entered term. More discussion here:

Re: Spell check component does not return any suggestions

2013-04-04 Thread vsl
I tried to add spellcheck.alternativeTermCount=5 but still no suggestion has been found.

Filtered search term suggestions via Facet Prefixing or NGrams

2013-04-04 Thread Andreas Hubold
Hi, we've successfully implemented suggestion of search terms using facet prefixing with Solr 4.0. However, with lots of unique index terms we've encountered performance problems (long running queries) and even exceptions: Too many values for UnInvertedField faceting on field textbody. We

Re: solre scores remains same for exact match and nearly exact match

2013-04-04 Thread Jack Krupansky
A simpler way to write the query q=subject:session subject:management subject:in subject:php would be q=subject:(session management in php). Of course, edismax is usually a better way to go in general. -- Jack Krupansky -Original Message- From: Andre Bois-Crettez Sent: Thursday,
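The grouped form suggested above can be sketched in a few lines of Python. This is only an illustration: the helper name field_query is made up here, and the host, handler path and fl list are copied from the query quoted earlier in the thread rather than from any known setup.

```python
from urllib.parse import urlencode


def field_query(field, text):
    """Group every whitespace-separated term under one field: field:(t1 t2 ...)."""
    terms = text.split()
    return f"{field}:({' '.join(terms)})"


# Hypothetical request, reusing the URL and parameters quoted in the thread.
params = {
    "q": field_query("subject", "session management in php"),
    "fl": "category,score,subject",
}
url = "http://localhost:8983/solr/select/?" + urlencode(params)
print(url)
```

Building the URL with urlencode also avoids the unescaped spaces that appear in the hand-written query.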

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Jack Krupansky
Technically, update and add are identical from a user perspective - you don't need to worry about whether the document already exists. But there is another, newer form of update, called selective or atomic update, which updates a subset of the fields in an existing document without needing to re-send
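To make the distinction concrete, here is a minimal sketch of the two JSON payload shapes, assuming a schema whose unique key is named id and a field named title (both are placeholders, not taken from the thread). The "set" modifier is the Solr 4 atomic-update syntax; as far as I recall, atomic updates also need the update log enabled and the other fields stored, so treat this as a sketch rather than a recipe.

```python
import json


def full_document(doc_id, fields):
    """A plain add/update: the whole document is sent and replaces any existing one."""
    doc = {"id": doc_id}
    doc.update(fields)
    return doc


def atomic_update(doc_id, changes):
    """An atomic update: only the changed fields are sent, each wrapped in a 'set' modifier."""
    doc = {"id": doc_id}
    for name, value in changes.items():
        doc[name] = {"set": value}
    return doc


# Both bodies would be posted to the same update handler as a JSON list of documents.
add_body = json.dumps([full_document("doc-1", {"title": "Old title", "body": "text"})])
atomic_body = json.dumps([atomic_update("doc-1", {"title": "New title"})])
print(add_body)
print(atomic_body)
```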

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
I crawl web pages with Nutch and send them to Solr for indexing. There are two parameters to send data into Solr. One of them is -index and the other one is -reindex. I just want to learn what they do. 2013/4/4 Jack Krupansky j...@basetechnology.com Technically, update and add are identical from

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Jack Krupansky
That's a question for the Nutch email list. In Solr, reindexing simply means that you manually delete your full Solr index (or at least delete all documents using a query) and fully ingest all documents, from scratch. There is no option, it's just something that you, the user/developer, do

RE: Difference Between Indexing and Reindexing

2013-04-04 Thread Markus Jelsma
I assume you're using Nutch 2.x? Nutch 1.x does not have such an option and I find it strange to hear 2.x does. It really makes no sense to have a -reindex option and it should be removed. I'd recommend sticking to plain indexing. -Original message- From: Jack Krupansky

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Alexandre Rafalovitch
On Thu, Apr 4, 2013 at 9:03 AM, Furkan KAMACI furkankam...@gmail.com wrote: I crawl web pages with Nutch and send them to Solr for indexing. There are two parameters to send data into Solr. One of them is -index and the other one is -reindex. I just want to learn what they do. Are you sure this

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Gora Mohanty
On 4 April 2013 18:33, Furkan KAMACI furkankam...@gmail.com wrote: I crawl web pages with Nutch and send them to Solr for indexing. There are two parameters to send data into Solr. One of them is -index and the other one is -reindex. I just want to learn what they do. [...] Which version of Nutch

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
I use Nutch 2.1 and run these commands: bin/nutch solrindex http://localhost:8983/solr -index bin/nutch solrindex http://localhost:8983/solr -reindex 2013/4/4 Gora Mohanty g...@mimirtech.com On 4 April 2013 18:33, Furkan KAMACI furkankam...@gmail.com wrote: I crawl web pages with Nutch and send them

Re: SolrCloud not distributing documents across shards

2013-04-04 Thread Michael Della Bitta
Thank you for all your hard work! On Wed, Apr 3, 2013 at 6:08 PM, Mark Miller markrmil...@gmail.com wrote: On

Re: solre scores remains same for exact match and nearly exact match

2013-04-04 Thread amit
Thanks Jack and Andre. I am trying to use edismax, but I am stuck with a NoClassDefFoundError: org/apache/solr/response/QueryResponseWriter. I am using Solr 3.6. I have followed the steps here: http://wiki.apache.org/solr/VelocityResponseWriter#Using_the_VelocityResponseWriter_in_Solr_Core Just the

detailed Error reporting in Solr

2013-04-04 Thread eShard
Good morning, I'm currently running Solr 4.0 final with Tika v1.2 and ManifoldCF v1.2 dev, and I'm battling Tika XML parse errors again. Solr reports this error: org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error, which is too vague. I had to

Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
OK, one possible fix is to add the XML equivalent of nbsp, which is: <?xml version="1.0"?> <!DOCTYPE some_name [ <!ENTITY nbsp "&#160;"> ]> But how do I add this into the Tika configuration?

RE: Solr Multiword Search

2013-04-04 Thread Dyer, James
If you are using dismax/edismax with mm=0 (or some other low number), you should override this in the spellchecker. Specify spellcheck.collateParam.mm=100%, or something high like that. Likewise if you're using the default lucene/solr query parser with q.op=OR, then you can specify

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Gora Mohanty
On 4 April 2013 19:29, Furkan KAMACI furkankam...@gmail.com wrote: I use Nutch 2.1 and run these commands: bin/nutch solrindex http://localhost:8983/solr -index bin/nutch solrindex http://localhost:8983/solr -reindex [...] Sorry, but are you sure that you are using 2.1? Here is what I get with:

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Gora Mohanty
On 4 April 2013 20:16, Gora Mohanty g...@mimirtech.com wrote: On 4 April 2013 19:29, Furkan KAMACI furkankam...@gmail.com wrote: I use Nutch 2.1 and run these commands: bin/nutch solrindex http://localhost:8983/solr -index bin/nutch solrindex http://localhost:8983/solr -reindex [...] Sorry, but

RE: Spell check component does not return any suggestions

2013-04-04 Thread Dyer, James
Make sure you also set spellcheck.onlyMorePopular=false (or leave it out as false is the default) when using spellcheck.alternativeTermCount. You may also need to set spellcheck.maxResultsForSuggest=0. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxResultsForSuggest to
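The parameters named in this reply can be collected in one place, for instance as below. This is a sketch under assumptions: the helper name spellcheck_params is invented, the /select handler and host are borrowed from other messages in this listing, and the test term hanbuch comes from the original report. Whether maxResultsForSuggest=0 is actually needed depends on the setup, as the reply itself says.

```python
from urllib.parse import urlencode


def spellcheck_params(query, alternative_term_count=5):
    """Request parameters for suggestions on terms that already exist in the index."""
    return {
        "q": query,
        "spellcheck": "true",
        # Suggest alternatives even when the entered term occurs in the index.
        "spellcheck.alternativeTermCount": str(alternative_term_count),
        # false is the default, so this could also be left out entirely.
        "spellcheck.onlyMorePopular": "false",
        # The reply says this may also be needed; it is included here as an assumption.
        "spellcheck.maxResultsForSuggest": "0",
    }


# Hypothetical request URL; host and handler are placeholders.
url = "http://localhost:8983/solr/select?" + urlencode(spellcheck_params("hanbuch"))
print(url)
```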

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
It may be deprecated usage (maybe not), but I can certainly run -index and -reindex on Nutch 2.1. 2013/4/4 Gora Mohanty g...@mimirtech.com On 4 April 2013 20:16, Gora Mohanty g...@mimirtech.com wrote: On 4 April 2013 19:29, Furkan KAMACI furkankam...@gmail.com wrote: I use Nutch 2.1 and

Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Jack Krupansky
Could you guys please take this discussion offline or over to a Nutch mailing list - where it belongs? This has nothing to do with Solr. -- Jack Krupansky -Original Message- From: Gora Mohanty Sent: Thursday, April 04, 2013 10:46 AM To: solr-user@lucene.apache.org Subject: Re:

Solr 4.2 single server limitations

2013-04-04 Thread imehesz
Hello, I'm using a single server setup with Nutch (1.6) and Solr (4.2). I plan to trigger the Nutch crawling process every 30 minutes or so and add about 300+ websites a month (~5-10 pages each). At this point I'm not sure about the query requests/sec. Can I run this on a single server (how

Re: Question on Exact Matches - edismax

2013-04-04 Thread Sandeep Mestry
Another problem that I see in Solr analysis is the query term that matches the tokenized field does not match on the case insensitive field. So, if I'm searching for 'coast to coast', I see that the tokenized series title (pg_series_title) is matched but not the ci field which is

Solr Query UI

2013-04-04 Thread scallawa
I am trying to understand how to plug data into the solr query option from the UI. The query below works on our old solr version (1.3) but does not return results on 4.2. I pulled it from the catalina log file. I am trying to plug in the values one by one into the query UI to see which one it

Re: Solr Query UI

2013-04-04 Thread Gora Mohanty
On 4 April 2013 22:11, scallawa dami...@altrec.com wrote: I am trying to understand how to plug data into the solr query option from the UI. The query below works on our old solr version (1.3) but does not return results on 4.2. I pulled it from the catalina log file. I am trying to plug

Re: detailed Error reporting in Solr

2013-04-04 Thread Jack Krupansky
I'm trying to understand what the context is here... are you trying to crawl web pages that have bad HTML? Or, ... what? -- Jack Krupansky -Original Message- From: eShard Sent: Thursday, April 04, 2013 10:23 AM To: solr-user@lucene.apache.org Subject: detailed Error reporting in Solr

RE: Solr Multiword Search

2013-04-04 Thread skmirch
Hi James, Thanks for the response. Nope, I'm not using dismax or edismax. Just the standard Solr query parser. Also, by using the parameter spellcheck.collateParam.q.op=AND I see this working. This also means that all the words need to be correct and the maxEdits can only be 2, else it won't suggest
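The two overrides discussed in this thread (mm for dismax/edismax, q.op for the standard parser) could be chosen like this. It is an illustrative sketch only: the function name collate_params and the sample query are invented, and spellcheck.collate=true is assumed to be wanted since the collateParam overrides only affect collation queries.

```python
def collate_params(query, parser="lucene"):
    """Pick the collation override that matches the query parser in use."""
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.collate": "true",
    }
    if parser in ("dismax", "edismax"):
        # Require all terms to match when collations are tested.
        params["spellcheck.collateParam.mm"] = "100%"
    else:
        # Standard parser: force AND semantics for the collation check.
        params["spellcheck.collateParam.q.op"] = "AND"
    return params


print(collate_params("multiword serch"))
print(collate_params("multiword serch", parser="edismax"))
```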

Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
Yes, that's it exactly. I crawled a link with these (&nbsp;&rsaquo;) in each list item and Solr couldn't handle it: it threw the XML parse error and the crawler terminated the job. Is this fixable? Or do I have to submit a bug to the Tika folks? Thanks,

RE: Solr Multiword Search

2013-04-04 Thread Dyer, James
Use IndexBasedSpellChecker instead of DirectSolrSpellChecker if you need more than 2 edits. You may need to set the accuracy parameter lower than the default of 0.5. Keep in mind that while this might get the correct responses for your test cases, in the wild your users might find their

how to avoid single character to get indexed for directspellchecker dictionary

2013-04-04 Thread Rohan Thakur
Hi all, I am using Solr's DirectSolrSpellChecker for spelling suggestions, with raw analysis for indexing. Some of my fields contain single characters like l and L, so these are being indexed in the dictionary, and when I ask for suggestions for a query like delll it suggests de and l l l as the

RE: how to avoid single character to get indexed for directspellchecker dictionary

2013-04-04 Thread Dyer, James
I assume if your user queries delll and it breaks it into pieces like de l l l, then you're probably using WordBreakSolrSpellChecker in addition to DirectSolrSpellChecker, right? If so, then you can specify minBreakLength in solrconfig.xml like this: <searchComponent name="spellcheck"

Re: Solr Query UI

2013-04-04 Thread scallawa
We are still in the testing phase for 4.2. A new server was built and the latest tomcat, java and solr were installed. The schema file was copied over from the old and then customized as follows. Schema Changes We changed all float field types to tfloat. The solrqueryparser default operator is

Re: maxWarmingSearchers in Solr 4.

2013-04-04 Thread Shawn Heisey
On 4/4/2013 12:34 AM, Dotan Cohen wrote: In the case of maxWarmingSearchers, I would hope that you have your system set up so that you would never need more than 1 warming searcher at a time. If you do a commit while a previous commit is still warming, Solr will try to create a second warming

Re: detailed Error reporting in Solr

2013-04-04 Thread Jack Krupansky
I've been away from Tika for a while, so I'm not sure. This might also be an issue of Tika using a strict XML parser for HTML rather than a looser and more error-tolerant HTML-specific parser, like most browsers use, that allows these kinds of technical errors that in reality, in most cases, can

Compressed Fields in 4.2.1

2013-04-04 Thread Jamie Johnson
I had read somewhere that text fields by default were compressed in 4.2.1, is this the case? If not how do I enable compression of stored text fields?

Re: Compressed Fields in 4.2.1

2013-04-04 Thread Yonik Seeley
On Thu, Apr 4, 2013 at 7:41 PM, Jamie Johnson jej2...@gmail.com wrote: I had read somewhere that text fields by default were compressed in 4.2.1, is this the case? If not how do I enable compression of stored text fields? Compressed stored fields are the default since 4.1 -Yonik

Re: Solr Query UI

2013-04-04 Thread scallawa
I found the problem. The values that we have for cat-path include the special character /. This was not a special character in pre 4.0 releases. That explains why it worked in my previous version but not in 4.2. Pre 4.0 Lucene supports escaping special characters that are part of the query
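One way to handle this on the client side is to backslash-escape query-syntax characters before building the query. The sketch below is a plain Python helper written for illustration (the name escape_term and the sample value are made up; cat-path is the field mentioned in the message). SolrJ users can look at ClientUtils.escapeQueryChars for the same idea.

```python
# Characters with special meaning in the Lucene/Solr 4.x query syntax.
# '&' and '|' are only special when doubled; escaping them singly is harmless.
SPECIAL = set('+-!(){}[]^"~*?:\\/&|')


def escape_term(value):
    """Backslash-escape every query-syntax character in a single term value."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)


# Hypothetical category path containing the now-special '/' character.
value = "/sports/outdoor"
query = f"cat-path:{escape_term(value)}"
print(query)
```

Quoting the value as a phrase is another option when the field is not tokenized, but escaping keeps the term form used in the original query.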

SolR InvalidTokenOffsetsException with Highlighter and Synonyms

2013-04-04 Thread juancesarvillalba
Hi, I saw some similar problems in other threads, but I think this one is a little different and I couldn't find a solution. I get the exception org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token eightysix exceeds length of provided text sized 80. This happens for example when I

Fwd: Zookeeper dataimport.properties node

2013-04-04 Thread Nathan Findley
- Is dataimport.properties ever written to the filesystem? (Trying to determine if I have a permissions error because I don't see it anywhere on disk). - How do you manually edit dataimport.properties? My system is periodically pulling in new data. If that process has issues, I want to be able

Re: do SearchComponents have access to response contents

2013-04-04 Thread Amit Nithian
We also need to track the size of the response (as the size in bytes of the whole XML response that is streamed, with stored fields and all). I was a bit worried because I am wondering if a SearchComponent will actually have access to the response bytes... == Can't you get this from your container