Re: issue with highlighting in solr 4.10.2

2015-06-29 Thread Dmitry Kan
Hi Erick,

The Contents field contains one sentence only, and no "watch" exists in it.
Plus, we use quite a large snippet size, so it should cover the whole field.
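
For reference, here is how we reproduce it outside the admin UI -- a minimal
SolrJ sketch (the core URL is a placeholder; the query and the Contents field
are the ones above):

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL; adjust to the real install.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("(+Contents:apple +Contents:watch) Contents:iphone");
        q.setHighlight(true);
        q.addHighlightField("Contents");
        q.setHighlightFragsize(500);  // large snippet size, as described above
        QueryResponse rsp = server.query(q);
        // docId -> (field -> snippets): exactly what the highlighter returned
        for (Map.Entry<String, Map<String, List<String>>> doc : rsp.getHighlighting().entrySet()) {
            System.out.println(doc.getKey() + " -> " + doc.getValue());
        }
        server.shutdown();
    }
}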

Dmitry

On Sat, Jun 27, 2015 at 6:16 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Does "watch" exist in the Contents field somewhere outside the snippet
 size you've specified?

 Shot in the dark,
 Erick

 On Fri, Jun 26, 2015 at 3:22 AM, Dmitry Kan solrexp...@gmail.com wrote:
  Hi,
 
  When highlighting hits for the following query:
 
  (+Contents:apple +Contents:watch) Contents:iphone
 
  I expect the standard solr highlighter to highlight either "iphone" alone,
  or "iphone" AND "apple" only if "watch" is present.
 
  However, solr highlights "iphone" along with only "apple". Is this a bug or a
  known feature? Is there any way to debug the highlighter using the solr
  admin UI?
 
  --
  Dmitry Kan
  Luke Toolbox: http://github.com/DmitryKey/luke
  Blog: http://dmitrykan.blogspot.com
  Twitter: http://twitter.com/dmitrykan
  SemanticAnalyzer: www.semanticanalyzer.info




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


SolrCloud Document Update Problem

2015-06-29 Thread Amit Jha
Hi,

I set up a SolrCloud with 2 shards, each having 2 replicas, and a 3-node
zookeeper ensemble.

We add and update documents from a web app. While updating, we delete the
document and add the same document with updated values and the same unique id.

I am facing a very strange issue: sometimes 2 documents have the same
unique ID, one document with old values and another one with new values.
It happens only when we update the document.

Please suggest or guide...

Rgds


Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-29 Thread Erick Erickson
Not quite sure what you mean by compressed values. admin/luke
doesn't show the results of the compression of the stored values, there's
no way I know of to do that.

Best,
Erick

On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik dineshkumarn...@gmail.com wrote:
 Hi all,

 Is there a way to read the indexed data for a field on which
 analysis/processing has been done?

 I know that using the admin GUI we can see field-wise analysis, but how can I
 get hold of the complete document using admin/luke, or any other way?

 For example, say I have 2 fields called name and compressedname.

 name has values like apple, green-apple, red-apple
 compressedname has values like apple, greenapple, redapple

 Even though I make both these fields indexed=true and stored=true,
 I am not able to see the compressed values using admin/luke?id=mydocid.

 In the response I see something like this:


 <lst name="name">
   <str name="type">string</str>
   <str name="schema">ITS--</str>
   <str name="flags">ITS--</str>
   <str name="value">GREEN-APPLE</str>
   <str name="internal">GREEN-APPLE</str>
   <float name="boost">1.0</float>
   <int name="docFreq">0</int>
 </lst>
 <lst name="compressedname">
   <str name="type">string</str>
   <str name="schema">ITS--</str>
   <str name="flags">ITS--</str>
   <str name="value">GREEN-APPLE</str>
   <str name="internal">GREEN-APPLE</str>
   <float name="boost">1.0</float>
   <int name="docFreq">0</int>
 </lst>



 --
 Best Regards,
 Dinesh Naik


Re: optimize status

2015-06-29 Thread Erick Erickson
Steven:

Yes, but

First, here's Mike McCandles' excellent blog on segment merging:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

I think the third animation is the TieredMergePolicy. In short, yes an
optimize will reclaim disk space. But as you update, this is done for
you anyway. About the only time optimizing is at all beneficial is
when you have a relatively static index. If you're continually
updating documents, and by that I mean replacing some existing
documents, then you'll immediately start generating holes in your
index.

And if you _do_ optimize, you wind up with a huge segment. And since
the default policy tries to merge segments of roughly the same size,
it accumulates deletes for quite a while before they are merged away.

And if you don't update existing docs or delete docs, then there's no
wasted space anyway.

Summer:

First off, why do you care about not updating during optimizing?
There's no good reason you have to worry about that, you can freely
update while optimizing.

But frankly I have to agree with Upayavira that on the face of it
you're doing a lot of extra work. See above, but you optimize while
indexing, so immediately you're rather defeating the purpose.
Personally I'd only optimize relatively static indexes and, by
definition, your index isn't static, since the second process is just
waiting to modify it.

Best,
Erick

On Mon, Jun 29, 2015 at 8:15 AM, Steven White swhite4...@gmail.com wrote:
 Hi Upayavira,

 This is news to me that we should not optimize an index.

 What about disk space savings: isn't optimization needed to reclaim disk space,
 or does Solr somehow do that?  Where can I read more about this?

 I'm on Solr 5.1.0 (may switch to 5.2.1)

 Thanks

 Steve

 On Mon, Jun 29, 2015 at 4:16 AM, Upayavira u...@odoko.co.uk wrote:

 I'm afraid I don't understand. You're saying that optimising is causing
 performance issues?

 Simple solution: DO NOT OPTIMIZE!

 Optimisation is very badly named. What it does is squash all segments
 in your index into one segment, removing all deleted documents. It is
 good to get rid of deletes - in that sense the index is optimized.
 However, future merges become very expensive. The best way to handle
 this topic is to leave it to Lucene/Solr to do it for you. Pretend the
 optimize option never existed.

 This is, of course, assuming you are using something like Solr 3.5+.

 Upayavira

 On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
 
   Have to because of performance issues.
  Just want to know if there is a way to tap into the status.
 
   On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
  
   Bigger question, why are you optimizing? Since 3.6 or so, it generally
    hasn't been required; it can even be a bad thing.
  
   Upayavira
  
   On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
   Hi All,
  
   I have two indexers (Independent processes ) writing to a common solr
   core.
    If one indexer process issues an optimize on the core,
    I want the second indexer to hold off adding docs until the optimize has
    finished.
  
   Are there ways I can do this programmatically?
   pinging the core when the optimize is happening is returning OK
 because
   technically
   solr allows you to update when an optimize is happening.
  
   any suggestions ?
  
   thanks,
   Summer



RE: Correcting text at index time

2015-06-29 Thread hossmaa
Hi Markus

Thanks for the reply. I'm already using the Synonyms filter and it is
working fine (i.e., when I search for "customer", it also returns documents
containing "cst.").
What the synonyms filter does not do is actually replace the word "cst."
with "customer" in the document.

Just to be clearer: in the returned results, I do not want to see the word
"cst." any more (it should be permanently replaced with "customer"). I want
to only see the expanded form.

Cheers
A.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636p4214643.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Document Update Problem

2015-06-29 Thread Shalin Shekhar Mangar
On Mon, Jun 29, 2015 at 4:37 PM, Amit Jha shanuu@gmail.com wrote:
 Hi,

 I set up a SolrCloud with 2 shards, each having 2 replicas, and a 3-node
 zookeeper ensemble.

 We add and update documents from a web app. While updating, we delete the
 document and add the same document with updated values and the same unique id.

I am not sure why you delete the document. If you use the same unique
key and send the whole document again (with some other fields
changed), Solr will automatically overwrite the old document with the
new one.
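
Something like the following is enough (a minimal SolrJ sketch; the ZooKeeper
address, collection and field names are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class OverwriteExample {
    public static void main(String[] args) throws Exception {
        // Placeholder ZK ensemble address (SolrJ 5.x; use CloudSolrServer on 4.x).
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        client.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");            // same uniqueKey as the existing document
        doc.addField("title", "new title");  // updated values
        client.add(doc);                     // overwrites the old version; no delete needed
        client.commit();
        client.close();
    }
}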


 I am facing a very strange issue: sometimes 2 documents have the same
 unique ID, one document with old values and another one with new values.
 It happens only when we update the document.



 Please suggest or guide...

 Rgds



-- 
Regards,
Shalin Shekhar Mangar.


Architectural advice questions on using Solr XML DataImport Handlers (and Nutch) for a Vertical Search engine.

2015-06-29 Thread Arthur Yarwood
Please bear with me here, I'm pretty new to Solr, with most of my DB 
experience being of the relational variety. I'm planning a new project, 
which I believe Solr (and Nutch) will solve well. Although I've 
installed Solr 5.2 and Nutch 1.10 (on CentOS) and tinkered about a bit, 
I'd be grateful for advice and tips regarding my plan.


I'm looking to build a vertical search engine to cover a very specific 
and narrow dataset. Sources will number in the hundreds and will mostly 
be managed by hand; these will be a mixture of forums and product based 
e-commerce sites. For some of these I was hoping to leverage the Solr 
DataImportHandler system with their RSS feeds, primarily for the ease of 
acquiring clean, reasonably sanitised and well structured data. For the 
rest, I'm going to fall back to Nutch crawling them, with some heavy 
regulation of URLs via regexes. So to sum up, a Solr DB populated through 
a couple of different ways, then searched via some custom user facing PHP 
webpages. Finally a cronjob script would delete any docs older than X 
weeks, to keep on top of data retention.
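
For the retention step I'm assuming a single delete-by-query with date math
will do, rather than per-document deletes -- a minimal SolrJ sketch (the date
field name, tstamp, is a placeholder for whatever my schema ends up using):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class RetentionSweep {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        // Delete everything whose tstamp is older than 8 weeks, then commit.
        client.deleteByQuery("tstamp:[* TO NOW-56DAYS]");
        client.commit();
        client.close();
    }
}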


Does that sound sensible at all?

Regarding RSS feeds:-
Many only provide a limited number of recent items, however I'd like to 
retain items for many weeks. I've already discovered the clean=false 
param on DataImport, after wondering why old rss items vanished!
Question 1) is there an easy way to filter items to import in the 
URLDataSource entity? Or is it best to go down route of XSLT 
preprocessing?
Question 2) Multiple URLDataSources: reference all in one DataImport 
handler? Or have multiple DataImport handlers?


What's the best approach to supplement imported data with additional 
static fields/keywords associated with the source feed or crawled 
site? E.g. all docs from sites A, B & C are of subcategory Foo. I'm 
guessing with RSS feeds this would be straightforward via the XSLT 
preprocessor. But for Nutch submitted docs - I've no idea?


Scheduling import: Do people just cron up a curl post request (or shell 
execute of Nutch crawl script)? Or is there a more elegant solution 
available?


Any other more general tips and advice on the above greatly appreciated.

--
Arthur Yarwood


set the param [facet.offset] for EVERY [facet.pivot]

2015-06-29 Thread lzqxb
Hi All: I need pagination with a facet offset.
There are two or more fields in [facet.pivot], but only one value for 
[facet.offset], e.g.: facet.offset=10&facet.pivot=field_1,field_2. In this 
condition, field_2 gets an offset of 10 and field_1 also gets an offset of 10. But what I 
want is an offset of 1 for field_2 and an offset of 10 for field_1. How can I fix this 
problem, or is there another way to accomplish it?
Any help is appreciated!

Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-29 Thread dinesh naik
Hi all,

Is there a way to read the indexed data for a field on which
analysis/processing has been done?

I know that using the admin GUI we can see field-wise analysis, but how can I
get hold of the complete document using admin/luke, or any other way?

For example, say I have 2 fields called name and compressedname.

name has values like apple, green-apple, red-apple
compressedname has values like apple, greenapple, redapple

Even though I make both these fields indexed=true and stored=true,
I am not able to see the compressed values using admin/luke?id=mydocid.

In the response I see something like this:


<lst name="name">
  <str name="type">string</str>
  <str name="schema">ITS--</str>
  <str name="flags">ITS--</str>
  <str name="value">GREEN-APPLE</str>
  <str name="internal">GREEN-APPLE</str>
  <float name="boost">1.0</float>
  <int name="docFreq">0</int>
</lst>
<lst name="compressedname">
  <str name="type">string</str>
  <str name="schema">ITS--</str>
  <str name="flags">ITS--</str>
  <str name="value">GREEN-APPLE</str>
  <str name="internal">GREEN-APPLE</str>
  <float name="boost">1.0</float>
  <int name="docFreq">0</int>
</lst>



-- 
Best Regards,
Dinesh Naik


Correcting text at index time

2015-06-29 Thread hossmaa
Hi everyone

I'm wondering if it's possible in Solr to correct text at indexing time,
based on a synonyms-like list. This would be great for expanding undesirable
abbreviations (for example, "cst." instead of "customer").
I've been searching the Solr docs and the web quite thoroughly, I believe,
but haven't found anything to do this.

I guess if there really isn't anything like this, I could implement it as a
custom Filter...

Thanks!
A.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Correcting text at index time

2015-06-29 Thread Markus Jelsma
Hello - why not just use synonyms or StemmerOverrideFilter?
Markus

 
 
-Original message-
 From:hossmaa andreea.hossm...@gmail.com
 Sent: Monday 29th June 2015 14:08
 To: solr-user@lucene.apache.org
 Subject: Correcting text at index time
 
 Hi everyone
 
 I'm wondering if it's possible in Solr to correct text at indexing time,
 based on a synonyms-like list. This would be great for expanding undesirable
 abbreviations (for example, "cst." instead of "customer").
 I've been searching the Solr docs and the web quite thoroughly I believe,
 but haven't found anything to do this.
 
 I guess if there really isn't anything like this, I could implement it as a
 custom Filter...
 
 Thanks!
 A.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: optimize status

2015-06-29 Thread Steven White
Hi Upayavira,

This is news to me that we should not optimize an index.

What about disk space savings: isn't optimization needed to reclaim disk space,
or does Solr somehow do that?  Where can I read more about this?

I'm on Solr 5.1.0 (may switch to 5.2.1)

Thanks

Steve

On Mon, Jun 29, 2015 at 4:16 AM, Upayavira u...@odoko.co.uk wrote:

 I'm afraid I don't understand. You're saying that optimising is causing
 performance issues?

 Simple solution: DO NOT OPTIMIZE!

 Optimisation is very badly named. What it does is squash all segments
 in your index into one segment, removing all deleted documents. It is
 good to get rid of deletes - in that sense the index is optimized.
 However, future merges become very expensive. The best way to handle
 this topic is to leave it to Lucene/Solr to do it for you. Pretend the
 optimize option never existed.

 This is, of course, assuming you are using something like Solr 3.5+.

 Upayavira

 On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
 
  Have to because of performance issues.
  Just want to know if there is a way to tap into the status.
 
   On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
  
   Bigger question, why are you optimizing? Since 3.6 or so, it generally
   hasn't been required; it can even be a bad thing.
  
   Upayavira
  
   On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
   Hi All,
  
   I have two indexers (Independent processes ) writing to a common solr
   core.
   If one indexer process issues an optimize on the core,
   I want the second indexer to hold off adding docs until the optimize has
   finished.
  
   Are there ways I can do this programmatically?
   pinging the core when the optimize is happening is returning OK
 because
   technically
   solr allows you to update when an optimize is happening.
  
   any suggestions ?
  
   thanks,
   Summer



Re: SolrCloud Document Update Problem

2015-06-29 Thread Amit Jha
It was because of the issues

Rgds
AJ

 On Jun 29, 2015, at 6:52 PM, Shalin Shekhar Mangar shalinman...@gmail.com 
 wrote:
 
 On Mon, Jun 29, 2015 at 4:37 PM, Amit Jha shanuu@gmail.com wrote:
 Hi,
 
  I set up a SolrCloud with 2 shards, each having 2 replicas, and a 3-node
  zookeeper ensemble.
  
  We add and update documents from a web app. While updating, we delete the
  document and add the same document with updated values and the same unique id.
 
 I am not sure why you delete the document. If you use the same unique
 key and send the whole document again (with some other fields
 changed), Solr will automatically overwrite the old document with the
 new one.
 
 
  I am facing a very strange issue: sometimes 2 documents have the same
  unique ID, one document with old values and another one with new values.
  It happens only when we update the document.
 
 
 
 Please suggest or guide...
 
 Rgds
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.


Jetty Plus for Solr 4.10.4

2015-06-29 Thread Tarala, Magesh
We are planning to go to production with Solr 4.10.4. The documentation recommends 
using the full Jetty package that includes JettyPlus. I'm not able to find the 
instructions to do this. Can someone point me in the right direction?

Thanks,
Magesh





Re: optimize status

2015-06-29 Thread Walter Underwood
“Optimize” is a manual full merge.

Solr automatically merges segments as needed. This also expunges deleted 
documents.

We really need to rename “optimize” to “force merge”. Is there a Jira for that?
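
For anyone who does have a genuinely static index, the operation is exposed in
SolrJ -- a minimal sketch (the core URL is a placeholder):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ForceMerge {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        // optimize(waitFlush, waitSearcher, maxSegments) is the "force merge":
        // here, merge down to at most one segment.
        client.optimize(true, true, 1);
        client.close();
    }
}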

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Jun 29, 2015, at 5:15 AM, Steven White swhite4...@gmail.com wrote:

 Hi Upayavira,
 
 This is news to me that we should not optimize an index.
 
 What about disk space savings: isn't optimization needed to reclaim disk space,
 or does Solr somehow do that?  Where can I read more about this?
 
 I'm on Solr 5.1.0 (may switch to 5.2.1)
 
 Thanks
 
 Steve
 
 On Mon, Jun 29, 2015 at 4:16 AM, Upayavira u...@odoko.co.uk wrote:
 
 I'm afraid I don't understand. You're saying that optimising is causing
 performance issues?
 
 Simple solution: DO NOT OPTIMIZE!
 
  Optimisation is very badly named. What it does is squash all segments
 in your index into one segment, removing all deleted documents. It is
 good to get rid of deletes - in that sense the index is optimized.
 However, future merges become very expensive. The best way to handle
 this topic is to leave it to Lucene/Solr to do it for you. Pretend the
 optimize option never existed.
 
 This is, of course, assuming you are using something like Solr 3.5+.
 
 Upayavira
 
 On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
 
  Have to because of performance issues.
 Just want to know if there is a way to tap into the status.
 
 On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
 
  Bigger question, why are you optimizing? Since 3.6 or so, it generally
  hasn't been required; it can even be a bad thing.
 
 Upayavira
 
 On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
 Hi All,
 
 I have two indexers (Independent processes ) writing to a common solr
 core.
  If one indexer process issues an optimize on the core,
  I want the second indexer to hold off adding docs until the optimize has
  finished.
 
 Are there ways I can do this programmatically?
 pinging the core when the optimize is happening is returning OK
 because
 technically
 solr allows you to update when an optimize is happening.
 
 any suggestions ?
 
 thanks,
 Summer
 



cursorMark and timeAllowed are mutually exclusive?

2015-06-29 Thread Bernd Fehling
Hi list,

while just trying cursorMark I got the following search response:

error: {
msg: Can not search using both cursorMark and timeAllowed,
code: 400
}


Yes, I'm using timeAllowed, which is set in my requestHandler as an
invariant to 60000 (60 seconds) as a limit on killer searches.

I have found nothing in the ref guides, docs, wiki, or examples about these
mutually exclusive parameters.

Is this a bug or a feature, and if it is a feature, what is the sense of this?

Regards
Bernd


Re: cursorMark and timeAllowed are mutually exclusive?

2015-06-29 Thread Shawn Heisey
On 6/29/2015 9:12 AM, Bernd Fehling wrote:
 while just trying cursorMark I got the following search response:

 error: {
 msg: Can not search using both cursorMark and timeAllowed,
 code: 400
 }


 Yes, I'm using timeAllowed, which is set in my requestHandler as an
 invariant to 60000 (60 seconds) as a limit on killer searches.

 I have found nothing in the ref guides, docs, wiki, or examples about these
 mutually exclusive parameters.

 Is this a bug or a feature, and if it is a feature, what is the sense of this?

It appears to have been disallowed almost from the beginning of the
cursorMark feature.  It was not present in the first versions of the
patch, but it was already incorporated before anything got committed to SVN.

https://issues.apache.org/jira/browse/SOLR-5463

The reasons for the incompatibility are not clear from the issue notes,
so either hossman or sarowe may need to comment about what makes the two
features fundamentally incompatible, and that info needs to go into the
documentation.

Thanks,
Shawn



RE: optimize status

2015-06-29 Thread Reitzel, Charles
Is there really a good reason to consolidate down to a single segment?

Any incremental query performance benefit is tiny compared to the loss of 
manageability.   

I.e. shouldn't segments _always_ be kept small enough to facilitate 
re-balancing data across shards?   Even in non-cloud instances this is true.  
When a collection grows, you may want to shard/split an existing index by adding 
a node and moving some segments around.   Isn't this the direction Solr is 
going?   With many, smaller segments, this is feasible.  With one big 
segment, the collection must always be reindexed.

Thus, "optimize" would mean "get rid of all deleted records" and would, in 
fact, optimize queries by eliminating wasted I/O.   Perhaps worth it for slowly 
changing indexes.   Seems like the Tiered merge policy is 90% there ...   Or 
am I all wet (again)?

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Monday, June 29, 2015 10:39 AM
To: solr-user@lucene.apache.org
Subject: Re: optimize status

"Optimize" is a manual full merge.

Solr automatically merges segments as needed. This also expunges deleted 
documents.

We really need to rename "optimize" to "force merge". Is there a Jira for that?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Jun 29, 2015, at 5:15 AM, Steven White swhite4...@gmail.com wrote:

 Hi Upayavira,
 
 This is news to me that we should not optimize an index.
 
 What about disk space savings: isn't optimization needed to reclaim disk 
 space, or does Solr somehow do that?  Where can I read more about this?
 
 I'm on Solr 5.1.0 (may switch to 5.2.1)
 
 Thanks
 
 Steve
 
 On Mon, Jun 29, 2015 at 4:16 AM, Upayavira u...@odoko.co.uk wrote:
 
 I'm afraid I don't understand. You're saying that optimising is 
 causing performance issues?
 
 Simple solution: DO NOT OPTIMIZE!
 
 Optimisation is very badly named. What it does is squash all 
 segments in your index into one segment, removing all deleted 
 documents. It is good to get rid of deletes - in that sense the index is 
 optimized.
 However, future merges become very expensive. The best way to handle 
 this topic is to leave it to Lucene/Solr to do it for you. Pretend 
 the optimize option never existed.
 
 This is, of course, assuming you are using something like Solr 3.5+.
 
 Upayavira
 
 On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
 
  Have to because of performance issues.
 Just want to know if there is a way to tap into the status.
 
 On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
 
 Bigger question, why are you optimizing? Since 3.6 or so, it 
  generally hasn't been required; it can even be a bad thing.
 
 Upayavira
 
 On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
 Hi All,
 
 I have two indexers (Independent processes ) writing to a common 
 solr core.
  If one indexer process issues an optimize on the core, I want the 
  second indexer to hold off adding docs until the optimize has 
  finished.
 
 Are there ways I can do this programmatically?
 pinging the core when the optimize is happening is returning OK
 because
 technically
 solr allows you to update when an optimize is happening.
 
 any suggestions ?
 
 thanks,
 Summer
 





Re: Jetty Plus for Solr 4.10.4

2015-06-29 Thread Shawn Heisey
On 6/29/2015 8:44 AM, Tarala, Magesh wrote:
 We are planning to go to production with Solr 4.10.4. The documentation 
 recommends using the full Jetty package that includes JettyPlus. I'm not able to 
 find the instructions to do this. Can someone point me in the right direction?

I found the official page that talks about JettyPlus.

https://wiki.apache.org/solr/SolrJetty

Note at the top of the page where it says that info is outdated for
Jetty 8.  Solr has been using Jetty 8 since version 4.0-ALPHA -- for
nearly three years now.  Typical use cases for Solr do *not* require a
full Jetty install.  Even most non-typical use cases do not require it.

Solr 4.10 includes the bin/solr script for startup, which runs the Jetty
that's included in the Solr download.  Solr 5.x makes those scripts even
better.

If you haven't made it to production yet, you should probably consider
upgrading to Solr 5.2.1.

If you are not going to use the Jetty included with Solr, then you're
pretty much on your own. You can take the war file from the dist
directory, the logging jars from the example/lib/ext directory, and the
logging config from example/resources, and install it in most of the
available servlet containers.

Starting with 5.0, the included Jetty is the only officially supported
way to start Solr, and the war is no longer included in the dist
directory in the download.

https://wiki.apache.org/solr/WhyNoWar

Thanks,
Shawn



Re: cursorMark and timeAllowed are mutually exclusive?

2015-06-29 Thread Chris Hostetter

:  I have found nothing in the ref guides, docs, wiki, or examples about these 
:  mutually exclusive parameters.
: 
:  Is this a bug or a feature, and if it is a feature, what is the sense of 
:  this?

The problem is that if a timeAllowed exceeded situation pops up, you won't 
get a nextCursorMark to use -- or the one you get might be wrong, and 
could trigger infinite looping.


The timeAllowed code doesn't really know about the cursorMark code, so if a 
timeAllowed exceeded situation pops up, you might not get a nextCursorMark in 
your response, which I considered unacceptable.  If you ask for a cursorMark, 
you get a cursor mark.  If you ask for a cursor mark and include other 
options that make it impossible for us to do that, it's an error.

With a bit of work, both could probably be supported in combination -- but 
for now it's untested, and thus unsupported, so we put in that error 
message to make it clear and to guard end users against the risk of 
nonsensical results.

 Yes, I'm using timeAllowed, which is set in my requestHandler as an
 invariant to 60000 (60 seconds) as a limit on killer searches.

Your best bet is probably to confine your cursorMark searches to an alternate 
request handler, not used by your normal arbitrary queries, that doesn't 
have the timeAllowed invariant.
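
For completeness, a typical cursorMark loop from SolrJ looks like this -- a
minimal sketch (the core URL and uniqueKey field are placeholders; note that
no timeAllowed is set):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorWalk {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(500);
        // Cursors require a total sort that ends on the uniqueKey field.
        q.setSort(SolrQuery.SortClause.asc("id"));
        String cursor = CursorMarkParams.CURSOR_MARK_START; // "*"
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor); // do NOT also set timeAllowed
            QueryResponse rsp = client.query(q);
            // ... process rsp.getResults() ...
            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) {
                break; // an unchanged cursor means the result set is exhausted
            }
            cursor = next;
        }
        client.close();
    }
}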



-Hoss
http://www.lucidworks.com/


Questions regarding autosuggest (Solr 5.2.1)

2015-06-29 Thread Thomas Michael Engelke
 

 A friend and I are trying to develop some software using Solr in the
background, and with that comes a lot of changes. We're used to older
versions (4.3 and below). We especially have problems with the
autosuggest feature.

This is the field definition (schema.xml) for our autosuggest field:

<field name="autosuggest" type="autosuggest" indexed="true"
stored="true" required="false" multiValued="true" />
...
<copyField source="name" dest="autosuggest" />
...
<fieldType name="autosuggest" class="solr.TextField"
positionIncrementGap="100">
 <analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0"
splitOnNumerics="1" generateWordParts="1" generateNumberParts="1"
catenateWords="1" catenateNumbers="0" catenateAll="0"
preserveOriginal="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"
ignoreCase="true" enablePositionIncrements="true" format="snowball"/>
  <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3"
maxSubwordSize="30" onlyLongestMatch="false"/>
  <filter class="solr.GermanNormalizationFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="German2"
protected="protwords.txt"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
maxGramSize="30"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0"
splitOnNumerics="1" generateWordParts="1" generateNumberParts="1"
catenateWords="1" catenateNumbers="0" catenateAll="0"
preserveOriginal="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"
ignoreCase="true" enablePositionIncrements="true" format="snowball"/>
  <filter class="solr.GermanNormalizationFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="German2"
protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
</fieldType>

Afterwards, we defined an autosuggest component to use this field, like
this (solrconfig.xml):

<searchComponent name="suggest" class="solr.SuggestComponent">
 <lst name="suggester">
  <str name="name">mySuggester</str>
  <str name="lookupImpl">FuzzyLookupFactory</str>
  <str name="storeDir">suggester_fuzzy_dir</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">suggest</str>
  <str name="suggestAnalyzerFieldType">autosuggest</str>
  <str name="buildOnStartup">false</str>
  <str name="buildOnCommit">false</str>
 </lst>
</searchComponent>

And add a requesthandler to test out the functionality:

<requestHandler name="/suggesthandler" class="solr.SearchHandler"
startup="lazy">
 <lst name="defaults">
  <str name="suggest">true</str>
  <str name="suggest.count">10</str>
  <str name="suggest.dictionary">mySuggester</str>
 </lst>
 <arr name="components">
  <str>suggest</str>
 </arr>
</requestHandler>

However, when trying to start the core that has this configuration, a long
exception occurs, telling us this:

"Error in configuration: autosuggest is not defined in the schema"

Now, that seems to be wrong. Any idea how to fix that? 

RE: Jetty Plus for Solr 4.10.4

2015-06-29 Thread Tarala, Magesh
Hi Shawn - Thank you for the quick and detailed response!! 

Good to hear that Jetty 8 installation with solr for typical uses does not need 
to be modified. 

I believe what we have is a typical use case. We will be installing solr on 3 
nodes in our Hadoop cluster. Will use Hadoop's zookeeper. One collection with 3 
shards and 2 replicas each. Have not benchmarked performance. So, may need more 
shards, nodes,... Data volume and user volumes are not very high. But we are 
using nested document structure. We are concerned that it may introduce 
performance issues. Will check it out. 
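
For context, this is roughly how we index the nested documents -- a minimal
SolrJ 4.x sketch (the ZK address, collection and field names are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class NestedDocExample {
    public static void main(String[] args) throws Exception {
        // Placeholder ZK ensemble address.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument parent = new SolrInputDocument();
        parent.addField("id", "order-1");
        parent.addField("type_s", "order");      // discriminator field for block joins

        SolrInputDocument child = new SolrInputDocument();
        child.addField("id", "order-1-line-1");
        child.addField("type_s", "orderline");
        parent.addChildDocument(child);          // indexed as one block with the parent

        server.add(parent);
        server.commit();
        server.shutdown();
    }
}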

Regarding your recommendation to upgrade to Solr 5.2.1, we have Hortonworks HDP 
2.2 in place and they support 4.10. Will revisit the decision. 

Thanks,
Magesh

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Monday, June 29, 2015 11:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Jetty Plus for Solr 4.10.4

On 6/29/2015 8:44 AM, Tarala, Magesh wrote:
 We are planning to go to production with Solr 4.10.4. The documentation 
 recommends using the full Jetty package that includes JettyPlus. I'm not able to 
 find the instructions to do this. Can someone point me in the right direction?

I found the official page that talks about JettyPlus.

https://wiki.apache.org/solr/SolrJetty

Note at the top of the page where it says that info is outdated for Jetty 8.  
Solr has been using Jetty 8 since version 4.0-ALPHA -- for nearly three years 
now.  Typical use cases for Solr do *not* require a full Jetty install.  Even 
most non-typical use cases do not require it.

Solr 4.10 includes the bin/solr script for startup, which runs the Jetty that's 
included in the Solr download.  Solr 5.x makes those scripts even better.

If you haven't made it to production yet, you should probably consider 
upgrading to Solr 5.2.1.

If you are not going to use the Jetty included with Solr, then you're pretty 
much on your own. You can take the war file from the dist directory, the 
logging jars from the example/lib/ext directory, and the logging config from 
example/resources, and install it in most of the available servlet containers.

Starting with 5.0, the included Jetty is the only officially supported way to 
start Solr, and the war is no longer included in the dist directory in the 
download.

https://wiki.apache.org/solr/WhyNoWar

Thanks,
Shawn



RE: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-29 Thread Dinesh Naik
Hi Eric,
By compressed value I meant the value of a field after removing special characters. 
In my example it's "-": the compressed form of red-apple is redapple.

I wanted to know if we can see the analyzed version of fields.

For example, if I use ngram on a field, how do I see the analyzed values in 
the index?


 

-Original Message-
From: Erick Erickson erickerick...@gmail.com
Sent: ‎29-‎06-‎2015 18:12
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Subject: Re: Reading indexed data from solr 5.1.0 using admin/luke?

Not quite sure what you mean by compressed values. admin/luke
doesn't show the results of the compression of the stored values, there's
no way I know of to do that.

Best,
Erick

On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik dineshkumarn...@gmail.com wrote:
 Hi all,

 Is there a way to read the indexed data for a field on which
 analysis/processing has been done?

 I know that using the admin GUI we can see field-wise analysis, but how can I
 get hold of the complete document using admin/luke, or any other way?

 For example, say I have 2 fields called name and compressedname.

 name has values like apple, green-apple, red-apple
 compressedname has values like apple, greenapple, redapple

 Even though I make both these fields indexed=true and stored=true,
 I am not able to see the compressed values using admin/luke?id=mydocid.

 In the response I see something like this:


 <lst name="name">
   <str name="type">string</str>
   <str name="schema">ITS--</str>
   <str name="flags">ITS--</str>
   <str name="value">GREEN-APPLE</str>
   <str name="internal">GREEN-APPLE</str>
   <float name="boost">1.0</float>
   <int name="docFreq">0</int>
 </lst>
 <lst name="compressedname">
   <str name="type">string</str>
   <str name="schema">ITS--</str>
   <str name="flags">ITS--</str>
   <str name="value">GREEN-APPLE</str>
   <str name="internal">GREEN-APPLE</str>
   <float name="boost">1.0</float>
   <int name="docFreq">0</int>
 </lst>



 --
 Best Regards,
 Dinesh Naik


Re: optimize status

2015-06-29 Thread Steven White
Thank you guys, this was very helpful.  I was always under the impression
that an index needs to be optimized periodically to reclaim disk space,
otherwise the index will just keep on growing and growing (was that the
case in Lucene 2.x and prior days?).

I agree with Walter: renaming optimize to something else, even “force
merge”, is better.  However, make sure it has the proper documentation
explaining what it does and why it's not worth it for live data.

Steve

On Mon, Jun 29, 2015 at 12:37 PM, Reitzel, Charles 
charles.reit...@tiaa-cref.org wrote:

 Is there really a good reason to consolidate down to a single segment?

  Any incremental query performance benefit is tiny compared to the loss of
  manageability.

 I.e. shouldn't segments _always_ be kept small enough to facilitate
 re-balancing data across shards?   Even in non-cloud instances this is
  true.  When a collection grows, you may want to shard/split an existing index
 by adding a node and moving some segments around.Isn't this the
 direction Solr is going?   With many, smaller segments, this is feasible.
 With one big segment, the collection must always be reindexed.

  Thus, "optimize" would mean "get rid of all deleted records" and would,
  in fact, optimize queries by eliminating wasted I/O.   Perhaps worth it for
 slowly changing indexes.   Seems like the Tiered merge policy is 90% there
 ...Or am I all wet (again)?

 -Original Message-
 From: Walter Underwood [mailto:wun...@wunderwood.org]
 Sent: Monday, June 29, 2015 10:39 AM
 To: solr-user@lucene.apache.org
 Subject: Re: optimize status

  "Optimize" is a manual full merge.

 Solr automatically merges segments as needed. This also expunges deleted
 documents.

  We really need to rename "optimize" to "force merge". Is there a Jira for
  that?

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)

 On Jun 29, 2015, at 5:15 AM, Steven White swhite4...@gmail.com wrote:

  Hi Upayavira,
 
   This is news to me that we should not optimize an index.
 
   What about disk space savings: isn't optimization needed to reclaim disk space,
   or does Solr somehow do that?  Where can I read more about this?
 
  I'm on Solr 5.1.0 (may switch to 5.2.1)
 
  Thanks
 
  Steve
 
  On Mon, Jun 29, 2015 at 4:16 AM, Upayavira u...@odoko.co.uk wrote:
 
  I'm afraid I don't understand. You're saying that optimising is
  causing performance issues?
 
  Simple solution: DO NOT OPTIMIZE!
 
   Optimisation is very badly named. What it does is squash all
  segments in your index into one segment, removing all deleted
  documents. It is good to get rid of deletes - in that sense the index
 is optimized.
  However, future merges become very expensive. The best way to handle
  this topic is to leave it to Lucene/Solr to do it for you. Pretend
  the optimize option never existed.
 
  This is, of course, assuming you are using something like Solr 3.5+.
 
  Upayavira
 
  On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
 
   Have to because of performance issues.
  Just want to know if there is a way to tap into the status.
 
  On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
 
  Bigger question, why are you optimizing? Since 3.6 or so, it
   generally hasn't been required; it can even be a bad thing.
 
  Upayavira
 
  On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
  Hi All,
 
  I have two indexers (Independent processes ) writing to a common
  solr core.
   If one indexer process issues an optimize on the core, I want the
   second indexer to hold off adding docs until the optimize has
   finished.
 
  Are there ways I can do this programmatically?
  pinging the core when the optimize is happening is returning OK
  because
  technically
  solr allows you to update when an optimize is happening.
 
  any suggestions ?
 
  thanks,
  Summer
 






solr suggester build issues

2015-06-29 Thread Rajesh Hazari
Solr : 4.9.x , with simple solr cloud on jetty.
JDK 1.7
num of replica : 4 , one replica for each shard
num of shards : 1

Hi All,

I have been facing the below issues with the solr suggester introduced in 4.7.x.
Does anyone have a good working solution or ...

The buildOnCommit=true property is recommended against for an index with more
frequent softcommits, as noted in the documentation:
   https://cwiki.apache.org/confluence/display/solr/Suggester
So we have disabled this (buildOnCommit=false) and started using
buildOnOptimize=true, which was not helping us to have the latest document
suggestions (with frequent softcommits),
as there was hardly one optimize each day (we have the default optimize
settings in solrconfig).
So we have disabled buildOnOptimize (buildOnOptimize=false).

As suggested in the documentation, as of now, we came up with cron jobs to
build the suggester every hour.
These jobs are doing their job, i.e., we have the latest suggestions
available every hour; below are the issues that we have with this implementation.

*Issue#1*: The suggest build URL, i.e.,
*http://$solrnode:8983/solr/collection1/suggest?suggest.build=true*, if
issued to one replica of the solr cloud, does not build the suggesters on all
of the replicas in the cloud.
Resolution: For this we have separate cron jobs on each of the
solr instances making the build call to build the suggester; below is a
raw pictorial representation of this impl
(which is not the best implementation and has many flaws):

http://$solrnode:8983/solr/collection1/suggest?suggest.build=true
   |
   |-- suggestcron.job.sh (on solr1.aws.instance)

http://$solrnode:8983/solr/collection1/suggest?suggest.build=true
   |
   |-- suggestcron.job.sh (on solr2.aws.instance)
.. similar for other solr nodes
(We will be coming up with a single script to do this for all collections later.)
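
The single script will probably end up as a small SolrJ loop like this (a
rough sketch for our 4.9 setup; the node URLs are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class BuildSuggesters {
    public static void main(String[] args) throws Exception {
        // Placeholder node list: one entry per replica's core base URL.
        String[] nodes = {
            "http://solr1.aws.instance:8983/solr/collection1",
            "http://solr2.aws.instance:8983/solr/collection1",
            "http://solr3.aws.instance:8983/solr/collection1",
            "http://solr4.aws.instance:8983/solr/collection1"
        };
        for (String node : nodes) {
            HttpSolrServer server = new HttpSolrServer(node);
            SolrQuery q = new SolrQuery();
            q.setRequestHandler("/suggest");  // route to the suggest handler, not /select
            q.set("suggest.build", "true");
            q.set("distrib", "false");        // build only on the node we address
            // add suggest.dictionary=... if the handler defines no default dictionary
            server.query(q);
            server.shutdown();
        }
    }
}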

We were a bit happy that we had an updated suggester on all of the
instances, *which is not the case!*

*Issue#2: the suggesters built on the solr nodes were not consistent, as
the solr core in each solr replica has differences in max-docs and
num-docs*
*(which is quite normal with frequent softcommits, when updates mostly
have the same documents updated with different data, I guess; correct me
if I'm wrong)*

When we query: curl -i
"http://$solrnode:8983/solr/liveaodfuture/suggest?q=Nirvana&wt=json&indent=true"

one of the solr nodes returns:

{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "suggest":{
    "AnalyzingSuggester":{
      "Nirvana":{
        "numFound":1,
        "suggestions":[{
            "term":"nirvana",
            "weight":6,
            "payload":""}]}},
    "DictionarySuggester":{
      "Nirvana":{
        "numFound":0,
        "suggestions":[]

The /admin/luke/collection/ call status:

"index":{
    "numDocs":90564,
    "maxDoc":94583,
    "deletedDocs":4019,
    ...}


while the other 3 solr nodes return:

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "suggest":{
    "AnalyzingSuggester":{
      "Nirvana":{
        "numFound":2,
        "suggestions":[{
            "term":"nirvana",
            "weight":163,
            "payload":""},
          *{*
            *"term":"nirvana cover",*
            *"weight":11,*
            *"payload":""}]}},*
    "DictionarySuggester":{
      "Nirvana":{
        "numFound":0,
        "suggestions":[]

The /admin/luke/collection/ call status on the other 3 solr nodes, which have a
different maxDoc than the above solr node:

"index":{
    "numDocs":90564,
    "maxDoc":156760,
    ...}

When I check the build time for the suggest directory of the collection, each
solr node has the same time:

ls -lah /mnt/solrdrive/solr/cores/*/data/suggest_analyzing/*
-rw-r--r-- 1 root root 3.0M May 20 16:00
/mnt/solrdrive/solr/cores/collection1_shard1_replica3/data/suggest_analyzing/wfsta.bin

Questions:
   Does the suggester build url, i.e.,
*http://$solrnode:8983/solr/collection1/suggest?suggest.build=true*,
consider maxdocs or deleted docs also?
   Is the suggester build from
*solr/collection1/suggest?suggest.build=true*
different from the buildOnCommit=true property?
   Does anyone have a better solution to keep the suggester current
with the contents of the index with more frequent softcommits?

   Does solr have any component, like a cron scheduler,
to schedule the suggest build and the optimize on a daily basis?


*Thanks,*
*Rajesh**.*


RE: optimize status

2015-06-29 Thread Garth Grimm
 Is there really a good reason to consolidate down to a single segment?

Archiving (as one example).  Come July 1, the collection for log
entries/transactions in June will never be changed, so optimizing is
actually a good thing to do.

Kind of getting away from OP's question on this, but I don't think the
ability to move data between shards in SolrCloud (such as shard splitting)
has much to do with the Lucene segments under the hood.  I'm just guessing,
but I'd think the main issue with shard splitting would be to ensure that
document route ranges are handled properly, and I don't think the value used
for routing has anything to do with what segment they happen to be stored
into.

-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] 
Sent: Monday, June 29, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: RE: optimize status

Is there really a good reason to consolidate down to a single segment?

Any incremental query performance benefit is tiny compared to the loss of
manageability.   

I.e. shouldn't segments _always_ be kept small enough to facilitate
re-balancing data across shards?   Even in non-cloud instances this is true.
When a collection grows, you may want to shard/split an existing index by
adding a node and moving some segments around.Isn't this the direction
Solr is going?   With many, smaller segments, this is feasible.  With one
big segment, the collection must always be reindexed.

Thus, "optimize" would mean "get rid of all deleted records" and would, in
fact, optimize queries by eliminating wasted I/O.   Perhaps worth it for
slowly changing indexes.   Seems like the Tiered merge policy is 90% there
...Or am I all wet (again)?

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Monday, June 29, 2015 10:39 AM
To: solr-user@lucene.apache.org
Subject: Re: optimize status

"Optimize" is a manual full merge.

Solr automatically merges segments as needed. This also expunges deleted
documents.

We really need to rename "optimize" to "force merge". Is there a Jira for
that?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Jun 29, 2015, at 5:15 AM, Steven White swhite4...@gmail.com wrote:

 Hi Upayavira,
 
 This is news to me that we should not optimize an index.
 
 What about disk space savings: isn't optimization needed to reclaim disk space, 
 or does Solr somehow do that?  Where can I read more about this?
 
 I'm on Solr 5.1.0 (may switch to 5.2.1)
 
 Thanks
 
 Steve
 
 On Mon, Jun 29, 2015 at 4:16 AM, Upayavira u...@odoko.co.uk wrote:
 
 I'm afraid I don't understand. You're saying that optimising is 
 causing performance issues?
 
 Simple solution: DO NOT OPTIMIZE!
 
 Optimisation is very badly named. What it does is squash all 
 segments in your index into one segment, removing all deleted 
 documents. It is good to get rid of deletes - in that sense the index is
optimized.
 However, future merges become very expensive. The best way to handle 
 this topic is to leave it to Lucene/Solr to do it for you. Pretend 
 the optimize option never existed.
 
 This is, of course, assuming you are using something like Solr 3.5+.
 
 Upayavira
 
 On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
 
  Have to because of performance issues.
 Just want to know if there is a way to tap into the status.
 
 On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
 
 Bigger question, why are you optimizing? Since 3.6 or so, it 
  generally hasn't been required; it can even be a bad thing.
 
 Upayavira
 
 On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
 Hi All,
 
 I have two indexers (Independent processes ) writing to a common 
 solr core.
  If one indexer process issues an optimize on the core, I want the 
  second indexer to hold off adding docs until the optimize has 
  finished.
 
 Are there ways I can do this programmatically?
 pinging the core when the optimize is happening is returning OK
 because
 technically
 solr allows you to update when an optimize is happening.
 
 any suggestions ?
 
 thanks,
 Summer
 






Re: optimize status

2015-06-29 Thread Toke Eskildsen
Reitzel, Charles charles.reit...@tiaa-cref.org wrote:
 Question, Toke: in your immutable cases, don't the benefits of
 optimizing come mostly from eliminating deleted records?

Not for us. We have about 1 deleted document for every 1000 or 10.000 standard 
documents.

 Is there any material difference in heap, CPU, etc. between 1, 5 or 10 
 segments?
 I.e. at how many segments/shard do you see a noticeable performance hit?

It really is either 1 or more than 1 segment, coupled with 0 deleted records or 
more than 0.

Having 1 segment means that String faceting benefits from not having to map 
between segment ordinals and global ordinals. That's a speed increase (just a 
null check instead of a memory lookup) as well as a heap requirement reduction: 
We save 2GB+ heap per shard on that account (our current heap size is 8GB). 
Granted, we facet on 600M values for one of the fields, which I don't think is 
very common.

0 deleted records is related as the usual bitmap of deleted documents is null, 
meaning faster checks.

Most of the performance benefit probably comes from the freed memory. We have 
25 shards/machine, so sparing 2GB gives us an extra 50GB of disk cache. The 
performance increase for that is 20-40%, guesstimated from some previous tests 
where we varied the disk cache size.


I doubt that there is much difference between 2, 5, 10 or even 20 segments. The 
persons at UKWA are running some tests on different degrees of optimization of 
their 30 shard TB-class index. You'll have to dig a bit, but there might be 
relevant results: https://github.com/ukwa/shine/tree/master/python/test-logs

 Also, I curious if you have experimented much with the maxMergedSegmentMB
 and reclaimDeletesWeight  properties of the TieredMergePolicy?

I have zero experience with that: We build the shards one at a time and don't 
touch them after that. 90% of our building power goes to Tika analysis, so 
there hasn't been an apparent need for tuning Solr's indexing.

- Toke Eskildsen


Re: optimize status

2015-06-29 Thread Toke Eskildsen
Reitzel, Charles charles.reit...@tiaa-cref.org wrote:
 Is there really a good reason to consolidate down to a single segment?

In the scenario spawning this thread it does not seem to be the best choice. 
Speaking more broadly, there are Solr setups out there that deal with immutable 
data, often tied to a point in time, e.g. log data. We have such a setup 
(harvested web resources) and are able to lower heap requirements significantly 
and increase speed by building fully optimized and immutable shards.

 Any incremental query performance benefit is tiny compared to the loss of 
 managability.

True in many cases, and I agree that the Optimize wording is a bit of a trap. 
While technically correct, it implies that one should do it occasionally to 
keep any index fit. A different wording, and maybe a tooltip saying something 
like "Only recommended for non-changing indexes", might be better.

Turning it around: To minimize the risk of occasional performance-degrading 
large merges, one might want an index where all the shards are below a certain 
size. Splitting larger shards into smaller ones would in that case also be an 
optimization, just towards a different goal.

- Toke Eskildsen


Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-29 Thread Upayavira
Use the schema browser on the admin UI, and click the "load term info"
button. It'll show you the terms in your index.

You can also use the analysis tab which will show you how it would
tokenise stuff for a specific field.
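
If you want the indexed terms programmatically rather than in the UI, the
terms component gives the same view -- a minimal SolrJ sketch (assuming a
/terms handler is configured, as in the example solrconfig; the core URL is a
placeholder and the field name is from this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class DumpTerms {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery();
        q.setRequestHandler("/terms");
        q.set("terms", "true");
        q.set("terms.fl", "compressedname"); // the analyzed/indexed terms, not the stored value
        q.set("terms.limit", "20");
        QueryResponse rsp = client.query(q);
        TermsResponse terms = rsp.getTermsResponse();
        for (TermsResponse.Term t : terms.getTerms("compressedname")) {
            System.out.println(t.getTerm() + " (docFreq=" + t.getFrequency() + ")");
        }
        client.close();
    }
}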

Upayavira

On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
 Hi Eric,
 By compressed value I meant the value of a field after removing special
 characters. In my example it's "-": the compressed form of red-apple is
 redapple.
 
 I wanted to know if we can see the analyzed version of fields.
 
 For example, if I use ngram on a field, how do I see the analyzed values
 in the index?
 
 
  
 
 -Original Message-
 From: Erick Erickson erickerick...@gmail.com
 Sent: ‎29-‎06-‎2015 18:12
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Subject: Re: Reading indexed data from solr 5.1.0 using admin/luke?
 
 Not quite sure what you mean by compressed values. admin/luke
 doesn't show the results of the compression of the stored values, there's
 no way I know of to do that.
 
 Best,
 Erick
 
 On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik dineshkumarn...@gmail.com
 wrote:
  Hi all,
 
  Is there a way to read the indexed data for a field on which
  analysis/processing has been done?
 
  I know that using the admin GUI we can see field-wise analysis, but how can
  I get hold of the complete document using admin/luke, or any other way?
 
  For example, say I have 2 fields called name and compressedname.
 
  name has values like apple, green-apple, red-apple
  compressedname has values like apple, greenapple, redapple
 
  Even though I make both these fields indexed=true and stored=true,
  I am not able to see the compressed values using admin/luke?id=mydocid.
 
  In the response I see something like this:
 
 
  <lst name="name">
    <str name="type">string</str>
    <str name="schema">ITS--</str>
    <str name="flags">ITS--</str>
    <str name="value">GREEN-APPLE</str>
    <str name="internal">GREEN-APPLE</str>
    <float name="boost">1.0</float>
    <int name="docFreq">0</int>
  </lst>
  <lst name="compressedname">
    <str name="type">string</str>
    <str name="schema">ITS--</str>
    <str name="flags">ITS--</str>
    <str name="value">GREEN-APPLE</str>
    <str name="internal">GREEN-APPLE</str>
    <float name="boost">1.0</float>
    <int name="docFreq">0</int>
  </lst>
 
 
 
  --
  Best Regards,
  Dinesh Naik


Re: Questions regarding autosuggest (Solr 5.2.1)

2015-06-29 Thread Erick Erickson
Try not putting it in double quotes?

Best,
Erick

On Mon, Jun 29, 2015 at 12:22 PM, Thomas Michael Engelke
thomas.enge...@posteo.de wrote:


  A friend and I are trying to develop some software using Solr in the
 background, and with that comes a lot of changes. We're used to older
 versions (4.3 and below). We especially have problems with the
 autosuggest feature.

 This is the field definition (schema.xml) for our autosuggest field:

 <field name="autosuggest" type="autosuggest" indexed="true"
 stored="true" required="false" multiValued="true" />
 ...
 <copyField source="name" dest="autosuggest" />
 ...
 <fieldType name="autosuggest" class="solr.TextField"
 positionIncrementGap="100">
  <analyzer type="index">
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0"
 splitOnNumerics="1" generateWordParts="1" generateNumberParts="1"
 catenateWords="1" catenateNumbers="0" catenateAll="0"
 preserveOriginal="0"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.StopFilterFactory" words="stopwords.txt"
 ignoreCase="true" enablePositionIncrements="true" format="snowball"/>
   <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
 dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3"
 maxSubwordSize="30" onlyLongestMatch="false"/>
   <filter class="solr.GermanNormalizationFilterFactory"/>
   <filter class="solr.SnowballPorterFilterFactory" language="German2"
 protected="protwords.txt"/>
   <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
 maxGramSize="30"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0"
 splitOnNumerics="1" generateWordParts="1" generateNumberParts="1"
 catenateWords="1" catenateNumbers="0" catenateAll="0"
 preserveOriginal="0"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.StopFilterFactory" words="stopwords.txt"
 ignoreCase="true" enablePositionIncrements="true" format="snowball"/>
   <filter class="solr.GermanNormalizationFilterFactory"/>
   <filter class="solr.SnowballPorterFilterFactory" language="German2"
 protected="protwords.txt"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
 </fieldType>

 Afterwards, we defined an autosuggest component to use this field, like
 this (solrconfig.xml):

 <searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
   <str name="name">mySuggester</str>
   <str name="lookupImpl">FuzzyLookupFactory</str>
   <str name="storeDir">suggester_fuzzy_dir</str>
   <str name="dictionaryImpl">DocumentDictionaryFactory</str>
   <str name="field">suggest</str>
   <str name="suggestAnalyzerFieldType">autosuggest</str>
   <str name="buildOnStartup">false</str>
   <str name="buildOnCommit">false</str>
  </lst>
 </searchComponent>

 And add a requesthandler to test out the functionality:

 <requestHandler name="/suggesthandler" class="solr.SearchHandler"
 startup="lazy">
  <lst name="defaults">
   <str name="suggest">true</str>
   <str name="suggest.count">10</str>
   <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
   <str>suggest</str>
  </arr>
 </requestHandler>

 However, when trying to start the core that has this configuration, a long
 exception occurs, telling us this:

 "Error in configuration: autosuggest is not defined in the schema"

 Now, that seems to be wrong. Any idea how to fix that?


Re: Correcting text at index time

2015-06-29 Thread Walter Underwood
Yes, do this in an update request processor before it gets to the analyzer 
chain.
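
A minimal sketch of such a processor (the factory class, the field name, and
the cst./customer mapping are illustrative, not an existing Solr class; wire
it into an updateRequestProcessorChain in solrconfig.xml):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ExpandAbbreviationsFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                Object v = doc.getFieldValue("body");  // field name is a placeholder
                if (v instanceof String) {
                    // Rewrites the value that gets stored AND analyzed.
                    doc.setField("body", ((String) v).replace("cst.", "customer"));
                }
                super.processAdd(cmd);  // hand off to the rest of the chain
            }
        };
    }
}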

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Jun 29, 2015, at 3:19 PM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, very hard to do currently. The _point_ of stored fields is that
 an exact, verbatim
 copy of the input is returned in fl lists and this is violating that
 promise. I suppose some
  kind of custom update processor could work, but it's really roll-your-own
  functionality, I think.
 
 Best,
 Erick
 
 On Mon, Jun 29, 2015 at 8:38 AM, hossmaa andreea.hossm...@gmail.com wrote:
 Hi Markus
 
 Thanks for the reply. I'm already using the Synonyms filter and it is
 working fine (i.e., when I search for customer, it also returns documents
 containing cst.).
 What the synonyms filter does not do is to actually replace the word cst.
 with customer in the document.
 
 Just to be clearer: in the returned results, I do not want to see the word
 cst. any more (it should be permanently replaced with customer). I want
 to only see the expanded form.
 
 Cheers
 A.
 
 
 



Re: optimize status

2015-06-29 Thread Upayavira
For the sake of history, somewhere around Solr/Lucene 3.2 a new
MergePolicy was introduced. The old one merged simply based upon age,
or index generation, meaning the older the segment, the less likely it
would get merged, hence needing optimize to clear out deletes from your
older segments.

The new MergePolicy, the TieredMergePolicy, uses a more intelligent
algorithm to decide which segments to merge, and is the single reason
why optimization isn't recommended anymore. According to the javadocs:

For normal merging, this policy first computes a budget of how many
segments are allowed to be in the index. If the index is over-budget,
then the policy sorts segments by decreasing size (pro-rating by percent
deletes), and then finds the least-cost merge. Merge cost is measured by
a combination of the skew of the merge (size of largest segment
divided by smallest segment), total merge size and percent deletes
reclaimed, so that merges with lower skew, smaller size and those
reclaiming more deletes, are favored.

If a merge will produce a segment that's larger than
setMaxMergedSegmentMB(double), then the policy will merge fewer segments
(down to 1 at once, if that one has deletions) to keep the segment size
under budget.

Upayavira


On Mon, Jun 29, 2015, at 08:55 PM, Toke Eskildsen wrote:
 Reitzel, Charles charles.reit...@tiaa-cref.org wrote:
  Is there really a good reason to consolidate down to a single segment?
 
 In the scenario spawning this thread it does not seem to be the best
 choice. Speaking more broadly, there are Solr setups out there that deal
 with immutable data, often tied to a point in time, e.g. log data. We
 have such a setup (harvested web resources) and are able to lower heap
 requirements significantly and increase speed by building fully optimized
 and immutable shards.
 
  Any incremental query performance benefit is tiny compared to the loss of 
  managability.
 
 True in many cases and I agree that the Optimize-wording is a bit of a
 trap. While technically correct, it implies that one should do it
 occasionally to keep any index fit. A different wording and maybe a
 tooltip saying something like Only recommended for non-changing indexes
 might be better.
 
 Turning it around: To minimize the risk of occasional
 performance-degrading large merges, one might want an index where all the
 shards are below a certain size. Splitting larger shards into smaller
 ones would in that case also be an optimization, just towards a different
 goal.
 
 - Toke Eskildsen


Re: Correcting text at index time

2015-06-29 Thread Erick Erickson
Hmmm, very hard to do currently. The _point_ of stored fields is that
an exact, verbatim
copy of the input is returned in fl lists and this is violating that
promise. I suppose some
kind of custom update processor could work, but it's really roll-your-own
functionality, I think.

Best,
Erick

On Mon, Jun 29, 2015 at 8:38 AM, hossmaa andreea.hossm...@gmail.com wrote:
 Hi Markus

 Thanks for the reply. I'm already using the Synonyms filter and it is
 working fine (i.e., when I search for customer, it also returns documents
 containing cst.).
 What the synonyms filter does not do is to actually replace the word cst.
 with customer in the document.

 Just to be clearer: in the returned results, I do not want to see the word
 cst. any more (it should be permanently replaced with customer). I want
 to only see the expanded form.

 Cheers
 A.





Re: Correcting text at index time

2015-06-29 Thread Jack Krupansky
The regex replace processor can be used to do this:
https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html
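
A minimal sketch of wiring it into an update chain (the chain name, field
name, and pattern here are just examples, untested):

<updateRequestProcessorChain name="fix-abbreviations">
  <!-- rewrite "cst." to "customer" in the content field before storage -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <str name="pattern">\bcst\.</str>
    <str name="replacement">customer</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Reference the chain from your update handler, or pass
update.chain=fix-abbreviations on the update request. Because it runs
before the document is analyzed and stored, both the indexed terms and the
stored value you get back in results contain the expanded form.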


-- Jack Krupansky

On Mon, Jun 29, 2015 at 6:20 PM, Walter Underwood wun...@wunderwood.org
wrote:

 Yes, do this in an update request processor before it gets to the analyzer
 chain.

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)

 On Jun 29, 2015, at 3:19 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  Hmmm, very hard to do currently. The _point_ of stored fields is that
  an exact, verbatim
  copy of the input is returned in fl lists and this is violating that
  promise. I suppose some
  kind of custom update processor could work, but it's really roll-your-own
  functionality, I think.
 
  Best,
  Erick
 
  On Mon, Jun 29, 2015 at 8:38 AM, hossmaa andreea.hossm...@gmail.com
 wrote:
  Hi Markus
 
  Thanks for the reply. I'm already using the Synonyms filter and it is
  working fine (i.e., when I search for customer, it also returns
 documents
  containing cst.).
  What the synonyms filter does not do is to actually replace the word
 cst.
  with customer in the document.
 
  Just to be clearer: in the returned results, I do not want to see the
 word
  cst. any more (it should be permanently replaced with customer). I
 want
  to only see the expanded form.
 
  Cheers
  A.
 
 
 




RE: optimize status

2015-06-29 Thread Reitzel, Charles
Question, Toke: in your immutable cases, don't the benefits of optimizing 
come mostly from eliminating deleted records?   Is there any material 
difference in heap, CPU, etc. between 1, 5 or 10 segments?   I.e. at how many 
segments/shard do you see a noticeable performance hit?

Also, I'm curious whether you have experimented much with the maxMergedSegmentMB and 
reclaimDeletesWeight properties of the TieredMergePolicy?

For frequently updated indexes, would setting maxMergedSegmentMB lower (say 512 
or 1024 MB, depending on total index size) and reclaimDeletesWeight higher (say 
2.5?) be a good best practice?
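
For concreteness, the kind of tuning I mean, in solrconfig.xml using the
Solr 5.x syntax (values purely illustrative, not a recommendation):

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- cap merged segment size so data stays easier to move around -->
    <double name="maxMergedSegmentMB">1024.0</double>
    <!-- bias merge selection toward segments carrying many deletes -->
    <double name="reclaimDeletesWeight">2.5</double>
  </mergePolicy>
</indexConfig>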

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Monday, June 29, 2015 3:56 PM
To: solr-user@lucene.apache.org
Subject: Re: optimize status

Reitzel, Charles charles.reit...@tiaa-cref.org wrote:
 Is there really a good reason to consolidate down to a single segment?

In the scenario spawning this thread it does not seem to be the best choice. 
Speaking more broadly, there are Solr setups out there that deal with immutable 
data, often tied to a point in time, e.g. log data. We have such a setup 
(harvested web resources) and are able to lower heap requirements significantly 
and increase speed by building fully optimized and immutable shards.

 Any incremental query performance benefit is tiny compared to the loss of 
 managability.

True in many cases and I agree that the Optimize-wording is a bit of a trap. 
While technically correct, it implies that one should do it occasionally to 
keep any index fit. A different wording and maybe a tooltip saying something 
like Only recommended for non-changing indexes might be better.

Turning it around: To minimize the risk of occasional performance-degrading 
large merges, one might want an index where all the shards are below a certain 
size. Splitting larger shards into smaller ones would in that case also be an 
optimization, just towards a different goal.

- Toke Eskildsen




Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-29 Thread Erick Erickson
You can also use the TermsComponent, which will read the values from the indexed
fields. That gets the raw terms; they aren't grouped.
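
For reference, a minimal sketch of exposing it in solrconfig.xml (the
handler name is arbitrary, and your version may already define one):

<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>

A request like /terms?terms.fl=compressedname&terms.prefix=red then lists
the analyzed tokens actually sitting in that field's index.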

But you don't get the document. Reconstructing the doc from the
postings lists is
actually quite tedious. The Luke program (not request handler) has a
function that
does this, it's not fast though, more for troubleshooting than trying to do
anything in a production environment.

That said, I'm not quite sure what the current state of Luke is...

Best,
Erick

On Mon, Jun 29, 2015 at 5:25 PM, Upayavira u...@odoko.co.uk wrote:
 Use the schema browser on the admin UI, and click the load term info
 button. It'll show you the terms in your index.

 You can also use the analysis tab which will show you how it would
 tokenise stuff for a specific field.

 Upayavira

 On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote:
 Hi Eric,
 By compressed value I meant the value of a field after removing special
 characters. In my example it's -. The compressed form of red-apple is
 redapple.

 I wanted to know if we can see the analyzed version of fields.

 For example, if I use ngram on a field, how do I see the analyzed values
 in the index?




 -Original Message-
 From: Erick Erickson erickerick...@gmail.com
 Sent: ‎29-‎06-‎2015 18:12
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Subject: Re: Reading indexed data from solr 5.1.0 using admin/luke?

 Not quite sure what you mean by compressed values. admin/luke
 doesn't show the results of the compression of the stored values, there's
 no way I know of to do that.

 Best,
 Erick

 On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik dineshkumarn...@gmail.com
 wrote:
  Hi all,
 
  Is there a way to read the indexed data for a field on which
  analysis/processing has been done?
 
  I know that using the admin GUI we can see field-wise analysis, but how can I
  get hold of the complete document using admin/luke? Or any other way?
 
  For example, if I have 2 fields called name and compressedname.
 
  name has values like apple, green-apple,red-apple
  compressedname has values like apple,greenapple,redapple
 
  Even though I make both these fields indexed=true and stored=true
 
  I am not able to see the compressed values using admin/luke?id=mydocid
 
  in response I see something like this:
 
 
  <lst name="name">
    <str name="type">string</str>
    <str name="schema">ITS--</str>
    <str name="flags">ITS--</str>
    <str name="value">GREEN-APPLE</str>
    <str name="internal">GREEN-APPLE</str>
    <float name="boost">1.0</float>
    <int name="docFreq">0</int>
  </lst>
  <lst name="compressedname">
    <str name="type">string</str>
    <str name="schema">ITS--</str>
    <str name="flags">ITS--</str>
    <str name="value">GREEN-APPLE</str>
    <str name="internal">GREEN-APPLE</str>
    <float name="boost">1.0</float>
    <int name="docFreq">0</int>
  </lst>
 
 
 
  --
  Best Regards,
  Dinesh Naik


RE: optimize status

2015-06-29 Thread Reitzel, Charles
Hi Garth,

Yes, I'm straying from OP's question (I think Steve is all set).   But his 
question, quite naturally, comes up often and a similar discussion ensues each 
time.

I take your point about shards and segments being different things.  I 
understand that the hash ranges per segment are not kept in ZK.   I guess I 
wish they were.

In this regard, I liked that MongoDB uses a 2-level sharding scheme. Each shard 
manages a list of chunks, each with its own hash range, which is kept in the 
cluster state. If data needs to be balanced across nodes, it works at the chunk 
level. No record/doc-level I/O is necessary. Much more targeted, and only the 
data that needs to move is touched. Solr does most things better than Mongo, 
IMO, but this is one area where Mongo got it right.

As for your example, what benefit does an application gain by reducing 10 
segments, say, down to 1?   Even if the index never changes?   The gain _might_ 
be measurable, but it will be small compared to performance gains that can be 
had by maintaining a good data balance across nodes.

Your example is based on implicit routing.  So dynamic management of shards is 
less applicable.  I just hope you get similar volumes of data every year.   
Otherwise, some years will perform better than others due to unbalanced data 
distribution!

best,
Charlie


-Original Message-
From: Garth Grimm [mailto:gdgr...@yahoo.com.INVALID] 
Sent: Monday, June 29, 2015 1:15 PM
To: solr-user@lucene.apache.org
Subject: RE: optimize status

 Is there really a good reason to consolidate down to a single segment?

Archiving (as one example).  Come July 1, the collection for log 
entries/transactions in June will never be changed, so optimizing is actually a 
good thing to do.

Kind of getting away from OP's question on this, but I don't think the ability 
to move data between shards in SolrCloud (such as shard splitting) has much to 
do with the Lucene segments under the hood.  I'm just guessing, but I'd think 
the main issue with shard splitting would be to ensure that document route 
ranges are handled properly, and I don't think the value used for routing has 
anything to do with what segment they happen to be stored into.

-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
Sent: Monday, June 29, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: RE: optimize status

Is there really a good reason to consolidate down to a single segment?

Any incremental query performance benefit is tiny compared to the loss of
managability.   

I.e., shouldn't segments _always_ be kept small enough to facilitate
re-balancing data across shards? Even in non-cloud instances this is true.
When a collection grows, you may want to shard/split an existing index by
adding a node and moving some segments around. Isn't this the direction
Solr is going? With many smaller segments, this is feasible. With one
big segment, the collection must always be reindexed.

Thus, optimize would mean get rid of all deleted records, and would, in
fact, optimize queries by eliminating wasted I/O. Perhaps worth it for
slowly changing indexes. Seems like the TieredMergePolicy is 90% there
... or am I all wet (again)?

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Monday, June 29, 2015 10:39 AM
To: solr-user@lucene.apache.org
Subject: Re: optimize status

Optimize is a manual full merge.

Solr automatically merges segments as needed. This also expunges deleted 
documents.

We really need to rename optimize to force merge. Is there a Jira for that?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Jun 29, 2015, at 5:15 AM, Steven White swhite4...@gmail.com wrote:

 Hi Upayavira,
 
  This is news to me that we should not optimize an index.
 
  What about disk space savings? Isn't optimization needed to reclaim disk space, 
  or does Solr somehow do that? Where can I read more about this?
 
 I'm on Solr 5.1.0 (may switch to 5.2.1)
 
 Thanks
 
 Steve
 
 On Mon, Jun 29, 2015 at 4:16 AM, Upayavira u...@odoko.co.uk wrote:
 
 I'm afraid I don't understand. You're saying that optimising is 
 causing performance issues?
 
 Simple solution: DO NOT OPTIMIZE!
 
 Optimisation is very badly named. What it does is squashes all 
 segments in your index into one segment, removing all deleted 
  documents. It is good to get rid of deletes - in that sense the index is
  optimized.
 However, future merges become very expensive. The best way to handle 
 this topic is to leave it to Lucene/Solr to do it for you. Pretend 
 the optimize option never existed.
 
 This is, of course, assuming you are using something like Solr 3.5+.
 
 Upayavira
 
 On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
 
  Have to, because of performance issues.
 Just want to know if there is a way to tap into the status.
 
 On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
 
  Bigger question, why are you optimizing? Since 3.6 or so, it generally
  hasn't been required; it's even a bad thing.

SOLR 5.1.0 DB dataimport handler from orientdb

2015-06-29 Thread Nauman Ramzan
Hi everyone!
I want to import data from OrientDB into Solr 5.1.0.
Here is my configuration:

*data-config.xml*

 <dataConfig>
   <dataSource type="JdbcDataSource"
     driver="com.orientechnologies.orient.jdbc.OrientJdbcDriver"
     url="jdbc:orient:remote:localhost/emallates_combine" user="root"
     password="root" batchSize="-1"/>
   <document>
     <entity name="item" query="select * from sellings"
       deltaQuery="select * from sellings where updatedAt > '${dataimporter.last_index_time}'">
       <field name="id" column="price" />
       <field name="title" column="status" />
     </entity>
   </document>
 </dataConfig>


*JDBC* driver link: http://orientdb.com/download/
and I pasted this driver into *{solr_install_dir}/dist/orientdb-jdbc-2.0.5.jar*

My configuration is not showing any error nor any output.
Here is the Solr log after a full/delta import call.

INFO  - 2015-06-29 12:37:24.894; [   DB]
 org.apache.solr.handler.dataimport.DataImporter; Loading DIH Configuration:
 db-data-config.xml

INFO  - 2015-06-29 12:37:24.899; [   DB]
 org.apache.solr.handler.dataimport.DataImporter; Data Configuration loaded
 successfully

INFO  - 2015-06-29 12:37:24.900; [   DB] org.apache.solr.core.SolrCore;
 [DB] webapp=/solr path=/dataimport
 params={debug=false&optimize=false&indent=true&commit=true&clean=true&wt=json&command=full-import&verbose=false}
 status=0 QTime=7

INFO  - 2015-06-29 12:37:24.902; [   DB]
 org.apache.solr.handler.dataimport.DataImporter; Starting Full Import

*WARN  - 2015-06-29 12:37:24.912; [   DB]
 org.apache.solr.handler.dataimport.SimplePropertiesWriter; Unable to read:
 dataimport.properties*

INFO  - 2015-06-29 12:37:24.914; [   DB] org.apache.solr.core.SolrCore;
 [DB] webapp=/solr path=/dataimport
 params={indent=true&wt=json&command=status&_=1435567044911} status=0
 QTime=1

INFO  - 2015-06-29 12:37:24.942; [   DB]
 org.apache.solr.handler.dataimport.JdbcDataSource$1; Creating a connection
 for entity item with URL: jdbc:orient:remote:localhost/emallates_combine

INFO  - 2015-06-29 12:37:24.942; [   DB]
 org.apache.solr.update.processor.LogUpdateProcessor; [DB] webapp=/solr
 path=/dataimport
 params={debug=false&optimize=false&indent=true&commit=true&clean=true&wt=json&command=full-import&verbose=false}
 status=0 QTime=7 {deleteByQuery=*:* (-1505301149686693888)} 0 49

INFO  - 2015-06-29 12:37:32.992; [   DB] org.apache.solr.core.SolrCore;
 [DB] webapp=/solr path=/dataimport
 params={wt=json&command=abort&_=1435567052987} status=0 QTime=1

INFO  - 2015-06-29 12:37:33.000; [   DB] org.apache.solr.core.SolrCore;
 [DB] webapp=/solr path=/dataimport
 params={indent=true&wt=json&command=status&_=1435567052997} status=0
 QTime=0


Solr is not importing any data. What am I doing wrong?
And second, why am I getting the above warning?

Thank you
Regards
Nauman Ramzan


Re: optimize status

2015-06-29 Thread Upayavira
I'm afraid I don't understand. You're saying that optimising is causing
performance issues?

Simple solution: DO NOT OPTIMIZE!

Optimisation is very badly named. What it does is squashes all segments
in your index into one segment, removing all deleted documents. It is
good to get rid of deletes - in that sense the index is optimized.
However, future merges become very expensive. The best way to handle
this topic is to leave it to Lucene/Solr to do it for you. Pretend the
optimize option never existed.

This is, of course, assuming you are using something like Solr 3.5+.

Upayavira

On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
 
  Have to, because of performance issues.
 Just want to know if there is a way to tap into the status. 
 
  On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
  
  Bigger question, why are you optimizing? Since 3.6 or so, it generally
  hasn't been required; it's even a bad thing.
  
  Upayavira
  
  On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
  Hi All,
  
  I have two indexers (Independent processes ) writing to a common solr
  core.
  If One indexer process issued an optimize on the core 
  I want the second indexer to wait adding docs until the optimize has
  finished.
  
  Are there ways I can do this programmatically?
  pinging the core when the optimize is happening is returning OK because
  technically
  solr allows you to update when an optimize is happening. 
  
  any suggestions ?
  
  thanks,
  Summer


Re: need advice on parent child multiple category

2015-06-29 Thread Darniz
hello

any advice please





Re: optimize status

2015-06-29 Thread Summer Shire

Have to, because of performance issues.
Just want to know if there is a way to tap into the status. 

 On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
 
 Bigger question, why are you optimizing? Since 3.6 or so, it generally
  hasn't been required; it's even a bad thing.
 
 Upayavira
 
 On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
 Hi All,
 
 I have two indexers (Independent processes ) writing to a common solr
 core.
 If One indexer process issued an optimize on the core 
 I want the second indexer to wait adding docs until the optimize has
 finished.
 
 Are there ways I can do this programmatically?
 pinging the core when the optimize is happening is returning OK because
 technically
 solr allows you to update when an optimize is happening. 
 
 any suggestions ?
 
 thanks,
 Summer


Re: need advice on parent child multiple category

2015-06-29 Thread Mikhail Khludnev
http://wiki.apache.org/solr/HierarchicalFaceting
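
One common approach from that page, sketched here (the type name and
delimiter are just examples):

<fieldType name="category_path" class="solr.TextField">
  <analyzer type="index">
    <!-- emits one token per level: a/b/c -> a, a/b, a/b/c -->
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

Index each document's category as a path (e.g. electronics/phones), then
facet on that field to get counts at every level of the hierarchy.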

On Mon, Jun 29, 2015 at 11:27 AM, Darniz rnizamud...@edmunds.com wrote:

 hello

 any advice please







-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: optimize status

2015-06-29 Thread Upayavira
Bigger question, why are you optimizing? Since 3.6 or so, it generally
hasn't been required; it's even a bad thing.

Upayavira

On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
 Hi All,
 
 I have two indexers (Independent processes ) writing to a common solr
 core.
 If One indexer process issued an optimize on the core 
 I want the second indexer to wait adding docs until the optimize has
 finished.
 
 Are there ways I can do this programmatically?
 pinging the core when the optimize is happening is returning OK because
 technically
 solr allows you to update when an optimize is happening. 
 
 any suggestions ?
 
 thanks,
 Summer


RE: optimize status

2015-06-29 Thread Reitzel, Charles
I see what you mean.   Many thanks for the details.   

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Monday, June 29, 2015 6:36 PM
To: solr-user@lucene.apache.org
Subject: Re: optimize status

Reitzel, Charles charles.reit...@tiaa-cref.org wrote:
 Question, Toke: in your immutable cases, don't the benefits of 
 optimizing come mostly from eliminating deleted records?

Not for us. We have about 1 deleted document for every 1,000 or 10,000 standard 
documents.

 Is there any material difference in heap, CPU, etc. between 1, 5 or 10 
 segments?
 I.e. at how many segments/shard do you see a noticeable performance hit?

It really is either 1 or more than 1 segment, coupled with 0 deleted records or 
more than 0.

Having 1 segment means that String faceting benefits from not having to map 
between segment ordinals and global ordinals. That's a speed increase (just a 
null check instead of a memory lookup) as well as a heap requirement reduction: 
We save 2GB+ heap per shard on that account (our current heap size is 8GB). 
Granted, we facet on 600M values for one of the fields, which I don't think is 
very common.

0 deleted records is related as the usual bitmap of deleted documents is null, 
meaning faster checks.

Most of the performance benefit probably comes from the freed memory. We have 
25 shards/machine, so sparing 2GB gives us an extra 50GB of disk cache. The 
performance increase for that is 20-40%, guesstimated from some previous tests 
where we varied the disk cache size.


I doubt that there is much difference between 2, 5, 10 or even 20 segments. The 
persons at UKWA are running some tests on different degrees of optimization of 
their 30 shard TB-class index. You'll have to dig a bit, but there might be 
relevant results: https://github.com/ukwa/shine/tree/master/python/test-logs

 Also, I curious if you have experimented much with the 
 maxMergedSegmentMB and reclaimDeletesWeight  properties of the 
 TieredMergePolicy?

I have zero experience with that: We build the shards one at a time and don't 
touch them after that. 90% of our building power goes to Tika analysis, so 
there hasn't been an apparent need for tuning Solr's indexing.

- Toke Eskildsen




Re: optimize status

2015-06-29 Thread Summer Shire
Hi Upayavira and Erick,

There are two things we are talking about here.

First: Why am I optimizing? If I don't, our SEARCH (NOT INDEXING) performance is
100% worse.
The problem lies in the total number of segments. We have to keep max segments at
1 or 2.
I have done intensive performance-related tests around the number of segments,
merge factor, and changing the merge policy.
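
(For concreteness, the optimize I issue is just the XML update command with
maxSegments, something like

<optimize maxSegments="2" waitSearcher="true"/>

POSTed to the core's /update handler.)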

Second: Solr does not perform better for me without an optimize. So, given that I
have to optimize, the second issue is updating concurrently during an optimize.
If I update while an optimize is happening, the optimize takes 5 times as long as
a normal optimize.

So is there any way other than creating a postOptimize hook, writing the status
to a file, and somehow making it available to the indexer?
All of this just sounds traumatic :)
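
For what it's worth, the hook itself would be simple enough; a sketch, with
the executable and flag-file path purely illustrative:

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- after an optimize finishes, touch a flag file the indexer can poll -->
  <listener event="postOptimize" class="solr.RunExecutableListener">
    <str name="exe">touch</str>
    <str name="dir">/tmp</str>
    <bool name="wait">true</bool>
    <arr name="args"><str>/tmp/optimize-done.flag</str></arr>
  </listener>
</updateHandler>

The second indexer would then wait for the flag file (and clear it) before
sending updates.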

Thanks
Summer


 On Jun 29, 2015, at 5:40 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 Steven:
 
 Yes, but
 
  First, here's Mike McCandless' excellent blog on segment merging:
 http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
 
 I think the third animation is the TieredMergePolicy. In short, yes an
 optimize will reclaim disk space. But as you update, this is done for
 you anyway. About the only time optimizing is at all beneficial is
 when you have a relatively static index. If you're continually
 updating documents, and by that I mean replacing some existing
 documents, then you'll immediately start generating holes in your
 index.
 
 And if you _do_ optimize, you wind up with a huge segment. And since
 the default policy tries to merge segments of roughly the same size,
  it accumulates deletes for quite a while before they're merged away.
 
 And if you don't update existing docs or delete docs, then there's no
 wasted space anyway.
 
 Summer:
 
 First off, why do you care about not updating during optimizing?
 There's no good reason you have to worry about that, you can freely
 update while optimizing.
 
 But frankly I have to agree with Upayavira that on the face of it
 you're doing a lot of extra work. See above, but you optimize while
 indexing, so immediately you're rather defeating the purpose.
 Personally I'd only optimize relatively static indexes and, by
 definition, you're index isn't static since the second process is just
 waiting to modify it.
 
 Best,
 Erick
 
 On Mon, Jun 29, 2015 at 8:15 AM, Steven White swhite4...@gmail.com wrote:
 Hi Upayavira,
 
  This is news to me that we should not optimize an index.
 
  What about disk space savings? Isn't optimization needed to reclaim disk space,
  or does Solr somehow do that? Where can I read more about this?
 
 I'm on Solr 5.1.0 (may switch to 5.2.1)
 
 Thanks
 
 Steve
 
 On Mon, Jun 29, 2015 at 4:16 AM, Upayavira u...@odoko.co.uk wrote:
 
 I'm afraid I don't understand. You're saying that optimising is causing
 performance issues?
 
 Simple solution: DO NOT OPTIMIZE!
 
 Optimisation is very badly named. What it does is squashes all segments
 in your index into one segment, removing all deleted documents. It is
 good to get rid of deletes - in that sense the index is optimized.
 However, future merges become very expensive. The best way to handle
 this topic is to leave it to Lucene/Solr to do it for you. Pretend the
 optimize option never existed.
 
 This is, of course, assuming you are using something like Solr 3.5+.
 
 Upayavira
 
 On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
 
  Have to, because of performance issues.
 Just want to know if there is a way to tap into the status.
 
 On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
 
 Bigger question, why are you optimizing? Since 3.6 or so, it generally
  hasn't been required; it's even a bad thing.
 
 Upayavira
 
 On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
 Hi All,
 
 I have two indexers (Independent processes ) writing to a common solr
 core.
 If One indexer process issued an optimize on the core
 I want the second indexer to wait adding docs until the optimize has
 finished.
 
 Are there ways I can do this programmatically?
 pinging the core when the optimize is happening is returning OK
 because
 technically
 solr allows you to update when an optimize is happening.
 
 any suggestions ?
 
 thanks,
 Summer