Re: Solr is NoSQL database or not?

2014-03-03 Thread Charlie Hull

On 01/03/2014 23:53, Jack Krupansky wrote:

NoSQL? To me it's just a marketing term, like Big Data.


+1

Depends very much who you talk to. Marketing folks like to ride the 
current wave, so if NoSQL is current, they'll jump on that one, likewise 
Big Data. Technical types like to be correct in their definitions :)


C


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Solr Shard Query From Inside Search Component Sometimes Gives Wrong Results

2014-03-03 Thread Vishnu Mishra
Hi,
  I am using Solr 4.6 and making a Solr query against shards from inside a
Solr search component, then trying to use the obtained results for my custom
logic. I have used SolrJ for the distributed search, but the results
coming back from this distributed search sometimes vary. So my questions
are:

1.  Can we do a distributed search from a Solr search component?
2.  Do we need to handle concurrency between Solr servers by using
synchronization or some other technique?

Is there a way to make a distributed search in a Solr search component and
get the matched documents from all the shards? If anyone has an idea, please
help me.
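
For reference, here is a minimal SolrJ sketch of an explicitly sharded query run
outside any search component; the host names, core name and field are placeholders,
not my actual setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class ShardQueryTest {
    public static void main(String[] args) throws SolrServerException {
        // Point at any one node; the shards parameter fans the query out (hypothetical URLs).
        HttpSolrServer solr = new HttpSolrServer("http://host1:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.set("shards", "host1:8983/solr/collection1,host2:8983/solr/collection1");
        q.setFields("id");                      // fetch only the unique key
        q.setRows(100);
        q.setSort("id", SolrQuery.ORDER.asc);   // deterministic order across repeated runs
        QueryResponse rsp = solr.query(q);
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println(doc.getFieldValue("id"));
        }
    }
}

If results like these differ between runs with no indexing in between, a missing or
non-deterministic sort is a common culprit.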






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Shard-Query-From-Inside-Search-Component-Sometimes-Gives-Wrong-Results-tp4120840.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR cloud disaster recovery

2014-03-03 Thread Jan Van Besien
On Fri, Feb 28, 2014 at 7:50 PM, Per Steffensen st...@designware.dk wrote:
 I might be able to find something for you. Which version are you using - I
 have some scripts that work on 4.0 and some other scripts that work for 4.4
 (and maybe later).

This sounds useful. I am using 4.6.1.

Kind regards
Jan


Re: Slow query time on stemmed fields

2014-03-03 Thread Jens Meiners
Sorry for the delay,

I did not have access to the server and could not query anything.

This is my Query:
http://server:port/solr/core/select?q=keyword1+keyword2&wt=xml&indent=true
&hl.fragsize=120&f.file_URI_tokenized.hl.fragsize=1000&spellcheck=true
&f.file_content.hl.alternateField=spell&hl.simple.pre=%3Cb%3E
&hl.fl=file_URI_tokenized,xmp_title,file_content&hl=true&rows=10
&fl=file_URI,file_URI_tokenized,file_name,file_lastModification,file_lastModification_raw,xmp_creation_date,xmp_title,xmp_content_type,score,file_URI,host,xmp_manual_summary
&hl.snippets=1&hl.useFastVectorHighlighter=true&hl.maxAlternateFieldLength=120
&start=0&q=itdz+berlin&hl.simple.post=%3C/b%3E&fq=file_readright:%22wiki-access%22
&debugQuery=true&defType=edismax
&qf=file_URI_tokenized^10.0+file_content^10.0+xmp_title^5.0+spell^0.001
&pf=file_URI_tokenized~2^1.0+file_content~100^2.0+xmp_title~2^1.0

Newly extended testing showed that the normal QTime without a search on the
spell field is about 713 ms, while it comes in at 70503 ms with the stemmed
spell field included as in the URL above. So it is roughly 100x slower at
the moment.

Here comes the debug:

<lst name="debug">
<str name="rawquerystring">keyword1 keyword2</str>
<str name="querystring">keyword1 keyword2</str>
<str
name="parsedquery">(+((DisjunctionMaxQuery((file_URI_tokenized:keyword1^10.0
| xmp_title:keyword1^5.0 | spell:keyword1^0.0010 |
file_content:keyword1^10.0))
DisjunctionMaxQuery((file_URI_tokenized:keyword2^10.0 |
xmp_title:keyword2^5.0 | spell:keyword2^0.0010 |
file_content:keyword2^10.0)))~2)
DisjunctionMaxQuery((file_URI_tokenized:"keyword1 keyword2"~2))
DisjunctionMaxQuery((file_content:"keyword1 keyword2"~100^2.0))
DisjunctionMaxQuery((xmp_title:"keyword1 keyword2"~2)))/no_coord</str>
<str name="parsedquery_toString">+(((file_URI_tokenized:keyword1^10.0 |
xmp_title:keyword1^5.0 | spell:keyword1^0.0010 |
file_content:keyword1^10.0) (file_URI_tokenized:keyword2^10.0 |
xmp_title:keyword2^5.0 | spell:keyword2^0.0010 |
file_content:keyword2^10.0))~2) (file_URI_tokenized:"keyword1 keyword2"~2)
(file_content:"keyword1 keyword2"~100^2.0) (xmp_title:"keyword1
keyword2"~2)</str>
<lst name="explain">
<str name="...">...</str>
<str name="...">...</str>
<str name="...">...</str>
<str name="...">...</str>
<str name="...">...</str>
<str name="...">...</str>
<str name="...">...</str>
<str name="...">...</str>
<str name="...">...</str>
<str name="...">
0.035045296 = (MATCH) sum of:
  0.035045296 = (MATCH) sum of:
0.0318122 = (MATCH) max of:
  8.29798E-4 = (MATCH) weight(spell:keyword1^0.0010 in 71660)
[DefaultSimilarity], result of:
8.29798E-4 = score(doc=71660,freq=2.0 = termFreq=2.0
), product of:
  6.7839865E-5 = queryWeight, product of:
0.0010 = boost
8.64913 = idf(docFreq=618, maxDocs=1299169)
0.0078435475 = queryNorm
  12.231716 = fieldWeight in 71660, product of:
1.4142135 = tf(freq=2.0), with freq of:
  2.0 = termFreq=2.0
8.64913 = idf(docFreq=618, maxDocs=1299169)
1.0 = fieldNorm(doc=71660)
  0.0318122 = (MATCH) weight(file_content:keyword1^10.0 in 71660)
[DefaultSimilarity], result of:
0.0318122 = score(doc=71660,freq=2.0 = termFreq=2.0
), product of:
  0.6720717 = queryWeight, product of:
10.0 = boost
8.568466 = idf(docFreq=670, maxDocs=1299169)
0.0078435475 = queryNorm
  0.047334533 = fieldWeight in 71660, product of:
1.4142135 = tf(freq=2.0), with freq of:
  2.0 = termFreq=2.0
8.568466 = idf(docFreq=670, maxDocs=1299169)
0.00390625 = fieldNorm(doc=71660)
0.003233097 = (MATCH) max of:
  0.003233097 = (MATCH) weight(file_content:keyword2^10.0 in 71660)
[DefaultSimilarity], result of:
0.003233097 = score(doc=71660,freq=1.0 = termFreq=1.0
), product of:
  0.25479192 = queryWeight, product of:
10.0 = boost
3.2484267 = idf(docFreq=137146, maxDocs=1299169)
0.0078435475 = queryNorm
  0.012689167 = fieldWeight in 71660, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.2484267 = idf(docFreq=137146, maxDocs=1299169)
0.00390625 = fieldNorm(doc=71660)
</str>
</lst>
<str name="QParser">ExtendedDismaxQParser</str>
<null name="altquerystring"/>
<null name="boost_queries"/>
<arr name="parsed_boost_queries"/>
<null name="boostfuncs"/>
<arr name="filter_queries">
<str>file_readright:wiki-access</str></arr>
<arr
name="parsed_filter_queries"><str>file_readright:wiki-access</str></arr>
<lst name="timing">
<double name="time">66359.0</double>
<lst name="prepare"></lst>
<lst name="process">
<double name="time">66357.0</double>
<lst name="query">
<double name="time">80.0</double></lst>
<lst name="facet">
<double name="time">0.0</double></lst>
<lst name="mlt">
<double name="time">0.0</double></lst>
<lst name="highlight">
<double name="time">65981.0</double></lst>
<lst name="stats">
<double name="time">0.0</double></lst>
<lst name="spellcheck">
<double name="time">38.0</double></lst>
<lst name="debug">
<double name="time">258.0</double></lst>
</lst>
</lst>

Why does the highlighting take up this much time? Is it a problem 
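
As an aside, here is a hedged sketch (via SolrJ) of highlighting parameters that are
often worth experimenting with in a case like this; whether they actually help depends
on the stored field sizes and the highlighter in use, so treat them as assumptions to
test rather than a fix:

import org.apache.solr.client.solrj.SolrQuery;

public class HighlightTuning {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("keyword1 keyword2");
        q.setHighlight(true);
        q.set("hl.fl", "file_URI_tokenized,xmp_title,file_content");
        // Only build snippets from fields that actually matched the query terms.
        q.set("hl.requireFieldMatch", "true");
        // Cap how many characters of a large stored field get analyzed for snippets
        // (mainly affects the standard highlighter; an assumption that it helps here).
        q.set("hl.maxAnalyzedChars", "51200");
        return q;
    }
}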

Re: Solr Shard Query From Inside Search Component Sometimes Gives Wrong Results

2014-03-03 Thread Shalin Shekhar Mangar
What was the query you are making? What is the sort order for the
query? Are you sure you are not indexing data in between making these
requests? Are you able to reproduce this outside of your search
component?

It is hard to answer questions about custom code without actually
looking at the code.

On Mon, Mar 3, 2014 at 3:37 PM, Vishnu Mishra vdil...@gmail.com wrote:
 Hi,
   I am using Solr 4.6 and making a Solr query against shards from inside a
 Solr search component, then trying to use the obtained results for my custom
 logic. I have used SolrJ for the distributed search, but the results
 coming back from this distributed search sometimes vary. So my questions
 are:

 1.  Can we do a distributed search from a Solr search component?
 2.  Do we need to handle concurrency between Solr servers by using
 synchronization or some other technique?

 Is there a way to make a distributed search in a Solr search component and
 get the matched documents from all the shards? If anyone has an idea, please
 help me.






 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Shard-Query-From-Inside-Search-Component-Sometimes-Gives-Wrong-Results-tp4120840.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread Michael Sokolov

On 3/3/2014 1:54 AM, KNitin wrote:

3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
As others have pointed out, this is really unusual for Solr.  We often 
see high permgen in our app servers due to dynamic class loading that 
the framework performs; maybe you are somehow loading lots of new Solr 
plugins, or otherwise creating lots of classes?  Of course if you have a 
plugin or something that does a lot of string interning, that could also 
be an explanation.


-Mike


Solution for reverse order of year facets?

2014-03-03 Thread Michael Lackhoff
If I understand the docs right, it is only possible to sort facets by
count or value in ascending order. Both variants are not very helpful
for year facets if I want the most recent years at the top (or appear at
all if I restrict the number of facet entries).

It looks like a requirement that was articulated repeatedly and the
recommended solution seems to be to do some math like 1 - year and
index that. So far so good. The only problem is that I have many data
sources and I would like to avoid changing every connector to include
the new field. I think a better solution would be to have a custom
TokenFilterFactory that does it.

Since it seems a common request, did someone already build such a
TokenFilterFactory? If not, do you think I could build one myself? I do
some (script-)programming but have no experience with Java, so I think I
could adapt an example. Are there any guides out there?
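
In case it helps to judge the effort, here is a rough, untested sketch of what such a
filter and factory could look like; the class names and the choice of 9999 minus the
year are my own assumptions, not an existing Solr filter:

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class ReverseYearFilterFactory extends TokenFilterFactory {
    public ReverseYearFilterFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public TokenStream create(TokenStream input) {
        return new ReverseYearFilter(input);
    }
}

final class ReverseYearFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    ReverseYearFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        String year = termAtt.toString();
        if (year.matches("\\d{4}")) {
            // 2014 -> 7985, 1815 -> 8184: ascending index order then means newest first.
            String reversed = String.valueOf(9999 - Integer.parseInt(year));
            termAtt.setEmpty().append(reversed);
        }
        return true;
    }
}

The facet would then return the transformed values, so the client still has to map
9999 minus the value back to a year for display.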

Or even better, is there a built-in solution I haven't heard of?

-Michael


Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Greg Walters
Josh,

You've mentioned a couple of times that you've got PermGen set to 512M but then 
you say you're running with -XX:MaxPermSize=64M. These two statements are 
contradictory, so are you *sure* that you're running with 512M of PermGen? 
Assuming you're on a *nix box, can you provide `ps` output proving this?

Thanks,
Greg

On Feb 28, 2014, at 5:22 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;
 
 You can also check here:
 http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled
 
 Thanks;
 Furkan KAMACI
 
 
 2014-02-26 22:35 GMT+02:00 Josh jwda...@gmail.com:
 
 Thanks Timothy,
 
 I gave these a try and -XX:+CMSPermGenSweepingEnabled seemed to cause the
 error to happen more quickly. With this option on it didn't seemed to do
 any intermittent garbage collecting that delayed the issue in with it off.
 I was already using a max of 512MB, and I can reproduce it with it set this
 high or even higher. Right now because of how we have this implemented just
 increasing it to something high just delays the problem :/
 
 Anything else you could suggest I would really appreciate.
 
 
 On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter tim.pot...@lucidworks.com
 wrote:
 
 Hi Josh,
 
 Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM
 versions, permgen collection was disabled by default.
 
 Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may
 be too small.
 
 
 Timothy Potter
 Sr. Software Engineer, LucidWorks
 www.lucidworks.com
 
 
 From: Josh jwda...@gmail.com
 Sent: Wednesday, February 26, 2014 12:27 PM
 To: solr-user@lucene.apache.org
 Subject: Solr Permgen Exceptions when creating/removing cores
 
 We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows
 installation with 64bit Java 1.7U51 and we are seeing consistent issues
 with PermGen exceptions. We have the permgen configured to be 512MB.
 Bitnami ships with a 32bit version of Java for windows and we are
 replacing
 it with a 64bit version.
 
 Passed in Java Options:
 
 -XX:MaxPermSize=64M
 -Xms3072M
 -Xmx6144M
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+CMSClassUnloadingEnabled
 -XX:NewRatio=3
 
 -XX:MaxTenuringThreshold=8
 
 This is our use case:
 
 We have what we call a database core which remains fairly static and
 contains the imported contents of a table from SQL server. We then have
 user cores which contain the record ids of results from a text search
 outside of Solr. We then query for the data we want from the database
 core
 and limit the results to the content of the user core. This allows us to
 combine facet data from Solr with the search results from another engine.
 We are creating the user cores on demand and removing them when the user
 logs out.
 
 Our issue is the constant creation and removal of user cores combined
 with
 the constant importing seems to push us over our PermGen limit. The user
 cores are removed at the end of every session and as a test I made an
 application that would loop creating the user core, import a set of data
 to
 it, query the database core using it as a limiter and then remove the
 user
 core. My expectation was in this scenario that all the permgen associated
 with that user cores would be freed upon it's unload and allow permgen to
 reclaim that memory during a garbage collection. This was not the case,
 it
 would constantly go up until the application would exhaust the memory.
 
 I also investigated whether the there was a connection between the two
 cores left behind because I was joining them together in a query but even
 unloading the database core after unloading all the user cores won't
 prevent the limit from being hit or any memory to be garbage collected
 from
 Solr.
 
 Is this a known issue with creating and unloading a large number of
 cores?
 Could it be configuration based for the core? Is there something other
 than
 unloading that needs to happen to free the references?
 
 Thanks
 
 Notes: I've tried using tools to determine if it's a leak within Solr
 such
 as Plumbr and my activities turned up nothing.
 
 



Re: Solr 4.5.0 replication numDocs larger in slave

2014-03-03 Thread Greg Walters
I just ran into an issue similar to this that affected document scores on 
distributed searches. You might try doing an optimize and purging your deleted 
documents while no indexing is being done then checking your counts. Once I 
optimized all my indexes the document counts on all of my cores matched up and 
scoring was consistent.

Thanks,
Greg

On Feb 28, 2014, at 8:22 PM, Erick Erickson erickerick...@gmail.com wrote:

 That really shouldn't be happening IF indexing is shut off. Otherwise
 the slave is taking a snapshot of the master index and synching.
 
 bq: The slave has about 33 more documents and one fewer
 segments (according to Overview in solr admin
 
 Sounds like the master is still indexing and you've deleted documents
 on the master.
 
 Best,
 Erick
 
 
 On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank 
 frank.ge...@zoominfo.comwrote:
 
 Hi,
 
 I'm using Solr 4.5.0, I have a single master replicating to a single
 slave.  Only the master is being indexed to - never the slave.  The master
 is committed once each night.  After the first commit and replication the
 numDoc counts are identical.  After the next nightly commit and after the
 second replication a few minutes later, the numDocs has increased in both
 the master and the slave as expected, but numDocs is not the same in the
 master as it is in the slave.  The slave has about 33 more documents and
 one fewer segements (according to Overview in solr admin).
 
 I suspect the numDocs may be in sync again after tonight, but can anyone
 explain what is going on here?   Is it possible a few deletions got
 committed to the master but not replicated to the slave?
 
 Thanks
 
 Frank
 
 
 



Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Josh
It's a windows installation using a bitnami solr installer. I incorrectly
put 64M into the configuration for this, as I had copied the test
configuration I was using to recreate the permgen issue we were seeing on
our production system (that is configured to 512M), as it takes a while
to recreate the issue with larger permgen values. In the test scenario
there was a small 180 document data core that's static with 8 dynamic user
cores that are used to index the unique document ids in the users view,
which is then merged into a single user core. The final user core contains
the same number of document ids as the data core and the data core is
queried against with the ids in the final merged user core as the limiter.
The user cores are then unloaded, and deleted from the drive and then the
test is rerun with the user cores re-created.

We are also using the core discovery mode to store/find our cores and the
database data core is using dynamic fields with a mix of single value and
multi value fields. The user cores use a static configuration. The data is
indexed from SQL Server using jtDS for both the user and data cores. As a
note we also reversed the test case I mention above where we keep the user
cores static and dynamically create the database core and this created the
same issue, only it leaked faster. We assumed this was because the configuration
was larger/loaded more classes than the simpler user core.

When I get the time I'm going to put together a SolrJ test app to recreate
the issue outside of our environment to see if others see the same issue
we're seeing to rule out any kind of configuration problem. Right now we're
interacting with solr with POCO via the restful interface and it's not very
easy for us to spin this off into something someone else could use. In the
mean time we've made changes to make the user cores more static, this has
slowed down the build up of permgen to something that can be managed by a
weekly reset.
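
For when that test app gets written, here is a bare-bones sketch of the
create/query/unload loop via SolrJ's CoreAdmin API; the URLs, paths, core names and
loop count are placeholders rather than our real setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CoreChurnTest {
    public static void main(String[] args) throws Exception {
        // Admin-level handle (no core in the URL) for create/unload calls.
        HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
        for (int i = 0; i < 1000; i++) {
            String core = "user_core_" + i;
            // The instanceDir is assumed to already contain a conf/ dir with the user-core schema.
            CoreAdminRequest.createCore(core, "/solr/cores/" + core, admin);
            HttpSolrServer userCore = new HttpSolrServer("http://localhost:8983/solr/" + core);
            // ... index the per-user ids here, then query the database core with them ...
            userCore.query(new SolrQuery("*:*"));
            CoreAdminRequest.unloadCore(core, admin);
            userCore.shutdown();
        }
    }
}

Watching PermGen with VisualVM or jstat while a loop like this runs should show whether
unloading the cores actually releases class metadata.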

Sorry about the confusion in my initial email and I appreciate the
response. Anything about my configuration that you can think might be
useful just let me know and I can provide it. We have a work around, but it
really hampers what our long term goals were for our Solr implementation.

Thanks
Josh


On Mon, Mar 3, 2014 at 9:57 AM, Greg Walters greg.walt...@answers.comwrote:

 Josh,

 You've mentioned a couple of times that you've got PermGen set to 512M but
 then you say you're running with -XX:MaxPermSize=64M. These two statements
 are contradictory so are you *sure* that you're running with 512M of
  PermGen? Assuming you're on a *nix box, can you provide `ps` output proving
 this?

 Thanks,
 Greg

 On Feb 28, 2014, at 5:22 PM, Furkan KAMACI furkankam...@gmail.com wrote:

  Hi;
 
  You can also check here:
 
 http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled
 
  Thanks;
  Furkan KAMACI
 
 
  2014-02-26 22:35 GMT+02:00 Josh jwda...@gmail.com:
 
  Thanks Timothy,
 
  I gave these a try and -XX:+CMSPermGenSweepingEnabled seemed to cause
 the
  error to happen more quickly. With this option on it didn't seemed to do
  any intermittent garbage collecting that delayed the issue in with it
 off.
  I was already using a max of 512MB, and I can reproduce it with it set
 this
  high or even higher. Right now because of how we have this implemented
 just
  increasing it to something high just delays the problem :/
 
  Anything else you could suggest I would really appreciate.
 
 
  On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter tim.pot...@lucidworks.com
  wrote:
 
  Hi Josh,
 
  Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM
  versions, permgen collection was disabled by default.
 
  Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M
 may
  be too small.
 
 
  Timothy Potter
  Sr. Software Engineer, LucidWorks
  www.lucidworks.com
 
  
  From: Josh jwda...@gmail.com
  Sent: Wednesday, February 26, 2014 12:27 PM
  To: solr-user@lucene.apache.org
  Subject: Solr Permgen Exceptions when creating/removing cores
 
  We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows
  installation with 64bit Java 1.7U51 and we are seeing consistent issues
  with PermGen exceptions. We have the permgen configured to be 512MB.
  Bitnami ships with a 32bit version of Java for windows and we are
  replacing
  it with a 64bit version.
 
  Passed in Java Options:
 
  -XX:MaxPermSize=64M
  -Xms3072M
  -Xmx6144M
  -XX:+UseParNewGC
  -XX:+UseConcMarkSweepGC
  -XX:CMSInitiatingOccupancyFraction=75
  -XX:+CMSClassUnloadingEnabled
  -XX:NewRatio=3
 
  -XX:MaxTenuringThreshold=8
 
  This is our use case:
 
  We have what we call a database core which remains fairly static and
  contains the imported contents of a table from SQL server. We then have
  user cores which contain the record ids of results from a text search
  outside of Solr. We then query for the data we want from 

Re: Solution for reverse order of year facets?

2014-03-03 Thread Ahmet Arslan
Hi,

Currently there are two sorting criteria available. However, sort by index - to 
return the constraints sorted in their index order (lexicographic by indexed 
term) - should return the most recent year at the top, no?

Ahmet



On Monday, March 3, 2014 4:36 PM, Michael Lackhoff mich...@lackhoff.de wrote:
If I understand the docs right, it is only possible to sort facets by
count or value in ascending order. Both variants are not very helpful
for year facets if I want the most recent years at the top (or appear at
all if I restrict the number of facet entries).

It looks like a requirement that was articulated repeatedly and the
recommended solution seems to be to do some math like 1 - year and
index that. So far so good. The only problem is that I have many data
sources and I would like to avoid changing every connector to include
the new field. I think a better solution would be to have a custom
TokenFilterFactory that does it.

Since it seems a common request, did someone already build such a
TokenFilterFactory? If not, do you think I could build one myself? I do
some (script-)programming but have no experience with Java, so I think I
could adapt an example. Are there any guides out there?

Or even better, is there a built-in solution I haven't heard of?

-Michael



Multiple partial match

2014-03-03 Thread Zwer
Hi Guys,

Faced with a problem: I make a query to Solr like name:co*^5

It returns two docs with equal score: {id: 1, name: 'Coca-Cola Company'},
{id: 2, name: 'Microsoft Corporation'}.


How can I boost 'Coca-Cola Company', since it contains more partial matches?


P.S. All normalization used by the TF-IDF engine is disabled.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-partial-match-tp4120886.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solution for reverse order of year facets?

2014-03-03 Thread Michael Lackhoff
On 03.03.2014 16:33 Ahmet Arslan wrote:

 Currently there are two sorting criteria available. However, sort by index - 
 to return the constraints sorted in their index order (lexicographic by 
 indexed term) - should return the most recent year at the top, no?

No, it returns them -- as you say -- in lexicographic order and that
means oldest first, like:
1815
1820
...
2012
2013
(might well stop before we get here)
2014

-Michael


Re: Solr is NoSQL database or not?

2014-03-03 Thread Furkan KAMACI
Hi;

I said that:

What are the main differences between ElasticSearch
and Solr that make ElasticSearch a NoSQL store but not Solr?

because it is just a marketing term as Jack indicated after me. Also I said:

The first link you provided includes ElasticSearch:
http://en.wikipedia.org/wiki/NoSQL
 as a Document Store

I mean that you can add Solr to the Wikipedia page, but it is not a reference,
because these are all marketing terms, like Big Data. You should
remember the definition of Big Data: data that is much more than you can
process with traditional methods -- so it is not an exactly defined
term. One person may call something Big Data while another may not. It is
similar to NoSQL.

Thanks;
Furkan KAMACI


2014-03-03 11:28 GMT+02:00 Charlie Hull char...@flax.co.uk:

 On 01/03/2014 23:53, Jack Krupansky wrote:

 NoSQL? To me it's just a marketing term, like Big Data.

  +1

 Depends very much who you talk to. Marketing folks like to ride the
 current wave, so if NoSQL is current, they'll jump on that one, likewise
 Big Data. Technical types like to be correct in their definitions :)

 C


 --
 Charlie Hull
 Flax - Open Source Enterprise Search

 tel/fax: +44 (0)8700 118334
 mobile:  +44 (0)7767 825828
 web: www.flax.co.uk



RE: Solr 4.5.0 replication numDocs larger in slave

2014-03-03 Thread Geary, Frank
Thanks Erick.  Indexing is not happening to the slave since it has never been 
set up there - there aren't even any commits happening on the slave (which we 
normally do via cron job).  But Indexing is definitely happening to the master 
at the time replication happens.  

 Sounds like the master is still indexing and you've deleted documents on the 
master.:

Yes, that's exactly what I suspect is happening.  But if that's true, I'd like 
to understand how those deletes could find their way into being replicated to 
the slave when the only commit happening on the master was presumably completed 
before the replication.  Do deletes get committed in some special way outside 
of an explicit commit?  Or do they get copied over to the slave as part of the 
replication and therefore effectively get committed to the slave before they 
are committed to the master?

My replication is configured to replicate after commit and after startup.  The 
slave polls the master every 10 minutes.  The master commits only once a day.  
Presumably the only time the number of documents changes is at the end of the 
commit.  Then once the commit is done I'd expect replication to begin.  So in 
order to end up with a different numDocs in the slave there would need to be 
some sort of commit happening during the replication, right?

Frank

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, February 28, 2014 9:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.5.0 replication numDocs larger in slave

That really shouldn't be happening IF indexing is shut off. Otherwise the slave 
is taking a snapshot of the master index and synching.

bq: The slave has about 33 more documents and one fewer segments (according to 
Overview in solr admin

Sounds like the master is still indexing and you've deleted documents on the 
master.

Best,
Erick


On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank frank.ge...@zoominfo.comwrote:

 Hi,

 I'm using Solr 4.5.0, I have a single master replicating to a single 
 slave.  Only the master is being indexed to - never the slave.  The 
 master is committed once each night.  After the first commit and 
 replication the numDoc counts are identical.  After the next nightly 
 commit and after the second replication a few minutes later, the 
 numDocs has increased in both the master and the slave as expected, 
 but numDocs is not the same in the master as it is in the slave.  The 
 slave has about 33 more documents and one fewer segements (according to 
 Overview in solr admin).

 I suspect the numDocs may be in sync again after tonight, but can anyone
 explain what is going on here?   Is it possible a few deletions got
 committed to the master but not replicated to the slave?

 Thanks

 Frank







Re: SolrCloud: heartbeat succeeding while node has failing SSD?

2014-03-03 Thread Gregg Donovan
Thanks, Mark!

The supervised process sounds very promising but complicated to get right.
E.g. where does the supervisor run, where do nodes report their status to,
are the checks active or passive, etc.

Having each node perform a regular background self-check and remove itself
from the cluster if that healthcheck doesn't pass seems like a great first
step, though. The most common failure we've seen has been disk failure and
a self-check should usually detect that. (JIRA:
https://issues.apache.org/jira/browse/SOLR-5805)

It would also be nice, as a cluster operator, to have an easy way to remove
a failing node from the cluster. Ideally, right from the Solr UI, but even
from a command-line script would be great. In the cases of disk failure, we
can often not SSH into a node to shut down the VM that's still connected to
ZooKeeper. We have to physically power it down. Having something quicker
would be great. (JIRA: https://issues.apache.org/jira/browse/SOLR-5806)




On Sun, Mar 2, 2014 at 9:36 PM, Mark Miller markrmil...@gmail.com wrote:

 The heartbeat that keeps the node alive is the connection it maintains
 with ZooKeeper.

 We don't currently have anything built in that will actively make sure
 each node can serve queries and remove it from clusterstate.json if it
 cannot. If a replica is maintaining its connection with ZooKeeper and, in
 most cases, if it is accepting updates, it will appear up. Load balancing
 should handle the failures, but I guess it depends on how sticky the
 request failures are.

 In the past, I've seen this handled on a different search engine by having
 a variety of external agent scripts that would occasionally attempt to do a
 query, and if things did not go right, it killed the process to cause it to
 try and startup again (supervised process).

 I'm not sure what the right long term feature for Solr is here, but feel
 free to start a JIRA issue around it.

 One simple improvement might even be a background thread that periodically
 checks some local readings and depending on the results, pulls itself out
 of the mix as best it can (remove itself from clusterstate.json or simply
  closes its zk connection).

 - Mark

 http://about.me/markrmiller

 On Mar 2, 2014, at 3:42 PM, Gregg Donovan gregg...@gmail.com wrote:

  We had a brief SolrCloud outage this weekend when a node's SSD began to
  fail but the node still appeared to be up to the rest of the SolrCloud
  cluster (i.e. still green in clusterstate.json). Distributed queries that
  reached this node would fail but whatever heartbeat keeps the node in the
  clusterstate.json must have continued to succeed.
 
  We eventually had to power the node down to get it to be removed from
  clusterstate.json.
 
  This is our first foray into SolrCloud, so I'm still somewhat fuzzy on
 what
  the default heartbeat mechanism is and how we may augment it to be sure
  that the disk is checked as part of the heartbeat and/or we verify that
 it
  can serve queries.
 
  Any pointers would be appreciated.
 
  Thanks!
 
  --Gregg




Configuration problem

2014-03-03 Thread Thomas Fischer
Hello,

for some reason I have problems getting my local solr system to run (MacBook, 
tomcat 6.0.35).

The setting is
solr directories (I use different solr versions at the same time):
/srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the 
new discovery type (no cores), and inside the core directories are empty 
files core.properties and symbolic links to the universal conf directory.
 
solr webapps (I use very different webapps simultaneously):
/srv/www/webapps/solr/solr4.6.1 is the solr webapp

I tried to convey this information to the tomcat server by putting a file 
solr4.6.1.xml into the catalina/localhost folder with the contents:
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/srv/www/webapps/solr/solr4.6.1" debug="0" 
crossContext="true">
<Environment name="solr/home" type="java.lang.String" 
value="/srv/solr/solr4.6.1" override="true"/>
</Context>

The Tomcat Manager shows solr4.6.1 as started, but following the given link 
gives an error with the message:
SolrCore 'collection1' is not available due to init failure: Could not load 
config file /srv/solr4.6.1/collection1/solrconfig.xml
which is plausible, since
1. there is no folder /srv/solr4.6.1/collection1 and
2.for the actual cores solrconfig.xml is inside of 
/srv/solr4.6.1/cores/geo/conf/

But why does Tomcat try to find a solrconfig.xml there?
The problem persists if I start tomcat with 
-Dsolr.solr.home=/srv/solr/solr4.6.1, it seems that the system just ignores the 
solr home setting.

Can somebody give me a hint what I'm doing wrong?

Best regards
Thomas

P.S.: Is there a way to stop Tomcat from throwing these errors into my face 
threefold: once as heading (h1!), once as message and once as description?




Re: Solr is NoSQL database or not?

2014-03-03 Thread Jack Krupansky
For the record, I am +1 for somebody to add Solr to the NoSQL wikipedia 
page, in much the same way that Elasticsearch is already there.


From a LucidWorks webinar blurb: The long awaited Solr 4 release brings a 
large amount of new functionality that blurs the line between search engines 
and NoSQL databases. Now you can have your cake and search it too with 
Atomic updates, Versioning and Optimistic Concurrency, Durability, and 
Real-time Get! Learn about new Solr NoSQL features and implementation 
details of how the distributed indexing of Solr Cloud was designed from the 
ground up to accommodate them.


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Monday, March 3, 2014 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr is NoSQL database or not?

Hi;

I said that:

What are the main differences between ElasticSearch
and Solr that make ElasticSearch a NoSQL store but not Solr?

because it is just a marketing term as Jack indicated after me. Also I said:

The first link you provided includes ElasticSearch:
http://en.wikipedia.org/wiki/NoSQL
as a Document Store

I mean that you can add Solr to the Wikipedia page, but it is not a reference,
because these are all marketing terms, like Big Data. You should
remember the definition of Big Data: data that is much more than you can
process with traditional methods -- so it is not an exactly defined
term. One person may call something Big Data while another may not. It is
similar to NoSQL.

Thanks;
Furkan KAMACI


2014-03-03 11:28 GMT+02:00 Charlie Hull char...@flax.co.uk:


On 01/03/2014 23:53, Jack Krupansky wrote:


NoSQL? To me it's just a marketing term, like Big Data.

 +1


Depends very much who you talk to. Marketing folks like to ride the
current wave, so if NoSQL is current, they'll jump on that one, likewise
Big Data. Technical types like to be correct in their definitions :)

C


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk





Re: Fetching uniqueKey and other int quickly from documentCache?

2014-03-03 Thread Gregg Donovan
Yonik,

That's a very clever idea. Unfortunately, I think that will skip the
distributed query optimization we were hoping to take advantage of in
SOLR-1880 [1], but it should work with the proposed distrib.singlePass
optimization in SOLR-5768 [2]. Does that sound right?

--Gregg

[1] https://issues.apache.org/jira/browse/SOLR-1880
[2] https://issues.apache.org/jira/browse/SOLR-5768
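
For reference, the pseudo-field trick from the quoted exchange below, as it would look
when built with SolrJ (the field names are placeholders):

import org.apache.solr.client.solrj.SolrQuery;

public class PseudoFieldQuery {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        // Request values through function queries rather than stored-field retrieval.
        q.setFields("field(id)", "field(myfield)");
        q.setRows(1000);
        return q;
    }
}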


On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley yo...@heliosearch.com wrote:

 You could try forcing things to go through function queries (via
 pseudo-fields):

 fl=field(id), field(myfield)

 If you're not requesting any stored fields, that *might* currently
 skip that step.

 -Yonik
 http://heliosearch.org - native off-heap filters and fieldcache for solr


 On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan gregg...@gmail.com wrote:
  We fetch a large number of documents -- 1000+ -- for each search. Each
  request fetches only the uniqueKey or the uniqueKey plus one secondary
  integer key. Despite this, we find that we spent a sizable amount of time
   in SolrIndexSearcher#doc(int docId, Set<String> fields). Time is spent
  fetching the two stored fields, LZ4 decoding, etc.
 
  I would love to be able to tell Solr to always fetch these two fields
 from
  memory. We have them both in the fieldCache so we're already spending the
  RAM. I've seen this asked previously [1], so it seems like a fairly
 common
  need, especially for distributed search. Any ideas?
 
  A few possible ideas I had:
 
  --Check FieldCache.html#getCacheEntries() before going to stored fields.
  --Give the documentCache config a list of fields it should load from the
  fieldCache
 
 
   Having an in-memory mapping from docId->uniqueKey has come up for us
  before. We've used a custom SolrCache maintaining that mapping to quickly
  filter over personalized collections. Maybe the uniqueKey should be more
  optimized out of the box? Perhaps a custom uniqueKey codec that also
   maintained the docId->uniqueKey mapping in memory?
 
  --Gregg
 
  [1] http://search-lucene.com/m/oCUKJ1heHUU1



Solr Filter Cache Size

2014-03-03 Thread Benjamin Wiens
How can we calculate how much heap memory the filter cache will consume? We
understand that in order to determine a good size we also need to evaluate
how many filterqueries would be used over a certain time period.



Here's our setting:



<filterCache
  class="solr.FastLRUCache"
  size="30"
  initialSize="30"
  autowarmCount="5"/>



According to the post below, 53 GB of RAM would be needed just by the
filter cache alone with 1.4 million docs. Not sure if this is true and how
this would work.



Reference:
http://stackoverflow.com/questions/2004/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem



We filled the filterquery cache with Solr Meter and had a JVM Heap Size of
far less than 53 GB.
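
One way to reason about it, as a back-of-the-envelope worst case that assumes every
cached entry is a full bitset of maxDoc bits (sparse filters are stored more compactly,
so actual usage is usually lower):

public class FilterCacheEstimate {
    public static void main(String[] args) {
        long maxDoc = 1_400_000L;   // documents in the index
        long entries = 300_000L;    // hypothetical cache size; ~300,000 full bitsets is
                                    // roughly what a ~53 GB figure for 1.4M docs implies
        long bytesPerEntry = maxDoc / 8;            // one bit per document, worst case
        long totalBytes = bytesPerEntry * entries;
        System.out.printf("~%d KB per entry, ~%.1f GB for %d entries%n",
                bytesPerEntry / 1024, totalBytes / 1e9, entries);
        // Prints roughly 170 KB per entry and about 52.5 GB in total.
    }
}

With size=30 the same worst case is only around 5 MB, which would match seeing a heap
far below 53 GB after filling the cache.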



Can anyone chime in and enlighten us?



Thank you!


Ben Wiens  Benjamin Mosior


Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Tri Cao
Hey Josh,

I am not an expert in Java performance, but I would start with dumping the heap
and investigating with visualvm (the free tool that comes with JDK).

In my experience, the most common cause for PermGen exception is the app creates
too many interned strings. Solr (actually Lucene) interns the field names, so if you have
too many fields, it might be the cause. How many fields in total across cores did you
create before the exception?

Can you reproduce the problem with the standard Solr? Is the bitnami distribution just
Solr or do they have some other libraries?

Hope this helps,
Tri

On Mar 03, 2014, at 07:28 AM, Josh jwda...@gmail.com wrote:

 It's a windows installation using a bitnami solr installer. I incorrectly
 put 64M into the configuration for this, as I had copied the test
 configuration I was using to recreate the permgen issue we were seeing on
 our production system (that is configured to 512M), as it takes a while
 to recreate the issue with larger permgen values. In the test scenario
 there was a small 180 document data core that's static with 8 dynamic user
 cores that are used to index the unique document ids in the users view,
 which is then merged into a single user core. The final user core contains
 the same number of document ids as the data core and the data core is
 queried against with the ids in the final merged user core as the limiter.
 The user cores are then unloaded, and deleted from the drive and then the
 test is rerun with the user cores re-created.

 We are also using the core discovery mode to store/find our cores and the
 database data core is using dynamic fields with a mix of single value and
 multi value fields. The user cores use a static configuration. The data is
 indexed from SQL Server using jtDS for both the user and data cores. As a
 note we also reversed the test case I mention above where we keep the user
 cores static and dynamically create the database core and this created the
 same issue, only it leaked faster. We assumed this was because the configuration
 was larger/loaded more classes than the simpler user core.

 When I get the time I'm going to put together a SolrJ test app to recreate
 the issue outside of our environment to see if others see the same issue
 we're seeing to rule out any kind of configuration problem. Right now we're
 interacting with solr with POCO via the restful interface and it's not very
 easy for us to spin this off into something someone else could use. In the
 mean time we've made changes to make the user cores more static, this has
 slowed down the build up of permgen to something that can be managed by a
 weekly reset.

 Sorry about the confusion in my initial email and I appreciate the
 response. Anything about my configuration that you can think might be
 useful just let me know and I can provide it. We have a work around, but it
 really hampers what our long term goals were for our Solr implementation.

 Thanks
 Josh

 On Mon, Mar 3, 2014 at 9:57 AM, Greg Walters greg.walt...@answers.com wrote:

Re: Multiple partial match

2014-03-03 Thread Jack Krupansky

Add a function query boost that uses the term frequency, tf:

bf=tf(name,'co')  -- additive boost

boost=tf(name,'co')  -- multiplicative boost

That does of course require that term frequency is not disabled for that 
field in the schema.


You can multiply the term frequency as well in the function query.

boost=product(tf(name,'co'),10)

-- Jack Krupansky

-Original Message- 
From: Zwer

Sent: Monday, March 3, 2014 10:34 AM
To: solr-user@lucene.apache.org
Subject: Multiple partial match

Hi Guys,

Faced with a problem: make query to SOLR *name:co*^5*

It returns me two docs with equal score: {id: 1, name: 'Coca-Cola Company'},
{id: 2, name: Microsoft Corporation}.


How can I boost Coca-Cola Company because it contains more partial matches ?


P.S. All normalization used by TF-IDF engine disabled.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-partial-match-tp4120886.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Fetching uniqueKey and other int quickly from documentCache?

2014-03-03 Thread Yonik Seeley
On Mon, Mar 3, 2014 at 11:14 AM, Gregg Donovan gregg...@gmail.com wrote:
 Yonik,

 That's a very clever idea. Unfortunately, I think that will skip the
 distributed query optimization we were hoping to take advantage of in
 SOLR-1880 [1], but it should work with the proposed distrib.singlePass
 optimization in SOLR-5768 [2]. Does that sound right?


Yep, the two together should do the trick.

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr


 --Gregg

 [1] https://issues.apache.org/jira/browse/SOLR-1880
 [2] https://issues.apache.org/jira/browse/SOLR-5768


 On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley yo...@heliosearch.com wrote:

 You could try forcing things to go through function queries (via
 pseudo-fields):

 fl=field(id), field(myfield)

 If you're not requesting any stored fields, that *might* currently
 skip that step.

 -Yonik
 http://heliosearch.org - native off-heap filters and fieldcache for solr


 On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan gregg...@gmail.com wrote:
  We fetch a large number of documents -- 1000+ -- for each search. Each
  request fetches only the uniqueKey or the uniqueKey plus one secondary
  integer key. Despite this, we find that we spent a sizable amount of time
   in SolrIndexSearcher#doc(int docId, Set<String> fields). Time is spent
  fetching the two stored fields, LZ4 decoding, etc.
 
  I would love to be able to tell Solr to always fetch these two fields
 from
  memory. We have them both in the fieldCache so we're already spending the
  RAM. I've seen this asked previously [1], so it seems like a fairly
 common
  need, especially for distributed search. Any ideas?
 
  A few possible ideas I had:
 
  --Check FieldCache.html#getCacheEntries() before going to stored fields.
  --Give the documentCache config a list of fields it should load from the
  fieldCache
 
 
   Having an in-memory mapping from docId->uniqueKey has come up for us
  before. We've used a custom SolrCache maintaining that mapping to quickly
  filter over personalized collections. Maybe the uniqueKey should be more
  optimized out of the box? Perhaps a custom uniqueKey codec that also
   maintained the docId->uniqueKey mapping in memory?
 
  --Gregg
 
  [1] http://search-lucene.com/m/oCUKJ1heHUU1



Re: Multiple partial match

2014-03-03 Thread Zwer
AFAICS tf(name, 'co') returns 0 on the {id:1, name:'Coca-Cola Company'}
because it does not support partial match. 
tf(name, 'company') will return 1



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-partial-match-tp4120886p4120919.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Josh
In the user core there are two fields, the database core in question was
40, but in production environments the database core is dynamic. My time
has been pretty crazy trying to get this out the door and we haven't tried
a standard solr install yet but it's on my plate for the test app and I
don't know enough about Solr/Bitnami to know if they've done any serious
modifications to it.

I had tried doing a dump from VisualVM previously but it didn't seem to
give me anything useful but then again I didn't know how to look for
interned strings. This is something I can take another look at in the
coming weeks when I do my test case against a standard solr install with
SolrJ. The exception with user cores happens after 80'ish runs, so 640'ish
user cores with the PermGen set to 64MB. The database core test was far
lower, it was in the 10-15 range. As a note once the permgen limit is hit,
if we simply restart the service with the same number of cores loaded the
permgen usage is minimal even with the amount of user cores being high in
our production environment (500-600).

If this does end up being the interning of strings, is there any way it can
be mitigated? Our production environment for our heavier users would see in
the range of 3200+ user cores created a day.

Thanks for the help.
Josh


On Mon, Mar 3, 2014 at 11:24 AM, Tri Cao tm...@me.com wrote:

 Hey Josh,

  I am not an expert in Java performance, but I would start with dumping
  the heap
  and investigating with visualvm (the free tool that comes with JDK).

 In my experience, the most common cause for PermGen exception is the app
 creates
 too many interned strings. Solr (actually Lucene) interns the field names
 so if you have
 too many fields, it might be the cause. How many fields in total across
 cores did you
 create before the exception?

 Can you reproduce the problem with the standard Solr? Is the bitnami
 distribution just
 Solr or do they have some other libraries?

 Hope this helps,
 Tri

 On Mar 03, 2014, at 07:28 AM, Josh jwda...@gmail.com wrote:

 It's a windows installation using a bitnami solr installer. I incorrectly
 put 64M into the configuration for this, as I had copied the test
 configuration I was using to recreate the permgen issue we were seeing on
 our production system (that is configured to 512M) as it takes awhile with
 to recreate the issue with larger permgen values. In the test scenario
 there was a small 180 document data core that's static with 8 dynamic user
 cores that are used to index the unique document ids in the users view,
 which is then merged into a single user core. The final user core contains
 the same number of document ids as the data core and the data core is
 queried against with the ids in the final merged user core as the limiter.
 The user cores are then unloaded, and deleted from the drive and then the
 test is reran again with the user cores re-created

 We are also using the core discovery mode to store/find our cores and the
 database data core is using dynamic fields with a mix of single value and
 multi value fields. The user cores use a static configuration. The data is
 indexed from SQL Server using jtDS for both the user and data cores. As a
 note we also reversed the test case I mention above where we keep the user
 cores static and dynamically create the database core and this created the
 same issue only it leaked faster. We assumed this because the configuration
 was larger/loaded more classes then the simpler user core.

 When I get the time I'm going to put together a SolrJ test app to recreate
 the issue outside of our environment to see if others see the same issue
 we're seeing to rule out any kind of configuration problem. Right now we're
 interacting with solr with POCO via the restful interface and it's not very
 easy for us to spin this off into something someone else could use. In the
 mean time we've made changes to make the user cores more static, this has
 slowed down the build up of permgen to something that can be managed by a
 weekly reset.

 Sorry about the confusion in my initial email and I appreciate the
 response. Anything about my configuration that you can think might be
 useful just let me know and I can provide it. We have a work around, but it
 really hampers what our long term goals were for our Solr implementation.

 Thanks
 Josh


 On Mon, Mar 3, 2014 at 9:57 AM, Greg Walters greg.walt...@answers.com
 wrote:

 Josh,

 You've mentioned a couple of times that you've got PermGen set to 512M but

 then you say you're running with -XX:MaxPermSize=64M. These two statements

 are contradictory so are you *sure* that you're running with 512M of

  PermGen? Assuming you're on a *nix box, can you provide `ps` output proving

 this?

 Thanks,

 Greg

 On Feb 28, 2014, at 5:22 PM, Furkan KAMACI furkankam...@gmail.com wrote:

  Hi;

 

  You can also check here:

 


 http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled

 

  Thanks;

RE: Solr 4.5.0 replication numDocs larger in slave

2014-03-03 Thread Geary, Frank
Thanks Greg.  We optimize the master once a week (early in the day Sunday) and 
we do not do a commit Sunday evening (the only evening of the week when we do 
not commit).  So now after optimization/replication the master/slave pair that 
were out of sync on Friday now have the same numDocs (and every other value on 
the Overview page agrees except size under Replication where it shows the 
slave is smaller).  Unfortunately, a different master/slave pair now have 
different numDocs after the optimize and replication done yesterday.  

For the newly out of sync master/slave pair, the Version (Under Statistics on 
the Overview page) is 4 revisions earlier on the slave than on the master and 
there are two fewer segments on the slave than there are on the master.   Under 
Replication on the Overview page, the Versions and Gen's are all the same, but 
the size of the slave is smaller than the master.  The slave has 51 fewer 
documents than the master.   But indexing is continuing on the master (but no 
commit has happened since the optimization early Sunday.)

I wonder if this is related to the NRT functionality in some way.  I see Impl: 
org.apache.solr.core.NRTCachingDirectoryFactory on the Overview page.  I've 
been trying to rely on default behavior whenever possible.  But perhaps I need 
to turn something off? 

Frank

-Original Message-
From: Greg Walters [mailto:greg.walt...@answers.com] 
Sent: Monday, March 03, 2014 10:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.5.0 replication numDocs larger in slave

I just ran into an issue similar to this that affected document scores on 
distributed searches. You might try doing an optimize and purging your deleted 
documents while no indexing is being done then checking your counts. Once I 
optimized all my indexes the document counts on all of my cores matched up and 
scoring was consistent.

Thanks,
Greg

On Feb 28, 2014, at 8:22 PM, Erick Erickson erickerick...@gmail.com wrote:

 That really shouldn't be happening IF indexing is shut off. Otherwise 
 the slave is taking a snapshot of the master index and synching.
 
 bq: The slave has about 33 more documents and one fewer segements 
 (according to Overview in solr admin
 
 Sounds like the master is still indexing and you've deleted documents 
 on the master.
 
 Best,
 Erick
 
 
 On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank 
 frank.ge...@zoominfo.comwrote:
 
 Hi,
 
 I'm using Solr 4.5.0, I have a single master replicating to a single 
 slave.  Only the master is being indexed to - never the slave.  The 
 master is committed once each night.  After the first commit and 
 replication the numDoc counts are identical.  After the next nightly 
 commit and after the second replication a few minutes later, the 
 numDocs has increased in both the master and the slave as expected, 
 but numDocs is not the same in the master as it is in the slave.  The 
 slave has about 33 more documents and one fewer segements (according to 
 Overview in solr admin).
 
 I suspect the numDocs may be in sync again after tonight, but can anyone
 explain what is going on here?   Is it possible a few deletions got
 committed to the master but not replicated to the slave?
 
 Thanks
 
 Frank
 
 
 



Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Tri Cao
If it's really the interned strings, you could try upgrading the JDK, as the newer HotSpot
JVM puts interned strings in the regular heap:
http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html
(search for String.intern() in that release)

I haven't got a chance to look into the new core auto discovery code, so I don't know
if it's implemented with reflection or not. Reflection and dynamic class loading is another
source of PermGen exception, in my experience.

I don't see anything wrong with your JVM config, which is very much standard.

Hope this helps,
Tri

On Mar 03, 2014, at 08:52 AM, Josh jwda...@gmail.com wrote:

 In the user core there are two fields, the database core in question was
 40, but in production environments the database core is dynamic. My time
 has been pretty crazy trying to get this out the door and we haven't tried
 a standard solr install yet but it's on my plate for the test app and I
 don't know enough about Solr/Bitnami to know if they've done any serious
 modifications to it.

 I had tried doing a dump from VisualVM previously but it didn't seem to
 give me anything useful but then again I didn't know how to look for
 interned strings. This is something I can take another look at in the
 coming weeks when I do my test case against a standard solr install with
 SolrJ. The exception with user cores happens after 80'ish runs, so 640'ish
 user cores with the PermGen set to 64MB. The database core test was far
 lower, it was in the 10-15 range. As a note once the permgen limit is hit,
 if we simply restart the service with the same number of cores loaded the
 permgen usage is minimal even with the amount of user cores being high in
 our production environment (500-600).

 If this does end up being the interning of strings, is there any way it can
 be mitigated? Our production environment for our heavier users would see in
 the range of 3200+ user cores created a day.

 Thanks for the help.
 Josh

 On Mon, Mar 3, 2014 at 11:24 AM, Tri Cao tm...@me.com wrote:

RE: How to best handle search like Dave & David

2014-03-03 Thread Susheel Kumar
Thanks, Arun, for sharing the idea on EdgeNGramFilter. In our case we are doing 
search using an automated process, so EdgeNGramFilter may not work. We have 
used NGramFilterFactory in the past but will look into it again.

For cases like Dave & David and other English names, does anyone have an idea which 
stemmer (currently using PorterStemFilterFactory) would work better? 
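
(For reference, a minimal sketch of the synonym-based approach discussed in this
thread; the field type name, file name and name groups below are only illustrative
assumptions, not from the original message. A large synonyms file is loaded into
memory once per core, so even tens of thousands of lines are usually manageable.)

# names_synonyms.txt -- one group of equivalent first names per line
richard, rich, rick, richie
david, dave, davey
robert, rob, bob, bobby

<fieldType name="text_names" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="names_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>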

-Original Message-
From: Arun Rangarajan [mailto:arunrangara...@gmail.com] 
Sent: Sunday, March 02, 2014 1:47 PM
To: solr-user@lucene.apache.org
Subject: Re: How to best handle search like Dave & David

If you are trying to serve results as users are typing, then you can use 
EdgeNGramFilter (see 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
).

Let's say you configure your field like this, as shown in the Solr wiki:

fieldType name=text_general_edge_ngram class=solr.TextField
positionIncrementGap=100
   analyzer type=index
  tokenizer class=solr.LowerCaseTokenizerFactory/
  filter class=solr.EdgeNGramFilterFactory minGramSize=2
maxGramSize=15 side=front/
   /analyzer
   analyzer type=query
  tokenizer class=solr.LowerCaseTokenizerFactory/
   /analyzer
/fieldType

Then this is what happens at index time for your tokens:

David --- | LowerCaseTokenizerFactory | --- david --- | 
EdgeNGramFilterFactory
| --- da dav davi david
Dave --- | LowerCaseTokenizerFactory | --- dave --- | EdgeNGramFilterFactory
| --- da dav dave

And at query time, when your user enters 'Dav' it will match both those tokens. 
Note that the moment your user starts typing more, say 'davi' it won't match 
'Dave' since you are doing edge N gramming only at index time and not at query 
time. You can also do edge N gramming at query time if you want 'Dave' to match 
'David', probably keeping a larger minGramSize (in this case 3) to avoid noise 
(like say 'Dave' matching 'Dana' though with a lower score), but it will be 
expensive to do n-gramming at query time.




On Fri, Feb 28, 2014 at 3:22 PM, Susheel Kumar  
susheel.ku...@thedigitalgroup.net wrote:

 Hi,

 We have name searches on Solr for millions of documents. One user may 
 search like Morrison Dave while another may search like Morrison David.  
 What's the best way to handle it so that both bring similar results? Adding 
 synonyms is the option we are using right now.

 But we may need to add around 50,000+ synonyms for different 
 names; for each specific name there can be a couple of synonyms, like for 
 Richard it can be Rich, Rick, Richie etc.

 Any experience adding so many synonyms or any other thoughts? Stemming 
 may help in few situations but not like Dave and David.

 Thanks,
 Susheel



RegexTransformer and xpath in DataImportHandler

2014-03-03 Thread eShard
Good afternoon,
I have this DIH:
?xml version=1.0 encoding=UTF-8 ?
dataConfig
dataSource type=URLDataSource /
document
entity name=blogFeed
pk=id
url=https://redacted/;
processor=XPathEntityProcessor
forEach=/rss/channel/item
   
transformer=DateFormatTransformer,TemplateTransformer,RegexTransformer

field column=id xpath=/rss/channel/item/id /
field column=link xpath=/rss/channel/item/link /
field column=blogtitle xpath=/rss/channel/item/title /
field column=short_blogtitle xpath=/rss/channel/item/title 
/
field column=short_blogtitle regex=^(.{250})([^\.]*\.)(.*)$
replaceWith=$1 sourceColName=blogtitle /
field column=pubdateiso xpath=/rss/channel/item/pubDate
dateTimeFormat=-MM-dd /
field column=category xpath=/rss/channel/item/category /
field column=author xpath=/rss/channel/item/author /
field column=authoremail 
xpath=/rss/channel/item/authoremail /
field column=content xpath=/rss/channel/item/content /
field column=summary xpath=/rss/channel/item/summary /
field column=index_category template=ConnectionsBlogs/

/entity
/document
/dataConfig

I can't seem to populate BOTH blogtitle and short_blogtitle with the same
xpath.
I can only do one or the other; why can't I put the same xpath in 2
different fields?
I removed the short_blogtitle (with the xpath statement) and left in the
regex statement and blogtitle gets populated and short_blogtitle goes to my
update.chain (to the auto complete index) but the field itself is blank in
this index.

If I leave the dih as above, then blogtitle doesn't get populated but
short_blogtitle does.

What am I doing wrong here? Is there a way to populate both? 
And I CANNOT use copyfield here because then the update.chain won't work

Thanks,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/RegexTransformer-and-xpath-in-DataImportHandler-tp4120946.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facets, termvectors, relevancy and Multi word tokenizing

2014-03-03 Thread epnRui
Hi guys,

I'm on my way to solve it properly.

This is what my field looks like now:


fieldType name=text_en class=solr.TextField positionIncrementGap=100
  analyzer type=index
charFilter class=solr.PatternReplaceCharFilterFactory
pattern=(#)|(%23) replacement=79f20724d6985c5b857d2fa06a3ff8c6/
charFilter class=solr.PatternReplaceCharFilterFactory
pattern=(((?i)((european parliament)|(parlament europeenne)))|(EP)|(PE))
replacement=0ee062d61f44ae0a2aee145076ca6a69european_parliament/
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.LowerCaseFilterFactory /
filter class=solr.StopFilterFactory words=blacklist.txt
ignoreCase=true/
filter class=solr.StopFilterFactory words=en
ignoreCase=true/
filter class=solr.HunspellStemFilterFactory
dictionary=en_GB.dic affix=en_GB.aff ignoreCase=true /
filter class=solr.PatternReplaceFilterFactory
pattern=0ee062d61f44ae0a2aee145076ca6a69european_parliament
replacement=european parliament replace=all /
filter class=solr.PatternReplaceFilterFactory
pattern=79f20724d6985c5b857d2fa06a3ff8c6 replacement=# replace=all /
  /analyzer

I still have one case where I'm facing issues because in fact I want to
preserve the #:
 - #European Parliament is translated into one token instead of two:
#European and Parliament... anyway, I have some ideas on how to do it.
I'll let you know what the final solution is.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facets-termvectors-relevancy-and-Multi-word-tokenizing-tp4120101p4120948.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Elevation and core create

2014-03-03 Thread David Stuart
HI Erick,

Thanks for the response. 
On the wiki it states

config-file
Path to the file that defines query elevation. This file must exist in 
$instanceDir/conf/config-file or $dataDir/config-file. 

If the file exists in the /conf/ directory it will be loaded once at startup. 
If it exists in the data directory, it will be reloaded for each IndexReader.

Which is the elevate.xml. So it looks like I will go down the custom coding route.

Regards,


David Stuart
M  +44(0) 778 854 2157
T   +44(0) 845 519 5465
www.axistwelve.com
Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK

AXIS12 - Enterprise Web Solutions

Reg Company No. 7215135
VAT No. 997 4801 60

This e-mail is strictly confidential and intended solely for the ordinary user 
of the e-mail account to which it is addressed. If you have received this 
e-mail in error please inform Axis12 immediately by return e-mail or telephone. 
We advise that in keeping with good computing practice the recipient of this 
e-mail should ensure that it is virus free. We do not accept any responsibility 
for any loss or damage that may arise from the use of this email or its 
contents.



On 2 Mar 2014, at 18:07, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, you _ought_ to be able to specify a relative path
 in str name=confFilessolrconfig_slave.xml:solrconfig.xml,x.xml,y.xml/str
 
 But there's certainly the chance that this is hard-coded in
 the query elevation component so I can't say that this'll work
 with assurance.
 
 Best,
 Erick
 
 On Sun, Mar 2, 2014 at 6:14 AM, David Stuart d...@axistwelve.com wrote:
 Hi sorry for the cross post but I got no response in the dev group so 
 assumed I posted in the wrong place.
 
 
 
 I am using Solr 3.6 and am trying to automate the deployment of cores with a 
 custom elevate file. It is proving to be difficult as most of the file 
 (schema, stop words etc) support absolute path elevate seems to need to be 
 in either a conf directory as a sibling to data or in the data directory 
 itself. I am able to achieve my goal by having a secondary process that 
 places the file but thought I would as the group just in case I have missed 
 the obvious. Should I move to Solr 4 is it fixed here? I could also go down 
 the root of extending the SolrCore create function to accept additional 
 params and move the file into the defined data directory.
 
 Ideas?
 
 Thanks for your help
 David Stuart
 M  +44(0) 778 854 2157
 T   +44(0) 845 519 5465
 www.axistwelve.com
 Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK
 
 AXIS12 - Enterprise Web Solutions
 
 Reg Company No. 7215135
 VAT No. 997 4801 60
 
 This e-mail is strictly confidential and intended solely for the ordinary 
 user of the e-mail account to which it is addressed. If you have received 
 this e-mail in error please inform Axis12 immediately by return e-mail or 
 telephone. We advise that in keeping with good computing practice the 
 recipient of this e-mail should ensure that it is virus free. We do not 
 accept any responsibility for any loss or damage that may arise from the use 
 of this email or its contents.
 
 
 



Re: range types in SOLR

2014-03-03 Thread Smiley, David W.
The main reference for this approach is here:
http://wiki.apache.org/solr/SpatialForTimeDurations


Hoss’s illustrations he developed for the meetup presentation are great.
However, there are bugs in the instruction — specifically it’s important
to slightly buffer the query and choose an appropriate maxDistErr.  Also,
it’s more preferable to use the rectangle range query style of spatial
query (e.g. field:[“minX minY” TO “maxX maxY”] as opposed to using
“Intersects(minX minY maxX maxY)”.  There’s no technical difference but
the latter is deprecated and will eventually be removed from Solr 5 /
trunk.

All this said, recognize this is a bit of a hack (one that works well).
There is a good chance a more ideal implementation approach is going to be
developed this year.
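
(To make the rectangle-style query concrete for the year-range use case in this
thread, here is a rough sketch based on the SpatialForTimeDurations wiki page.
The field name, worldBounds and maxDistErr values are illustrative assumptions
and need tuning/buffering as noted above -- treat it as a starting point only.)

<!-- x = start year, y = end year; geo="false" turns the field into a flat grid -->
<fieldType name="yearRange" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false" worldBounds="0 0 3000 3000"
           distErrPct="0" maxDistErr="0.5" units="degrees"/>
<field name="written" type="yearRange" indexed="true" stored="false" multiValued="true"/>

Index each period as a point "start end":
  written: "1250 1299"
  written: "1700 1715"

Find documents overlapping 1300-1699 (start <= 1699 AND end >= 1300) with the
rectangle range syntax:
  q=written:["0 1300" TO "1699 3000"]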

~ David


On 3/1/14, 2:54 PM, Shawn Heisey s...@elyograg.org wrote:

On 3/1/2014 11:41 AM, Thomas Scheffler wrote:
 Am 01.03.14 18:24, schrieb Erick Erickson:
 I'm not clear what you're really after here.

 Solr certainly supports ranges, things like time:[* TO date_spec] or
 date_field:[date_spec TO date_spec] etc.


 There's also a really creative use of spatial (of all things) to, say
 answer questions involving multiple dates per record. Imagine, for
 instance, employees with different hours on different days. You can
 use spatial to answer questions like which employees are available
 on Wednesday between 4PM and 8PM.

 And if none of this is relevant, how about you give us some
 use-cases? This could well be an XY problem.
 
 Hi,
 
 lets try this example to show the problem. You have some old text that
 was written in two periods of time:
 
 1.) 2nd half of 13th century: - 1250-1299
 2.) Beginning of 18th century: - 1700-1715
 
 You are searching for texts that were written between 1300 and 1699; then
 the document described above should not be hit.
 
 If you make start date and end date multiple this results in:
 
 start: [1250, 1700]
 end: [1299, 1715]
 
 A search for documents written between 1300-1699 would be:
 
 (+start:[1300 TO 1699] +end:[1300 TO 1699]) (+start:[* TO 1300] +end:[1300
 TO *]) (+start:[* TO 1699] +end:[1700 TO *])
 
 You see that the document above would obviously be hit by (+start:[* TO
 1300] +end:[1300 TO *])

This sounds exactly like the spatial use case that Erick just described.

http://wiki.apache.org/solr/SpatialForTimeDurations
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/

I am not sure whether the following presentation covers time series with
spatial, but it does say deep dive.  It's over an hour long, and done by
David Smiley, who wrote most of the Spatial code in Solr:

http://www.lucenerevolution.org/2013/Lucene-Solr4-Spatial-Deep-Dive

Hopefully someone who has actually used this can hop in and give you
some additional pointers.

Thanks,
Shawn




Re: Solution for reverse order of year facets?

2014-03-03 Thread Ahmet Arslan
Hi Michael,

Yes, you are correct, oldest comes first. 

There is no built in solution for this.

Two workarounds:

1) use facet.limit=-1 and invert the list (faceting response) at client side

2) use multiples facet.query
   a)facet.query=year:[2012 TO 2014]facet.query=year:[2010 TO 2012] 
   b)facet.query=year:2014facet.query=year:2013 ...



On Monday, March 3, 2014 5:45 PM, Michael Lackhoff mich...@lackhoff.de wrote:
On 03.03.2014 16:33 Ahmet Arslan wrote:

 Currently there are two sorting criteria available. However sort by index - 
 to return the constraints sorted in their index order (lexicographic by 
 indexed term) - should return the most recent year at the top, no?

No, it returns them -- as you say -- in lexicographic order and that
means oldest first, like:
1815
1820
...
2012
2013
(might well stop before we get here)

2014

-Michael



Re: Solr Permgen Exceptions when creating/removing cores

2014-03-03 Thread Josh
Thanks Tri,

I really appreciate the response. When I get some free time shortly I'll
start giving some of these a try and report back.


On Mon, Mar 3, 2014 at 12:42 PM, Tri Cao tm...@me.com wrote:

 If it's really the interned strings, you could try upgrade JDK, as the
 newer HotSpot
 JVM puts interned strings in regular heap:

 http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html

 http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html(search
 for String.intern() in that release)

 I haven't got a chance to look into the new core auto discovery code, so I
 don't know
 if it's implemented with reflection or not. Reflection and dynamic class
 loading is another
 source of PermGen exception, in my experience.

 I don't see anything wrong with your JVM config, which is very much
 standard.

 Hope this helps,
 Tri


 On Mar 03, 2014, at 08:52 AM, Josh jwda...@gmail.com wrote:

 In the user core there are two fields, the database core in question was
 40, but in production environments the database core is dynamic. My time
 has been pretty crazy trying to get this out the door and we haven't tried
 a standard solr install yet but it's on my plate for the test app and I
 don't know enough about Solr/Bitnami to know if they've done any serious
 modifications to it.

 I had tried doing a dump from VisualVM previously but it didn't seem to
 give me anything useful but then again I didn't know how to look for
 interned strings. This is something I can take another look at in the
 coming weeks when I do my test case against a standard solr install with
 SolrJ. The exception with user cores happens after 80'ish runs, so 640'ish
 user cores with the PermGen set to 64MB. The database core test was far
 lower, it was in the 10-15 range. As a note once the permgen limit is hit,
 if we simply restart the service with the same number of cores loaded the
 permgen usage is minimal even with the amount of user cores being high in
 our production environment (500-600).

 If this does end up being the interning of strings, is there anyway it can
 be mitigated? Our production environment for our heavier users would see in
 the range of 3200+ user cores created a day.

 Thanks for the help.
 Josh


 On Mon, Mar 3, 2014 at 11:24 AM, Tri Cao tm...@me.com wrote:

 Hey Josh,

 I am not an expert in Java performance, but I would start with dumping a

 the heap

 and investigate with visualvm (the free tool that comes with JDK).

 In my experience, the most common cause for PermGen exception is the app

 creates

 too many interned strings. Solr (actually Lucene) interns the field names

 so if you have

 too many fields, it might be the cause. How many fields in total across

 cores did you

 create before the exception?

 Can you reproduce the problem with the standard Solr? Is the bitnami

 distribution just

 Solr or do they have some other libraries?

 Hope this helps,

 Tri

 On Mar 03, 2014, at 07:28 AM, Josh jwda...@gmail.com wrote:

 It's a windows installation using a bitnami solr installer. I incorrectly

 put 64M into the configuration for this, as I had copied the test

 configuration I was using to recreate the permgen issue we were seeing on

 our production system (that is configured to 512M) as it takes awhile with

 to recreate the issue with larger permgen values. In the test scenario

 there was a small 180 document data core that's static with 8 dynamic user

 cores that are used to index the unique document ids in the users view,

 which is then merged into a single user core. The final user core contains

 the same number of document ids as the data core and the data core is

 queried against with the ids in the final merged user core as the limiter.

 The user cores are then unloaded, and deleted from the drive and then the

 test is reran again with the user cores re-created

 We are also using the core discovery mode to store/find our cores and the

 database data core is using dynamic fields with a mix of single value and

 multi value fields. The user cores use a static configuration. The data is

 indexed from SQL Server using jtDS for both the user and data cores. As a

 note we also reversed the test case I mention above where we keep the user

 cores static and dynamically create the database core and this created the

 same issue only it leaked faster. We assumed this because the configuration

 was larger/loaded more classes then the simpler user core.

 When I get the time I'm going to put together a SolrJ test app to recreate

 the issue outside of our environment to see if others see the same issue

 we're seeing to rule out any kind of configuration problem. Right now we're

 interacting with solr with POCO via the restful interface and it's not very

 easy for us to spin this off into something someone else could use. In the

 mean time we've made changes to make the user cores more static, this has

 slowed down the build up of permgen to something that can 

Re: Solution for reverse order of year facets?

2014-03-03 Thread Shawn Heisey

On 3/3/2014 7:35 AM, Michael Lackhoff wrote:

If I understand the docs right, it is only possible to sort facets by
count or value in ascending order. Both variants are not very helpful
for year facets if I want the most recent years at the top (or appear at
all if I restrict the number of facet entries).


There's already an issue in Jira.

https://issues.apache.org/jira/browse/SOLR-1672

I can't take a look now, but I will later if someone else hasn't taken 
it up.


Thanks,
Shawn



Re: Solution for reverse order of year facets?

2014-03-03 Thread Michael Lackhoff
Hi Ahmet,

 There is no built in solution for this.

Yes, I know, that's why I would like the TokenFilterFactory

 Two workaround :
 
 1) use facet.limit=-1 and invert the list (faceting response) at client side
 
 2) use multiples facet.query
a)facet.query=year:[2012 TO 2014]facet.query=year:[2010 TO 2012] 
b)facet.query=year:2014facet.query=year:2013 ...

I thought about these but they have the disadvantage that 1) could
return hundreds of facet entries. 2b) is better but would need about 30
facet-queries which makes quite a long URL and it wouldn't always work
as expected. There are subjects that were very popular in the past but
with no (or very few) recent publications. For these I would get empty
results for my 2014-1985 facet-queries but miss all the stuff from the
1960s.

From all these thoughts I came to the conclusion that a custom
TokenFilterFactory could do exactly what I want. In effect it would give
me a reverse sort:
10000 - 2014 = 7986
10000 - 2013 = 7987
...
The client code can easily regain the original year values for display.

And I think it shouldn't be too difficult to write such a beast, only
problem is I am not a Java programmer. That is why I asked if someone
has done it already or if there is a guide I could use.
After all it is just a simple subtraction...
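
(A rough sketch of what such a filter factory might look like against the
Lucene/Solr 4.x analysis API, using the 10000-minus-year mapping above. The class
and package names are made up -- it is meant as a starting point for whoever picks
this up, not as a finished, vendor-blessed implementation.)

package com.example.analysis;

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

/** Rewrites a plain numeric year token into (10000 - year) so that
 *  lexicographic (index-order) facet sorting returns the newest year first. */
public class ReverseYearFilterFactory extends TokenFilterFactory {

  public ReverseYearFilterFactory(Map<String, String> args) {
    super(args);
  }

  @Override
  public TokenStream create(TokenStream input) {
    return new ReverseYearFilter(input);
  }

  private static final class ReverseYearFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    ReverseYearFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (!input.incrementToken()) {
        return false;
      }
      try {
        int year = Integer.parseInt(termAtt.toString());
        // reverse the sort order: newer years become smaller terms
        termAtt.setEmpty().append(String.valueOf(10000 - year));
      } catch (NumberFormatException e) {
        // not a plain number, leave the token unchanged
      }
      return true;
    }
  }
}

It would then be referenced from the facet field's type with something like
  <filter class="com.example.analysis.ReverseYearFilterFactory"/>
and the client maps the displayed values back via year = 10000 - facetValue.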

-Michael



Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread KNitin
Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud
(containing custom parsing, analyzers). But I haven't specifically enabled
any string interning. Does solr intern all strings in a collection by
default?

I agree with doc and Filter Query Cache. Query Result cache hits are
practically 0 for the large collection since our queries are tail by nature


Thanks
Nitin


On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov 
msoko...@safaribooksonline.com wrote:

 On 3/3/2014 1:54 AM, KNitin wrote:

 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)

 As others have pointed out, this is really unusual for Solr.  We often see
 high permgen in our app servers due to dynamic class loading that the
 framework performs; maybe you are somehow loading lots of new Solr plugins,
 or otherwise creating lots of classes?  Of course if you have a plugin or
 something that does a lot of string interning, that could also be an
 explanation.

 -Mike



Re: Solution for reverse order of year facets?

2014-03-03 Thread Michael Lackhoff
On 03.03.2014 19:58 Shawn Heisey wrote:

 There's already an issue in Jira.
 
 https://issues.apache.org/jira/browse/SOLR-1672

Thanks, this is of course the best solution. Only problem is that I use
a custom verson from a vendor (based on version 4.3) I want to enhance.
But perhaps they apply the patch. In the meantime I still think the
custom filter could be a workaround.

 I can't take a look now, but I will later if someone else hasn't taken 
 it up.

That would be great!

Thanks
-Michael



Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread KNitin
Is there a way to dump the contents of permgen and look at which classes
are occupying the most memory in that?

- Nitin


On Mon, Mar 3, 2014 at 11:19 AM, KNitin nitin.t...@gmail.com wrote:

 Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud
 (containing custom parsing, analyzers). But I haven't specifically enabled
 any string interning. Does solr intern all strings in a collection by
 default?

 I agree with doc and Filter Query Cache. Query Result cache hits are
 practically 0 for the large collection since our queries are tail by nature


 Thanks
 Nitin


 On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov 
 msoko...@safaribooksonline.com wrote:

 On 3/3/2014 1:54 AM, KNitin wrote:

 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)

 As others have pointed out, this is really unusual for Solr.  We often
 see high permgen in our app servers due to dynamic class loading that the
 framework performs; maybe you are somehow loading lots of new Solr plugins,
 or otherwise creating lots of classes?  Of course if you have a plugin or
 something that does a lot of string interning, that could also be an
 explanation.

 -Mike





SOLR and Kerberos enabled HDFS

2014-03-03 Thread Jimmy
Hello,

I am trying to connect SOLR (tried 4.4 and 4.7) to kerberos enabled HDFS -
I am using Cloudera CDH 4.2.1
http://maven-repository.com/artifact/com.cloudera.cdh/cdh-root/4.2.1/pom_effective

the keytab and principal are valid (I tested them with flume as well as the simple
hdfs cli)


did anybody successfully connect SOLR 4.x to CDH 4.2.1?



str
name=solr.hdfs.security.kerberos.enabled${solr.hdfs.security.kerberos.enabled:true}/str
str
name=solr.hdfs.security.kerberos.keytabfile${solr.hdfs.security.kerberos.keytabfile:/my.keytab}/str
str name=solr.hdfs.security.kerberos.principal${
solr.hdfs.security.kerberos.principal:m...@mydomain.com}/str


I am getting follow error


HTTP Status 500 - {msg=SolrCore 'collection1' is not available due to init
failure: java.io.IOException: Login failure for m...@mydomain.com from keytab
/my.keytab,
trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not
available due to init failure:
java.io.IOException: Login failure for m...@mydomain.com from keytab
/my.keytab
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:251)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Caused by: java.lang.RuntimeException: java.io.IOException: Login failure
for me@MYDOMAIN.COM from keytab /my.keytab
at
org.apache.solr.core.HdfsDirectoryFactory.initKerberos(HdfsDirectoryFactory.java:282)
at
org.apache.solr.core.HdfsDirectoryFactory.init(HdfsDirectoryFactory.java:90)
at org.apache.solr.core.SolrCore.initDirectoryFactory(SolrCore.java:443)
at org.apache.solr.core.SolrCore.init(SolrCore.java:672)
at org.apache.solr.core.SolrCore.init(SolrCore.java:629)
at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138) ...

... 3 more Caused by: java.io.IOException: Login failure for
m...@mydomain.com from
keytab /my.keytab
at
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:825)
at
org.apache.solr.core.HdfsDirectoryFactory.initKerberos(HdfsDirectoryFactory.java:280)

... 16 more Caused by: javax.security.auth.login.LoginException:
java.lang.IllegalArgumentException: Illegal principal name m...@mydomain.com
at org.apache.hadoop.security.User.init(User.java:50)
at org.apache.hadoop.security.User.init(User.java:43)
at
org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:159)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597) at
javax.security.auth.login.LoginContext.invoke(LoginContext.java:769)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
at java.security.AccessController.doPrivileged(Native Method)
at
javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
at 

Wildcard searches and tokenization

2014-03-03 Thread Hayden Muhl
I'm working on a user name autocomplete feature, and am having some issues
with the way we are tokenizing user names.

We're using the StandardTokenizerFactory to tokenize user names, so
foo-bar gets split into two tokens. We take input from the user and use
it as a prefix to search on the user name. This means wildcard searches of
fo* and ba* both return foo-bar, which is what we want.

We have a problem when someone types in foo-b as a prefix. I would like
to split this into foo and b, then use each as a prefix in a wildcard
search. Is there an easy way to tell Solr, Tokenize this, then do a prefix
search?

I've written at least one QParserPlugin, so that's an option. Hopefully
there's an easier way I'm unaware of.

- Hayden


What types is supported by Solrj addBean() in the fields of POJO objects?

2014-03-03 Thread T. Kuro Kurosaka
What are supported types of the POJO objects that are sent to 
SolrServer.addBean(obj)?

A quick glance at DocumentObjectBinder seems to suggest that
an arbitrary combination of Collection, List, ArrayList, array ([]), Map, HashMap,
of primitive types, String and Date is supported, but I'm not too sure. I would
also like to know what Solr field types are allowed for each object's (Java) field
types.
Is there documentation explaining this?
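
(For illustration, a small sketch of a bean that DocumentObjectBinder handles via
the @Field annotation; the field names and matching schema fields below are made-up
assumptions, and the annotation can also go on setter methods instead of fields.)

import java.util.Date;
import java.util.List;
import org.apache.solr.client.solrj.beans.Field;

public class Book {

  @Field                // maps to the schema field "id"
  public String id;

  @Field("title_t")     // explicit schema field name
  public String title;

  @Field                // primitives are supported
  public int pages;

  @Field                // java.util.Date for date fields
  public Date published;

  @Field("tags_ss")     // multiValued schema field -> List / array / Collection
  public List<String> tags;
}

// usage: solrServer.addBean(book); solrServer.commit();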

Kuro


Re: Solution for reverse order of year facets?

2014-03-03 Thread Ahmet Arslan
Hi,

Regarding "just a simple subtraction": you could do it in indexer code or in an update 
processor too. You can either modify the original field or you can create an 
additional one. JavaScript could be used : 
http://wiki.apache.org/solr/ScriptUpdateProcessor

Ahmet


On Monday, March 3, 2014 9:11 PM, Michael Lackhoff mich...@lackhoff.de wrote:
Hi Ahmet,

 There is no built in solution for this.

Yes, I know, that's why I would like the TokenFilterFactory

 Two workaround :
 
 1) use facet.limit=-1 and invert the list (faceting response) at client side
 
 2) use multiples facet.query
    a)facet.query=year:[2012 TO 2014]facet.query=year:[2010 TO 2012] 
    b)facet.query=year:2014facet.query=year:2013 ...

I thought about these but they have the disadvantage that 1) could
return hundreds of facet entries. 2b) is better but would need about 30
facet-queries which makes quite a long URL and it wouldn't always work
as expected. There are subjects that were very popular in the past but
with no (or very few) recent publications. For these I would get empty
results for my 2014-1985 facet-queries but miss all the stuff from the
1960s.

From all these thoughts I came to the conclusion that a custom
TokenFilterFactory could do exactly what I want. In effect it would give
me a reverse sort:
10000 - 2014 = 7986
10000 - 2013 = 7987
...
The client code can easily regain the original year values for display.

And I think it shouldn't be too difficult to write such a beast, only
problem is I am not a Java programmer. That is why I asked if someone
has done it already or if there is a guide I could use.
After all it is just a simple subtraction...


-Michael


Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread Tri Cao
If you just want to see which classes are occupying the most memory in a live JVM, you can do:

jmap -permstat pid

I don't think you can dump the contents of PERM space.

Hope this helps,
Tri

On Mar 03, 2014, at 11:41 AM, KNitin nitin.t...@gmail.com wrote:

Is there a way to dump the contents of permgen and look at which classes are occupying the most memory in that?

- Nitin

On Mon, Mar 3, 2014 at 11:19 AM, KNitin nitin.t...@gmail.com wrote:

Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud (containing custom parsing, analyzers). But I haven't specifically enabled any string interning. Does solr intern all strings in a collection by default?

I agree with doc and Filter Query Cache. Query Result cache hits are practically 0 for the large collection since our queries are tail by nature

Thanks
Nitin

On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov msoko...@safaribooksonline.com wrote:

On 3/3/2014 1:54 AM, KNitin wrote:

3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)

As others have pointed out, this is really unusual for Solr. We often see high permgen in our app servers due to dynamic class loading that the framework performs; maybe you are somehow loading lots of new Solr plugins, or otherwise creating lots of classes? Of course if you have a plugin or something that does a lot of string interning, that could also be an explanation.

-Mike

Re: Solution for reverse order of year facets?

2014-03-03 Thread Ahmet Arslan
Hi Michael,


I forgot to include what I did for one customer :

1) Using StatsComponent I get min and max values of the field (year)
2) Calculate smart gap/range values according to minimum and maximum.
3) Re-issue the same query (for the second time) that includes a set of 
facet.query.

Ahmet



On Monday, March 3, 2014 10:30 PM, Ahmet Arslan iori...@yahoo.com wrote:
Hi,

Regarding just a simple subtraction you do it in indexer code or in a update 
prcessor too. You can either modify original field or you can create an 
additional one. Java-script could be used : 
http://wiki.apache.org/solr/ScriptUpdateProcessor

Ahmet



On Monday, March 3, 2014 9:11 PM, Michael Lackhoff mich...@lackhoff.de wrote:
Hi Ahmet,

 There is no built in solution for this.

Yes, I know, that's why I would like the TokenFilterFactory

 Two workaround :
 
 1) use facet.limit=-1 and invert the list (faceting response) at client side
 
 2) use multiples facet.query
    a)facet.query=year:[2012 TO 2014]facet.query=year:[2010 TO 2012] 
    b)facet.query=year:2014facet.query=year:2013 ...

I thought about these but they have the disadvantage that 1) could
return hundreds of facet entries. 2b) is better but would need about 30
facet-queries which makes quite a long URL and it wouldn't always work
as expected. There are subjects that were very popular in the past but
with no (or very few) recent publications. For these I would get empty
results for my 2014-1985 facet-queries but miss all the stuff from the
1960s.

From all these thoughts I came to the conclusion that a custom
TokenFilterFactory could do exactly what I want. In effect it would give
me a reverse sort:
10000 - 2014 = 7986
10000 - 2013 = 7987
...
The client code can easily regain the original year values for display.

And I think it shouldn't be too difficult to write such a beast, only
problem is I am not a Java programmer. That is why I asked if someone
has done it already or if there is a guide I could use.
After all it is just a simple subtraction...


-Michael


Re: network slows when solr is running - help

2014-03-03 Thread Lan
How frequently are you committing? Frequent commits can slow everything down.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/network-slows-when-solr-is-running-help-tp4120523p4120992.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boost query syntax error

2014-03-03 Thread Chris Hostetter

: But this query does not work:
: 
: q={!boost
: b=if(exists(query({!v='user_type:ADMIN'})),10,1)}id:1rows=1fl=*,score
: It gives an error like this:

The problem is the way you are trying to nest queries inside of each other 
w/o any sort of quoting -- the parser has no indication that the b param 
is if(exists(query({!v='user_type:ADMIN'})),10,1); it thinks it's 
if(exists(query({!v='user_type:ADMIN' and the rest is confusing it.

If you quote the b param to the boost parser, then it should work...

http://localhost:8983/solr/select?q={!boost%20b=%22if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29%22}id:1

...or you could use variable dereferencing, either of these should 
work...

http://localhost:8983/solr/select?q={!boost%20b=$b}id:1b=if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29
http://localhost:8983/solr/select?q={!boost%20b=if(exists(query($nestedq)),10,1)}id:1nestedq=foo_s:ADMIN


-Hoss
http://www.lucidworks.com/


Re[2]: query parameters

2014-03-03 Thread Andreas Owen
ok i like the logic, you can do much more. i think this should do it for me:

         (-organisations:[ TO *] -roles:[ TO *]) (+organisations:(150 42) 
+roles:(174 72))


I want to use this in fq and I need to set the operator to OR. My q.op is AND 
but I need OR in fq. I have read about ofq but that is for putting OR between 
multiple fq. Can I set the operator for fq?

The statement should find all docs without organisations and roles, or those 
that have at least one roles and organisations entry. These fields are 
multivalued.
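
(Two possible ways to get OR semantics inside that one fq, sketched here as
assumptions to verify rather than a confirmed answer: either spell the operators
out so q.op does not matter, or override q.op with a local param. The [* TO *]
syntax below just means "field has any value", and the id lists are illustrative.)

fq=(*:* -organisations:[* TO *] -roles:[* TO *]) OR (organisations:(150 OR 42) AND roles:(174 OR 72))

fq={!lucene q.op=OR}(*:* -organisations:[* TO *] -roles:[* TO *]) (+organisations:(150 42) +roles:(174 72))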

-Original Message- 
 From: Erick Erickson erickerick...@gmail.com 
 To: solr-user@lucene.apache.org 
 Date: 19/02/2014 04:09 
 Subject: Re: query parameters 
 
 Solr/Lucene query language is NOT strictly boolean, see
 Chris's excellent blog here:
 http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/
 
 Best,
 Erick
 
 
 On Tue, Feb 18, 2014 at 11:54 AM, Andreas Owen a...@conx.ch wrote:
 
  I tried it in solr admin query and it showed me all the docs without a
  value
  in ogranisations and roles. It didn't matter if i used a base term, isn't
  that give through the q-parameter?
 
  -Original Message-
  From: Raymond Wiker [mailto:rwi...@gmail.com]
  Sent: Dienstag, 18. Februar 2014 13:19
  To: solr-user@lucene.apache.org
  Subject: Re: query parameters
 
  That could be because the second condition does not do what you think it
  does... have you tried running the second condition separately?
 
  You may have to add a base term to the second condition, like what you
  have for the bq parameter in your config file; i.e, something like
 
  (*:* -organisations:[ TO *] -roles:[ TO *])
 
 
 
 
  On Tue, Feb 18, 2014 at 12:16 PM, Andreas Owen a...@conx.ch wrote:
 
   It seams that fq doesn't except OR because: (organisations:(150 OR 41)
   AND
   roles:(174)) OR  (-organisations:[ TO *] AND -roles:[ TO *]) only
   returns docs that match the first conditions. it doesn't return any
   docs with the empty fields organisations and roles.
  
   -Original Message-
   From: Andreas Owen [mailto:a...@conx.ch]
   Sent: Montag, 17. Februar 2014 05:08
   To: solr-user@lucene.apache.org
   Subject: query parameters
  
  
   in solrconfig of my solr 4.3 i have a userdefined requestHandler. i
   would like to use fq to force the following conditions:
      1: organisations is empty and roles is empty
      2: organisations contains one of the commadelimited list in
   variable $org
      3: roles contains one of the commadelimited list in variable $r
      4: rule 2 and 3
  
   snipet of what i got (havent checked out if the is a in operator
   like in sql for the list value)
  
   lst name=defaults
          str name=echoParamsexplicit/str
          int name=rows10/int
          str name=defTypeedismax/str
              str name=synonymstrue/str
              str name=qfplain_text^10 editorschoice^200
                   title^20 h_*^14
                   tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
                   contentmanager^5 links^5
                   last_modified^5 url^5
              /str
              str name=fq(organisations='' roles='') or
   (organisations=$org roles=$r) or (organisations='' roles=$r) or
   (organisations=$org roles='')/str
              str name=bq(expiration:[NOW TO *] OR (*:*
   -expiration:*))^6/str  !-- tested: now or newer or empty gets small
   boost --
              str name=bfdiv(clicks,max(displays,1))^8/str !--
   tested
   --
  
  
  
  
  
  
 
 





Re: Configuration problem

2014-03-03 Thread Shawn Heisey

On 3/3/2014 9:02 AM, Thomas Fischer wrote:

The setting is
solr directories (I use different solr versions at the same time):
/srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the new 
discovery type (no cores), and inside the core directories are empty files 
core.properties and symbolic links to the universal conf directory.
  
solr webapps (I use very different webapps simultaneously):

/srv/www/webapps/solr/solr4.6.1 is the solr webapp

I tried to convey this information to the tomcat server by putting a file 
solr4.6.1.xml into the cataiina/localhost folder with the contents
?xml version=1.0 encoding=utf-8?
Context docBase=/srv/www/webapps/solr/solr4.6.1 debug=0 
crossContext=true
Environment name=solr/home type=java.lang.String value=/srv/solr/solr4.6.1 
override=true/
/Context


Your message is buried deep in another message thread about NoSQL, 
because you replied to an existing message rather than starting a new 
message to solr-user@lucene.apache.org.  On list-mirroring forums like 
Nabble, nobody will even see your message (or this reply) unless they 
actually open that other thread.  This is what it looks like on a 
threading mail reader (Thunderbird):


https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png

I don't use Tomcat, so I can't even begin to comment on that.  I can 
talk about your solr home setting and what Solr is going to do with that.


You probably do not have /srv/solr/solr4.6.1/solr.xml on your system.  
Solr will look for solr.xml in your solr home, and if it cannot find it, 
it assumes that you are not running multicore, so it looks for things 
like collection1/conf/solrconfig.xml instead.


There is a solr.xml in the example.  Use that, changing as necessary, or 
create a solr.xml file with just the following line in it.  It will 
probably start working:


solr/

You *might* need the following instead, but since Solr uses standard XML 
parsing libraries, I would guess that the above line will work.


solr
/solr

Thanks,
Shawn



is it possible to consolidate filterquery cache strings

2014-03-03 Thread solr-user
lets say I have a largish set of data (120M docs) and that I am partitioning
my data by groups of states (using the state codes)

Someone suggested that I could use the following format in my solrconfig.xml
when defining the filter queries:

listener event=newSearcher class=solr.QuerySenderListener
  arr name=queries
    lst
      str name=q*:*/str
      str name=fqState:AL/str
      str name=fqState:AK/str
...
      str name=fqState:WY/str
    /lst
  /arr
/listener

Would that work, and if so how would I know that the cache is being hit?

Or do I need to use the following traditional syntax instead:

listener event=newSearcher class=solr.QuerySenderListener
  arr name=queries
    lst
      str name=q*:*/str
      str name=fqState:AL/str
    /lst
    lst
      str name=q*:*/str
      str name=fqState:AK/str
    /lst
...
    lst
      str name=q*:*/str
      str name=fqState:WY/str
    /lst
  /arr
/listener

any help appreciated



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-it-possible-to-consolidate-filterquery-cache-strings-tp4121005.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Startup

2014-03-03 Thread KNitin
A quick ping on this. To give more stats, I have 100's of collections on
every node. The time it takes for one collection to boot up /loadonStartup
is around 10-20 seconds (and sometimes even 1 minute). I do not have any
query auto warming etc. On a per collection basis I load a bunch of
libraries (for custom analyzer plugins) to compute the classpath. That
might be a reason for the high boot up time

  My solrconfig.xml entry is as follows

  lib dir=/mnt/solr/lib/ regex=.*\.jar /

 Every core that boots up seems to be loading all jars over and over again.
Is there a way to ask solr to load all jars only once?

Thanks
- Nitin


On Wed, Feb 26, 2014 at 3:06 PM, KNitin nitin.t...@gmail.com wrote:

 Thanks, Shawn. I will try to upgrade solr soon

 Reg firstSearcher: I think it does nothing now. I have configured to use
 ExternalFileLoader but there the external file has no contents. Most of the
 queries hitting the collection are expensive and tail queries. What will be
 your recommendation to warm the first Searcher/new Searcher?

 Thanks
 Nitin


 On Tue, Feb 25, 2014 at 4:12 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/25/2014 4:30 PM, KNitin wrote:

 Jeff :  Thanks. I have tried reload before but it is not reliable
 (atleast
 in 4.3.1). A few cores get initialized and few dont (show as just
 recovering or down) and hence had to move away from it. Is it a known
 issue
 in 4.3.1?


 With Solr 4.3.1, you are running into this bug with reloads under
 SolrCloud:

 https://issues.apache.org/jira/browse/SOLR-4805

  The only way to recover from this bug is to restart Solr. The bug is fixed
  in 4.4.0 and later.


  Shawn,Otis,Erick

   Yes I have reviewed the page before and have given 1/4 of my mem to JVM
 and the rest to RAM/Os Cache. (15 Gb heap and 45 G to rest. Totally 60G
 machine). I have also reviewed the tlog file and they are in the order of
 KB (4-10 or 30). I have SSD and the reads are hardly noticable (in the
 order of 100Kb during that time frame). I have also disabled swap on all
 machines

 Regarding firstSearcher, It is currently set to externalFileLoader. What
 is
 the use of first searcher? I havent played around with it


 I don't think it's a good idea to have extensive warming queries.  I do
 exactly one query in firstSearcher and newSearcher: a query for all
 documents with zero rows, sorted on our most common sort field.  This is
 designed purely to preload the sort data into the FieldCache.

 Thanks,
 Shawn





Re: SolrCloud Startup

2014-03-03 Thread Shawn Heisey

On 3/3/2014 3:30 PM, KNitin wrote:

A quick ping on this. To give more stats, I have 100's of collections on
every node. The time it takes for one collection to boot up /loadonStartup
is around 10-20 seconds (and sometimes even 1 minute). I do not have any
query auto warming etc. On a per collection basis I load a bunch of
libraries (for custom analyzer plugins) to compute the classpath. That
might be a reason for the high boot up time

   My solrconfig.xml entry is as follows

   lib dir=/mnt/solr/lib/ regex=.*\.jar /

  Every core that boots up seems to be loading all jars over and over again.
Is there a way to ask solr to load all jars only once?


Three steps:

1) Get rid of all your lib directives in solrconfig.xml entirely.
2) Copy all the extra jars that you need into ${solr.solr.home}/lib.
3) Remove any sharedLib parameter from your solr.xml file.

Step 3 is required because you are on 4.3.1 (or later if you have 
already upgraded).


The final comment on the following issue summarizes issues that I ran 
into while migrating this approach from 4.2.1 to later releases:


https://issues.apache.org/jira/browse/SOLR-4852

Thanks,
Shawn



Re: Configuration problem

2014-03-03 Thread Thomas Fischer
On 03.03.2014 at 22:43, Shawn Heisey wrote:

 On 3/3/2014 9:02 AM, Thomas Fischer wrote:
 The setting is
 solr directories (I use different solr versions at the same time):
 /srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the 
 new discovery type (no cores), and inside the core directories are empty 
 files core.properties and symbolic links to the universal conf directory.
  solr webapps (I use very different webapps simultaneously):
 /srv/www/webapps/solr/solr4.6.1 is the solr webapp
 
 I tried to convey this information to the tomcat server by putting a file 
 solr4.6.1.xml into the cataiina/localhost folder with the contents
 ?xml version=1.0 encoding=utf-8?
 Context docBase=/srv/www/webapps/solr/solr4.6.1 debug=0 
 crossContext=true
  Environment name=solr/home type=java.lang.String 
 value=/srv/solr/solr4.6.1 override=true/
 /Context
 
 Your message is buried deep in another message thread about NoSQL, because 
 you replied to an existing message rather than starting a new message to 
 solr-user@lucene.apache.org.  On list-mirroring forums like Nabble, nobody 
 will even see your message (or this reply) unless they actually open that 
 other thread.  This is what it looks like on a threading mail reader 
 (Thunderbird):
 
 https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png

Yes, I'm sorry, I only afterwards realized that my question inherited the 
thread from the E-Mail I was reading and using as a template for the answer.

Meanwhile I figured out that I overlooked the third place to define solr home 
for Tomcat (after JAVA_OPTS and JNDI): web.xml in WEB-INF of the given webapp.
This overrides the other definitions and created the impression that I couldn't 
set  solr home.

But now I get the message
Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml
for the core geo.
In the solr wiki I read (http://wiki.apache.org/solr/ConfiguringSolr):
In each core, Solr will look for a conf/solrconfig.xml file and expected solr 
to look for
/srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml (which exists), but obviously 
it doesn't.
Why? My misunderstanding?

Best
Thomas





Re: is it possible to consolidate filterquery cache strings

2014-03-03 Thread solr-user
note: by partitioning I mean that I have sharded the 120M docs into 9 Solr
partitions (each on a separate server)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-it-possible-to-consolidate-filterquery-cache-strings-tp4121005p4121012.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Startup

2014-03-03 Thread KNitin
Thanks, Shawn.  Right now my solr.solr.home is not being passed from the
java runtime

Let's say /mnt/solr/ is my solr root. I can add all jars to /mnt/solr/lib/
and use -Dsolr.solr.home=/mnt/solr/ , that should do it right?
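
(For reference, a sketch of the layout that would match Shawn's three steps; the
directory names below are illustrative, the only requirement is that lib/ sits
directly under whatever solr.solr.home points at:

/mnt/solr/                    <- solr home (-Dsolr.solr.home=/mnt/solr)
    solr.xml                  <- without any sharedLib attribute
    lib/                      <- all custom analyzer/plugin jars, loaded once at startup
        my-analyzers.jar
    some_collection_core1/
        core.properties
        conf/solrconfig.xml   <- no lib dir=... directives needed any more

started e.g. with: java -Dsolr.solr.home=/mnt/solr -jar start.jar )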

Thanks
Nitin


On Mon, Mar 3, 2014 at 2:44 PM, Shawn Heisey s...@elyograg.org wrote:

 On 3/3/2014 3:30 PM, KNitin wrote:

 A quick ping on this. To give more stats, I have 100's of collections on
 every node. The time it takes for one collection to boot up /loadonStartup
 is around 10-20 seconds (and sometimes even 1 minute). I do not have any
 query auto warming etc. On a per collection basis I load a bunch of
 libraries (for custom analyzer plugins) to compute the classpath. That
 might be a reason for the high boot up time

My solrconfig.xml entry is as follows

lib dir=/mnt/solr/lib/ regex=.*\.jar /

   Every core that boots up seems to be loading all jars over and over
 again.
 Is there a way to ask solr to load all jars only once?


 Three steps:

 1) Get rid of all your lib directives in solrconfig.xml entirely.
 2) Copy all the extra jars that you need into ${solr.solr.home}/lib.
 3) Remove any sharedLib parameter from your solr.xml file.

 Step 3 is required because you are on 4.3.1 (or later if you have already
 upgraded).

 The final comment on the following issue summarizes issues that I ran into
 while migrating this approach from 4.2.1 to later releases:

 https://issues.apache.org/jira/browse/SOLR-4852

 Thanks,
 Shawn




solrconfig.xml

2014-03-03 Thread Thomas Fischer
Hello,

I'm sorry to repeat myself but I didn't manage to get out of the thread I 
inadvertently slipped into.

My problem now is this:
I have a core geo (with an empty file core.properties inside) and 
solrconfig.xml at
/srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml
following the hint from the solr wiki  
(http://wiki.apache.org/solr/ConfiguringSolr):
In each core, Solr will look for a conf/solrconfig.xml file
But I get the error message:
Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml
Why? My misunderstanding?

Best
Thomas


Re: is it possible to consolidate filterquery cache strings

2014-03-03 Thread Chris Hostetter


: Would that work, and if so how would I know that the cache is being hit?

It should work -- filters are evaluated independently, so the fact that 
you are using all of them in one query (vs all of them in individual 
queries) won't change anything as far as the filterCache goes.

You can prove that it works by looking at the cache stats (available 
from the Admin UI) after opening a new searcher and verifying that they 
are all in the new caches.  You can also then do a query for something 
like q=foo&fq=State:AK and reload the cache stats and see a hit on 
your filterCache.
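
(The same stats are also reachable without the UI; the URL below is a sketch,
adjust host/port/core to your setup:

http://localhost:8983/solr/yourcore/admin/mbeans?cat=CACHE&stats=true&wt=json

and look at the hits / inserts counters under filterCache.)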

: Or do I need to use the following traditional syntax instead:

The only reason to break them all out like that is if, in addition to 
populating the *filterCache*, you also want to populate the 
*queryResultCache* with ~50 queries for *:*, each with a different fq 
applied.



-Hoss
http://www.lucidworks.com/


Re: Boost query syntax error

2014-03-03 Thread Arun Rangarajan
All of them work like a charm! Thanks, Chris.


On Mon, Mar 3, 2014 at 1:28 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 : But this query does not work:
 :
 : q={!boost
 : b=if(exists(query({!v='user_type:ADMIN'})),10,1)}id:1rows=1fl=*,score
 : It gives an error like this:

 The problem is the way you are trying to nest queries inside of each other
 w/o any sort of quoting -- the parser has no indication that the b param
 is if(exists(query({!v='user_type:ADMIN'})),10,1) it thinks it'
 if(exists(query({!v='user_type:ADMIN' and the rest is confusing it.

 If you quote the b param to the boost parser, then it should work...


 http://localhost:8983/solr/select?q={!boost%20b=%22if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29%22}id:1

 ...or if you could use variable derefrencing, either of these should
 work...


 http://localhost:8983/solr/select?q={!boost%20b=$b}id:1b=if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29

 http://localhost:8983/solr/select?q={!boost%20b=if(exists(query($nestedq)),10,1)}id:1nestedq=foo_s:ADMIN


 -Hoss
 http://www.lucidworks.com/



Re: solrconfig.xml

2014-03-03 Thread Alexandre Rafalovitch
File permissions? Malformed XML? Are there any other exceptions
earlier in the log? If you substitute that file with one from example
distribution, does it work?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Tue, Mar 4, 2014 at 6:07 AM, Thomas Fischer fischer...@aon.at wrote:
 Hello,

 I'm sorry to repeat myself but I didn't manage to get out of the thread I 
 inadvertently slipped into.

 My problem now is this:
 I have a core geo (with an empty file core.properties inside) and 
 solrconfig.xml at
 /srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml
 following the hint from the solr wiki  
 (http://wiki.apache.org/solr/ConfiguringSolr):
 In each core, Solr will look for a conf/solrconfig.xml file
 But I get the error message:
 Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml
 Why? My misunderstanding?

 Best
 Thomas


Re: is it possible to consolidate filterquery cache strings

2014-03-03 Thread solr-user
Wouldn't breaking the FQs out by state be faster for warming up the fq
caches?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-it-possible-to-consolidate-filterquery-cache-strings-tp4121005p4121030.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solrconfig.xml

2014-03-03 Thread Chris Hostetter

: I have a core geo (with an empty file core.properties inside) and 
solrconfig.xml at
: /srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml
...
: But I get the error message:
: Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml

1) what does your solr.xml file look like?
2) what does cores/geo/core.properties look like?
3) do you get any other errors before this one in your log?
4) what kind of file permissions are set on cores, cores/geo, 
cores/geo/conf, etc... ?


It's possible that this is just a mistake in the error message after some 
real error with your actual geo/conf/solrconfig.xml has already been 
logged.  Or it's possible that Solr couldn't read geo/conf/solrconfig.xml 
(permissions) and tried to fall back by looking for geo/solrconfig.xml (we 
used to do that, look in the instanceDir as a last resort -- not sure if 
the code is still in there) and you're just looking at the last error.
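
For reference, a working core-discovery layout on 4.6 usually looks roughly 
like the sketch below.  The solr home path and core name just mirror the 
paths from the question; the rest is an assumed minimal setup.  With an 
empty core.properties the core name defaults to the directory name, and the 
config is expected under conf/ inside that directory:

/srv/solr/solr4.6.1/           (assumed solr home, containing solr.xml)
    solr.xml
    cores/
        geo/
            core.properties    (may be empty)
            conf/
                solrconfig.xml
                schema.xml
            data/              (created by Solr)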


-Hoss
http://www.lucidworks.com/


Re: java.lang.Exception: Conflict with StreamingUpdateSolrServer

2014-03-03 Thread Chris Hostetter

: Subject: java.lang.Exception: Conflict with StreamingUpdateSolrServer

the fact that you are using StreamingUpdateSolrServer isn't really a 
factor here -- what matters is the data you are sending to solr in the 
updates...

: location=StreamingUpdateSolrServer line=162 Status for: null is 409
...
: Conflict

A 409 HTTP Status is a Conflict.  

It means that optimistic concurrency failed.  Your update indicated a 
document version, but the version of the document on the server didn't meet 
the version requirements...

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
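
A minimal SolrJ sketch of how such a conflict is usually triggered; the core 
URL, field names and the version value are made up for illustration only:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class OptimisticConcurrencyDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder URL

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "42");
    doc.addField("title", "updated title");
    // a positive _version_ means "only apply this update if the stored
    // version matches exactly"; if it doesn't, the server responds with
    // HTTP 409 Conflict -- the same status seen in the log above
    doc.addField("_version_", 1234567890L);

    server.add(doc);
    server.commit();
  }
}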



-Hoss
http://www.lucidworks.com/


Re: Searching with special chars

2014-03-03 Thread deniz
As there was no quick workaround for this issue, we simply changed the HTTP
method from GET to POST, to avoid further problems that could be triggered
by user input. Though this violates RESTful conventions... at least we
have something running properly.
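
For reference, with SolrJ the switch boils down to passing the method
explicitly -- a minimal sketch, assuming a placeholder core URL and query
string rather than the actual application values:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostQueryDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder URL

    // user input containing characters that are awkward to put in a GET URL
    SolrQuery q = new SolrQuery("title:\"foo & bar 100%\"");

    // send the parameters in the request body instead of the URL
    QueryResponse rsp = server.query(q, SolrRequest.METHOD.POST);
    System.out.println("numFound: " + rsp.getResults().getNumFound());
  }
}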



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-with-special-chars-tp4120047p4121043.html
Sent from the Solr - User mailing list archive at Nabble.com.


Please add me to wiki contributors

2014-03-03 Thread Susheel Kumar
Hi,

Can you please add me to the wiki contributors? I wanted to add some stats on Linux 
vs Windows we came across recently, CSV update handler examples, and also 
to add our company name to the public server page.

Thanks,
Susheel


Automate search results filtering based on scoring

2014-03-03 Thread Susheel Kumar
Hi,

We are looking to automate searches (name searches) and filter out the results 
based on some scoring confidence. Any suggestions on what different approaches 
we can use to pick only the top closest matches and filter out the rest of the results?


Thanks,
Susheel



Re: java.lang.Exception: Conflict with StreamingUpdateSolrServer

2014-03-03 Thread Gopal Patwa
Thanks Chris, I found that in our application code it was related to an optimistic
concurrency failure.


On Mon, Mar 3, 2014 at 6:13 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 : Subject: java.lang.Exception: Conflict with StreamingUpdateSolrServer

 the fact that you are using StreamingUpdateSolrServer isn't really a
 factor here -- what matters is the data you are sending to solr in the
 updates...

 : location=StreamingUpdateSolrServer line=162 Status for: null is 409
 ...
 : Conflict

 A 409 HTTP Status is a Conflict.

  It means that optimistic concurrency failed.  Your update indicated a
  document version, but the version of the document on the server didn't meet
  the version requirements...


 https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency



 -Hoss
 http://www.lucidworks.com/



RE: Please add me to wiki contributors

2014-03-03 Thread Susheel Kumar
My user name is SusheelKumar on the Solr wiki.

-Original Message-
From: Susheel Kumar [mailto:susheel.ku...@thedigitalgroup.net] 
Sent: Monday, March 03, 2014 9:36 PM
To: solr-user@lucene.apache.org
Subject: Please add me to wiki contributors

Hi,

Can you please add me to the wiki contributors? I wanted to add some stats on Linux 
vs Windows we came across recently, CSV update handler examples, and also 
to add our company name to the public server page.

Thanks,
Susheel


Re: range types in SOLR

2014-03-03 Thread Thomas Scheffler

Am 03.03.2014 19:12, schrieb Smiley, David W.:

The main reference for this approach is here:
http://wiki.apache.org/solr/SpatialForTimeDurations


Hoss’s illustrations he developed for the meetup presentation are great.
However, there are bugs in the instructions -- specifically, it's important
to slightly buffer the query and choose an appropriate maxDistErr.  Also,
it's preferable to use the rectangle range query style of spatial
query (e.g. field:[“minX minY” TO “maxX maxY”]) as opposed to using
“Intersects(minX minY maxX maxY)”.  There's no technical difference but
the latter is deprecated and will eventually be removed from Solr 5 /
trunk.

All this said, recognize this is a bit of a hack (one that works well).
There is a good chance a more ideal implementation approach is going to be
developed this year.


Thank you,

having a working example is great, but having a practically working 
example that hides this implementation detail would be even better.


I would like to store:

2014-03-04T07:05:12,345Z, 2014-03-04, 2014-03 and 2014 into one field 
and make queries on that field.


Currently I have to normalize all of them to the first format (inventing 
information), which is only a poor approximation. Normalizing them to a 
range would be best in my opinion. Then a query like date:2014 would 
hit all of them, but so would date:[2014-01 TO 2014-03].


kind regards,

Thomas


Re: SOLRJ and SOLR compatibility

2014-03-03 Thread Thomas Scheffler

Am 27.02.2014 09:15, schrieb Shawn Heisey:

On 2/27/2014 12:49 AM, Thomas Scheffler wrote:

What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
that I'm completely ignorant here, but I have not heard of any.


Actually, bug reports reach me that sound like

Unknown type 19


Aha!  I found it!  It was caused by the change applied for SOLR-5658,
fixed in 4.7.0 (just released) by SOLR-5762.  Just my luck that there's
a bug bad enough to contradict what I told you.

https://issues.apache.org/jira/browse/SOLR-5658
https://issues.apache.org/jira/browse/SOLR-5762

I've added a comment that will help users find SOLR-5762 with a search
for Unknown type 19.

If you use SolrJ 4.7.0, compatibility should be better.


Hi,

I am sorry to inform you that SolrJ 4.7.0 faces the same issue with SOLR 
4.5.1. I received a client stack trace this morning and am still waiting 
for a log output from the server:


--
ERROR unable to submit tasks
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unknown type 19
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
--

There is not much information in that stack trace, I know.
I'll send further information when I receive more. In the meantime I 
asked our customer not to upgrade the SOLR server to resolve the issue, 
so that we can dig deeper.


kind regards,

Thomas


Re: SOLRJ and SOLR compatibility

2014-03-03 Thread Thomas Scheffler

Am 04.03.2014 07:21, schrieb Thomas Scheffler:

Am 27.02.2014 09:15, schrieb Shawn Heisey:

On 2/27/2014 12:49 AM, Thomas Scheffler wrote:

What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
that I'm completely ignorant here, but I have not heard of any.


Actually, bug reports reach me that sound like

Unknown type 19


Aha!  I found it!  It was caused by the change applied for SOLR-5658,
fixed in 4.7.0 (just released) by SOLR-5762.  Just my luck that there's
a bug bad enough to contradict what I told you.

https://issues.apache.org/jira/browse/SOLR-5658
https://issues.apache.org/jira/browse/SOLR-5762

I've added a comment that will help users find SOLR-5762 with a search
for Unknown type 19.

If you use SolrJ 4.7.0, compatibility should be better.


Hi,

I am sorry to inform you that SolrJ 4.7.0 faces the same issue with SOLR
4.5.1. I received a client stack trace this morning and am still waiting
for a log output from the server:


Here we go for the server side (4.5.1):

Mrz 03, 2014 2:39:26 PM org.apache.solr.core.SolrCore execute
Information: [clausthal_test] webapp=/solr path=/select
params={fl=*,scoresort=mods.dateIssued+descq=%2BobjectType:mods+%2Bcategory:clausthal_status\:publishedwt=javabinversion=2rows=3}
hits=186 status=0 QTime=2
Mrz 03, 2014 2:39:38 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
Information: [clausthal_test] webapp=/solr path=/update
params={wt=javabinversion=2} {} 0 0
Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
Schwerwiegend: java.lang.RuntimeException: Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
Schwerwiegend: null:java.lang.RuntimeException: Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
at