New operator.

2013-06-16 Thread Yanis Kakamaikis
Hi all, I want to add a new operator to my Solr. I need that operator to call my proprietary engine and build an answer vector for Solr, in such a way that this vector becomes part of the boolean query at the next step. How do I do that? Thanks

Re: New operator.

2013-06-16 Thread Mikhail Khludnev
Hello Yanis, Two options. 1. Create your own SearchComponent that adds a filterQuery to the request, and add it to the SearchHandler. http://wiki.apache.org/solr/SearchComponent 2. Create a QParserPlugin and call it via a request param ...fq={!yanisqp}applyvector...
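A minimal client-side sketch of the second option, assuming a custom parser has already been registered in Solr under the name yanisqp (the name used in the message above). The host, core name, main query, and both helper functions are hypothetical and only show how the local-params filter query is put together:

```python
from urllib.parse import urlencode


def local_params_fq(parser_name, parser_arg):
    """Format a filter query that delegates to a named query parser."""
    # The {!name} prefix is Solr's local-params syntax for selecting a parser.
    return "{!%s}%s" % (parser_name, parser_arg)


def build_select_url(base_url, main_query, fq):
    """Build a /select URL with one main query and one filter query."""
    # urlencode percent-escapes the braces and the exclamation mark.
    return base_url + "/select?" + urlencode([("q", main_query), ("fq", fq)])


# Hypothetical host, core, and query, used only for illustration.
fq = local_params_fq("yanisqp", "applyvector")
url = build_select_url("http://localhost:8983/solr/collection1", "title:solr", fq)
print(url)
```

The parser itself still has to be written in Java and registered in solrconfig.xml; this sketch covers only the request side.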

Re: Solr large boolean filter

2013-06-16 Thread Mikhail Khludnev
Right. FieldCacheTermsFilter is an option. You need to create your own QParserPlugin which yields a FieldCacheTermsFilter, and hook it in as ..fq={!idsqp cache=false}.. Mind disabling caching! Mind term encoding due to field type! I also suggest checking how much time it spends on tokenization. Once a day I've got

Re: in Solr 3.5, optimization increase the index size to double

2013-06-16 Thread Erick Erickson
Optimizing will _temporarily_ double the index size, but it shouldn't be permanent. Is it possible that you have inadvertently told Solr to keep an extra snapshot? I think it's numberToKeep in your replication handler, but I'm going from memory here. Best Erick On Fri, Jun 14, 2013 at 2:15 AM,

Re: Solr using a ridiculous amount of memory

2013-06-16 Thread Erick Erickson
John: If you'd like to add your experience to the Wiki, create an ID and let us know what it is and we'll add you to the contributors list. Unfortunately we had problems with spam pages so we added this step. Make sure you include your logon in the request. Thanks, Erick On Fri, Jun 14, 2013

Re: Solr 3.5 Optimization takes index file size almost double

2013-06-16 Thread Erick Erickson
Unix or Windows? And are the files still there after restarting Solr? Best Erick On Fri, Jun 14, 2013 at 10:54 AM, Pravin Bhutada pravin.bhut...@gmail.com wrote: One thing that you can try is optimize incrementally. Instead of optimizing to 1 segment, optimize to 100, then 50, 25, 10, 5, 2, 1
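The incremental approach quoted above could be sketched as a series of update requests, one per segment target. This assumes the update handler accepts the optimize and maxSegments request parameters; the base URL and the optimize_urls helper are hypothetical, and the sketch only builds the URLs without sending them:

```python
from urllib.parse import urlencode


def optimize_urls(base_url, segment_steps):
    """Return one update URL per step, each with a smaller segment target."""
    urls = []
    for max_segments in segment_steps:
        params = urlencode({"optimize": "true", "maxSegments": max_segments})
        urls.append(base_url + "/update?" + params)
    return urls


# Segment targets taken from the message: 100, then 50, 25, 10, 5, 2, 1.
steps = [100, 50, 25, 10, 5, 2, 1]
for url in optimize_urls("http://localhost:8983/solr", steps):
    print(url)
```

Each URL would be requested in turn, waiting for the previous optimize to finish before issuing the next one.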

Re: Replicas and soft commit

2013-06-16 Thread Erick Erickson
You're mixing things up pretty thoroughly <G>. SolrCloud with leaders and replicas is orthogonal to Master/Slave setups, generally people use one or the other. Master/Slave setups don't get NRT updates at all. I'm a little surprised that your setup works, it sounds like you have replication set

Re: Solr cloud: zkHost in solr.xml gets wiped out

2013-06-16 Thread Erick Erickson
Al: As it happens, I hope sometime today to put up a patch for SOLR-4910 that should harden up many things in persisting solr.xml, I'll be sure to include this. It's kind of a pain to create an automated test for this, so I'll give it a whirl manually. As you say, most of this is going away in

Re: New operator.

2013-06-16 Thread Jack Krupansky
It all depends on what you mean by an operator. Start by describing in more detail what problem you are trying to solve. And how do you expect your users or applications to use this operator? Give some examples. Solr and Lucene do not have operators per se, except in query parser syntax,

Re: Solr large boolean filter

2013-06-16 Thread Jack Krupansky
Whenever I see one of these big query filters, my first thought is that there is something wrong with the application data model. Where does the long list of IDs come from? Somebody must be generating and/or storing them, right? Why not store them in Solr, right in the data model? Maybe store

Re: Solr using a ridiculous amount of memory

2013-06-16 Thread adityab
It was interesting to read this post. I had a similar issue on Solr v4.2.1. The nature of our documents is that they have huge multiValued fields, and we were able to knock off our server in about 30 mins. We then found a bug, LUCENE-4995, which was causing all the problems. Applying the patch has helped a

Re: Best way to match umlauts

2013-06-16 Thread adityab
Thanks for the explanation Steve. I now see it clearly. In my case it should work. -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-match-umlauts-tp4070256p4070805.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr using a ridiculous amount of memory

2013-06-16 Thread Jack Krupansky
Yeah, this is yet another anti-pattern we need to be discouraging - large multivalued fields. They indicate that the data model is not well balanced and aligned with the strengths of Solr and Lucene. -- Jack Krupansky -Original Message- From: adityab Sent: Sunday, June 16, 2013 9:36

Re: in Solr 3.5, optimization increase the index size to double

2013-06-16 Thread Jason Hellman
And let's not forget the interesting bug in MMapDirectory: http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/store/MMapDirectory.html NOTE: memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Jan Høydahl
Hi, I've never heard the complaint that Solr is hard to use. To the contrary, most people I come across have downloaded Solr themselves, walked through the tutorial and praise the simplicity with which they can start indexing and searching content. When they come to us asking for consultancy

RE: filter query from external list of Solr unique IDs

2013-06-16 Thread samabhiK
Does anything exist already in Solr 4.3 to meet this use case? -- View this message in context: http://lucene.472066.n3.nabble.com/filter-query-from-external-list-of-Solr-unique-IDs-tp1709060p4070874.html

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Jack Krupansky
Jan, you made no mention of mastering Solr - which was the crux of my comments. I think everyone agrees that anyone can download and use Solr, in a basic sense, with minimal effort. The issue is how far the average application developer can get beyond start towards mastery without a detailed

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Yonik Seeley
On Sun, Jun 16, 2013 at 6:05 PM, Jack Krupansky j...@basetechnology.com wrote: Except, that Solr's divergence from a true, pure REST API is certainly one of the elements of its badness. Most complex systems seem to feel the need to diverge from pure REST for the sake of being practical. From

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Jack Krupansky
Exactly. For the case in point that is the real, underlying subject of this thread, the desire is to partially update an existing Solr document using the output of SolrCell/Tika. With true/pure REST, that should be the HTTP PUT verb. And the path would indicate the collection and key value.

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Yago Riveiro
I share Yonik's opinion that a pure REST application is in some cases a pain in the ass. But as Jack mentioned, there are some cases where REST is more expressive and makes it easy to understand what you are doing. At this point, I think that it is more important to make the actual API more stable

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Walter Underwood
1. Total mastery of a product is a strange requirement. That would be a huge trivia contest that would include all the vestigial bad bits. For example, I feel no need to master the Porter stemmer. I have no idea how to do geo search in Solr, though I'm sure I could learn it pretty quickly

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Alexandre Rafalovitch
On Sun, Jun 16, 2013 at 7:27 PM, Walter Underwood wun...@wunderwood.org wrote: 2. Someone who expects partial update in a search engine, or transactions, has a deep misunderstanding of the tradeoffs you make for what search can do. That isn't mastery of arcane details, that is search 101.

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Otis Gospodnetic
Serious thread hiJacking here Hey, why was I singled out? ;) I don't have time to get deep into this (there are non-experts I need to help! kidding...) , but I'll say this: * Do you know any non-trivial piece of software in which an average developer is a master? I've managed to master

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Lance Norskog
No, they just learned a few features and then stopped because it was good enough, and they had a thousand other things to code. As to REST: yes, it is worth having a coherent API. Solr is behind the curve here. Look at the HATEOAS paradigm. It's ornate (and a really goofy name) but it provides

Re: Best way to match umlauts

2013-06-16 Thread Lance Norskog
One small thing: German u-umlaut is often flattened as 'ue' instead of 'u'. And the same with o-umlaut, it can be 'oe' or 'o'. I don't know if Lucene has a good solution for this problem. On 06/16/2013 06:44 AM, adityab wrote: Thanks for the explanation Steve. I now see it clearly. In my case
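To illustrate the two flattenings mentioned above, here is a plain Python sketch that produces both ASCII forms of a word. It is not a Lucene or Solr filter; the umlaut_variants helper and its mapping tables are hypothetical, and the input is assumed to use precomposed characters such as "ü" rather than combining marks:

```python
def umlaut_variants(word):
    """Return (digraph form, stripped form) for a word with German umlauts."""
    # Digraph flattening: the umlaut becomes base vowel + 'e'.
    digraph = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}
    # Stripped flattening: the umlaut becomes the bare base vowel.
    plain = {"ä": "a", "ö": "o", "ü": "u", "Ä": "A", "Ö": "O", "Ü": "U"}
    expanded = "".join(digraph.get(ch, ch) for ch in word)
    stripped = "".join(plain.get(ch, ch) for ch in word)
    return expanded, stripped


print(umlaut_variants("Müller"))
```

A search setup that wants to match either spelling would need to index or query both forms, which is the difficulty the message points at.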

Re: Adding pdf/word file using JSON/XML

2013-06-16 Thread Jack Krupansky
I won't assert total mastery as a requirement. Degrees of mastery are sufficient. But even then, even partial mastery of some rather basic areas of Solr can be quite daunting. It is enlightening to consider just how many nooks and crannies of Solr there are to master, and how many reasonable

sort=geodist() asc

2013-06-16 Thread William Bell
This simple feature of sort=geodist() asc is very powerful since it enables us to move from SOLR 3 to SOLR 4 without rewriting all our queries. We also use boost=geodist() in some cases, and some bf/bq. bf=recip(geodist(),2,200,20)&sort=score
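A minimal sketch of a request that sorts by distance, assuming geodist() picks up its field and origin point from the sfield and pt request parameters. The field name store, the coordinates, and the geodist_sort_params helper are hypothetical:

```python
from urllib.parse import urlencode


def geodist_sort_params(sfield, lat, lon, query="*:*"):
    """Return request parameters that sort results by distance from a point."""
    return {
        "q": query,
        "sfield": sfield,              # spatial field geodist() reads from
        "pt": "%s,%s" % (lat, lon),    # origin point as "lat,lon"
        "sort": "geodist() asc",       # nearest documents first
    }


# Hypothetical field name and coordinates, used only for illustration.
params = geodist_sort_params("store", 45.15, -93.85)
print("/select?" + urlencode(params))
```

The bf=recip(geodist(),2,200,20) variant from the message would go into the same parameter dictionary in place of the sort entry.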

SOLR 4.3.1?

2013-06-16 Thread William Bell
When is 4.3.1 coming out? -- Bill Bell billnb...@gmail.com cell 720-256-8076