Re: Solr architecture

2016-02-08 Thread Jack Krupansky
can execute them or if they require fanout to other shards and then aggregation of results from those other shards. -- Jack Krupansky On Mon, Feb 8, 2016 at 11:24 AM, Erick Erickson wrote: > Short form: You really have to prototype. Here's the long form: > > > https://lucidwo

Re: URI is too long

2016-02-06 Thread Jack Krupansky
And you're sure that you can't use the terms query parser, which was explicitly designed for handling a very long list of terms to be implicitly ORed? -- Jack Krupansky On Sat, Feb 6, 2016 at 2:26 PM, Salman Ansari wrote: > It looked like there was another issue with my query. I

Re: indexing pdf binary stored in mongodb?

2016-02-05 Thread Jack Krupansky
://docs.mongodb.org/manual/reference/program/mongofiles/ -- Jack Krupansky On Fri, Feb 5, 2016 at 3:13 PM, Arnett, Gabriel wrote: > Anyone have any experience indexing pdfs stored in binary form in mongodb? > > . > Gabe Arnett > Senior Dir

Re: large number of fields

2016-02-05 Thread Jack Krupansky
", definitely not "quite long." That said, the starting point for any data modeling effort is to look at the full range of desired queries and that should drive the data model. So, give us more info on queries, in terms of plain English descriptions of what the user is trying to achieve.

Re: implement exact match for one of the search fields only?

2016-02-04 Thread Jack Krupansky
. Besides, the general goal is to avoid app clients talking directly to Solr anyway. -- Jack Krupansky On Thu, Feb 4, 2016 at 2:57 AM, Derek Poh wrote: > Hi Erick > > << > The manual way of doing this would be to construct an elaborate query, > like q=spp_keyword_e

Re: Error configuring UIMA

2016-02-01 Thread Jack Krupansky
Yeah, that's exactly the kind of innocent user error that UIMA simply has no code to detect and reasonably report. -- Jack Krupansky On Mon, Feb 1, 2016 at 12:13 PM, Gian Maria Ricci - aka Alkampfer < alkamp...@nablasoft.com> wrote: > It was a stupid error, I've mi

Re: Error configuring UIMA

2016-02-01 Thread Jack Krupansky
does not exist. -- Jack Krupansky On Mon, Feb 1, 2016 at 10:18 AM, alkampfer wrote: > > > From: outlook_288fbf38c031d...@outlook.com > To: solr-user@lucene.apache.org > Cc: > Date: Mon, 1 Feb 2016 15:59:02 +0100 > Subject: Error configuring UIMA > > I've solv

Re: alternative forum for SOLR user

2016-02-01 Thread Jack Krupansky
Some people prefer to use Stack Overflow, but this mailing list is still the definitive "forum" for Solr users. See: http://stackoverflow.com/questions/tagged/solr -- Jack Krupansky On Mon, Feb 1, 2016 at 10:58 AM, Shawn Heisey wrote: > On 2/1/2016 1:13 AM, Jean-Jacques MONOT wr

Re: Error in UIMA, probably opencalais,

2016-02-01 Thread Jack Krupansky
At the bottom (the fine print!) it says: lineNumber: 15; columnNumber: 7; The element type "meta" must be terminated by the matching end-tag "</meta>". -- Jack Krupansky On Mon, Feb 1, 2016 at 10:45 AM, Gian Maria Ricci - aka Alkampfer < alkamp...@nablasoft.com> wrote: >

Re: Determine if Merge is triggered in SOLR

2016-01-31 Thread Jack Krupansky
://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig -- Jack Krupansky On Sun, Jan 31, 2016 at 1:59 PM, abhi Abhishek wrote: > Hi All, > any suggestions/ ideas? > > Thanks, > Abhishek > > On Tue, Jan 26, 2016 at 9:16 PM, abhi Abhishek > wrote: > > >

Re: URI is too long

2016-01-31 Thread Jack Krupansky
Or try the terms query parser that lets you eliminate all the OR operators: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser -- Jack Krupansky On Sun, Jan 31, 2016 at 9:23 AM, Paul Libbrecht wrote: > How about using POST? > > paul >
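A minimal sketch of the suggestion above, assuming the filter is on a field named id and using the ID values from the later "Search with very large boolean filter" thread purely as sample data. It builds a terms query parser filter (a comma-separated list, with no OR operators) and URL-encodes it; the helper name terms_filter is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def terms_filter(field, values):
    """Build a terms query parser filter: {!terms f=<field>}v1,v2,..."""
    return "{!terms f=%s}%s" % (field, ",".join(str(v) for v in values))


# Sample IDs only; a real list could run to thousands of values.
ids = [100, 2, 5, 81, 10]
fq = terms_filter("id", ids)

# Encoded parameters, usable as a query string or as a POST body.
params = urlencode({"q": "*:*", "fq": fq})
print(fq)
print(params)
```

The same encoded parameters could be sent as a POST body, as suggested earlier in the thread, which avoids the URI length limit altogether.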

Re: Increasing maxMergedSegmentMB value

2016-01-31 Thread Jack Krupansky
of 5GB. If you want to get a lot above that, you're in uncharted territory. Besides, if you start pushing your index well above the amount of available system memory your query performance will suffer. I'd watch for the latter before pushing on the former. -- Jack Krupansky On Sun, Jan

Re: Increasing maxMergedSegmentMB value

2016-01-30 Thread Jack Krupansky
d not be possible with a limit of only 15GB. Maybe you could clue us in as to what effect you are trying to achieve. I mean, why should any app care whether segments are 10GB or 15GB? -- Jack Krupansky On Sat, Jan 30, 2016 at 6:28 PM, Shawn Heisey wrote: > On 1/30/2016 7:31 AM, Zheng Lin Edwin

Re: How much JVM should we allocate

2016-01-29 Thread Jack Krupansky
have room to expand and handle spikes. 8. Run that final config for an extended period (days) with as realistic a load as possible 9. If it too hits OOM or frequent GC, you may have to bump up the heap some more, like another 10%. -- Jack Krupansky On Fri, Jan 29, 2016 at 11:51 AM, Erick Eri

Re: Nested documents and many-many relation

2016-01-29 Thread Jack Krupansky
block must be written to a new segment. -- Jack Krupansky On Fri, Jan 29, 2016 at 5:13 AM, Sathyakumar Seshachalam < sathyakumar_seshacha...@trimble.com> wrote: > Hi, > > Am trying to investigate the possibility of using Block Join query parser > in a many-to-many

Re: implement exact match for one of the search fields only?

2016-01-28 Thread Jack Krupansky
A simple boost query (bq) might do the trick, using edismax: q=dvd bracket bq=spp_keyword_exact:"dvd bracket"^100 qf=P_VeryShortDescription P_ShortDescription P_CatConcatKeyword -- Jack Krupansky On Thu, Jan 28, 2016 at 12:49 PM, Erick Erickson wrote: > bq: if you are interested
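A minimal sketch of how those request parameters might be assembled, assuming the field names quoted in the message and an explicit defType=edismax. Only the parameter encoding is shown; the Solr host and collection are left out because the thread does not give them.

```python
from urllib.parse import urlencode, parse_qs

# Parameters as quoted in the message; defType selects the edismax parser.
params = {
    "defType": "edismax",
    "q": "dvd bracket",
    "bq": 'spp_keyword_exact:"dvd bracket"^100',
    "qf": "P_VeryShortDescription P_ShortDescription P_CatConcatKeyword",
}

# URL-encode so that the quotes, spaces, colon and caret survive transport.
query_string = urlencode(params)
print(query_string)
```

The boost query does not restrict the result set; it only raises the score of documents whose exact-match field contains the whole phrase.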

Re: Solr cannot return result when query with # * like title:#7654321*

2016-01-28 Thread Jack Krupansky
sing curl, please post the full curl command. -- Jack Krupansky On Thu, Jan 28, 2016 at 1:03 AM, diyun2008 wrote: > The query is rather simple: > http://127.0.0.1:8080/solr/collection1/select?q=title:#7654321* > > > > > -- > View this message in context: > http://l
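One thing worth checking when a query contains "#" (an assumption, since the excerpt does not show how the thread was resolved): in a URL, "#" starts the fragment, so an unencoded "#" cuts the query string short. The sketch below splits the URL quoted in the message and shows a percent-encoded form of the same query value.

```python
from urllib.parse import urlsplit, quote

# URL exactly as quoted in the message, with an unencoded '#'.
raw_url = "http://127.0.0.1:8080/solr/collection1/select?q=title:#7654321*"
parts = urlsplit(raw_url)

# Everything after '#' is parsed as the fragment, not as part of q.
print("query:", parts.query)
print("fragment:", parts.fragment)

# Percent-encode the whole query value so '#' becomes %23.
encoded_q = quote("title:#7654321*", safe="")
print("encoded q:", encoded_q)
```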

Re: Adding new documents to the search results and rescoring. Is it possible?

2016-01-28 Thread Jack Krupansky
would never be a need to "re" score them. Are you simply looking for a way to shift/boost the scores somehow? Again, tell us more about what you are actually trying to achieve. -- Jack Krupansky On Thu, Jan 28, 2016 at 9:52 AM, vitaly bulgakov wrote: > I have Solr 4.2. Is it p

Re: Solr cannot return result when query with # * like title:#7654321*

2016-01-27 Thread Jack Krupansky
Just to be sure, please post the lines of code or command line that you are using to issue the query. -- Jack Krupansky On Wed, Jan 27, 2016 at 10:50 PM, Yonik Seeley wrote: > On Wed, Jan 27, 2016 at 10:47 PM, diyun2008 wrote: > > Hi Yonik > > > >I do actually en

Re: unmerged index segments

2016-01-26 Thread Jack Krupansky
doc, which for Tiered is here: http://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/TieredMergePolicy.html I did doc all of these options (as of Solr 4.4) in my Solr 4.x Deep Dive e-book and I don't think much of that has changed since then: http://www.lulu.com/us/en/shop/jack-krupans

Re: unmerged index segments

2016-01-25 Thread Jack Krupansky
What exactly are your merge policy settings in solrconfig? They control when the background merges will be performed. Sometimes they do need to be tweaked. -- Jack Krupansky On Mon, Jan 25, 2016 at 1:50 PM, James Mason wrote: > Hi, > > I’ve have a large index that has been adde

Re: One complex wildcard query lead solr OOM

2016-01-24 Thread Jack Krupansky
Just escape them with a backslash. Or put each term in quotes. -- Jack Krupansky On Sun, Jan 24, 2016 at 5:21 AM, Jian Mou wrote: > Hi Jack, > > Thanks! Do you know how to disable wildcards, What I want is if input is > wildcards, just treat it as a normal char. I other words, >
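A minimal sketch of the backslash approach, assuming the escaping is done in client code before the term is placed in a query. The helper name escape_wildcards is hypothetical. It escapes only the two wildcard characters plus the backslash itself, not the full set of query-syntax characters.

```python
import re


def escape_wildcards(term):
    """Prefix backslash, '*' and '?' with a backslash so they match literally."""
    return re.sub(r"([\\*?])", r"\\\1", term)


print(escape_wildcards("a*b?c"))
print(escape_wildcards("plain"))
```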

Re: Taking Solr to production

2016-01-22 Thread Jack Krupansky
ll as HA availability requirements. -- Jack Krupansky On Fri, Jan 22, 2016 at 5:45 PM, Toke Eskildsen wrote: > Aswath Srinivasan (TMS) wrote: > > * Totally about 2.5 million documents to be indexed > > * Documents average size is 512 KB - pdfs and htmls > > >

Re: Mix Solr 4 and 5?

2016-01-22 Thread Jack Krupansky
To be clear, having separate Solr servers on different versions should definitely not be a problem. The only potential difficulty here is the SolrJ vs. server back-compat issue. -- Jack Krupansky On Fri, Jan 22, 2016 at 10:57 AM, wrote: > Shawn wrote: > > > > If you are NOT ru

Re: Mix Solr 4 and 5?

2016-01-22 Thread Jack Krupansky
nts aren't using any new features there would be a reasonable expectation that they should continue to work. -- Jack Krupansky On Fri, Jan 22, 2016 at 10:40 AM, wrote: > Yeah, sort of. Solr isn't bundled in the CMS, it is in a separate Tomcat > instance. But our code is running

Re: Mix Solr 4 and 5?

2016-01-22 Thread Jack Krupansky
), the app should work fine. So... if you stick with SolrJ 4 and use the Solr 4 doc as your guide, you should be okay. That's the theory. Worst case, you would have to deploy a Solr 4 server. That's not the preferred choice, but is a decent backup plan. -- Jack Krupansky On Fri, Jan 22, 201

Re: Mix Solr 4 and 5?

2016-01-22 Thread Jack Krupansky
Just to be clear, are you talking about a single app that does SolrJ calls to both your CMS and your free text search index? So, one Java app that is simultaneously sending requests to two Solr instances (one 4, one 5)? -- Jack Krupansky On Fri, Jan 22, 2016 at 1:57 AM, wrote: > Hi, >

Re: One complex wildcard query lead solr OOM

2016-01-21 Thread Jack Krupansky
complex wildcard is used - should an exception be thrown, or... what? I suppose it might be simplest to have a Solr option to limit the number of wildcard characters used in a term, like to 4 or 8 or something like that. IOW, have Solr check the term before the WildcardQuery is generated. -- Jack
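The option floated above does not exist in Solr as far as this excerpt shows, so the following is only a client-side sketch of the same idea: count the unescaped wildcard characters in a term and reject the term before it reaches the query parser. The function names and the default limit of 4 are assumptions taken from the "4 or 8" suggestion.

```python
def count_wildcards(term):
    """Count '*' and '?' characters that are not escaped with a backslash."""
    count = 0
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch in "*?":
            count += 1
        i += 1
    return count


def check_term(term, max_wildcards=4):
    """Return the term unchanged, or raise if it has too many wildcards."""
    found = count_wildcards(term)
    if found > max_wildcards:
        raise ValueError(
            "term has %d wildcards, limit is %d" % (found, max_wildcards)
        )
    return term


print(count_wildcards("a*b?c*"))
print(check_term("ab*"))
```

Whether to raise an error or to fall back to treating the term literally is the open question the message itself poses; the sketch raises only because that is the simplest behavior to show.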

Re: Couple of question about Virtualization and Load Balancer

2016-01-21 Thread Jack Krupansky
issue for Solr. The only issue there is assuring that you have enough Solr shards and replicas to handle the aggregate request load. -- Jack Krupansky On Thu, Jan 21, 2016 at 6:37 AM, Gian Maria Ricci - aka Alkampfer < alkamp...@nablasoft.com> wrote: > Hi, > > > > I’ve

Re: Returning all documents in a collection

2016-01-20 Thread Jack Krupansky
te the doc for this stored field restriction, right?!) -- Jack Krupansky On Wed, Jan 20, 2016 at 9:38 AM, Joel Bernstein wrote: > CloudSolrStream is available in Solr 5. The "search" streaming expression > can used or CloudSolrStream can be used in directly. > > https://cwi

Re: Returning all documents in a collection

2016-01-20 Thread Jack Krupansky
ients that automatically send requests to all the shards in a collection (or multiple collections) and then merge the sorted sets any way they wish." -- Jack Krupansky On Wed, Jan 20, 2016 at 8:41 AM, Susheel Kumar wrote: > Hello Salman, > > Please checkout the export fu

Re: Solr Block join not working after parent update

2016-01-15 Thread Jack Krupansky
ogether.*" They must also be updated together. -- Jack Krupansky On Fri, Jan 15, 2016 at 3:31 AM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > On Thu, Jan 14, 2016 at 10:01 PM, sairamkumar < > sairam.subraman...@gmail.com> > wrote: > > > This is a

Re: Issue with stemming and lemmatizing

2016-01-15 Thread Jack Krupansky
. Plenty of doc for you to start reading. Once you get the basics, then you can move on to more specific and advanced details: https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters -- Jack Krupansky On Fri, Jan 15, 2016 at 2:58 PM, sara hajili

Re: Speculation on Memory needed to efficently run a Solr Instance.

2016-01-15 Thread Jack Krupansky
the entire index. If you actually don't need minimal latency, then of course you can feel free to trade off RAM for lower latency. -- Jack Krupansky On Fri, Jan 15, 2016 at 4:43 AM, Gian Maria Ricci - aka Alkampfer < alkamp...@nablasoft.com> wrote: > Hi, > > > > When it

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-15 Thread Jack Krupansky
shard, let alone all shards. Should backups be collection-based as well? -- Jack Krupansky On Fri, Jan 15, 2016 at 3:26 AM, Gian Maria Ricci - aka Alkampfer < alkamp...@nablasoft.com> wrote: > Yes, I've checked that jira some weeks ago and it is the reason why I was > telling

Re: Solr Query Tuning

2016-01-14 Thread Jack Krupansky
ng parsing of the query) to send the request to exactly the node (or replica) that owns that token/ID. But if you really just trying to "query by ID", that should really have a nice clean API so you don't have to build query syntax. -- Jack Krupansky On Thu, Jan 14, 2016 at 8:41 P

Re: Solr Query Tuning

2016-01-14 Thread Jack Krupansky
although even that should not be a big problem. And make sure the ID field is string or numeric, not tokenized text. -- Jack Krupansky On Thu, Jan 14, 2016 at 7:53 PM, Shawn Heisey wrote: > On 1/14/2016 5:20 PM, Shivaji Dutta wrote: > > I am working with a customer that has abou

Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Jack Krupansky
Which release of Solr are you using? Last year (or so) there was a Lucene change that had the effect of keeping all terms for WDF at the same position. There was also some discussion about whether this was either a bug or a bug fix, but I don't recall any resolution. -- Jack Krupansky O

Re: &fq degrades qtime in a 20million doc collection

2016-01-14 Thread Jack Krupansky
That sounds like it. Sorry my memory is so hazy. Maybe Yonik can either confirm that that Jira is still outstanding or close it, and confirm if these symptoms are related. -- Jack Krupansky On Thu, Jan 14, 2016 at 10:54 AM, Erick Erickson wrote: > Jack: > > I think that was for facet

Re: Monitor backup progress when location parameter is used.

2016-01-14 Thread Jack Krupansky
t" indicates success or "Exception while creating snapshot" indicates failure. If only that first message appears, it means the backup is still in progress. -- Jack Krupansky On Thu, Jan 14, 2016 at 9:23 AM, Gian Maria Ricci - aka Alkampfer < alkamp...@nablasoft.com> wro

Re: &fq degrades qtime in a 20million doc collection

2016-01-13 Thread Jack Krupansky
I recall a couple of previous discussions regarding some sort of filter/field cache change in Lucene where they removed what had been an optimization for Solr. -- Jack Krupansky On Wed, Jan 13, 2016 at 8:10 PM, Erick Erickson wrote: > It's quite surprising that you're getting

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-13 Thread Jack Krupansky
e considered a fresh new distributed Solr deployment with anything other than SolrCloud. (Hmmm... have any of the committers considered deprecating the old non-SolrCloud distributed mode features?) -- Jack Krupansky On Wed, Jan 13, 2016 at 9:02 AM, Shivaji Dutta wrote: > - SolrCloud uses

Re: Dynamically Adding query parameters in my custom Request Handler class

2016-01-09 Thread Jack Krupansky
and invest significant effort in a custom request handler when simpler techniques may suffice. -- Jack Krupansky On Sat, Jan 9, 2016 at 12:08 PM, Ahmet Arslan wrote: > Hi Mark, > > Yes this is possible. Better, you can use a custom SearchComponent for > this task too. > You retri

Re: Query behavior difference.

2016-01-06 Thread Jack Krupansky
ption.*" So that's a second reason - to avoid the max clause count limitation of Boolean Query. See: https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/MultiTermQuery.html#CONSTANT_SCORE_REWRITE https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/WildcardQuery

Re: Count multivalued field issue

2016-01-06 Thread Jack Krupansky
ork, be sure to provide detail of what the symptom is rather than simply saying that it doesn't work. -- Jack Krupansky On Wed, Jan 6, 2016 at 8:43 AM, marotosg wrote: > Hi, > > I am trying to add a new field to my schema to add the number of items of a > multivalued field. >

Re: Many patterns against many sentences, storing all results

2016-01-05 Thread Jack Krupansky
://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html -- Jack Krupansky On Tue, Jan 5, 2016 at 11:05 AM, Allison, Timothy B. wrote: > Might want to look into: > > https://github.com/flaxsearch/luwak > > or > https://github.com/OpenSextant/Solr

Re: Multiple solr instances on one server

2016-01-04 Thread Jack Krupansky
ctory should contain a solr.xml file, unless solr.xml exists in ZooKeeper. The default value is server/solr. " https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference -- Jack Krupansky On Mon, Jan 4, 2016 at 10:28 AM, Mugeesh Husain wrote: > you could start solr

Re: Issue with if() statement

2015-12-31 Thread Jack Krupansky
need function queries there as well. -- Jack Krupansky On Thu, Dec 31, 2015 at 6:50 PM, William Bell wrote: > We are getting weird results with if(exists(a),b,c). We are getting b+c!! > > > http://localhost:8983/solr/providersearch/select?q=*:*&wt=json&state=state:%22CO%22&stat

Re: Adding the same field value question

2015-12-28 Thread Jack Krupansky
Is the field multivalued? -- Jack Krupansky On Sun, Dec 27, 2015 at 11:16 PM, Jamie Johnson wrote: > What is the difference of adding a field with the same value twice or > adding it once and boosting the field on add? Is there a situation where > one approach is preferred? > > Jamie >

Re: Changing Solr Schema with Data

2015-12-28 Thread Jack Krupansky
abase. Was someone telling you something different? -- Jack Krupansky On Mon, Dec 28, 2015 at 1:48 PM, Salman Ansari wrote: > Hi, > > I am facing an issue where I need to change Solr schema but I have crucial > data that I don't want to delete. Is there a way where I can chan

Re: Best practices on monitoring Solr

2015-12-23 Thread Jack Krupansky
itself (other than raw JMX and ping.) -- Jack Krupansky On Wed, Dec 23, 2015 at 6:27 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > Hi Shail, > As William mentioned, our SPM <https://sematext.com/spm/index.html> > allows you to monitor all main Solr/Jvm/Host me

Re: How to check when a search exceeds the threshold of timeAllowed parameter

2015-12-22 Thread Jack Krupansky
, making more copies of the index that can each be searched in parallel. How long do queries take when the site is operating normally? Make sure that you have enough system memory to cache the index, otherwise the machine will be thrashing with lots of I/O for competing requests. -- Jack Krupansky On

Re: Schema/Index design for disparate data sources (Federated / Google like search)

2015-12-22 Thread Jack Krupansky
formance is consumed when you have a lot of fields which are not present for a particular data source. -- Jack Krupansky On Tue, Dec 22, 2015 at 11:25 AM, Susheel Kumar wrote: > Hello, > > I am going thru few use cases where we have kind of multiple disparate data > sources which in

Re: While idexing millions of data Getting error

2015-12-18 Thread Jack Krupansky
the exact practical limit depends on your particular hardware and your particular data model and the data itself. How large is each document, roughly? Hundreds, thousands, or millions of bytes? Are some documents extremely large? -- Jack Krupansky On Fri, Dec 18, 2015 at 10:30 AM, Toke Eskild

Re: Slow query response.

2015-12-17 Thread Jack Krupansky
or to return a large bulk of documents? -- Jack Krupansky On Thu, Dec 17, 2015 at 7:01 AM, Modassar Ather wrote: > Hi, > > I have a field f which is defined as follows. > omitNorms="true"/> > > Solr-5.2.1 is used. The index is spread across 12 shards (no replic

Re: Append fields to a document

2015-12-16 Thread Jack Krupansky
update has various caveats so that it is only useful in a subset of use cases. -- Jack Krupansky On Wed, Dec 16, 2015 at 10:09 AM, Jamie Johnson wrote: > I have a use case where we only need to append some fields to a document. > To retrieve the full representation is very expensive but I can

Re: Solr High Availability

2015-12-15 Thread Jack Krupansky
There is no HA with a single replica for each shard. Replication factor must be at least 2 for HA. -- Jack Krupansky On Wed, Dec 16, 2015 at 12:38 AM, Peter Tan wrote: > Hi Jack, What happens when there is only one replica setup? > > On Tue, Dec 15, 2015 at 9:32 PM, Jack Krupansky

Re: Solr High Availability

2015-12-15 Thread Jack Krupansky
Solr Cloud provides HA when you configure at least two replicas for each shard and have at least 3 zookeepers. That's it. No deck or detail document is needed. -- Jack Krupansky On Tue, Dec 15, 2015 at 9:07 PM, wrote: > Hi Team, > > Can you help me in understanding in achieving
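A minimal sketch of the replica rule above, assuming a collection is created through the Collections API CREATE action with its name, numShards and replicationFactor parameters. The base URL, collection name and shard count are placeholders, and the ZooKeeper ensemble of at least three nodes is configured separately and is not shown.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL, refusing a non-HA replica count."""
    if replication_factor < 2:
        raise ValueError("HA needs at least 2 replicas per shard")
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    return "%s/admin/collections?%s" % (base_url.rstrip("/"), params)


# Placeholder host and collection name.
url = create_collection_url("http://localhost:8983/solr", "mycollection", 2, 2)
print(url)
```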

Re: Partial sentence match with block join

2015-12-15 Thread Jack Krupansky
ink of the company as being named "Apple Computer" even though they dropped "Computer" from the name back in 2007. Also, it is "Inc.", not "Company", so a proper search would be for "Apple Inc." or the old "Apple Computer, Inc." -- Jack Kr

Re: similarity as a parameter

2015-12-15 Thread Jack Krupansky
same things as well. -- Jack Krupansky On Tue, Dec 15, 2015 at 2:42 PM, Chris Hostetter wrote: > > : Sweetspot does require reindexing but is that the only one? I have not > : investigated some exotic implementations, anyone to confirm sweetspot is > : the only one? In that case you

Re: similarity as a parameter

2015-12-15 Thread Jack Krupansky
You would need to define an alternate field which copied a base field but then had the desired alternate similarity, using SchemaSimilarityFactory. See: https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements -- Jack Krupansky On Tue, Dec 15, 2015 at 10:02 AM, Dmitry Kan wrote

Re: Help Indexing Large File

2015-12-14 Thread Jack Krupansky
and then index the raw text. -- Jack Krupansky On Mon, Dec 14, 2015 at 12:04 PM, Antelmo Aguilar wrote: > Hello, > > I am trying to index a very large file in Solr (around 5GB). However, I > get out of memory errors using Curl. I tried using the post script and I > had some

Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

2015-12-11 Thread Jack Krupansky
in a separate table (use the same partition key to assure that the join will be more efficient by being on the same node.) -- Jack Krupansky On Fri, Dec 11, 2015 at 6:21 AM, Andrea Gazzarini wrote: > Hi Vikram, > sounds like you're using those "dynamic" fields only for visua

Re: Unstructured/Structured data for indexing

2015-12-09 Thread Jack Krupansky
You can also use Solr Cell to send entire PDF or office documents: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika -- Jack Krupansky On Wed, Dec 9, 2015 at 3:09 AM, subinalex wrote: > Hi, > > I am a solr newbie,just got a quick

Re: capacity of storage a single core

2015-12-09 Thread Jack Krupansky
monly. And, yes, each app has its own latency requirements. The purpose of a general rule is to generally avoid unhappiness, but if you have an appetite and tolerance for unhappiness, then go for it. Replica vs. shard? They're basically the same - a replica is a copy of a shard. -- Jack Kr

Re: capacity of storage a single core

2015-12-08 Thread Jack Krupansky
constantly re-read portions of the index into memory. The practical limit for documents is not per core or number of cores but across all cores on the node since it is mostly a memory limit and the available CPU resources for accessing that memory. -- Jack Krupansky On Tue, Dec 8, 2015 at 8:57 AM

Re: Stop adding content in Solr through /update URL

2015-12-04 Thread Jack Krupansky
Never made it into CHANGES.txt either. Not part of any patch either. Appears to have been secretly committed as a part of SOLR-6787 (Blob API) via Revision *1650448 <http://svn.apache.org/viewvc?view=revision&revision=1650448>* in Solr 5.1. -- Jack Krupansky On Fri, Dec 4, 2015 a

Re: Synonyms in Search Results and More Accurate Matches

2015-12-01 Thread Jack Krupansky
recall (even the most remote partial match to avoid missing any documents) with a much higher boost for exact matches. -- Jack Krupansky On Tue, Dec 1, 2015 at 10:10 AM, Erik Hatcher wrote: > One technique that works well is to use copyField to end up with two > indexed fields, on

Re: Difference in query behavior.

2015-11-30 Thread Jack Krupansky
The mm parameter or default operator logic only applies to the top level of the query. Once you get nested in parentheses below the top level, Solr/Lucene reverts to the default of the OR (SHOULD) operator. -- Jack Krupansky On Mon, Nov 30, 2015 at 5:45 AM, Modassar Ather wrote: > Hi, >

Re: [Edismax] * escaping

2015-11-25 Thread Jack Krupansky
Yeah, this stuff is poorly documented, not very intuitive, and the terminology is poorly designed in the first place, so it's completely expected to easily get confused by it. Not even a mention of it in the Solr reference guide. -- Jack Krupansky On Wed, Nov 25, 2015 at 4:39 AM, Aless

Re: Range Query on a language specific field

2015-11-24 Thread Jack Krupansky
'm not sure how useful it will be. -- Jack Krupansky On Tue, Nov 24, 2015 at 4:06 AM, Manohar Sripada wrote: > I have a requirement where I need to be able to query on a field (say > "salary"). This field contains data in Chinese. > > Is it possible in Solr to do a ra

Re: Querying nested datastructures

2015-11-24 Thread Jack Krupansky
The primary recommendation is that you flatten nested documents. That means one Solr document per cpc, not multivalued. As always, queries should drive your data model, so please specify what a typical query might be like, in plain English. -- Jack Krupansky On Tue, Nov 24, 2015 at 4:39 AM

Re: Search with very large boolean filter

2015-11-20 Thread Jack Krupansky
IDs in use during a particular interval of time? -- Jack Krupansky On Fri, Nov 20, 2015 at 4:50 PM, jichi wrote: > Hi, > > I am using Solr 4.7.0 to search text with an id filter, like this: > > id:(100 OR 2 OR 5 OR 81 OR 10 ...) > > The number of IDs in the boolean fi

Re: RealTimeGetHandler doesn't retrieve documents

2015-11-19 Thread Jack Krupansky
Do the failing IDs have any special characters that might need to be escaped? Can you find the documents using a normal query on the unique key field? -- Jack Krupansky On Thu, Nov 19, 2015 at 10:27 AM, Jérémie MONSINJON < jeremie.monsin...@gmail.com> wrote: > Hello everyone ! >

Re: Shards and Replicas

2015-11-18 Thread Jack Krupansky
per shard. But be aware that a query for the sharded version will be slower than for a single-shard implementation. -- Jack Krupansky On Wed, Nov 18, 2015 at 11:02 PM, Troy Edwards wrote: > I am looking for some good articles/guidance on how to determine number of > shards and replicas

Re: Arabic analyser

2015-11-09 Thread Jack Krupansky
Use an index-time (but not query time) synonym filter with a rule like: Abd Allah,Abdallah This will index the combined word in addition to the separate words. -- Jack Krupansky On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem wrote: > Hello, > > We are indexing Arabic content and

Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-06 Thread Jack Krupansky
if that is a lot faster, both with old and new Solr. -- Jack Krupansky On Fri, Nov 6, 2015 at 3:01 PM, wei wrote: > Thanks Jack and Shawn. I checked these Jira tickets, but I am not sure if > the slowness of MatchAllDocsQuery is also caused by the removal of > fieldcache. Can someo

Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-05 Thread Jack Krupansky
I vaguely recall some discussion concerning removal of the field cache in Lucene. -- Jack Krupansky On Thu, Nov 5, 2015 at 10:38 PM, wei wrote: > We are running our search on solr4.7 and I am evaluating whether to upgrade > to solr5.3.1. I found MatchAllDocsQuery is much slower in sol

Re: Invalid parsing with solr edismax operators

2015-11-05 Thread Jack Krupansky
Great. Now, we'll have to see if any enterprising committers will step up and take a look. -- Jack Krupansky On Thu, Nov 5, 2015 at 4:46 AM, Mahmoud Almokadem wrote: > Thanks Jack. I have reported it as a bug on JIRA > > https://issues.apache.org/jira/browse/SOLR-

Re: Solr Features

2015-11-05 Thread Jack Krupansky
ittle outdated (since 4.4) and even then was not complete (no SolrCloud or DIH), but its table of contents would probably give you a fair view of the sheer magnitude of the number of Solr features: http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548

Re: Invalid parsing with solr edismax operators

2015-11-04 Thread Jack Krupansky
I think you should go ahead and file a Jira ticket for this as a bug since either it is an actual bug or some behavior nuance that needs to be documented better. -- Jack Krupansky On Wed, Nov 4, 2015 at 8:24 AM, Mahmoud Almokadem wrote: > I removed the q.op=“AND” and add the mm=2 >

Re: Invalid parsing with solr edismax operators

2015-11-04 Thread Jack Krupansky
top-level parentheses is causing the query parser logic to act as if the parentheses were not there. You neglected to give us your qf parameter, but obviously it is: qf=Title^200.0 TotalField, I think. -- Jack Krupansky On Wed, Nov 4, 2015 at 3:39 AM, Mahmoud Almokadem wrote: > Hello, >

Re: Stem Words Highlighted - Keyword Not Highlighted

2015-10-29 Thread Jack Krupansky
Did you index the data before adding the word delimiter filter? The white space tokenizer preserves the period after "stocks.", but the WDF should remove it. The period is likely interfering with stemming. Are your filters the same for index time and query time? -- Jack Krupansky On T

Re: language plugin

2015-10-29 Thread Jack Krupansky
Are you trying to do an atomic update without the content field? If so, it sounds like Solr needs an enhancement (bug fix?) so that language detection would be skipped if the input field is not present. Or maybe that could be an option. -- Jack Krupansky On Thu, Oct 29, 2015 at 3:25 AM, Chaushu

Re: Two seperate intance of Solr on the same machine

2015-10-26 Thread Jack Krupansky
Each instance should be installed in a separate directory. IOW, don't try running multiple Solr processes for the same data. -- Jack Krupansky On Mon, Oct 26, 2015 at 1:33 PM, Steven White wrote: > Hi, > > For reasons I have no control over, I'm required to run 2 (maybe m

Re: Does docValues impact termfreq ?

2015-10-24 Thread Jack Krupansky
o use Solr in a way other than it was intended. -- Jack Krupansky On Sat, Oct 24, 2015 at 11:13 AM, Aki Balogh wrote: > Gotcha - that's disheartening. > > One idea: when I run termfreq, I get all of the termfreqs for each document > one-by-one. > > Is there a way to h

Re: Does docValues impact termfreq ?

2015-10-23 Thread Jack Krupansky
about your usage? Generally, moderate use of a feature is much more advisable than heavy usage, unless you don't care about performance. -- Jack Krupansky On Fri, Oct 23, 2015 at 8:19 AM, Aki Balogh wrote: > Hello, > > In our solr application, we use a Function Query (termfreq) very

Re: getting cached terms inside UpdateRequestProcessor...

2015-10-22 Thread Jack Krupansky
only we know what your problem really was. -- Jack Krupansky On Thu, Oct 22, 2015 at 11:18 AM, Roxana Danger < roxana.dan...@reedonline.co.uk> wrote: > Hi Erik, > > Thanks for the links, but the analyzers are called correctly. The problem > is that I need to get access to the

Re: Blob store, blob size & storage mechanism

2015-10-20 Thread Jack Krupansky
I checked the code and the limit is actually 5MB and configurable via the blob.max.size.mb config property. I posted a comment on the Solr doc for this. In any case, thanks for sharing info that you gleaned from the conference, for all of us who couldn't make it. -- Jack Krupansky On Tue

Re: Blob store, blob size & storage mechanism

2015-10-20 Thread Jack Krupansky
ard to specify the common prefix for the files. -- Jack Krupansky On Tue, Oct 20, 2015 at 8:19 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > No, the maximum size is limited to 2MB for now. The use-case behind > the blob store is to store small jars (custom plugins) and s

Re: Lucene Revolution ?

2015-10-18 Thread Jack Krupansky
networking. In any case, keep those user reports flowing. I'm sure there are plenty of people who didn't make it to the conference. -- Jack Krupansky On Sun, Oct 18, 2015 at 8:52 AM, Erik Hatcher wrote: > The Revolution was not televised (though heavily tweeted, and videos

Re: Efficiency of integer storage/use

2015-10-18 Thread Jack Krupansky
ctual algorithm, but the effect for a bunch of the common use cases. -- Jack Krupansky On Sun, Oct 18, 2015 at 10:18 AM, Erick Erickson wrote: > On the surface this seems like something of a distraction. > > 10M docs x 100 values/docs = 1B integers. Assuming all > need to be held in m

Re: catchall fields or multiple fields

2015-10-13 Thread Jack Krupansky
initial query comes up empty, then you could move on to the next highest most likely field, maybe product title (short one line description), and query voluminous fields like detailed product descriptions, specifications, and user comments/reviews only as a last resort. -- Jack Krupansky On Tue
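A minimal sketch of the cascade described above, assuming the application issues one query per field in priority order and stops at the first non-empty result. The field names, the sample documents and the in-memory search function are hypothetical stand-ins; a real search callable would send the query to Solr.

```python
def cascaded_search(query, fields, search):
    """Try each field in priority order; return the first non-empty result."""
    for field in fields:
        results = search(field, query)
        if results:
            return field, results
    return None, []


# Hypothetical in-memory stand-in for a Solr query, for illustration only.
DOCS = [
    {"id": "1", "name": "acme bracket", "title": "wall bracket for tv",
     "description": "steel bracket, holds a dvd player"},
]


def fake_search(field, query):
    return [d["id"] for d in DOCS if query in d.get(field, "")]


# Most likely field first, voluminous fields last.
priority = ["name", "title", "description"]
print(cascaded_search("bracket", priority, fake_search))
print(cascaded_search("dvd", priority, fake_search))
```

The trade-off is extra round trips for queries that only match the low-priority fields, in exchange for more precise results when a high-priority field matches.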

Re: catchall fields or multiple fields

2015-10-12 Thread Jack Krupansky
actual product names or important keywords rather than random words from the English language that happen to occur in descriptions, all of which would occur in a catchall. -- Jack Krupansky On Mon, Oct 12, 2015 at 8:39 AM, elisabeth benoit wrote: > Hello, > > We're using solr 4.10 a

Re: Reverse query?

2015-10-03 Thread Jack Krupansky
". Including specific examples. -- Jack Krupansky On Fri, Oct 2, 2015 at 9:33 AM, remi tassing wrote: > Hi, > I have medium-low experience on Solr and I have a question I couldn't quite > solve yet. > > Typically we have quite short query strings (a couple of words) and th

Re: Solr vs Lucene

2015-10-02 Thread Jack Krupansky
same machine as the Lucene/Solr index directory. -- Jack Krupansky On Fri, Oct 2, 2015 at 7:42 AM, Mark Fenbers wrote: > Thanks for the suggestion, but I've looked at aspell and hunspell and > neither provide a native Java API. Further, I already use Solr for a > search engine, to

Re: Keyword match distance rule issue

2015-09-30 Thread Jack Krupansky
nalyzed as if it were simple text. -- Jack Krupansky On Wed, Sep 30, 2015 at 9:32 AM, anil.vadhavane wrote: > Hi Benedetti, > > Yes, at first it looks like a user error and I am surprised as well with > the > case. > > We tested this on two different system. We tried it wi

Re: modular QueryParser in contrib

2015-09-21 Thread Jack Krupansky
/flexible/standard/StandardQueryParser.html -- Jack Krupansky On Mon, Sep 21, 2015 at 6:57 AM, Jack Krupansky wrote: > Probably a reference to the so-called flex query parser: > > https://lucene.apache.org/core/4_10_0/queryparser/org/apache/lucene/queryparser/flexible

Re: modular QueryParser in contrib

2015-09-21 Thread Jack Krupansky
-summary.html The original Jira: https://issues.apache.org/jira/browse/LUCENE-1567 This new query parser was dumped into Lucene some years ago, but I haven't noticed any real activity or interest in it. -- Jack Krupansky On Mon, Sep 21, 2015 at 6:36 AM, Dmitry Kan wrote: > Hello! >

Re: filling multiple fields with one analyzer

2015-09-09 Thread Jack Krupansky
An update request processor is a preferred approach - take the source value, split it, and create separate source values for each of the associated fields. -- Jack Krupansky On Wed, Sep 9, 2015 at 3:30 AM, Roxana Danger < roxana.dan...@reedonline.co.uk> wrote: > Hello, > I have
