Re: SOLR indexing strategy

2015-03-20 Thread Jack Krupansky
have a slice of the fields. Then separate Solr clusters could be used for each of the slices. -- Jack Krupansky On Fri, Mar 20, 2015 at 7:12 AM, varun sharma wrote: > Requirements of the system that we are trying to build are for each date > we need to create a SOLR index containing abo

Re: Solr Unexpected Query Parser Exception

2015-03-20 Thread Jack Krupansky
Which query parser are you using? The dismax query parser does not support wild cards or "*:*". Either way, the error message is unhelpful - worth filing a Jira. -- Jack Krupansky On Fri, Mar 20, 2015 at 7:21 AM, Vishnu Mishra wrote: > Hi, I am using solr 4.10.3 and doing dist

Re: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread Jack Krupansky
. I think it's worth a Jira - text types should use language codes, not country codes. -- Jack Krupansky On Tue, Mar 17, 2015 at 1:35 PM, Eduard Moraru wrote: > Hi, > > First of all, a bit of a disclaimer: I am not a Czech language speaker, at > all. > > We are using Sol

Re: Re[2]: discrepancy between LuceneQParser and ExtendedDismaxQParser

2015-03-17 Thread Jack Krupansky
Great, glad to hear it! One last question: What release of Solr are you using? -- Jack Krupansky On Tue, Mar 17, 2015 at 11:43 AM, Arsen wrote: > Hello Jack, > > Jack, you made "my day" for me. > > Indeed, when I inserted space between "(" and "*:*

Re: discrepancy between LuceneQParser and ExtendedDismaxQParser

2015-03-16 Thread Jack Krupansky
There was a Solr release with a bug that required that you put a space between the left parenthesis and the "*:*". The edismax parsed query here indicates that the "*:*" has not parsed properly. You have "area", but in your jira you had a range query. -- Jack Krupan

Re: Distributed IDF performance

2015-03-13 Thread Jack Krupansky
Oops... I said "StatsInfo" and that should have been "StatsCache" (""). -- Jack Krupansky On Fri, Mar 13, 2015 at 6:04 PM, Anshum Gupta wrote: > There's no rough formula or performance data that I know of at this point. > About the guidance, if you wa

Re: Parsing error on space

2015-03-13 Thread Jack Krupansky
sted query term with "\u0020". -- Jack Krupansky On Fri, Mar 13, 2015 at 2:37 AM, Rajesh wrote: > Hi, > > I want to retrieve the parent document which contain "Test Street" in > street > field or if any of it's child contain "Test Street" in

Distributed IDF performance

2015-03-13 Thread Jack Krupansky
le now using Distributed IDF as their default? I'm not currently using this, but the existing doc and Jira are too minimal to offer guidance as requested above. Mostly I'm just curious. Thanks. -- Jack Krupansky

Re: DocumentAnalysisRequestHandler

2015-03-12 Thread Jack Krupansky
citly registered (refer to SOLR-6792)*". IOW, remove the XML element from your solrconfig. As far as the document analysis request handler, that should still be fine. Are you encountering some problem? The first log line you gave is just an INFO - information only, not a problem. -- Jack Krupans

Re: Search over a multiValued field

2015-03-03 Thread Jack Krupansky
just trying to match the product name and availability. -- Jack Krupansky On Tue, Mar 3, 2015 at 4:51 PM, Tom Devel wrote: > Hi, > > I am running Solr 5.0.0 and have a question about proximity search and > multiValued fields. > > I am indexing xml files of the following form

Re: Encrypt Data in SOLR

2015-02-27 Thread Jack Krupansky
You could simply hash the value before sending it to Solr and then hash the user query before sending it to Solr as well. Do you need or want only exact matches, or do you need keyword search, wildcards, etc? -- Jack Krupansky On Fri, Feb 27, 2015 at 4:38 PM, Alexandre Rafalovitch wrote
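
A minimal sketch of the hash-before-index idea above, assuming SHA-256 and a simple trim/lowercase normalization (both are illustrative choices, not something the thread specifies). The same function would be applied to the field value at index time and to the user's query at search time, which is why this approach only supports exact matches:

```python
import hashlib


def hash_term(value: str) -> str:
    """Return a hex SHA-256 digest of the normalized value.

    The normalization (strip + lowercase) is an assumption made for this
    sketch; use whatever normalization your application needs.
    """
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The digest is what would be sent to Solr in place of the raw value.
indexed = hash_term("Alice Smith")
# The user's query is hashed the same way before it is sent to Solr.
queried = hash_term(" alice smith ")
print(indexed == queried)
```

Keyword search, wildcards, and range queries cannot work against digests, so this only fits the exact-match case asked about above.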

Re: Leading Wildcard Support (ReversedWildcardFilterFactory)

2015-02-26 Thread Jack Krupansky
Most of the magic is done internal to the query parser which actually inspects the index analyzer chain when a leading wildcard is present. Look at the parsed_query in the debug response, and you should see that special prefix query. -- Jack Krupansky On Thu, Feb 26, 2015 at 3:49 PM, jaime

Re: Leading Wildcard Support (ReversedWildcardFilterFactory)

2015-02-26 Thread Jack Krupansky
Please post your field type... or at least confirm a comparison to the example in the javadoc: http://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html -- Jack Krupansky On Thu, Feb 26, 2015 at 2:38 PM, jaime spicciati wrote: > All, >

Re: qt.shards in solrconfig.xml

2015-02-26 Thread Jack Krupansky
s the qt.shards parameter as suggested, to re-emphasize to people that if they want to use a custom handler in distributed mode, then they will most likely need this parameter. -- Jack Krupansky On Thu, Feb 26, 2015 at 11:28 AM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > Hel

Re: Unable to find query result in solr 5.0.0

2015-02-26 Thread Jack Krupansky
. Please confirm which doc you were reading for the tutorial steps. -- Jack Krupansky On Thu, Feb 26, 2015 at 6:17 AM, rupak wrote: > Hi, > > I am new to Solr and using Solr 5.0.0 search server. After installing when > I’m going to search any keyword in solr 5.0.0 it does not give any re

Re: Add fields without manually editing Schema.xml.

2015-02-25 Thread Jack Krupansky
Solr also now has a schema API to dynamically edit the schema without the need to manually edit the schema file: https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaDynamicFieldRule -- Jack Krupansky On Wed, Feb 25, 2015 at 3:15 PM, Vishal Swaroop wrote: > Thanks a

Re: Problem with queries that includes NOT

2015-02-25 Thread Jack Krupansky
As a general proposition, your first stop with any query interpretation questions should be to add the debugQuery=true parameter and look at the parsed_query in the query response which shows how the query is really interpreted. -- Jack Krupansky On Wed, Feb 25, 2015 at 8:21 AM, wrote: >
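
As a rough illustration of the advice above, here is a sketch that builds a select URL with debugQuery=true added. The host, port, collection name, and the sample query are assumptions for the example; only the debugQuery, q, and wt parameters are standard Solr request parameters:

```python
from urllib.parse import urlencode


def debug_url(base: str, query: str) -> str:
    """Build a /select URL that asks Solr to include debug output."""
    params = {"q": query, "debugQuery": "true", "wt": "json"}
    return base + "/select?" + urlencode(params)


# Hypothetical base URL and query, for illustration only.
print(debug_url("http://localhost:8983/solr/collection1",
                "title:solr NOT title:lucene"))
```

The parsed form of the query then appears in the debug section of the response, which is the place to look before guessing at how the parser interpreted a NOT clause.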

Re: Special character and wildcard matching

2015-02-24 Thread Jack Krupansky
It's a string field, so there shouldn't be any analysis. (read back in the thread for the field and field type.) -- Jack Krupansky On Tue, Feb 24, 2015 at 3:19 PM, Alexandre Rafalovitch wrote: > What happens if the query does not have wildcard expansion (*)? If the > behavior

Re: Special character and wildcard matching

2015-02-24 Thread Jack Krupansky
u provided in this thread. -- Jack Krupansky On Tue, Feb 24, 2015 at 2:35 PM, Arun Rangarajan wrote: > Exact query: > /select?q=raw_name:beyonce*&wt=json&fl=raw_name > > Response: > > { "responseHeader": {"status": 0,"QTime": 0,

Re: Special character and wildcard matching

2015-02-24 Thread Jack Krupansky
Please post the info I requested - the exact query, and the Solr response. -- Jack Krupansky On Tue, Feb 24, 2015 at 12:45 PM, Arun Rangarajan wrote: > In our case, the lower-casing is happening in a custom Java indexer code, > via Java's String.toLowerCase() method. > > I

Re: Special character and wildcard matching

2015-02-23 Thread Jack Krupansky
eyword tokenizer and then filter it for lower case, such as when the user query might have a capital "B". String field is most appropriate when the field really is 100% raw. -- Jack Krupansky On Mon, Feb 23, 2015 at 7:37 PM, Arun Rangarajan wrote: > Yes, it is a string field and not

Re: Special character and wildcard matching

2015-02-23 Thread Jack Krupansky
Is it really a string field - as opposed to a text field? Show us the field and field type. Besides, if it really were a "raw" name, wouldn't that be a capital "B"? -- Jack Krupansky On Mon, Feb 23, 2015 at 6:52 PM, Arun Rangarajan wrote: > I have a string fi

Re: more like this and term vectors

2015-02-23 Thread Jack Krupansky
It's never helpful when you merely say that it "did not work" - detail the symptom, please. Post both the query and the response. As well as the field and type definitions for the fields for which you expected term vectors - no term vectors are enabled by default. -- Jack Krupans

Re: edismax removes query string: (pg_int:-1) becomes ()

2015-02-21 Thread Jack Krupansky
he edismax query parser has a few too many parsing heuristics, causing way too many odd combinations that are not exhaustively tested. -- Jack Krupansky On Sat, Feb 21, 2015 at 5:43 PM, Tang, Rebecca wrote: > Hi there, > > I have a field pg_int which is number of pages stored as intege

Re: How to achieve lemmatization for english words in Solr 4.10.2

2015-02-18 Thread Jack Krupansky
Please provide a few examples that illustrate your requirements. Specifically, requirements that are not met by the existing Solr stemming filters. What is your specific goal? -- Jack Krupansky On Wed, Feb 18, 2015 at 10:50 AM, dinesh naik wrote: > Hi, > IS there a way to achieve lemmati

Re: AND query not working on stopwords as expected

2015-02-16 Thread Jack Krupansky
ueries with operators and the case of a leading or trailing stopword. The old Lucid query parser did have better support for queries with stop words, but that's no longer available in their current product. -- Jack Krupansky On Mon, Feb 16, 2015 at 8:16 PM, Alexandre Rafalovitch wrote:

Re: AND query not working on stopwords as expected

2015-02-16 Thread Jack Krupansky
time when they are not at either end of the query. This way, queries such as "to be or not to be", "vitamin a", and "the office" can still provide meaningful and precise matches even as stop words are generally ignored. -- Jack Krupansky On Mon, Feb 16, 2015 at 4

Re: Solr 4.8.1 : Response Code 500 when creating the new request handler

2015-02-15 Thread Jack Krupansky
t in invariants, but also in the actual request, which is a contradiction in terms - what is your actual intent? This isn't the cause of the exception, but does raise questions of what you are trying to do. 4. Why don't you have a q parameter for the actual query? -- Jack Krupansky On

Re: Question about session affinity and SolrCloud

2015-02-14 Thread Jack Krupansky
oss users, so a given query is likely to have been queried recently by another user. -- Jack Krupansky On Sat, Feb 14, 2015 at 3:39 PM, jaime spicciati wrote: > All, > This is my current understanding of how SolrCloud load balancing works... > > Within SolrCloud, for a cluster with more

Re: Solr - Mahout

2015-02-13 Thread Jack Krupansky
There is no recommendation built into Solr itself, but you might get some good ideas from this presentation: http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine -- Jack Krupansky On Fri, Feb 13, 2015 at 8:33 AM, wrote: > Sir , >I need to kno

Re: 43sec commit duration - blocked by index merge events?

2015-02-13 Thread Jack Krupansky
that soft commit waits for background merges! (Hoss??) -- Jack Krupansky On Fri, Feb 13, 2015 at 4:47 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Check > http://search-lucene.com/?q=commit+wait+block&fc_type=mail+_hash_+user > > e.g. http://search-

Re: Multy-tenancy and quarantee of service per application (tenant)

2015-02-12 Thread Jack Krupansky
tenant has their own app and the service provider controls the Solr server but has no control over the app or load. The first is supported by Solr. The second is not, other than the service provider spinning up separate instances of Solr on separate physical servers. -- Jack Krupansky On Thu

Re: Exception while loading 2 Billion + Documents in Solr 4.8.0

2015-02-11 Thread Jack Krupansky
this front? -- Jack Krupansky On Wed, Feb 11, 2015 at 8:05 AM, Erick Erickson wrote: > bq: Are there any such structures? > > Well, I thought there were, but I've got to admit I can't call any to mind > immediately. > > bq: 2b is just the hard limit > > Yeah,

Re: Exception while loading 2 Billion + Documents in Solr 4.8.0

2015-02-04 Thread Jack Krupansky
l not be a matter of how many documents you can load, but whether the query response latency for those documents is sufficient. -- Jack Krupansky On Wed, Feb 4, 2015 at 4:54 PM, Arumugam, Suresh wrote: > Hi All, > > > > We are trying to load 14+ Billion documents into Solr. But we a

Re: Where can we set the parameters in Solr Config?

2015-02-03 Thread Jack Krupansky
The Solr properties can also be defined in solrcore.properties and core.properties files: https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml -- Jack Krupansky On Tue, Feb 3, 2015 at 3:31 PM, O. Olson wrote: > Thank you Jim. I was hoping if there is an alternative

Re: CopyField exclude patterns

2015-02-02 Thread Jack Krupansky
Sorry, that feature is not available in Solr at this time. You could implement an update processor which copied only the desired input field values. This can be done in JavaScript using the script update processor. -- Jack Krupansky On Mon, Feb 2, 2015 at 2:53 AM, danny teichthal wrote: >

Re: Solr facet search improvements

2015-01-28 Thread Jack Krupansky
need to be able to handle. -- Jack Krupansky On Wed, Jan 28, 2015 at 5:56 AM, thakkar.aayush wrote: > I have around 1 million job titles which are indexed on Solr and am looking > to improve the faceted search results on job title matches. > > For example: a job search for *Resear

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Jack Krupansky
Take a look at the RegexTransformer. Or, in some cases you may need to use the raw ScriptTransformer. See: https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler -- Jack Krupansky On Sat, Jan 24, 2015 at 3:49 PM, Carl Roberts wrote

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Jack Krupansky
How are you currently importing data? -- Jack Krupansky On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts wrote: > Sorry if I was not clear. What I am asking is this: > > How can I parse the data during import to tokenize it by (:) and strip the > cpe:/o? > > > > On 1/2

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Jack Krupansky
which treated the colons as token separators. -- Jack Krupansky On Sat, Jan 24, 2015 at 3:28 PM, Alexandre Rafalovitch wrote: > You are using keywords here that seem to contradict with each other. > Or your use case is not clear. > > Specifically, you are saying you are getting s

Re: Solr regex query help

2015-01-24 Thread Jack Krupansky
or maybe use a Solr update processor to pull the string apart and store the individual pieces as separate fields. As always, the first question is not how to store your data, but how your users intend to access your data. Post some sample queries. I imagine that any sane user would like to refere

Re: Retrieving Phonetic Code as result

2015-01-23 Thread Jack Krupansky
That's what the phonetic filter is doing - transforming text into phonetic codes at index time, and at query time as well to do the phonetic matching in the query. The actual phonetic codes are stored in the index for the purposes of query matching. -- Jack Krupansky On Fri, Jan 23, 2015 at 12:

Re: Retrieving Phonetic Code as result

2015-01-23 Thread Jack Krupansky
/org/apache/solr/handler/FieldAnalysisRequestHandler.html and in solrconfig.xml -- Jack Krupansky On Thu, Jan 22, 2015 at 8:42 AM, Amit Jha wrote: > Hi, > > I need to know how can I retrieve phonetic codes. Does solr provide it as > part of result? I need codes for record matching.

Re: Avoiding wildcard queries using edismax query parser

2015-01-23 Thread Jack Krupansky
Presence of a wildcard in a query term is detected by the traditional Solr and edismax query parsers and causes normal term analysis to be bypassed. As I said, wildcards are a specific feature that dismax specifically doesn't support - this has nothing to do with edismax. -- Jack Krupansk

Re: Avoiding wildcard queries using edismax query parser

2015-01-22 Thread Jack Krupansky
The dismax query parser does not support wildcards. It is designed to be simpler. -- Jack Krupansky On Thu, Jan 22, 2015 at 5:57 PM, Jorge Luis Betancourt González < jlbetanco...@uci.cu> wrote: > I was also suspecting something like that, the odd thing was that the with > the dismax

Re: How do you query a sentence composed of multiple words in a description field?

2015-01-22 Thread Jack Krupansky
Solr tried to find the remaining terms in the default query field. -- Jack Krupansky On Thu, Jan 22, 2015 at 5:47 PM, Carl Roberts wrote: > Hi, > > How do you query a sentence composed of multiple words in a description > field? > > I want to search for sentence "Oracle Fusi

Re: Avoiding wildcard queries using edismax query parser

2015-01-22 Thread Jack Krupansky
The problem is that the presence of a wildcard causes Solr to skip the usual token analysis. But... you could add a "multiterm" analyzer, and then the wildcard would just get treated as punctuation. -- Jack Krupansky On Thu, Jan 22, 2015 at 4:33 PM, Jorge Luis Betancourt González

Re: shards per disk

2015-01-20 Thread Jack Krupansky
It sounds like your app needs a lot more RAM so that it is not doing so much I/O. -- Jack Krupansky On Tue, Jan 20, 2015 at 9:24 AM, Nimrod Cohen wrote: > Hi > > I did some performance tests, and I wanted to know if anyone saw the same > behavior. > > > > We need to

Re: OutOfMemoryError for PDF document upload into Solr

2015-01-16 Thread Jack Krupansky
to do customization, entity extraction, boiler-plate removal, etc. in app-friendly code, before transport to the Solr server. The extraction request handler is a really cool feature and quite sufficient for a lot of scenarios, but additional architectural flexibility would be a big win. -- Jack

Re: Distributed mode for stats component?

2015-01-14 Thread Jack Krupansky
admittedly, it's moot if stats is eventually to be superseded by the analytics component. -- Jack Krupansky On Wed, Jan 14, 2015 at 12:26 PM, Chris Hostetter wrote: > > : Does anybody know for sure whether the stats component fully supports > : distributed mode? It is listed in

Distributed mode for stats component?

2015-01-14 Thread Jack Krupansky
ow the new analytics component doesn't support distributed mode, but my question is about the old "stats" component. -- Jack Krupansky

Re: Tokenizer or Filter ?

2015-01-14 Thread Jack Krupansky
It's what Java has, whatever that is: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html So, maybe the correct answer is neither, but similar to both. -- Jack Krupansky On Wed, Jan 14, 2015 at 9:06 AM, tomas.kalas wrote: > Oh yeah, that is it. Thank you very much

Re: Tokenizer or Filter ?

2015-01-14 Thread Jack Krupansky
I was suspecting it might do that - the pattern is "greedy" and takes the longest matching pattern. Add a question mark after the asterisk to use stingy mode that matches the shortest pattern. -- Jack Krupansky On Wed, Jan 14, 2015 at 8:37 AM, tomas.kalas wrote: > I just used Sol
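
The greedy versus shortest-match difference described above can be seen in a small sketch. This uses Python's re module rather than Java, but java.util.regex uses the same `*?` syntax for its reluctant quantifier. The `<x>` tag name and the sample text are made up for the example:

```python
import re

# Hypothetical input with two marked-up spans.
text = "a <x>one</x> b <x>two</x> c"

# Greedy: ".*" runs from the first <x> to the LAST </x>, swallowing " b ".
greedy = re.sub(r"<x>.*</x>", "", text)

# Reluctant: ".*?" stops at the FIRST </x>, so each span is removed separately.
reluctant = re.sub(r"<x>.*?</x>", "", text)

print(repr(greedy))     # 'a  c'
print(repr(reluctant))  # 'a  b  c'
```

With the greedy pattern the text between the two spans is lost, which matches the symptom of a filter appearing to replace too much.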

Re: Tokenizer or Filter ?

2015-01-14 Thread Jack Krupansky
It should replace all occurrences of the pattern. Post your specific filter XML. Patterns can be very tricky. Use the Solr Admin UI analysis page to see how the filtering is occurring. -- Jack Krupansky On Wed, Jan 14, 2015 at 7:16 AM, tomas.kalas wrote: > Jack, thanks for help, but if i u

Re: Engage custom hit collector for special search processing

2015-01-13 Thread Jack Krupansky
umber of unique row sets. -- Jack Krupansky On Tue, Jan 13, 2015 at 4:29 PM, tedsolr wrote: > I have a complicated problem to solve, and I don't know enough about > lucene/solr to phrase the question properly. This is kind of a shot in the > dark. My requirement is to return searc

Re: Tokenizer or Filter ?

2015-01-13 Thread Jack Krupansky
s only . You can use a second pattern char filter to remove the "<[/]d[12>" markers as well, probably changing them to a space in both cases. See: http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilterFactory.html -- Jack K

Re: Tokenizer or Filter ?

2015-01-13 Thread Jack Krupansky
ipt update processors, see my Solr e-book: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html -- Jack Krupansky On Tue, Jan 13, 2015 at 9:21 AM, tomas.kalas wrote: > Thanks Jack for your advice. Can you please explain me little

Re: Extending solr analysis in index time

2015-01-13 Thread Jack Krupansky
A function query or an update processor to create a separate field are still your best options. -- Jack Krupansky On Tue, Jan 13, 2015 at 4:18 AM, Ali Nazemian wrote: > Dear Markus, > > Unfortunately I can not use payload since I want to retrieve this score to > each user as a

Re: Solr grouping problem - need help

2015-01-13 Thread Jack Krupansky
That's your job. The easiest way is to do a copyField to a "string" field. -- Jack Krupansky On Tue, Jan 13, 2015 at 7:33 AM, Naresh Yadav wrote: > *Schema :* > > > *Code :* > SolrQuery q = new SolrQuery().setQuery("*:*"); > q.set(GroupParams.GR

Re: Extending solr analysis in index time

2015-01-12 Thread Jack Krupansky
Could you clarify what you mean by "Lucene reverse index"? That's not a term I am familiar with. -- Jack Krupansky On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian wrote: > Dear Jack, > Thank you very much. > Yeah I was thinking of function query for sorting, but I have to

Re: Extending solr analysis in index time

2015-01-11 Thread Jack Krupansky
Won't function queries do the job at query time? You can add or multiply the tf*idf score by a function of the term frequency of arbitrary terms, using the tf, mul, and add functions. See: https://cwiki.apache.org/confluence/display/solr/Function+Queries -- Jack Krupansky On Sun, Jan 11,

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Jack Krupansky
detect some common use cases and handle them specially in your client. Such as the example you gave - you could extract the terms and generate separate bq parameters. -- Jack Krupansky On Sun, Jan 11, 2015 at 1:28 PM, Michael Lackhoff wrote: > Am 11.01.2015 um 18:30 schrieb Jack Krupan

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Jack Krupansky
client or app layer code, then maybe you just need to put more intelligence into that query-generation code in the client. -- Jack Krupansky On Sun, Jan 11, 2015 at 12:08 PM, Michael Lackhoff wrote: > Hi Ahmet, > > > You might find this useful : > > https://lucidworks.com/blog/

Re: Frequent deletions

2015-01-11 Thread Jack Krupansky
than this optimize operation? -- Jack Krupansky On Sun, Jan 11, 2015 at 1:46 AM, ig01 wrote: > Thank you all for your response, > The thing is that we have 180G index while half of it are deleted > documents. > We tried to run an optimization in order to shrink index size but it

Re: Extending solr analysis in index time

2015-01-11 Thread Jack Krupansky
ities/TFIDFSimilarity.html And to use your custom similarity class in Solr: https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity -- Jack Krupansky On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian wrote: > Hi everybody, > > I am going to add some analy

Re: edismax and mm: strange behaviour

2015-01-10 Thread Jack Krupansky
"required".) So, please explain in plain English what effect you are trying to achieve. mm is not for newbies! Also, please point us to whatever doc or other material you were reading that gave you the impression that mm was appropriate for your use case, so that we can correct any bad documen

Re: ignoring bad documents during index

2015-01-10 Thread Jack Krupansky
the server rather than optimize performance. -- Jack Krupansky On Sat, Jan 10, 2015 at 6:02 AM, SolrUser1543 wrote: > Would it be a good solution to index single document instead of bulk ? > In this case I will know about the status of each message . > > What is recommendation

Re: ignoring bad documents during index

2015-01-10 Thread Jack Krupansky
Correct, Solr clearly needs improvement in this area. Feel free to comment on the Jira about what options you would like to see supported. -- Jack Krupansky On Sat, Jan 10, 2015 at 5:49 AM, SolrUser1543 wrote: > From reading this (https://issues.apache.org/jira/browse/SOLR-445) I see >

Re: How does text-rev work?

2015-01-10 Thread Jack Krupansky
"expert" feature. And there should be doc on how to use it. I do have some doc in my e-book, with some examples, but even that does not show the complete end-to-end config and schema. -- Jack Krupansky On Sat, Jan 10, 2015 at 1:13 AM, Alexandre Rafalovitch wrote: > So, Query Parser does

Re: How does text-rev work?

2015-01-09 Thread Jack Krupansky
that the field type uses the reversed wildcard filter, and then it generates a wildcard query that uses the reversed query token and wildcard pattern so that the leading wildcard becomes a trailing wildcard or prefix query -- Jack Krupansky On Fri, Jan 9, 2015 at 3:15 PM, Alexandre Rafalovitch
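
A conceptual sketch of the reversal trick described above, not Solr's actual implementation: the function names are hypothetical, and the real filter does more than plain reversal (for example, it distinguishes reversed tokens from normal ones in the index), which is omitted here. It only shows why a leading wildcard can be answered as a prefix match once both the indexed token and the query pattern are reversed:

```python
def index_form(token: str) -> str:
    """Reversed form of a token, as it would be added at index time."""
    return token[::-1]


def reverse_leading_wildcard(pattern: str) -> str:
    """Reverse a pattern that starts with '*' so the wildcard trails."""
    if not pattern.startswith("*"):
        return pattern  # no leading wildcard: leave the pattern alone
    return pattern[::-1]


query = reverse_leading_wildcard("*ing")  # becomes the prefix pattern "gni*"
prefix = query.rstrip("*")

# A prefix test against the reversed token stands in for the prefix query.
print(query, index_form("running").startswith(prefix))
```

A prefix lookup is cheap against a sorted term dictionary, whereas a true leading wildcard would have to scan every term.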

Re: Tokenizer or Filter ?

2015-01-09 Thread Jack Krupansky
Consider an update processor - it can take any input, break it up any way you want, and then output multiple field values. You can even use the stateless script update processor to write the logic in JavaScript. -- Jack Krupansky On Fri, Jan 9, 2015 at 6:47 AM, tomas.kalas wrote: > Hello

Re: Determining the Number of Solr Shards

2015-01-08 Thread Jack Krupansky
table performance for both indexing and a full range of queries, and then use 10x that RAM for the RAM for the 100% load. That's the OS system memory for file caching, not the total system RAM. -- Jack Krupansky On Thu, Jan 8, 2015 at 4:55 PM, Nishanth S wrote: > Thanks guys for your inpu

Re: Solr Cloud, 100 shards, shards progressively become slower

2015-01-08 Thread Jack Krupansky
mean there will be a reduction in the amount of system memory needed for file caching of the Lucene index. 100 / 4 * 2.8GB = 70 GB of RAM needed on each server. -- Jack Krupansky On Thu, Jan 8, 2015 at 10:57 AM, Andrew Butkus < andrew.but...@c6-intelligence.com> wrote: > Hi Shawn, >
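
The arithmetic in the message above can be written out as a small sketch. The reading of the numbers is an assumption, since the message is truncated here: 100 shards spread evenly over 4 servers, with about 2.8 GB of index per shard to be held in the OS file cache:

```python
# Assumed meaning of the figures in "100 / 4 * 2.8GB = 70 GB".
total_shards = 100
servers = 4
index_gb_per_shard = 2.8

shards_per_server = total_shards / servers                  # 25 shards each
cache_gb_per_server = shards_per_server * index_gb_per_shard

print(f"{shards_per_server:.0f} shards/server, "
      f"{cache_gb_per_server:.1f} GB of file cache per server")
```

That figure covers only memory for caching the index files; JVM heap and other processes would come on top of it.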

Re: Determining the Number of Solr Shards

2015-01-07 Thread Jack Krupansky
number of CPU cores? -- Jack Krupansky On Wed, Jan 7, 2015 at 9:14 PM, Nishanth S wrote: > Thanks Shawn and Walter. Yes those are 12,000 writes/second. Reads for the > moment would be in the 1000 reads/second. Guess finding out the right > number of shards would be my starting point.

Re: Solr support for multi-tenant applications

2015-01-07 Thread Jack Krupansky
cores/tenants. Will tenants be directly accessing Solr, or will you provide them with a REST API for an application layer that intermediates access to Solr? -- Jack Krupansky On Wed, Jan 7, 2015 at 4:31 AM, Bram Van Dam wrote: > One possibility is to have separate core for each tenant domain.

Re: Vertical search Engine

2015-01-06 Thread Jack Krupansky
queries are expressed and the results being returned. -- Jack Krupansky On Tue, Jan 6, 2015 at 3:39 AM, klunwebale wrote: > hello > > i want to create a vertical search engine like trovit.com. > > I have installed solr and solarium. > > What else do I need, can you recomme

Re: edismax with multiple words for keyword tokenizer splitting on space

2015-01-06 Thread Jack Krupansky
You need to escape the space in your query (using backslash or quotes around the term) - the query parser doesn't parse based on the analyzer/tokenizer for each field. -- Jack Krupansky On Tue, Jan 6, 2015 at 4:05 AM, Sankalp Gupta wrote: > Hi > I come across this weird behaviour i
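
A small sketch of the two options mentioned above (backslash-escaping the space, or quoting the whole term) as they might be applied in client code before the query string is sent. The helper names and the `city` field are hypothetical:

```python
def escape_spaces(term: str) -> str:
    """Backslash-escape spaces so the query parser sees a single term."""
    return term.replace(" ", "\\ ")


def quote_term(term: str) -> str:
    """Wrap the term in double quotes, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# Hypothetical field name, for illustration only.
print("city:" + escape_spaces("new york"))  # city:new\ york
print("city:" + quote_term("new york"))     # city:"new york"
```

Either form keeps the query parser from splitting on the space before the field's keyword tokenizer ever sees the value.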

Re: How large is your solr index?

2015-01-03 Thread Jack Krupansky
t I agree that it would be highly desirable to push that 100 million number up to 350 million or even 500 million ASAP since the pain of unnecessarily sharding is unnecessarily excessive. I wonder what changes will have to occur in Lucene, or... what evolution in commodity hardware will be necessary t

Re: How large is your solr index?

2015-01-03 Thread Jack Krupansky
ere. So the race is on between when Lucene will relax the 2G limit and when hardware gets fast enough that 2G documents can be indexed within a small number of hours. -- Jack Krupansky On Sat, Jan 3, 2015 at 4:00 PM, Toke Eskildsen wrote: > Erick Erickson [erickerick...@gmail.com] wrote:

Re: De Duplication using Solr

2015-01-03 Thread Jack Krupansky
First, see if you can get your requirements to align to the de-dupe feature that Solr already has: https://cwiki.apache.org/confluence/display/solr/De-Duplication -- Jack Krupansky On Sat, Jan 3, 2015 at 2:54 AM, Amit Jha wrote: > I am trying to find out duplicate records based on dista

Re: Queries not supported by Lucene Query Parser syntax

2015-01-01 Thread Jack Krupansky
R-839 -- Jack Krupansky On Thu, Jan 1, 2015 at 4:08 AM, Leonid Bolshinsky wrote: > Hello, > > Are we always limited by the query parser syntax when passing a query > string to Solr? > What about the query elements which are not supported by the syntax? > For example, BooleanQuery.setM

Re: Join in SOLR

2014-12-31 Thread Jack Krupansky
You would have to do your own build since the patch has not been committed. -- Jack Krupansky On Wed, Dec 31, 2014 at 12:27 AM, Rajesh wrote: > Mikhail, > > How can I get a nightly build with fix for SOLR-5147 included. I've > searched and found that nightly build will not be

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jack Krupansky
I do have a more thorough discussion of WDF in my Solr Deep Dive e-book: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html You're not "wrong" about anything here... you just need to accept that WDF is not magic a

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jack Krupansky
Right, that's what I meant by WDF not being "magic" - you can configure it to match any three out of four use cases as you choose, but there is no choice that matches all of the use cases. To be clear, this is not a "bug" in WDF, but simply a limitation. -- Jack Krupan

Re: How large is your solr index?

2014-12-30 Thread Jack Krupansky
a proof of concept implementation to validate whether the sweet spot for your particular data, data model, and application access patterns may be well above or even below that. Yes, indeed, sing praises for heroes, but don't kill yourself and drag down others trying to be one yourself. --

Re: Solr server becomes non-responsive.

2014-12-30 Thread Jack Krupansky
e absolute precision. Sometimes you just want to know whether "something" exists matching the pattern, or "generally" what the values look like. I think it would be worth a Jira. -- Jack Krupansky On Tue, Dec 30, 2014 at 6:16 AM, Modassar Ather wrote: > Hi, > >

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jack Krupansky
term and the multi-term phrase, while the query analyzer would NOT do the split on case, so that the query could be a unitary term (possibly with mixed case, but that would not split the term) or could be a two-word phrase. -- Jack Krupansky On Mon, Dec 29, 2014 at 5:12 PM

Re: How large is your solr index?

2014-12-29 Thread Jack Krupansky
. -- Jack Krupansky On Mon, Dec 29, 2014 at 12:54 PM, Erick Erickson wrote: > When you say 2B docs on a single Solr instance, are you talking only one > shard? > Because if you are, you're very close to the absolute upper limit of a > shard, internally > the doc

Re: How to implement multi-set in a Solr schema.

2014-12-28 Thread Jack Krupansky
You can also use group.query or group.func to group documents matching a query or unique values of a function query. For the latter you could implement an NLP algorithm. -- Jack Krupansky On Sun, Dec 28, 2014 at 5:56 PM, Meraj A. Khan wrote: > Thanks Aman, the thing is the bookName fi
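
As a sketch of the group.query option mentioned above, the following builds a query string with one group per query. The `price` field and its ranges are assumptions made for the example; group and group.query are standard Solr grouping parameters, and group.func would be passed the same way with a function query as its value:

```python
from urllib.parse import urlencode


def grouping_params(main_query: str, group_queries: list) -> str:
    """Encode a grouped request with one group.query per bucket."""
    params = [("q", main_query), ("group", "true")]
    # Each group.query value yields its own group in the response.
    params += [("group.query", gq) for gq in group_queries]
    return urlencode(params)


# Hypothetical price buckets, for illustration only.
print(grouping_params("*:*", ["price:[0 TO 50]", "price:[50 TO *]"]))
```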

Re: Solr server becomes non-responsive.

2014-12-26 Thread Jack Krupansky
are no longer I/O bound. If compute bound, shard more heavily until the query latency becomes acceptable. -- Jack Krupansky On Fri, Dec 26, 2014 at 1:02 AM, Modassar Ather wrote: > Thanks for your suggestions Erick. > > This may be one of those situations where you really have to

Re: solr export get wrong results

2014-12-26 Thread Jack Krupansky
/solr/Exporting+Result+Sets -- Jack Krupansky On Fri, Dec 26, 2014 at 3:58 AM, Sandy Ding wrote: > Hi, all > > I've recently set up a solr cluster and found that "export" returns > different results from "select". > And I confirmed that the "expor

Re: 'Illegal character in query' on Solr cloud 4.10.1

2014-12-24 Thread Jack Krupansky
ther it is Tomcat or Solr that gives the error, the main point is that the raw circumflex shouldn't be sent to either. -- Jack Krupansky On Wed, Dec 24, 2014 at 4:32 PM, Erick Erickson wrote: > OK, then I don't think it's a Solr problem. I think 5 of your Tomcats are > con

Re: first time user

2014-12-16 Thread Jack Krupansky
thing, but the real problem is further upstream and hasn't been fully expressed. My model is to give you a lot of examples and you can decide for yourself which best exemplifies what you are trying to do. And to give more detail on the features of Solr. -- Jack Krupansky -Origina

Re: first time user

2014-12-16 Thread Jack Krupansky
My Solr Deep Dive e-book has full details and lots of examples for CSV indexing: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Tuesday, December

Re: different fields for user-supplied phrases in edismax

2014-12-13 Thread Jack Krupansky
boost as do less-precise phrases. But it does need to be optional since it has an added cost at query time. -- Jack Krupansky -Original Message- From: Michael Sokolov Sent: Saturday, December 13, 2014 8:43 AM To: solr-user@lucene.apache.org Subject: Re: different fields for user-supplied

Re: How to stop Solr tokenising search terms with spaces

2014-12-10 Thread Jack Krupansky
If possible, please post your field type for others to see the final solution. Thanks! -- Jack Krupansky -Original Message- From: Dinesh Babu Sent: Wednesday, December 10, 2014 9:54 AM To: solr-user@lucene.apache.org ; Ahmet Arslan Subject: RE: How to stop Solr tokenising search

Re: How to stop Solr tokenising search terms with spaces

2014-12-07 Thread Jack Krupansky
combined with the NGramFilterFactory and lower case filter, but only use the ngram filter at index time. See: http://lucene.apache.org/core/4_10_2/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html But be aware that use of the ngram filter dramatically increases the index

Re: How to stop Solr tokenising search terms with spaces

2014-12-06 Thread Jack Krupansky
to providing us with more specific requirements. My guess, from your mention of LDAP, is that the field would contain only a name, but... that's me guessing when you need to be specific. Once this distinction is cleared up, we can then focus on solutions that work either for arbitrary text or

Re: Large fields storage

2014-12-01 Thread Jack Krupansky
In particular, if they are image-intensive, all the images go away. And the formatting as well. -- Jack Krupansky -Original Message- From: Ahmet Arslan Sent: Monday, December 1, 2014 6:02 PM To: solr-user@lucene.apache.org Subject: Re: Large fields storage Hi Avi, I assume your

Re: Disappearance of post.jar from the new tutorial

2014-11-30 Thread Jack Krupansky
of adopting for Solr. I mean, are we trying to reinvent the wheel here, or what?! Note: This is the Solr USER list, which isn't the best forum for development discussions. -- Jack Krupansky -Original Message- From: Erik Hatcher Sent: Sunday, November 30, 2014 10
