Re: Adding field to solr dynamically
Thanks.

On Sun, Oct 13, 2013 at 4:18 PM, Jack Krupansky j...@basetechnology.com wrote:

Either simply use a dynamic field, or use the Schema API to add a static field: https://cwiki.apache.org/confluence/display/solr/Schema+API

Dynamic fields (your nominal field name plus a suffix that specifies the type and multiplicity, as detailed in the Solr example schema) may be good enough, depending on the rest of your requirements.

-- Jack Krupansky

-----Original Message----- From: Mysurf Mail Sent: Sunday, October 13, 2013 5:32 AM To: solr-user@lucene.apache.org Subject: Adding field to solr dynamically

My database model is designed using dynamic attributes (Entity-Attribute-Value model). For the DB I have a service that adds a new attribute, but every time a new attribute is added I need to add it to schema.xml. Is there a possible way to add a field to Solr's schema.xml dynamically?
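Jack's Schema API suggestion can be scripted. Below is a minimal Python sketch (not from the thread) that builds the `add-field` command and POSTs it to a core's `/schema` endpoint. The core URL and field attributes are illustrative assumptions, and the call only works when the core uses a managed schema.

```python
import json
from urllib import request

def build_add_field_payload(name, field_type, stored=True, indexed=True):
    """Build the JSON body for a Schema API add-field command."""
    return {"add-field": {"name": name, "type": field_type,
                          "stored": stored, "indexed": indexed}}

def add_field(solr_core_url, name, field_type):
    """POST the command to <core>/schema (network call; managed schema required)."""
    body = json.dumps(build_add_field_payload(name, field_type)).encode("utf-8")
    req = request.Request(solr_core_url.rstrip("/") + "/schema", data=body,
                          headers={"Content-Type": "application/json"})
    return request.urlopen(req)
```

For the EAV use case in the question, the attribute-adding service would call `add_field(...)` right after inserting the new attribute row, instead of hand-editing schema.xml.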
Adding field to solr dynamically
My database model is designed using dynamic attributes (Entity-Attribute-Value model). For the DB I have a service that adds a new attribute, but every time a new attribute is added I need to add it to schema.xml. Is there a possible way to add a field to Solr's schema.xml dynamically?
Re: How to define facet.prefix as case-insensitive
Thanks.

On Sun, Sep 22, 2013 at 6:24 PM, Erick Erickson erickerick...@gmail.com wrote:

You'll have to lowercase the term in your app and set terms.prefix to that value; there's no analysis done on the terms.prefix value.

Best, Erick

On Sun, Sep 22, 2013 at 4:07 AM, Mysurf Mail stammail...@gmail.com wrote:

I am using facet.prefix for auto complete. This is my definition:

<requestHandler name="/ac" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    ...
    <str name="lowercaseOperators">true</str>
    <str name="facet">on</str>
    <str name="facet.field">Suggest</str>
  </lst>
</requestHandler>

This is my field:

<field name="Suggest" type="text_auto" indexed="true" stored="true" required="false" multiValued="true"/>

and

<fieldType class="solr.TextField" name="text_auto">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

All works fine, but when I search using caps lock it doesn't return answers, even when the field contains capital letters. I assume that the field in Solr is lowercased (by the field type's filter) but the search term is not. How can I control the search term's case? Thanks.
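Erick's point is that no analysis runs on the prefix parameter, so the client has to lowercase it before sending. A small Python sketch of building the request URL that way (the `/ac` handler name and `Suggest` field are taken from the config above; everything else is an assumption):

```python
from urllib.parse import urlencode

def autocomplete_url(core_base, user_prefix):
    """Build a faceted autocomplete URL, lowercasing the user's prefix
    client-side because Solr applies no analysis to the prefix value."""
    params = {"q": "*:*", "facet": "on", "facet.field": "Suggest",
              "facet.prefix": user_prefix.lower(), "wt": "json"}
    return core_base.rstrip("/") + "/ac?" + urlencode(params)
```

Since the `text_auto` type lowercases at index time, lowering the prefix on the client makes "Dog", "DOG", and "dog" all hit the same facet values.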
solr - searching part of words
My field is defined as

<field name="PackageName" type="text_en" indexed="true" stored="true" required="true"/>

(text_en is defined as in the original schema.xml that comes with Solr.) Now, my field has the following values:
- one
- one1

Searching for "one" returns only the document with "one". What causes it? How can I change it?
How to define facet.prefix as case-insensitive
I am using facet.prefix for auto complete. This is my definition:

<requestHandler name="/ac" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    ...
    <str name="lowercaseOperators">true</str>
    <str name="facet">on</str>
    <str name="facet.field">Suggest</str>
  </lst>
</requestHandler>

This is my field:

<field name="Suggest" type="text_auto" indexed="true" stored="true" required="false" multiValued="true"/>

and

<fieldType class="solr.TextField" name="text_auto">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

All works fine, but when I search using caps lock it doesn't return answers, even when the field contains capital letters. I assume that the field in Solr is lowercased (by the field type's filter) but the search term is not. How can I control the search term's case? Thanks.
Re: solr suggestion -
Yes, I understood that from the result. But how do I change that behaviour? "Don't do any analysis on the field you are using for suggestion" - please elaborate.

On Mon, Sep 9, 2013 at 8:48 PM, tamanjit.bin...@yahoo.co.in wrote:

Don't do any analysis on the field you are using for suggestion. What is happening here is that at query time and indexing time the tokens are being broken on whitespace. So effectively, "at" is being taken as one token and "l" is being taken as another token, for which you get two different suggestions.

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-suggestion-tp4087841p4088919.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr suggest - How to define solr suggest as case insensitive
I have added it and it didn't work. Still returning different results for q=C and q=c.

On Tue, Sep 10, 2013 at 1:52 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: This is probably because your dictionary is made up of all lower case tokens,
: but when you query the spell-checker similar analysis doesn't happen. Ideal
: case would be when you query the spellchecker you send lower case queries

You can init the SpellCheckComponent with a queryAnalyzerFieldType option that will control what analysis happens. ie...

<!-- This field type's analyzer is used by the QueryConverter to tokenize the value for "q" parameter -->
<str name="queryAnalyzerFieldType">phrase_suggest</str>

...it would be nice if this defaulted to using the fieldType of the field you configure on the Suggester, but not all Impls are based on the index (you might be using an external dict file) so it has to be explicitly configured, and defaults to using a simple WhitespaceAnalyzer.

-Hoss
Solr Suggester - How do I filter autocomplete results
I want to filter the auto-complete results from my suggester. Let's say I have a book table:

Table (Id guid, BookName string, BookOwner id)

I want each user to get a list to autocomplete from his own books, so I want to add something like:

http://.../solr/vault/suggest?q=c&fq=BookOwner:3

This doesn't work. What other ways do I have to implement it?
Solr doesn't return an answer when searching numbers
I am querying using

http://...:8983/solr/vault/select?q=design test&fl=PackageName

I get 3 results:
- design test
- design test 2013
- design test for jobs

Now when I query using q=test for jobs, I get only "design test for jobs". But when I query using q=2013

http://...:8983/solr/vault/select?q=2013&fl=PackageName

I get no results. Why doesn't it return an answer when I query with numbers? In schema.xml:

<field name="PackageName" type="text_en" indexed="true" stored="true" required="true"/>
Solr suggest - How to define solr suggest as case insensitive
My suggest (spellchecker) is returning case-sensitive answers. (I use it to autocomplete; "dog" and "Dog" return different phrases.) My suggest is defined as follows, in solrconfig:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggest</str> <!-- the indexed field to derive suggestions from -->
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
    <!-- <str name="sourceLocation">american-english</str> -->
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

In schema:

<field name="suggest" type="phrase_suggest" indexed="true" stored="true" required="false" multiValued="true"/>

and

<copyField source="Name" dest="suggest"/>

and

<fieldtype name="phrase_suggest" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^\p{L}\p{M}\p{N}\p{Cs}]*[\p{L}\p{M}\p{N}\p{Cs}\_]+:)|([^\p{L}\p{M}\p{N}\p{Cs}])+" replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldtype>
Problem parsing suggest response
Hi, I am having problems parsing the suggest JSON response in C#. Here is an example:

{
  responseHeader: {
    status: 0,
    QTime: 1
  },
  spellcheck: {
    suggestions: [
      at,
      {
        numFound: 1,
        startOffset: 1,
        endOffset: 3,
        suggestion: [atrion]
      },
      l,
      {
        numFound: 2,
        startOffset: 4,
        endOffset: 5,
        suggestion: [lot, loadtest_template_700]
      },
      collation,
      atrion lot
    ]
  }
}

1. Is this valid JSON? Shouldn't every item be surrounded by quotation marks?
2. The items "at" and "l" are not preceded by a name. (This generates different XML in every online json-to-xml translator.) Is this standard JSON? Can I interfere with the structure?

Thanks.
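On question 2: the flat list alternating between a term and its info object is how Solr serializes its NamedList by default, so a strongly-typed class won't map onto it directly. A sketch of folding it into a dictionary, shown in Python for brevity (the same pairing logic applies in C# against a deserialized array):

```python
def parse_suggestions(payload):
    """Fold Solr's flat [term, info, term, info, ..., 'collation', text]
    spellcheck list into ({term: [suggestions]}, collation)."""
    items = payload["spellcheck"]["suggestions"]
    result, collation = {}, None
    i = 0
    while i < len(items):
        key = items[i]
        if key == "collation":
            collation = items[i + 1]
        else:
            result[key] = items[i + 1]["suggestion"]
        i += 2
    return result, collation
```

This assumes the default flat serialization; if the handler is queried with a map-style named-list format instead, the response already arrives as an object and no pairing is needed.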
solr suggestion -
The following request

http://127.0.0.1:8983/solr/vault/suggest?wt=json&q=at%20l

returns phrases that start with "at" and with "l" (as shown below). Now, what if I want phrases that start with "at l", such as "At Least"? Thanks.

{
  responseHeader: {
    status: 0,
    QTime: 1
  },
  spellcheck: {
    suggestions: [
      at,
      {
        numFound: 1,
        startOffset: 1,
        endOffset: 3,
        suggestion: [atrion]
      },
      l,
      {
        numFound: 2,
        startOffset: 4,
        endOffset: 5,
        suggestion: [lot, loadtest_template_700]
      },
      collation,
      atrion lot
    ]
  }
}
Troubles defining suggester/ understanding results
I am having trouble defining a suggester for auto complete after reading the tutorial. Here are my schema definitions:

<field name="PackageName" type="text_en" indexed="true" stored="true" required="true"/>
<field name="PackageVersionComments" type="text_en" indexed="true" stored="true" required="false"/>
...
<field name="SKUDescription" type="text_en" indexed="true" stored="true" required="false" multiValued="true"/>

I also added two field types:

<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

<fieldtype name="phrase_suggest" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^\p{L}\p{M}\p{N}\p{Cs}]*[\p{L}\p{M}\p{N}\p{Cs}\_]+:)|([^\p{L}\p{M}\p{N}\p{Cs}])+" replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldtype>

Now, since I want to make suggestions from multiple fields and I can't declare two suggester fields, I defined a single suggest field and copied three of the fields into it.

Problems:
1. Everything loads pretty well, but copying the fields to a new field just inflates my index. Is there a possibility to define the suggester on more than one field?
2. I can't understand the results. Querying http://127.0.0.1:8983/solr/Book/suggest?q=th returns docs such as "that are labelled in black on a black background a little black light", though querying http://127.0.0.1:8983/solr/vault-Book/suggest?q=lab doesn't return anything, and "lab" is found in the previous result as well. What is the problem?
Re: autocomplete feature - where to begin
Thanks. Will read it now :-)

On Tue, Aug 13, 2013 at 8:33 PM, Cassandra Targett casstarg...@gmail.com wrote:

The autocomplete feature in Solr is built on the spell checker component, and is called Suggester, which is why you've seen both of those mentioned. It's implemented with a searchComponent and a requestHandler. The Solr Reference Guide has a decent overview of how to implement it, and I just made a few edits to make what needs to be done a bit more clear: https://cwiki.apache.org/confluence/display/solr/Suggester If you have suggestions for improvements to that doc (such as steps that aren't clear), you're welcome to set up an account there and leave a comment. Cassandra

On Tue, Aug 13, 2013 at 11:16 AM, Mysurf Mail stammail...@gmail.com wrote:

I have indexed the data from the db and so far it searches really well. Now I want to create an auto-complete/suggest feature on my website. So far I have seen articles about Suggester, spellchecker, and searchComponents. Can someone point me to a good article about a basic autocomplete implementation?
solr not writing logs when it runs not from its main folder
When I run Solr using

java -jar C:\solr\example\start.jar

it writes logs to C:\solr\example\logs. When I run it using

java -Dsolr.solr.home=C:\solr\example\solr -Djetty.home=C:\solr\example -Djetty.logs=C:\solr\example\logs -jar C:\solr\example\start.jar

it writes logs only if I run it from C:\solr\example; from any other folder, logs are not written. This is important as I need to run it as a service later (using nssm). What should I change?
autocomplete feature - where to begin
I have indexed the data from the db and so far it searches really well. Now I want to create an auto-complete/suggest feature on my website. So far I have seen articles about Suggester, spellchecker, and searchComponents. Can someone point me to a good article about a basic autocomplete implementation?
Solr - how do I index barcode
I have a document that contains the following data:

car {
  id: guid
  name: string
  sku: list<barcode>
}

Now, the barcodes don't have a pattern. It can be either one of the following:

ABCD-EF34GD-JOHN
ABCD-C08-YUVF

I want to index my documents so that searching for:
1. ABCD will return both.
2. AB will return both.
3. JO will return ABCD-EF34GD-JOHN but not a car with the name "john".

So far I have defined car and sku as text_en, but I don't get bullets no. 2 and 3. Is there a better way to define the sku attribute? Thanks.
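The thread doesn't settle on an analysis chain, but a common way to get requirements 2 and 3 is a dedicated sku field whose analyzer splits on the dashes and indexes edge n-grams of each segment. The Python below is only a simulation of what such an analyzer would put in the index - it is not Solr code, and the delimiter, min, and max gram sizes are assumptions:

```python
def edge_ngrams(token, min_len=2, max_len=10):
    """Prefixes an edge-n-gram style filter would index for one token."""
    return [token[:n] for n in range(min_len, min(max_len, len(token)) + 1)]

def analyze_sku(barcode):
    """Split on '-' (as a pattern tokenizer might), lowercase each segment,
    then expand to edge n-grams."""
    grams = []
    for part in barcode.lower().split("-"):
        grams.extend(edge_ngrams(part))
    return grams
```

Because the grams live in the sku field only, a query on that field for "jo" matches ABCD-EF34GD-JOHN without also matching documents whose name field is "john" (requirement 3).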
Re: Solr - how do I index barcode
Two notes:
1. My current query is similar to this: http://127.0.0.1:8983/solr/vault/select?q=ABCD&qf=Name+SKU&defType=edismax
2. I want it to be case insensitive.

On Thu, Aug 8, 2013 at 2:52 PM, Mysurf Mail stammail...@gmail.com wrote:

I have a document that contains the following data:

car {
  id: guid
  name: string
  sku: list<barcode>
}

Now, the barcodes don't have a pattern. It can be either one of the following:

ABCD-EF34GD-JOHN
ABCD-C08-YUVF

I want to index my documents so that searching for:
1. ABCD will return both.
2. AB will return both.
3. JO will return ABCD-EF34GD-JOHN but not a car with the name "john".

So far I have defined car and sku as text_en, but I don't get bullets no. 2 and 3. Is there a better way to define the sku attribute? Thanks.
Re: solr - using fq parameter does not retrieve an answer
Thanks.

On Mon, Aug 5, 2013 at 4:57 PM, Shawn Heisey s...@elyograg.org wrote:

On 8/5/2013 2:35 AM, Mysurf Mail wrote:

When I query using http://localhost:8983/solr/vault/select?q=*:* I get results including the following doc:

<doc>
  ...
  <int name="VersionNumber">7</int>
  ...
</doc>

Now I try to get only that row, so I add fq=VersionNumber:7 to my query:

http://localhost:8983/solr/vault/select?q=*:*&fq=VersionNumber:7

And I get nothing. Any idea?

Is the VersionNumber field indexed? If it's not, you won't be able to search on it. If you change your schema so that the field has indexed="true", you'll have to reindex. http://wiki.apache.org/solr/HowToReindex When you are retrieving a single document, it's better to use the q parameter rather than the fq parameter. Querying a single document will pollute the cache. It's a lot better to pollute the queryResultCache than the filterCache. The former is generally much larger than the latter and better able to deal with pollution.

Thanks, Shawn
Knowing what field caused the retrieval of the document
I have two indexed fields in my document: Name and Comment. The user searches for a phrase and I need to act differently if it appeared in the comment or in the name. Is there a way to know why the document was retrieved? Thanks.
How to plan field boosting
I query using qf=Name+Tag. Now I want documents that have the phrase in Tag to arrive first, so I use qf=Name+Tag^2, and they do appear first. What should be the rule of thumb regarding the number that comes after the field? How do I know what number to set it to?
Re: Knowing what field caused the retrieval of the document
But what if this is for multiple words? I am guessing Solr knows why the document is there, since I get to see the paragraph in the highlight (hl) section.

On Tue, Aug 6, 2013 at 11:36 AM, Raymond Wiker rwi...@gmail.com wrote:

If you were searching for single words (terms), you could use the 'tf' function, by adding something like matchesinname:tf(name, whatever) to the 'fl' parameter; if the 'name' field contains "whatever", the (result) field 'matchesinname' will be 1.

On Tue, Aug 6, 2013 at 10:24 AM, Mysurf Mail stammail...@gmail.com wrote:

I have two indexed fields in my document: Name and Comment. The user searches for a phrase and I need to act differently if it appeared in the comment or in the name. Is there a way to know why the document was retrieved? Thanks.
Multiple sorting does not work as expected
My documents have 2 indexed attributes: name (string) and version (number). I want that, within the same score, the documents will be displayed in the following order: score (desc), name (desc), version (desc). Therefore I query using:

http://localhost:8983/solr/vault/select?q=BOM&fl=*,score&sort=score+desc,Name+desc,Version+desc

And I get the following inside the result:

<doc>
  <str name="Name">BOM Total test2</str>
  ...
  <int name="Version">2</int>
  ...
  <float name="score">2.2388418</float>
</doc>
<doc>
  <str name="Name">BOM Total test - Copy</str>
  ...
  <int name="Version">2</int>
  ...
  <float name="score">2.2388418</float>
</doc>
<doc>
  <str name="Name">BOM Total test2</str>
  ...
  <int name="Version">1</int>
  ...
  <float name="score">2.2388418</float>
</doc>

The scoring is equal, but the name is not sorted. What am I doing wrong here?
Re: Multiple sorting does not work as expected
My schema:

<field name="Name" type="text_en" indexed="true" stored="true" required="true"/>
<field name="Version" type="int" indexed="true" stored="true" required="true"/>

On Tue, Aug 6, 2013 at 5:06 PM, Mysurf Mail stammail...@gmail.com wrote:

My documents have 2 indexed attributes: name (string) and version (number). I want that, within the same score, the documents will be displayed in the following order: score (desc), name (desc), version (desc). Therefore I query using:

http://localhost:8983/solr/vault/select?q=BOM&fl=*,score&sort=score+desc,Name+desc,Version+desc

And I get the following inside the result:

<doc>
  <str name="Name">BOM Total test2</str>
  ...
  <int name="Version">2</int>
  ...
  <float name="score">2.2388418</float>
</doc>
<doc>
  <str name="Name">BOM Total test - Copy</str>
  ...
  <int name="Version">2</int>
  ...
  <float name="score">2.2388418</float>
</doc>
<doc>
  <str name="Name">BOM Total test2</str>
  ...
  <int name="Version">1</int>
  ...
  <float name="score">2.2388418</float>
</doc>

The scoring is equal, but the name is not sorted. What am I doing wrong here?
Re: Multiple sorting does not work as expected
I don't see how it is sorted. This is the order as displayed above:

1. BOM Total test2
2. BOM Total test - Copy
3. BOM Total test2

all with the same score, 2.2388418.

On Tue, Aug 6, 2013 at 5:28 PM, Jack Krupansky j...@basetechnology.com wrote:

The Name field is sorted as you have requested - desc. I suspect that you wanted name to be sorted asc (natural order.)

-- Jack Krupansky

-----Original Message----- From: Mysurf Mail Sent: Tuesday, August 06, 2013 10:22 AM To: solr-user@lucene.apache.org Subject: Re: Multiple sorting does not work as expected

My schema:

<field name="Name" type="text_en" indexed="true" stored="true" required="true"/>
<field name="Version" type="int" indexed="true" stored="true" required="true"/>

On Tue, Aug 6, 2013 at 5:06 PM, Mysurf Mail stammail...@gmail.com wrote:

My documents have 2 indexed attributes: name (string) and version (number). I want that, within the same score, the documents will be displayed in the following order: score (desc), name (desc), version (desc). Therefore I query using:

http://localhost:8983/solr/vault/select?q=BOM&fl=*,score&sort=score+desc,Name+desc,Version+desc

And I get the following inside the result:

<doc>
  <str name="Name">BOM Total test2</str>
  ...
  <int name="Version">2</int>
  ...
  <float name="score">2.2388418</float>
</doc>
<doc>
  <str name="Name">BOM Total test - Copy</str>
  ...
  <int name="Version">2</int>
  ...
  <float name="score">2.2388418</float>
</doc>
<doc>
  <str name="Name">BOM Total test2</str>
  ...
  <int name="Version">1</int>
  ...
  <float name="score">2.2388418</float>
</doc>

The scoring is equal, but the name is not sorted. What am I doing wrong here?
solr - using fq parameter does not retrieve an answer
When I query using http://localhost:8983/solr/vault/select?q=*:* I get results including the following doc:

<doc>
  ...
  <int name="VersionNumber">7</int>
  ...
</doc>

Now I try to get only that row, so I add fq=VersionNumber:7 to my query:

http://localhost:8983/solr/vault/select?q=*:*&fq=VersionNumber:7

And I get nothing. Any idea?
Re: solr - please help me arrange my search url
So, if I always query over more than one field, and they are always the same fields, then I cannot place them in a config file? I should always list them all in my URL?

On Thu, Aug 1, 2013 at 5:05 PM, Jack Krupansky j...@basetechnology.com wrote:

1. df only supports a single field. All but the first will be ignored.
2. qf takes a list as a space-delimited string, with an optional boost (^n) after each field name.
3. df is only used by edismax if qf is not present.
4. Your working query uses a different term (walk) than your other queries (jump). Are you sure that jump appears in that field? What does your field analyzer look like? Or is it a string field? If the latter, does the case match exactly and are there any extraneous spaces?

-- Jack Krupansky

-----Original Message----- From: Mysurf Mail Sent: Thursday, August 01, 2013 7:48 AM To: solr-user@lucene.apache.org Subject: solr - please help me arrange my search url

I am still doing something wrong with Solr. I am querying with the following parameters:

http://...:8983/solr/vault/select?q=jump&qf=PackageTag&defType=edismax

(meaning I am using edismax and I query on the field PackageTag). I get nothing. When I don't declare the field and query

http://...:8983/solr/vault/select?q=jump&defType=edismax

with the searched-on fields declared in

<lst name="defaults">
  <str name="echoParams">explicit</str>
  <int name="rows">10</int>
  <str name="df">PackageName</str>
  <str name="df">PackageTag</str>

I also get nothing. It's only when I query with

http://...:8983/solr/vault/select?q=PackageTag:walk&defType=edismax

that it works. My goal is to have two kinds of URL:
1. one that will query without passing the searched-on fields; I will put a default declaration in another place (where, then?)
2. one that will query with the searched-on fields passed in.

Should I use dismax? edismax? qf or q=field:value? Thanks.
solr - please help me arrange my search url
I am still doing something wrong with Solr. I am querying with the following parameters:

http://...:8983/solr/vault/select?q=jump&qf=PackageTag&defType=edismax

(meaning I am using edismax and I query on the field PackageTag). I get nothing. When I don't declare the field and query

http://...:8983/solr/vault/select?q=jump&defType=edismax

with the searched-on fields declared in

<lst name="defaults">
  <str name="echoParams">explicit</str>
  <int name="rows">10</int>
  <str name="df">PackageName</str>
  <str name="df">PackageTag</str>

I also get nothing. It's only when I query with

http://...:8983/solr/vault/select?q=PackageTag:walk&defType=edismax

that it works. My goal is to have two kinds of URL:
1. one that will query without passing the searched-on fields; I will put a default declaration in another place (where, then?)
2. one that will query with the searched-on fields passed in.

Should I use dismax? edismax? qf or q=field:value? Thanks.
Working with solr over two different db schemas
I've been working on it for quite some time. This is my config:

<dataConfig>
  <dataSource type="JdbcDataSource" name="ds1" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://...:1433;databaseName=A" user="XX" password="XX"/>
  <document>
    <entity name="PackageVersion" pk="PackageVersionId" query="
      /*PackageVersion.Query*/
      select PackageVersion.Id PackageVersionId,
             PackageVersion.VersionNumber,
             CONVERT(char(19), PackageVersion.LastModificationTime, 126) + 'Z' LastModificationTime,
             Package.Id PackageId,
             Package.Name PackageName,
             PackageVersion.Comments PackageVersionComments,
             Package.CreatedBy CreatedBy
      from [dbo].[Package] Package
      inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId
      where Package.RecordStatusId = 0 and PackageVersion.RecordStatusId = 0">
      <entity name="PackageTag" pk="ResourceId" processor="CachedSqlEntityProcessor" cacheKey="ResourceId" cacheLookup="PackageVersion.PackageId" query="
        /*PackageTag.Query*/
        select ResourceId, [Text] PackageTag
        from [dbo].[Tag] Tag
        where ResourceType = 0"/>
    </entity>
  </document>
</dataConfig>

Now, this runs in my test environment, and the only thing I do is change the configuration to another DB (and, as a result, also the schema name from [dbo] to another). This results in totally different behavior. In the first configuration the selects were done in this order: inner entity and then outer entity, which means that the cache works. In the second configuration, over the other DB, the order was first the outer and then the inner; the cache did not work at all, and the inner query results are not cached. What could be the problem?
solr - set fields as default search field
The following query works well for me:

http://[]:8983/solr/vault/select?q=VersionComments%3AWhite

It returns all the documents where VersionComments includes "White". I try to omit the field name and set it as a default, as follows. In solrconfig I write:

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">PackageName</str>
    <str name="df">Tag</str>
    <str name="df">VersionComments</str>
    <str name="df">VersionTag</str>
    <str name="df">Description</str>
    <str name="df">SKU</str>
    <str name="df">SKUDesc</str>
  </lst>

I restart Solr and run a full import. Then I try using

http://[]:8983/solr/vault/select?q=White

(where http://[]:8983/solr/vault/select?q=VersionComments%3AWhite still works), but I don't get any document as an answer. What am I doing wrong?
Re: adding date column to the index
Aha - I deleted the data folder and now I get: Invalid Date String: '2010-01-01 00:00:00 +02:00'. I need to convert it to a Solr date, as I read it into the schema using:

<field name="LastModificationTime" type="date" indexed="false" stored="true" required="true"/>

On Tue, Jul 23, 2013 at 10:50 AM, Gora Mohanty g...@mimirtech.com wrote:

On 23 July 2013 11:13, Mysurf Mail stammail...@gmail.com wrote:

To clarify: I did delete the data in the index and reloaded it (+ commit). (As I said, I have seen it loaded in the DB profiler.) [...]

Please share your DIH configuration file, and Solr's schema.xml. It must be that somehow the column is not getting indexed. Regards, Gora
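The "Invalid Date String" error is because Solr's date type expects UTC in the form yyyy-MM-ddTHH:mm:ssZ, while SQL Server's datetimeoffset renders as '2010-01-01 00:00:00 +02:00'. The conversion can also be done in the SELECT or with a DIH transformer; below is a client-side Python sketch of the same normalization (the input format string is an assumption matching the error message):

```python
from datetime import datetime, timezone

def to_solr_date(value):
    """Convert 'YYYY-MM-DD HH:MM:SS +HH:MM' (datetimeoffset-style text)
    to Solr's canonical UTC form 'YYYY-MM-DDTHH:MM:SSZ'."""
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M:%S %z")  # %z parses +02:00 (Python 3.7+)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Note the offset is folded into UTC, so a +02:00 local midnight becomes 22:00 of the previous day.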
Re: adding date column to the index
How do I convert datetimeoffset(7) to a Solr date?

On Tue, Jul 23, 2013 at 11:11 AM, Mysurf Mail stammail...@gmail.com wrote:

Aha - I deleted the data folder and now I get: Invalid Date String: '2010-01-01 00:00:00 +02:00'. I need to convert it to a Solr date, as I read it into the schema using:

<field name="LastModificationTime" type="date" indexed="false" stored="true" required="true"/>

On Tue, Jul 23, 2013 at 10:50 AM, Gora Mohanty g...@mimirtech.com wrote:

On 23 July 2013 11:13, Mysurf Mail stammail...@gmail.com wrote:

To clarify: I did delete the data in the index and reloaded it (+ commit). (As I said, I have seen it loaded in the DB profiler.) [...]

Please share your DIH configuration file, and Solr's schema.xml. It must be that somehow the column is not getting indexed. Regards, Gora
solr - Deleting a row from the index, using the configuration files only.
I am updating my Solr index using the deltaQuery and deltaImportQuery attributes in data-config.xml. In my condition I write:

where MyDoc.LastModificationTime > '${dataimporter.last_index_time}'

Then, after I add a row, I trigger an update using data-config.xml. Now, sometimes I delete a row. How can I implement this with the configuration files only (without sending a delete REST command to Solr)? Let's say my object is not deleted, but its status is changed to "deleted". I don't index that status field, as I want to hold only the live rows (otherwise I could have just filtered on it). Is there a way to do it? Thanks.
filter query result by user
I want to restrict the returned results to only the documents that were created by the user. I load the CreatedBy attribute into the index and set it to indexed=false, stored=true:

<field name="CreatedBy" type="string" indexed="false" stored="true" required="true"/>

Then I want to filter by CreatedBy, so in the dashboard I check edismax and add CreatedBy:user1 to the qf field. The resulting query is:

http://:8983/solr/vault/select?q=*%3A*&defType=edismax&qf=CreatedBy%3Auser1

Nothing is filtered; all rows are returned. What am I doing wrong?
Re: filter query result by user
But I don't want it to be searched on. Let's say the user name is "giraffe". I do want the filter to be "created by = giraffe", but when the user searches his name I will want only documents with the name "Giraffe". Since it is indexed, wouldn't it return all rows created by him? Thanks.

On Tue, Jul 23, 2013 at 4:28 PM, Raymond Wiker rwi...@gmail.com wrote:

Simple: the field needs to be indexed in order to search (or filter) on it.

On Tue, Jul 23, 2013 at 3:26 PM, Mysurf Mail stammail...@gmail.com wrote:

I want to restrict the returned results to only the documents that were created by the user. I load the CreatedBy attribute into the index and set it to indexed=false, stored=true:

<field name="CreatedBy" type="string" indexed="false" stored="true" required="true"/>

Then I want to filter by CreatedBy, so in the dashboard I check edismax and add CreatedBy:user1 to the qf field. The resulting query is:

http://:8983/solr/vault/select?q=*%3A*&defType=edismax&qf=CreatedBy%3Auser1

Nothing is filtered; all rows are returned. What am I doing wrong?
Re: filter query result by user
I am probably using it wrong.

http://...:8983/solr/vault10k/select?q=*%3A*&defType=edismax&qf=CreatedBy%3ABLABLA

returns all rows; it ignores my qf filter. Should I even use qf for filtering with edismax? (It doesn't say that in the doc: http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29)

On Tue, Jul 23, 2013 at 4:32 PM, Mysurf Mail stammail...@gmail.com wrote:

But I don't want it to be searched on. Let's say the user name is "giraffe". I do want the filter to be "created by = giraffe", but when the user searches his name I will want only documents with the name "Giraffe". Since it is indexed, wouldn't it return all rows created by him? Thanks.

On Tue, Jul 23, 2013 at 4:28 PM, Raymond Wiker rwi...@gmail.com wrote:

Simple: the field needs to be indexed in order to search (or filter) on it.

On Tue, Jul 23, 2013 at 3:26 PM, Mysurf Mail stammail...@gmail.com wrote:

I want to restrict the returned results to only the documents that were created by the user. I load the CreatedBy attribute into the index and set it to indexed=false, stored=true:

<field name="CreatedBy" type="string" indexed="false" stored="true" required="true"/>

Then I want to filter by CreatedBy, so in the dashboard I check edismax and add CreatedBy:user1 to the qf field. The resulting query is:

http://:8983/solr/vault/select?q=*%3A*&defType=edismax&qf=CreatedBy%3Auser1

Nothing is filtered; all rows are returned. What am I doing wrong?
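The confusion in this thread is that qf only tells edismax which fields to *search*; restricting results is the job of fq (and, as Raymond said, the field must be indexed). A Python sketch of building the URL with fq instead of qf (core name and field are taken from the thread; the rest is illustrative):

```python
from urllib.parse import urlencode

def select_url(core_base, query, created_by):
    """Restrict results with a filter query; qf would only change which
    fields the q terms are matched against, not filter anything out."""
    params = {"q": query, "defType": "edismax",
              "fq": "CreatedBy:%s" % created_by}
    return core_base.rstrip("/") + "/select?" + urlencode(params)
```

With fq, the "giraffe" restriction is applied independently of whatever the user types in q, so searching the word "giraffe" in document names stays a separate concern.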
adding date column to the index
I have added a date field to my index. I don't want the query to search on this field, but I want it to be returned with each row. So I have defined it in schema.xml as follows:

<field name="LastModificationTime" type="date" indexed="false" stored="true" required="true"/>

I added it to the select in data-config.xml and I see it selected in the profiler. Now, when I query all fields (using the dashboard) I don't see it. Even when I ask for it specifically I don't see it. What am I doing wrong? (In the DB it is datetimeoffset(7).)
deserializing highlighting json result
When I request a JSON result I get the following structure in the highlighting section:

{"highlighting": {
  "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9": {
    "PackageName": ["- <em>Testing</em> channel twenty."]},
  "baf8434a-99a4-4046-8a4d-2f7ec09eafc8": {
    "PackageName": ["- <em>Testing</em> channel twenty."]},
  "0a699062-cd09-4b2e-a817-330193a352c1": {
    "PackageName": ["- <em>Testing</em> channel twenty."]},
  "0b9ec891-5ef8-4085-9de2-38bfa9ea327e": {
    "PackageName": ["- <em>Testing</em> channel twenty."]}}}

It is difficult to deserialize this JSON because the GUID is in the attribute name. Is that solvable (using C#)?
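The GUID keys don't need a static class; the highlighting object is just a dictionary keyed by document id. Shown in Python for brevity - in C# the analogous move would be deserializing into a nested string-keyed dictionary rather than a typed DTO:

```python
def highlights_by_id(payload):
    """Flatten Solr's highlighting map into (doc_id, snippets) pairs by
    iterating the dict items instead of mapping GUIDs to class members."""
    return [(doc_id, fields.get("PackageName", []))
            for doc_id, fields in payload["highlighting"].items()]
```

Each pair can then be joined back to the matching document from the main response by its id.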
Re: adding date column to the index
Clarify: I did delete the data in the index and reloaded it (+ commit). (As I said, I have seen it loaded in the db profiler.) Thanks for your comment. On Mon, Jul 22, 2013 at 9:25 PM, Lance Norskog goks...@gmail.com wrote: Solr/Lucene does not automatically add when asked, the way DBMS systems do. Instead, all data for a field is added at the same time. To get the new field, you have to reload all of your data. This is also true for deleting fields. If you remove a field, that data does not go away until you re-index. On 07/22/2013 07:31 AM, Mysurf Mail wrote: I have added a date field to my index. I don't want the query to search on this field, but I want it to be returned with each row. So I have defined it in the schema.xml as follows: <field name="LastModificationTime" type="date" indexed="false" stored="true" required="true"/> I added it to the select in data-config.xml and I see it selected in the profiler. Now, when I query all fields (using the dashboard) I don't see it. Even when I ask for it specifically I don't see it. What am I doing wrong? (In the db it is datetimeoffset(7).)
Re: deserializing highlighting json result
The guid appears as the attribute name and not as "id":"baf8434a-99a4-4046-8a4d-2f7ec09eafc8". Trying to create an object that holds this guid will create an attribute with the name baf8434a-99a4-4046-8a4d-2f7ec09eafc8. On Mon, Jul 22, 2013 at 6:30 PM, Jack Krupansky j...@basetechnology.com wrote: Exactly why is it difficult to deserialize? Seems simple enough. -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Monday, July 22, 2013 11:14 AM To: solr-user@lucene.apache.org Subject: deserializing highlighting json result When I request a JSON result I get the following structure in the highlighting: {"highlighting":{"394c65f1-dfb1-4b76-9b6c-2f14c9682cc9":{"PackageName":["- <em>Testing</em> channel twenty."]},"baf8434a-99a4-4046-8a4d-2f7ec09eafc8":{"PackageName":["- <em>Testing</em> channel twenty."]},"0a699062-cd09-4b2e-a817-330193a352c1":{"PackageName":["- <em>Testing</em> channel twenty."]},"0b9ec891-5ef8-4085-9de2-38bfa9ea327e":{"PackageName":["- <em>Testing</em> channel twenty."]}}} It is difficult to deserialize this JSON because the guid is in the attribute name. Is that solvable (using C#)?
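Since the document id is the JSON key rather than a value, the natural target type is a nested dictionary keyed by id, not a class with fixed property names. A sketch of the idea in Python; in C#, deserializing into a Dictionary<string, Dictionary<string, List<string>>> with a JSON library should work the same way:

```python
import json

# Trimmed-down copy of the highlighting payload from the question.
raw = """{"highlighting": {
  "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9": {"PackageName": ["- <em>Testing</em> channel twenty."]},
  "baf8434a-99a4-4046-8a4d-2f7ec09eafc8": {"PackageName": ["- <em>Testing</em> channel twenty."]}
}}"""

# Treat each GUID as a dictionary key, not as an attribute name.
highlighting = json.loads(raw)["highlighting"]
snippets_by_id = {
    doc_id: fields.get("PackageName", [])
    for doc_id, fields in highlighting.items()
}
```

The GUID keys then pair up naturally with the `id` field of the documents in the `response` section of the same reply.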
Running Solr in a cluster - high availability only
Hi, I would like to run two Solr instances on different computers as a cluster. My main interest is high availability - meaning, in case one server crashes or is down, there will always be another one. (My performance on a single instance is great. I do not need to split the data across two servers.) Questions: 1. What is the best practice? Is it different from clustering for index splitting? Do I need shards? 2. Do I need ZooKeeper? 3. Is it a container-based configuration (different for Jetty and Tomcat)? 4. Do I need an external NLB for that? 5. When one computer comes up after crashing, how does it update its index?
Re: two types of answers in my query
This will work. Thanks. On Tue, Jul 9, 2013 at 4:37 PM, Jack Krupansky j...@basetechnology.comwrote: Usually a car term and a car part term will look radically different. So, simply use the edismax query parser and set qf to be both the car and car part fields. If either matches, the document will be selected. And if you have a type field, you can check that to see if a car or part was matched in the results. -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Tuesday, July 09, 2013 2:38 AM To: solr-user@lucene.apache.org Subject: two types of answers in my query Hi, A general question: Let's say I have Car And CarParts 1:n relation. And I have discovered that the user had entered in the search field instead of car name - a part serial number (SKU). (I discovered it useing regex) Is there a way to fetch different types of answers in Solr? Is there a way to fetch mixed types in the answers? Is there something similiar to that and how is that feature called? Thank you.
Disabling word breaking for codes and SKUs
Some of the data in my index is SKUs and barcodes, as follows: ASDF3-DASDD-2133DD-21H44 I want to disable the word breaking for this type (maybe through a regex). Is there a possible way to do this?
two types of answers in my query
Hi, A general question: Let's say I have a Car and CarParts 1:n relation. And I have discovered that the user entered in the search field, instead of a car name, a part serial number (SKU). (I discovered it using a regex.) Is there a way to fetch different types of answers in Solr? Is there a way to fetch mixed types in the answers? Is there something similar to that, and what is that feature called? Thank you.
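The regex routing described here (detect a SKU-shaped query, then search a different field) can be sketched client-side. The pattern and field names below are hypothetical illustrations, not anything defined by Solr; Jack's suggested alternative is to put both field sets in edismax qf and let either match:

```python
import re

# Hypothetical SKU shape: alphanumeric groups joined by dashes,
# e.g. "ASDF3-DASDD-2133DD-21H44".
SKU_PATTERN = re.compile(r"^[A-Z0-9]+(-[A-Z0-9]+){2,}$")

def build_query(user_input: str) -> dict:
    """Route the query to the SKU field or the car-name fields."""
    if SKU_PATTERN.match(user_input.upper()):
        return {"q": user_input, "defType": "edismax", "qf": "SKU"}
    return {"q": user_input, "defType": "edismax", "qf": "CarName CarDescription"}
```

With a type field in the index, the results of the combined-qf approach can likewise be told apart after the fact instead of before.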
Solr - Delta Query Via Full Import
I am using DIH to fetch rows from the db to Solr. I have many 1:n relations, and I can do it only if I use caching (super fast). Therefore I am adding the following attributes to my inner entity: processor="CachedSqlEntityProcessor" cacheKey="..." cacheLookup="..." Everything works great and fast. (First the n tables are queried, then the main entity.) Now I want to configure the delta import, and it is not actually working. I know that by standard http://wiki.apache.org/solr/DataImportHandler#Delta-Import_Example I need to define the following attributes: 1. query - the initial query. 2. deltaQuery - the rows that were changed. 3. deltaImportQuery - fetch the data that was changed. 4. parentDeltaQuery - the keys of the parent entity that has changed rows in the current entity. (2-4 are only used in delta queries.) And I have seen a hack in the documents http://wiki.apache.org/solr/DataImportHandler#Delta-Import_Example that you can do a delta query via full import. So instead of adding the attributes query, deltaImportQuery and deltaQuery, I can just add query and call full instead of delta. Problem: only the first query (main entity) is executed when I run the full import without clean. Here is part of my configuration in data-config.xml (I have left deltaImportQuery in, though I call only full import): <entity name="PackageVersion" pk="PackageVersionId" query="select ... from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId Where '${dataimporter.request.clean}' != 'false' OR Package.LastModificationTime > '${dataimporter.last_index_time}' OR PackageVersion.Timestamp > '${dataimporter.last_index_time}'" deltaImportQuery="select ... from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId Where '${dataimporter.request.clean}' != 'false' OR Package.LastModificationTime > '${dataimporter.last_index_time}' OR PackageVersion.Timestamp > '${dataimporter.last_index_time}' and ID=='${dih.delta.id}'"> <entity name="PackageTag" pk="ResourceId" processor="CachedSqlEntityProcessor" cacheKey="ResourceId" cacheLookup="PackageVersion.PackageId" query="SELECT ResourceId,[Text] PackageTag from [dbo].[Tag] Tag Where '${dataimporter.request.clean}' = 'true' OR Tag.TimeStamp > '${dataimporter.last_index_time}'" parentDeltaQuery="select PackageVersion.PackageVersionId from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion ON Package.Id = PackageVersion.PackageId where Package.Id=${PackageTag.ResourceId}"/> </entity>
parent import query doesn't run
I have a 1:n relation between my main entity (PackageVersion) and its tags in my DB. I add a new tag to the db and I run the delta import command. The deltaQuery select retrieves the line, but I don't see any other SQL. Here are my data-config.xml configurations: <entity name="PackageVersion" pk="PackageVersionId" query="select ... from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId" deltaQuery="select PackageVersion.Id PackageVersionId from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId where Package.LastModificationTime > '${dataimporter.last_index_time}' OR PackageVersion.Timestamp > '${dataimporter.last_index_time}'" deltaImportQuery="select ... from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId Where PackageVersionId=='${dih.delta.id}'"> <entity name="PackageTag" pk="ResourceId" processor="CachedSqlEntityProcessor" cacheKey="ResourceId" cacheLookup="PackageVersion.PackageId" query="SELECT ResourceId,[Text] PackageTag from [dbo].[Tag] Tag" deltaQuery="SELECT ResourceId,[Text] PackageTag from [dbo].[Tag] Tag Where Tag.TimeStamp > '${dataimporter.last_index_time}'" parentDeltaQuery="select PackageVersion.PackageVersionId from [dbo].[Package] where Package.Id=${PackageVersion.PackageVersionId}"/> </entity>
Solr - working with delta import and cache
I have two entities in a 1:n relation - PackageVersion and Tag. I have configured DIH to use CachedSqlEntityProcessor and everything works as planned: first the Tag entity is selected using the query attribute, then the main entity. Ultra fast. Now I am adding the delta import. Everything runs and loads, but too slow. Looking at the db profiler output I see: 1. The delta query of the inner entities runs first - which is good. 2. The delta query of the main entity runs later - which is still good. 3. The deltaImportQuery of the main entity runs as a separate single select for each of the IDs - this could be improved by using one "where ... in" over all the results. Is that possible? 4. All of the query attributes of the other tables run now. This is bad. (In real life I have more than one table in a 1:n connection.) For instance, I see a lot of runs of "select ResourceId,[Text] PackageTag from [dbo].[Tag] Tag Where ResourceType = 0". Because it comes from the query attribute, there is no where clause using the ids. a. How can I fix it? b. Can I translate the import query to use "where in"? c. There is no real order to all the selects when requesting deltaImport. Is it possible to implement the caching also when updating delta?
Here is my configuration: <entity name="PackageVersion" pk="PackageVersionId" query="select ... from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId" deltaQuery="select PackageVersion.Id PackageVersionId from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId where Package.LastModificationTime > '${dataimporter.last_index_time}' OR PackageVersion.Timestamp > '${dih.last_index_time}'" deltaImportQuery="select ... from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId Where PackageVersion.Id='${dih.delta.PackageVersionId}'"> <entity name="PackageTag" pk="ResourceId" processor="CachedSqlEntityProcessor" cacheKey="ResourceId" cacheLookup="PackageVersion.PackageId" query="select ResourceId,[Text] PackageTag from [dbo].[Tag] Tag Where ResourceType = 0" deltaQuery="select ResourceId,[Text] PackageTag from [dbo].[Tag] Tag Where ResourceType = 0 and Tag.TimeStamp > '${dih.last_index_time}'" parentDeltaQuery="select PackageVersion.PackageVersionId from [dbo].[Package] where Package.Id=${PackageTag.ResourceId}"/> </entity>
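On point 3: DIH itself runs deltaImportQuery once per changed id, so the "where ... in" batching would have to happen outside DIH, for example in a custom indexer that fetches the changed rows itself and posts documents to Solr. A sketch of that batching idea, reusing the table names from the config above (a hypothetical helper, not a DIH feature):

```python
def batch_delta_query(changed_ids, batch_size=500):
    """Yield one parameterized SELECT per batch instead of one per changed row."""
    template = (
        "select * from [dbo].[Package] Package "
        "inner join [dbo].[PackageVersion] PackageVersion "
        "on Package.Id = PackageVersion.PackageId "
        "where PackageVersion.Id in ({placeholders})"
    )
    for start in range(0, len(changed_ids), batch_size):
        batch = changed_ids[start:start + batch_size]
        placeholders = ", ".join("?" for _ in batch)
        yield template.format(placeholders=placeholders), batch
```

Batching keeps the number of round trips proportional to the number of batches rather than the number of changed rows, which is where the per-id deltaImportQuery pattern loses time.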
Re: Solr - working with delta import and cache
BTW: Just found out that a delta import is only supported by the SqlEntityProcessor . Does it matter that I defined processor=CachedSqlEntityProcessor? On Tue, Jul 2, 2013 at 5:58 PM, Mysurf Mail stammail...@gmail.com wrote: I have two entities in 1:n relation - PackageVersion and Tag. I have configured DIH to use CachedSqlEntityProcessor and everything works as planned. First, Tag entity is selected using the query attribute. Then the main entity. Ultra Fast. Now I am adding the delta import. Everything runs and loads, but too slow. Looking at the db profiler output i see : 1. the delta query of the inner entities run first - which is good. 2. the delta query of the main entities runs later - which is still good. 3. deltaImportQuery of the main entity with each of the ID's runs as a single select can be improved using where in all the result. Is it possible? 4. All of the Query attribute of the other tables are running now. This is bad. (In real life I have more than one table in 1:n connection). for instance I get a lot of select ResourceId,[Text] PackageTag from [dbo].[Tag] Tag Where ResourceType = 0 run. Because it is from the Query attribute - there is no where clause for using the ids. a. How can I fix it ? b. Can I translate the importquery to use where in c. There is no real order for all the select when requesting deltaImport. is it possible to implement the caching also when updating delta? 
Here is my configuration entity name=PackageVersion pk=PackageVersionId query= select from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId deltaQuery = select PackageVersion.Id PackageVersionId from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId where Package.LastModificationTime '${dataimporter.last_index_time}' OR PackageVersion.Timestamp '${dih.last_index_time}' deltaImportQuery= select from [dbo].[Package] Package inner join [dbo].[PackageVersion] PackageVersion on Package.Id = PackageVersion.PackageId Where PackageVersion.Id='${dih.delta.PackageVersionId}' entity name=PackageTag pk=ResourceId processor=CachedSqlEntityProcessor cacheKey=ResourceId cacheLookup=PackageVersion.PackageId query=select ResourceId,[Text] PackageTag from [dbo].[Tag] Tag Where ResourceType = 0 deltaQuery=select ResourceId,[Text] PackageTag from [dbo].[Tag] Tag Where ResourceType = 0 and Tag.TimeStamp '${dih.last_index_time}' parentDeltaQuery=select PackageVersion.PackageVersionId from [dbo].[Package] where Package.Id=${PackageTag.ResourceId} /entity /entity
Is there a way to speed up my import
I have a relational database model. This is the basics of my data-config.xml: <entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA] inner join TableB on ..."> <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2] where ResourceId = '${MyMainEntity.pId}'"/> <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"/> <entity name="LibraryItem" pk="ResourceId" query="select SKU FROM [TableB] INNER JOIN ... ON ... INNER JOIN ... ON ... WHERE ... AND ..."/> </entity> Now, this takes a lot of time. 1 rows in the first query, and then each of the inner entities is fetched later (around 10 rows each). If I use a db profiler I see the three inner entities' queries running over and over (3 select statements, then again 3 select statements, over and over). This is really not efficient, and the import can run over 40 hrs. Now, what are my options to run it faster? 1. Obviously there is an option to flatten the tables into one big table - but that will create a lot of other side effects. I would really like to avoid that extra effort and run Solr on my production relational tables. So far it works great out of the box, and I am asking here if there is a configuration tweak. 2. If I flatten the rows - does the schema.xml need to change too? Or will the same fields that are multivalued keep being multivalued? Thanks.
Re: Is there a way to speed up my import
I just configured the caching and it works mighty fast now. Instead of an unbelievable amount of queries it queries only 4 times. CPU usage has moved from the db to the Solr computer, but only for a very short time. Problem: I don't see the multivalued fields (inner entities) anymore. This is my configuration: <entity name="PackageVersion" pk="PackageVersionId" query="select PackageVersion.Id PackageVersionId, ... from ..."> <entity name="PackageTag" pk="ResourceId" processor="CachedSqlEntityProcessor" where="ResourceId = '${PackageVersion.PackageId}'" query="SELECT [Text] PackageTag from [dbo].[Tag]"/> <entity name="PackageVersionTag" pk="ResourceId" processor="CachedSqlEntityProcessor" where="ResourceId = PackageVersion.PackageVersionId" query="SELECT [Text] PackageVersionTag from [dbo].[Tag]"/> <entity name="LibraryItem" pk="ResourceId" processor="CachedSqlEntityProcessor" where="Asset.[PackageVersionId] = PackageVersion.PackageVersionId" query="select CatalogVendorPartNum SKU, LibraryItems.[Description] SKUDescription FROM ... INNER JOIN ... ON Asset.Id = LibraryVendors.DesignProjectId INNER JOIN ... ON LibraryVendors.LibraryVendorId = LibraryItems.LibraryVendorId WHERE Asset.[AssetTypeId]=1"/> </entity> Now, when I query http://localhost:8983/solr/vaultCache/select?q=*&indent=true it returns only the main entity attributes. Where are my inner entity attributes now? Thanks a lot. On Thu, Jun 27, 2013 at 10:15 AM, Gora Mohanty g...@mimirtech.com wrote: On 27 June 2013 12:32, Mysurf Mail stammail...@gmail.com wrote: I have a relational database model This is the basics of my data-config.xml entity name=MyMainEntity pk=pID query=select ... from [dbo].[TableA] inner join TableB on ... entity name=Entity1 pk=Id1 query=SELECT [Text] Tag from [Table2] where ResourceId = '${MyMainEntity.pId}'/entity entity name=Entity1 pk=Id2 query=SELECT [Text] Tag from [Table2] where ResourceId2 = '${MyMainEntity.pId}'/entity entity name=LibraryItem pk=ResourceId query=select SKU FROM [TableB] INNER JOIN ... ON ... 
INNER JOIN ... ON ... WHERE ... AND ...' /entity /entity Now, this takes a lot of time. 1 rows in the first query and then each other inner entities are fetched later (around 10 rows each). If I use a db profiler I see a the three inner entities query running over and over (3 select sentences than again 3 select sentences over and over) This is really not efficient. And the import can run over 40 hrs () Now, What are my options to run it faster . 1. Obviously there is an option to flat the tables to one big table - but that will create a lot of other side effects. I would really like to avoid that extra effort and run solr on my production relational tables. So far it works great out of the box and I am searching here if there is a configuration tweak. 2. If I will flat the rows that - does the schema.xml need to be change too? or the same fields that are multivalued will keep being multivalued. You have not shared your actual queries, so it is difficult to tell, but my guess would be that it is the JOINs that are the bottle-neck rather than the SELECTs. You should start by: 1. Profile queries from the database back-end to see which are taking the most time, and try to simplify them. 2. Make sure that relevant database columns are indexed. This can make a huge difference, though going overboard in indexing all columns might be counter-productive. 3. Use Solr DIH's CachedSqlEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor 4. Measure the time that Solr indexing takes: From your description, you seem to be guessing at it. In general, you should not flatten the records in the database as that is supposed to be relational data. Regards, Gora
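Gora's point 3 (CachedSqlEntityProcessor) is the big win for the repeated-subquery pattern described above: the child query runs once, and children are then looked up in memory by the cache key instead of issuing one SELECT per parent row. The mechanism in miniature, with illustrative data rather than the actual DIH implementation:

```python
from collections import defaultdict

def index_children(child_rows, key):
    """One pass over the child table, grouped by the cache key."""
    cache = defaultdict(list)
    for row in child_rows:
        cache[row[key]].append(row)
    return cache

# Child rows fetched with a single SELECT instead of one per parent row.
tags = [
    {"ResourceId": 1, "Text": "fast"},
    {"ResourceId": 1, "Text": "red"},
    {"ResourceId": 2, "Text": "blue"},
]
cache = index_children(tags, "ResourceId")

# The per-parent lookup is now a dict access, not a database round trip.
parent_tags = [r["Text"] for r in cache[1]]
```

This trades one scan of the child table plus memory for N round trips, which is why the 40-hour import collapses once caching is configured.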
Need assistance in defining search urls
Now, each doc looks like this (I generated random user text in the free-text columns in the DB): <doc> <str name="PackageName">We have located the ship.</str> <arr name="CatalogVendorPartNum"> <str>d1771fc0-d3c2-472d-aa33-4bf5d1b79992</str> <str>b2986a4f-9687-404c-8d45-57b073d900f7</str> <str>a99cf760-d78e-493f-a827-585d11a765f3</str> <str>ba349832-c655-4a02-a552-d5b76b45d58c</str> <str>35e86a61-eba8-49f4-95af-8915bd9561ac</str> <str>6d8eb7d9-b417-4bda-b544-16bc26ab1d85</str> <str>31453eff-be19-4193-950f-fffcea70ef9e</str> <str>08e27e4f-3d07-4ede-a01d-4fdea3f7ddb0</str> <str>79a19a3f-3f1b-486f-9a84-3fb40c41e9c7</str> <str>b34c6f78-75b1-42f1-8ec7-e03d874497df</str> </arr> <float name="score">1.7437795</float> </doc> My searches are (PackageName is defined as the default search): 1. I try to search for any package whose name has the word have or had or has. 2. I try to search for any package that contains d1771fc0-d3c2-472d-aa33-4bf5d1b79992. Therefore I use these searches: 1. http://localhost:8983/solr/vault/select?q=*have*&fl=PackageName%2Cscore&defType=edismax&stopwords=true&lowercaseOperators=true Questions: 1.a. Even if I display all results, I don't get any results with has (inflections). Why? 1.b. What is the difference between *have* and have? The score is different. 2. http://localhost:8983/solr/vault/select?q=*:d1771fc0-d3c2-472d-aa33-4bf5d1b79992&fl=PackageName,score&defType=edismax&stopwords=true&lowercaseOperators=true&start=0&rows=300 Questions: 2.a. I get no result, even though I search on all fields (*) and it appears in the doc. 2.b. If I want to search on more than one field, i.e. packageName and description, what is the best way to do it? Define all as default? Thanks,
What should be the definitions ( field type ) for a field that will be search with user free text
Currently I am using text_general. I want to search with user free-text search, therefore I would like tokenization, stemming, etc. How do I define stemmers? Should I use text_en instead of text_general? Thank you.
Re: Need assistance in defining search urls
Thanks Jack and Giovanni. Jack: regarding 1.b (have vs *have*), the results were identical apart from the score. Basically I can't do all the stuff you recommended. I want a stemmer for an unknown search (the query is sent when the user enters free text into a textbox). Giovanni: regarding requestHandler test - will I need to query using /test/...? Shouldn't it be named /test? On Mon, Jun 24, 2013 at 4:40 PM, Jack Krupansky j...@basetechnology.com wrote: I don't get any results with has (inflections). Why? Wildcard patterns on strings are literal, exact. There is no automatic natural language processing. You could try a regular expression match: q=/ ha(s|ve) / Or, just use OR: q=*has* OR *have* Or, use a copyField of the package name to a text field and then you can use simple keywords: q=package_name_text:(has OR have) Is PackageName a string field? Or, maybe best, use an update processor to populate a Boolean field to indicate whether the has/have pattern is seen in the package name. A simple JavaScript script with a StatelessScriptUpdateProcessor could do this in just a couple of lines and make the query much faster. For question 1.b the two queries seem identical - was that the case? There is no *: feature to query all fields in Solr - although the LucidWorks Search query parser does support that feature. 
-- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Monday, June 24, 2013 7:26 AM To: solr-user@lucene.apache.org Subject: Need assistance in defining search urls Now, each doc looks like this (I generated random user text in the free-text columns in the DB): <doc> <str name="PackageName">We have located the ship.</str> <arr name="CatalogVendorPartNum"> <str>d1771fc0-d3c2-472d-aa33-4bf5d1b79992</str> <str>b2986a4f-9687-404c-8d45-57b073d900f7</str> <str>a99cf760-d78e-493f-a827-585d11a765f3</str> <str>ba349832-c655-4a02-a552-d5b76b45d58c</str> <str>35e86a61-eba8-49f4-95af-8915bd9561ac</str> <str>6d8eb7d9-b417-4bda-b544-16bc26ab1d85</str> <str>31453eff-be19-4193-950f-fffcea70ef9e</str> <str>08e27e4f-3d07-4ede-a01d-4fdea3f7ddb0</str> <str>79a19a3f-3f1b-486f-9a84-3fb40c41e9c7</str> <str>b34c6f78-75b1-42f1-8ec7-e03d874497df</str> </arr> <float name="score">1.7437795</float> </doc> My searches are (PackageName is defined as the default search): 1. I try to search for any package whose name has the word have or had or has. 2. I try to search for any package that contains d1771fc0-d3c2-472d-aa33-4bf5d1b79992. Therefore I use these searches: 1. http://localhost:8983/solr/vault/select?q=*have*&fl=PackageName%2Cscore&defType=edismax&stopwords=true&lowercaseOperators=true Questions: 1.a. Even if I display all results, I don't get any results with has (inflections). Why? 1.b. What is the difference between *have* and have? The score is different. 2. http://localhost:8983/solr/vault/select?q=*:d1771fc0-d3c2-472d-aa33-4bf5d1b79992&fl=PackageName,score&defType=edismax&stopwords=true&lowercaseOperators=true&start=0&rows=300 Questions: 2.a. I get no result, even though I search on all fields (*) and it appears in the doc. 2.b. If I want to search on more than one field, i.e. packageName and description, what is the best way to do it? Define all as default? Thanks,
Re: Need assistance in defining search urls
Regarding "There is no *: feature to query all fields in Solr": when I enter the dashboard (solr/#/[core]/query) the default is *:* and it brings back everything. On Mon, Jun 24, 2013 at 5:41 PM, Mysurf Mail stammail...@gmail.com wrote: Thanks Jack and Giovanni. Jack: regarding 1.b (have vs *have*), the results were identical apart from the score. Basically I can't do all the stuff you recommended. I want a stemmer for an unknown search (the query is sent when the user enters free text into a textbox). Giovanni: regarding requestHandler test - will I need to query using /test/...? Shouldn't it be named /test? On Mon, Jun 24, 2013 at 4:40 PM, Jack Krupansky j...@basetechnology.com wrote: I don't get any results with has (inflections). Why? Wildcard patterns on strings are literal, exact. There is no automatic natural language processing. You could try a regular expression match: q=/ ha(s|ve) / Or, just use OR: q=*has* OR *have* Or, use a copyField of the package name to a text field and then you can use simple keywords: q=package_name_text:(has OR have) Is PackageName a string field? Or, maybe best, use an update processor to populate a Boolean field to indicate whether the has/have pattern is seen in the package name. A simple JavaScript script with a StatelessScriptUpdateProcessor could do this in just a couple of lines and make the query much faster. For question 1.b the two queries seem identical - was that the case? There is no *: feature to query all fields in Solr - although the LucidWorks Search query parser does support that feature. 
-- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Monday, June 24, 2013 7:26 AM To: solr-user@lucene.apache.org Subject: Need assistance in defining search urls Now, each doc looks like this (I generated random user text in the free-text columns in the DB): <doc> <str name="PackageName">We have located the ship.</str> <arr name="CatalogVendorPartNum"> <str>d1771fc0-d3c2-472d-aa33-4bf5d1b79992</str> <str>b2986a4f-9687-404c-8d45-57b073d900f7</str> <str>a99cf760-d78e-493f-a827-585d11a765f3</str> <str>ba349832-c655-4a02-a552-d5b76b45d58c</str> <str>35e86a61-eba8-49f4-95af-8915bd9561ac</str> <str>6d8eb7d9-b417-4bda-b544-16bc26ab1d85</str> <str>31453eff-be19-4193-950f-fffcea70ef9e</str> <str>08e27e4f-3d07-4ede-a01d-4fdea3f7ddb0</str> <str>79a19a3f-3f1b-486f-9a84-3fb40c41e9c7</str> <str>b34c6f78-75b1-42f1-8ec7-e03d874497df</str> </arr> <float name="score">1.7437795</float> </doc> My searches are (PackageName is defined as the default search): 1. I try to search for any package whose name has the word have or had or has. 2. I try to search for any package that contains d1771fc0-d3c2-472d-aa33-4bf5d1b79992. Therefore I use these searches: 1. http://localhost:8983/solr/vault/select?q=*have*&fl=PackageName%2Cscore&defType=edismax&stopwords=true&lowercaseOperators=true Questions: 1.a. Even if I display all results, I don't get any results with has (inflections). Why? 1.b. What is the difference between *have* and have? The score is different. 2. http://localhost:8983/solr/vault/select?q=*:d1771fc0-d3c2-472d-aa33-4bf5d1b79992&fl=PackageName,score&defType=edismax&stopwords=true&lowercaseOperators=true&start=0&rows=300 Questions: 2.a. I get no result, even though I search on all fields (*) and it appears in the doc. 2.b. If I want to search on more than one field, i.e. packageName and description, what is the best way to do it? Define all as default? Thanks,
why does the uniqueKey have to be indexed
Currently, I can't define my unique key with indexed=false. As I understand from the docs, the field attribute indexed should be true only if I want the field to be searchable or sortable. Let's say I have a schema with id and name only. Wouldn't I want the following configuration: id - indexed=false, stored=true; name - indexed=true, stored=true? I don't want the id to be searched, but I would want it to be defined as the unique key and to be stored (for retrieval).
Re: What should be the definitions ( field type ) for a field that will be search with user free text
Thanks. On Mon, Jun 24, 2013 at 5:52 PM, Jack Krupansky j...@basetechnology.comwrote: The general idea is that tokenization can generally be done in a language-independent manner, but stemming, synonyms, stop words, etc. must be done in a language-dependent manner. So, yes, text_en is a better starting point for adding in the more advanced language processing features. -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Monday, June 24, 2013 10:26 AM To: solr-user@lucene.apache.org Subject: What should be the definitions ( field type ) for a field that will be search with user free text currently I am using text_general. I want to search with user free text search, therefor I would like tokenization, stemmings ... How do I define stemmers? Should I use text_en instead of text_general? Thank you.
Re: modeling multiple values on 1:n connection
Thanks for your comment. What I need is to model it so that I can correlate the featureName with the feature's description. Currently if an item has 3 features I get two lists, each three elements long, and then I need to correlate them. On Sun, Jun 23, 2013 at 9:25 AM, Gora Mohanty g...@mimirtech.com wrote: On 23 June 2013 01:31, Mysurf Mail stammail...@gmail.com wrote: I try to model my db using this http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example example from the Solr wiki. I have a table called item and a table called feature with id, featureName, description. Here is the updated xml (added featureName): <dataConfig> <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa"/> <document> <entity name="item" query="select * from item"> <entity name="feature" query="select description, featureName as features from feature where item_id='${item.ID}'"/> </entity> </document> </dataConfig> Now I get two lists in the xml element: <doc> <arr name="featureName"> <str>number of miles in every direction the universal cataclysm was gathering</str> <str>All around the Restaurant people and things relaxed and chatted. The</str> <str>- Do we have... - he put up a hand to hold back the cheers, - Do we</str> </arr> <arr name="description"> <str>to a stupefying climax. Glancing at his watch, Max returned to the stage</str> <str>air was filled with talk of this and that, and with the mingled scents of</str> <str>have a party here from the Zansellquasure Flamarion Bridge Club from</str> </arr> </doc> But I would like to see the lists together (using xml attributes) so that I don't have to join the values. Is it possible? While it is not clear to me what you are asking, I am guessing that you do not want the featureName and description fields to appear as arrays. This is happening because you have defined them as multi-valued in the Solr schema. What exactly do you want to join here? Regards, Gora
modeling multiple values on 1:n connection
I try to model my db using this example from the Solr wiki: http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example I have a table called item and a table called features with id, featureName, description. Here is the updated xml (added featureName):

<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
  <document>
    <entity name="item" query="select * from item">
      <entity name="feature" query="select description, featureName as features from feature where item_id='${item.ID}'"/>
    </entity>
  </document>
</dataConfig>

Now I get two lists in the xml element doc:

<doc>
  <arr name="featureName">
    <str>number of miles in every direction the universal cataclysm was gathering</str>
    <str>All around the Restaurant people and things relaxed and chatted. The</str>
    <str>- Do we have... - he put up a hand to hold back the cheers, - Do we</str>
  </arr>
  <arr name="description">
    <str>to a stupefying climax. Glancing at his watch, Max returned to the stage</str>
    <str>air was filled with talk of this and that, and with the mingled scents of</str>
    <str>have a party here from the Zansellquasure Flamarion Bridge Club from</str>
  </arr>
</doc>

But I would like to see the lists together (using xml attributes) so that I don't have to join the values. Is it possible?
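A workaround people often use for this parallel-array problem (a sketch, not something proposed in the thread): pair the two columns in SQL before they reach Solr, so each multivalued entry already carries both pieces. The `||` concatenation operator (HSQLDB syntax) and the field name `feature` are assumptions:

```xml
<entity name="feature"
        query="select featureName || ': ' || description as feature
               from feature where item_id='${item.ID}'"/>
```

Each <str> in the resulting `feature` array then holds a correlated "name: description" pair, with no joining by position on the client side.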
Re: How to define my data in schema.xml
Well, avoiding flattening the db to a flat table sounds like a great plan. I found this solution, http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example, which imports via a join rather than handling a flat table. On Tue, Jun 18, 2013 at 5:53 PM, Jack Krupansky j...@basetechnology.com wrote: You can in fact have multiple collections in Solr and do a limited amount of joining, and Solr has multivalued fields as well, but none of those techniques should be used to avoid the process of flattening and denormalizing a relational data model. It is hard work, but yes, it is required to use Solr effectively. Again, start with the queries - what problem are you trying to solve. Nobody stores data just for the sake of storing it - how will the data be used? -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Tuesday, June 18, 2013 9:58 AM To: solr-user@lucene.apache.org Subject: Re: How to define my data in schema.xml Hi Jack, Thanks for your kind comment. I am truly in the beginning of data modeling my schema over an existing working DB. I have used the school-teachers-student db as an example scenario. (a. I have written it as a disclaimer in my first post. b. I really do not know anyone that has 300 hobbies too.) In real life my db is obviously much different; I just used this as an example of potential pitfalls that will occur if I use my old db data modeling notions. Obviously, the old relational modeling idioms do not apply here. Now, my question was referring to the fact that I would really like to avoid a flat table/join/view for the reasons listed above. So, my scenario is answering a plain user generated text search over a MSSQL DB that contains a few 1:n relations (and a few 1:n:n relationships). So, I come here for tips. Should I use one combined index (treat it as a nosql source), separate indices, or another approach? Are there other ways to define relational data? Thanks.
On Tue, Jun 18, 2013 at 4:30 PM, Jack Krupansky j...@basetechnology.com* *wrote: It sounds like you still have a lot of work to do on your data model. No matter how you slice it, 8 billion rows/fields/whatever is still way too much for any engine to search on a single server. If you have 8 billion of anything, a heavily sharded SolrCloud cluster is probably warranted. Don't plan ahead to put more than 100 million rows on a single node; plan on a proof of concept implementation to determine that number. When we in Solr land say flattened or denormalized, we mean in an intelligent, smart, thoughtful sense, not a mindless, mechanical flattening. It is an opportunity for you to reconsider your data models, both old and new. Maybe data modeling is beyond your skill set. If so, have a chat with your boss and ask for some assistance, training, whatever. Actually, I am suspicious of your 8 billion number - change each of those 300's to realistic, average numbers. Each teacher teaches 300 courses? Right. Each Student has 300 hobbies? If you say so, but... Don't worry about schema.xml until you get your data model under control. For an initial focus, try envisioning the use cases for user queries. That will guide you in thinking about how the data would need to be organized to satisfy those user queries. -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Tuesday, June 18, 2013 2:20 AM To: solr-user@lucene.apache.org Subject: Re: How to define my data in schema.xml Thanks for your reply. I have tried the simplest approach and it works absolutely fantastic. Huge table - 0s to result. two problems as I described earlier, and that is what I try to solve: 1. I create a flat table just for solar. This requires maintenance and develop. Can I run solr over my regular tables? This is my simplest approach. Working over my relational tables, 2. 
When you query a flat table by school name, as I described, if the school has 300 student, 300 teachers, 300 with 300 teacherCourses, 300 studentHobbies, you get 8.1 Billion rows (300*300*300*300). As I am sure this will work great on solar - searching for the school name will retrieve 8.1 B rows. 3. Lets say all my searches are user generated free text search that is searching name and comments columns. Thanks. On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty g...@mimirtech.com wrote: On 18 June 2013 01:10, Mysurf Mail stammail...@gmail.com wrote: Thanks for your quick reply. Here are some notes: 1. Consider that all tables in my example have two columns: Name Description which I would like to index and search. 2. I have no other reason to create flat table other than for solar. So I would like to see if I can avoid it. 3. If in my example I will have a flat table then obviously it will hold a lot of rows for a single school. By searching the exact school name I will likely receive a lot of rows. (my flat table has its own pk) Yes, all
Re: Solr data files
Thanks. On Mon, Jun 17, 2013 at 10:42 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: The index files are under the collection's directory in the subdirectory called 'data', right next to the directory called 'conf' where your schema.xml and solrconfig.xml live. If Solr is not running, you can delete that directory to clear the index content. I don't think you can do that while Solr is running. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Mon, Jun 17, 2013 at 3:33 PM, Mysurf Mail stammail...@gmail.com wrote: Where are the core data files located? Can I just delete folders/files in order to quickly clean the core/indexes? Thanks
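Alexandre's description can be sketched as a couple of shell commands. SOLR_HOME and the core name are assumptions; adjust them to your install, and stop Solr first:

```shell
# The index files live in <core>/data, next to <core>/conf.
SOLR_HOME="${SOLR_HOME:-./example/solr}"
CORE="collection1"
rm -rf "$SOLR_HOME/$CORE/data"   # deletes the index; conf/ is untouched
```

On the next start, Solr recreates an empty data directory for the core.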
Re: How to define my data in schema.xml
Thanks for your reply. I have tried the simplest approach and it works absolutely fantastic. Huge table - 0s to result. two problems as I described earlier, and that is what I try to solve: 1. I create a flat table just for solar. This requires maintenance and develop. Can I run solr over my regular tables? This is my simplest approach. Working over my relational tables, 2. When you query a flat table by school name, as I described, if the school has 300 student, 300 teachers, 300 with 300 teacherCourses, 300 studentHobbies, you get 8.1 Billion rows (300*300*300*300). As I am sure this will work great on solar - searching for the school name will retrieve 8.1 B rows. 3. Lets say all my searches are user generated free text search that is searching name and comments columns. Thanks. On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty g...@mimirtech.com wrote: On 18 June 2013 01:10, Mysurf Mail stammail...@gmail.com wrote: Thanks for your quick reply. Here are some notes: 1. Consider that all tables in my example have two columns: Name Description which I would like to index and search. 2. I have no other reason to create flat table other than for solar. So I would like to see if I can avoid it. 3. If in my example I will have a flat table then obviously it will hold a lot of rows for a single school. By searching the exact school name I will likely receive a lot of rows. (my flat table has its own pk) Yes, all of this is definitely the case, but in practice it does not matter. Solr can efficiently search through millions of rows. To start with, just try the simplest approach, and only complicate things as and when needed. That is something I would like to avoid and I thought I can avoid this by defining teachers and students as multiple value or something like this and than teacherCourses and studentHobbies as 1:n respectively. This is quite similiar to my real life demand, so I came here to get some tips as a solr noob. 
You have still not described what are the searches that you would want to do. Again, I would suggest starting with the most straightforward approach. Regards, Gora
implementing identity authentication in SOLR
Hi, In order to add Solr to my prod environment I have to implement some security restrictions. Is there a way to add user/pass to the requests and to keep them *encrypted* in a file? Thanks.
Re: implementing identity authentication in SOLR
Just to make sure. In my previous question I was referring to the user/pass that queries the db. Now I was referring to the user/pass that i want for the solr http request. Think of it as if my user sends a request where he filter documents created by another user. I want to restrict that. I currently work in a .NET environment where we have identity provider that provides trusted claims to the http request. In similar situations I take the user name property from a trusted claim and not from a parameter in the url . I want to know how solr can restrict his http request/responses. Thank you. On Tue, Jun 18, 2013 at 10:56 AM, Gora Mohanty g...@mimirtech.com wrote: On 18 June 2013 13:10, Mysurf Mail stammail...@gmail.com wrote: Hi, In order to add solr to my prod environmnet I have to implement some security restriction. Is there a way to add user/pass to the requests and to keep them *encrypted*in a file. As mentioned earlier, no there is no built-in way of doing that if you are using the Solr DataImportHandler. Probably the easiest way would be to implement your own indexing using a library like SolrJ. Then, you can handle encryption as you wish. Regards, Gora
Re: Need assistance in defining solr to process user generated query text
great tip :-) On Tue, Jun 18, 2013 at 2:36 PM, Erick Erickson erickerick...@gmail.comwrote: if the _solr_ type is string, then you aren't getting any tokenization, so my dog has fleas is indexed as my dog has fleas, a single token. To search for individual words you need to use, say, the text_general type, which would index my dog has fleas Best Erick On Mon, Jun 17, 2013 at 11:26 AM, Mysurf Mail stammail...@gmail.com wrote: I have one fact table with a lot of string columns and a few GUIDs just for retreival (Not for search) On Mon, Jun 17, 2013 at 6:01 PM, Jack Krupansky j...@basetechnology.com wrote: It sounds like you have your text indexed in a string field (why the wildcards are needed), or that maybe you are using the keyword tokenizer rather than the standard tokenizer. What is your default or query fields for dismax/edismax? And what are the field types for those fields? -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Monday, June 17, 2013 10:51 AM To: solr-user@lucene.apache.org Subject: Need assistance in defining solr to process user generated query text Hi, I have been reading solr wiki pages and configured solr successfully over my flat table. I have a few question though regarding the querying and parsing of user generated text. 1. I have understood through this http://wiki.apache.org/solr/**DisMax http://wiki.apache.org/solr/DisMaxpage that I want to use dismax. Through this http://wiki.apache.org/solr/**LocalParams http://wiki.apache.org/solr/LocalParamspage I can do it using localparams But I think the best way is to define this in my xml files. Can I do this? 2.in this http://lucene.apache.org/**solr/4_3_0/tutorial.html http://lucene.apache.org/solr/4_3_0/tutorial.html **tutorial (solr) the following query appears http://localhost:8983/solr/#/**collection1/query?q=video http://localhost:8983/solr/#/collection1/query?q=video When I want to query my fact table I have to query using *video*. just video retrieves nothing. 
How can I query it using video only? 3. In this http://wiki.apache.org/solr/**ExtendedDisMax#Configuration http://wiki.apache.org/solr/ExtendedDisMax#Configuration **page it says that Extended DisMax is already configured in the example configuration, with the name edismax But I see it only in the /browse requestHandler as follows: requestHandler name=/browse class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/**str ... !-- Query settings -- str name=defTypeedismax/str Do I use it also when I use select in my url ? 4. In general, I want to transfer a user generated text to my url request using the most standard rules (translate ,+,- signs to the q parameter value). What is the best way to Thanks.
Re: Is there a way to encrypt username and pass in the solr config file
@Gora: yes. User name and pass. On Tue, Jun 18, 2013 at 2:57 PM, Gora Mohanty g...@mimirtech.com wrote: On 18 June 2013 17:16, Erick Erickson erickerick...@gmail.com wrote: What do you mean encrypt? The stored value? the indexed value? Over the wire? [...] My understanding was that he wanted to encrypt the username/password in the DIH configuration file. Mysurf Mail, could you please clarify? Regards, Gora
Re: How to define my data in schema.xml
Hi Jack, Thanks, for you kind comment. I am truly in the beginning of data modeling my schema over an existing working DB. I have used the school-teachers-student db as an example scenario. (a, I have written it as a disclaimer in my first post. b. I really do not know anyone that has 300 hobbies too.) In real life my db is obviously much different, I just used this as an example of potential pitfalls that will occur if I use my old db data modeling notions. obviously, the old relational modeling idioms do not apply here. Now, my question was referring to the fact that I would really like to avoid a flat table/join/view because of the reason listed above. So, my scenario is answering a plain user generated text search over a MSSQLDB that contains a few 1:n relation (and a few 1:n:n relationship). So, I come here for tips. Should I use one combined index (treat it as a nosql source) or separate indices or another. any other ways to define relation data ? Thanks. On Tue, Jun 18, 2013 at 4:30 PM, Jack Krupansky j...@basetechnology.comwrote: It sounds like you still have a lot of work to do on your data model. No matter how you slice it, 8 billion rows/fields/whatever is still way too much for any engine to search on a single server. If you have 8 billion of anything, a heavily sharded SolrCloud cluster is probably warranted. Don't plan ahead to put more than 100 million rows on a single node; plan on a proof of concept implementation to determine that number. When we in Solr land say flattened or denormalized, we mean in an intelligent, smart, thoughtful sense, not a mindless, mechanical flattening. It is an opportunity for you to reconsider your data models, both old and new. Maybe data modeling is beyond your skill set. If so, have a chat with your boss and ask for some assistance, training, whatever. Actually, I am suspicious of your 8 billion number - change each of those 300's to realistic, average numbers. Each teacher teaches 300 courses? Right. 
Each Student has 300 hobbies? If you say so, but... Don't worry about schema.xml until you get your data model under control. For an initial focus, try envisioning the use cases for user queries. That will guide you in thinking about how the data would need to be organized to satisfy those user queries. -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Tuesday, June 18, 2013 2:20 AM To: solr-user@lucene.apache.org Subject: Re: How to define my data in schema.xml Thanks for your reply. I have tried the simplest approach and it works absolutely fantastic. Huge table - 0s to result. two problems as I described earlier, and that is what I try to solve: 1. I create a flat table just for solar. This requires maintenance and develop. Can I run solr over my regular tables? This is my simplest approach. Working over my relational tables, 2. When you query a flat table by school name, as I described, if the school has 300 student, 300 teachers, 300 with 300 teacherCourses, 300 studentHobbies, you get 8.1 Billion rows (300*300*300*300). As I am sure this will work great on solar - searching for the school name will retrieve 8.1 B rows. 3. Lets say all my searches are user generated free text search that is searching name and comments columns. Thanks. On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty g...@mimirtech.com wrote: On 18 June 2013 01:10, Mysurf Mail stammail...@gmail.com wrote: Thanks for your quick reply. Here are some notes: 1. Consider that all tables in my example have two columns: Name Description which I would like to index and search. 2. I have no other reason to create flat table other than for solar. So I would like to see if I can avoid it. 3. If in my example I will have a flat table then obviously it will hold a lot of rows for a single school. By searching the exact school name I will likely receive a lot of rows. (my flat table has its own pk) Yes, all of this is definitely the case, but in practice it does not matter. 
Solr can efficiently search through millions of rows. To start with, just try the simplest approach, and only complicate things as and when needed. That is something I would like to avoid and I thought I can avoid this by defining teachers and students as multiple value or something like this and than teacherCourses and studentHobbies as 1:n respectively. This is quite similiar to my real life demand, so I came here to get some tips as a solr noob. You have still not described what are the searches that you would want to do. Again, I would suggest starting with the most straightforward approach. Regards, Gora
Need assistance in defining solr to process user generated query text
Hi, I have been reading the Solr wiki pages and configured Solr successfully over my flat table. I have a few questions though regarding the querying and parsing of user generated text. 1. I have understood through this page (http://wiki.apache.org/solr/DisMax) that I want to use dismax. Through this page (http://wiki.apache.org/solr/LocalParams) I can do it using localparams, but I think the best way is to define this in my xml files. Can I do this? 2. In this tutorial (http://lucene.apache.org/solr/4_3_0/tutorial.html) the following query appears: http://localhost:8983/solr/#/collection1/query?q=video When I want to query my fact table I have to query using *video*. Just video retrieves nothing. How can I query it using video only? 3. In this page (http://wiki.apache.org/solr/ExtendedDisMax#Configuration) it says that Extended DisMax is already configured in the example configuration, with the name edismax. But I see it only in the /browse requestHandler, as follows:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    ...
    <!-- Query settings -->
    <str name="defType">edismax</str>

Do I use it also when I use select in my url? 4. In general, I want to transfer user generated text to my url request using the most standard rules (translate ,+,- signs to the q parameter value). What is the best way to do this? Thanks.
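For question 3, the usual approach is to make edismax the default parser of the /select handler in solrconfig.xml, rather than passing localparams in every URL. A sketch; the qf field list is a placeholder, not taken from this configuration:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <!-- placeholder: the fields free-text queries should search -->
    <str name="qf">name description</str>
  </lst>
</requestHandler>
```

With that in place, a plain /select?q=video request is parsed by edismax with no {!edismax} prefix needed.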
Re: Need assistance in defining solr to process user generated query text
I have one fact table with a lot of string columns and a few GUIDs just for retreival (Not for search) On Mon, Jun 17, 2013 at 6:01 PM, Jack Krupansky j...@basetechnology.comwrote: It sounds like you have your text indexed in a string field (why the wildcards are needed), or that maybe you are using the keyword tokenizer rather than the standard tokenizer. What is your default or query fields for dismax/edismax? And what are the field types for those fields? -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Monday, June 17, 2013 10:51 AM To: solr-user@lucene.apache.org Subject: Need assistance in defining solr to process user generated query text Hi, I have been reading solr wiki pages and configured solr successfully over my flat table. I have a few question though regarding the querying and parsing of user generated text. 1. I have understood through this http://wiki.apache.org/solr/**DisMaxhttp://wiki.apache.org/solr/DisMaxpage that I want to use dismax. Through this http://wiki.apache.org/solr/**LocalParamshttp://wiki.apache.org/solr/LocalParamspage I can do it using localparams But I think the best way is to define this in my xml files. Can I do this? 2.in this http://lucene.apache.org/**solr/4_3_0/tutorial.htmlhttp://lucene.apache.org/solr/4_3_0/tutorial.html **tutorial (solr) the following query appears http://localhost:8983/solr/#/**collection1/query?q=videohttp://localhost:8983/solr/#/collection1/query?q=video When I want to query my fact table I have to query using *video*. just video retrieves nothing. How can I query it using video only? 3. In this http://wiki.apache.org/solr/**ExtendedDisMax#Configurationhttp://wiki.apache.org/solr/ExtendedDisMax#Configuration **page it says that Extended DisMax is already configured in the example configuration, with the name edismax But I see it only in the /browse requestHandler as follows: requestHandler name=/browse class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/**str ... 
!-- Query settings -- str name=defTypeedismax/str Do I use it also when I use select in my url ? 4. In general, I want to transfer a user generated text to my url request using the most standard rules (translate ,+,- signs to the q parameter value). What is the best way to Thanks.
How to define my data in schema.xml
Hi, I have created a flat table from my DB and defined a solr core on it. It works excellent so far. My problem is that my table has two hierarchies, so when flattened it is too big. Let's consider the following example scenario. My tables are School, Students (1:n with school), Teachers (1:n with school). Now, each school has many students and teachers, but each student/teacher has another multivalue field, i.e. the following tables: studentHobbies - 1:N with students, teacherCourses - 1:N with teachers. My main entity is School and that is what I want to get in the result. Flattening does not help me much and is very expensive. Can you direct me to how to define 1:n relationships (and 1:n:n) in data-config.xml? Thanks.
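For what it's worth, 1:n (and 1:n:n) relations are normally expressed in data-config.xml as nested DIH entities, one sub-query per level, rather than as a pre-flattened table. A sketch using this example scenario; every table, column, and key name below is an assumption:

```xml
<document>
  <entity name="school" query="select * from School">
    <entity name="student"
            query="select Name, Description from Students where school_id='${school.ID}'">
      <entity name="hobby"
              query="select Name from StudentHobbies where student_id='${student.ID}'"/>
    </entity>
    <entity name="teacher"
            query="select Name, Description from Teachers where school_id='${school.ID}'">
      <entity name="course"
              query="select Name from TeacherCourses where teacher_id='${teacher.ID}'"/>
    </entity>
  </entity>
</document>
```

Each nested entity contributes its columns as (typically multivalued) fields on the parent school document, so the main entity stays School without materializing the huge cross-product join.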
Is there a way to encrypt username and pass in the solr config file
Hi, I want to encrypt (rsa maybe?) my user name/pass in Solr. Can't leave simple plain text on the server. What is the recommended way? Thanks.
Solr data files
Where are the core data files located? Can I just delete folders/files in order to quickly clean the core/indexes? Thanks
Re: How to define my data in schema.xml
Thanks for your quick reply. Here are some notes: 1. Consider that all tables in my example have two columns: Name Description which I would like to index and search. 2. I have no other reason to create flat table other than for solar. So I would like to see if I can avoid it. 3. If in my example I will have a flat table then obviously it will hold a lot of rows for a single school. By searching the exact school name I will likely receive a lot of rows. (my flat table has its own pk) That is something I would like to avoid and I thought I can avoid this by defining teachers and students as multiple value or something like this and than teacherCourses and studentHobbies as 1:n respectively. This is quite similiar to my real life demand, so I came here to get some tips as a solr noob. On Mon, Jun 17, 2013 at 9:08 PM, Gora Mohanty g...@mimirtech.com wrote: On 17 June 2013 21:39, Mysurf Mail stammail...@gmail.com wrote: Hi, I have created a flat table from my DB and defined a solr core on it. It works excellent so far. My problem is that my table has two hierarchies. So when flatted it is too big. What do you mean by too big? Have you actually tried indexing the data into Solr, and does the performance not meet your needs, or are you guessing from the size of the tables? Lets consider the following example scenario My Tables are School Students (1:n with school) Teachers(1:n with school) [...] Um, all of this crucially depends on what your 'n' is. Plus, you need to describe your use case in much more detail. At the moment, you are asking us to guess at what you are trying to do, which is inefficient, and unlikely to solve your problem. Regards, Gora
Re: Estimating the required volume to
Thanks for your answer. Can you please elaborate on mssql text searching is pretty primitive compared to Solr (Link or anything) Thanks. On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson erickerick...@gmail.comwrote: 1 Maybe, maybe not. mssql text searching is pretty primitive compared to Solr, just as Solr's db-like operations are primitive compared to mssql. They address different use-cases. So, you can store the docs in Solr and not touch your SQL db at all to return the docs. You can store just the IDs in Solr and retrieve your docs from the SQL store. You can store just enough data in Solr to display the results page and when the user tries to drill down you can go to your SQL database for assembling the full document. You can. It all depend on your use case, data size, all that rot. Very often, something like the DB is considered the system-of-record and it's indexed to Solr (See DIH or SolrJ) periodically. There is no underlying connection between your SQL store and Solr. You control when data is fetched from SQL and put into Solr. You control what the search experience is. etc. 2 Not really :(. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best Erick On Sat, Jun 1, 2013 at 1:07 PM, Mysurf Mail stammail...@gmail.com wrote: Hi, I am just starting to learn about solr. I want to test it in my env working with ms sql server. I have followed the tutorial and imported some rows to the Solr. Now I have a few noob question regarding the benefits of implementing Solr on a sql environment. 1. As I understand, When I send a query request over http, I receive a result with ID from the Solr system and than I query the full object row from the db. Is that right? Is there a comparison next to ms sql full text search which retrieves the full object in the same select? Is there a comparison that relates to db/server cluster and multiple machines? 2. 
Is there a technique that will assist me to estimate the volume size I will need for the indexed data (obviously, based on the indexed data properties)?
Re: Estimating the required volume to
Hi, Thanks for your answer. I want to refer to your message, because I am trying to choose the right tool. 1. Regarding stemming: in ms-sql I am running SELECT * FROM sys.dm_fts_parser ('FORMSOF(INFLECTIONAL,provide)', 1033, 0, 0) and I receive:

group_id  phrase_id  occurrence  special_term  display_term  expansion_type  source_term
1         0          1           Exact Match   provided      2               provide
1         0          1           Exact Match   provides      2               provide
1         0          1           Exact Match   providing     2               provide
1         0          1           Exact Match   provide       0               provide

Isn't that stemming? 2. Regarding synonyms, sql server has a full thesaurus feature: http://msdn.microsoft.com/en-us/library/ms142491.aspx. Doesn't it mean synonyms? On Mon, Jun 3, 2013 at 2:43 PM, Erick Erickson erickerick...@gmail.com wrote: Here's a link to various transformations you can do while indexing and searching in Solr: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Consider stemming, ngrams, WordDelimiterFilterFactory, ASCIIFoldingFilterFactory, phrase queries, boosting, synonyms, blah blah blah. You can't do a lot of these transformations, at least not easily, in SQL. OTOH, you can't do 5-way joins in Solr. Different problems, different tools. All that said, there's no good reason to use Solr if your use-case is satisfied by simple keyword searches that have no transformations; mysql etc. work just fine in those cases. It's all about selecting the right tool for the use-case. FWIW, Erick On Mon, Jun 3, 2013 at 4:44 AM, Mysurf Mail stammail...@gmail.com wrote: Thanks for your answer. Can you please elaborate on mssql text searching is pretty primitive compared to Solr (Link or anything) Thanks. On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson erickerick...@gmail.com wrote: 1 Maybe, maybe not. mssql text searching is pretty primitive compared to Solr, just as Solr's db-like operations are primitive compared to mssql. They address different use-cases. So, you can store the docs in Solr and not touch your SQL db at all to return the docs.
You can store just the IDs in Solr and retrieve your docs from the SQL store. You can store just enough data in Solr to display the results page and when the user tries to drill down you can go to your SQL database for assembling the full document. You can. It all depend on your use case, data size, all that rot. Very often, something like the DB is considered the system-of-record and it's indexed to Solr (See DIH or SolrJ) periodically. There is no underlying connection between your SQL store and Solr. You control when data is fetched from SQL and put into Solr. You control what the search experience is. etc. 2 Not really :(. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best Erick On Sat, Jun 1, 2013 at 1:07 PM, Mysurf Mail stammail...@gmail.com wrote: Hi, I am just starting to learn about solr. I want to test it in my env working with ms sql server. I have followed the tutorial and imported some rows to the Solr. Now I have a few noob question regarding the benefits of implementing Solr on a sql environment. 1. As I understand, When I send a query request over http, I receive a result with ID from the Solr system and than I query the full object row from the db. Is that right? Is there a comparison next to ms sql full text search which retrieves the full object in the same select? Is there a comparison that relates to db/server cluster and multiple machines? 2. Is there a technic that will assist me to estimate the volume size I will need for the indexed data (obviously, based on the indexed data properties) ?
Clearing a specific index / all indices
I am running Solr with two cores in solr.xml. One is product (imported from db) and one is collection1 (from the tutorial). Now in order to clear the index I run http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete> followed by http://localhost:8983/solr/update?stream.body=<commit/> but only the collection1 core (of the tutorial) is cleared. How can I clear a specific index? How can I clear all indices? Thanks.
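What is likely happening: the bare /solr/update path addresses only the default core, so the core name has to appear in the URL path to clear a specific index. A sketch; host, port, and core names are assumptions:

```shell
# Clear one named core (repeat with CORE=collection1 to clear the other).
CORE="product"
DELETE_URL="http://localhost:8983/solr/${CORE}/update?stream.body=<delete><query>*:*</query></delete>"
COMMIT_URL="http://localhost:8983/solr/${CORE}/update?stream.body=<commit/>"
echo "$DELETE_URL"
# To actually run it: curl "$DELETE_URL" && curl "$COMMIT_URL"
```

In practice the stream.body value should also be URL-encoded before sending it with curl.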
word stem
Using Solr over my sql db, I query the following: http://localhost:8983/solr/products/select?q=require&wt=xml&indent=true&fl=*,score where the queried word require is found in the index, since I imported the following: "Each frame is hand-crafted in our Bothell facility to the optimum diameter and wall-thickness *required* of a premium mountain frame. The heat-treated welded aluminum frame has a larger diameter tube that absorbs the bumps." required != require. I try it in the analysis tool in the portal for debugging, and in the field value I see the PST (stem) filter does make a token from required as requir. I write required in the debug query field and when I click on Analyse Values I see requir is highlighted. But the http query only returns values when I query required, not require. Thanks.
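A common cause of this symptom is that the stemming filter runs at index time but not at query time (or the query is hitting a different, unstemmed field). A sketch of a field type where one shared analyzer covers both sides, so "require" and "required" both collapse to the stem "requir"; the type name is an assumption:

```xml
<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> (no type="index"/"query" split) applies the same
       chain, including the stemmer, at both index and query time. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen can confirm a mismatch: if the index and query columns show different filter chains for the field being searched, exactly this "required matches, require doesn't" behavior results.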
Re: installing configuring solr over ms sql server - tutorial needed
My problem was with SQL Server. This http://danpincas.com/2013/03/03/searching-with-solr-part-1.html is a great step-by-step guide. On Sat, Jun 1, 2013 at 2:06 AM, bbarani bbar...@gmail.com wrote: Why don't you follow this tutorial to set up Solr on Tomcat: http://wiki.apache.org/solr/SolrTomcat -- View this message in context: http://lucene.472066.n3.nabble.com/installing-configuring-solr-over-ms-sql-server-tutorial-needed-tp4067344p4067488.html Sent from the Solr - User mailing list archive at Nabble.com.
Estimating the required volume to
Hi, I am just starting to learn about Solr. I want to test it in my environment, working with MS SQL Server. I have followed the tutorial and imported some rows into Solr. Now I have a few noob questions regarding the benefits of implementing Solr in a SQL environment. 1. As I understand it, when I send a query request over HTTP, I receive a result with IDs from Solr and then I query the full object row from the DB. Is that right? Is there a comparison with MS SQL full-text search, which retrieves the full object in the same select? Is there a comparison that relates to DB/server clusters and multiple machines? 2. Is there a technique that will help me estimate the volume I will need for the indexed data (obviously, based on the indexed data's properties)?
installing configuring solr over ms sql server - tutorial needed
I am trying to configure Solr over MS SQL Server. I found only this tutorial, http://www.chrisumbel.com/article/lucene_solr_sql_server, which is a bit old (2011). Is there an updated / official tutorial?
Re: installing configuring solr over ms sql server - tutorial needed
Thanks. A tutorial on getting Solr over MSSQL? I didn't find one even for Jetty. On Fri, May 31, 2013 at 6:21 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: You have two mostly-separate issues here: running Solr in Tomcat, and indexing the MSSQL server. Try just running the default embedded-Jetty example until you get the data import sorted out. Then you can worry about Tomcat. And it would be easier to help with one problem at a time. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, May 31, 2013 at 11:03 AM, Mysurf Mail stammail...@gmail.com wrote: I am trying to configure Solr over MS SQL Server. I found only this tutorial, http://www.chrisumbel.com/article/lucene_solr_sql_server, which is a bit old (2011). Is there an updated / official tutorial?
Re: installing configuring solr over ms sql server - tutorial needed
For instance, step 5 - "Download and install a SQL Server JDBC driver." Where do I put it when using Jetty? * I just asked here whether an official tutorial for MS SQL Server exists, before I try to go through several tutorials. On Fri, May 31, 2013 at 6:42 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: What's wrong with the one you found? Just ignore steps 1-4 and go right into the driver and DIH setup. If you hit any problems, you will then have a specific question to ask. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, May 31, 2013 at 11:29 AM, Mysurf Mail stammail...@gmail.com wrote: Thanks. A tutorial on getting Solr over MSSQL? I didn't find one even for Jetty. On Fri, May 31, 2013 at 6:21 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: You have two mostly-separate issues here: running Solr in Tomcat, and indexing the MSSQL server. Try just running the default embedded-Jetty example until you get the data import sorted out. Then you can worry about Tomcat. And it would be easier to help with one problem at a time. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, May 31, 2013 at 11:03 AM, Mysurf Mail stammail...@gmail.com wrote: I am trying to configure Solr over MS SQL Server. I found only this tutorial, http://www.chrisumbel.com/article/lucene_solr_sql_server, which is a bit old (2011). Is there an updated / official tutorial?
Re: installing configuring solr over ms sql server - tutorial needed
btw: the other stages still refer to locations relative to Tomcat. On Sat, Jun 1, 2013 at 12:02 AM, Mysurf Mail stammail...@gmail.com wrote: For instance, step 5 - "Download and install a SQL Server JDBC driver." Where do I put it when using Jetty? * I just asked here whether an official tutorial for MS SQL Server exists, before I try to go through several tutorials. On Fri, May 31, 2013 at 6:42 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: What's wrong with the one you found? Just ignore steps 1-4 and go right into the driver and DIH setup. If you hit any problems, you will then have a specific question to ask. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, May 31, 2013 at 11:29 AM, Mysurf Mail stammail...@gmail.com wrote: Thanks. A tutorial on getting Solr over MSSQL? I didn't find one even for Jetty. On Fri, May 31, 2013 at 6:21 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: You have two mostly-separate issues here: running Solr in Tomcat, and indexing the MSSQL server. Try just running the default embedded-Jetty example until you get the data import sorted out. Then you can worry about Tomcat. And it would be easier to help with one problem at a time. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, May 31, 2013 at 11:03 AM, Mysurf Mail stammail...@gmail.com wrote: I am trying to configure Solr over MS SQL Server. I found only this tutorial, http://www.chrisumbel.com/article/lucene_solr_sql_server, which is a bit old (2011). Is there an updated / official tutorial?
Re: installing configuring solr over ms sql server - tutorial needed
Hi, I am still having a problem with the http://www.chrisumbel.com/article/lucene_solr_sql_server tutorial, trying to get Solr on Tomcat. In step 4, when I copy apache-solr-1.4.0\example\solr to my Tomcat dir, I get a folder with a bin and a collection1 folder. Do I need them? Should I create conf under solr or under collection1? I don't have any solrconfig or schema files under solr, only under collection1. On Sat, Jun 1, 2013 at 12:26 AM, bbarani bbar...@gmail.com wrote: solrconfig.xml - the lib directives specified in the configuration file are the locations where Solr will look for the jars. solr.xml - in the case of a multi-core setup, you can have a sharedLib for all the collections. You can add the JDBC driver into the sharedLib folder. -- View this message in context: http://lucene.472066.n3.nabble.com/installing-configuring-solr-over-ms-sql-server-tutorial-needed-tp4067344p4067465.html Sent from the Solr - User mailing list archive at Nabble.com.
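bbarani's two options can be sketched concretely. In solrconfig.xml a lib directive points the core at the JDBC jar; in the (Solr 4.x-era) multi-core solr.xml, a sharedLib folder is visible to all cores. The paths and the jar name (sqljdbc4.jar) are illustrative, not taken from the tutorial:

```xml
<!-- solrconfig.xml: load the SQL Server JDBC driver for this core -->
<lib dir="../../lib" regex="sqljdbc4.*\.jar"/>

<!-- solr.xml: one lib folder shared by every core -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="collection1" instanceDir="collection1"/>
    <core name="product" instanceDir="product"/>
  </cores>
</solr>
```

With the sharedLib route, drop the JDBC jar into the lib folder next to solr.xml and every core (including the DIH config) can see it, which avoids copying the jar per core.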