document categorization using solr?
Hi,

Does Solr have something built in, or a recommended add-on, that does document categorization? (I found a thread from about a year ago, but it was not exactly the same topic.)

For example, here is a commercial categorization product that will take a website and categorize it:
http://grapeshot.co.uk/online-demo-3.php?url=http://www.solutionstreet.com

I am looking for something similar that works with Solr/Lucene and is open source. Weka (http://weka.wikispaces.com/Frequently+Asked+Questions) seems like it might be close, but I am not sure. I am also not sure how to come up with a category list.

thanks
Joel
Re: weird sorting behavior
Hi,

After some further investigation, it turns out that null fields were sorting first, so if the title was null it was coming up first. This is true even with 1.5 and collatedROOT (I tried on last night's build).

So let me change my question: how do I make items with null values sort last?

thanks
Joel

On Dec 30, 2009, at 3:11 PM, Joel Nylund wrote:

Hi, so this is only available in 1.5? I tried in 1.4 and got:
org.apache.solr.common.SolrException: Error loading class 'solr.CollationKeyFilterFactory'
Is there a way to do this in 1.4? The link Shalin sent is a 1.5 link I think.

thanks
Joel

On Dec 25, 2009, at 10:52 PM, Robert Muir wrote:

Hello,

as Shalin said, you might want to try CollationKeyFilterFactory. Below is an example (using the multilingual root locale), where the spaces will sort after the letters and numbers as you mentioned, but it will still not be case-sensitive. This is because strength is 'secondary'.

But are you really sure you want the spaces sorted after the letters and numbers? Or instead do you just want them ignored for sorting? If this is the case, then try 'primary', so that spaces, punctuation, accents and things like that in addition to case are ignored in the sort: for example "Test-1234" and "  test1234" sort the same with primary, but not with secondary (the one with leading spaces will sort last).

If all else fails, you can write custom rules for it too, as Shalin mentioned.

<fieldType name="collatedROOT" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="secondary" />
  </analyzer>
</fieldType>

On Fri, Dec 25, 2009 at 5:37 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

On Thu, Dec 24, 2009 at 11:51 PM, Joel Nylund jnyl...@yahoo.com wrote:

update, I tried changing to datatype string, and it sorts the numerics better, but the other sorts are not as good. Is there a way to control sorting for special chars, for example, I want blanks to sort after letters and numbers.

In the general case, CollationKeyFilterFactory will do the trick. You could create a custom rule set which sorts spaces after letters and numbers. See http://wiki.apache.org/solr/UnicodeCollation

using alphaOnlySort - sorts nicely for alpha, but numbers dont work
string - sorts nicely for numbers and letters, but special chars like blanks show up first in the list

alphaOnlySort has a PatternReplaceFilterFactory which removes all characters except a-z. This is the reason behind those weird results. You could try removing that filter and see if thats what you need.

--
Regards,
Shalin Shekhar Mangar.

--
Robert Muir
rcm...@gmail.com
Re: weird sorting behavior
Thanks Erik, the null problem was introduced when I copied the example below. Now I have the nulls excluded (using sortMissingLast=true) in 1.5 with the suggested config below, and I'm still not seeing the desired behavior.

It seems to me that the default behavior of the Java Collator using the ROOT locale (PRIMARY or SECONDARY dont seem to matter in this example) is as follows:

empty string
symbols (by this I mean $, , *, * etc)
numerics
alpha
leading spaces

My desire is:

alpha
numeric
symbols
leading spaces
empty string

I'm going to try a custom RuleBasedCollator to see if I can make this happen, as Shalin suggested.

thanks
Joel

On Dec 31, 2009, at 11:11 AM, Erick Erickson wrote:

have you tried setting sortMissingLast=true in your schema.xml? Something like...

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

or perhaps in your individual field definition instead. The schema.xml examples have additional information that you really should scan at least

HTH
Erick

On Thu, Dec 31, 2009 at 8:53 AM, Joel Nylund jnyl...@yahoo.com wrote:

Hi, After some further investigation, it turns out that null fields were sorting first, so if the title was null it was coming up first. This is true even with 1.5 and collatedROOT. (I tried on last nights build). So let me change my question, how do I make items with null values sort last?

thanks
Joel

On Dec 30, 2009, at 3:11 PM, Joel Nylund wrote:

Hi, so this is only available in 1.5? I tried in 1.4 and got:
org.apache.solr.common.SolrException: Error loading class 'solr.CollationKeyFilterFactory'
Is there a way to do this in 1.4? The link Shalin sent is a 1.5 link I think.

thanks
Joel

On Dec 25, 2009, at 10:52 PM, Robert Muir wrote:

Hello,

as Shalin said, you might want to try CollationKeyFilterFactory. Below is an example (using the multilingual root locale), where the spaces will sort after the letters and numbers as you mentioned, but it will still not be case-sensitive. This is because strength is 'secondary'.

But are you really sure you want the spaces sorted after the letters and numbers? Or instead do you just want them ignored for sorting? If this is the case, then try 'primary', so that spaces, punctuation, accents and things like that in addition to case are ignored in the sort: for example "Test-1234" and "  test1234" sort the same with primary, but not with secondary (the one with leading spaces will sort last).

If all else fails, you can write custom rules for it too, as Shalin mentioned.

<fieldType name="collatedROOT" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="secondary" />
  </analyzer>
</fieldType>

On Fri, Dec 25, 2009 at 5:37 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

On Thu, Dec 24, 2009 at 11:51 PM, Joel Nylund jnyl...@yahoo.com wrote:

update, I tried changing to datatype string, and it sorts the numerics better, but the other sorts are not as good. Is there a way to control sorting for special chars, for example, I want blanks to sort after letters and numbers.

In the general case, CollationKeyFilterFactory will do the trick. You could create a custom rule set which sorts spaces after letters and numbers. See http://wiki.apache.org/solr/UnicodeCollation

using alphaOnlySort - sorts nicely for alpha, but numbers dont work
string - sorts nicely for numbers and letters, but special chars like blanks show up first in the list

alphaOnlySort has a PatternReplaceFilterFactory which removes all characters except a-z. This is the reason behind those weird results. You could try removing that filter and see if thats what you need.

--
Regards,
Shalin Shekhar Mangar.

--
Robert Muir
rcm...@gmail.com
Re: weird sorting behavior
Hi, so this is only available in 1.5? I tried in 1.4 and got:
org.apache.solr.common.SolrException: Error loading class 'solr.CollationKeyFilterFactory'
Is there a way to do this in 1.4? The link Shalin sent is a 1.5 link I think.

thanks
Joel

On Dec 25, 2009, at 10:52 PM, Robert Muir wrote:

Hello,

as Shalin said, you might want to try CollationKeyFilterFactory. Below is an example (using the multilingual root locale), where the spaces will sort after the letters and numbers as you mentioned, but it will still not be case-sensitive. This is because strength is 'secondary'.

But are you really sure you want the spaces sorted after the letters and numbers? Or instead do you just want them ignored for sorting? If this is the case, then try 'primary', so that spaces, punctuation, accents and things like that in addition to case are ignored in the sort: for example "Test-1234" and "  test1234" sort the same with primary, but not with secondary (the one with leading spaces will sort last).

If all else fails, you can write custom rules for it too, as Shalin mentioned.

<fieldType name="collatedROOT" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="secondary" />
  </analyzer>
</fieldType>

On Fri, Dec 25, 2009 at 5:37 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

On Thu, Dec 24, 2009 at 11:51 PM, Joel Nylund jnyl...@yahoo.com wrote:

update, I tried changing to datatype string, and it sorts the numerics better, but the other sorts are not as good. Is there a way to control sorting for special chars, for example, I want blanks to sort after letters and numbers.

In the general case, CollationKeyFilterFactory will do the trick. You could create a custom rule set which sorts spaces after letters and numbers. See http://wiki.apache.org/solr/UnicodeCollation

using alphaOnlySort - sorts nicely for alpha, but numbers dont work
string - sorts nicely for numbers and letters, but special chars like blanks show up first in the list

alphaOnlySort has a PatternReplaceFilterFactory which removes all characters except a-z. This is the reason behind those weird results. You could try removing that filter and see if thats what you need.

--
Regards,
Shalin Shekhar Mangar.

--
Robert Muir
rcm...@gmail.com
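
Shalin's point about PatternReplaceFilterFactory can be illustrated with a small Python sketch. It only approximates the alphaOnlySort chain (lowercase, trim, then delete every character outside a-z); the function name and the sample titles are made up for illustration and are not from the thread.

```python
import re

def alpha_only_sort_key(title):
    """Approximate the alphaOnlySort analyzer: lowercase, trim, drop non a-z."""
    lowered = title.lower().strip()
    return re.sub(r"[^a-z]", "", lowered)

# Hypothetical titles, chosen to show what happens to leading digits.
titles = ["2 Fast", "10 Things", "1984", "Zebra"]
ordered = sorted(titles, key=alpha_only_sort_key)

for title in ordered:
    print(repr(title), "->", repr(alpha_only_sort_key(title)))
```

Because the digits are removed before sorting, a title that starts with a number is ordered by whatever letters remain (or by an empty key if none remain), which would explain why the order looks almost random.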
weird sorting behavior
I have a field:

<field name="title" type="alphaOnlySort" indexed="true" stored="true" required="false"/>

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         when you want your sorting to be case insensitive -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
    <!-- The PatternReplaceFilter gives you the flexibility to use
         Java Regular expression to replace any sequence of characters
         matching a pattern with an arbitrary replacement string,
         which may include back references to portions of the original
         string matched by the pattern.
         See the Java Regular Expression documentation for more
         information on pattern and replacement string syntax.
         http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
  </analyzer>
</fieldType>

When I sort it using titles that are alphanumeric it works great, but if the titles start with numbers, it almost seems random.

Any suggestions?

thanks
Joel
Re: weird sorting behavior
update, I tried changing to datatype string, and it sorts the numerics better, but the other sorts are not as good.

Is there a way to control sorting for special chars, for example, I want blanks to sort after letters and numbers.

using alphaOnlySort - sorts nicely for alpha, but numbers dont work
string - sorts nicely for numbers and letters, but special chars like blanks show up first in the list

thanks
Joel

On Dec 24, 2009, at 11:20 AM, Joel Nylund wrote:

I have a field:

<field name="title" type="alphaOnlySort" indexed="true" stored="true" required="false"/>

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         when you want your sorting to be case insensitive -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
    <!-- The PatternReplaceFilter gives you the flexibility to use
         Java Regular expression to replace any sequence of characters
         matching a pattern with an arbitrary replacement string,
         which may include back references to portions of the original
         string matched by the pattern.
         See the Java Regular Expression documentation for more
         information on pattern and replacement string syntax.
         http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
  </analyzer>
</fieldType>

When I sort it using titles that are alphanumeric it works great, but if the titles start with numbers, it almost seems random.

Any suggestions?

thanks
Joel
suggestions for DIH batchSize
Hi,

From looking at the code, it looks like the default batchSize is 500. Is that the recommended setting? Has anyone noticed any significant performance/memory tradeoffs from making it much bigger?

thanks
Joel
Re: Request Assistance with DIH
Hi, sorry im not familiar with the dataimporthandler development console, I thought you were just trying to do a import.

For me to try to import data, I would do:
http://localhost:8983/solr/dataimport?command=full-import

Then check status of it using:
http://localhost:8983/solr/dataimport

you can refresh this screen as many times as you want, this should show progress and if it worked or not, also you should see errors in the log.

I used this to get started with DIH, even though its mysql, it might help you:
http://www.cabotsolutions.com/blog/200905/using-solr-lucene-for-full-text-search-with-mysql/

Joel

On Dec 14, 2009, at 10:27 AM, Turner, Robbin J wrote:

How does this help answer my question? I am trying to use the DATAImportHandler Development console. The url you suggest assumes I had it working already. Looking at my logs and the response to the Development console, it does not appear that the connection to Oracle is being made.

So if someone could offer some configuration/connection setup directions I would very much appreciate it.

Thanks
Robbin

-Original Message-
From: Joel Nylund [mailto:jnyl...@yahoo.com]
Sent: Friday, December 11, 2009 8:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Request Assistance with DIH

add ?command=full-import to your url
http://localhost:8983/solr/dataimport?command=full-import

thanks
Joel

On Dec 11, 2009, at 7:45 PM, Robbin wrote:

I've been trying to use the DIH with oracle and would love it if someone could give me some pointers. I put the ojdbc14.jar in both the Tomcat lib and solr home/lib. I created a dataimport.xml and enabled it in the solrconfig.xml. I go to the http://<solr server>/solr/admin/dataimport.jsp. This all seems to be fine, but I get the default page response and doesn't look like the connection to the oracle server is even attempted.

I'm using the Solr 1.4 release on Nov 10.

Do I need an oracle client on the server? I thought having the ojdbc jar should be sufficient. Any help or configuration examples for setting this up would be much appreciated.

Thanks
Robbin
Re: Auto update with deltaimport
windows or unix?

unix - make a shell script and call it from cron
windows - make a .bat or .cmd file and call it from scheduler

within the shell scripts/bat files use wget or curl to call the right import:

wget -q -O /dev/null http://localhost:8983/solr/dataimport?command=delta-import

Joel

On Dec 12, 2009, at 1:38 AM, Olala wrote:

Hi All! I am developing a search engine using Solr, I was tested full-import and delta-import command successfully. But now, I want to run delta-import automatically with my schedule. So, can anyone help me???

Thanks Regards,

--
View this message in context: http://old.nabble.com/Auto-update-with-deltaimport-tp26755386p26755386.html
Sent from the Solr - User mailing list archive at Nabble.com.
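
As an alternative to wget or curl, the same request could be sent from a small Python script that cron or the Windows scheduler runs. This is only a sketch: the base URL is the one used in this thread, while the function names and the timeout value are assumptions made for illustration.

```python
import urllib.request
from urllib.parse import urlencode

# Base URL taken from the thread; adjust host, port and handler path as needed.
DIH_URL = "http://localhost:8983/solr/dataimport"

def build_import_url(command, base_url=DIH_URL):
    """Build the DataImportHandler URL for a command such as delta-import."""
    return base_url + "?" + urlencode({"command": command})

def trigger_delta_import(timeout=30.0):
    """Send the delta-import request and return the HTTP status code.

    Hypothetical helper: call this from the scheduled job. It needs a
    running Solr instance, so it is not invoked here.
    """
    url = build_import_url("delta-import")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status

print(build_import_url("delta-import"))
```

The scheduled job would call trigger_delta_import(); how often to run it depends on how quickly changes need to show up in the index.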
Re: Request Assistance with DIH
add ?command=full-import to your url
http://localhost:8983/solr/dataimport?command=full-import

thanks
Joel

On Dec 11, 2009, at 7:45 PM, Robbin wrote:

I've been trying to use the DIH with oracle and would love it if someone could give me some pointers. I put the ojdbc14.jar in both the Tomcat lib and solr home/lib. I created a dataimport.xml and enabled it in the solrconfig.xml. I go to the http://<solr server>/solr/admin/dataimport.jsp. This all seems to be fine, but I get the default page response and doesn't look like the connection to the oracle server is even attempted.

I'm using the Solr 1.4 release on Nov 10.

Do I need an oracle client on the server? I thought having the ojdbc jar should be sufficient. Any help or configuration examples for setting this up would be much appreciated.

Thanks
Robbin
Re: # in query
Thanks Eric, I looked more into this, but still stuck: I have this field indexed using text_rev I looked at the luke analysis for this field, but im unsure how to read it. When I query the field by the id I get: result name=response numFound=1 start=0 − doc str name=id5405255/str str name=textTitle###'s test blog/str /doc /result If I try to query even multiple ### I get nothing. Here is what luke handler says: (btw when I used id instead of docid on luke I got a nullpointer exception /admin/luke?docid=5405255 vs / admin/luke?id=5405255) lst name=textTitle str name=typetext_rev/str str name=schemaITS---/str str name=indexITS--/str int name=docs290329/int int name=distinct401016/int − lst name=topTerms int name=#1;golb49362/int int name=blog49362/int int name=#1;ecapsym29426/int int name=myspace29426/int int name=#1;s8773/int int name=s8773/int int name=#1;ed8033/int int name=de8033/int int name=com6884/int int name=#1;moc6884/int /lst − lst name=histogram int name=1308908/int int name=234340/int int name=421916/int int name=814474/int int name=169122/int int name=325578/int int name=643162/int int name=1281844/int int name=256910/int int name=512464/int int name=1024182/int int name=204872/int int name=409626/int int name=819212/int int name=163842/int int name=327682/int int name=655362/int /lst /lst solr/select?q=textTitle:%23%23%23 - gets no results. I have the same field indexed as a alphaOnlySort, and it gives me lots of results, but not the ones I want. Any other ideas? thanks Joel On Dec 7, 2009, at 3:42 PM, Erick Erickson wrote: Well, the very first thing I would is examine the field definition in your schema file. I suspect that the tokenizers and/or filters you're using for indexing and/or querying is doing something to the # symbol. Most likely stripping it. If you're just searching for the single-letter term #, I *think* the query parser silently just drops that part of the clause out, but check on that. 
The second thing would be to get a copy of Luke and examine your index to see if what you *think* is in your index actually is there. HTH Erick On Mon, Dec 7, 2009 at 3:28 PM, Joel Nylund jnyl...@yahoo.com wrote: ok thanks, sorry my brain wasn't working, but even when I url encode it, I dont get any results, is there something special I have to do for solr? thanks Joel On Dec 7, 2009, at 3:20 PM, Paul Libbrecht wrote: Sure you have to escape it! %23 otherwise the browser considers it as a separator between the URL for the server (on the left) and the fragment identifier (on the right) which is not sent the server. You might want to read about URL-encoding, escaping with backslash is a shell-thing, not a thing for URLs! paul Le 07-déc.-09 à 21:16, Joel Nylund a écrit : Hi, How can I put a # sign in a query, do I need to escape it? For example I want to query books with title that contain # No work so far: http://localhost:8983/solr/select?q=textTitle:#; http://localhost:8983/solr/select?q=textTitle:# http://localhost:8983/solr/select?q=textTitle:\#; Getting org.apache.lucene.queryParser.ParseException: Cannot parse 'textTitle:\': Lexical error at line 1, column 12. Encountered: EOF after : and sometimes just no response. thanks Joel
Re: # in query
ok, I just realized I was using the luke handler, didnt know there was a fat client, I assume thats what you are talking about. I downloaded the lukeall.jar, ran it, pointed to my index, found the document in question, didn't see how it was tokenized, but I clicked the reconstruct edit button, this gives me a tab that has tokenized per field, for this field it shows: s|s, ecapsym|myspace, golb|blog title is: ###'s myspace blog schema is: !-- A general unstemmed text field that indexes tokens normally and also reversed (via ReversedWildcardFilterFactory), to enable more efficient leading wildcard queries. -- fieldType name=text_rev class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.ReversedWildcardFilterFactory withOriginal=true maxPosAsterisk=3 maxPosQuestion=2 maxFractionAsterisk=0.33/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=0/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType field name=textTitle type=text_rev indexed=true stored=true required=false multiValued=false/ thanks Joel On Dec 8, 2009, at 11:14 AM, Erick Erickson wrote: In Luke, there's a tab that will let you go to a document ID. 
From there you can see all the fields in a particular document, and examine what the actual tokens stored are. Until and unless you know what tokens are being indexed, you simply can't know what your queries should look like... *Assuming* that the ### are getting indexed and *assuming* your tokenizer tokenized on, whitespace, and *assuming* that by text_rev you are talking about ReversedWildcardFilterFactory, I wouldn't expect a search to match if it wasn't exactly: s'###. But as you see, there's a long chain of assumptions there any one of which may be violated by your schema. So please post the relevant portions of your schema to make it easier to help. Best Erick On Tue, Dec 8, 2009 at 9:54 AM, Joel Nylund jnyl...@yahoo.com wrote: Thanks Eric, I looked more into this, but still stuck: I have this field indexed using text_rev I looked at the luke analysis for this field, but im unsure how to read it. When I query the field by the id I get: result name=response numFound=1 start=0 - doc str name=id5405255/str str name=textTitle###'s test blog/str /doc /result If I try to query even multiple ### I get nothing. 
Here is what luke handler says: (btw when I used id instead of docid on luke I got a nullpointer exception /admin/luke?docid=5405255 vs /admin/luke?id=5405255) lst name=textTitle str name=typetext_rev/str str name=schemaITS---/str str name=indexITS--/str int name=docs290329/int int name=distinct401016/int - lst name=topTerms int name=#1;golb49362/int int name=blog49362/int int name=#1;ecapsym29426/int int name=myspace29426/int int name=#1;s8773/int int name=s8773/int int name=#1;ed8033/int int name=de8033/int int name=com6884/int int name=#1;moc6884/int /lst - lst name=histogram int name=1308908/int int name=234340/int int name=421916/int int name=814474/int int name=169122/int int name=325578/int int name=643162/int int name=1281844/int int name=256910/int int name=512464/int int name=1024182/int int name=204872/int int name=409626/int int name=819212/int int name=163842/int int name=327682/int int name=655362/int /lst /lst solr/select?q=textTitle:%23%23%23 - gets no results. I have the same field indexed as a alphaOnlySort, and it gives me lots of results, but not the ones I want. Any other ideas? thanks Joel On Dec 7, 2009, at 3:42 PM, Erick Erickson wrote: Well, the very first thing I would is examine the field definition in your schema file. I suspect that the tokenizers and/or filters you're using for indexing and/or querying is doing something to the # symbol. Most likely stripping it. If you're just searching for the single-letter term #, I *think* the query parser silently just drops that part of the clause out, but check on that. The second thing would be to get a copy of Luke
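
The Luke output above (s, myspace, blog, plus reversed copies) is consistent with the # characters being dropped at index time. The Python sketch below is a rough approximation of the text_rev index chain, not the real Solr filters: it splits on whitespace, keeps only runs of letters and digits (a crude stand-in for WordDelimiterFilterFactory), lowercases, and ignores the stop filter. The marker character is an assumption based on the "#1;" prefix shown by the Luke handler.

```python
import re

# Assumed marker that prefixes reversed tokens ("#1;" in the Luke handler output).
REVERSE_MARKER = "\u0001"

def approx_index_tokens(text):
    """Very rough stand-in for whitespace tokenizing + word delimiting + lowercasing."""
    tokens = []
    for chunk in text.split():
        # Keep only runs of letters/digits; symbols such as # and ' are dropped.
        for part in re.findall(r"[A-Za-z0-9]+", chunk):
            tokens.append(part.lower())
    return tokens

def with_reversed(tokens):
    """Emit each token reversed (with marker) and in original form."""
    result = []
    for token in tokens:
        result.append(REVERSE_MARKER + token[::-1])
        result.append(token)
    return result

print(approx_index_tokens("###'s myspace blog"))
print(approx_index_tokens("###"))
```

Under these assumptions a query for ### analyzes to no tokens at all, so there is nothing in the index for it to match, however the # is escaped in the URL.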
# in query
Hi,

How can I put a # sign in a query, do I need to escape it?

For example I want to query books with title that contain #

No work so far:
http://localhost:8983/solr/select?q=textTitle:#;
http://localhost:8983/solr/select?q=textTitle:#
http://localhost:8983/solr/select?q=textTitle:\#;

Getting
org.apache.lucene.queryParser.ParseException: Cannot parse 'textTitle:\': Lexical error at line 1, column 12. Encountered: EOF after : and sometimes just no response.

thanks
Joel
Re: # in query
ok thanks, sorry my brain wasn't working, but even when I url encode it, I dont get any results, is there something special I have to do for solr?

thanks
Joel

On Dec 7, 2009, at 3:20 PM, Paul Libbrecht wrote:

Sure you have to escape it! %23

otherwise the browser considers it as a separator between the URL for the server (on the left) and the fragment identifier (on the right) which is not sent the server.

You might want to read about URL-encoding, escaping with backslash is a shell-thing, not a thing for URLs!

paul

Le 07-déc.-09 à 21:16, Joel Nylund a écrit :

Hi,

How can I put a # sign in a query, do I need to escape it?

For example I want to query books with title that contain #

No work so far:
http://localhost:8983/solr/select?q=textTitle:#;
http://localhost:8983/solr/select?q=textTitle:#
http://localhost:8983/solr/select?q=textTitle:\#;

Getting
org.apache.lucene.queryParser.ParseException: Cannot parse 'textTitle:\': Lexical error at line 1, column 12. Encountered: EOF after : and sometimes just no response.

thanks
Joel
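
Paul's advice about URL-encoding can be checked with Python's standard library. The sketch below only shows how the query string gets encoded; the host, port and field name are the ones used in this thread, and the variable names are made up.

```python
from urllib.parse import quote, urlencode

raw_query = "textTitle:#"

# quote() with safe="" percent-encodes every reserved character, including # and :
encoded_hash = quote("#", safe="")

# urlencode() builds the whole query string and applies the same encoding.
url = "http://localhost:8983/solr/select?" + urlencode({"q": raw_query})

print(encoded_hash)
print(url)
```

Encoding only makes sure the # reaches the server. Whether the query then matches anything depends on how the field is analyzed, as Erick points out elsewhere in the thread.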
how to get list of unique terms for a field
Hi, lets say I have a field called countryName, is there a way to get a list of all the countries for this field? Trying to figure out a nice way to keep my categories and the solr results in sync, would be nice to get these from solr instead of the database. thanks Joel
weird behavior between 2 environments
I have 2 environments, one works great for this query:

my osx environment:
http://localhost:8983/solr/select?q=countryName:%22Bosnia%20and%20Herzegovina%22
- returns 2 results

my linux environment:
http://localhost:8983/solr/select?q=countryName:%22Bosnia%20and%20Herzegovina%22
- returns 0 results

same configs, same index etc, both using solr 1.4. In the linux env if I run this query:

/solr/select?q=id:96465437

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">id:96465437</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      ...
      <str name="countryName">Bosnia and Herzegovina</str>
    </doc>
  </result>
</response>

So the records are in the index. I checked the admin, they are indexed using the same type (text), and I cannot see any differences.

any idea why it works on one env and not the other? anything I can check in admin to get to the bottom of this?

thanks
Joel
Re: weird behavior between 2 environments
thanks that was it Joel On Dec 3, 2009, at 11:06 AM, Yonik Seeley wrote: The schemas probably aren't the same. Looks like one has position increments enabled for the stopword filter in the field type, and one doesn't. -Yonik http://www.lucidimagination.com On Thu, Dec 3, 2009 at 11:00 AM, Joel Nylund jnyl...@yahoo.com wrote: same client, here are the debug results, something interesting is going on, I dont understand solr/lucene well enough to understand, see below not working env (linux) response - lst name=responseHeader int name=status0/int int name=QTime2/int - lst name=params str name=debugQuerytrue/str str name=qcountryName:Bosnia and Herzegovina/str /lst /lst result name=response numFound=0 start=0/ - lst name=debug str name=rawquerystringcountryName:Bosnia and Herzegovina/str str name=querystringcountryName:Bosnia and Herzegovina/str str name=parsedqueryPhraseQuery(countryName:bosnia herzegovina)/str str name=parsedquery_toStringcountryName:bosnia herzegovina/ str lst name=explain/ str name=QParserLuceneQParser/str - lst name=timing double name=time2.0/double - lst name=prepare double name=time1.0/double - lst name=org.apache.solr.handler.component.QueryComponent double name=time1.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst - lst name=process double name=time1.0/double - lst name=org.apache.solr.handler.component.QueryComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent 
double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst /lst /lst /response working env (osx) response - lst name=responseHeader int name=status0/int int name=QTime54/int - lst name=params str name=qcountryName:Bosnia and Herzegovina/str str name=debugQuerytrue/str /lst /lst - result name=response numFound=2 start=0 - doc str name=countryNameBosnia and Herzegovina/str str name=id83964763/str /doc - doc str name=countryNameBosnia and Herzegovina/str str name=id96465437/str /doc /result - lst name=debug str name=rawquerystringcountryName:Bosnia and Herzegovina/str str name=querystringcountryName:Bosnia and Herzegovina/str str name=parsedqueryPhraseQuery(countryName:bosnia ? herzegovina)/str str name=parsedquery_toStringcountryName:bosnia ? 
herzegovina/str - lst name=explain - str name=83964763 15.619301 = fieldWeight(countryName:bosnia herzegovina in 260955), product of: 1.0 = tf(phraseFreq=1.0) 24.990881 = idf(countryName: bosnia=2 herzegovina=2) 0.625 = fieldNorm(field=countryName, doc=260955) /str - str name=96465437 15.619301 = fieldWeight(countryName:bosnia herzegovina in 275091), product of: 1.0 = tf(phraseFreq=1.0) 24.990881 = idf(countryName: bosnia=2 herzegovina=2) 0.625 = fieldNorm(field=countryName, doc=275091) /str /lst str name=QParserLuceneQParser/str - lst name=timing double name=time53.0/double - lst name=prepare double name=time24.0/double - lst name=org.apache.solr.handler.component.QueryComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst - lst name=process double name=time27.0/double - lst name=org.apache.solr.handler.component.QueryComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time27.0/double /lst /lst /lst /lst /response On Dec 3, 2009, at 10:20 AM, Yonik Seeley wrote: Are you querying both systems from the same browser / client? Try adding
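
Yonik's diagnosis matches the two parsed queries in the debug output: "bosnia ? herzegovina" on the working side and "bosnia herzegovina" on the other. The Python sketch below models why that gap matters for a phrase query. It is a toy model, not Lucene code, and it assumes that "and" is a stopword and that the shared index was built with position increments enabled.

```python
STOPWORDS = {"and"}  # assumption: "and" is listed in stopwords.txt

def analyze(text, enable_position_increments):
    """Return (term, position) pairs after lowercasing and stopword removal."""
    terms = []
    position = -1
    for word in text.lower().split():
        if word in STOPWORDS:
            if enable_position_increments:
                position += 1  # leave a gap where the stopword used to be
            continue
        position += 1
        terms.append((word, position))
    return terms

def phrase_matches(indexed, query):
    """True if the query terms occur with the same relative positions."""
    doc_positions = dict(indexed)  # each term occurs once in this tiny example
    first_term, first_pos = query[0]
    if first_term not in doc_positions:
        return False
    shift = doc_positions[first_term] - first_pos
    return all(doc_positions.get(term) == pos + shift for term, pos in query)

indexed = analyze("Bosnia and Herzegovina", True)
query_with_gap = analyze("Bosnia and Herzegovina", True)
query_without_gap = analyze("Bosnia and Herzegovina", False)

print(phrase_matches(indexed, query_with_gap))
print(phrase_matches(indexed, query_without_gap))
```

With the gap preserved on both sides the phrase lines up. When the query side drops the gap, the two terms are expected to be adjacent, which they are not in the index, so nothing matches.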
debugging javascript DIH
Is there a way to print to stdout, or log anything at all, from my JavaScript DIH transformer? thanks Joel
getting value from parent query in subquery transformer
Hi, I have an entity that has an entity within it that executes a query for each row and calls a transformer. Is there a way to pass a value from the parent query into the transformer? For example, I have an entity called document, and it has an ID and sometimes it has a category. I have a sub-entity called category that does another complex query using the document's ID to get data to send to the transformer to determine the category. I would like to pass the parent's category to this transformer, so I don't have to join in data I already have. Is this possible? I'm using ${item.id} in the where clause, so I guess I'm wondering, can I do something like: <entity name="item" query=".."> <entity name="category" transformer="script:SplitAndPrettyCategory(${item.category})" query=".."> thanks Joel
NOT combined with OR is not getting expected results
http://localhost:8983/solr/select?q=%28NOT%20categoryType:%22MEDIATYPE%22%29 : gives 292289 results
http://localhost:8983/solr/select?q=fmMediaType:%22text%22 : gives 530 results
http://localhost:8983/solr/select?q=%28NOT%20categoryType:%22MEDIATYPE%22%29%20OR%20fmMediaType:%22text%22 : gives 530 results
I expected a number higher than the first query. thanks Joel
Re: NOT combined with OR is not getting expected results
Hi, thanks, but I still get 530 results for the new query you proposed. thanks Joel On Dec 2, 2009, at 12:00 PM, AHMET ARSLAN wrote: http://localhost:8983/solr/select?q=%28NOT%20categoryType:%22MEDIATYPE%22%29 :gives 292289 results http://localhost:8983/solr/select?q=fmMediaType:%22text%22 :gives 530 results http://localhost:8983/solr/select?q=%28NOT%20categoryType:%22MEDIATYPE%22%29%20OR%20fmMediaType:%22text%22 :gives 530 results I expected a number higher than the first query. The NOT operator behaves a little differently. It is like a filter. You just can't combine OR and NOT directly. Try this: q=(categoryType:[* TO *] NOT categoryType:MEDIATYPE) OR fmMediaType:text Solr allows the query q=(NOT categoryType:MEDIATYPE), but it is treated as q=*:* NOT categoryType:MEDIATYPE Hope this helps.
Re: NOT combined with OR is not getting expected results
Thanks, that worked! And yes, I have some documents with no categoryType. thanks Joel On Dec 2, 2009, at 2:24 PM, AHMET ARSLAN wrote: Hi, thanks, but still get 530 results for this new query you proposed. Maybe you have some documents that have an empty categoryType field. Can you try this: q = ((*:* -categoryType:MEDIATYPE) OR fmMediaType:text) It should return at least 292289 documents.
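The query that worked above anchors the negative clause with the match-all query *:* so that there is something to subtract from. As a rough illustration only, the request URL can be built like this in Python; the helper name solr_select_url is made up for this sketch, and the base URL is simply the one used in the example requests in this thread.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed base URL, copied from the example requests in this thread.
SOLR_SELECT = "http://localhost:8983/solr/select"

def solr_select_url(query, base=SOLR_SELECT):
    """Return a select URL with the query string percent-encoded.

    Hypothetical helper for illustration; it is not part of Solr.
    """
    return base + "?" + urlencode({"q": query})

# A purely negative clause inside an OR matches nothing on its own,
# so it is anchored with the match-all query *:* first.
fixed_query = "(*:* -categoryType:MEDIATYPE) OR fmMediaType:text"
url = solr_select_url(fixed_query)
print(url)
```

Letting urlencode do the escaping avoids hand-writing %28, %22 and %20 as in the original URLs.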
Re: getting total index size last update date/time from query
Hi, Luke worked, but we are finding it really slow in our environment (8-10 seconds). Is there a way to get just the document count and last index time with a faster call, possibly by passing something to Luke? thanks Joel On Nov 19, 2009, at 11:54 AM, Binkley, Peter wrote: The Luke request handler (normally available at solr/admin/luke) will give you the document count (not size on the disk, though, if that's what you want) and last update and other info: <lst name="index"> <int name="numDocs">14591</int> <int name="maxDoc">14598</int> <int name="numTerms">128730</int> <long name="version">1196962176380</long> <bool name="optimized">false</bool> <bool name="current">true</bool> <bool name="hasDeletions">true</bool> <str name="directory"> s/solr/data/index</str> <date name="lastModified">2009-11-19T16:44:45Z</date> </lst> See http://wiki.apache.org/solr/LukeRequestHandler Peter -Original Message- From: Joel Nylund [mailto:jnyl...@yahoo.com] Sent: Thursday, November 19, 2009 8:31 AM To: solr-user@lucene.apache.org Subject: getting total index size last update date/time from query Hi, Looking for total number of documents in my index and the last updated date/time of the index. Is there a way to get this through the standard query q=? If not, what is the best way to get this info from Solr? thanks Joel
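If only the document count is needed, one cheaper route than Luke may be an ordinary match-all query with rows=0, since numFound in the response is the number of matching documents. The sketch below is an assumption-laden illustration, not something proposed in this thread: the helper names are invented, the sample response is made up (its numFound is borrowed from Peter's Luke output), and this approach does not return the last-modified time. The Luke handler is also documented as taking a numTerms parameter, and numTerms=0 is commonly suggested to skip collecting top terms per field; whether that helps in this environment is untested.

```python
import json
from urllib.parse import urlencode

def count_url(base="http://localhost:8983/solr/select"):
    """URL for a match-all query that returns no documents, only counts."""
    # rows=0 asks for the header and hit count without any document bodies.
    return base + "?" + urlencode({"q": "*:*", "rows": 0, "wt": "json"})

def num_found(response_text):
    """Extract numFound from a JSON select response."""
    return json.loads(response_text)["response"]["numFound"]

# Hypothetical response body, shaped like the output of Solr's JSON writer.
sample = ('{"responseHeader":{"status":0,"QTime":1},'
          '"response":{"numFound":14591,"start":0,"docs":[]}}')

print(count_url())
print(num_found(sample))
```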
Re: how to do partial word searches?
Hi Erick, thanks for the links. I read both of them and I still have no idea what to do; there is lots of back and forth, but I didn't see any solution. One person talked about indexing the field in reverse and doing and ON on it, this might work I guess. thanks Joel On Nov 24, 2009, at 9:12 PM, Erick Erickson wrote: Copying from Erik Hatcher: See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently does not have leading wildcard support enabled. There's a pretty extensive recent exchange on this, see the thread on the user's list titled "leading and trailing wildcard query". Best Erick On Tue, Nov 24, 2009 at 7:51 PM, Joel Nylund jnyl...@yahoo.com wrote: Hi, I saw some older postings on this, but didn't see a resolution. I have a field called title, and I would like to be able to find partial word matches within the title. For example: http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22 I would expect it to find: <str name="textTitle">the daily dish | by andrew sullivan</str> but it doesn't. It does find sully (which is fine with me as a bonus), but doesn't seem to get any of the partial word stuff. Oddly enough, before I lowercased the title, the wildcard matching seemed to work a bit better; it just didn't deal with the case-sensitive query. At first I had mixed-case titles and I read that wildcards don't work with mixed case, so I created another field, a lowercased version of the title called textTitle, of type text. Is it possible with Solr to achieve what I am trying to do, and if so how? If not, is there anything closer than what I have? thanks Joel
Re: solr/jetty not working for anything other than localhost
I see: tcp46 0 0 *.8983 *.* LISTEN tcp4 0 0 127.0.0.1.8983 *.* LISTEN thanks Joel On Nov 25, 2009, at 5:21 PM, simon wrote: first, check what port 8983 is bound to - should be listening on all interfaces netstat -an |grep 8983 You should see tcp0 0 0.0.0.0:8983 0.0.0.0:* LISTEN -Simon On Wed, Nov 25, 2009 at 3:55 PM, Joel Nylund jnyl...@yahoo.com wrote: Hi, if I try to use any other hostname jetty doesnt work, gives a blank page, if I telnet too the server/port it just disconnects. I tried editing the scripts.conf to change the hostname, that didnt seem to help. For example I tried editing my etc/hosts file and added: 127.0.0.1 solriscool then: ping solriscool PING solriscool (127.0.0.1): 56 data bytes 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.095 ms sh-3.2# telnet solriscool 8983 Trying 127.0.0.1... Connected to solriscool. Escape character is '^]'. GET / HTTP/1.1 Connection closed by foreign host. telnet localhost 8983 Trying ::1... Connected to localhost. Escape character is '^]'. GET /solr HTTP/1.1 Host: localhost HTTP/1.1 302 Found Location: http://localhost/solr/ Content-Length: 0 Server: Jetty(6.1.3) any ideas? thanks Joel
Re: solr/jetty not working for anything other than localhost
yes says: 2009-11-25 18:08:59.967::INFO: Started SocketConnector @ 0.0.0.0:8983 running on osx thanks Joel On Nov 25, 2009, at 6:00 PM, simon wrote: On Wed, Nov 25, 2009 at 5:27 PM, Joel Nylund jnyl...@yahoo.com wrote: I see: tcp46 0 0 *.8983 *.* LISTEN tcp4 0 0 127.0.0.1.8983 *.* LISTEN Not the same version of linux/netstat as mine, but I'd guess that the second line is the key to the problem -looks as though TCP over IPv4 is onl y listening on the localhost interface, which is a network configuration issue. what does the Solr log say after it's started - should be a line INFO: Started SelectChannelConnector @ 0.0.0.0:8983 -Simon thanks Joel On Nov 25, 2009, at 5:21 PM, simon wrote: first, check what port 8983 is bound to - should be listening on all interfaces netstat -an |grep 8983 You should see tcp0 0 0.0.0.0:8983 0.0.0.0:* LISTEN -Simon On Wed, Nov 25, 2009 at 3:55 PM, Joel Nylund jnyl...@yahoo.com wrote: Hi, if I try to use any other hostname jetty doesnt work, gives a blank page, if I telnet too the server/port it just disconnects. I tried editing the scripts.conf to change the hostname, that didnt seem to help. For example I tried editing my etc/hosts file and added: 127.0.0.1 solriscool then: ping solriscool PING solriscool (127.0.0.1): 56 data bytes 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.095 ms sh-3.2# telnet solriscool 8983 Trying 127.0.0.1... Connected to solriscool. Escape character is '^]'. GET / HTTP/1.1 Connection closed by foreign host. telnet localhost 8983 Trying ::1... Connected to localhost. Escape character is '^]'. GET /solr HTTP/1.1 Host: localhost HTTP/1.1 302 Found Location: http://localhost/solr/ Content-Length: 0 Server: Jetty(6.1.3) any ideas? thanks Joel
Re: help with dataimport delta query
Thanks that was it, well really this part: ${dataimporter.delta.job_jobs_id} I thought the jobs_id was part of the DIH, but I guess it was just the example, duh! thanks Joel --- On Tue, 11/24/09, Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com wrote: From: Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Subject: Re: help with dataimport delta query To: solr-user@lucene.apache.org Date: Tuesday, November 24, 2009, 12:15 AM I guess the field names do not match in the deltaQuery you are selecting the field id and in the deltaImportQuery you us the field as ${dataimporter.delta.job_jobs_id} I guess it should be ${dataimporter.delta.id} On Tue, Nov 24, 2009 at 1:19 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, I have solr all working nicely, except im trying to get deltas to work on my data import handler Here is a simplification of my data import config, I have a table called Book which has categories, im doing subquries for the category info and calling a javascript helper. This all works perfectly for the regular query. I added these lines for the delta stuff: deltaImportQuery=SELECT f.id,f.title FROM Book f f.id='${dataimporter.delta.job_jobs_id}' deltaQuery=SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate '${dataimporter.last_index_time}' basically im trying to rows that lastModifiedDate is newer than the last index (or deltaindex). I run: http://localhost:8983/solr/dataimport?command=delta-import And it says in logs: Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport INFO: Starting Delta Import Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Starting delta collection. 
Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=delta-import} status=0 QTime=0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: category Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: category rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: category rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: category Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: item Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: item rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: item rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: item Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Delta Import completed successfully Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time taken = 0:0:0.21 But the browser says no documents added/modified (even though one record in db is a match) Is there a way to turn debugging so I can see the queries the DIH is sending to the db? Any other ideas of what I could be doing wrong? 
thanks Joel document name=doc entity name=item query=SELECT f.id, f.title FROM Book f WHERE f.inMyList=1 deltaImportQuery=SELECT f.id,f.title FROM Book f f.id='${dataimporter.delta.job_jobs_id}' deltaQuery=SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate '${dataimporter.last_index_time}' field column=id name=id / field column=title name=title / entity name=category transformer=script:SplitAndPrettyCategory query=select fc.bookId, group_concat(cr.name) as categoryName, from BookCat fc where fc.bookId = '${item.id}' AND group by fc.bookId field column=categoryType name=categoryType / /entity /entity /document -- - Noble Paul | Principal Engineer| AOL | http://aol.com
how to do partial word searches?
Hi, I saw some older postings on this, but didn't see a resolution. I have a field called title, and I would like to be able to find partial word matches within the title. For example: http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22 I would expect it to find: <str name="textTitle">the daily dish | by andrew sullivan</str> but it doesn't. It does find sully (which is fine with me as a bonus), but doesn't seem to get any of the partial word stuff. Oddly enough, before I lowercased the title, the wildcard matching seemed to work a bit better; it just didn't deal with the case-sensitive query. At first I had mixed-case titles and I read that wildcards don't work with mixed case, so I created another field, a lowercased version of the title called textTitle, of type text. Is it possible with Solr to achieve what I am trying to do, and if so how? If not, is there anything closer than what I have? thanks Joel
Re: configure solr
for #1, under example, is there a webapps folder, does it contain solr.war ? are there any errors in your startup log for jetty, does it say anything about setting up solr, and solr home etc. Joel On Nov 24, 2009, at 4:55 PM, Jill Han wrote: Hi, I just downloaded solr -1.4.0 to my computer, C:\apache-solr-1.4.0. 1.I followed the instruction to run the sample, java -jar start.jar at C:\apache-solr-1.4.0\example And then go to http://localhost:8983/solr/admin, however, I got HTTP ERROR: 404 NOT_FOUND RequestURI=/solr/admin Powered by jetty:// http://jetty.mortbay.org Did I miss something? 2. Since I can't get sample run, I tried to run it on tomcat server(5.5) directly as a. Copy/paste apache-solr-1.4.0.war to C:\Tomcat 5.5\webapps, b. Go to http://localhost:8080/apache-solr-1.4.0/ The error message is HTTP Status 500 - Severe errors in solr configuration.. 3. How to configure it on tomcat server? Your help is appreciated very much as always, Jill
Re: help with dataimport delta query
got to love it when yahoo thinks your own mail is spam, anyone have any ideas how to get logging to work with 1.4. I went to the admin panel and set all logging to finest. In my jetty std out I see no SQL for any of the dataimport handler run. I see Nov 23, 2009 9:26:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 6 Nov 23, 2009 9:26:32 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity category with URL: jdbc:mysql:// localhost/feeddb Nov 23, 2009 9:26:32 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 5 But no sql, from looking at the source, it looks like it should be logging the sql if Im in debug mode. any ideas, I think I am losing my mind. my full import works, but the delta does nothing thanks Joel On Nov 23, 2009, at 2:49 PM, Joel Nylund wrote: Hi, I have solr all working nicely, except im trying to get deltas to work on my data import handler Here is a simplification of my data import config, I have a table called Book which has categories, im doing subquries for the category info and calling a javascript helper. This all works perfectly for the regular query. I added these lines for the delta stuff: deltaImportQuery=SELECT f.id,f.title FROM Book f f.id='${dataimporter.delta.job_jobs_id}' deltaQuery=SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate '${dataimporter.last_index_time}' basically im trying to rows that lastModifiedDate is newer than the last index (or deltaindex). 
I run: http://localhost:8983/solr/dataimport?command=delta-import And it says in logs: Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport INFO: Starting Delta Import Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Starting delta collection. Nov 23, 2009 2:33:02 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=delta-import} status=0 QTime=0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: category Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: category rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: category rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: category Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: item Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: item rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: item rows obtained : 0 Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: item Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Delta Import completed successfully Nov 23, 2009 2:33:02 PM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time taken = 0:0:0.21 But the browser says no documents added/modified 
(even though one record in db is a match) Is there a way to turn debugging so I can see the queries the DIH is sending to the db? Any other ideas of what I could be doing wrong? thanks Joel document name=doc entity name=item query=SELECT f.id, f.title FROM Book f WHERE f.inMyList=1 deltaImportQuery=SELECT f.id,f.title FROM Book f f.id='${dataimporter.delta.job_jobs_id}' deltaQuery=SELECT id FROM `Book` WHERE fm.inMyList=1 AND lastModifiedDate '${dataimporter.last_index_time}' field column=id name=id / field column=title name=title / entity name=category transformer=script:SplitAndPrettyCategory query=select fc.bookId, group_concat(cr.name) as categoryName, from BookCat fc where fc.bookId = '${item.id}' AND group by fc.bookId field column=categoryType name=categoryType / /entity /entity /document
getting total index size last update date/time from query
Hi, Looking for total number of documents in my index and the last updated date/time of the index. Is there a way to get this through the standard query q=? If not, what is the best way to get this info from Solr? thanks Joel
Re: deployment questions
Anyone? I have done more reading and testing and it seems like I want to: Use SolrJ and embed solr in my webapp, but I want to disable the http access to solr, meaning force all calls through my solrj interface I am building (no admin access etc). Is there a simple way to do this? Am I better off running solr as a server on its own and using network security? thanks Joel On Nov 9, 2009, at 5:04 PM, Joel Nylund wrote: Hi, I have a java app that is deployed in jboss/tomcat container. I would like to add my solr index to it. I have read about this and it seems fairly straight forward, but im curious the best way to secure it. I require my users to login to my app to use it, so I want the search functions to behave the same way. Ideally I would like to do the solr queries from the client using ajax/json calls. So given this my thinking was I should wrapper the solr servlet and do a local proxy type interface to ensure security. Is there any easier way to do this, or an example of a good way to do this? Or does the solr servlet support a interceptor type pattern where I can have it call a piece of code before I execute the call (this application is old and not using std j2ee security so I dont think I can use that.) Another option is to do solrj on the server, and not do the client side calls, in this case I think I could lock down the solr servlet interface to only allow local calls. thanks Joel
indexing on different server
Is it possible to build the index on one server and copy the index files over to another? thanks Joel
deployment questions
Hi, I have a java app that is deployed in jboss/tomcat container. I would like to add my solr index to it. I have read about this and it seems fairly straight forward, but im curious the best way to secure it. I require my users to login to my app to use it, so I want the search functions to behave the same way. Ideally I would like to do the solr queries from the client using ajax/json calls. So given this my thinking was I should wrapper the solr servlet and do a local proxy type interface to ensure security. Is there any easier way to do this, or an example of a good way to do this? Or does the solr servlet support a interceptor type pattern where I can have it call a piece of code before I execute the call (this application is old and not using std j2ee security so I dont think I can use that.) Another option is to do solrj on the server, and not do the client side calls, in this case I think I could lock down the solr servlet interface to only allow local calls. thanks Joel
Re: solr query help alpha numeric and not
Hi, yes it's a string. In the case of a title, it can be anything: a letter, a number, a symbol, a multibyte char, etc. Any ideas for a query that matches anything that is not a letter a-z or a number 0-9, given that it's a string? thanks Joel On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote: Hi Joel, The ID is sent back as a string (instead of as an integer) in your example. Could this be the cause? - Jonathan On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote: Hi, I have a field called firstLetterTitle. This field has 1 char, which can be anything. I need help with a few queries on this char: 1.) I want all NON ALPHA and NON numbers, so any char that is not A-Z or 0-9. I tried: http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:0%20TO%209%20AND%20NOT%20firstLetterTitle:A%20TO%20Z But I get back numeric results: <doc> <str name="firstLetterTitle">9</str> <str name="id">23946447</str> </doc> 2.) I want only numerics: http://localhost:8983/solr/select?q=firstLetterTitle:0%20TO%209 This seems to work, but I'm just checking if it's the right way. 3.) I want only English letters: http://localhost:8983/solr/select?q=firstLetterTitle:A%20TO%20Z This seems to work, but I'm just checking if it's the right way. thanks Joel
Re: solr query help alpha numeric and not
Avlesh, thanks, those worked. For some reason I never got your mail; I found it in one of the list archives though. thanks again Joel On Nov 5, 2009, at 9:08 PM, Avlesh Singh wrote: Didn't the queries in my reply work? Cheers Avlesh On Fri, Nov 6, 2009 at 4:16 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, yes it's a string. In the case of a title, it can be anything: a letter, a number, a symbol, a multibyte char, etc. Any ideas for a query that matches anything that is not a letter a-z or a number 0-9, given that it's a string? thanks Joel On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote: Hi Joel, The ID is sent back as a string (instead of as an integer) in your example. Could this be the cause? - Jonathan On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote: Hi, I have a field called firstLetterTitle. This field has 1 char, which can be anything. I need help with a few queries on this char: 1.) I want all NON ALPHA and NON numbers, so any char that is not A-Z or 0-9. I tried: http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:0%20TO%209%20AND%20NOT%20firstLetterTitle:A%20TO%20Z But I get back numeric results: <doc> <str name="firstLetterTitle">9</str> <str name="id">23946447</str> </doc> 2.) I want only numerics: http://localhost:8983/solr/select?q=firstLetterTitle:0%20TO%209 This seems to work, but I'm just checking if it's the right way. 3.) I want only English letters: http://localhost:8983/solr/select?q=firstLetterTitle:A%20TO%20Z This seems to work, but I'm just checking if it's the right way. thanks Joel
solr query help alpha numeric and not
Hi, I have a field called firstLetterTitle. This field has 1 char, which can be anything. I need help with a few queries on this char: 1.) I want all NON ALPHA and NON numbers, so any char that is not A-Z or 0-9. I tried: http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:0%20TO%209%20AND%20NOT%20firstLetterTitle:A%20TO%20Z But I get back numeric results: <doc> <str name="firstLetterTitle">9</str> <str name="id">23946447</str> </doc> 2.) I want only numerics: http://localhost:8983/solr/select?q=firstLetterTitle:0%20TO%209 This seems to work, but I'm just checking if it's the right way. 3.) I want only English letters: http://localhost:8983/solr/select?q=firstLetterTitle:A%20TO%20Z This seems to work, but I'm just checking if it's the right way. thanks Joel
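One likely reason the first query returns a "9" is that Lucene range syntax needs square brackets, as in firstLetterTitle:[0 TO 9]; without them, 0, TO and 9 are parsed as separate terms. The sketch below builds the three queries with brackets. It is an illustration under assumptions, not necessarily the queries Avlesh posted: it assumes firstLetterTitle is an untokenized string field, and the helper name in_range is invented here. Ranges on string fields compare lexicographically, so [A TO Z] would not cover lowercase letters.

```python
from urllib.parse import urlencode

FIELD = "firstLetterTitle"  # field name taken from the question

def in_range(low, high, field=FIELD):
    """Inclusive range clause in Lucene query syntax (hypothetical helper)."""
    # Square brackets mark an inclusive range; bare "0 TO 9" is just three terms.
    return f"{field}:[{low} TO {high}]"

digits = in_range(0, 9)
letters = in_range("A", "Z")
# Purely negative clauses need a positive clause to subtract from, hence *:*.
neither = f"*:* -{digits} -{letters}"

for q in (neither, digits, letters):
    print("http://localhost:8983/solr/select?" + urlencode({"q": q}))
```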
how to use ajax-solr - example?
Hi, I looked at the documentation and I have no idea how to get started. Can someone point me to or show me an example of how to send a query to a Solr server and paginate through the results using ajax-solr? I would gladly write a blog tutorial on how to do this if someone can get me started. I don't know jQuery but have used Prototype and Scriptaculous. thanks Joel
exact match lookup
Hi, I have a field that I want to do exact match lookups using. (when I say exact match, im looking for equivalent to a sql query where with no like clause so where feedClass = Social News) For example the field is called feedClass and im doing: http://localhost:8983/solr/select?q=feedClass:Blog http://localhost:8983/solr/select?q=feedClass:Social%20News I tried using text and it seems to work pretty well except for classes with spaces in them. So I tried using field type string, that didnt work. Then I tried defining a new type called: fieldType name=text_nows class=solr.TextField positionIncrementGap=100 /fieldType This didnt seem to help either. When I do these queries for this field with spaces, I seem to get random results For example: response − lst name=responseHeader int name=status0/int int name=QTime5/int − lst name=params str name=qfeedClass:Social News/str /lst /lst − result name=response numFound=3451 start=0 − doc str name=feedClassBlog/str str name=firstLetterTitleN/str /doc any ideas? thanks Joel
Re: exact match lookup
thank worked for me, changed to: http://localhost:8983/solr/select?q=feedClass:%22social%20news%22 and the matches are correct, I changed the feedClass field back to type text. A followup question has to do with sorting these results. I have a field called title that I want the results sorted by. http://localhost:8983/solr/select?q=feedClass:%22social%20news%22sort:title%20asc I tried this and the results are not sorted (they seem random) any ideas? thanks Joel response − lst name=responseHeader int name=status0/int int name=QTime1/int − lst name=params str name=qfeedClass:social news/str str name=sort:title asc/ /lst /lst − result name=response numFound=186 start=0 − doc str name=feedClassSocial News/str str name=firstLetterTitleF/str str name=titleFar/str /doc doc str name=feedClassSocial News/str str name=firstLetterTitleD/str str name=titledig/str /doc doc str name=feedClassSocial News/str str name=firstLetterTitleT/str str name=titleTech/str /doc doc str name=feedClassSocial News/str str name=firstLetterTitleM/str str name=titleMix/str /doc On Nov 4, 2009, at 12:15 PM, Jérôme Etévé wrote: Hi, you need to quote your phrase when you search for 'Social News': feedClass:Social News (URI encoded of course). otherwise your request will become (I assume you're using a standard query parser) feedClass:Social defaultField:News . Well that's the idea. It should then work using the type string. Cheers! J. 2009/11/4 Joel Nylund jnyl...@yahoo.com: Hi, I have a field that I want to do exact match lookups using. (when I say exact match, im looking for equivalent to a sql query where with no like clause so where feedClass = Social News) For example the field is called feedClass and im doing: http://localhost:8983/solr/select?q=feedClass:Blog http://localhost:8983/solr/select?q=feedClass:Social%20News I tried using text and it seems to work pretty well except for classes with spaces in them. So I tried using field type string, that didnt work. 
Then I tried defining a new type called: fieldType name=text_nows class=solr.TextField positionIncrementGap=100 /fieldType This didnt seem to help either. When I do these queries for this field with spaces, I seem to get random results For example: response − lst name=responseHeader int name=status0/int int name=QTime5/int − lst name=params str name=qfeedClass:Social News/str /lst /lst − result name=response numFound=3451 start=0 − doc str name=feedClassBlog/str str name=firstLetterTitleN/str /doc any ideas? thanks Joel -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: exact match lookup
that worked, thanks! had to negate the score. thanks Joel On Nov 4, 2009, at 1:57 PM, Jérôme Etévé wrote: If feedClass acts as an identifier, better use string :) use sort=title asc,score desc (not sort:) J. 2009/11/4 Joel Nylund jnyl...@yahoo.com: thank worked for me, changed to: http://localhost:8983/solr/select?q=feedClass:%22social%20news%22 and the matches are correct, I changed the feedClass field back to type text. A followup question has to do with sorting these results. I have a field called title that I want the results sorted by. http://localhost:8983/solr/select?q=feedClass:%22social%20news%22sort:title%20asc I tried this and the results are not sorted (they seem random) any ideas? thanks Joel response − lst name=responseHeader int name=status0/int int name=QTime1/int − lst name=params str name=qfeedClass:social news/str str name=sort:title asc/ /lst /lst − result name=response numFound=186 start=0 − doc str name=feedClassSocial News/str str name=firstLetterTitleF/str str name=titleFar/str /doc doc str name=feedClassSocial News/str str name=firstLetterTitleD/str str name=titledig/str /doc doc str name=feedClassSocial News/str str name=firstLetterTitleT/str str name=titleTech/str /doc doc str name=feedClassSocial News/str str name=firstLetterTitleM/str str name=titleMix/str /doc On Nov 4, 2009, at 12:15 PM, Jérôme Etévé wrote: Hi, you need to quote your phrase when you search for 'Social News': feedClass:Social News (URI encoded of course). otherwise your request will become (I assume you're using a standard query parser) feedClass:Social defaultField:News . Well that's the idea. It should then work using the type string. Cheers! J. 2009/11/4 Joel Nylund jnyl...@yahoo.com: Hi, I have a field that I want to do exact match lookups using. 
(when I say exact match, im looking for equivalent to a sql query where with no like clause so where feedClass = Social News) For example the field is called feedClass and im doing: http://localhost:8983/solr/select?q=feedClass:Blog http://localhost:8983/solr/select?q=feedClass:Social%20News I tried using text and it seems to work pretty well except for classes with spaces in them. So I tried using field type string, that didnt work. Then I tried defining a new type called: fieldType name=text_nows class=solr.TextField positionIncrementGap=100 /fieldType This didnt seem to help either. When I do these queries for this field with spaces, I seem to get random results For example: response − lst name=responseHeader int name=status0/int int name=QTime5/int − lst name=params str name=qfeedClass:Social News/str /lst /lst − result name=response numFound=3451 start=0 − doc str name=feedClassBlog/str str name=firstLetterTitleN/str /doc any ideas? thanks Joel -- Jerome Eteve. http://www.eteve.net jer...@eteve.net -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
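The two fixes from this thread, quoting the phrase and passing sort as its own request parameter rather than inside q, can be sketched together as follows. This is only an illustration: exact_match_url is a made-up helper name, and the base URL is the one from the example requests above.

```python
from urllib.parse import urlencode

def exact_match_url(field, value, sort=None,
                    base="http://localhost:8983/solr/select"):
    """Build a select URL for a quoted phrase query (hypothetical helper)."""
    # Escape backslashes and double quotes, then wrap the value in quotes
    # so that every word stays attached to the named field.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{escaped}"'}
    if sort:
        params["sort"] = sort  # a separate parameter, not part of q
    return base + "?" + urlencode(params)

url = exact_match_url("feedClass", "Social News", sort="title asc")
print(url)
```

Because urlencode joins parameters with &, the sort parameter cannot end up glued onto the end of q as it did in the hand-written URL.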
Re: how to use ajax-solr - example?
Hi Israel, I agree the idea of adding a scripting language in between is good, but I want something simple where I can easily test my queries with data and scroll through the results. I have been using the browser and getting xml for now, but would like to save my queries in a simple html page and format the data. I figured this is something I can throw together in a few hours, but I also figured someone would have already done the work. thanks Joel

On Nov 4, 2009, at 2:02 PM, Israel Ekpo wrote:

On Wed, Nov 4, 2009 at 10:48 AM, Joel Nylund jnyl...@yahoo.com wrote:

Hi, I looked at the documentation and I have no idea how to get started. Can someone point me to or show me an example of how to send a query to a solr server and paginate through the results using ajax-solr? I would gladly write a blog tutorial on how to do this if someone can get me started. I don't know jquery but have used prototype and scriptaculous. thanks Joel

Joel, It will be best if you use a scripting language between Solr and JavaScript. This is because sending data only between JavaScript and Solr will limit you to only one domain name. However, if you are using a scripting language between JavaScript and Solr, you can use the scripting language to retrieve the request parameters from JavaScript and then send them to Solr with the response writer set to json. This will cause Solr to send the response in JSON format, which the scripting language can pass on to JavaScript. This example here will cause Solr to return the response in JSON: http://example.com:8443/solr/select?q=searchkeyword&wt=json

-- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: best way to model 1-N
thanks, but I'm confused about how I can aggregate across rows. I don't know of any easy way to get my db to return one row for all the categories (given the hint from your other email). I have split the category query into a separate entity, but it's returning multiple rows; how do I combine multiple rows into 1 index entity? thanks Joel

On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:

In the database this is modeled as a 1-N, where the category table has the mapping of feed to category. I need to be able to query: give me all the feeds in any given category. How can I best model this in solr? Seems like a multiValued field might help, but how would I populate it, and would the query above work?

Yes you are right. A multivalued field for categories is the answer. For populating in the index:
1. If you use DIH to populate your indexes and your datasource is a database, then you can use DIH's RegexTransformer on an aggregated list of categories. E.g. if your database query returns a,b,c,d in a column called db_categories, this is how you would put it in DIH's data-config file: <field column="db_categories" name="categories" splitBy=","/>
2. If you add documents to Solr yourself, multiple values for the field can be specified as an array or list of values in the SolrInputDocument.
A multivalued field provides the same faceting and searching capabilities as regular fields. There is no special syntax. Cheers Avlesh

On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund jnyl...@yahoo.com wrote:

Hi, I have one index so far which contains feeds. I have been able to de-normalize several tables and map this data onto the feed entity. There is one tricky problem that I need help on. Feeds have 1 - many categories. So let's say we have Category1, Category2 and Category3: Feed 1 is in Category1; Feed 2 is in Category2 and Category3; Feed 3 is in Category2; Feed 4 has no category. In the database this is modeled as a 1-N, where the category table has the mapping of feed to category. I need to be able to query: give me all the feeds in any given category. How can I best model this in solr? Seems like a multiValued field might help, but how would I populate it, and would the query above work? thanks Joel
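As a rough illustration of what splitting an aggregated column achieves, here is a small sketch that turns a value such as a,b,c,d into separate values for a multiValued field. It is not DIH's own code: the class name is made up, and trimming and skipping empty parts are assumptions added for the example. In the data-config itself the RegexTransformer attribute is spelled splitBy, and the entity has to declare transformer="RegexTransformer" for it to take effect.

```java
import java.util.ArrayList;
import java.util.List;

class CategorySplitter {
    // Splits an aggregated column value (e.g. "TOPIC,LANGUAGE") into
    // the individual values that a multiValued field would hold.
    static List<String> split(String aggregated, String separatorRegex) {
        List<String> values = new ArrayList<>();
        if (aggregated == null) {
            return values; // a feed without categories yields no values
        }
        for (String part : aggregated.split(separatorRegex)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(split("TOPIC,LANGUAGE", ","));
    }
}
```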
Re: best way to model 1-N
Thanks Chantal, I will keep that in mind for tuning. For the sql, I figured out a way to combine them into one row using concat, but I still seem to be having an issue splitting them.

The db now returns one column, categoryType: TOPIC,LANGUAGE

But in my solr result, the values in categoryType all seem to be within one str. I would expect them to be in multiple strings within the array; is this assumption wrong?

<doc>
  <arr name="categoryType"><str>TOPIC,LANGUAGE</str></arr>
  <str name="id">40</str>
  <str name="title">feed title</str>
</doc>

Here is my import:

<document name="doc">
  <entity name="item" query="SELECT f.id, f.title FROM Feed f">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
    <entity name="category" query="select cfcr.feedId, group_concat(cfcr.categoryType) as categoryType from CFR cfcr where cfcr.feedId = '${item.id}' AND group by cfcr.feedId">
      <field column="categoryType" name="categoryType" splityBy=","/>
    </entity>
  </entity>

In schema:

<field name="categoryType" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="categoryName" type="text" indexed="true" stored="true" required="false" multiValued="true"/>

What am I missing? thanks Joel

On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote:

That depends a bit on your database, but it is tricky and might not be performant. If you are more of a Java developer, you might prefer retrieving multiple rows per SOLR document from your dataSource (join on your category and main table), and aggregating them in your custom EntityProcessor. I got a far(!) better performance retrieving everything in one query and doing the aggregation in Java. But this is, of course, depending on your table structure and data. Noble Paul helped me with the custom EntityProcessor, and it turned out quite easy.
Have a look at the thread with the heading from this mailing list (SOLR-USER): DataImportHandler / Import from DB : one data set comes in multiple rows Cheers, Chantal
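The aggregation Chantal describes (fetch the joined rows in one query, then combine them per feed in Java) might look roughly like the sketch below. It is plain Java rather than a real EntityProcessor; the class name and the row shape (a two-element array of feed id and category, with null for a feed that has no category) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FeedCategoryAggregator {
    // Each row is {feedId, category} as a join would return it.
    // The result maps every feed id to the list of its categories,
    // i.e. the values for one multiValued field per Solr document.
    static Map<String, List<String>> aggregate(List<String[]> rows) {
        Map<String, List<String>> byFeed = new LinkedHashMap<>();
        for (String[] row : rows) {
            List<String> categories = byFeed.computeIfAbsent(row[0], id -> new ArrayList<>());
            if (row[1] != null) {
                categories.add(row[1]);
            }
        }
        return byFeed;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Category1"},
                new String[] {"2", "Category2"},
                new String[] {"2", "Category3"},
                new String[] {"3", "Category2"},
                new String[] {"4", null});
        System.out.println(aggregate(rows));
    }
}
```

With the four feeds from the original question, feed 2 ends up with two values and feed 4 with an empty list, so it would simply be indexed without a category value.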
Re: best way to model 1-N
I'm using apache-solr-1.3.0. I got it to work using a javascript function instead. thanks Joel

On Oct 30, 2009, at 12:44 PM, Chantal Ackermann wrote:

This looks all right to me, but I might be missing something. Which version/build of SOLR are you using? Chantal
Re: weird problem with letters S and T
Hey everyone, thanks for the help. It seems to be working this morning after a restart and reindex (maybe I was just too sleepy last night), using a field type of text_ws. I'm curious about the pros and cons of Michel's approach below. This seems like another good way to do it; is there any difference in terms of performance and/or index size, or anything else I need to worry about? My index will have about 3 million records in prod; I'm testing with 300k (1/10 scale) now and it seems fine. thanks Joel

On Oct 29, 2009, at 8:09 AM, Michel Bottan wrote:

Hi Joel, If you intend to query for the TITLE which starts with specific letters, I have another solution which seems to be easier, since you don't need a specific field for the first letter.

1. Create a new type in your schema.xml using the following analyzer:

<fieldType name="text_sort" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-zA-Z0-9])" replacement="" replace="all"/>
  </analyzer>
</fieldType>

2. Create a copy field from its original:

<field name="title_sort" type="text_sort" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>

3. Use a filter query to filter, i.e. fq=title_sort:[a TO b]&sort=title_sort asc (titles starting with A through N)

4. Read the field value for presentation from the original field.

Cheers! Michel Bottan

On Thu, Oct 29, 2009 at 1:23 AM, Norberto Meijome numard...@gmail.com wrote:

On Wed, 28 Oct 2009 19:20:37 -0400 Joel Nylund jnyl...@yahoo.com wrote:

Well I tried removing those 2 letters from stopwords, didn't seem to help. I also tried changing the field type to text_ws, didn't seem to work. Any other ideas?

Hi Joel, if your stop word filter was applied on index, you will have to reindex again (at least those documents with S and T).
If your stop filter was *only* on query, then it should work after you reloaded your app. b _ {Beto|Norberto|Numard} Meijome Those who do not remember the past are condemned to repeat it. George Santayana I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
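To get a feel for the sort key that Michel's text_sort analyzer chain produces, here is a rough plain-Java approximation of the same steps applied to a whole title: lowercase, fold accents, trim, then drop everything that is not a letter or digit. It is only a sketch; java.text.Normalizer stands in for ISOLatin1AccentFilterFactory, which is an assumption and not an exact equivalent, and the class name is made up.

```java
import java.text.Normalizer;
import java.util.Locale;

class TitleSortKey {
    // Approximates the chain: keyword tokenizer (whole title as one token),
    // lowercase, accent folding, trim, strip non-alphanumeric characters.
    static String of(String title) {
        String lower = title.toLowerCase(Locale.ROOT);
        // Decompose accented letters and remove the combining marks.
        String folded = Normalizer.normalize(lower, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return folded.trim().replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(of("Super Cool"));
        System.out.println(of("  T\u00edtulos come\u00e7ando!"));
    }
}
```

Because the pattern also removes spaces, "Super Cool" and "Supercool" end up with the same key, which is worth keeping in mind when comparing this approach with a separate first-letter field.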
data import with transformer
Hi, I have been reading the solr book and wiki, but I can't find any examples similar to what I'm looking for. I have a database field called category; this field needs some text manipulation before it goes in the index. Here is the java code for what I'm trying to do:

    // categories look like this: prefix category suffix
    // I want to turn them into: category (remove prefix and suffix and spaces before and after)
    public static String getPrettyCategoryName(String categoryName) {
        String result;
        if (categoryName == null || categoryName.equals("")) {
            // nothing to do; just return what was passed in.
            result = categoryName;
        } else {
            result = categoryName.toLowerCase();
            if (result.startsWith(startString)) {
                result = result.substring(startString.length());
            }
            if (result.endsWith(endString)) {
                result = result.substring(0, (result.length() - endString.length()));
            }
            if (result.length() > 0) {
                result = Character.toUpperCase(result.charAt(0)) + result.substring(1);
            }
        }
        return result;
    }

Can I have a transformer call a java method? It seems like I can, but how do I transform just one column? If someone can point me to a complete example that transforms a column using java or javascript, I'm sure I can figure this out. thanks Joel
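One possible shape for this, as a sketch: as far as I know, DIH accepts a plain class exposing a transformRow(Map<String, Object> row) method as a custom transformer, named in the entity's transformer attribute, so a single column can be rewritten in place. The class name and the "[[" and "]]" markers below are hypothetical stand-ins, since the real prefix and suffix are not given in the message, and the extra trim calls are an addition to match the stated goal of removing surrounding spaces.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CategoryNameTransformer {
    // Hypothetical markers; substitute the real prefix and suffix.
    static final String START = "[[";
    static final String END = "]]";

    static String pretty(String name) {
        if (name == null || name.isEmpty()) {
            return name; // nothing to do
        }
        String result = name.trim().toLowerCase(Locale.ROOT);
        if (result.startsWith(START)) {
            result = result.substring(START.length());
        }
        if (result.endsWith(END)) {
            result = result.substring(0, result.length() - END.length());
        }
        result = result.trim();
        if (!result.isEmpty()) {
            result = Character.toUpperCase(result.charAt(0)) + result.substring(1);
        }
        return result;
    }

    // Rewrites only the "category" column and leaves every other column untouched.
    public Object transformRow(Map<String, Object> row) {
        Object value = row.get("category");
        if (value != null) {
            row.put("category", pretty(value.toString()));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 40);
        row.put("category", "[[ social news ]]");
        System.out.println(new CategoryNameTransformer().transformRow(row));
    }
}
```

In a real deployment the class would need to be public and on Solr's classpath; it is left package-private here only to keep the example in one file.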
multiple sql queries for one index?
Hi, It's been hurting my brain all day trying to build one query for my index (joins upon joins upon joins). Is there a way I can run multiple queries to populate the same index? I have one main table that everything can be joined back to via ID, so it should be theoretically possible. If this can be done, can someone point me to an example? thanks Joel
best way to model 1-N
Hi, I have one index so far which contains feeds. I have been able to de-normalize several tables and map this data onto the feed entity. There is one tricky problem that I need help on. Feeds have 1 - many categories. So let's say we have Category1, Category2 and Category3: Feed 1 is in Category1; Feed 2 is in Category2 and Category3; Feed 3 is in Category2; Feed 4 has no category. In the database this is modeled as a 1-N, where the category table has the mapping of feed to category. I need to be able to query: give me all the feeds in any given category. How can I best model this in solr? Seems like a multiValued field might help, but how would I populate it, and would the query above work? thanks Joel
weird problem with letters S and T
(I am super new to solr, sorry if this is an easy one)

Hi, I want to support an A-Z type view of my data. I have a DataImportHandler that uses sql. My query is complex, but the part that matters is:

SELECT f.id, f.title, LEFT(f.title,1) as firstLetterTitle FROM Foo f

I can create this index with no issues. I can query the title with no problem:

http://localhost:8983/solr/select?q=title:super

I can query the first letters mostly with no problem:

http://localhost:8983/solr/select?q=firstLetterTitle:a

This returns all the foo's with the first letter a. It actually works with every letter except S and T. If I query those, I get no results. The weird thing is that if I do the title query above with Super I get lots of results, and the xml shows the firstLetterTitles for those to be S:

<doc>
  <str name="firstLetterTitle">S</str>
  <str name="id">84861348</str>
  <str name="title">Super Cool</str>
</doc>
<doc>
  <str name="firstLetterTitle">S</str>
  <str name="id">108692</str>
  <str name="title">Super 45</str>
</doc>
etc.

Any ideas? Are S and T special chars in a query for solr?

Here is the response from the s query with debugQuery=true:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">24</int>
    <lst name="params">
      <str name="q">firstLetterTitle:s</str>
      <str name="debugQuery">true</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">firstLetterTitle:s</str>
    <str name="querystring">firstLetterTitle:s</str>
    <str name="parsedquery"/>
    <str name="parsedquery_toString"/>
    <lst name="explain"/>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
      <double name="time">2.0</double>
      <lst name="prepare">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

thanks Joel
Re: weird problem with letters S and T
Thanks Bern, now that you mention it they are in there, I assume if I remove them it will work, but I probably dont want to do that right? Is there a way for this particular query to ignore stopwords thanks Joel On Oct 28, 2009, at 6:20 PM, Bernadette Houghton wrote: Hi Joel, I had a similar issue the other day; in my case the solution turned out to be that the letters were stopwords. Don't know if this is your answer, but worth checking. Bern -Original Message- From: Joel Nylund [mailto:jnyl...@yahoo.com] Sent: Thursday, 29 October 2009 9:17 AM To: solr-user@lucene.apache.org Subject: weird problem with letters S and T (I am super new to solr, sorry if this is an easy one) Hi, I want to support an A-Z type view of my data. I have a DataImportHandler that uses sql (my query is complex, but the part that matters is: SELECT f.id, f.title, LEFT(f.title,1) as firstLetterTitle FROM Foo f I can create this index with no issues. I can query the title with no problem: http://localhost:8983/solr/select?q=title:super I can query the first letters mostly with no problem: http://localhost:8983/solr/select?q=firstLetterTitle:a Returns all the foo's with the first letter a. This actually works with every letter except S and T If I query those, I get no results. The weird thing if I do the title query above with Super I get lots of results, and the xml shoes the firstLetterTitles for those to be S doc str name=firstLetterTitleS/str str name=id84861348/str str name=titleSuper Cool/str /doc − doc str name=firstLetterTitleS/str str name=id108692/str str name=titleSuper 45/str /doc − doc etc. Any ideas, are S and T special chars in query for solr? 
here is the response from the s query with debug = true response − lst name=responseHeader int name=status0/int int name=QTime24/int − lst name=params str name=qfirstLetterTitle:s/str str name=debugQuerytrue/str /lst /lst result name=response numFound=0 start=0/ − lst name=debug str name=rawquerystringfirstLetterTitle:s/str str name=querystringfirstLetterTitle:s/str str name=parsedquery/ str name=parsedquery_toString/ lst name=explain/ str name=QParserOldLuceneQParser/str − lst name=timing double name=time2.0/double − lst name=prepare double name=time1.0/double − lst name=org.apache.solr.handler.component.QueryComponent double name=time1.0/double /lst − lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst − lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst − lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst − lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst − lst name=process double name=time0.0/double − lst name=org.apache.solr.handler.component.QueryComponent double name=time0.0/double /lst − lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst − lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst − lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst − lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst /lst /lst /response thanks Joel
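Bern's stopword explanation fits the debug output above, where parsedquery comes back empty. A toy sketch of the effect follows, assuming a simplified analyzer that lowercases, splits on whitespace and drops stopwords; the stopword set here is a made-up subset, since the thread only confirms that "s" and "t" were in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopwordDemo {
    // Assumed subset of a stopword list; only "s" and "t" are confirmed by the thread.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("s", "t", "the", "and"));

    // Lowercases, splits on whitespace, and removes stopwords,
    // returning the tokens that would remain in the query.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println("a -> " + analyze("a"));
        System.out.println("s -> " + analyze("s"));
        System.out.println("Super Cool -> " + analyze("Super Cool"));
    }
}
```

A query for the single letter s leaves no tokens at all, so nothing can match, while longer titles such as "Super Cool" pass through untouched. A field type without a stopword filter avoids this for the first-letter field only.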
Re: weird problem with letters S and T
Well I tried removing those 2 letters from stopwords, didn't seem to help. I also tried changing the field type to text_ws, didn't seem to work. Any other ideas? thanks Joel

On Oct 28, 2009, at 6:42 PM, Martijn v Groningen wrote:

I think that is not a problem, because you are only storing one character per field. There are other text field types that do not have the stop word filter, so give your first letter field that field type. In this way the stopword filter is only disabled for searches on the first letter field. Cheers, Martijn