Re: Facet filter: how to specify OR expression?
How about fq=docType:(pdf OR txt) - Thanx: Grijesh www.gettinhahead.co.in -- View this message in context: http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930648.html Sent from the Solr - User mailing list archive at Nabble.com.
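Grijesh's suggestion can be sketched as a request URL; a minimal example (the host, core path, and the docType field are placeholders from this thread, not verified against any schema):

```python
from urllib.parse import urlencode

# Build a Solr select URL whose fq restricts results to either docType value.
# The parentheses group the terms so the OR applies within the single field.
params = {
    "q": "*:*",
    "fq": "docType:(pdf OR txt)",
    "facet": "true",
    "facet.field": "docType",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

The filter query is URL-encoded as `fq=docType%3A%28pdf+OR+txt%29`.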
Re: K-Stemmer for Solr 3.1
On 12.05.2011 02:05, Mark wrote: It appears that the older version of the Lucid Works KStemmer is incompatible with Solr 3.1. Has anyone been able to get this to work? If not, what are you using as an alternative? Thanks Lucid KStemmer works nicely with Solr 3.1 after some minor mods to KStemFilter.java and KStemFilterFactory.java. What problems do you have? Bernd
Applying SOLR-236 field collapse patch to Solr 3.1.0
I've been trying to install the field collapse patch to Solr 3.1.0 using the following link: https://issues.apache.org/jira/browse/SOLR-236. However, I'm not entirely sure which patch to download. How do I decide on that? Also, as I understand it, I have to cd into the apache-solr-3.1.0 directory and then execute patch -p1 < /path/to/patch/file. Is that correct?
Re: Indexing Mails
What kind of emails do you want to parse? MS emails? You could integrate Apache Tika, but it depends on what kind of emails the Tika parser is able to parse. You can define the fields that should be parsed and define them in your XML schema. Thanks. On Tue, May 10, 2011 at 2:07 PM, Jörg Agatz joerg.ag...@googlemail.com wrote: Will the E-Mail ID, and the recent E-Mail IDs, be indexed too? And which fields do I have to create in schema.xml? -- Chandan Tamrakar
Re: Is it possible to build Solr as a maven project?
On Tue, May 10, 2011 at 3:56 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: On Tue, May 10, 2011 at 3:50 PM, Steven A Rowe sar...@syr.edu wrote: Hi Gabriele, There are some Maven instructions here (not in Lucene/Solr 3.1, because I just wrote the file a couple of days ago): http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/dev-tools/maven/README.maven My recommendation, since the Solr 3.1 source tarball does not include dev-tools/, is to check out the 3.1-tagged sources from Subversion: svn co http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1 and then follow the instructions in the above-linked README.maven. I did that just now and it worked for me. The results are in solr/package/maven/. I did that and I think it worked for me, but I didn't get Nutch to work with it, so I preferred to revert to what is officially supported (not even, but...). I'll be trying and report back. Everything worked! These are the revisions used: $ svn co -r 1101526 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1 solr $ svn co -r 1101540 http://svn.apache.org/repos/asf/nutch/branches/branch-1.3 nutch Thank you. Please write back if you run into any problems. Steve From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com] Sent: Tuesday, May 10, 2011 8:37 AM To: boutr...@gmail.com Cc: solr-user@lucene.apache.org; Steven A Rowe; ryan...@gmail.com Subject: Re: Is it possible to build Solr as a maven project? Sorry, this was not the target I used (this one should work too, but...). Can we expand on the but...?
$ wget http://apache.panu.it//lucene/solr/3.1.0/apache-solr-3.1.0-src.tgz
$ tar xf apache-solr-3.1.0-src.tgz
$ cd apache-solr-3.1.0
$ ant generate-maven-artifacts

generate-maven-artifacts:
get-maven-poms:
BUILD FAILED
/Users/simpatico/Downloads/apache-solr-3.1.0/build.xml:59: The following error occurred while executing this line:
/Users/simpatico/Downloads/apache-solr-3.1.0/lucene/build.xml:445: The following error occurred while executing this line:
/Users/simpatico/Downloads/apache-solr-3.1.0/build.xml:45: /Users/simpatico/Downloads/apache-solr-3.1.0/dev-tools/maven does not exist.

Now for those that built this, it must have worked at some time. How? Or is this a bug in the release? Looking at the revision history of the build script I might be referring to LUCENE-2490 (https://issues.apache.org/jira/browse/LUCENE-2490), but I'm not sure I understand the solution. I've checked out dev-tools, but even with it things don't work (tried the one with the 3.1.0 release). The one I used is get-maven-poms. That will just create pom files and copy them to their right target locations. I'm using NetBeans and the Automatic Projects plugin to do everything inside the IDE. Which version of Solr are you using? Ludovic.
2011/5/4 Gabriele Kahlout [via Lucene] ml-node+2898211-2124746009-383...@n3.nabble.com:

generate-maven-artifacts:
[mkdir] Created dir: /Users/simpatico/SOLR_HOME/build/maven
[mkdir] Created dir: /Users/simpatico/SOLR_HOME/dist/maven
[copy] Copying 1 file to /Users/simpatico/SOLR_HOME/build/maven/src/maven
[artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
BUILD FAILED
/Users/simpatico/SOLR_HOME/build.xml:800: The following error occurred while executing this line:
/Users/simpatico/SOLR_HOME/common-build.xml:274: artifact:deploy doesn't support the uniqueVersion attribute

build.xml:800 is: <m2-deploy pom.xml="src/maven/solr-parent-pom.xml.template"/>

Removed the uniqueVersion attribute:

generate-maven-artifacts:
[artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
[artifact:deploy] Deploying to file:///Users/simpatico/SOLR_HOME/dist/maven
[artifact:deploy] [INFO] Retrieving previous build number from remote
[artifact:deploy] [INFO] Retrieving previous metadata from remote
[artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.solr:solr-parent'
[artifact:deploy] [INFO] Retrieving previous metadata from remote
[artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.solr:solr-parent:1.4.2-SNAPSHOT'
[copy] Copying 1 file to /Users/simpatico/SOLR_HOME/build/maven/lib
[artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
[artifact:deploy] Deploying to file:///Users/simpatico/SOLR_HOME/dist/maven
[artifact:deploy] [INFO] Retrieving previous build number from remote
[artifact:deploy] [INFO] Retrieving previous metadata from remote
[artifact:deploy] [INFO] Uploading repository metadata for:
Re: Facet filter: how to specify OR expression?
It works. Many thanks.
Re: Facet filter: how to specify OR expression?
I have another facet that is of type integer and it gave an exception. Is it true that the field has to be of type string or text for the OR expression to work?
Re: Facet Count Based on Dates
But Pivot Faceting is a feature of Solr 4.0 and I am using 3.1, as that is a stable build, and I can't use a Nightly Build. The question was: I have a schema which has a field Polarity, which is of type text and can have three values (0, 1 or -1), and CreatedAt, which is of type date. How can I get the count of Polarity based on dates? For example, output saying that on 5/1/2011 there were 10 counts of 0, 10 counts of 1 and 10 counts of -1.

If I use a facet query like this:

http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=Polarity

then I get the count over the complete database:

<lst name="Polarity">
  <int name="0">531477</int>
  <int name="1">530682</int>
</lst>

The query:

http://localhost:8983/solr/select/?q=*:*%20AND%20CreatedAt:[2011-03-10T00:00:00Z%20TO%202011-03-18T23:59:59Z]&facet=true&facet.date=CreatedAt&facet.date.start=2011-03-10T00:00:00Z&facet.date.end=2011-03-18T23:59:59Z&facet.date.gap=%2B1DAY

gives me the count of data per day, like this:

<lst name="CreatedAt">
  <int name="2011-03-10T00:00:00Z">0</int>
  <int name="2011-03-11T00:00:00Z">276262</int>
  <int name="2011-03-12T00:00:00Z">183929</int>
  <int name="2011-03-13T00:00:00Z">196853</int>
  <int name="2011-03-14T00:00:00Z">2967</int>
  <int name="2011-03-15T00:00:00Z">22762</int>
  <int name="2011-03-16T00:00:00Z">11299</int>
  <int name="2011-03-17T00:00:00Z">37433</int>
  <int name="2011-03-18T00:00:00Z">14359</int>
  <str name="gap">+1DAY</str>
  <date name="start">2011-03-10T00:00:00Z</date>
  <date name="end">2011-03-19T00:00:00Z</date>
</lst>

How will I be able to get the Polarity count for each date, like:

2011-03-10T00:00:00Z Polarity 0 = 100, 1 = 500, -1 = 200
2011-03-11T00:00:00Z Polarity 0 = 100, 1 = 500, -1 = 200

and so on till the date range ends?
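One workaround sometimes used on 3.x, short of pivot faceting: issue one date-facet request per Polarity value, restricting each with an fq. A hedged sketch, with the host and field names taken from the thread (the quoting of the values and the date range are illustrative assumptions):

```python
from urllib.parse import urlencode

def polarity_date_facet(polarity):
    # One request per polarity value; the fq restricts the per-day
    # date-facet counts to documents having that polarity.
    params = {
        "q": "*:*",
        "fq": 'Polarity:"%s"' % polarity,
        "facet": "true",
        "facet.date": "CreatedAt",
        "facet.date.start": "2011-03-10T00:00:00Z",
        "facet.date.end": "2011-03-18T23:59:59Z",
        "facet.date.gap": "+1DAY",
    }
    return "http://localhost:8983/solr/select?" + urlencode(params)

urls = [polarity_date_facet(p) for p in ("0", "1", "-1")]
```

Merging the three responses client-side then yields the per-day, per-polarity table asked for above.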
On 10-05-2011 15:51, Grijesh wrote: Have you looked at Pivot Faceting? http://wiki.apache.org/solr/HierarchicalFaceting http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting-1 - Thanx: Grijesh www.gettinhahead.co.in -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582
Re: Facet filter: how to specify OR expression?
No, the OR operator should work for any data type. - Thanx: Grijesh www.gettinhahead.co.in
Re: Facet Count Based on Dates
You can apply the patch for Hierarchical faceting on Solr 3.1. - Thanx: Grijesh www.gettinhahead.co.in
Re: Spatial search - SOLR 3.1
Hello David, It's easy to calculate it myself, but it would be nice if Solr returned the distance in the response. I can sort on distance and calculate the distance with PHP to show it to the users.
Re: Facet filter: how to specify OR expression?
The exception says: java.lang.NumberFormatException: For input string: "or". The field type is: <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
Re: Facet filter: how to specify OR expression?
The input parameter assigned to the field tint is the string "or". It is trying to parse tint=or, which is not a valid integer, so that exception occurred. On Thu, May 12, 2011 at 4:10 PM, cnyee yeec...@gmail.com wrote: The exception says: java.lang.NumberFormatException: For input string: "or". The field type is: <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
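In other words, a lowercase or is parsed as an ordinary term, and a TrieIntField can only parse integer terms; uppercase OR is the boolean operator. A small illustrative sketch (the field name myIntField is hypothetical):

```python
from urllib.parse import urlencode

# Lowercase "or" is treated as a term to match; on an int field it
# fails to parse -> NumberFormatException.
bad = urlencode({"fq": "myIntField:(1 or 2)"})

# Uppercase "OR" is the boolean operator, so only the integers 1 and 2
# are parsed as field values.
good = urlencode({"fq": "myIntField:(1 OR 2)"})
print(good)
```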
Re: Document match with no highlight
Hi, OK, here it is. Please note that I had to type everything; I did double and triple check for typos. I do not use term vectors. I also left out the timing section. Thanks for all the help. P.

URL:
http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0&rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1

XML:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">19</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.fl">DOC_TEXT</str>
      <str name="wt">standard</str>
      <str name="hl.maxAnalyzedChars">-1</str>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="debugQuery">on</str>
      <str name="fl">DOC_TEXT,score</str>
      <str name="start">0</str>
      <str name="q">DOC_TEXT:"3 1 15"</str>
      <str name="qt">standard</str>
      <str name="fq"/>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="0.035959315">
    <doc>
      <float name="score">0.035959315</float>
      <arr name="DOC_TEXT"><str> ... </str></arr>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="123456"/>
  </lst>
  <lst name="debug">
    <str name="rawquerystring">DOC_TEXT:"3 1 15"</str>
    <str name="querystring">DOC_TEXT:"3 1 15"</str>
    <str name="parsedquery">PhraseQuery(DOC_TEXT:"3 1 15")</str>
    <str name="parsedquery_toString">DOC_TEXT:"3 1 15"</str>
    <lst name="explain">
      <str name="123456">
        0.035959315 = fieldWeight(DOC_TEXT:"3 1 15" in 0), product of:
          1.0 = tf(phraseFreq=1.0)
          0.92055845 = idf(DOC_TEXT: 3=1 1=1 15=1)
          0.0390625 = fieldNorm(field=DOC_TEXT, doc=0)
      </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <arr name="filter_queries"><str/></arr>
    <arr name="parsed_filter_queries"/>
    <lst name="timing"> ... </lst>
  </lst>
</response>

On Wed, May 11, 2011 at 1:54 PM, Ahmet Arslan iori...@yahoo.com wrote: I can upload the search URL and part of the output but not all of it. Company trade secrets do not allow me to upload the content of the DOC_TEXT field. I can upload the debug output section and whatever else is needed, but I cannot upload the actual document data.
Please let me know if any of this will help without the actual data. Sure, they will help; seeing the complete list of parameters is what matters. Do you store term vectors?
Spellcheck: Two dictionaries
Hello, I have 2 fields: what and where. For both fields I want some spellchecking. I have two dictionaries in my config:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">ws</str>
  <lst name="spellchecker">
    <str name="name">what</str>
    <str name="field">what</str>
    <str name="spellcheckIndexDir">spellchecker_what</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">where</str>
    <str name="field">where</str>
    <str name="spellcheckIndexDir">spellchecker_where</str>
  </lst>
</searchComponent>

I can search one dictionary with spellcheck.dictionary=what in my URL. How can I get spellchecking for both fields? I see that Solr 3.1 has a spellcheck.DICT_NAME.key parameter. How can I use that in my URL?
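Short of a combined setup, one request per dictionary works; a hedged sketch using the dictionary names from the config above (the host and the misspelled query terms are illustrative):

```python
from urllib.parse import urlencode

def spellcheck_url(dictionary, text):
    # Select a spellchecker by the name given in its
    # <lst name="spellchecker"> block.
    params = {
        "q": text,
        "spellcheck": "true",
        "spellcheck.dictionary": dictionary,
        "spellcheck.q": text,
    }
    return "http://localhost:8983/solr/select?" + urlencode(params)

what_url = spellcheck_url("what", "restaurnt")
where_url = spellcheck_url("where", "amsterdm")
```

Two round trips instead of one, but each field is checked against its own dictionary.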
Re: how to update database record after indexing
Actually, every hour some records are inserted into the database, so every hour Solr indexing will be called with a delta import. Note: the records and data are very large (in GBs), so each time finding all the Solr index entries and updating the database records will be slow. Are there any event listeners, or can snapshooter help me to solve this problem? Thanks, Vishal Parekh
Coord in queryExplain
Hello, I'm wondering why the results of coord() are not displayed when debugging query results, as described in the wiki [1]. I'd like to see it. Could someone point to how to make it appear with the debug fields? [1] http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22 -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: Coord in queryExplain
I'm wondering why the results of coord() are not displayed when debugging query results, as described in the wiki. I'd like to see it. Could someone point to how to make it appear with the debug fields? Coord info is displayed; however, it seems it is not displayed for a value of 1.0. To see coord, issue a multi-word query and advance to the end of the result list via the start param.
Re: Coord in queryExplain
You are right! On Thu, May 12, 2011 at 2:54 PM, Ahmet Arslan iori...@yahoo.com wrote: coord info displayed, however it seems that it is not displayed for value of 1.0. To see coord, issue a multi-word query, and advance to the end of the list via start param. -- Regards, K. Gabriele
Re: Document match with no highlight
URL: http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&... Nothing looks suspicious. Can you provide two more things: the fieldType of DOC_TEXT and the field definition of DOC_TEXT? Also, do you get a snippet from the same doc when you remove the quotes from your query?
Re: Facet Count Based on Dates
Is it possible to use the features of 3.1 by default for my query? On 12-05-2011 13:38, Grijesh wrote: You can apply the patch for Hierarchical faceting on Solr 3.1. - Thanx: Grijesh -- Regards Jasneet Sabharwal
Re: Facet Count Based on Dates
Or is it possible to use a Group By query in Solr 3.1 like we do in SQL? On 12-05-2011 19:37, Jasneet Sabharwal wrote: Is it possible to use the features of 3.1 by default for my query? On 12-05-2011 13:38, Grijesh wrote: You can apply the patch for Hierarchical faceting on Solr 3.1. - Thanx: Grijesh -- Regards Jasneet Sabharwal
Re: K-Stemmer for Solr 3.1
java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z Would you mind explaining your modifications? Thanks On 5/11/11 11:14 PM, Bernd Fehling wrote: Lucid KStemmer works nicely with Solr 3.1 after some minor mods to KStemFilter.java and KStemFilterFactory.java. What problems do you have? Bernd
MoreLikeThis PDF search
Hi all, I've become more and more familiar with the MoreLikeThis handler over the last several months. I'm curious whether it is possible to do a MoreLikeThis search by uploading a PDF? I looked at the ExtractingRequestHandler, and it looks like that is used to process PDF files and the like, but is it possible to combine the two? Just to be clear, I don't want to send a PDF and have it become part of the index. Rather, I'd like to be able to use the PDF as a MoreLikeThis search. Thanks, Brian Lamb
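One possible direction, not verified against 3.1: extract the PDF's text client-side (for example with a standalone Tika run) and post that text to the MoreLikeThis handler as a content stream, so nothing gets indexed. A sketch; the /mlt handler path, the stream.body parameter, and the mlt.fl field name are assumptions to check against your setup:

```python
from urllib.parse import urlencode

# Text already pulled out of the PDF by a client-side extraction step
# (e.g. Tika); placeholder content for illustration.
extracted_text = "text extracted from the uploaded PDF"

# The MoreLikeThis handler can (in some configurations) take the source
# "document" as a posted content stream instead of an indexed document.
params = {
    "mlt.fl": "text",        # field whose statistics drive similarity
    "mlt.mintf": "1",
    "stream.body": extracted_text,
}
url = "http://localhost:8983/solr/mlt?" + urlencode(params)
```

The resulting URL could then be fetched (or the text POSTed) to get documents similar to the PDF without indexing it.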
RE: Document match with no highlight
Don't you need to include your unique id field in your 'fl' parameter? It will be needed anyways so you can match up the highlight fragments with the result docs once highlighting is working... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com Join the conversation - you may even get an iPad or Nook out of it! Like us on Facebook! Follow us on Twitter! -----Original Message----- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, May 12, 2011 7:10 AM To: solr-user@lucene.apache.org Subject: Re: Document match with no highlight URL: http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&... Nothing looks suspicious. Can you provide two things more; fieldType of DOC_TEXT and field definition of DOC_TEXT. Also do you get snippet from the same doc, when you remove quotes from your query?
TrieIntField for short values
Hello, I'm quite a beginner with Solr and have many doubts while trying to learn how everything works. I only have a slight idea of how TrieFields work. The thing is, I have an integer value that will always be in the range 0-1000. A short field would be enough for this, but there is no such TrieShortField (not even a SortableShortField), so I used a TrieIntField. My doubt is, in this case, what would be a suitable value for precisionStep? If the field had only 1000 distinct values, but they were more or less uniformly distributed in the 32-bit int range, probably a big precisionStep would be suitable. But as my values are in the range 0 to 1000, I think (without much knowledge) that a low precisionStep should be more adequate, for example 2. Can anybody please help me find a good configuration for this type? And, if possible, can anybody explain in a brief and intuitive way the differences and tradeoffs of choosing smaller or bigger precisionSteps? Thanks a lot, Juan
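For intuition: Lucene's trie encoding indexes each value several times with its low bits stripped, once per precision level, and precisionStep is the number of bits dropped per level. A simplified toy model of that idea (my own sketch, not Lucene's actual byte-level encoding):

```python
def trie_terms(value, precision_step, bits=32):
    # One indexed "term" per precision level: the value right-shifted so
    # its low bits are dropped. A smaller step means more terms per value
    # in the index, but range queries can be covered with fewer terms.
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]

print(len(trie_terms(777, 8)))  # 4 terms per value at precisionStep=8
print(len(trie_terms(777, 2)))  # 16 terms per value at precisionStep=2
```

With values confined to 0-1000 only the low ~10 bits ever vary, so the high-level terms are identical across documents; a smaller step mainly adds terms at the low levels, which is exactly where it can speed up range queries over such data, at the cost of a larger index.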
Re: Result docs missing only when shards parameter present in query?
Does this seem like it would be a configuration issue, an indexed data issue, or something else? Thanks

mrw wrote: We have two Solr nodes, each with multiple shards. If we query each shard directly (no shards parameter), we get the expected results:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">22</int>
  </lst>
  <result name="response" numFound="100" start="0">
    <doc>
    <doc>

(^^^ hand-typed pseudo XML)

However, if we add the shards parameter and even supply one of the above shards, we get the same number of results, but all the doc elements under the result element are missing:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">33</int>
  </lst>
  <result name="response" numFound="100" start="0">

(^^^ note missing doc elements)

It doesn't matter which shard is specified in the shards parameter; if any or all of the shards are specified after the shards parameter, we see this behavior. When we go to http://server:8983/solr/ on either node, we see all the shards properly listed. So, the shards seem to be registered properly, and work individually, but not when the shards parameter is supplied. Any ideas? Thanks!
Re: Document match with no highlight
Hi, <field name="DOC_TEXT" type="text" indexed="true" stored="true"/> The type "text" is the default one that came with the default Solr 1.4 install, without any modifications. If I remove the quotes I do get snippets. In fact, if I did "3 1 15"~1 I get a snippet also. Hope that helps. P. On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan iori...@yahoo.com wrote: ... Nothing looks suspicious. Can you provide two things more; fieldType of DOC_TEXT and field definition of DOC_TEXT. Also do you get snippet from the same doc, when you remove quotes from your query?
Re: Document match with no highlight
Hi, I use <uniqueKey>DOC_ID</uniqueKey> in schema.xml. I think this is the default unique id that is used for matching. Someone correct me if I am wrong. P. On Thu, May 12, 2011 at 11:01 AM, Bob Sandiford bob.sandif...@sirsidynix.com wrote: Don't you need to include your unique id field in your 'fl' parameter? It will be needed anyways so you can match up the highlight fragments with the result docs once highlighting is working... Bob Sandiford | Lead Software Engineer | SirsiDynix
Changing the schema
If I change the field type in my schema, do I need to rebuild the entire index? I'm at a point now where it takes over a day to do a full import due to the sheer size of my application and I would prefer not having to reindex just because I want to make a change somewhere. Thanks, Brian Lamb
RE: Document match with no highlight
In fact if I did 3 1 15~1 I do get snipet also. Strange, I had a very similar problem, but with overlapping tokens. Since you're using the standard text field, this should be you're case. Maybe you could have a look at this issue, since it sound very familiar to me : https://issues.apache.org/jira/browse/LUCENE-3087 Pierre -Message d'origine- De : Phong Dais [mailto:phong.gd...@gmail.com] Envoyé : jeudi 12 mai 2011 17:26 À : solr-user@lucene.apache.org Objet : Re: Document match with no highlight Hi, field name=DOC_TEXT type=text indexed=true stored=true/ The type text is the default one that came with the default solr 1.4 install w.o any modifications. If I remove the quotes I do get snipets. In fact if I did 3 1 15~1 I do get snipet also. Hope that helps. P. On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan iori...@yahoo.com wrote: URL: http://localhost:8983/solr/select?indent=onversion=2.2q=DOC_TEXT%3A%223+1+15%22fq=start=0 rows=10fl=DOC_TEXT%2Cscoreqt=standardwt=standarddebugQuery=onexplainOther=hl=onhl.fl=DOC_TEXThl.maxAnalyzedChars=-1 XML: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeader int name=status0/int int name=QTime19/int lst name=params str name=explainOther/ str name=indenton/str str name=hl.flDOC_TEXT/str str name=wtstandard/str str name=hl.maxAnalyzedChars-1/str str name=hlon/str str name=rows10/str str name=version2.2/str str name=debugQueryon/str str name=flDOC_TEXT,score/str str name=start0/str str name=qDOC_TEXT:3 1 15/str str name=qtstandard/str str name=fq/ /lst /lst result name=response numFound='1 start=0 maxScore=0.035959315 doc float name=score0.035959315/float arr name=DOC_TEXTstr ... 
/str/arr doc /result lst name=highlighting lst name=123456/ /lst lst name=debug str name=rawquerystringDOC_TEXT:3 1 15/str str name=querystringDOC_TEXT:3 1 15/str str name=parsedqueryPhraseQuery(DOC_TEXT:3 1 15)/str str name=parsedquery_toStringDOC_TEXT:3 1 15/str lst name=explain str name=123456 0.035959315 = fieldWeight(DOC_TEXT:3 1 15 in 0), product of: 1.0 = tf(phraseFreq=1.0) 0.92055845 = idf(DOC_TEXT: 3=1 1=1 15=1) 0.0390625 = fieldNorm(field=DOC_TEXT, doc=0) /str /lst str name=QParserLuceneQParser/str arr name=filter_queries str/ /arr arr name=parsed_filter_queries/ lst name=timing ... /lst /response Nothing looks suspicious. Can you provide two more things: the fieldType of DOC_TEXT and the field definition of DOC_TEXT? Also, do you get a snippet from the same doc when you remove the quotes from your query?
RE: Document match with no highlight
Since you're using the standard text field, this should NOT be your case. Sorry for the missing NOT in the previous sentence. You should have the same issue given what you said, but still, it sounds very similar. Are you sure your fieldtype text has nothing special? A tokenizer or filter that could add some tokens to your indexed text but not to your query, like, for example, a WordDelimiter present at index time and not at query time? Pierre -----Original Message----- From: Pierre GOSSE [mailto:pierre.go...@arisem.com] Sent: Thursday, May 12, 2011 18:21 To: solr-user@lucene.apache.org Subject: RE: Document match with no highlight [earlier quoted messages snipped; see the previous message in this thread]
Support for huge data set?
Hi, I have about 300 million docs (or 10TB of data), which is doubling every 3 years, give or take. The data mostly consists of Oracle records, webpage files (HTML/XML, etc.) and office doc files. There are between two and four dozen concurrent users, typically. The indexing server has 27 GB of RAM, but it still gets extremely taxed, and this will only get worse. Would Solr be able to efficiently deal with a load of this size? I am trying to avoid the heavy cost of GSA, etc... Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Support for huge data set?
I have the same questions. But from your message, I couldn't tell. Are you using Solr now? Or some other indexing server? Darren On Thu, 2011-05-12 at 09:59 -0700, atreyu wrote: Hi, I have about 300 million docs (or 10TB data) which is doubling every 3 years, give or take. The data mostly consists of Oracle records, webpage files (HTML/XML, etc.) and office doc files. There are b/t two and four dozen concurrent users, typically. The indexing server has 27 GB of RAM, but it still gets extremely taxed, and this will only get worse. Would Solr be able to efficiently deal with a load of this size? I am trying to avoid the heavy cost of GSA, etc... Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Support for huge data set?
Oh, my fault. No, I am not using Solr yet - just evaluating it. The current implementation is a combination of Sphinx and Oracle Text, but I have not been involved with any of the integration - I'm more of an outside analyst looking in, but will probably be involved in the integration of any new methods, particularly Open Source ones. -- View this message in context: http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932704.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Support for huge data set?
Ok, thanks. Yeah, I'm in the same boat and want to know what others have done with document numbers that large. I know there is SolrCloud, which can federate numerous Solr instances and query across them, so I suspect any solution with hundreds of millions of docs would require a federation. If anyone has done this, some best practices would be great to know! On Thu, 2011-05-12 at 10:10 -0700, atreyu wrote: Oh, my fault. No, I am not using Solr yet - just evaluating it. The current implementation is a combination of Sphinx and Oracle Text, but I have not been involved with any of the integration - I'm more of an outside analyst looking in, but will probably be involved in the integration of any new methods, particularly Open Source ones. -- View this message in context: http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932704.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Document match with no highlight
Hi, I read the link provided and I'll need some time to digest what it is saying. Here's my text fieldtype. fieldtype name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer fieldtype Also, I figured out what value in DOC_TEXT causes this issue to occur. With a DOC_TEXT of (without the quotes): 0176 R3 1.5 TO Searching for 3 1 15 returns a match with an empty highlight. Searching for 3 1 15~1 returns a match with a highlight. Can anyone see anything that I'm missing? Thanks, P. On Thu, May 12, 2011 at 12:27 PM, Pierre GOSSE pierre.go...@arisem.com wrote: Since you're using the standard text field, this should NOT be your case. Sorry for the missing NOT in the previous sentence. You should have the same issue given what you said, but still, it sounds very similar. Are you sure your fieldtype text has nothing special? A tokenizer or filter that could add some tokens to your indexed text but not to your query, like, for example, a WordDelimiter present at index time and not at query time? 
Pierre -----Original Message----- From: Pierre GOSSE [mailto:pierre.go...@arisem.com] Sent: Thursday, May 12, 2011 18:21 To: solr-user@lucene.apache.org Subject: RE: Document match with no highlight [earlier quoted messages snipped; see the previous messages in this thread]
What is correct use of HTMLStripCharFilter in Solr 3.1
Hi, I recently upgraded from Solr 1.3 to Solr 3.1 in order to take advantage of the HTMLStripCharFilter. But it isn't working as I expected. I have a text field that may contain HTML tags. However, I would like to store it in Solr without the HTML tags, and retrieve the text field for display and for highlighting without HTML tags. I added charFilter class=solr.HTMLStripCharFilterFactory/ to the top of fieldType name=text class=solr.TextField positionIncrementGap=100 autoGeneratePhraseQueries=true in the schema.xml file of the solr example, both in analyzer type=index and in analyzer type=query. And the text field is simply: field name=text type=text indexed=true stored=true/ Now, when I do a search, the text field still has all the HTML tags in it and the highlighting is totally screwed up, with em tags around virtually every word. What am I doing wrong? Kind regards, Nick -- View this message in context: http://lucene.472066.n3.nabble.com/What-is-correct-use-of-HTMLStripCharFilter-in-Solr-3-1-tp2933021p2933021.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Anyone familiar with Solandra or Lucendra?
I modified the subject to include Lucendra, in case anyone has heard of it by that name. -- View this message in context: http://lucene.472066.n3.nabble.com/Anyone-familiar-with-Solandra-or-Lucendra-tp2927357p2933051.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is correct use of HTMLStripCharFilter in Solr 3.1
I recently upgraded from Solr 1.3 to Solr 3.1 in order to take advantage of the HTMLStripCharFilter. But it isn't working as I expected. I have a text field that may contain HTML tags. However, I would like to store it in Solr without the HTML tags, and retrieve the text field for display and for highlighting without HTML tags. I added charFilter class=solr.HTMLStripCharFilterFactory/ to the top of fieldType name=text class=solr.TextField positionIncrementGap=100 autoGeneratePhraseQueries=true in the schema.xml file of the solr example, both in analyzer type=index and in analyzer type=query. And the text field is simply: field name=text type=text indexed=true stored=true/ Now, when I do a search, the text field still has all the HTML tags in it and the highlighting is totally screwed up, with em tags around virtually every word. What am I doing wrong? You need to strip HTML tags before the analysis phase. If you are using DIH, you can use the stripHTML=true transformer.
Re: Anyone familiar with Solandra or Lucandra?
The old name is Lucandra, not Lucendra. I've changed the subject accordingly. I'm looking forward to responses from people, but I'm afraid it has not gotten much uptake yet. I think it has enormous potential once it's hardened a bit and there's more documentation. Personally, I've been looking forward to kicking the tires a bit once I get some time. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ On May 12, 2011, at 2:54 PM, kenf_nc wrote: I modified the subject to include Lucendra, in case anyone has heard of it by that name. -- View this message in context: http://lucene.472066.n3.nabble.com/Anyone-familiar-with-Solandra-or-Lucendra-tp2927357p2933051.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is correct use of HTMLStripCharFilter in Solr 3.1
On 5/12/2011 2:55 PM, Ahmet Arslan wrote: I recently upgraded from Solr 1.3 to Solr 3.1 in order to take advantage of the HTMLStripCharFilter. But it isn't working as I expected. You need to strip html tag before analysis phase. If you are using DIH, you can use stripHTML=true transformer. Wait, then what's the HTMLStripCharFilter for?
Re: Support for huge data set?
If each document is VERY small, it's actually possible that one Solr server could handle it -- especially if you DON'T try to do faceting or other similar features, but stick to straight search and relevancy. There are other factors too. But # of documents is probably less important than total size of index, or number of unique terms -- of course # of documents often correlates to those too. But if each document is largeish... yeah, I suspect that'll be too much for any one Solr server. You'll have to use some kind of distribution. Out of the box, Solr has a Distributed Search function meant for this use case. http://wiki.apache.org/solr/DistributedSearch . Some Solr features don't work under a Distributed setup, but the basic ones are there. There are some other add-ons not (yet anyway) part of the Solr distro that try to solve this in even more sophisticated ways too, like SolrCloud. I don't personally know of anyone indexing that many documents, although it is probably done. But I do know of the HathiTrust project (not me personally) indexing fewer documents but still adding up to terabytes of total index (millions to tens of millions of documents, but each one is a digitized book that could be 100-400 pages), using the Distributed Searching feature, successfully, although it required some care and maintenance; it wasn't just a "turn it on and it works" situation. http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-50-volumes-5-million-volumes-and-beyond http://www.hathitrust.org/technical_reports/Large-Scale-Search.pdf On 5/12/2011 1:06 PM, Darren Govoni wrote: I have the same questions. But from your message, I couldn't tell. Are you using Solr now? Or some other indexing server? Darren On Thu, 2011-05-12 at 09:59 -0700, atreyu wrote: Hi, I have about 300 million docs (or 10TB data) which is doubling every 3 years, give or take. The data mostly consists of Oracle records, webpage files (HTML/XML, etc.) and office doc files. 
There are b/t two and four dozen concurrent users, typically. The indexing server has 27 GB of RAM, but it still gets extremely taxed, and this will only get worse. Would Solr be able to efficiently deal with a load of this size? I am trying to avoid the heavy cost of GSA, etc... Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is correct use of HTMLStripCharFilter in Solr 3.1
It preserves the location of the terms in the original HTML document so that you can highlight terms in HTML. This makes it possible (for instance) to display the entire document, with all the search terms highlighted, or (with some careful surgery) to display formatted HTML (bold, italic, etc) in your search results. -Mike On 05/12/2011 03:42 PM, Jonathan Rochkind wrote: On 5/12/2011 2:55 PM, Ahmet Arslan wrote: I recently upgraded from Solr 1.3 to Solr 3.1 in order to take advantage of the HTMLStripCharFilter. But it isn't working as I expected. You need to strip html tag before analysis phase. If you are using DIH, you can use stripHTML=true transformer. Wait, then what's the HTMLStripCharFilter for?
Re: What is correct use of HTMLStripCharFilter in Solr 3.1
Wait, then what's the HTMLStripCharFilter for? To remove HTML tags in the analysis phase. For instance, it can be used to display original HTML documents with search terms highlighted.
Re: Support for huge data set?
Thanks for the detailed response, Jonathan. I will look into the links and check out SolrCloud and Distributed Search. Load-sharing between 2 or 3 servers should not pose a problem, so long as it is robust (or at least not slower), fault-tolerant, and reliable. -- View this message in context: http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2933367.html Sent from the Solr - User mailing list archive at Nabble.com.
field type=string vs field type=text
What is the difference between setting a field's type to string vs. setting it to text? e.g. field name=PATH type=string indexed=false stored=true/ or field name=PATH type=text indexed=false stored=true/ -- View this message in context: http://lucene.472066.n3.nabble.com/field-type-string-vs-field-type-text-tp2932083p2932083.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: field type=string vs field type=text
On Thu, May 12, 2011 at 8:23 PM, chetan gupta kache...@gmail.com wrote: What is the difference between setting a field's type to string vs. setting it to text? e.g. field name=PATH type=string indexed=false stored=true/ or field name=PATH type=text indexed=false stored=true/ [...] Please take a closer look at the fieldType definitions towards the beginning of the default schema.xml. The text type has tokenizers and analyzers applied to it, while the string type does no processing of the input data. Regards, Gora
A couple newbie questions
Hello! I just started using Solr. My general use case is pushing a lot of data from Hbase to solr via an M/R job using Solrj. I have lots of questions, but the ones I'd like to start with are: (1) I noticed this: http://lucene.472066.n3.nabble.com/what-happens-to-docsPending-if-stop-solr-before-commit-td2781493.html Would seem to indicate that pending documents are committed on restart. This is great! I also noticed that, while there is a lag on start up if I have documents pending - it's only a few minutes or so. But if I issue a commit for the same number of files, the server stays blocked for 20 min or so. It almost seems like it would be faster to add all my documents and restart the server, rather than issuing a commit. Am I doing something strange? Is this a valid conclusion? (2) I'm also getting a lot of errors about invalid UTF-8: SEVERE: org.apache.solr.common.SolrException: Invalid UTF-8 character 0x at char #2380289, byte #2378666) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) It could be that the values I have in some of my document fields are indeed invalid. My question is what does this mean when I'm submitting a batch of documents (specifically I'm using Solrj's StreamingUpdateSolrServer w/ a BinaryRequestWriter) - do I: - lose the whole batch that has the bad document? - lose the document? - lose the one field? I wish it was the third, hope it's the second, and I'm afraid it's the first... Ooo.. 
and I guess a third question - I'm having trouble finding a document that describes the overall design/functionality of Solr, something that would help me reason about stuff like what happens to pending documents when the server restarts, or whether a commit in one indexing thread commits previously added documents from another indexing thread. Both of those I've answered to my satisfaction by looking over the Solr logs and mailing lists, but I'm wondering if there's some documentation I missed somehow.. For example, something like this: http://hadoop.apache.org/common/docs/current/hdfs_design.html http://hbase.apache.org/book.html#architecture Thanks! Take care, -stu
Re: field type=string vs field type=text
Hi, my recommendation: To quickly understand the difference between those two different field types, index one document using string and text fields, then facet on those fields and you will see how the terms were indexed. Using one field type or the other will depend on what you want to do with that field. On Thu, May 12, 2011 at 5:18 PM, Gora Mohanty g...@mimirtech.com wrote: On Thu, May 12, 2011 at 8:23 PM, chetan guptakache...@gmail.com wrote: What is the difference between setting a fields type to string vs setting it to text. e.g. field name=PATH type=string indexed=false stored=true/ or field name=PATH type=text indexed=false stored=true/ [...] Please take a closer look at the fieldType definitions towards the beginning of the default schema.xml. The text type has tokenizers, and analyzers applied to it, while the string type does no processing of the input data. Regards, Gora
Re: Replication Clarification Please
Thank you Mr. Bell and Mr. Kanarsky, as per your advice we have moved from 1.4.1 to 3.1 and have made several changes to the configuration. The configuration changes have worked nicely till now and the replication is finishing within the interval and not backing up. The changes we made are as follows:
1. Increased the mergeFactor from 10 to 15
2. Increased ramBufferSizeMB to 1024
3. Changed lockType to single (previously it was simple)
4. Set maxCommitsToKeep to 1 in the deletionPolicy
5. Set maxPendingDeletes to 0
6. Changed caches from LRUCache to FastLRUCache, as we had hit ratios well over 75%, to increase warming speed
7. Increased the poll interval to 6 minutes and re-indexed all content.
Thanks, Ravi Kiran Bhaskar On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky alexan...@trulia.com wrote: Ravi, if you have what looks like a full replication each time even if the master generation is greater than slave, try to watch the index on both master and slave at the same time to see what files are getting replicated. You probably may need to adjust your merge factor, as Bill mentioned. -Alexander On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote: Hello Mr. Kanarsky, Thank you very much for the detailed explanation, probably the best explanation I found regarding replication. Just to be sure, I wanted to test solr 3.1 to see if it alleviates the problems...I don't think it helped. The master index version and generation are greater than the slave, still the slave replicates the entire index from the master (see replication admin screen output below). Any idea why it would get the whole index every time even in 3.1, or am I misinterpreting the output? However I must admit that 3.1 finished the replication, unlike 1.4.1 which would hang and be backed up for ever. 
Master http://masterurl:post/solr-admin/searchcore/replication Latest Index Version:null, Generation: null Replicatable Index Version:1296217097572, Generation: 12726 Poll Interval 00:03:00 Local Index Index Version: 1296217097569, Generation: 12725 Location: /data/solr/core/search-data/index Size: 944.32 MB Times Replicated Since Startup: 148 Previous Replication Done At: Tue May 10 12:32:42 EDT 2011 Config Files Replicated At: null Config Files Replicated: null Times Config Files Replicated Since Startup: null Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011 Current Replication Status Start Time: Tue May 10 12:32:41 EDT 2011 Files Downloaded: 18 / 108 Downloaded: 317.48 KB / 436.24 MB [0.0%] Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%] Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s Thanks, Ravi Kiran Bhaskar On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky alexan...@trulia.com wrote: Ravi, as far as I remember, this is how the replication logic works (see SnapPuller class, fetchLatestIndex method): 1. Does the Slave get the whole index every time during replication or just the delta since the last replication happened ? It look at the index version AND the index generation. If both slave's version and generation are the same as on master, nothing gets replicated. if the master's generation is greater than on slave, the slave fetches the delta files only (even if the partial merge was done on the master) and put the new files from master to the same index folder on slave (either index or index.timestamp, see further explanation). However, if the master's index generation is equals or less than one on slave, the slave does the full replication by fetching all files of the master's index and place them into a separate folder on slave (index.timestamp). Then, if the fetch is successfull, the slave updates (or creates) the index.properties file and puts there the name of the current index folder. 
The old index.timestamp folder(s) will be kept in 1.4.x - which was treated as a bug - see SOLR-2156 (and this was fixed in 3.1). After this, the slave does a commit or reloads the core depending on whether the config files were replicated. There is another bug in 1.4.x that fails replication if the slave needs to do the full replication AND the config files were changed - also fixed in 3.1 (see SOLR-1983). 2. If there are huge number of queries being done on slave will it affect the replication ? How can I improve the performance ? (see the replication details at the bottom of the page) From my experience, half of the replication time is spent flushing the transferred data to disk, so the IO impact is important. 3. Will the segment names be same be same on master and slave after replication ? I see that they are different. Is this correct ? If it is correct how does the slave know what to fetch the next time i.e. the delta. They should be the same. The
DIH help request: nested xml entities and xpath
Apologies in advance if this topic/question has been previously answered…I have scoured the docs, mail archives, and web looking for an answer(s) with no luck. I am sure I am just being dense or missing something obvious…please point out my stupidity, as my head hurts trying to get this working. Solr 3.1 Java 1.6 Eclipse/Tomcat 7/Maven 2.x Goal: to extract manufacturer names from a repeating list of keywords each denoted by a Category, one of which is Manufacturer, and load them into a MsgKeywordMF field (see xml below) I have xml files I am loading via DIH. This is an abbreviated example of the xml data (each file has repeating Report items, each report has repeating MsgSet, Msg, MsgList, etc items). Notice the nested repeating groups, namely MsgItems, within each document (Report): Report ReportMeta ReportDate02/22/2011/ReportDate … /ReportMeta MsgSet Msg SourceDocIDhttp://someurl.com/path/to/doc/SourceDocID … DocumentTextblah blah/DocumentText MsgList MsgItem MsgTypeSomeType/MsgType CategoryLocation/Category KeywordUSA/Keyword /MsgItem MsgItem MsgTypeAnotherType/MsgType CategoryManufacturer/Category KeywordApple/Keyword /MsgItem … /MsgList /Msg /MsgSet /Report Report … /Report Report … /Report … Here is my data-config.xml: dataConfig dataSource type=FileDataSource encoding=UTF-8 / document entity name=fileload rootEntity=false processor=FileListEntityProcessor fileName=^.*\.xml$ recursive=false baseDir=/files/xml/ entity name=report rootEntity=true pk=id url=${fileload.fileAbsolutePath} processor=XPathEntityProcessor forEach=/Report/MsgSet/Msg onError=skip transformer=DateFormatTransformer,RegexTransformer field column=DocumentText xpath=/Report/MsgSet/Msg/DocumentText/ field column=id xpath=/Report/MsgSet/Msg/SourceDocID/ field column=MsgCategory xpath=/Report/MsgSet/Msg/MsgList/MsgItem/Category / field column=MsgKeyword xpath=/Report/MsgSet/Msg/MsgList/MsgItem/Keyword / field column=MsgKeywordMF xpath=/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword / … 
/entity /entity /document /dataConfig As seen in my config and sample data above, I am extracting the repeating Keywords into the MsgKeyword field. Also, and the part that does NOT work, I am trying to extract into a separate field just the keywords that have a Category of Manufacturer -- field column=MsgKeywordMF xpath=/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword / I have also tried: field column=MsgKeywordMF xpath=/Report/MsgSet/Msg/MsgList/MsgItem[@Category='Manufacturer']/Keyword / …after changing the Category to an attribute of MsgItem (MsgItem Category=Location) but it too fails to match. I have tested my xpath notation against my xml data file using various xpath evaluator tools, like within Eclipse, and it matches perfectly…but I can't get it to match/work during import. As far as I understand it, DIH does not support nested/correlated entities, at least not with XML data sources using nested entity tags. I've tried without success to nest entities, but I can't correlate the nested entity with the parent. I think the way I'm trying should work, but no luck so far…. BTW, I can't easily change the xml format, although it is possible with some pain… Any ideas? TIA, -- Eric
solr velocity.log setting
hi all, I'm new to Solr, and trying to install it on Tomcat. However, an exception was raised when the page http://localhost/solr/browse was visited: *FileNotFoundException: velocity.log (Permission denied)* It looks like Solr is trying to create a velocity.log file in the Tomcat root. So, how should I configure Solr to change the location that velocity.log is written to? Thank you. Y
Re: DIH help request: nested xml entities and xpath
Hi All, I am a Java/J2EE programmer and very new to Solr. I would like to index a table in a PostgreSQL database into Solr, then search the records from a GUI (a JSP page) and show the results in tabular form. Could anyone help me out with some simple sample code? Thank you. Regards, Ashique

On Fri, May 13, 2011 at 4:53 AM, Weiss, Eric wei...@llnl.gov wrote:
> [snip]
Faceting question
Is there any way to perform a search across 2 fields yet only get facet counts for documents matching 1 field? For example, if I have fields A and B and I perform a search, I would like to match my query across either of these two fields, but I would then like facet counts only for how many documents matched in field A. Can this be accomplished? If not out of the box, what classes should I look into to create this myself? Thanks
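[Editor's sketch, not from the thread: since facet counts are computed over the whole result set, one out-of-the-box approach is to search both fields but add a facet.query restricted to field A. Field names and the query term below are placeholders:]

```
q=fieldA:widget OR fieldB:widget
facet=true
facet.query=fieldA:widget
```

The facet_queries section of the response then reports how many of the matched documents also match fieldA:widget.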
Fieldcollapsing patch not applied properly
Hi Kai, as per your previous mails you have already applied the patches to Solr 1.4. I followed the steps in your mail accordingly, but during step 9 I got the error "1 out of 1 hunk FAILED". When I apply only SOLR-236-1_4_1-paging-totals-working.patch it builds successfully, but the changes are not reflected in the Solr source. Kindly tell me where I am going wrong. The steps were:

1. Downloaded [solr]
2. Downloaded [SOLR-236-1_4_1-paging-totals-working.patch]
3. Changed line 2837 of that patch to `@@ -0,0 +1,511 @@`
4. Downloaded [SOLR-236-1_4_1-NPEfix.patch]
5. Extracted the Solr archive
6. Applied both patches:
7. `cd apache-solr-1.4.1`
8. `patch -p0 < ../SOLR-236-1_4_1-paging-totals-working.patch`
9. `patch -p0 < ../SOLR-236-1_4_1-NPEfix.patch`
10. Built Solr:
11. `ant clean`
12. `ant example` ... tells me BUILD SUCCESSFUL

Thanks in advance!
Isha Garg
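[Editor's sketch, not from the thread: `patch` needs its input on stdin (`<`) or via `-i`, and `--dry-run` (GNU patch) is handy for diagnosing "hunk FAILED" errors before touching the tree. A self-contained demo of the workflow with throwaway files, not the real SOLR-236 patches:]

```shell
# Build a tiny source tree plus a modified copy of one file.
mkdir -p patch-demo/demo-src
printf 'line one\nline two\n' > patch-demo/demo-src/hello.txt
printf 'line one\nline 2\n'  > patch-demo/hello.txt.new
cd patch-demo

# Generate a unified diff (diff exits 1 when files differ, hence || true).
diff -u demo-src/hello.txt hello.txt.new > demo.patch || true
rm hello.txt.new

# Rehearse first: --dry-run reports any FAILED hunks without writing files.
patch -p0 --dry-run < demo.patch

# Actually apply, then confirm the change landed.
patch -p0 < demo.patch
grep 'line 2' demo-src/hello.txt
cd ..
```

A hunk fails when the context lines in the patch no longer match the target file, so a dry run against a freshly extracted tree quickly shows whether the patch or the strip level (`-p0` vs `-p1`) is at fault.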