Re: How to reserve ids?
Otis, I'm following up on this, as solving my problem through the stopwords mechanism would be great. *Do stopwords also apply to the url/id field?*

Continuing with the msn.com example: with msn.com as a stopword, the msn.com webpage may still be indexed if neither the title nor the body contains msn.com. Is that right?

P.S. I just click 'reply to all' (or 'reply' on the phone). If it bothers you, I'll make the less lazy effort of selecting 'reply'.

On Tue, Sep 27, 2011 at 6:40 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Gabriele,

Using msn.com as a stopword would simply mean that msn.com would not be indexed, and therefore a search for msn.com would not yield results. You could still search for hotmail and it may match documents that have the msn.com token stored in them, even though msn.com is a stopword.

Otis

P.S. No need to CC me, I'm on the list.

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

From: Gabriele Kahlout gabri...@mysimpatico.com
To: solr-user@lucene.apache.org; Otis Gospodnetic otis_gospodne...@yahoo.com
Sent: Tuesday, September 27, 2011 1:58 AM
Subject: Re: How to reserve ids?

I'm interested in the stopwords solution as it sounds like less work, but I'm not sure I understand how it works. Having msn.com as a stopword doesn't mean I won't get msn.com as a result for, say, 'hotmail'. My understanding is that msn.com will never make it to the similarity function and thus never affect the score calculation. But the url seldom does anyway (in my searches on content)!

-- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this).
If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
How to reserve ids?
Hello,

While indexing, there are certain urls/ids I never want to appear in the search results (and so never want indexed). Is there already a 'supported by design' mechanism you can point me to, or should I just create this blacklist as a processor in the update chain?
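The blacklist idea from the question above can be sketched in plain Java. This is only an illustration of the check a custom processor in the update chain might perform before a document is added; `ReservedIds` and `isReserved` are hypothetical names and are not part of Solr, and exact string matching (after trimming) is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not a Solr class: holds the ids that must never be indexed.
final class ReservedIds {
    private final Set<String> reserved = new HashSet<>();

    ReservedIds(String... ids) {
        for (String id : ids) {
            reserved.add(id.trim());
        }
    }

    // Assumption: ids are matched exactly (case-sensitively) after trimming.
    boolean isReserved(String id) {
        return id != null && reserved.contains(id.trim());
    }

    public static void main(String[] args) {
        ReservedIds blacklist = new ReservedIds("http://msn.com/");
        // A processor would skip the document when this returns true.
        System.out.println(blacklist.isReserved("http://msn.com/"));
        System.out.println(blacklist.isReserved("http://hotmail.com/"));
    }
}
```

Unlike a stopword, which only removes a token from the indexed text, a check like this would keep the whole document out of the index.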
Re: How to reserve ids?
I'm interested in the stopwords solution as it sounds like less work, but I'm not sure I understand how it works. Having msn.com as a stopword doesn't mean I won't get msn.com as a result for, say, 'hotmail'. My understanding is that msn.com will never make it to the similarity function and thus never affect the score calculation. But the url seldom does anyway (in my searches on content)!
Re: How to make the url id case insensitive?
On Mon, Sep 5, 2011 at 1:22 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi,

URI paths are case-sensitive. If you really want to treat all URLs as case-insensitive, I would suggest modifying the basic URL normalizer to lowercase all URLs so that they also end up lowercased in the CrawlDB.

What is your problem? I would strongly suggest another solution if you're doing wide web crawls.

I don't want duplicate results where the only real difference is the case of some letters in the URL. What other solution?
How to make the url id case insensitive?
Hi,

I've just noticed that two search results of indexed data have the same url, differing only in case:

http://www.atory.com/dupe_checker_pro/
http://www.atory.com/dupe_checker_PRO/

I thought the url/id was case-insensitively unique. Is there a way I can set it up to be so? For Solr it makes sense not to make this the default, given its disparate uses, but for Nutch it would.
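The lowercasing approach suggested in this thread can be sketched as follows. This is a minimal stand-alone illustration, not Nutch's URL normalizer: `CaseInsensitiveUrlKey` and `toKey` are hypothetical names, and lowercasing the entire URL (path included) is an assumption that will merge pages on servers that really do distinguish case.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: derives a case-insensitive key to use as the url/id.
final class CaseInsensitiveUrlKey {
    // Assumption: scheme, host, path and query are all lowercased.
    static String toKey(String url) {
        return url.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Set<String> keys = new LinkedHashSet<>();
        keys.add(toKey("http://www.atory.com/dupe_checker_pro/"));
        keys.add(toKey("http://www.atory.com/dupe_checker_PRO/"));
        // Both URLs from the thread collapse to a single key.
        System.out.println(keys);
    }
}
```

Applying the same function both when crawling and when indexing would keep the id consistent across the CrawlDB and the Solr index.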
Re: How to get all the terms in a document as Luke does?
The Term Vector Component (TVC) is a SearchComponent designed to return information about documents that is stored when setting the termVector attribute on a field:

Will I have to re-index after adding that to the schema?

On Tue, Aug 30, 2011 at 11:06 PM, Jayendra Patil jayendra.patil@gmail.com wrote:

You might want to check http://wiki.apache.org/solr/TermVectorComponent
It should provide you with the term vectors along with a lot of additional info.

Regards,
Jayendra

On Tue, Aug 30, 2011 at 3:34 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

Hello,

This time I'm trying to duplicate Luke's functionality of knowing which terms occur in a search result/document (without parsing it again). Is there any Solrj API to do that?
How to get all the terms in a document as Luke does?
Hello,

This time I'm trying to duplicate Luke's functionality of knowing which terms occur in a search result/document (without parsing it again). Is there any Solrj API to do that?

P.S. I've also posted the question on SO: http://stackoverflow.com/q/7219111/300248

On Wed, Jul 6, 2011 at 11:09 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

From your patch I see TermFreqVector, which provides the information I want.

I also found FieldInvertState.getLength(), which seems to be exactly what I want. I'm after the word count (the sum of tf for every term in the doc). I'm just not sure whether FieldInvertState.getLength() returns only the number of distinct terms (not multiplied by the frequency of each term) or the word count. It seems to return the word count, but I've not tested it sufficiently.

On Wed, Jul 6, 2011 at 1:39 AM, Trey Grainger the.apache.t...@gmail.com wrote:

Gabriele,

I created a patch that does this about a year ago. See https://issues.apache.org/jira/browse/SOLR-1837. It was written for Solr 1.4 and is based upon the Document Reconstructor in Luke. The patch adds a link on the main Solr admin page to a docinspector page, which will reconstruct the document given a uniqueid (required). Keep in mind that you're only looking at what's in the index for non-stored fields, not the original text.

If you have any issues using this on the most recent release, let me know and I'd be happy to create a new patch for Solr 3.3. One of these days I'll remove the JSP dependency and this may eventually make it into trunk.

Thanks,

-Trey Grainger
Search Technology Development Team Lead, Careerbuilder.com
Site Architect, Celiaccess.com

On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

Hello,

With an inverted index the term is the key, and the documents are the values. Is it still possible, however, given a document id, to get the terms indexed for that document?
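The distinction raised in this thread, number of distinct terms versus word count (the sum of tf over every term), can be made concrete with a small stand-alone sketch. It does not use Lucene; `TermStats` is a hypothetical class and whitespace tokenization is an assumption standing in for whatever analyzer the field uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: per-document term statistics computed from raw text.
final class TermStats {
    // Term -> frequency, a stand-in for what a term vector holds for one document.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Number of distinct terms in the document.
    static int distinctTerms(Map<String, Integer> tf) {
        return tf.size();
    }

    // Word count: the sum of tf over every term in the document.
    static int wordCount(Map<String, Integer> tf) {
        int sum = 0;
        for (int frequency : tf.values()) {
            sum += frequency;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = termFrequencies("blah blah blah");
        System.out.println(distinctTerms(tf)); // 1
        System.out.println(wordCount(tf));     // 3
    }
}
```

Given real term vectors, the same sum over the returned frequencies would yield the word count without re-parsing the document.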
Re: Why are not query keywords treated as a set?
Part of the query is 'injected' by my application, which is unaware of the user query. If I knew that 'paste past' would end up together as the query 'past past', I would not inject anything, as it distorts the score calculation. I could inject after it, but that is not easy.

So, trying to solve it right in the RequestHandler, I have difficulties with queries that contain phrases ("") or the 'must be present' + operator. For example, I'd not want to touch a user query such as:

+"zusammen essen" +"alein essen"

where 'essen' is the duplicate term. My 'good enough solution' is thus to not remove the duplicate in clauses prefixed by + or ".

C := set of clauses in which the duplicated term t occurs
for each clause c in C:
    if (!c.toString().startsWith("\"") && !c.toString().startsWith("+") && |C| > 1) {
        C.remove(c);
    }
end

What do you think? Are there better solutions or algorithms to make sure the same term occurs only once in a query, or at least is weighted only once in the score calculation?

On Mon, Jun 20, 2011 at 11:15 AM, Markus Jelsma markus.jel...@openindex.io wrote:

That only removes tokens at the same position, as the wiki explains.

Gabriele, why would you expect that? You input two tokens, so you query for two tokens. Why would it be a `set`?

This might help in your analysis chain:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory

On 20 June 2011 04:21, Gabriele Kahlout gabri...@mysimpatico.com wrote:

<str name="rawquerystring">past past</str>
<str name="querystring">past past</str>
<str name="parsedquery">content:past content:past</str>

I was expecting the query to get parsed into content:past only and not content:past content:past.

On Mon, Jun 20, 2011 at 12:12 AM, lee carroll lee.a.carr...@googlemail.com wrote:

Do you mean a phrase query, "past past"? Can you give some more detail?

On 18 June 2011 13:02, Gabriele Kahlout gabri...@mysimpatico.com wrote:

q=past past

1.0 = (MATCH) sum of:
  0.5 = (MATCH) fieldWeight(content:past in 0), product of:
    1.0 = tf(termFreq(content:past)=1)
    1.0 = idf(docFreq=1, maxDocs=2)
    0.5 = fieldNorm(field=content, doc=0)
  0.5 = (MATCH) fieldWeight(content:past in 0), product of:
    1.0 = tf(termFreq(content:past)=1)
    1.0 = idf(docFreq=1, maxDocs=2)
    0.5 = fieldNorm(field=content, doc=0)

Is there a way I can treat the query keywords as a set?
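The clause-pruning rule proposed in this thread can be sketched on plain strings. This is only an illustration under stated assumptions: the query is already split into clauses, a clause is "protected" when it starts with + or a double quote, and only unprotected single-term clauses are dropped while another clause still carries the same term. `QueryClauseDeduper` is a hypothetical name; a real implementation inside a RequestHandler would work on parsed query objects instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: removes duplicate terms from unprotected query clauses.
final class QueryClauseDeduper {
    // Distinct terms of a clause, ignoring the + operator and phrase quotes.
    static Set<String> terms(String clause) {
        String bare = clause.replace("+", " ").replace("\"", " ").trim();
        Set<String> out = new LinkedHashSet<>();
        for (String term : bare.split("\\s+")) {
            if (!term.isEmpty()) {
                out.add(term);
            }
        }
        return out;
    }

    // Clauses prefixed by + or " are never touched.
    static boolean isProtected(String clause) {
        return clause.startsWith("+") || clause.startsWith("\"");
    }

    static List<String> dedupe(List<String> clauses) {
        // For each term, the number of clauses that still contain it (|C| in the thread).
        Map<String, Integer> clausesPerTerm = new HashMap<>();
        for (String clause : clauses) {
            for (String term : terms(clause)) {
                clausesPerTerm.merge(term, 1, Integer::sum);
            }
        }
        List<String> kept = new ArrayList<>();
        for (String clause : clauses) {
            Set<String> ts = terms(clause);
            String only = ts.size() == 1 ? ts.iterator().next() : null;
            if (!isProtected(clause) && only != null && clausesPerTerm.get(only) > 1) {
                clausesPerTerm.merge(only, -1, Integer::sum); // drop this clause
            } else {
                kept.add(clause);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(dedupe(Arrays.asList("past", "past"))); // [past]
        System.out.println(dedupe(Arrays.asList("+\"zusammen essen\"", "+\"alein essen\"")));
    }
}
```

With this rule 'past past' collapses to a single clause, while the +"zusammen essen" +"alein essen" query is left untouched, which matches the behaviour asked for above.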
Re: How to add TrieIntField to a SolrInputDocument?
This works:

doc.remove("wc");
SolrInputField wcField = new SolrInputField("wc");
wcField.setValue(150, 1.0f);
doc.put("wc", wcField);

On Wed, Jul 13, 2011 at 4:19 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

SolrInputDocument doc = new SolrInputDocument();
doc.setField("id", 0);
doc.setField("url", getURL(0));
doc.setField("content", "blah blah blah");
doc.setField("wc", 150); // wc is of the solr.TrieIntField field type in schema.xml
assertU(adoc(doc));
assertU(commit());
assertNumFound(1);

The above test fails until I change the following in schema.xml:

- <fieldType name="int" class="solr.TrieIntField" omitNorms="true"/>
+ <fieldType name="int" class="solr.IntField" omitNorms="true"/>

On Sun, Jul 10, 2011 at 10:36 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

This was my problem:

<fieldType name="int" class="solr.TrieIntField" omitNorms="true"/>

I had taken my cue from Nutch's schema:

<fieldType name="long" class="solr.LongField" omitNorms="true"/>

On Sat, Jul 9, 2011 at 4:55 PM, Yonik Seeley yo...@lucidimagination.com wrote:

Something is wrong with your indexing. Is wc an indexed field? If not, change it so it is, then re-index your data.

If so, I'd recommend starting with the example data and filtering for something like popularity:[6 TO 10] to convince yourself it works, then figuring out what you did differently in your schema/data.

-Yonik
http://www.lucidimagination.com

On Sat, Jul 9, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

http://localhost:8080/solr/select?indent=on&version=2.2&q=*%3A*&fq=wc%3A%5B255+TO+257%5D&start=0&rows=10&fl=*%2Cscore&qt=&wt=xml&explainOther=&hl.fl=

The toString of the request:

{explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&qt=&wt=xml&fq=wc:[255+TO+257]&rows=1&version=2.2}

Even when the FilterQuery is constructed in Java it doesn't work (I get results that ignore the filter query completely).

On Sat, Jul 9, 2011 at 3:40 PM, Ahmet Arslan iori...@yahoo.com wrote:

I don't get it to work! If I specify no fq I get the first result with <int name="wc">256</int>. With wc:[255 TO 257] (fq=wc%3A%5B255+TO+257%5D) nothing comes out.

If you give us the full URL you are using, it can be helpful. The correct syntax is fq=wc:[255 TO 257]. You can use more than one fq in a request.
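For reference, the encoded form fq=wc%3A%5B255+TO+257%5D quoted in this thread is simply the URL-encoded version of wc:[255 TO 257]. A minimal sketch using only the JDK shows the mapping; `FilterQueryEncoding` and `selectUrl` are hypothetical names, and the base URL is the one from the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a select URL with properly encoded q and fq parameters.
final class FilterQueryEncoding {
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    static String selectUrl(String base, String q, String fq) {
        return base + "/select?q=" + encode(q) + "&fq=" + encode(fq);
    }

    public static void main(String[] args) {
        System.out.println(encode("wc:[255 TO 257]")); // wc%3A%5B255+TO+257%5D
        System.out.println(selectUrl("http://localhost:8080/solr", "*:*", "wc:[255 TO 257]"));
    }
}
```

Since the encoding in the thread was already correct, this only confirms that the request itself was well-formed and that the problem lay in the schema's field type, as reported above.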
Why cannot I open a read-only IndexReader from TestHarness.getIndexDir() ?
IndexReader getReader() throws CorruptIndexException, IOException {
    return IndexReader.open(FSDirectory.open(new File(h.getCore().getIndexDir())), true);
}

org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.NIOFSDirectory@/private/var/folders/54/54wUdohaH8eR-mvbJL0l2k+++TI/-Tmp-/solrtest-SolrTestCaseJ4-1310631397578/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@62d337d3: files: []
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:694)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:288)
    at com.mysimpatico.me.indexplugins.SolrTest.getReader(SolrTest.java:43)

I'm calling it right after an assertU(commit()) and an assertQ(req("*:*"), getNumFoundXPath(1)), which asserts that a document has been indexed.
Re: Why cannot I open a read-only IndexReader from TestHarness.getIndexDir() ?
I don't know about the path; TestHarness chose it (it seems to be a temporary directory). Does this work for you?

assertU(adoc("id", "0", "url", getURL(docUID), "content", "blah blah blah"));
assertU(commit());
assertNumFound(1); // this is a helper method of mine
IndexReader.open(FSDirectory.open(new File(h.getCore().getIndexDir())), true); // for me it fails here

But since the document was added, I suspect this is a bug.

On Thu, Jul 14, 2011 at 10:48 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

On Thu, Jul 14, 2011 at 1:56 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

IndexReader getReader() throws CorruptIndexException, IOException {
    return IndexReader.open(FSDirectory.open(new File(h.getCore().getIndexDir())), true);
}

org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.NIOFSDirectory@/private/var/folders/54/54wUdohaH8eR-mvbJL0l2k+++TI/-Tmp-/solrtest-SolrTestCaseJ4-1310631397578/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@62d337d3: files: []
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:694)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:288)
    at com.mysimpatico.me.indexplugins.SolrTest.getReader(SolrTest.java:43)

I'm calling it right after an assertU(commit()) and an assertQ(req("*:*"), getNumFoundXPath(1)), which asserts that a document has been indexed.

I'm not sure, but the error indicates that the index does not exist. Perhaps the path is wrong?

-- Regards, Shalin Shekhar Mangar.
Re: Can I still search documents once updated?
It indeed is not stored, but this is still unexpected behavior. It's a stored and indexed field, why has the index data been lost?

On Wed, Jul 13, 2011 at 12:44 AM, Erick Erickson erickerick...@gmail.com wrote:

Unless you stored your content field, the value you put in there won't be fetched from the index. Verify that the doc you retrieve from the index has values for content; I bet it doesn't.

Best
Erick

On Tue, Jul 12, 2011 at 9:38 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

@Test
public void testUpdateLoseTermsSimplified() throws Exception {
    IndexWriter writer = indexDoc();
    assertEquals(1, writer.numDocs());
    IndexSearcher searcher = getSearcher(writer);
    final TermQuery termQuery = new TermQuery(new Term("content", "essen"));
    TopDocs docs = searcher.search(termQuery, 1);
    assertEquals(1, docs.totalHits);
    Document doc = searcher.doc(0);
    writer.updateDocument(new Term("id", doc.get("id")), doc);
    searcher = getSearcher(writer);
    docs = searcher.search(termQuery, 1);
    assertEquals(1, docs.totalHits); // docs.totalHits == 0 !
}

testUpdateLosesTerms(com.mysimpatico.me.indexplugins.WcTest) Time elapsed: 0.346 sec FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
    at org.junit.Assert.fail(Assert.java:91)
    at org.junit.Assert.failNotEquals(Assert.java:645)
    at org.junit.Assert.assertEquals(Assert.java:126)
    at org.junit.Assert.assertEquals(Assert.java:470)
    at org.junit.Assert.assertEquals(Assert.java:454)
    at com.mysimpatico.me.indexplugins.WcTest.testUpdateLosesTerms(WcTest.java:271)

I have not changed anything (as you can see) during the update. I just retrieve a document and then update it. But then the termQuery that worked before doesn't work anymore (while the id field wasn't changed). Is this to be expected when the content field is not stored?
Re: Can I still search documents once updated?
On Wed, Jul 13, 2011 at 1:57 PM, Erick Erickson erickerick...@gmail.com wrote:

Wait, you directly contradicted yourself <G>. You say it's not stored, then you say it's stored and indexed; which is it?

Yes, I meant indexed and not stored.

When you fetch a document, only stored fields are returned, and the returned data is a verbatim copy of the original data. No attempt is made to return un-stored fields. This has always been the behavior. If you attempted to return indexed but not stored data, you'd get stemmed versions, stop words would be removed, synonyms would be in place, etc. Not to mention it would be very slow.

This is what I was expecting. Otherwise updating a field of a document that has an unstored but indexed field is impossible without losing the unstored but indexed field. (I call this updating a field of a document AND deleting/updating all its unstored but indexed fields.) If the field is stored, then there's another problem: you might want to dump the document after reading it from the IR.

Best
Erick

On Wed, Jul 13, 2011 at 2:25 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

It indeed is not stored, but this is still unexpected behavior. It's a stored and indexed field, why has the index data been lost?

On Wed, Jul 13, 2011 at 12:44 AM, Erick Erickson erickerick...@gmail.com wrote:

Unless you stored your content field, the value you put in there won't be fetched from the index. Verify that the doc you retrieve from the index has values for content; I bet it doesn't.

Best
Erick

On Tue, Jul 12, 2011 at 9:38 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

    @Test
    public void testUpdateLoseTermsSimplified() throws Exception {
        IndexWriter writer = indexDoc();
        assertEquals(1, writer.numDocs());
        IndexSearcher searcher = getSearcher(writer);
        final TermQuery termQuery = new TermQuery(new Term(content, "essen"));
        TopDocs docs = searcher.search(termQuery, 1);
        assertEquals(1, docs.totalHits);
        Document doc = searcher.doc(0);
        writer.updateDocument(new Term(id, doc.get(id)), doc);
        searcher = getSearcher(writer);
        docs = searcher.search(termQuery, 1);
        assertEquals(1, docs.totalHits); // docs.totalHits == 0 !
    }

testUpdateLosesTerms(com.mysimpatico.me.indexplugins.WcTest) Time elapsed: 0.346 sec FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
    at org.junit.Assert.fail(Assert.java:91)
    at org.junit.Assert.failNotEquals(Assert.java:645)
    at org.junit.Assert.assertEquals(Assert.java:126)
    at org.junit.Assert.assertEquals(Assert.java:470)
    at org.junit.Assert.assertEquals(Assert.java:454)
    at com.mysimpatico.me.indexplugins.WcTest.testUpdateLosesTerms(WcTest.java:271)

I have not changed anything (as you can see) during the update. I just retrieve a document and then update it. But then the termQuery that worked before doesn't work anymore (while the id field wasn't changed). Is this to be expected when the content field is not stored?

--
Regards,
K. Gabriele
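The behaviour discussed in this thread can be reproduced with a toy model in plain Java. This is only a sketch under stated assumptions: ToyIndex and all of its methods are made up for illustration and are not Lucene or Solr APIs, tokenization is a crude whitespace split, and "update" is modelled as delete-then-add. It shows that a document rebuilt from its stored fields alone no longer matches terms that were only indexed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model, NOT Lucene: an indexed view ("field:term" -> doc ids)
// and a stored view (doc id -> verbatim field values).
class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    // Index every field; keep a verbatim copy only for fields named in storedFields.
    void add(int docId, Map<String, String> fields, Set<String> storedFields) {
        Map<String, String> kept = new HashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            for (String term : e.getValue().toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(e.getKey() + ":" + term, k -> new HashSet<>()).add(docId);
            }
            if (storedFields.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        stored.put(docId, kept);
    }

    // Fetching a document returns its stored fields only.
    Map<String, String> fetch(int docId) {
        return new HashMap<>(stored.get(docId));
    }

    // Update is modelled as delete followed by add.
    void update(int docId, Map<String, String> fields, Set<String> storedFields) {
        for (Set<Integer> ids : postings.values()) {
            ids.remove(docId);
        }
        stored.remove(docId);
        add(docId, fields, storedFields);
    }

    int hits(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.<Integer>emptySet()).size();
    }

    // Returns {hits before update, hits after update} for content:essen.
    static int[] demo() {
        ToyIndex index = new ToyIndex();
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "0");
        doc.put("content", "essen");
        Set<String> storedFields = Collections.singleton("id"); // content is indexed, not stored
        index.add(0, doc, storedFields);
        int before = index.hits("content", "essen");
        Map<String, String> fetched = index.fetch(0); // holds only "id"
        index.update(0, fetched, storedFields);       // re-add what was fetched
        int after = index.hits("content", "essen");
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("hits before update: " + r[0] + ", after: " + r[1]);
    }
}
```

In this model the only way to keep the content terms across an update is to supply the original content again, either from a stored copy or from the original source.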
Re: Can I still search documents once updated?
Well, I'm not sure how usual this scenario would be:

1. In general, those using Solr with Nutch don't store the content field, to avoid storing the whole web/intranet in their index twice (once in the form of stored data, and once in the form of indexed data). Now every time they need to update a field unrelated to content (the number of inbound links, for example) they would have to re-crawl the page. This is at least not intuitive.

On Wed, Jul 13, 2011 at 2:40 PM, Michael Kuhlmann s...@kuli.org wrote:

On 13.07.2011 at 14:05, Gabriele Kahlout wrote:

This is what I was expecting. Otherwise updating a field of a document that has an unstored but indexed field is impossible without losing the unstored but indexed field. (I call this updating a field of a document AND deleting/updating all its unstored but indexed fields.)

Not necessarily. The usual use case is that you have some kind of existing data source from which you fill your Solr index. When you want to update a field of a document, you simply re-index from that source. There's no need to fetch data from Solr first.

Otherwise, if you really don't have such an existing data source because a horde of typewriting monkeys filled your Solr index, then you had better declare all your fields as stored. Otherwise you'll never have a chance to get that data back.

Greetings,
Kuli

--
Regards,
K. Gabriele
Re: Can I still search documents once updated?
On Wed, Jul 13, 2011 at 3:54 PM, Michael Kuhlmann s...@kuli.org wrote:

On 13.07.2011 at 15:37, Gabriele Kahlout wrote:

Well, I'm not sure how usual this scenario would be: 1. In general, those using Solr with Nutch don't store the content field, to avoid storing the whole web/intranet in their index twice (once in the form of stored data, and once in the form of indexed data).

Not exactly. The indexed form is quite different from the stored form; only the tokens are stored, each token only once, plus some additional data like the document count and, maybe, shingle information etc. Hence, indexed data usually needs much less space on disk than the original data.

I realized that. Maybe I should have said 1.X (1 in the form of stored data and 0.X in the form of indexed data).

There's no practical alternative to storing the content in a stored field. What would you otherwise display as a search result? "The following web pages have your search term somewhere in their contents, don't know where, take a look on your own"?

Display the title and URL (and implicitly say "The following web pages have your search term somewhere in their contents, don't REMEMBER where, take a look on your own"). Solr is already configured by default not to store more than a maxFieldLength anyway. Usually one stores content only to display snippets.

Greetings,
Kuli

--
Regards,
K. Gabriele
How to add TrieIntField to a SolrInputDocument?
    SolrInputDocument doc = new SolrInputDocument();
    doc.setField(id, 0);
    doc.setField(url, getURL(0));
    doc.setField(content, "blah blah blah");
    doc.setField(wc, 150); // wc is of solr.TrieIntField field type in schema.xml
    assertU(adoc(doc));
    assertU(commit());
    assertNumFound(1);

The above test fails until I change the following in schema.xml:

    - <fieldType name="int" class="solr.TrieIntField" omitNorms="true"/>
    + <fieldType name="int" class="solr.IntField" omitNorms="true"/>

On Sun, Jul 10, 2011 at 10:36 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

This was my problem:

    <fieldType name="int" class="solr.TrieIntField" omitNorms="true"/>

I had taken my cue from Nutch's schema:

    <fieldType name="long" class="solr.LongField" omitNorms="true"/>

On Sat, Jul 9, 2011 at 4:55 PM, Yonik Seeley yo...@lucidimagination.com wrote:

Something is wrong with your indexing. Is wc an indexed field? If not, change it so it is, then re-index your data. If so, I'd recommend starting with the example data and filtering for something like popularity:[6 TO 10] to convince yourself it works, then figuring out what you did differently in your schema/data.

-Yonik
http://www.lucidimagination.com

On Sat, Jul 9, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

http://localhost:8080/solr/select?indent=on&version=2.2&q=*%3A*&fq=wc%3A%5B255+TO+257%5D&start=0&rows=10&fl=*%2Cscore&qt=&wt=xml&explainOther=&hl.fl=

The toString of the request:

    {explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&qt=&wt=xml&fq=wc:[255+TO+257]&rows=1&version=2.2}

Even when the FilterQuery is constructed in Java it doesn't work (I get results that ignore the filter query completely).

On Sat, Jul 9, 2011 at 3:40 PM, Ahmet Arslan iori...@yahoo.com wrote:

I don't get it to work! If I specify no fq I get the first result with <int name="wc">256</int>. With wc:[255 TO 257] (fq=wc%3A%5B255+TO+257%5D) nothing comes out.

If you give us the full URL you are using, it can be helpful. Correct syntax is fq=wc:[255 TO 257]. You can use more than one fq in a request.

--
Regards,
K. Gabriele
Re: How to create a solr core if no solr cores were created before?
If you need the core just for testing, then use the Solr test framework as in the link.

On Tue, Jul 12, 2011 at 10:29 AM, Mark Schoy hei...@gmx.de wrote:

Thanks for your answer, but your answer is a little bit useless for me. Could you please add more information in addition to this link? Do I have to create a root core to create other cores? How can I create a root core? Manually adding it in the solr.xml config?

Should all be answered here. See http://wiki.apache.org/solr/SolrTomcat. For multiple cores use solr.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <solr persistent="true" sharedLib="lib">
      <cores adminPath="/admin/cores" defaultCoreName="live" shareSchema="true">
        <core name="live" instanceDir="." dataDir="live" />
        <core name="test" instanceDir="." dataDir="test" />
      </cores>
    </solr>

2011/7/11 Gabriele Kahlout gabri...@mysimpatico.com:

Have a look here [1].

[1] https://issues.apache.org/jira/browse/SOLR-2645?focusedCommentId=13062748&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13062748

On Mon, Jul 11, 2011 at 4:46 PM, Mark Schoy hei...@gmx.de wrote:

Hi, I tried to create a solr core but I always get a "No such solr core:" exception.

    File home = new File( pathToSolrHome );
    File f = new File( home, "solr.xml" );
    CoreContainer coreContainer = new CoreContainer();
    coreContainer.load( pathToSolrHome, f );
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    CoreAdminRequest.createCore(coreName, coreDir, server);

I think the problem is the "" in new EmbeddedSolrServer(coreContainer, "");

Thanks.

--
Regards,
K. Gabriele
How to get doc # to use in reader.norms(content)[doc]?
Hello,

I'm trying to get the norm of an indexed document for a given field, but besides reader.norms(fieldName) I'm not finding any API to retrieve it. reader.norms(..) returns an array with the norms for that field of all indexed documents. How do I know the index of my document in there?

    TermQuery.explain() {
        ...
        byte[] fieldNorms = reader.norms(field);
        float fieldNorm = fieldNorms != null ? similarity.decodeNormValue(fieldNorms[doc]) : 1.0f;
        fieldNormExpl.setValue(fieldNorm);
        ...

In here doc is:

    DocSlice docs = (DocSlice) values.get("response");
    for (DocIterator it = docs.iterator(); it.hasNext();) {
        final int docId = it.nextDoc();

But what about when I don't have a SolrQueryResponse?

--
Regards,
K. Gabriele
How do I specify a different analyzer at search-time?
With a Lucene QueryParser instance it's possible to set the analyzer in use. I suspect Solr doesn't use the same analyzer it used at indexing, defined in schema.xml, but I cannot verify that without the QueryParser instance. From Jan's diagram it seems this is set in the SearchHandler's init. Is it? How?

On Sun, Apr 10, 2011 at 11:05 AM, Jan Høydahl jan@cominvent.com wrote:

Looks really good, but two bits that I think might confuse people are the implications that a Query Parser then invokes a series of search components; and that analysis (and the pieces of an analyzer chain) are what do lookups in the underlying Lucene index. The first might just be the ambiguity of Query .. using the term request parser might make more sense, in comparison to the update parsing from the other side of the diagram.

Thanks for commenting. Yea, the purpose is more to show a conceptual rather than actual relation between the different components, focusing on the flow. A 100% technically correct diagram would be too complex for beginners to comprehend, although it could certainly be useful for developers. I've removed the arrow between QueryParser and search components to clarify. The boxes first and foremost show that query parsing and response writers are within the realm of the search request handler.

The analysis piece is a little harder to fix cleanly. You really want the end of the analysis chain to feed back up to the search components, and then show it (most of the search components really) talking to the Lucene index.

Yea, I know. Showing how Faceting communicates with the main index and the spellchecker with its spellchecker index could also be useful, but I think that would be for another, more detailed diagram. I felt it was more important for beginners to realize visually that analysis happens both at index and search time, and that the analyzers align 1:1. At this stage in the diagram I often explain the importance of matching up the analysis on both sides to get a match in the index.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

--
Regards,
K. Gabriele
Re: How to create a solr core if no solr cores were created before?
Have a look here [1].

[1] https://issues.apache.org/jira/browse/SOLR-2645?focusedCommentId=13062748&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13062748

On Mon, Jul 11, 2011 at 4:46 PM, Mark Schoy hei...@gmx.de wrote:

Hi, I tried to create a solr core but I always get a "No such solr core:" exception.

    File home = new File( pathToSolrHome );
    File f = new File( home, "solr.xml" );
    CoreContainer coreContainer = new CoreContainer();
    coreContainer.load( pathToSolrHome, f );
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    CoreAdminRequest.createCore(coreName, coreDir, server);

I think the problem is the "" in new EmbeddedSolrServer(coreContainer, "");

Thanks.

--
Regards,
K. Gabriele
Can I write to the index from within RequestHandler.handleRequestBody(..)?
Hello,

    IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File(req.getCore().getDataDir(), "index")),
        req.getSchema().getAnalyzer(),
        IndexWriter.MaxFieldLength.LIMITED);
    updateSolrIndex(writer);

But this is what I get (I know that RequestHandlers are not intended to write updates).

HTTP Status 500 - null
java.nio.channels.OverlappingFileLockException
    at sun.nio.ch.FileChannelImpl$SharedFileLockTable.checkList(FileChannelImpl.java:1166)
    at sun.nio.ch.FileChannelImpl$SharedFileLockTable.add(FileChannelImpl.java:1068)
    at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:868)
    at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
    at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:216)
    at org.apache.lucene.store.Lock.obtain(Lock.java:72)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:955)
    at com.mysimpatico.me.indexplugins.MRequestHandler.handleRequestBody(MRequestHandler.java:97)
    at ...

--
Regards,
K. Gabriele
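The stack trace above ends in FileChannel.tryLock, which suggests (an assumption, not confirmed in this message) that the JVM already held a lock on the same index lock file when the second IndexWriter was constructed. The JDK behaviour itself can be shown without Lucene. In the sketch below, DoubleLockDemo and the temporary file are hypothetical stand-ins; only java.nio calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DoubleLockDemo {
    // Returns true if a second tryLock on the same file, from the same JVM,
    // fails with OverlappingFileLockException.
    static boolean secondLockOverlaps() {
        try {
            Path lockFile = Files.createTempFile("write", ".lock"); // stand-in for an index lock file
            try (FileChannel first = FileChannel.open(lockFile, StandardOpenOption.WRITE);
                 FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
                FileLock held = first.tryLock(); // the first "writer" takes the lock
                if (held == null) {
                    throw new IOException("first lock unavailable (held by another process?)");
                }
                try {
                    second.tryLock(); // a second "writer" in the same JVM
                    return false;
                } catch (OverlappingFileLockException e) {
                    return true;
                } finally {
                    held.release();
                }
            } finally {
                Files.deleteIfExists(lockFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second lock overlaps: " + secondLockOverlaps());
    }
}
```

File locks are held on behalf of the whole JVM, so opening a second channel does not get around the first lock.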
Re: Can I write to the index from within RequestHandler.handleRequestBody(..)?
On Sun, Jul 10, 2011 at 6:21 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

There are such RequestHandlers. Look at CSVRequestHandler, for example.

    IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File(req.getCore().getDataDir(), "index")),
        req.getSchema().getAnalyzer(),
        IndexWriter.MaxFieldLength.LIMITED);
    updateSolrIndex(writer);

Don't use your own writer for the same index. Use UpdateRequestProcessor.processAdd() instead.

What you seem to be suggesting is:

    UpdateRequestProcessorChain processorChain = req.getCore().getUpdateProcessingChain(WcUpdate);
    UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);
    try {
        RequestHandlerUtils.handleCommit(processor, params, false);
        RequestHandlerUtils.handleRollback(processor, params, false);
    } finally {
        processor.finish();
    }

But this is not what I want. I want an IndexWriter instance from which I can get a reader, the analyzer in use, and the similarity class. If I shouldn't create a new IndexWriter for the same index, can I reuse the current one directly, without the UpdateRequestProcessor interface?

koji
--
http://www.rondhuit.com/en/

--
Regards,
K. Gabriele
Re: What's the fq= syntax for NumericRangeFilter?
This was my problem:

    <fieldType name="int" class="solr.TrieIntField" omitNorms="true"/>

I had taken my cue from Nutch's schema:

    <fieldType name="long" class="solr.LongField" omitNorms="true"/>

On Sat, Jul 9, 2011 at 4:55 PM, Yonik Seeley yo...@lucidimagination.com wrote:

Something is wrong with your indexing. Is wc an indexed field? If not, change it so it is, then re-index your data. If so, I'd recommend starting with the example data and filtering for something like popularity:[6 TO 10] to convince yourself it works, then figuring out what you did differently in your schema/data.

-Yonik
http://www.lucidimagination.com

On Sat, Jul 9, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

http://localhost:8080/solr/select?indent=on&version=2.2&q=*%3A*&fq=wc%3A%5B255+TO+257%5D&start=0&rows=10&fl=*%2Cscore&qt=&wt=xml&explainOther=&hl.fl=

The toString of the request:

    {explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&qt=&wt=xml&fq=wc:[255+TO+257]&rows=1&version=2.2}

Even when the FilterQuery is constructed in Java it doesn't work (I get results that ignore the filter query completely).

On Sat, Jul 9, 2011 at 3:40 PM, Ahmet Arslan iori...@yahoo.com wrote:

I don't get it to work! If I specify no fq I get the first result with <int name="wc">256</int>. With wc:[255 TO 257] (fq=wc%3A%5B255+TO+257%5D) nothing comes out.

If you give us the full URL you are using, it can be helpful. Correct syntax is fq=wc:[255 TO 257]. You can use more than one fq in a request.

--
Regards,
K. Gabriele
Can I delete the stored value?
I've stored the contents of some pages that I no longer need. How can I now delete the stored content without re-crawling the pages (i.e. using updateDocument)? I cannot just remove the field, since I still want the field to be indexed; I just don't want to store anything with it. My understanding is that field.setValue() won't do, since that should affect the indexed value as well.

--
Regards,
K. Gabriele
What's the fq= syntax for NumericRangeFilter?
I'm trying to filter a query by the value of a numeric field. I can do it in Java as follows, but I don't know how to do it with the query syntax, and I found no documentation of it.

    @Test
    public void testFqWc() throws Exception {
        IndexSearcher searcher = wc();
        Filter wc3 = NumericRangeFilter.newIntRange(wc, 3, 3, true, true);
        final MatchAllDocsQuery allQ = new MatchAllDocsQuery();
        TopDocs allDocs = searcher.search(allQ, 10);
        assertEquals(1, allDocs.totalHits);
        int wc = Integer.parseInt(searcher.doc(allDocs.scoreDocs[0].doc).get(this.wc));
        assertEquals(3, wc);
        TopDocs docs = searcher.search(allQ, wc3, 10);
        assertEquals(allDocs.totalHits, docs.totalHits);
    }

On Sun, Jun 19, 2011 at 12:43 PM, Ahmet Arslan iori...@yahoo.com wrote:

Besides creating an index with just the site in question, is it possible, like with Google, to search for results only in a given domain?

If you have an appropriate field that is indexed, yes. fq=site:foo.com

http://wiki.apache.org/solr/CommonQueryParameters#fq

--
Regards,
K. Gabriele
Re: What's the fq= syntax for NumericRangeFilter?
I don't get it to work! If I specify no fq I get the first result with <int name="wc">256</int>. With wc:[255 TO 257] (fq=wc%3A%5B255+TO+257%5D) nothing comes out.

On Sat, Jul 9, 2011 at 12:29 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Huh? It's described in the link Ahmet gave you.

I'm trying to filter a query by the value of a numeric field. I can do it in Java as follows, but I don't know how to do it with the query syntax, and I found no documentation of it.

    @Test
    public void testFqWc() throws Exception {
        IndexSearcher searcher = wc();
        Filter wc3 = NumericRangeFilter.newIntRange(wc, 3, 3, true, true);
        final MatchAllDocsQuery allQ = new MatchAllDocsQuery();
        TopDocs allDocs = searcher.search(allQ, 10);
        assertEquals(1, allDocs.totalHits);
        int wc = Integer.parseInt(searcher.doc(allDocs.scoreDocs[0].doc).get(this.wc));
        assertEquals(3, wc);
        TopDocs docs = searcher.search(allQ, wc3, 10);
        assertEquals(allDocs.totalHits, docs.totalHits);
    }

On Sun, Jun 19, 2011 at 12:43 PM, Ahmet Arslan iori...@yahoo.com wrote:

Besides creating an index with just the site in question, is it possible, like with Google, to search for results only in a given domain?

If you have an appropriate field that is indexed, yes. fq=site:foo.com

http://wiki.apache.org/solr/CommonQueryParameters#fq

--
Regards,
K. Gabriele
Re: What's the fq= syntax for NumericRangeFilter?
http://localhost:8080/solr/select?indent=on&version=2.2&q=*%3A*&fq=wc%3A%5B255+TO+257%5D&start=0&rows=10&fl=*%2Cscore&qt=&wt=xml&explainOther=&hl.fl=

The toString of the request:

    {explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&qt=&wt=xml&fq=wc:[255+TO+257]&rows=1&version=2.2}

Even when the FilterQuery is constructed in Java it doesn't work (I get results that ignore the filter query completely).

On Sat, Jul 9, 2011 at 3:40 PM, Ahmet Arslan iori...@yahoo.com wrote:

I don't get it to work! If I specify no fq I get the first result with <int name="wc">256</int>. With wc:[255 TO 257] (fq=wc%3A%5B255+TO+257%5D) nothing comes out.

If you give us the full URL you are using, it can be helpful. Correct syntax is fq=wc:[255 TO 257]. You can use more than one fq in a request.

--
Regards,
K. Gabriele
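For reference, the relation between the readable filter fq=wc:[255 TO 257] and the encoded form in the URL can be checked with the JDK alone. A minimal sketch: FqEncodeDemo and selectUrl are hypothetical helper names, and the base URL is simply the one quoted in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FqEncodeDemo {
    // Form-encode a single parameter value (space becomes '+', ':' becomes %3A, and so on).
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Build a select URL with one main query and one filter query.
    static String selectUrl(String base, String q, String fq) {
        return base + "/select?q=" + encode(q) + "&fq=" + encode(fq);
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8080/solr", "*:*", "wc:[255 TO 257]"));
    }
}
```

Each parameter is encoded separately and the parameters are joined with '&', so the encoded filter above is well formed and the empty result has to come from somewhere other than the fq syntax.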
Re: How do I add a custom field?
So, how about this:

    Document doc = searcher.doc(i);  // get the doc
    doc.removeField(wc);             // remove the field in case it's there
    addWc(doc, docLength);           // add the new field
    writer.updateDocument(new Term(id, Integer.toString(i++)), doc); // update the doc

For some reason it doesn't get added to the index. Should it?

On 7/3/11, Michael Sokolov soko...@ifactory.com wrote:

You'll need to index the field. I would think you would want to index/store the field along with the associated document, in which case you'll have to reindex the documents as well. There's no single-field update capability in Lucene (yet?).

-Mike

On 7/3/2011 1:09 PM, Gabriele Kahlout wrote:

Is there a way I can compute and add the field to all indexed documents without re-indexing? MyField counts the number of terms per document (unique word count).

On Sun, Jul 3, 2011 at 12:24 PM, lee carroll lee.a.carr...@googlemail.com wrote:

Hi Gabriele,

Did you index any docs with your new field? The results will just bring back docs and what fields they have. They won't bring back null fields just because they are in your schema. Lucene is schema-less. Solr adds the schema to make it nice to administer and very powerful to use.

On 3 July 2011 11:01, Gabriele Kahlout gabri...@mysimpatico.com wrote:

Hello,

I want to have an additional field that appears for every document in search results. I understand that I should do this by adding the field to the schema.xml, so I add:

    <field name="myField" default="0" type="integer" stored="true" indexed="false"/>

Then I restart Solr (so that it loads the new schema.xml) and make a query specifying that it should return myField too, but it doesn't. Will it do so only for newly indexed documents? Am I missing something?

--
Regards,
K. Gabriele
Re: Can I invert the inverted index?
From your patch I see TermFreqVector, which provides the information I want. I also found FieldInvertState.getLength(), which seems to be exactly what I'm after: the word count (the sum of tf over every term in the doc). I'm just not sure whether FieldInvertState.getLength() returns the word count or only the number of distinct terms (not multiplied by the frequency of each term). It seems to return the word count, but I've not tested it sufficiently.

On Wed, Jul 6, 2011 at 1:39 AM, Trey Grainger the.apache.t...@gmail.com wrote:

Gabriele, I created a patch that does this about a year ago. See https://issues.apache.org/jira/browse/SOLR-1837. It was written for Solr 1.4 and is based upon the Document Reconstructor in Luke. The patch adds a link to the main solr admin page to a docinspector page which will reconstruct the document given a uniqueid (required). Keep in mind that you're only looking at what's in the index for non-stored fields, not the original text. If you have any issues using this on the most recent release, let me know and I'd be happy to create a new patch for solr 3.3. One of these days I'll remove the JSP dependency and this may eventually make it into trunk.

Thanks, -Trey Grainger
Search Technology Development Team Lead, Careerbuilder.com
Site Architect, Celiaccess.com

On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

Hello, With an inverted index the term is the key, and the documents are the values. Is it still however possible that given a document id I get the terms indexed for that document?

-- Regards, K. Gabriele
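To make the two quantities in the message above concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: TermStats and its methods are hypothetical names, and whitespace splitting stands in for whatever analyzer the index actually uses. It only illustrates the difference between the number of distinct terms and the word count (the sum of term frequencies).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TermStats {
    // Term -> frequency for one document, using naive whitespace tokenization
    // (an assumption; a real index would use its configured analyzer).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Number of distinct terms in the document.
    static int uniqueTerms(Map<String, Integer> tf) {
        return tf.size();
    }

    // Word count: the sum of tf over every term in the document.
    static int wordCount(Map<String, Integer> tf) {
        int sum = 0;
        for (int f : tf.values()) {
            sum += f;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = termFrequencies("to be or not to be");
        System.out.println(uniqueTerms(tf)); // 4
        System.out.println(wordCount(tf));   // 6
    }
}
```

With term vectors enabled, the same sum could presumably be taken over the frequencies a TermFreqVector reports for a field; that part is not shown here.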
Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?
nice...where? I'm trying to figure out 2 things:

1) How to create an analyzer that corresponds to the one in the schema.xml:

    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>

2) I'd like to see the code that creates it by reading it from schema.xml.

On Tue, Jul 5, 2011 at 12:33 PM, Markus Jelsma markus.jel...@openindex.io wrote:

No. SolrJ only builds input docs from NutchDocument objects. Solr will do analysis. The integration is analogous to XML post of Solr documents.

On Tuesday 05 July 2011 12:28:21 Gabriele Kahlout wrote:

Hello, I'm trying to understand better Nutch and Solr integration. My understanding is that Documents are added to Solr index from SolrWriter's write(NutchDocument doc) method. But does it make any use of the WhitespaceTokenizerFactory?

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

-- Regards, K. Gabriele
Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?
I suspect the following should do (1). I'm just not sure about file references as in stopInit.put("words", "stopwords.txt"). (2) should clarify.

1)

    class SchemaAnalyzer extends Analyzer {
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // Stop filter configuration
            HashMap<String, String> stopInit = new HashMap<String, String>();
            stopInit.put("words", "stopwords.txt");
            stopInit.put("ignoreCase", Boolean.TRUE.toString());
            StopFilterFactory stopFilterFactory = new StopFilterFactory();
            stopFilterFactory.init(stopInit);

            // Word delimiter filter configuration
            final HashMap<String, String> wordDelimInit = new HashMap<String, String>();
            wordDelimInit.put("generateWordParts", "1");
            wordDelimInit.put("generateNumberParts", "1");
            wordDelimInit.put("catenateWords", "1");
            wordDelimInit.put("catenateNumbers", "1");
            wordDelimInit.put("catenateAll", "0");
            wordDelimInit.put("splitOnCaseChange", "1");
            WordDelimiterFilterFactory wordDelimiterFilterFactory = new WordDelimiterFilterFactory();
            wordDelimiterFilterFactory.init(wordDelimInit);

            // Porter stemmer configuration
            HashMap<String, String> porterInit = new HashMap<String, String>();
            porterInit.put("protected", "protwords.txt");
            EnglishPorterFilterFactory englishPorterFilterFactory = new EnglishPorterFilterFactory();
            englishPorterFilterFactory.init(porterInit);

            // The innermost stage runs first: tokenizer, stop, word delimiter,
            // lowercase, stemmer, then duplicate removal.
            return new RemoveDuplicatesTokenFilter(
                englishPorterFilterFactory.create(
                    new LowerCaseFilter(
                        wordDelimiterFilterFactory.create(
                            stopFilterFactory.create(
                                new WhitespaceTokenizer(reader))))));
        }
    }

-- Regards, K. Gabriele
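In the SchemaAnalyzer attempt above, the stop filter wraps the tokenizer directly and so runs before LowerCaseFilter, which is presumably why ignoreCase is set to true. The following toy sketch in plain Java shows how the nesting order of stages changes the result. ToyAnalysisChain and its methods are hypothetical stand-ins, not Lucene or Solr classes, and whitespace splitting is only an approximation of a real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ToyAnalysisChain {
    // Stage 1: split on runs of whitespace.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2: drop stopwords; the stopword set is assumed to be lowercase.
    static List<String> stopFilter(List<String> tokens, Set<String> stopwords, boolean ignoreCase) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String key = ignoreCase ? t.toLowerCase(Locale.ROOT) : t;
            if (!stopwords.contains(key)) {
                out.add(t);
            }
        }
        return out;
    }

    // Stage 3: lowercase every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Same nesting as the email: the innermost call runs first.
    static List<String> analyze(String text, Set<String> stopwords, boolean ignoreCase) {
        return lowerCase(stopFilter(whitespaceTokenize(text), stopwords, ignoreCase));
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        System.out.println(analyze("The Quick fox", stop, true));  // [quick, fox]
        System.out.println(analyze("The Quick fox", stop, false)); // [the, quick, fox]
    }
}
```

With a case-sensitive stop filter placed ahead of lowercasing, "The" slips through and is only lowercased afterwards, so the stopword ends up in the output.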
Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?
Not yet an answer to 2), but this is where and how Solr initializes the Analyzer defined in the schema.xml:

    // org.apache.solr.schema.IndexSchema

    // Load the Tokenizer
    // Although an analyzer only allows a single Tokenizer, we load a list to make sure
    // the configuration is ok
    final ArrayList<TokenizerFactory> tokenizers = new ArrayList<TokenizerFactory>(1);
    AbstractPluginLoader<TokenizerFactory> tokenizerLoader =
        new AbstractPluginLoader<TokenizerFactory>("[schema.xml] analyzer/tokenizer", false, false) {
          @Override
          protected void init(TokenizerFactory plugin, Node node) throws Exception {
            if (!tokenizers.isEmpty()) {
              throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                  "The schema defines multiple tokenizers for: " + node);
            }
            final Map<String,String> params = DOMUtil.toMapExcept(node.getAttributes(), "class");
            // copy the luceneMatchVersion from config, if not set
            if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
              params.put(LUCENE_MATCH_VERSION_PARAM, solrConfig.luceneMatchVersion.toString());
            plugin.init(params);
            tokenizers.add(plugin);
          }

          @Override
          protected TokenizerFactory register(String name, TokenizerFactory plugin) throws Exception {
            return null; // used for map registration
          }
        };
    tokenizerLoader.load(loader, (NodeList) xpath.evaluate("./tokenizer", node, XPathConstants.NODESET));

    // Make sure something was loaded
    if (tokenizers.isEmpty()) {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
          "analyzer without class or tokenizer & filter list");
    }

    // Load the Filters
    final ArrayList<TokenFilterFactory> filters = new ArrayList<TokenFilterFactory>();
    AbstractPluginLoader<TokenFilterFactory> filterLoader =
        new AbstractPluginLoader<TokenFilterFactory>("[schema.xml] analyzer/filter", false, false) {
          @Override
          protected void init(TokenFilterFactory plugin, Node node) throws Exception {
            if (plugin != null) {
              final Map<String,String> params = DOMUtil.toMapExcept(node.getAttributes(), "class");
              // copy the luceneMatchVersion from config, if not set
              if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
                params.put(LUCENE_MATCH_VERSION_PARAM, solrConfig.luceneMatchVersion.toString());
              plugin.init(params);
              filters.add(plugin);
            }
          }

          @Override
          protected TokenFilterFactory register(String name, TokenFilterFactory plugin) throws Exception {
            return null; // used for map registration
          }
        };
    filterLoader.load(loader, (NodeList) xpath.evaluate("./filter", node, XPathConstants.NODESET));

    return new TokenizerChain(charFilters.toArray(new CharFilterFactory[charFilters.size()]),
        tokenizers.get(0), filters.toArray(new TokenFilterFactory[filters.size()]));
    };
Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?
The answer to 2) is new IndexSchema(solrConf, schema).getAnalyzer();
Cannot I search documents added by IndexWriter after commit?
    @Test
    public void testUpdate() throws IOException, ParserConfigurationException, SAXException, ParseException {
        Analyzer analyzer = getAnalyzer();
        QueryParser parser = new QueryParser(Version.LUCENE_32, "content", analyzer);
        Query allQ = parser.parse("*:*");
        IndexWriter writer = getWriter();
        IndexSearcher searcher = new IndexSearcher(IndexReader.open(writer, true));
        TopDocs docs = searcher.search(allQ, 10);
        assertEquals(0, docs.totalHits); // empty/no index
        Document doc = getDoc();
        writer.addDocument(doc);
        writer.commit();
        docs = searcher.search(allQ, 10);
        assertEquals(1, docs.totalHits); // it fails here. docs.totalHits equals 0
    }

What am I doing wrong here? If I initialize searcher with new IndexSearcher(directory) I'm told:

    org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory@3caa4b lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ed0220c: files: []

-- Regards, K. Gabriele
Re: Cannot I search documents added by IndexWriter after commit?
And how do you do that? There is no reopen method.

On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless luc...@mikemccandless.com wrote:

After your writer.commit you need to reopen your searcher to see the changes.

Mike McCandless http://blog.mikemccandless.com

-- Regards, K. Gabriele
Re: Cannot I search documents added by IndexWriter after commit?
Still won't work (same as before).

    @Test
    public void testUpdate() throws IOException, ParserConfigurationException, SAXException, ParseException {
        Analyzer analyzer = getAnalyzer();
        QueryParser parser = new QueryParser(Version.LUCENE_32, "content", analyzer);
        Query allQ = parser.parse("*:*");
        IndexWriter writer = getWriter();
        final IndexReader indexReader = IndexReader.open(writer, true);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        TopDocs docs = searcher.search(allQ, 10);
        assertEquals(0, docs.totalHits); // empty/no index
        Document doc = getDoc();
        writer.addDocument(doc);
        writer.commit();
        // changed lines: reopen the reader and build a new searcher on it
        indexReader.reopen();
        searcher = new IndexSearcher(indexReader);
        docs = searcher.search(allQ, 10);
        assertEquals(1, docs.totalHits);
    }

    private Document getDoc() {
        Document doc = new Document();
        doc.add(new Field("id", "0", Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    private IndexWriter getWriter() throws IOException {
        return new IndexWriter(directory, new WhitespaceAnalyzer(),
            IndexWriter.MaxFieldLength.UNLIMITED);
    }

On Tue, Jul 5, 2011 at 8:15 PM, Michael McCandless luc...@mikemccandless.com wrote:

Sorry, you must reopen the underlying IndexReader, and then make a new IndexSearcher from the reopened reader.

Mike McCandless http://blog.mikemccandless.com

-- Regards, K. Gabriele
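One thing worth checking in the reopen attempt in this message: in Lucene 3.x, IndexReader.reopen() returns an IndexReader (a new instance when the index has changed) instead of refreshing the reader it is called on, so the test, which discards the return value, keeps searching the old snapshot. The sketch below is a toy model of that behavior in plain Java. ToyWriter and SnapshotReader are hypothetical classes written for illustration, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;

class ReopenDemo {
    // Toy writer: holds the committed documents.
    static class ToyWriter {
        private final List<String> committed = new ArrayList<>();

        void addAndCommit(String doc) {
            committed.add(doc);
        }

        List<String> snapshot() {
            return new ArrayList<>(committed); // defensive copy
        }
    }

    // Toy reader: a point-in-time view that never changes after it is opened.
    static class SnapshotReader {
        private final ToyWriter source;
        private final List<String> docs;

        private SnapshotReader(ToyWriter source) {
            this.source = source;
            this.docs = source.snapshot();
        }

        static SnapshotReader open(ToyWriter writer) {
            return new SnapshotReader(writer);
        }

        // Returns this reader if nothing changed, otherwise a NEW reader.
        // The receiver itself is never updated.
        SnapshotReader reopen() {
            return source.snapshot().size() == docs.size() ? this : new SnapshotReader(source);
        }

        int numDocs() {
            return docs.size();
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        SnapshotReader reader = SnapshotReader.open(writer);
        writer.addAndCommit("doc0");

        reader.reopen();                      // result discarded
        System.out.println(reader.numDocs()); // 0, still the old snapshot

        reader = reader.reopen();             // keep the returned reader
        System.out.println(reader.numDocs()); // 1
    }
}
```

Applied to the test, this would suggest assigning the result of reopen() to a new variable and building the IndexSearcher from that, which is consistent with the advice to make a new IndexSearcher from the reopened reader.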
Re: Cannot I search documents added by IndexWriter after commit?
Re-open doesn't work, but open does.

    @Test
    public void testUpdate() throws IOException, ParserConfigurationException, SAXException, ParseException {
        Analyzer analyzer = getAnalyzer();
        QueryParser parser = new QueryParser(Version.LUCENE_32, "content", analyzer);
        Query allQ = parser.parse("*:*");
        IndexWriter writer = getWriter();
        final IndexReader indexReader = IndexReader.open(writer, true);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        TopDocs docs = searcher.search(allQ, 10);
        assertEquals(0, docs.totalHits); // empty/no index
        Document doc = getDoc();
        writer.addDocument(doc);
        writer.commit();
        searcher = new IndexSearcher(IndexReader.open(writer, true)); // new IndexSearcher(directory);
        docs = searcher.search(allQ, 10);
        assertEquals(1, docs.totalHits);
    }
Can I invert the inverted index?
Hello, With an inverted index the term is the key, and the documents are the values. Is it still however possible that given a document id I get the terms indexed for that document?

-- Regards, K. Gabriele
Re: Can I invert the inverted index?
I had looked at term vectors but don't understand how they would solve my problem. Consider the following index entries:

    t0, doc0, doc1
    t1, doc0

From the 2nd entry we know that t1 is only present in doc0. Now, my problem: given doc0, how can I know which terms occur in it (t0 and t1) without storing the content? One way is to go over all terms in the index using the term dictionary.

On Tue, Jul 5, 2011 at 10:14 PM, lboutros boutr...@gmail.com wrote:

Hi Gabriele, I'm not sure I understand your problem, but the TermVectorComponent may fit your needs?

http://wiki.apache.org/solr/TermVectorComponent
http://wiki.apache.org/solr/TermVectorComponentExampleEnabled

Ludovic. - Jouve France.

-- View this message in context: http://lucene.472066.n3.nabble.com/Can-I-invert-the-inverted-index-tp3142206p3142269.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Regards, K. Gabriele
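The brute-force approach mentioned in this message (walk every term in the dictionary and collect the documents it points to) can be sketched in plain Java with an in-memory map standing in for the index. InvertIndexDemo and invert are hypothetical names and the postings map is a toy, not a Lucene structure. The cost is proportional to the total number of postings, which is why a per-document structure such as term vectors is usually suggested instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertIndexDemo {
    // Turns term -> doc ids into doc id -> terms by scanning every posting once.
    static Map<Integer, Set<String>> invert(Map<String, List<Integer>> postings) {
        Map<Integer, Set<String>> termsByDoc = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (Integer docId : entry.getValue()) {
                termsByDoc.computeIfAbsent(docId, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return termsByDoc;
    }

    public static void main(String[] args) {
        // The two index entries from the email: t0 -> doc0, doc1 and t1 -> doc0.
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        postings.put("t0", Arrays.asList(0, 1));
        postings.put("t1", Arrays.asList(0));

        System.out.println(invert(postings)); // {0=[t0, t1], 1=[t0]}
    }
}
```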
Re: How do I compute and store a field?
Gee, I was about to post. I figured my issue is that of computing the unique terms per document. One approach to compute that value is to run the analyzer on the document before calling addDocument and count the number of tokens. Then I can invoke addDocument with the computed value of the field. The only issue is that I'm assuming that, if I use the same Analyzer that addDocument uses, the count will always equal the number of terms indexed for that document. Is that a right assumption? Is there any alternative where I don't need to make this assumption?

On Tue, Jul 5, 2011 at 1:29 AM, Markus Jelsma markus.jel...@openindex.io wrote:

You can create a custom update processor. The passed AddUpdateCommand object has an accessor to the SolrInputDocument you're about to add. In the processAdd method you can add a new field with whatever you want. The wiki has a good example: http://wiki.apache.org/solr/UpdateRequestProcessor

Hello, I'm trying to add a field that counts the number of terms in a document to my schema. So far I've been computing this value at query-time. Is there a way I could compute this once only and store the field?

    final SolrIndexSearcher searcher = request.getSearcher();
    final SolrIndexReader reader = searcher.getReader();
    final String content = "content";
    final byte[] norms = reader.norms(content);
    final int[] docLengths;
    if (norms == null) {
        docLengths = null;
    } else {
        docLengths = new int[norms.length];
        int i = 0;
        for (byte b : norms) {
            float docNorm = searcher.getSimilarity().decodeNormValue(b);
            int docLength = 0;
            if (docNorm != 0) {
                docLength = (int) (1 / docNorm); // reciprocal
            }
            docLengths[i++] = docLength;
        }
    ...
    final NumericField docLenNormField = new NumericField(TestQueryResponseWriter.DOC_LENGHT);
    docLenNormField.setIntValue(docLengths[id]);
    doc.add(docLenNormField);

-- Regards, K. Gabriele
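A note on the query-time code quoted in this message, which recovers a length as 1 / docNorm. Assuming the index was written with Lucene's DefaultSimilarity, the length norm is 1 / sqrt(numTerms) (times any boosts), so the plain reciprocal yields sqrt(numTerms) and the length would be 1 / norm^2 instead. If a custom Similarity with a 1 / numTerms norm is in use, the reciprocal is already correct. The sketch below is plain Java arithmetic with hypothetical method names; it ignores boosts and the lossy one-byte encoding of norms, so values recovered from a real index would only be approximate.

```java
class LengthNormDemo {
    // Length norm as DefaultSimilarity computes it, ignoring boosts (assumption).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // What the quoted code computes: the plain reciprocal of the norm.
    static int reciprocal(float norm) {
        return norm == 0f ? 0 : (int) (1 / norm);
    }

    // Inverse of lengthNorm: numTerms = 1 / norm^2.
    static int lengthFromNorm(float norm) {
        return norm == 0f ? 0 : Math.round(1f / (norm * norm));
    }

    public static void main(String[] args) {
        float norm = lengthNorm(16);              // 0.25
        System.out.println(reciprocal(norm));     // 4
        System.out.println(lengthFromNorm(norm)); // 16
    }
}
```

Because of the byte encoding, a count computed once at index time (for example in an update processor, as suggested in the reply) would be exact where a norm-derived value is not.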
How do I add a custom field?
Hello, I want to have an additional field that appears for every document in search results. I understand that I should do this by adding the field to schema.xml, so I add:

<field name="myField" default="0" type="integer" stored="true" indexed="false"/>

Then I restart Solr (so that it loads the new schema.xml) and make a query specifying that it should return myField too, but it doesn't. Will it work only for newly indexed documents? Am I missing something?
Re: How do I add a custom field?
Is there a way to compute the field and add it to all indexed documents without re-indexing? myField counts the number of terms per document (unique word count).

On Sun, Jul 3, 2011 at 12:24 PM, lee carroll lee.a.carr...@googlemail.com wrote:

> Hi Gabriele, did you index any docs with your new field? The results will just bring back docs and whatever fields they have. They won't bring back null fields just because they are in your schema. Lucene is schema-less. Solr adds the schema to make it nice to administer and very powerful to use.
>
> On 3 July 2011 11:01, Gabriele Kahlout gabri...@mysimpatico.com wrote:
>
>> Hello, I want to have an additional field that appears for every document in search results. I understand that I should do this by adding the field to schema.xml, so I add:
>>
>> <field name="myField" default="0" type="integer" stored="true" indexed="false"/>
>>
>> Then I restart Solr (so that it loads the new schema.xml) and make a query specifying that it should return myField too, but it doesn't. Will it work only for newly indexed documents? Am I missing something?
How do I compute and store a field?
Hello, I'm trying to add to my schema a field that counts the number of terms in a document. So far I've been computing this value at query time. Is there a way to compute it only once and store the field?

final SolrIndexSearcher searcher = request.getSearcher();
final SolrIndexReader reader = searcher.getReader();
final String content = "content";
final byte[] norms = reader.norms(content);
final int[] docLengths;
if (norms == null) {
    docLengths = null;
} else {
    docLengths = new int[norms.length];
    int i = 0;
    for (byte b : norms) {
        float docNorm = searcher.getSimilarity().decodeNormValue(b);
        int docLength = 0;
        if (docNorm != 0) {
            docLength = (int) (1 / docNorm); // reciprocal
        }
        docLengths[i++] = docLength;
    }
    ...
final NumericField docLenNormField = new NumericField(TestQueryResponseWriter.DOC_LENGHT);
docLenNormField.setIntValue(docLengths[id]);
doc.add(docLenNormField);
site: feature in Solr?
Hello, besides creating an index with just the site in question, is it possible, as with Google's site: operator, to search for results only in a given domain?
Re: Why are not query keywords treated as a set?
<str name="rawquerystring">past past</str>
<str name="querystring">past past</str>
<str name="parsedquery">content:past content:past</str>

I was expecting the query to be parsed into content:past only, not content:past content:past.

On Mon, Jun 20, 2011 at 12:12 AM, lee carroll lee.a.carr...@googlemail.com wrote:

> Do you mean a phrase query? "past past"
> Can you give some more detail?
>
> On 18 June 2011 13:02, Gabriele Kahlout gabri...@mysimpatico.com wrote:
>
>> q=past past
>>
>> 1.0 = (MATCH) sum of:
>>   0.5 = (MATCH) fieldWeight(content:past in 0), product of:
>>     1.0 = tf(termFreq(content:past)=1)
>>     1.0 = idf(docFreq=1, maxDocs=2)
>>     0.5 = fieldNorm(field=content, doc=0)
>>   0.5 = (MATCH) fieldWeight(content:past in 0), product of:
>>     1.0 = tf(termFreq(content:past)=1)
>>     1.0 = idf(docFreq=1, maxDocs=2)
>>     0.5 = fieldNorm(field=content, doc=0)
>>
>> Is there a way to treat the query keywords as a set?
Why does paste get parsed into past?
Hello, debugging query results I find that:

<str name="querystring">paste</str>
<str name="parsedquery">content:past</str>

Now, paste and past are two different words. Why does Solr not treat them as such? How do I make it do so?
Why are not query keywords treated as a set?
q=past past

1.0 = (MATCH) sum of:
  0.5 = (MATCH) fieldWeight(content:past in 0), product of:
    1.0 = tf(termFreq(content:past)=1)
    1.0 = idf(docFreq=1, maxDocs=2)
    0.5 = fieldNorm(field=content, doc=0)
  0.5 = (MATCH) fieldWeight(content:past in 0), product of:
    1.0 = tf(termFreq(content:past)=1)
    1.0 = idf(docFreq=1, maxDocs=2)
    0.5 = fieldNorm(field=content, doc=0)

Is there a way to treat the query keywords as a set?
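One client-side way to get set semantics is to drop repeated keywords before the query string is sent. The sketch below is an assumption-laden illustration, not a Solr feature: `QueryTermSet.dedupe` is a hypothetical helper that splits on whitespace only, is case-sensitive, and knows nothing about phrases, field prefixes, or analysis. In particular it runs before stemming, so two different words that stem to the same term (as in the paste/past thread) would both survive.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical client-side helper: removes repeated whitespace-separated
// keywords while keeping the order of first occurrence.
class QueryTermSet {

    static String dedupe(String query) {
        Set<String> terms = new LinkedHashSet<>();
        for (String t : query.trim().split("\\s+")) {
            if (!t.isEmpty()) { // an empty query yields one empty token
                terms.add(t);
            }
        }
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        // "past past" collapses to a single keyword before it reaches the parser.
        System.out.println(dedupe("past past"));
    }
}
```

With this in place the explain output above would presumably show a single fieldWeight clause for content:past, since only one keyword reaches the query parser.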
Re: Why does paste get parsed into past?
I'm not sure where those are set, but on reflection I'd keep the default settings. My real issue is: why are query keywords not treated as a set? See http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201106.mbox/%3CBANLkTikHunhyWc2WVTofRYU4ZW=c8oe...@mail.gmail.com%3E

2011/6/18 François Schiettecatte fschietteca...@gmail.com

> What do you have set up for stemming?
>
> François
>
> On Jun 18, 2011, at 8:00 AM, Gabriele Kahlout wrote:
>
>> Hello, debugging query results I find that:
>>
>> <str name="querystring">paste</str>
>> <str name="parsedquery">content:past</str>
>>
>> Now, paste and past are two different words. Why does Solr not treat them as such? How do I make it do so?
Is it true that I cannot delete stored content from the index?
Hello, I've been indexing with the content field stored. Now I'd like to delete all stored content. Is there a way to do that without re-indexing? It seems not, judging from the Lucene FAQ (http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F):

> How do I update a document or a set of documents that are already indexed?
>
> There is no direct update procedure in Lucene. To update an index incrementally you must first delete the documents that were updated, and then re-add them to the index.
It's not possible to decide at run-time which similarity class to use, right?
Hello, I'm testing out different Similarity implementations. To do that, each time I want to try a different similarity class I change the class attribute of the similarity element in schema.xml and restart Solr. Besides running multiple cores, each with its own schema, is there a way to tell the RequestHandler which similarity class to use?
Re: It's not possible to decide at run-time which similarity class to use, right?
On Thu, Jun 16, 2011 at 9:14 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

> No, there's not a way to control Similarity on a per-request basis. Some factors from Similarity are computed at index-time though.

You got me on this.

> What factors are you trying to tweak that way and why? Maybe doing boosting using some other mechanism (boosting functions, boosting clauses) would be a better way to go?

I'm trying to assess the impact of coord (search-time) on QTime. In one implementation coord returns 1, while in another it's actually computed. Running multiple cores adds considerable complication (I must configure them to share data but not conf). Patching the request handler to change the similarity (I haven't looked into this yet) will only change the 'search-time' similarity. How about breaking up similarity into run-time and compile-time parts, so that the request handler could take a parameter to 'safely' set the run-time similarity? I think many would welcome such a separation of responsibilities.

> Erik
>
> On Jun 16, 2011, at 14:55 , Gabriele Kahlout wrote:
>
>> Hello, I'm testing out different Similarity implementations. To do that, each time I want to try a different similarity class I change the class attribute of the similarity element in schema.xml and restart Solr. Besides running multiple cores, each with its own schema, is there a way to tell the RequestHandler which similarity class to use?
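To make the two coord variants being compared concrete, here is a minimal plain-Java sketch. The `Coord` interface and the `score` helper are hypothetical names for illustration only, not Solr or Lucene classes; the fraction variant mirrors the overlap / maxOverlap definition used by Lucene's default similarity, and the constant variant is the "returns 1" implementation mentioned above.

```java
// Plain-Java illustration; Coord and score are hypothetical, not Lucene API.
class CoordSketch {

    interface Coord {
        // overlap: number of query terms matched in the document
        // maxOverlap: total number of terms in the query
        float coord(int overlap, int maxOverlap);
    }

    // Variant 1: coord is switched off and always contributes a factor of 1.
    static final Coord CONSTANT = (overlap, maxOverlap) -> 1.0f;

    // Variant 2: coord rewards documents that match more of the query terms.
    static final Coord FRACTION = (overlap, maxOverlap) -> overlap / (float) maxOverlap;

    // The search-time factor simply multiplies the summed term weights.
    static float score(Coord c, int overlap, int maxOverlap, float sumOfTermWeights) {
        return c.coord(overlap, maxOverlap) * sumOfTermWeights;
    }

    public static void main(String[] args) {
        System.out.println("constant: " + score(CONSTANT, 1, 2, 1.0f));
        System.out.println("fraction: " + score(FRACTION, 1, 2, 1.0f));
    }
}
```

Because coord only enters at search time, swapping these two would not require re-indexing, which is the distinction the message draws between search-time and index-time factors.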
Re: How do I make sure the resulting documents contain the query terms?
Sorry for being unclear, and thank you for answering. Consider the following documents: A(k0,k1,k2), B(k1,k2,k3), and C(k0,k2,k3), where A, B, C are document identifiers and the ks in brackets are the terms each document contains. So the Solr inverted index should be something like:

k0 -- A | C
k1 -- A | B
k2 -- A | B | C
k3 -- B | C

Now let q=k1. How do I make sure C doesn't appear as a result, since it doesn't contain any occurrence of k1?

On Tue, Jun 7, 2011 at 12:21 AM, Erick Erickson erickerick...@gmail.com wrote:

> I'm having a hard time understanding what you're driving at, can you provide some examples? This *looks* like filter queries, but I think you already know about those...
>
> Best
> Erick
>
> On Mon, Jun 6, 2011 at 4:00 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:
>
>> Hello, I've seen that through boosting it's possible to influence the scoring function, but what I would like is a sort of boolean property: in some way, to search only the documents indexed under that keyword (or the intersection/union for several keywords) rather than the whole set. Is this supported in any way?
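The A/B/C example can be reproduced with a toy inverted index in plain Java. This is a sketch with hypothetical names (`InvertedIndexSketch`, `build`, `lookup`), not how Lucene stores postings; it only shows that a lookup for k1 starts from the posting list of k1, so a document that does not contain k1 is never a candidate.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of documents containing it.
class InvertedIndexSketch {

    static Map<String, SortedSet<String>> build(Map<String, List<String>> docs) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            for (String term : doc.getValue()) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    // Candidate documents for a single-term query: just that term's posting list.
    static SortedSet<String> lookup(Map<String, SortedSet<String>> index, String term) {
        return index.getOrDefault(term, new TreeSet<>());
    }

    // The three documents from the message above.
    static Map<String, List<String>> sampleDocs() {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("A", Arrays.asList("k0", "k1", "k2"));
        docs.put("B", Arrays.asList("k1", "k2", "k3"));
        docs.put("C", Arrays.asList("k0", "k2", "k3"));
        return docs;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> index = build(sampleDocs());
        for (Map.Entry<String, SortedSet<String>> e : index.entrySet()) {
            System.out.println(e.getKey() + " -- " + e.getValue());
        }
        System.out.println("q=k1 -> " + lookup(index, "k1"));
    }
}
```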
Re: How do I make sure the resulting documents contain the query terms?
On Tue, Jun 7, 2011 at 8:43 AM, pravesh suyalprav...@yahoo.com wrote:

>> k0 -- A | C
>> k1 -- A | B
>> k2 -- A | B | C
>> k3 -- B | C
>>
>> Now let q=k1, how do I make sure C doesn't appear as a result since it doesn't contain any occurrence of k1?
>
> Do we bother to do that. Now that's what lucene does :)

Lucene/Solr doesn't do that: it ranks documents based on a scoring function, and with that it lacks the possibility of specifying that a particular term must appear (the closest way I know of is boosting it). The solution would be a way to tell Solr/Lucene which documents/indices to query, i.e. to query only the union/intersection of the documents in which k1,...,kn appear, instead of querying all indexed documents and applying the ranking function (which will give weight to documents that contain k1...kn).

> -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-make-sure-the-resulting-documents-contain-the-query-terms-tp3031637p3033451.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do I make sure the resulting documents contain the query terms?
You are right, Lucene will return results based on my scoring function implementation (the Similarity class, http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html):

score(q,d) = coord(q,d) · queryNorm(q) · ∑ over t in q of ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )

It can be seen that whenever tf(t in d) = 0 that term contributes 0 to the sum, so for a single-term query the whole score is 0 and, as you say, C will never be returned. My issue arises when the query has multiple terms (my example was too simple!), some of which are 'mandatory' while others are not. In that case I should write a query that uses the + operator (http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#+), e.g. q=+k1. I'm unsure I'll get the syntax right, but let's say k1 is mandatory and k2 and k3 are optional; then q=k2 k3 +k1. I see that queries made through solrj are received with + in place of the space (which defaults to OR), so q=k2+k3++k1.

On Tue, Jun 7, 2011 at 5:23 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

> Um, normally that would never happen, because, well, like you say, the inverted index doesn't have docC for term K1, because doc C didn't include term K1. If you search on q=K1, then how/why would docC ever be in your result set? Are you seeing it in your result set? The question then would be _why_, what weird thing is going on to make that happen, that's not expected. The result set _starts_ from only the documents that actually include the term. Boosting/relevancy ranking only affects what order these documents appear in, but there's no reason documentC should be in the result set at all in your case of q=k1, where docC is not indexed under k1.
>
> On 6/7/2011 2:35 AM, Gabriele Kahlout wrote:
>
>> Sorry for being unclear, and thank you for answering. Consider the following documents: A(k0,k1,k2), B(k1,k2,k3), and C(k0,k2,k3), where A, B, C are document identifiers and the ks in brackets are the terms each document contains. So the Solr inverted index should be something like:
>>
>> k0 -- A | C
>> k1 -- A | B
>> k2 -- A | B | C
>> k3 -- B | C
>>
>> Now let q=k1. How do I make sure C doesn't appear as a result, since it doesn't contain any occurrence of k1?
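Since both the space and the + operator are special in a URL query string, it may help to see how the JDK encodes q=k2 k3 +k1. The sketch below uses only java.net.URLEncoder; `QueryEncodingSketch` is a hypothetical name, and how a particular client library encodes its parameters is not something the thread confirms.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Shows form-style URL encoding of a query that uses the mandatory '+' operator.
class QueryEncodingSketch {

    static String encode(String query) {
        try {
            // Spaces become '+', while a literal '+' becomes "%2B".
            return URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("q=" + encode("k2 k3 +k1"));
    }
}
```

Under form-style encoding a '+' in the raw URL decodes to a space, so a mandatory-term marker has to travel as %2B; a bare + in front of k1 would otherwise be read as just another space.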
How do I make sure the resulting documents contain the query terms?
Hello, I've seen that through boosting it's possible to influence the scoring function, but what I would like is a sort of boolean property: in some way, to search only the documents indexed under that keyword (or the intersection/union for several keywords) rather than the whole set. Is this supported in any way?
Re: How to know how many documents are indexed? Anything more elegant than parsing numFound?
Sorry, this was my bad: I should have used > and not >> (append).

On Fri, Jun 3, 2011 at 9:45 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

> $ curl --fail "http://192.168.34.51:8080/solr/admin/stats.jsp" >> resp.xml
> $ xmlstarlet sel -t -v "//@numDocs" resp.xml
> Extra content at the end of the document
>
> On Fri, Jun 3, 2011 at 8:56 PM, Ahmet Arslan iori...@yahoo.com wrote:
>
>>> How to know how many documents are indexed? Anything more elegant than parsing numFound?
>>>
>>> $ curl "http://192.168.34.51:8080/solr/select?q=*%3A*&rows=0" > resp.xml
>>> $ xmlstarlet sel -t -v "//@numFound" resp.xml
>>
>> solr/admin/stats.jsp is actually XML too and contains numDocs and maxDoc info. I think you can get numDocs with JMX too: http://wiki.apache.org/solr/SolrJmx
How to know how many documents are indexed? Anything more elegant than parsing numFound?
$ curl "http://192.168.34.51:8080/solr/select?q=*%3A*&rows=0" > resp.xml
$ xmlstarlet sel -t -v "//@numFound" resp.xml
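The same //@numFound extraction can be done from Java with the JDK's built-in XPath support, which avoids the temporary file. This is a sketch under assumptions: `NumFoundSketch` is a hypothetical helper, and the XML string in `main` is a made-up minimal sample, not a captured Solr response.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Reads the numFound attribute out of an XML search response held in a String.
class NumFoundSketch {

    static long numFound(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Same expression as the xmlstarlet call: the first numFound attribute.
            String value = xpath.evaluate("//@numFound",
                    new InputSource(new StringReader(xml)));
            return Long.parseLong(value);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical minimal response body, for illustration only.
        String sample = "<response><result name=\"response\" numFound=\"42\" start=\"0\"/></response>";
        System.out.println(numFound(sample));
    }
}
```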
Re: How to know how many documents are indexed? Anything more elegant than parsing numFound?
$ curl --fail "http://192.168.34.51:8080/solr/admin/stats.jsp" >> resp.xml
$ xmlstarlet sel -t -v "//@numDocs" resp.xml
Extra content at the end of the document

On Fri, Jun 3, 2011 at 8:56 PM, Ahmet Arslan iori...@yahoo.com wrote:

>> How to know how many documents are indexed? Anything more elegant than parsing numFound?
>>
>> $ curl "http://192.168.34.51:8080/solr/select?q=*%3A*&rows=0" > resp.xml
>> $ xmlstarlet sel -t -v "//@numFound" resp.xml
>
> solr/admin/stats.jsp is actually XML too and contains numDocs and maxDoc info. I think you can get numDocs with JMX too: http://wiki.apache.org/solr/SolrJmx
What's the need for a complicated SolrTestCaseJ4.getClassName() ?
Hello, as long as I subclass SolrTestCaseJ4 I cannot call this.getClass().getSimpleName(), and I don't understand why. I wonder if the following complicated methods in SolrTestCaseJ4 have anything to do with it?

protected static String getClassName() {
    StackTraceElement[] stack = new RuntimeException("WhoAmI").fillInStackTrace().getStackTrace();
    for (int i = stack.length-1; i>=0; i--) {
        StackTraceElement ste = stack[i];
        String cname = ste.getClassName();
        if (cname.indexOf(".lucene.")>=0 || cname.indexOf(".solr.")>=0) {
            return cname;
        }
    }
    return SolrTestCaseJ4.class.getName();
}

protected static String getSimpleClassName() {
    String cname = getClassName();
    return cname.substring(cname.lastIndexOf('.')+1);
}
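One plausible reading, which the thread does not confirm, is that these helpers exist because a static method has no `this` and therefore cannot call getClass(), so the class name is recovered from the stack trace instead. The sketch below reproduces that technique in plain Java; `WhoAmISketch` and its methods are hypothetical names, and the marker string is a parameter rather than the fixed ".lucene." / ".solr." checks above.

```java
// Recovers a class name from the call stack, for use where no 'this' exists.
class WhoAmISketch {

    // Walks the stack from the outermost frame inwards and returns the first
    // class name containing the marker, or the fallback if none matches.
    static String outermostClassContaining(String marker, String fallback) {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        for (int i = stack.length - 1; i >= 0; i--) {
            String cname = stack[i].getClassName();
            if (cname.contains(marker)) {
                return cname;
            }
        }
        return fallback;
    }

    // Strips the package prefix, like getSimpleClassName() above.
    static String simpleName(String cname) {
        return cname.substring(cname.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        String found = outermostClassContaining("WhoAmISketch", "unknown");
        System.out.println(simpleName(found));
    }
}
```

Walking from the end of the array means the outermost matching caller wins, which in a test run would tend to be the concrete test class rather than a base class further down the stack.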
(How) can I use SolrTestCaseJ4.assertQ(..) to test an existing index?
Hello, examining the Solr Core examples, it seems that a new index is created in a temp dataDir that is deleted after each test (good practice, agreed). But before I start debugging adoc(..), I'm wondering if I can query the same index that I see working through the Solr web server interface. Also, for large indices I find it faster and easier to just copy-paste a test resource index and assertQ(..) on it.

So far it's not working for me: although I specify the dataDir, it always finds no documents. Examining the logs, I figured out that SolrCore.initIndex() never picks up my index. The issue is that SolrCore.initDirectoryFactory(), called from SolrCore.initIndex(), initializes the factory to RAMDirectoryFactory, which understandably returns false for getDirectoryFactory().exists(indexDir).

Other than hacking it to use the StandardDirectoryFactory, how can I test an existing index? I've been trying for multiple days to figure out how to test with Solr!
Re: (How) can I use SolrTestCaseJ4.assertQ(..) to test an existing index?
On Sat, May 21, 2011 at 3:29 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

> Hello, examining the Solr Core examples, it seems that a new index is created in a temp dataDir that is deleted after each test (good practice, agreed).

Errr... from one test to the next only the dataDir is removed, not the in-memory index. That's blown away with the core in @AfterClass. I'm not sure what the point is, then, of deleting the dataDir after each test. It's at least counter-intuitive (to me).

@Test
public void testAddDoc() throws Exception {
    final String docUID = getDocUID();
    assertU(adoc("id", docUID, "url", getURL(docUID), "content", "blah blah blah"));
    assertU(commit());
    assertQ(req(anythingQ), "//*[@numFound='1']");
}

@Test
public void testAddOtherDoc() throws Exception {
    final String docUID = getDocUID();
    assertU(adoc("id", docUID, "url", getURL(docUID), "content", "blah blah blah"));
    assertU(commit());
    assertQ(req(anythingQ), "//*[@numFound='2']");
}

> But before I start debugging adoc(..), I'm wondering if I can query the same index that I see working through the Solr web server interface. Also, for large indices I find it faster and easier to just copy-paste a test resource index and assertQ(..) on it.
>
> So far it's not working for me: although I specify the dataDir, it always finds no documents. Examining the logs, I figured out that SolrCore.initIndex() never picks up my index. The issue is that SolrCore.initDirectoryFactory(), called from SolrCore.initIndex(), initializes the factory to RAMDirectoryFactory, which understandably returns false for getDirectoryFactory().exists(indexDir).
>
> Other than hacking it to use the StandardDirectoryFactory, how can I test an existing index? I've been trying for multiple days to figure out how to test with Solr!
Re: How to test Solr Integration - how to get EmbeddedSolrServer?
Thinking more about it, I can solve my immediate problem by just copy-pasting the classes I need into my own project packages (KISS, like here: https://github.com/Filirom1/solr-test-exemple ).

I'd however suggest refactoring the Solr code structure to be much more defaults-compliant, making it easier for external developers to understand and, hopefully, easier for committers to maintain (with fewer special-needs configurations). I've done some of those refactorings on my local copy of Solr and would be glad to contribute.

For this particular problem the KISS solution would be to create one more module for the tests, which depends on both Solr Core and the Test Framework. I believe the ease of build configuration outweighs the organizational burden of that extra module.

On Tue, May 17, 2011 at 7:11 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

http://stackoverflow.com/questions/6034513/can-i-avoid-a-dependency-cycle-with-one-edge-being-a-test-dependency

On Tue, May 17, 2011 at 6:49 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

On Tue, May 17, 2011 at 3:52 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

On Tue, May 17, 2011 at 3:44 PM, Steven A Rowe sar...@syr.edu wrote:

Hi Gabriele,

On 5/17/2011 at 9:34 AM, Gabriele Kahlout wrote:

Solr Core should declare a test dependency on Solr Test Framework.

I agree:
- Solr Core should have a test-scope dependency on Solr Test Framework.
- Solr Test Framework should have a compile-scope dependency on Solr Core.

But Maven views this as a circular dependency.

I've seen that, but adding it with <scope>test</scope> works. The logic: the src is compiled first and then re-used (I'm assuming Maven does something smart about not including the full jar).

Not quite. I've tried a demo and the reactor complains. I'll try to see if Maven could become 'smarter', or if the two-phase build solution will work.

    The projects in the reactor contain a cyclic reference: Edge between
    'Vertex{label='com.mysimpatico:TestFramework:1.0-SNAPSHOT'}' and
    'Vertex{label='org.apache:DummyCore:1.0-SNAPSHOT'}' introduces to cycle in the graph
    org.apache:DummyCore:1.0-SNAPSHOT --> com.mysimpatico:TestFramework:1.0-SNAPSHOT --> org.apache:DummyCore:1.0-SNAPSHOT -> [Help 1]

The workaround: Solr Core includes the source of Solr Test Framework as part of its test source code. It's not pretty, but it works. I'd be happy to entertain other (functional) approaches.

In the dp4j.com pom.xml I build in two phases to compile with the same annotations in the project itself (but I don't think we need that here).

Steve
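To make the reactor complaint in this thread concrete, here is a minimal sketch of a depth-first cycle check over a module graph. It is not Maven's actual implementation; the class name ModuleCycleCheck and the CoreTests module are hypothetical, and the only assumption taken from the thread is that a test-scope dependency counts as an ordinary edge when the build order is computed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ModuleCycleCheck {

    /** Returns true if the module graph (module -> modules it depends on) contains a cycle. */
    public static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<String>();    // modules fully explored
        Set<String> onPath = new HashSet<String>();  // modules on the current DFS path
        for (String module : deps.keySet()) {
            if (visit(module, deps, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String module, Map<String, List<String>> deps,
                                 Set<String> done, Set<String> onPath) {
        if (onPath.contains(module)) {
            return true;   // reached a module that is already on the current path
        }
        if (done.contains(module)) {
            return false;  // explored earlier without finding a cycle
        }
        onPath.add(module);
        List<String> next = deps.get(module);
        if (next != null) {
            for (String dependency : next) {
                if (visit(dependency, deps, done, onPath)) {
                    return true;
                }
            }
        }
        onPath.remove(module);
        done.add(module);
        return false;
    }

    public static void main(String[] args) {
        // Layout from the thread: the test-scope edge is still an edge.
        Map<String, List<String>> cyclic = new HashMap<String, List<String>>();
        cyclic.put("DummyCore", Arrays.asList("TestFramework"));   // test scope
        cyclic.put("TestFramework", Arrays.asList("DummyCore"));   // compile scope
        System.out.println(hasCycle(cyclic));

        // Hypothetical extra module holding the tests, as suggested above.
        Map<String, List<String>> split = new HashMap<String, List<String>>();
        split.put("DummyCore", Collections.<String>emptyList());
        split.put("TestFramework", Arrays.asList("DummyCore"));
        split.put("CoreTests", Arrays.asList("DummyCore", "TestFramework"));
        System.out.println(hasCycle(split));
    }
}
```

The second graph is the "one more module for the tests" layout: once the tests move out of the core module, the core no longer needs an edge back to the test framework.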
Re: How to list/see all the indexed terms of a particular field in a document?
ant luke?

On Wed, May 18, 2011 at 11:47 AM, Gnanakumar gna...@zoniac.com wrote:

Hi,

I'm using Apache Solr v3.1. How do I list all the indexed terms of a particular field in a document (by passing the unique key ID of the document)?

For example, I have the following field definitions in schema.xml:

    <field name="mydocumentid" type="string" indexed="true" stored="true" required="true" />
    <field name="mytextcontent" type="text" indexed="true" stored="true" required="true" />

In this case, I want to list all the indexed terms of the field mytextcontent for a particular document (mydocumentid:x).

Regards,
Gnanam
Does every Solr request-response require a running server?
Hello,

I'm wondering whether the Solr test framework, at the end of the day, always runs an embedded/Jetty server (on the view that a web server is the only way to interact with Solr, i.e. no web server means no Solr), or whether the tests interact with Solr without one, calling the underlying methods directly. From trying to understand SolrTestCaseJ4, the latter seems to be the case. That would be more white-box than otherwise.
Re: Does every Solr request-response require a running server?
On Wed, May 18, 2011 at 5:09 PM, Yonik Seeley yo...@lucidimagination.com wrote:

On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

Hello, I'm wondering whether the Solr test framework, at the end of the day, always runs an embedded/Jetty server (on the view that a web server is the only way to interact with Solr, i.e. no web server means no Solr), or whether the tests interact with Solr without one, calling the underlying methods directly. From trying to understand SolrTestCaseJ4, the latter seems to be the case. That would be more white-box than otherwise.

Solr does either, depending on the test. Most tests start only an embedded solr server w/ no web server,

What is confusing me is the "solr server". Is it SolrCore? In what respects is it a 'server'? In my understanding it's the core of the Solr web application, on which the servlet interface is built, i.e. it sits under the servlets, not on top of them.

but others use an embedded jetty server so one can talk HTTP to it. JettySolrRunner is used for the latter.

-Yonik
How to test Solr Integration - how to get EmbeddedSolrServer?
Hello,

I'm starting to write tests for my Solr integration, and have unfortunately spent a lot of time chasing up-to-date documentation. Below is a test I found here: http://blog.synyx.de/2011/01/integration-tests-for-your-solr-config/ which uses an EmbeddedSolrServer to communicate with the server and run some queries.

    @Test
    public void testThatNoResultsAreReturned() throws SolrServerException {
        SolrParams params = new SolrQuery("text that is not found");
        assertQ(TEST_SEED, null, tests);
        QueryResponse response = req(params);
        assertEquals(0L, response.getResults().getNumFound());
    }

The issue is that I cannot add a dependency on Solr-3.2-SNAPSHOT, since it's packaged as a war. I've tried to attach the sources and make the dependency of type classes, but it still won't work.

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-war-plugin</artifactId>
      <configuration>
        <warSourceDirectory>web</warSourceDirectory>
        <webXml>web/WEB-INF/web.xml</webXml>
        <attachClasses>true</attachClasses>
      </configuration>
    </plugin>

How can one use EmbeddedSolrServer outside of the Solr webapp? I've seen that org.apache.solr.client.solrj.embedded.TestSolrProperties does that in Solr Core, but not through a dependency on the Solr webapp (and I can't figure out where it comes from).
Re: How to test Solr Integration - how to get EmbeddedSolrServer?
Thank you. I'd like to stick to the same version (i.e. 3.2-SNAPSHOT). It seems things have changed there. To reproduce (should we file this and add my test as a test, to avoid this coming up again?):

    $ svn co -r 1104120 http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/ solr
    $ cd solr; ant get-maven-poms; mvn -N -Pbootstrap install; mvn -DskipTests install
    $ wget http://dp4j.sf.net/debug/embeddedServerTest.zip
    $ unzip embeddedServerTest.zip
    $ cd embeddedServerTest; mvn -X test

P.S. I realize the example is not an SSCCE (but it's close, and I've already uploaded it).

    <dependencies>
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.8.2</version>
        <scope>test</scope>
        <type>jar</type>
      </dependency>
      <dependency>
        <groupId>org.apache.solr</groupId>
        <artifactId>solr-core</artifactId>
        <version>3.2-SNAPSHOT</version>
      </dependency>
      <dependency>
        <groupId>org.apache.solr</groupId>
        <artifactId>solr-test-framework</artifactId>
        <version>3.2-SNAPSHOT</version>
      </dependency>
    </dependencies>

    import org.junit.Before;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.util.AbstractSolrTestCase;

    public class SolrConfigTest extends AbstractSolrTestCase {

        public String getSchemaFile() {
            return "/conf/schema.xml";
        }

        public String getSolrConfigFile() {
            return "/conf/solrconfig.xml";
        }

        @Before
        @Override
        public void setUp() throws Exception {
            super.setUp();
            new EmbeddedSolrServer(h.getCoreContainer(), h.getCore().getName());
        }
    }

On Tue, May 17, 2011 at 2:38 PM, Colin Vipurs colin.vip...@shazamteam.com wrote:

I use the following:

    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-core</artifactId>
      <version>3.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-solrj</artifactId>
      <version>3.1.0</version>
    </dependency>
Re: How to test Solr Integration - how to get EmbeddedSolrServer?
On Tue, May 17, 2011 at 3:44 PM, Steven A Rowe sar...@syr.edu wrote:

Hi Gabriele,

On 5/17/2011 at 9:34 AM, Gabriele Kahlout wrote:

Solr Core should declare a test dependency on Solr Test Framework.

I agree:
- Solr Core should have a test-scope dependency on Solr Test Framework.
- Solr Test Framework should have a compile-scope dependency on Solr Core.

But Maven views this as a circular dependency.

I've seen that, but adding it with <scope>test</scope> works. The logic: the src is compiled first and then re-used (I'm assuming Maven does something smart about not including the full jar).

The workaround: Solr Core includes the source of Solr Test Framework as part of its test source code. It's not pretty, but it works. I'd be happy to entertain other (functional) approaches.

In the dp4j.com pom.xml I build in two phases to compile with the same annotations in the project itself (but I don't think we need that here).

Steve
Re: How do I modify XMLWriter to write foobar?
Got this sorted by checking out the branch revision.

On Thu, May 5, 2011 at 9:44 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

I've now tried to write my own QueryResponseWriter plugin [1], as a Maven project depending on Solr Core 3.1, which is the same version of Solr I've installed. It seems I'm not able to get rid of some cache.

    $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
    <queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter"/>
    <queryResponseWriter name="Test" class="com.mysimpatico.me.indexplugins.TestQueryResponseWriter" default="true"/>

I restarted Tomcat after changing solrconfig.xml and placing indexplugins.jar in $SOLR_HOME/

At Tomcat boot:

    INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/IndexPlugins.jar' to classloader

I get the legacy code of the plugin for both, and I don't understand why. At least the xml one should be different. Why could this be? How can I find out?

    http://localhost:8080/solr/select?q=apache&wt=Test
and
    http://localhost:8080/solr/select?q=apache&wt=xml

    XML Parsing Error: syntax error
    Location: http://localhost:8080/solr/select?q=apache&wt=xml (//Test
    Line Number 1, Column 1:
    foobarresponseHeaderstatusQTimeparamsqapachewtxmlresponse00foobar
    ^

It seems the new code for TestQueryResponseWriter [1] is never executed, since I added a severe log statement that doesn't appear in the Tomcat logs. Where are those caches? Thank you in advance.

[1]

    package com.mysimpatico.me.indexplugins;

    import java.io.*;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import org.apache.solr.request.XMLResponseWriter;

    /**
     * Hello world!
     */
    public class TestQueryResponseWriter extends XMLResponseWriter {

        @Override
        public void write(Writer writer,
                          org.apache.solr.request.SolrQueryRequest request,
                          org.apache.solr.response.SolrQueryResponse response) throws IOException {
            Logger.getLogger(TestQueryResponseWriter.class.getName())
                  .log(Level.SEVERE, "Hello from TestQueryResponseWriter");
            super.write(writer, request, response);
        }
    }

On Thu, May 5, 2011 at 9:01 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
: <queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter" default="true"/>
:
: Now I comment out the line in solrconfig.xml, and there's no more writer.
: $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
:
: I make a query, and the XMLResponseWriter is still in charge.
: $ curl -L http://localhost:8080/solr/select?q=apache
: <?xml version="1.0" encoding="UTF-8"?>
...

Your example request is not specifying a "wt" param.

In addition to the response writers declared in your solrconfig.xml, there are response writers that exist implicitly unless you define your own instances that override those names (xml, json, python, etc...)

The real question is: what writer do you *want* to have used when no wt is specified? Whatever the answer is: declare an instance of that writer with default="true" in your solrconfig.xml

-Hoss
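The selection rule Hoss describes can be sketched as a small lookup table. This is a toy model, not Solr's code: the class WriterLookup and its methods are hypothetical, the writers are represented by plain strings, and two choices are assumptions rather than statements from the thread, namely that the implicit xml writer is the default when nothing is declared with default="true", and that an unknown wt name falls back to the default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WriterLookup {

    // writer name -> description of the implementation (a stand-in for a real writer object)
    private final Map<String, String> writers = new LinkedHashMap<String, String>();
    private String defaultName = "xml"; // assumption: xml is used when no default is declared

    public WriterLookup() {
        // Writers that exist implicitly, without any solrconfig.xml entry.
        writers.put("xml", "implicit XML writer");
        writers.put("json", "implicit JSON writer");
        writers.put("python", "implicit Python writer");
    }

    /** Models a queryResponseWriter entry in solrconfig.xml; the same name overrides an implicit writer. */
    public void declare(String name, String implementation, boolean isDefault) {
        writers.put(name, implementation);
        if (isDefault) {
            defaultName = name;
        }
    }

    /** Picks the writer for a request; wt is null when the request has no wt parameter. */
    public String resolve(String wt) {
        String byName = (wt == null) ? null : writers.get(wt);
        // assumption: an unknown wt falls back to the default writer
        return (byName != null) ? byName : writers.get(defaultName);
    }

    public static void main(String[] args) {
        WriterLookup lookup = new WriterLookup();
        System.out.println(lookup.resolve(null));   // no wt, nothing declared

        lookup.declare("Test", "TestQueryResponseWriter", true);
        System.out.println(lookup.resolve(null));   // no wt, Test declared as default
        System.out.println(lookup.resolve("xml"));  // explicit wt=xml still selects the xml writer
    }
}
```

In this model, commenting the xml entry out of the configuration changes nothing for a request without wt, which matches the behaviour reported in the quoted curl session.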
How to get the filtered terms from a Query in the ResponseWriter?
Hello,

For a given q string I'm trying to extract the terms (identifiers of tokens) that the query parser identified as terms (and shows when explaining results). I manage to do it as follows, but I hope there is a better (more direct) way that you will tell me about:

    NamedList analysis = new FieldAnalysisRequestHandler().doAnalysis(request);
    // doAnalysis is protected; I should extend the handler with my own dummy subclass
    // to get around that, but for now I just hack it
    SimpleOrderedMap fieldsMap = (SimpleOrderedMap) analysis.get("field_names");
    SimpleOrderedMap contentMap = (SimpleOrderedMap) fieldsMap.get("content");
    final Set terms = new HashSet();
    for (Object object : contentMap) {
        List termsList = (List) object;
        for (Object object1 : termsList) {
            SimpleOrderedMap termMap = (SimpleOrderedMap) object1;
            // actually I want the intersection of the terms returned here
            // (i.e. those that made it through all the filters), not the union
            terms.add((String) termMap.get("text"));
        }
    }
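The intersection asked for above can be sketched with plain collections. This does not call Solr: the class TermIntersection is hypothetical, and the per-stage token lists are made-up stand-ins for what the analysis output would contain for each tokenizer or filter stage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TermIntersection {

    /** Returns the terms that appear in every stage, in the order of the first stage. */
    public static Set<String> survivingTerms(List<List<String>> stages) {
        Set<String> surviving = new LinkedHashSet<String>();
        if (stages.isEmpty()) {
            return surviving;
        }
        surviving.addAll(stages.get(0));
        for (List<String> stage : stages.subList(1, stages.size())) {
            surviving.retainAll(stage); // drop every term that is missing from this stage
        }
        return surviving;
    }

    public static void main(String[] args) {
        List<List<String>> stages = new ArrayList<List<String>>();
        stages.add(Arrays.asList("the", "apache", "solr")); // hypothetical tokenizer output
        stages.add(Arrays.asList("apache", "solr"));        // hypothetical output after a stop filter
        stages.add(Arrays.asList("apache", "solr"));        // hypothetical output after a later filter
        System.out.println(survivingTerms(stages));         // prints [apache, solr]
    }
}
```

One caveat: a filter that rewrites tokens (lowercasing, stemming) makes the rewritten form absent from earlier stages, so a strict intersection would drop it. If that matters, taking only the last stage's tokens may be closer to what the query actually uses.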
How to plugin the value of a Field? DocInverterPerField?
Hello,

I'm trying to add an extra field to schema.xml that is only stored, but since Nutch doesn't know about it, I don't know how to tell Solr its value for each document. I'd like to plug in the computation, something like what is done with Similarity, but I'm not sure how to do that.

From SOLR-1566 (https://issues.apache.org/jira/browse/SOLR-1566):

    Currently it is not possible for components to add fields to outgoing documents which are not in the the stored fields of the document.

That's my next problem, but let's say I'm okay with storing the field: how do I do that?

BTW, I tried hacking the code to add the fields at response time to the defaultFields per document, but since it always has a cached document it won't add them (I could still force the adding, but I'm not sure what else would break as a result).
Re: How to plugin the value of a Field? DocInverterPerField?
It looks like I have to contact the <updateHandler class="solr.DirectUpdateHandler2"> with an AddUpdateCommand.
Re: How to plugin the value of a Field? DocInverterPerField?
I calculate it from search-time plus index-time field values. For example, say I want to print the reciprocal of the content field norm (available at index time) alongside every document in the results. What's the 'clean' way of doing that?

On Sat, May 14, 2011 at 3:42 PM, Markus Jelsma markus.jel...@openindex.io wrote:

I'm not sure what you're trying to do. Where does the field value need to come from?
Re: Want to Delete Existing Index create fresh index
I guess you are having issues with the dataDir. Did you set the dataDir in solrconfig.xml?

On Sat, May 14, 2011 at 4:10 PM, Pawan Darira pawan.dar...@gmail.com wrote:

Hi,

I am using Solr 1.4 and had changed the schema already. When I created the index for the first time, the directory was created automatically and the index was made perfectly fine.

Now I want to create the index from scratch, so I deleted the whole data/index directory and ran the script. Now it only creates empty directories, with NO index files inside.

Thanks
Pawan

On Sat, May 14, 2011 at 6:54 PM, Dmitry Kan dmitry@gmail.com wrote:

Hi Pawan,

Which SOLR version do you have installed?

It should be absolutely normal for the data/ sub directory to be created when starting up SOLR. So just go ahead and post your data into SOLR, if you have changed the schema already.

--
Regards,
Dmitry Kan

On Sat, May 14, 2011 at 4:01 PM, Pawan Darira pawan.dar...@gmail.com wrote:

I did that. The index directory is created, but there are no contents in it.

2011/5/14 François Schiettecatte fschietteca...@gmail.com

You can also shut down solr/lucene, do:

    rm -rf /YourIndexName/data/index

and restart; the index directory will be automatically recreated.

François

On May 14, 2011, at 1:53 AM, Gabriele Kahlout wrote:

    curl --fail $solrIndex/update?commit=true -d '<delete><query>*:*</query></delete>' #empty index [1]

[1] http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script

Did you try that?

On Sat, May 14, 2011 at 7:26 AM, Pawan Darira pawan.dar...@gmail.com wrote:

Hi,

I had an existing index created months back. Now my database schema has changed. I wanted to delete the current data/index directory and re-create a fresh index, but it says that the "segments" file was not found and just creates a blank data/index directory. Please help.

--
Thanks,
Pawan Darira
Re: How to plugin the value of a Field? DocInverterPerField?
Just reporting on progress: by hacking my own ResponseWriter I managed to add the field to the doc just in time before it's written. It's not that messy after all, and I suspect the fields could also be declared in schema.xml (if we want to be able to disable them at run-time), so that the value is computed and added only if they are present. As acknowledged by others before me, there's room for refactoring ResponseWriters to at least make them more reusable. I hope SOLR-1566 (https://issues.apache.org/jira/browse/SOLR-1566) comes up with a cleaner solution.

On Sat, May 14, 2011 at 3:55 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

I calculate it from search-time + index-time field values. For example, say I want to print the reciprocal of the content field norm (available at index-time) along with every document in the results. What's the 'clean' way of doing that?

On Sat, May 14, 2011 at 3:42 PM, Markus Jelsma markus.jel...@openindex.io wrote:

I'm not sure what you're trying to do. Where does the field value need to come from?

Hello, I'm trying to add an extra field to the schema.xml that is only stored, but with nutch not knowing about it, I don't know how to tell Solr its value for each document. I'd like to plug in the computation, something like what is done with Similarity, but I'm not sure how to do that.

From SOLR-1566 (https://issues.apache.org/jira/browse/SOLR-1566): "Currently it is not possible for components to add fields to outgoing documents which are not in the stored fields of the document."

That's my next problem, but let's say I'm okay with storing the field: how do I do that?

BTW, I tried hacking the code to add the fields at response time to the defaultFields per document, but since it always has a cached document it won't add them (I could still force the adding, but I'm not sure what else will break as a result).

-- Regards, K. Gabriele
Editor loads wrong version of IndexSearcher while debugging - how to fix?
Hello, I'm debugging Solr built as a Maven project in NetBeans, and when I step into the code of a Lucene dependency, namely org.apache.lucene.search.IndexSearcher.explain(..), the call stack expects this method to be at line 599, while in the editor the class ends at line 304.

From solr-core's pom.xml:

    <dependency>
      <groupId>${project.groupId}</groupId>
      <artifactId>solr-solrj</artifactId>
      <version>${project.version}</version>
    </dependency>

From solrj's pom.xml:

    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>${project.version}</version>
    </dependency>

Looking up the actual class, it is indeed an 846-line class, and the editor is loading a faulty version of sources.jar (download sourcecode). So the code in the sources.jar doesn't correspond to the binary code.

Now the big question is: why do I get sources different from the binary of the same version for a dependency? How else could this be debugged? I don't know how NetBeans downloads a dependency's sources (from googling, it seems that each IDE has its own plugin for doing that).

-- Regards, K. Gabriele
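One way to narrow down a mismatch like the one described above is to check, outside the IDE, whether the sources jar can even contain the line the stack frame points at. The following is a rough sketch in Python using only the standard `zipfile` module; the function names are made up for illustration, and it assumes the usual layout where `a.b.C` lives at `a/b/C.java` inside the jar.

```python
import zipfile


def source_line_count(sources_jar: str, class_name: str) -> int:
    """Count the lines of a class's .java file inside a sources jar."""
    entry = class_name.replace(".", "/") + ".java"
    with zipfile.ZipFile(sources_jar) as jar:
        with jar.open(entry) as src:
            return sum(1 for _ in src)


def sources_can_match(sources_jar: str, class_name: str, frame_line: int) -> bool:
    """Return False if the stack frame points past the end of the packaged source."""
    return frame_line <= source_line_count(sources_jar, class_name)
```

With the numbers from the message above (a frame at line 599 and a source file of 304 lines), `sources_can_match` would return False, which confirms that the jar on disk is the wrong one rather than the editor misreading it. A True result does not prove that the sources match the binary; it only rules out the obvious case.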
Re: Want to Delete Existing Index create fresh index
    curl --fail $solrIndex/update?commit=true -d '<delete><query>*:*</query></delete>' #empty index [1]

[1] http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script

Did you try?

On Sat, May 14, 2011 at 7:26 AM, Pawan Darira pawan.dar...@gmail.com wrote:

Hi, I had an existing index created months back. Now my database schema has changed. I wanted to delete the current data/index directory and re-create a fresh index, but it is saying that the segments file is not found and just creates a blank data/index directory. Please help.

-- Thanks, Pawan Darira

-- Regards, K. Gabriele
Re: Is it possible to build Solr as a maven project?
On Tue, May 10, 2011 at 3:56 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

On Tue, May 10, 2011 at 3:50 PM, Steven A Rowe sar...@syr.edu wrote:

Hi Gabriele,

There are some Maven instructions here (not in Lucene/Solr 3.1 because I just wrote the file a couple of days ago):
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/dev-tools/maven/README.maven

My recommendation, since the Solr 3.1 source tarball does not include dev-tools/, is to check out the 3.1-tagged sources from Subversion:

    svn co http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1

and then follow the instructions in the above-linked README.maven. I did that just now and it worked for me. The results are in solr/package/maven/.

I did that and I think it worked for me, but I didn't get nutch to work with it, so I preferred to revert to what is officially supported (not even, but...). I'll keep trying and report back.

Everything worked! These are the revisions used:

    $ svn co -r 1101526 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1 solr
    1086822
    $ svn co -r 1101540 http://svn.apache.org/repos/asf/nutch/branches/branch-1.3 nutch

Thank you

Please write back if you run into any problems.

Steve

From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
Sent: Tuesday, May 10, 2011 8:37 AM
To: boutr...@gmail.com
Cc: solr-user@lucene.apache.org; Steven A Rowe; ryan...@gmail.com
Subject: Re: Is it possible to build Solr as a maven project?

sorry, this was not the target I used (this one should work too, but...),

Can we expand on the but...?
Coord in queryExplain
Hello, I'm wondering why the results of coord() are not displayed when debugging query results, as described in the wiki [1]. I'd like to see it. Could someone point out how to make it appear with the debug fields?

[1] http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22

-- Regards, K. Gabriele
Re: Coord in queryExplain
You are right!

On Thu, May 12, 2011 at 2:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

I'm wondering why the results of coord() are not displayed when debugging query results, as described in the wiki [1]. I'd like to see it. Could someone point out how to make it appear with the debug fields?

[1] http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22

The coord info is displayed; however, it seems that it is not displayed for a value of 1.0. To see coord, issue a multi-word query and advance to the end of the list via the start param.

-- Regards, K. Gabriele
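To see why a single-word query never shows a coord line, it helps to recall that Lucene's default similarity computes coord as the number of query terms a document matched divided by the total number of query terms. The sketch below is a toy model of that ratio in Python, working on plain lists of terms; it is not the Lucene API, and the helper names are invented for illustration.

```python
def coord(overlap: int, max_overlap: int) -> float:
    """Fraction of the query terms that a document matched."""
    return overlap / max_overlap


def coord_for(query_terms, doc_terms) -> float:
    """Toy coord for a document, given query and document terms as iterables."""
    query = set(query_terms)
    return coord(len(query & set(doc_terms)), len(query))


# One query term: any matching document matches 1 of 1 terms.
print(coord_for(["wings"], ["wings", "angel"]))
# Two query terms, only one matched: the factor drops below 1.
print(coord_for(["wings", "hawk"], ["wings", "angel"]))
```

Under this model, every hit for a one-term query has coord 1.0, which is consistent with Ahmet's observation that the factor only shows up for multi-word queries, and mostly for documents near the end of the result list that match only some of the words.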
No more standard query type?
Is the tagged release of Solr 3.1 different from the one distributed on the downloads page? It looks like a reproducible bug.

    svn co -r 1101526 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1 solr

This is the default query I get from http://localhost:8080/solr/admin/form.jsp:

    http://localhost:8080/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

    HTTP Status 400 - unknown handler: standard
    type: Status report
    message: unknown handler: standard
    description: The request sent by the client was syntactically incorrect (unknown handler: standard).
    Apache Tomcat/6.0.29

I get the same with http://localhost:8080/solr/select?q=*%3A*&wt=standard&qt=standard, but not with http://localhost:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on (from http://localhost:8080/solr/admin), which gives a good response:

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">5</int>
        <lst name="params">
          <str name="indent">on</str>
          <str name="start">0</str>
          <str name="q">*:*</str>
          <str name="rows">10</str>
          <str name="version">2.2</str>
        </lst>
      </lst>
      <result name="response" numFound="0" start="0"/>
    </response>

On Thu, May 5, 2011 at 9:01 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
: <queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter" default="true"/>
:
: Now I comment the line in solrconfig.xml, and there's no more writer.
: $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
:
: I make a query, and the XMLResponseWriter is still in charge.
: $ curl -L http://localhost:8080/solr/select?q=apache
: <?xml version="1.0" encoding="UTF-8"?>
...

Your example request is not specifying a wt param. In addition to the response writers declared in your solrconfig.xml, there are response writers that exist implicitly unless you define your own instances that override those names (xml, json, python, etc...)

The real question is: what writer do you *want* to have used when no wt is specified? Whatever the answer is: declare an instance of that writer with default="true" in your solrconfig.xml.

-Hoss

-- Regards, K. Gabriele
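Since the failing and the working requests in this thread differ only in their query parameters, a small helper that builds the select URL from a dictionary makes that difference easy to see. This is a sketch in Python using the standard library; the base URL and the helper name `build_select_url` are assumptions for illustration. It leaves out `qt` and `wt`, so the server falls back to its default handler and default response writer.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url: str, params: dict) -> str:
    """Join a Solr base URL and query parameters into a /select URL."""
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Same parameters as the request that returned a good response above.
params = {"q": "*:*", "version": "2.2", "start": 0, "rows": 10, "indent": "on"}
url = build_select_url("http://localhost:8080/solr", params)
print(url)

# Round-trip check: the parameters decode back to what was passed in.
print(parse_qs(urlsplit(url).query))
```

Note that `urlencode` percent-encodes the `*` characters as well as the colon, so the printed URL will not look byte-for-byte like the one generated by the admin form, but it decodes to the same parameters.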
Re: Is it possible to build Solr as a maven project?
sorry, this was not the target I used (this one should work too, but...),

Can we expand on the but...?

    $ wget http://apache.panu.it//lucene/solr/3.1.0/apache-solr-3.1.0-src.tgz
    $ tar xf apache-solr-3.1.0-src.tgz
    $ cd apache-solr-3.1.0
    $ ant generate-maven-artifacts
    generate-maven-artifacts:
    get-maven-poms:
    BUILD FAILED
    /Users/simpatico/Downloads/apache-solr-3.1.0/build.xml:59: The following error occurred while executing this line:
    /Users/simpatico/Downloads/apache-solr-3.1.0/lucene/build.xml:445: The following error occurred while executing this line:
    /Users/simpatico/Downloads/apache-solr-3.1.0/build.xml:45: /Users/simpatico/Downloads/apache-solr-3.1.0/dev-tools/maven does not exist.

Now, for those who build this, it must have worked at some point. How? Or is this a bug in the release? Looking at the revision history of the build script, I might be referring to LUCENE-2490 (https://issues.apache.org/jira/browse/LUCENE-2490), but I'm not sure I can make out the solution. I've checked out dev-tools, but even with it things don't work (I tried the one from the 3.1.0 release).

the one I used is get-maven-poms. That will just create pom files and copy them to their right target locations. I'm using netbeans and I'm using the plugin Automatic Projects to do everything inside the IDE. Which version of Solr are you using ? Ludovic.

2011/5/4 Gabriele Kahlout [via Lucene] ml-node+2898211-2124746009-383...@n3.nabble.com

    generate-maven-artifacts:
    [mkdir] Created dir: /Users/simpatico/SOLR_HOME/build/maven
    [mkdir] Created dir: /Users/simpatico/SOLR_HOME/dist/maven
    [copy] Copying 1 file to /Users/simpatico/SOLR_HOME/build/maven/src/maven
    [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
    BUILD FAILED
    /Users/simpatico/SOLR_HOME/build.xml:800: The following error occurred while executing this line:
    /Users/simpatico/SOLR_HOME/common-build.xml:274: artifact:deploy doesn't support the uniqueVersion attribute

build.xml:800:

    <m2-deploy pom.xml="src/maven/solr-parent-pom.xml.template"/>

After removing the uniqueVersion attribute:

    generate-maven-artifacts:
    [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
    [artifact:deploy] Deploying to file:///Users/simpatico/SOLR_HOME/dist/maven
    [artifact:deploy] [INFO] Retrieving previous build number from remote
    [artifact:deploy] [INFO] Retrieving previous metadata from remote
    [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.solr:solr-parent'
    [artifact:deploy] [INFO] Retrieving previous metadata from remote
    [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.solr:solr-parent:1.4.2-SNAPSHOT'
    [copy] Copying 1 file to /Users/simpatico/SOLR_HOME/build/maven/lib
    [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
    [artifact:deploy] Deploying to file:///Users/simpatico/SOLR_HOME/dist/maven
    [artifact:deploy] [INFO] Retrieving previous build number from remote
    [artifact:deploy] [INFO] Retrieving previous metadata from remote
    [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.solr:solr-commons-csv'
    [artifact:deploy] [INFO] Retrieving previous metadata from remote
    [artifact:deploy] [INFO] Uploading project information for solr-commons-csv 1.4.2-SNAPSHOT
    [artifact:deploy] [INFO] Retrieving previous metadata from remote
    [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.solr:solr-commons-csv:1.4.2-SNAPSHOT'
    [copy] Copying 1 file to /Users/simpatico/SOLR_HOME/build/maven/contrib/dataimporthandler
    [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
    BUILD FAILED
    /Users/simpatico/SOLR_HOME/build.xml:809: The following error occurred while executing this line:
    /Users/simpatico/SOLR_HOME/common-build.xml:274: artifact:deploy doesn't support the nested attach element

- Jouve France.

-- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-build-Solr-as-a-maven-project-tp2898068p2898315.html
Sent from the Solr - User mailing list archive at Nabble.com.

-- Regards, K. Gabriele
Re: Is it possible to build Solr as a maven project?
On Tue, May 10, 2011 at 3:50 PM, Steven A Rowe sar...@syr.edu wrote:

Hi Gabriele,

There are some Maven instructions here (not in Lucene/Solr 3.1 because I just wrote the file a couple of days ago):
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/dev-tools/maven/README.maven

My recommendation, since the Solr 3.1 source tarball does not include dev-tools/, is to check out the 3.1-tagged sources from Subversion:

    svn co http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1

and then follow the instructions in the above-linked README.maven. I did that just now and it worked for me. The results are in solr/package/maven/.

I did that and I think it worked for me, but I didn't get nutch to work with it, so I preferred to revert to what is officially supported (not even, but...). I'll keep trying and report back. Thank you in advance.

Please write back if you run into any problems.

Steve

From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
Sent: Tuesday, May 10, 2011 8:37 AM
To: boutr...@gmail.com
Cc: solr-user@lucene.apache.org; Steven A Rowe; ryan...@gmail.com
Subject: Re: Is it possible to build Solr as a maven project?

sorry, this was not the target I used (this one should work too, but...),

Can we expand on the but...?

    $ wget http://apache.panu.it//lucene/solr/3.1.0/apache-solr-3.1.0-src.tgz
    $ tar xf apache-solr-3.1.0-src.tgz
    $ cd apache-solr-3.1.0
    $ ant generate-maven-artifacts
    generate-maven-artifacts:
    get-maven-poms:
    BUILD FAILED
    /Users/simpatico/Downloads/apache-solr-3.1.0/build.xml:59: The following error occurred while executing this line:
    /Users/simpatico/Downloads/apache-solr-3.1.0/lucene/build.xml:445: The following error occurred while executing this line:
    /Users/simpatico/Downloads/apache-solr-3.1.0/build.xml:45: /Users/simpatico/Downloads/apache-solr-3.1.0/dev-tools/maven does not exist.

Now for those that build this, it must have worked sometime. How?
SolrHome ends with /./ - is this normal?
Hello, I'm having trouble getting Solr 3.1 to work with nutch-1.3. I'm not sure where the problem is, but I'm wondering why the solrHome path ends with /./.

    cwd=/Applications/NetBeans/apache-tomcat-7.0.6/bin
    SolrHome=/Users/simpatico/apache-solr-3.1.0/solr/./

In the web.xml of solr:

    <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-value>${user.home}/apache-solr-3.1.0/solr</env-entry-value>
      <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>

-- Regards, K. Gabriele
Re: SolrHome ends with /./ - is this normal?
It apparently is normal, and my issue is indeed with nutch. I've modified post.sh from the example docs to use the solr at http://localhost:8080/apache-solr-3.1-SNAPSHOT and now data has finally made it to the index.

    $ post.sh solr.xml monitor.xml

With nutch I'm at:

    $ svn info
    Path: .
    URL: http://svn.apache.org/repos/asf/nutch/branches/branch-1.3
    Repository Root: http://svn.apache.org/repos/asf
    Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
    Revision: 1101459
    Node Kind: directory
    Schedule: normal
    Last Changed Author: markus
    Last Changed Rev: 1101280
    Last Changed Date: 2011-05-10 02:46:04 +0200 (Tue, 10 May 2011)

Does this work for you? All I've done is svn co nutch 1.3 and execute my script, which up to now worked.

On Tue, May 10, 2011 at 4:11 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

Hello, I'm having trouble getting Solr 3.1 to work with nutch-1.3. I'm not sure where the problem is, but I'm wondering why the solrHome path ends with /./.

    cwd=/Applications/NetBeans/apache-tomcat-7.0.6/bin
    SolrHome=/Users/simpatico/apache-solr-3.1.0/solr/./

In the web.xml of solr:

    <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-value>${user.home}/apache-solr-3.1.0/solr</env-entry-value>
      <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>

-- Regards, K. Gabriele
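A quick way to convince yourself that the trailing /./ in the reported SolrHome is harmless is to normalize both paths and compare them. The sketch below does that in Python with `posixpath.normpath`, which is a purely lexical normalization; the helper name `same_directory` is made up for illustration.

```python
import posixpath


def same_directory(a: str, b: str) -> bool:
    """Compare two POSIX paths after lexical normalization ("." and "//" removed)."""
    return posixpath.normpath(a) == posixpath.normpath(b)


reported = "/Users/simpatico/apache-solr-3.1.0/solr/./"
configured = "/Users/simpatico/apache-solr-3.1.0/solr"

print(posixpath.normpath(reported))
print(same_directory(reported, configured))
```

A "." component always refers to the directory it appears in, so the path printed in the log and the one configured in web.xml point at the same place, which matches the conclusion above that the real problem lay elsewhere.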
Re: SolrHome ends with /./ - is this normal?
From solr logs:

May 10, 2011 4:33:20 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'content'
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:321)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.netbeans.modules.web.monitor.server.MonitorFilter.doFilter(MonitorFilter.java:393)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:550)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:380)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

In conf/schema.xml:

    <!-- fields for index-basic plugin -->
    <field name="host" type="url" stored="false" indexed="true"/>
    <field name="site" type="string" stored="false" indexed="true"/>
    <field name="url" type="url" stored="true" indexed="true" required="true"/>
    <field name="content" type="text" stored="false" indexed="true"/>

In conf/solrindex-mapping.xml:

    <fields>
      <field dest="content" source="content"/>

In recent solr I think this has been renamed into text? Solr's conf/schema.xml:

    via copyField further on in this schema -->
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

On Tue, May 10, 2011 at 4:30 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

It apparently is normal, and my issue is indeed with nutch. I've modified post.sh from the example docs to use the solr at http://localhost:8080/apache-solr-3.1-SNAPSHOT and now data has finally made it to the index.

    $ post.sh solr.xml monitor.xml

With nutch I'm at:

    $ svn info
    Path: .
    URL: http://svn.apache.org/repos/asf/nutch/branches/branch-1.3
    Repository Root: http://svn.apache.org/repos/asf
    Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
    Revision: 1101459
    Node Kind: directory
    Schedule: normal
    Last Changed Author: markus
    Last Changed Rev: 1101280
    Last Changed Date: 2011-05-10 02:46:04 +0200 (Tue, 10 May 2011)

Does this work for you? All I've done is svn co nutch 1.3 and execute my script, which up to now worked.
On Tue, May 10, 2011 at 4:11 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, I'm having trouble getting Solr 3.1 to work with nutch-1.3. I'm not sure where the problem is, but I'm wondering why does the solrHome path end with /./. cwd=/Applications/NetBeans/apache-tomcat-7.0.6/bin SolrHome=/Users/simpatico/apache-solr-3.1.0/solr/./ In the web.xml of solr: env-entry env-entry-namesolr/home/env-entry-name env-entry-value${user.home}/apache-solr-3.1.0/solr/env-entry-value env-entry-typejava.lang.String/env-entry-type /env-entry -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I
Re: SolrHome ends with /./ - is this normal?
I don't get you, are you talking about conf/schema.xml? That's what I'm referring to. Am I supposed to do something with Nutch's conf/schema.xml?

On Tue, May 10, 2011 at 4:46 PM, Markus Jelsma markus.jel...@openindex.io wrote:

There is a working example schema in Nutch's conf directory.
Re: SolrHome ends with /./ - is this normal?
You mean that I should copy it from Nutch into Solr?

    $ cp $NUTCH_HOME/conf/schema.xml $SOLR_HOME/conf/schema.xml

After restarting Tomcat and re-executing the script, nothing changed.

On Tue, May 10, 2011 at 5:35 PM, Markus Jelsma markus.jel...@openindex.io wrote:

You need to use the schema.xml shipped with Nutch in Solr. It provides most fields that you need.
Re: SolrHome ends with /./ - is this normal?
Actually something changed: I managed to crawl and index some pages (the others must have to do with regex-urls). Thank you!

Was this always necessary? Any pointer discussing why it's needed?
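The `unknown field 'content'` error in this thread means that a document carried a field which the Solr schema did not declare. As a rough diagnostic, and not something shipped with Nutch or Solr, the sketch below lists the `dest` fields of a solrindex-mapping.xml that are missing from a schema.xml. The function name `missing_fields` and the two shortened sample documents are hypothetical, and the check ignores `dynamicField` patterns.

```python
import xml.etree.ElementTree as ET


def missing_fields(schema_xml: str, mapping_xml: str) -> list:
    """Return the mapping 'dest' fields that the schema does not declare."""
    schema = ET.fromstring(schema_xml.strip())
    # Only plain <field> declarations are considered (assumption: no dynamicField).
    declared = {f.get("name") for f in schema.iter("field")}
    mapping = ET.fromstring(mapping_xml.strip())
    wanted = {f.get("dest") for f in mapping.iter("field")}
    return sorted(wanted - declared)


# Hypothetical, heavily shortened stand-ins for the real files.
SOLR_EXAMPLE_SCHEMA = """
<schema name="example" version="1.3">
  <fields>
    <field name="url" type="string" indexed="true" stored="true"/>
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>
</schema>
"""

NUTCH_MAPPING = """
<mapping>
  <fields>
    <field dest="content" source="content"/>
    <field dest="url" source="url"/>
  </fields>
</mapping>
"""

if __name__ == "__main__":
    print(missing_fields(SOLR_EXAMPLE_SCHEMA, NUTCH_MAPPING))
```

With the real files, one would read `$SOLR_HOME/conf/schema.xml` and Nutch's `conf/solrindex-mapping.xml` from disk and pass their contents in; an empty result suggests the schema covers every mapped field.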
Re: Solr 4.0
REPOST as a more general question about ivy dependencies: http://stackoverflow.com/questions/5941789/do-ivy-dependency-revisions-have-anything-to-do-with-svns

On Mon, May 9, 2011 at 11:31 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

I think you are talking about this dependency:

    <dependency org="org.apache.solr" name="solr-solrj" *rev="1.4.1"* conf="*->default"/>

I've checked out solr 4 svn revision 1099940 [1]. What value should I use for rev?

[1] http://lucene.472066.n3.nabble.com/Is-it-possible-to-build-Solr-as-a-maven-project-tp2898068p2905051.html

On Tue, Apr 19, 2011 at 2:48 PM, Julien Nioche lists.digitalpeb...@gmail.com wrote:

You need to change the version of SOLR in ivy/ivy.xml then rebuild, unless you change the jars straight in nutch-1.3/runtime/local/lib, assuming that you're running Nutch locally only.

On 19 April 2011 07:09, Haspadar haspa...@gmail.com wrote:

Yes, it occurred after removing the SolrJ 1.4 jar and copying the 4.0 version. Before that I upgraded Nutch for Solr 3.1 the same way and all worked fine. Thanks

2011/4/19 Markus Jelsma markus.jel...@openindex.io

Hi,

Hello. I'm using Nutch 1.3. I decided to upgrade Solr to version 4.0 and I replaced Nutch libs (Snapshot and SolrJ) from Solr dist. After that I got the error at SolrIndexer on Reduce stage:

    11/04/19 01:47:19 INFO mapred.JobClient: map 100% reduce 27%
    11/04/19 01:47:21 INFO mapred.JobClient: Task Id : attempt_201104190142_0009_r_00_0, Status : FAILED
    org.apache.solr.common.SolrException: ERROR: [doc=http://www.site.net/] Error adding field 'tstamp'='2011-04-18T22:45:17.404Z'
    ERROR: [doc=http://www.site.net/] Error adding field 'tstamp'='2011-04-18T22:45:17.404Z'
    request: http://127.0.0.1:8983/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:50)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

If you are using Solr 1.4.x then you must upgrade the SolrJ jars in Nutch. Solr 1.4.x and higher are not compatible. Just remove the 1.4.x jars and copy over the new ones.

I tried to remove tstamp from solrindex-mapping.xml and Solr's schema.xml. But this field is required in schema.xml and I got the error:

    11/04/19 01:58:03 INFO mapred.JobClient: Task Id : attempt_201104190142_0010_r_00_0, Status : FAILED
    org.apache.solr.common.SolrException: ERROR: [doc=http://www.site.net/] unknown field 'tstamp'
    ERROR: [doc=http://www.site.net/] unknown field 'tstamp'

Removing a mapping doesn't mean the field isn't copied over. All unmapped fields are copied as is. The example mapping seems rather useless as it copies exact field names. It's only useful if your source fields and destination fields are actually different, which is usually not the case if you dedicate a Solr core for a Nutch crawl. You must either not create the field by some plugin or add the field to your Solr index.

I'm surprised this error actually showed up considering the incompatible Javabin versions. Perhaps you already upgraded the SolrJ api?

How can I upgrade Solr to version 4? Thank you.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

--
Regards,
K. Gabriele
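Markus's remark that unmapped fields are copied as is can be illustrated with a small sketch. This is not Nutch's code: `apply_mapping` and the sample document are hypothetical, and the mapping is modelled as a plain source-to-destination dictionary rather than the real solrindex-mapping.xml.

```python
def apply_mapping(doc: dict, mapping: dict) -> dict:
    """Rename mapped source fields to their destination; copy all others unchanged."""
    return {mapping.get(name, name): value for name, value in doc.items()}


# Hypothetical document as an indexing plugin might produce it.
doc = {"url": "http://www.site.net/", "tstamp": "2011-04-18T22:45:17.404Z"}

# 'tstamp' has been removed from the mapping, yet it still reaches Solr,
# so the schema must declare it (or the plugin must not create it).
sent = apply_mapping(doc, {"url": "url"})

# A mapping only changes anything when source and destination names differ.
renamed = apply_mapping(doc, {"url": "id"})

if __name__ == "__main__":
    print(sent)
    print(renamed)
```

Under this model, deleting a mapping entry never removes a field from the outgoing document, which matches the `unknown field 'tstamp'` error reported after the mapping was edited.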
Why is org.apache.solr.response.XMLWriter final?
Hello, It's final in the trunk, and has always been since conception in 2006 at revision 372455. Why?

--
Regards,
K. Gabriele
How do I debug Unable to evaluate expression using this context printed at start?
I've tried to re-install solr on tomcat, and now when I launch tomcat in debug mode I see the following exception relating to solr. It's not enough to understand the problem (and fix it), but I don't know where to look for more (or what to do). Please help me.

Following the tutorial and discussion here, this is my context descriptor (solr.xml):

    <?xml version="1.0" encoding="utf-8"?>
    <Context docBase="/Users/simpatico/SOLR_HOME/dist/solr.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String" value="/Users/simpatico/SOLR_HOME" override="true"/>
    </Context>

(the war exists)

    $ ls $SOLR_HOME/dist/solr.war
    /Users/simpatico/SOLR_HOME//dist/solr.war
    $ ls $SOLR_HOME/conf/solrconfig.xml
    /Users/simpatico/SOLR_HOME//conf/solrconfig.xml

When Tomcat starts:

    INFO: Using JNDI solr.home: /Users/simpatico/SOLR_HOME
    May 5, 2011 2:46:50 PM org.apache.solr.core.SolrResourceLoader init
    INFO: Solr home set to '/Users/simpatico/SOLR_HOME/'
    ...
    INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/wstx-asl-3.2.7.jar' to classloader
    May 5, 2011 2:46:50 PM org.apache.solr.common.SolrException log
    SEVERE: *javax.xml.transform.TransformerException: Unable to evaluate expression using this context*
        at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363)
        at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
        at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
        at org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
        at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
        at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
        at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
        at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:680)
    Caused by: java.lang.RuntimeException: Unable to evaluate expression using this context
        at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
        at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
        at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
        ... 18 more
    ---
    java.lang.RuntimeException: Unable to evaluate expression using this context
        at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
        at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
        at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
        at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
        at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
        at org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
        at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
        at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
        at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
        at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:680)
    ---
Re: Is it possible to build Solr as a maven project?
Okay, that sequence worked, but then shouldn't I be able to do $ mvn install afterwards? This is what I get:

    ...
    Compiling 478 source files to /Users/simpatico/debug/solr4/solr/build/solr
    COMPILATION ERROR :
    org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27] package com.google.common.io does not exist
    org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package com.google.common.collect does not exist
    org/apache/solr/spelling/suggest/fst/FSTLookup.java:[29,27] package com.google.common.io does not exist
    org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[29,4] cannot find symbol
      symbol  : variable ByteStreams
      location: class org.apache.solr.spelling.suggest.fst.InputStreamDataInput
    org/apache/solr/spelling/suggest/fst/FSTLookup.java:[128,57] cannot find symbol
      symbol  : variable Lists
      location: class org.apache.solr.spelling.suggest.fst.FSTLookup
    org/apache/solr/spelling/suggest/fst/FSTLookup.java:[170,26] cannot find symbol
      symbol  : variable Lists
      location: class org.apache.solr.spelling.suggest.fst.FSTLookup
    org/apache/solr/spelling/suggest/fst/FSTLookup.java:[203,35] cannot find symbol
      symbol  : variable Lists
      location: class org.apache.solr.spelling.suggest.fst.FSTLookup
    org/apache/solr/spelling/suggest/fst/FSTLookup.java:[529,6] cannot find symbol
      symbol  : variable Closeables
      location: class org.apache.solr.spelling.suggest.fst.FSTLookup
    org/apache/solr/spelling/suggest/fst/FSTLookup.java:[551,6] cannot find symbol
      symbol  : variable Closeables
      location: class org.apache.solr.spelling.suggest.fst.FSTLookup
    9 errors

    Reactor Summary:
    Grandparent POM for Apache Lucene Java and Apache Solr ... SUCCESS [13.255s]
    Lucene parent POM ... SUCCESS [0.199s]
    Lucene Core ... SUCCESS [15.528s]
    Lucene Test Framework ... SUCCESS [4.657s]
    Lucene Common Analyzers ... SUCCESS [16.770s]
    Lucene Contrib Ant ... SUCCESS [1.103s]
    Lucene Contrib bdb ... SUCCESS [0.883s]
    Lucene Contrib bdb-je ... SUCCESS [0.872s]
    Lucene Database aggregator POM ... SUCCESS [0.091s]
    Lucene Demo ... SUCCESS [0.842s]
    Lucene Memory ... SUCCESS [0.726s]
    Lucene Queries ... SUCCESS [1.559s]
    Lucene Highlighter ... SUCCESS [3.007s]
    Lucene InstantiatedIndex ... SUCCESS [1.224s]
    Lucene Lucli ... SUCCESS [1.579s]
    Lucene Miscellaneous ... SUCCESS [1.163s]
    Lucene Query Parser ... SUCCESS [4.274s]
    Lucene Spatial ... SUCCESS [1.159s]
    Lucene Spellchecker ... SUCCESS [0.841s]
    Lucene Swing ... SUCCESS [1.177s]
    Lucene Wordnet ... SUCCESS [0.816s]
    Lucene XML Query Parser ... SUCCESS [1.197s]
    Lucene Contrib aggregator POM ... SUCCESS [0.079s]
    Lucene ICU Analysis Components ... SUCCESS [1.494s]
    Lucene Phonetic Filters ... SUCCESS [0.759s]
    Lucene Smart Chinese Analyzer ... SUCCESS [3.534s]
    Lucene Stempel Analyzer ... SUCCESS [1.537s]
    Lucene Analysis Modules aggregator POM ... SUCCESS [0.081s]
    Lucene Benchmark ... SUCCESS [3.693s]
    Lucene Modules aggregator POM ... SUCCESS [0.147s]
    Apache Solr parent POM ... SUCCESS [0.099s]
    Apache Solr Solrj ... SUCCESS [3.670s]
    Apache Solr Core ... FAILURE [7.842s]

On Thu, May 5, 2011 at 3:36 PM, Steven A Rowe sar...@syr.edu wrote:

Hi Gabriele,

The sequence should be

1. svn update
2. ant get-maven-poms
3. mvn -N -Pbootstrap install

I think you left out #2 - there was a very recent change to the POMs that affects the noggit jar name.

Steve

-Original Message-
From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
Sent: Thursday, May 05, 2011 1:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to build Solr as a maven project?

Thank you so much for this gem, David! I still don't manage to build though:

    $ svn update
    At revision 1099684.
    $ mvn clean
    $ mvn -N -Pbootstrap install
    [INFO] [INFO] BUILD FAILURE [INFO
Re: How do I debug Unable to evaluate expression using this context printed at start?
While the question remains valid, I found the reason for my problem. When backing up, I had saved Tomcat's descriptor file (solr.xml) in my $SOLR_HOME, and Solr was trying to read it, as described in the SolrCore wiki (http://wiki.apache.org/solr/CoreAdmin). What saved me was remembering Chris's earlier remark (http://markmail.org/thread/3y4zqieyjqfi5vl3). Thank you Chris!
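The mix-up described in this thread, a Tomcat context descriptor named solr.xml sitting in $SOLR_HOME where Solr looks for its own solr.xml, can be caught before starting Tomcat. The following is a minimal sketch, assuming that Solr's multicore solr.xml has a `<solr>` root element while a Tomcat context descriptor has a `<Context>` root; the function name `solr_xml_kind` and the returned labels are hypothetical.

```python
import xml.etree.ElementTree as ET


def solr_xml_kind(xml_text: str) -> str:
    """Classify a solr.xml by its root element."""
    # Encode to bytes so an XML declaration with an encoding is parsed cleanly.
    root = ET.fromstring(xml_text.strip().encode("utf-8")).tag
    if root == "solr":
        return "solr-cores"        # Solr's own core configuration
    if root == "Context":
        return "tomcat-context"    # Tomcat descriptor, does not belong in $SOLR_HOME
    return "unknown"


# The descriptor quoted in the thread.
TOMCAT_DESCRIPTOR = """<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/Users/simpatico/SOLR_HOME/dist/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/Users/simpatico/SOLR_HOME" override="true"/>
</Context>
"""

if __name__ == "__main__":
    print(solr_xml_kind(TOMCAT_DESCRIPTOR))
```

Running it against the contents of `$SOLR_HOME/solr.xml` and getting `tomcat-context` would point at the stray descriptor rather than at Solr itself.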
Re: Is it possible to build Solr as a maven project?
Steven, thank you!

$ mvn -DskipTests=true install

works!

[INFO] Reactor Summary:
[INFO]
[INFO] Grandparent POM for Apache Lucene Java and Apache Solr ... SUCCESS [13.142s]
[INFO] Lucene parent POM ... SUCCESS [0.345s]
[INFO] Lucene Core ... SUCCESS [18.448s]
[INFO] Lucene Test Framework ... SUCCESS [3.560s]
[INFO] Lucene Common Analyzers ... SUCCESS [7.739s]
[INFO] Lucene Contrib Ant ... SUCCESS [1.265s]
[INFO] Lucene Contrib bdb ... SUCCESS [1.332s]
[INFO] Lucene Contrib bdb-je ... SUCCESS [1.321s]
[INFO] Lucene Database aggregator POM ... SUCCESS [0.242s]
[INFO] Lucene Demo ... SUCCESS [1.813s]
[INFO] Lucene Memory ... SUCCESS [2.412s]
[INFO] Lucene Queries ... SUCCESS [2.275s]
[INFO] Lucene Highlighter ... SUCCESS [2.985s]
[INFO] Lucene InstantiatedIndex ... SUCCESS [2.170s]
[INFO] Lucene Lucli ... SUCCESS [1.814s]
[INFO] Lucene Miscellaneous ... SUCCESS [1.998s]
[INFO] Lucene Query Parser ... SUCCESS [2.755s]
[INFO] Lucene Spatial ... SUCCESS [1.314s]
[INFO] Lucene Spellchecker ... SUCCESS [1.535s]
[INFO] Lucene Swing ... SUCCESS [1.233s]
[INFO] Lucene Wordnet ... SUCCESS [1.309s]
[INFO] Lucene XML Query Parser ... SUCCESS [1.483s]
[INFO] Lucene Contrib aggregator POM ... SUCCESS [0.151s]
[INFO] Lucene ICU Analysis Components ... SUCCESS [2.728s]
[INFO] Lucene Phonetic Filters ... SUCCESS [1.765s]
[INFO] Lucene Smart Chinese Analyzer ... SUCCESS [3.709s]
[INFO] Lucene Stempel Analyzer ... SUCCESS [4.241s]
[INFO] Lucene Analysis Modules aggregator POM ... SUCCESS [0.213s]
[INFO] Lucene Benchmark ... SUCCESS [2.926s]
[INFO] Lucene Modules aggregator POM ... SUCCESS [0.307s]
[INFO] Apache Solr parent POM ... SUCCESS [0.233s]
[INFO] Apache Solr Solrj ... SUCCESS [3.780s]
[INFO] Apache Solr Core ... SUCCESS [9.693s]
[INFO] Apache Solr Search Server ... SUCCESS [6.739s]
[INFO] Apache Solr Test Framework ... SUCCESS [2.699s]
[INFO] Apache Solr Analysis Extras ... SUCCESS [3.868s]
[INFO] Apache Solr Clustering ... SUCCESS [6.736s]
[INFO] Apache Solr DataImportHandler ... SUCCESS [4.914s]
[INFO] Apache Solr DataImportHandler Extras ... SUCCESS [2.721s]
[INFO] Apache Solr DataImportHandler aggregator POM ... SUCCESS [0.253s]
[INFO] Apache Solr Content Extraction Library ... SUCCESS [1.909s]
[INFO] Apache Solr - UIMA integration ... SUCCESS [1.922s]
[INFO] Apache Solr Contrib aggregator POM ... SUCCESS [0.211s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 2:18.040s
[INFO] Finished at: Thu May 05 20:39:09 CEST 2011
[INFO] Final Memory: 38M/90M
[INFO]

On Thu, May 5, 2011 at 6:53 PM, Steven A Rowe sar...@syr.edu wrote:

Hi Gabriele,

On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:

Okay, that sequence worked, but then shouldn't I be able to do $ mvn install afterwards? This is what I get:

...
COMPILATION ERROR :
org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27] package com.google.common.io does not exist
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package com.google.common.collect does not exist
...

mvn install should work, but it doesn't - I can reproduce this error on my machine. This is a bug in the Maven build.

The nightly Lucene/Solr Maven build on Jenkins should have caught this compilation failure three weeks ago, when Dawid Weiss committed his work under https://issues.apache.org/jira/browse/SOLR-2378. Unfortunately, the nightly builds were using the results of compilation under the Ant build, rather than compiling from scratch. I have committed a fix to the nightly build script so this won't happen again.

The Maven build bug is that the Solr-core Google Guava dependency was scoped as test-only. Until SOLR-2378, that was true, but it is no longer. So
Re: How do i I modify XMLWriter to write foobar?
I've now tried to write my own QueryResponseWriter plugin [1], as a Maven project depending on Solr Core 3.1, which is the same version of Solr I have installed. It seems I'm not able to get rid of some cache.

$ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
<queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter"/>
<queryResponseWriter name="Test" class="com.mysimpatico.me.indexplugins.TestQueryResponseWriter" default="true"/>

I restarted Tomcat after changing solrconfig.xml and placing indexplugins.jar in $SOLR_HOME/

At Tomcat boot:

INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/IndexPlugins.jar' to classloader

I get the legacy code of the plugin for both writers, and I don't understand why. At least the XML should be different. Why could this be? How can I find out?

http://localhost:8080/solr/select?q=apache&wt=Test
and
http://localhost:8080/solr/select?q=apache&wt=xml

XML Parsing Error: syntax error
Location: http://localhost:8080/solr/select?q=apache&wt=xml (//Test
Line Number 1, Column 1:
foobarresponseHeaderstatusQTimeparamsqapachewtxmlresponse00foobar
^

The new code for TestQueryResponseWriter [1] seems never to be executed, since I added a SEVERE log statement that doesn't appear in the Tomcat logs. Where are those caches?

Thank you in advance.

[1]
    package com.mysimpatico.me.indexplugins;

    import java.io.*;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import org.apache.solr.request.XMLResponseWriter;

    /**
     * Hello world!
     */
    public class TestQueryResponseWriter extends XMLResponseWriter {

        @Override
        public void write(Writer writer, org.apache.solr.request.SolrQueryRequest request, org.apache.solr.response.SolrQueryResponse response) throws IOException {
            // Log at SEVERE so the message stands out in the Tomcat logs if this writer is ever invoked.
            Logger.getLogger(TestQueryResponseWriter.class.getName()).log(Level.SEVERE, "Hello from TestQueryResponseWriter");
            super.write(writer, request, response);
        }
    }

On Thu, May 5, 2011 at 9:01 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
: <queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter" default="true"/>
:
: Now I comment the line in solrconfig.xml, and there's no more writer.
: $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
:
: I make a query, and the XMLResponseWriter is still in charge.
: $ curl -L http://localhost:8080/solr/select?q=apache
: <?xml version="1.0" encoding="UTF-8"?>
...

Your example request is not specifying a wt param.

In addition to the response writers declared in your solrconfig.xml, there are response writers that exist implicitly unless you define your own instances that override those names (xml, json, python, etc...).

The real question is: what writer do you *want* to have used when no wt is specified? Whatever the answer is: declare an instance of that writer with default=true in your solrconfig.xml.

-Hoss

-- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
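The selection rule described in the quoted reply (implicit writers exist by name, declared writers override them, and the writer marked default=true answers requests that carry no wt parameter) can be pictured with a small stand-alone model. This is only a sketch and not Solr's code: the class WriterLookup, its methods, and the "implicit ... writer" labels are hypothetical, and the fallback to the default for an unknown wt value is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified, hypothetical model of response-writer selection; NOT Solr's actual code. */
class WriterLookup {
    // Writer name -> label of the writer that would handle it.
    private final Map<String, String> writers = new LinkedHashMap<>();
    // Assumption: without an explicit default, the "xml" writer answers.
    private String defaultName = "xml";

    WriterLookup() {
        // Implicit writers exist even when solrconfig.xml declares none
        // (names taken from the reply: xml, json, python).
        writers.put("xml", "implicit xml writer");
        writers.put("json", "implicit json writer");
        writers.put("python", "implicit python writer");
    }

    /** Models one queryResponseWriter element; a declared name overrides an implicit one. */
    void declare(String name, String className, boolean isDefault) {
        writers.put(name, className);
        if (isDefault) {
            defaultName = name;
        }
    }

    /** Picks the writer for a request; wt == null means the request had no wt parameter. */
    String resolve(String wt) {
        if (wt == null || !writers.containsKey(wt)) {
            // Assumption: an unknown wt value falls back to the default writer.
            return writers.get(defaultName);
        }
        return writers.get(wt);
    }

    public static void main(String[] args) {
        WriterLookup lookup = new WriterLookup();
        lookup.declare("Test", "com.mysimpatico.me.indexplugins.TestQueryResponseWriter", true);
        System.out.println("no wt   -> " + lookup.resolve(null));
        System.out.println("wt=xml  -> " + lookup.resolve("xml"));
        System.out.println("wt=Test -> " + lookup.resolve("Test"));
    }
}
```

In this model a request such as select?q=apache (no wt) reaches the Test writer only because it was declared with default=true, while wt=xml still reaches the xml writer by name.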
Re: Is it possible to build Solr as a maven project?
Just for the reference.

$ svn update
At revision 1099940.

On Thu, May 5, 2011 at 9:14 PM, Steven A Rowe sar...@syr.edu wrote:

You're welcome, I'm glad you got it to work. - Steve
Is it possible to build Solr as a maven project?
Hello,

I'm trying to modify Solr, and I think debugging will be very useful for understanding what's going on. Hence I'd like to use an IDE (NetBeans) that automatically supports Maven projects. I see that there are templates under src/maven, but I'm not sure how to use them to mavenize the build/project. There is nothing on the Wiki. I've seen issue SOLR-19 and some older messages on the mailing list too. Any instructions?

-- Regards, K. Gabriele
Re: Is it possible to build Solr as a maven project?
generate-maven-artifacts:
    [mkdir] Created dir: /Users/simpatico/SOLR_HOME/build/maven
    [mkdir] Created dir: /Users/simpatico/SOLR_HOME/dist/maven
    [copy] Copying 1 file to /Users/simpatico/SOLR_HOME/build/maven/src/maven
    [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2

BUILD FAILED
/Users/simpatico/SOLR_HOME/build.xml:800: The following error occurred while executing this line:
/Users/simpatico/SOLR_HOME/common-build.xml:274: artifact:deploy doesn't support the uniqueVersion attribute

build.xml:800:
    <m2-deploy pom.xml="src/maven/solr-parent-pom.xml.template"/>

After removing the uniqueVersion attribute:

generate-maven-artifacts:
    [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
    [artifact:deploy] Deploying to file:///Users/simpatico/SOLR_HOME/dist/maven
    [artifact:deploy] [INFO] Retrieving previous build number from remote
    [artifact:deploy] [INFO] Retrieving previous metadata from remote
    [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.solr:solr-parent'
    [artifact:deploy] [INFO] Retrieving previous metadata from remote
    [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.solr:solr-parent:1.4.2-SNAPSHOT'
    [copy] Copying 1 file to /Users/simpatico/SOLR_HOME/build/maven/lib
    [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
    [artifact:deploy] Deploying to file:///Users/simpatico/SOLR_HOME/dist/maven
    [artifact:deploy] [INFO] Retrieving previous build number from remote
    [artifact:deploy] [INFO] Retrieving previous metadata from remote
    [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.solr:solr-commons-csv'
    [artifact:deploy] [INFO] Retrieving previous metadata from remote
    [artifact:deploy] [INFO] Uploading project information for solr-commons-csv 1.4.2-SNAPSHOT
    [artifact:deploy] [INFO] Retrieving previous metadata from remote
    [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.solr:solr-commons-csv:1.4.2-SNAPSHOT'
    [copy] Copying 1 file to /Users/simpatico/SOLR_HOME/build/maven/contrib/dataimporthandler
    [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2

BUILD FAILED
/Users/simpatico/SOLR_HOME/build.xml:809: The following error occurred while executing this line:
/Users/simpatico/SOLR_HOME/common-build.xml:274: artifact:deploy doesn't support the nested attach element

On Wed, May 4, 2011 at 11:50 AM, lboutros boutr...@gmail.com wrote:

In the ant script there is a target to generate maven's artifacts. After that, you will be able to open the project as a standard maven project.

Ludovic.
Re: Is it possible to build Solr as a maven project?
On Wed, May 4, 2011 at 1:11 PM, lboutros boutr...@gmail.com wrote:

> Oops, sorry, this was not the target I used (this one should work too, but...), the one I used is get-maven-poms. That will just create pom files and copy them to their right target locations.

I don't have a get-maven-poms target in my script.

> I'm using netbeans and I'm using the plugin Automatic Projects to do everything inside the IDE.
>
> Which version of Solr are you using?

The official latest: 3.1. Maybe I can copy-paste from the build script you are using?

> Ludovic.

-- Regards, K. Gabriele