Re: Problems with sandbox - can't find org.apache.lucene.store.IndexInput
I haven't checked the specifics, but many of the contrib projects ("sandbox" is the old name for contrib) have upgraded their latest code to build against the trunk of Lucene, which is destined to be Lucene 1.9. You'll need to either grab a previous JAR built before the codebase changed, or upgrade to the trunk of Lucene's Subversion repository all the way around.

Erik

On Dec 31, 2005, at 10:21 AM, Colin Young wrote:

> I'm attempting to compile Lucene with some sandbox code -- specifically the Berkeley DB index storage -- and I'm running into an issue where the code is attempting to import IndexInput (apparently located in org.apache.lucene.store.IndexInput), but I can't find it in the source anywhere. I'm not sure if the sandbox code is maybe using a more recent version of the Lucene code, or if I'm missing something obvious. My personal guess is that it's the latter.
>
> I'm using Lucene 1.4.3 source and the db directory from the source repository at the apache site.
>
> Thanks for any tips.
>
> Colin
>
> Notice: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
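A quick way to confirm which Lucene build is actually on the classpath is to probe for the class by name. The sketch below is only an illustration: `ClasspathProbe` and `isPresent` are names made up for this example and are not part of Lucene; the code relies solely on the standard `Class.forName` call. With a Lucene 1.4.3 JAR on the classpath, the probe for org.apache.lucene.store.IndexInput should come back false, which matches Colin not finding the class in the 1.4.3 source.

```java
// Hypothetical helper, not part of Lucene: reports whether a class can be
// loaded from the current classpath.
class ClasspathProbe {

    // Returns true if the named class is loadable, false otherwise.
    static boolean isPresent(String className) {
        try {
            // initialize=false: we only care whether the class exists,
            // not about running its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.lucene.store.IndexInput";
        System.out.println(name + " present: " + isPresent(name));
    }
}
```

Running it with the same classpath used for the contrib build shows whether that build sees a pre-1.9 JAR or the trunk code.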
My first question in 2006 :D
Hello everybody, and happy new year!

My first Lucene question of 2006 is this: what should I do about the message "No tvx file"?

Every night I run a complete reindexing of a phpBB forum. For example, when indexing 93 documents (posts in the phpBB forum), I see 4 "No tvx file" messages in my logs. The call that produces the message is this (contents is a String):

    Field CONTENTS = Field.UnStored("CONTENTS", contents, true);
    CONTENTS.setBoost((float) 0.2);
    lucene_doc.add(CONTENTS);

The problem is that when I'm working with an active forum, I can get nearly 200 "No tvx file" messages. My index is set with setCompoundFile(true).

What am I doing wrong? Or what can I do to avoid these messages?

Thanks for any reply.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: My first question in 2006 :D
Daniel,

you can simply ignore this message. It only says that you have term vectors enabled and are adding one or more "empty" documents without a body. If you don't need term vectors for any special operations on index terms, switch this feature off.

Bernhard
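Since the message is tied to documents without a body, one option is to skip empty posts before they reach the indexer. The following is a minimal sketch, assuming posts arrive as plain strings; `PostFilter`, `hasBody`, and `indexable` are hypothetical names for this example and belong to neither Lucene nor phpBB. The other option, switching term vectors off, would mean passing false instead of true as the third argument of Field.UnStored in Daniel's snippet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-indexing filter, not part of Lucene.
class PostFilter {

    // A post is worth indexing only if it has non-whitespace content.
    static boolean hasBody(String contents) {
        return contents != null && contents.trim().length() > 0;
    }

    // Returns the posts that have a body, in their original order.
    static List<String> indexable(List<String> posts) {
        List<String> kept = new ArrayList<String>();
        for (String post : posts) {
            if (hasBody(post)) {
                kept.add(post); // the Lucene Document would be built here
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> posts = new ArrayList<String>();
        posts.add("First post");
        posts.add("   ");
        posts.add(null);
        posts.add("Reply with text");
        System.out.println(indexable(posts)); // prints [First post, Reply with text]
    }
}
```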
RE: Problems with sandbox - can't find org.apache.lucene.store.IndexInput
That would probably explain things. Is 1.9 close, or are we still talking months away? Unfortunately, what I'm trying to do is use the code for Berkeley DB Java Edition which, as best I can tell, was only ported against the 1.9 code. So it looks like my choices are to do the port myself, or to check out 1.9 to see what the current issues are and how stable it is for my purposes.

Thanks

Colin Young

-----Original Message-----
From: Erik Hatcher
Sent: 2 January, 2006 05:12
To: java-user@lucene.apache.org
Subject: Re: Problems with sandbox - can't find org.apache.lucene.store.IndexInput

I haven't checked the specifics, but many of the contrib (the "sandbox" is the old name for it) projects have upgraded their latest code to be against the trunk of Lucene, which is destined to be Lucene 1.9. You'll need to either grab a previous JAR built before the codebase changed, or upgrade yourself to the trunk of Lucene's subversion repository all the way around.

Erik
Re: Problems with sandbox - can't find org.apache.lucene.store.IndexInput
Trunk of Lucene is very stable, more so than 1.4.3, I've heard.

Is the 1.9 release close? Hard to say. It could be. No substantial changes to the trunk are planned before 1.9 is officially released, as far as I know.

Erik

On Jan 2, 2006, at 3:51 PM, Colin Young wrote:

> That would probably explain things. Is 1.9 close, or are we still talking months away? Unfortunately, what I'm trying to do is use the code for Berkeley DB Java Edition which, as best I can tell, was only ported against the 1.9 code. So it looks like my choices are to do the port myself, or to check out 1.9 to see what the current issues are and how stable it is for my purposes.
>
> Thanks
>
> Colin Young
RE: Problems with sandbox - can't find org.apache.lucene.store.IndexInput
That's good enough for me. At this point, going with a reasonably stable branch rather than using my own code appears to be the more conservative option, considering our release timeframe (which allows for extensive testing).

Thanks for the help (and the excellent book).

Colin

-----Original Message-----
From: Erik Hatcher
Sent: 2 January, 2006 21:03
To: java-user@lucene.apache.org
Subject: Re: Problems with sandbox - can't find org.apache.lucene.store.IndexInput

Trunk of Lucene is very stable, more so than 1.4.3 I've heard. Is 1.9 release close? Hard to even say. It could be. No substantial changes to the trunk before 1.9 is officially released are planned that I know of.

Erik
Re: Query Scoring
Thank you, Chris. That seems like a good suggestion. I will try passing a different Query object to the Highlighter API than the one used for searching.

I plan to break down the HTML document and store the title, subtitle, and content in different fields of the index. So if I create a new query comparing company name and keywords against the title and content fields, I am assuming that the highlighter API will rank a fragment where both terms of the query match higher than fragments where just one term (either title or content) matches. I am also assuming that even if I do not increase the boost factor of any of the terms, the API will take care of this ranking. This is my understanding of the scoring/ranking algorithm. Any comments, anyone?

Thanks,
Harini

Chris Hostetter wrote:

> : My requirement is to show the relevant fragments of the news article for
> : each company along with the search results. But the highlighter api
> : sometimes picks up the fragments which are not so relevant to the news
> : article/company. I would like to know if there is any way that I can
> : modify the scoring/ranking of these fragments in such a way that the
> : news items in which a company name & keywords in the headline gets
> : assigned a very strong relevancy ranking, closely followed by a company
> : name mention in the first paragraph and a multiple-mention within the
> : entire story. Something like headline = 5 points, first paragraph =
> : four, etc.
>
> Well, the sample query you mentioned isn't checking any company names, or doing anything with a "keywords" field. I'm not too familiar with the way the highlighter package works, but I imagine that with the types of queries you said you are using, if you are highlighting the "Content" field, the CompanyId and the FilingDate clauses of your query will be fairly irrelevant (because they are numbers, not because they are different field names).
>
> An idea I've suggested before (but I don't remember if anyone ever said whether it is a viable use of the Highlighter or not) is to give the highlighter a completely different Query object than the one you used to get your search results. I.e., if your search query (what you want used to compute score) is...
>
>     +(CompanyId:10 CompanyId:20) Content:"cost saving" Content:outsource
>
> ...but once you've gotten those results, what you really care about is highlighting the name of the company, and you think the best fragments are those where the company names appear near the other words, then give the highlighter a query that looks like...
>
>     "companyname10 cost savings"~20 "companyname20 outsource"~20
>
> ...etc
>
> : >>> Here is the search query(BooleanQuery) I am passing to the
> : >>> IndexSearcher
> : >>> and QueryScorer:
> : >>> +DocumentType:news
> : >>> +(CompanyId:10 CompanyId:20 CompanyId:30 CompanyId:40)
> : >>> +FilingDate:[20041201 TO 20051201]
> : >>> +(Content:"cost saving" Content:"cost savings" Content:outsource
> : >>> Content:outsources Content:downsize Content:downsizes
> : >>> Content:restructuring Content:restructure)
>
> -Hoss
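Hoss's idea of handing the highlighter its own query can be sketched as a small string builder that pairs company names with keyword phrases in proximity clauses. This is an illustration only: `HighlightQueryBuilder` and `build` are hypothetical names, and pairing every company with every phrase is an assumption (the message shows just two hand-picked pairs). The resulting string would still have to be parsed into a Query before being given to the highlighter.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds a highlight-only query
// string made of proximity clauses such as "companyname10 outsource"~20.
class HighlightQueryBuilder {

    // Assumption: every company name is paired with every phrase.
    static String build(List<String> companies, List<String> phrases, int slop) {
        StringBuilder sb = new StringBuilder();
        for (String company : companies) {
            for (String phrase : phrases) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                // "<company> <phrase>"~<slop>
                sb.append('"').append(company).append(' ').append(phrase)
                  .append("\"~").append(slop);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String q = build(Arrays.asList("companyname10"),
                         Arrays.asList("cost savings", "outsource"), 20);
        // prints: "companyname10 cost savings"~20 "companyname10 outsource"~20
        System.out.println(q);
    }
}
```

Keeping this query separate from the search query means the CompanyId and FilingDate clauses still drive which documents are returned, while only the company-name and keyword text influences which fragments get picked.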