Ranking Terms
Hi everybody, I need to find some documentation about the algorithms that Lucene uses internally during indexing and how it works with the weights and frequencies of terms. This information will be used to learn the tastes of my users and to relate users who share the same interests and concerns. :D I read something about .frq files, but I don't have any .frq file in my index. Thanks. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
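To make the question about weights and frequencies concrete, here is a minimal tf-idf sketch in plain Java. It is not Lucene's internal code: the class name TfIdfSketch, the helper methods and the exact formulas (square-root term frequency, logarithmic inverse document frequency) are assumptions chosen for illustration. It only shows the general idea that a term weighs more when it is frequent in one document and rare across the collection.

```java
import java.util.Arrays;
import java.util.List;

// Simplified tf-idf weighting for illustration only; NOT Lucene's internal code.
// All names and formulas here are assumptions made for this sketch.
class TfIdfSketch {

    // Term-frequency factor: grows with the count, but sub-linearly.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: terms found in fewer documents weigh more.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Number of times the term occurs in one tokenized document.
    static int termFreq(List<String> doc, String term) {
        int n = 0;
        for (String t : doc) {
            if (t.equals(term)) n++;
        }
        return n;
    }

    // Number of documents that contain the term at least once.
    static int docFreq(List<List<String>> docs, String term) {
        int n = 0;
        for (List<String> d : docs) {
            if (d.contains(term)) n++;
        }
        return n;
    }

    // Weight of a term in the document at docIndex.
    static double weight(List<List<String>> docs, int docIndex, String term) {
        int freq = termFreq(docs.get(docIndex), term);
        return tf(freq) * idf(docFreq(docs, term), docs.size());
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("lucene", "index", "lucene"),
            Arrays.asList("index", "search"),
            Arrays.asList("index", "forum"));
        // "lucene" is frequent in document 0 and rare elsewhere,
        // so it gets a higher weight than "index", which is everywhere.
        System.out.println("lucene: " + weight(docs, 0, "lucene"));
        System.out.println("index:  " + weight(docs, 0, "index"));
    }
}
```

For relating users with similar interests, one possible use of such weights is to keep the top-weighted terms of the documents each user reads and compare those term lists between users.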
Re: PHP-Lucene Integration
Hi, I have a problem with PHP and Lucene too. I have phpBB (a forum) and a Java portal, and I need to index the forum posts in a Lucene index. phpBB uses a MySQL database. I have two options. The first is to index the database directly, which I have never done, and I think it is complex because I suppose I would have to decide how often to re-index the database. The second option, which I think is the best, is to make every add or modify button in phpBB call a Java thread that receives parameters such as the topic text, the author and other things. These would be indexed but not stored, and the only thing stored would be the URL of the topic. I hope this is useful for someone. PS: I don't yet have any idea how to do the second option :D, because I would have to modify all the buttons and I don't know how to call a Java thread from PHP. I hope I don't have to install a Java bridge for this, because I don't need general PHP-Java communication; the only thing I need is to call a Java thread. Perhaps my ideas are wrong; please tell me.
MySQL and Lucene
I want to know your opinion about this: I have a new portal, and Lucene is its search engine. The portal is an integration of a lot of open-source software. phpBB (MySQL) is our choice for the forum, and I need the search engine's searches to include the forum. I think I have two options:
- Every new post in the forum is written to both MySQL and the Lucene index (storing the fields that I want to show in the results, for example author, title, date, ...). This means I have almost a total copy of the MySQL database in my Lucene index.
- Or do the search with Lucene and then run an SQL query in the servlet, but then how do I show the results? I can't show the Lucene results first and the forum results after them.
Any ideas? Thanks
Search results
I'm back with my questions :) I've indexed a lot of documents (.txt, .doc, .pdf, .java, .html, .htm) using modified versions of the examples from Lucene in Action :D. The schema of the documents is:
path/name_document: Field.Keyword (not tokenized, stored, indexed)
title: Field.Text (tokenized, stored, indexed)
author: Field.Keyword (not tokenized, stored, indexed)
summary: Field.Text (tokenized, stored, indexed)
keys: Field.UnStored (tokenized, not stored, indexed)
date: Field.UnIndexed (not tokenized, stored, not indexed)
body: Field.Text (tokenized, not stored, indexed)
I want to show each result like this: title or file name, author, summary, path, date. What do you think about this? My question is: what should I do when showing results for documents that don't have a summary? Should I show the first lines of the document? Perhaps it is a silly question, but so far I haven't found a solution. PS: Regarding the earlier thread about the Lucene in Action book: I bought it from Amazon, and the total price for me was $26.37. PS2: When my book arrives, I'll send you the rest of my questions. Thanks for everything
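One possible answer to the missing-summary question, sketched in plain Java: fall back to the first lines of the body, cut at a word boundary. The class name SummaryFallback, the method summaryOrExcerpt and the length limit are hypothetical, not part of Lucene or of the book's examples. Since the body field in the schema above is not stored, this would have to run at indexing time, with the result stored in the summary field.

```java
// Hypothetical helper, not part of Lucene: picks the text to display
// as the summary of a search result.
class SummaryFallback {

    // Returns the summary if there is one; otherwise the beginning of the
    // body, cut at the last space at or before maxChars, with "..." appended.
    static String summaryOrExcerpt(String summary, String body, int maxChars) {
        if (summary != null && !summary.trim().isEmpty()) {
            return summary.trim();
        }
        if (body == null) {
            return "";
        }
        // Collapse line breaks and repeated spaces so the excerpt fits one line.
        String text = body.trim().replaceAll("\\s+", " ");
        if (text.length() <= maxChars) {
            return text;
        }
        int cut = text.lastIndexOf(' ', maxChars);
        if (cut <= 0) {
            cut = maxChars; // no space found: cut in the middle of the word
        }
        return text.substring(0, cut) + "...";
    }

    public static void main(String[] args) {
        String body = "First line of the document.\nSecond line with more text.";
        System.out.println(summaryOrExcerpt(null, body, 40));
    }
}
```

The value returned here would be passed to the stored summary field when the document is added, so the results page can print it without reading the original file again.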
Indexer time
Hi everybody, and merry Christmas to all (especially to the people who, like me, are working today instead of being with their families). I don't understand why I get such bad results with my index: I index 112 PHP files as plain text on this machine: Pentium 4 2.4 GHz, 512 MB RAM, running Windows XP and Eclipse during indexing. The reported time is "Tiempo de búsqueda total" (total search time): 80882 ms. The fields that I use are:
doc.add(Field.Keyword("filename", file.getCanonicalPath()));
doc.add(Field.UnStored("body", bodyText));
doc.add(Field.Text("titulo", title));
What am I doing wrong? Thanks
index question
I want to know, for those of you who use Lucene to index files for a general search engine, which fields (or keys) you index. In my case the files are HTML, PDF, DOC, PPT and TXT, and I'm thinking of using the fields author, title, URL, content and modification date. Anything else? Any recommendations? Thanks, and merry Xmas to all.
Re: index question
Thanks Nader. I need a general search over documents; that is why I am asking for your recommendations, because the fields are only used to show information in the results. A typical result, like on Google, for the search "casa" would be:
La casa roja
..había una vez una casa roja que tenía
htttp:\\go.to\casa
Modification date: 25-12-04
To do this, which fields and options (Keyword, Text, UnIndexed, UnStored) would you use? Thanks
Nader Henein wrote: It comes down to your searching needs: do you need your documents to be searchable by these fields, or do you need a general search of the whole document? Your decisions will impact the size of the index and the speed of indexing and searching, so give it due thought. Start from your GUI requirements and design the index that best responds to your users' needs. Nader
Re: index question
Many thanks Nader, I'll try it now and tell you the results. Thanks
Nader Henein wrote: OK, so you can index the whole document in one shot, but you should store in the index certain fields, such as what you display in the search results, to avoid a round trip to the DB. So, for example, you would store title, synopsis, link, doc_id and date, and then just index what you want to be searchable. The reason you would have the title stored in one field and indexed again in another is that if you stem that field, it becomes useless for display purposes. So the logical representation of your index would look something like this:
document
  id: stored / indexed
  title: stored / un-indexed
  synopsis: stored / un-indexed
  date: stored / indexed
  full document (stemmed): indexed / un-stored
/document
Enjoy, Nader Henein
HTMLParser vs NekoHTML (indexing HTML files)
Which do you prefer and, more importantly, why? Someone told me that Neko is more powerful because of something related to XML, but I didn't understand.
Which fields do you use in your indexes?
I'm building a search engine for files of different formats for a website. I want to know which fields and attributes (tokenized, stored, etc.) you use for this kind of search. I'm thinking of creating the fields title, filename, contents and date_of_modification (I'm indexing the body of HTML files as contents). Would you add anything else? Thanks to all
lucene-db
I've found some websites that use lucene-db, but I have never seen this .jar. Can someone tell me where to find information about it? Can this API give me what I need to index the MySQL DB of a forum or wiki? Thanks
Lucene working with a DB
I have read in a lot of messages that Lucene can index a DB because it uses the InputStream type, but I don't understand how to do this. For example, if I have a forum backed by MySQL and a lot of files on my website, for every search I have to select the index that I want to use, right? But I don't know how to make Lucene write an index from the information in the forum's DB (for example MySQL).
Number of documents
I have to show my boss whether Lucene is the best option for building the search engine of a new portal. I want to know how many documents you have in your index, and how big your DB is. The formats the portal has to support are HTML, JSP, TXT, DOC, PDF and PPT. Another question: I'm playing with the files from the book Lucene in Action and trying the example about handling file types. The data folder contains 5 files and the created index contains five documents, but the only one that contributes any words to the index is the .html file. Does everybody get the same result?
Re: LIMO problems
Hi, I want to know which library you use for searching in PPT files. Does POI support this? Thanks
Search HTML Files
I've been trying Lucene's demo apps for searching HTML files. I want to know what problems this web application has and which options are not implemented in it. Thanks
Which file formats does Lucene support?
Hi, I'm new to this mailing list and, as you can see, my English is terrible. I'm doing a study to select the best technology for the search engine of a web application with about 1000 users/day. I have read a little about Lucene, but I don't know which file types the search supports. If you can reply to me or point me to a page that explains this, I would be grateful. Thanks from a newbie