Ranking Terms

2005-02-25 Thread Daniel Cortes
Hi everybody,
I need to find some documentation about the algorithms that Lucene uses 
internally during indexing, and how it works with term weights and 
frequencies. I will use this information to learn my users' tastes and to 
relate users who share the same interests and concerns. :D
I read something about .frq files, but there is no .frq file in my index.
Thanks.
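The sketch below uses plain JDK math and no Lucene classes. It re-implements what we believe are the tf, idf and lengthNorm formulas of Lucene 1.x's DefaultSimilarity, then combines them into the approximate score of a single-term, unboosted query. Boosts, coord and queryNorm for multi-term queries are left out, and the class name TermWeights is hypothetical. As for the missing .frq file: as far as we know, Lucene 1.4 uses the compound file format by default, which packs each segment's files (.frq included) into a single .cfs file.

```java
/**
 * Sketch of the per-term factors of Lucene 1.x's DefaultSimilarity,
 * re-implemented with plain JDK math (no Lucene classes are used).
 */
class TermWeights {

    // tf: square root of how often the term occurs in the document field
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // idf: terms that occur in fewer documents get a higher weight
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // lengthNorm: a match in a short field counts more than one in a long field
    static float lengthNorm(int numTermsInField) {
        return (float) (1.0 / Math.sqrt(numTermsInField));
    }

    // Approximate score of a single-term, unboosted query against one field
    // (ignores the lossy one-byte encoding Lucene applies to the norm)
    static float singleTermScore(int freq, int docFreq, int numDocs, int numTermsInField) {
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(numTermsInField);
    }

    public static void main(String[] args) {
        // A term seen 4 times in a 16-term body, present in 10 of 1000 documents
        System.out.println(singleTermScore(4, 10, 1000, 16));
    }
}
```

Frequent terms in a document raise the score only gradually (square root), while terms that are rare across the whole collection raise it strongly.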

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: PHP-Lucene Integration

2005-02-09 Thread Daniel Cortes
Hi, I also have a problem with PHP and Lucene.
I have phpBB (a forum) and a Java portal, and I need to index the posts in a Lucene 
index; phpBB uses a MySQL database.
I have two options. The first is to index the database, which I have never 
done, and I think it is complex because I suppose I would have to decide how 
often to re-index the database.
The second option, which I think is the best, is to make every add or 
modify button in phpBB call a Java thread 
that receives parameters such as the topic text, the author and other things. These 
values will be indexed but not stored, and the only stored value will 
be the topic URL.
I hope this will be useful for someone.

PS: I have no idea yet how to implement the second option :D, because I 
have to modify all the buttons and I don't know how to call a Java thread 
from PHP. I hope I won't have to install a Java bridge for this, 
because I don't need two-way PHP-Java communication; the only thing I need is to 
call a Java thread.
Perhaps my ideas are wrong; please tell me.
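Option 2 may not need a PHP/Java bridge. One common arrangement, which is our assumption and was not proposed in the thread, is for phpBB's add/modify handlers to send an HTTP POST to a small Java servlet or service that updates the index. The sketch below shows only the Java side: it decodes the posted parameters and records which values are stored and which are only indexed. The actual Lucene calls are left out. PostIndexRequest, FieldSpec and the url/author/text parameter names are hypothetical. In Lucene 1.4 terms, url would roughly correspond to Field.Keyword and author/text to Field.UnStored.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical Java-side handler for "option 2": phpBB sends a form-encoded
 * body (url, author, text) whenever a post is added or modified.
 */
class PostIndexRequest {

    /** One field to hand to the indexer, with its storage decisions. */
    static final class FieldSpec {
        final String name;
        final String value;
        final boolean stored;
        final boolean indexed;

        FieldSpec(String name, String value, boolean stored, boolean indexed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Decodes an "a=1&b=2" body into an ordered map of parameters. */
    static Map<String, String> parseForm(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    /** Only the topic URL is stored; author and text are indexed but not stored. */
    static List<FieldSpec> toFields(Map<String, String> params) {
        List<FieldSpec> fields = new ArrayList<>();
        // Stored and indexed: shown in results, and usable to find the old copy on "modify"
        fields.add(new FieldSpec("url", params.get("url"), true, true));
        for (String name : new String[] {"author", "text"}) {
            if (params.containsKey(name)) {
                fields.add(new FieldSpec(name, params.get(name), false, true));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String body = "url=http%3A%2F%2Fforum%2Fviewtopic.php%3Ft%3D7&author=dani&text=hola+mundo";
        for (FieldSpec f : toFields(parseForm(body))) {
            System.out.println(f.name + " stored=" + f.stored + " -> " + f.value);
        }
    }
}
```

Keeping url indexed as a single term is what makes "modify" workable: the service can delete the document with that URL and add the new version.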



MySql and Lucene

2005-01-13 Thread Daniel Cortes
I want to know your opinion about this:
I have a new portal, and Lucene is the search engine. The portal 
integrates a lot of open-source software.
phpBB (MySQL) is our choice for the forum, and searches made with the 
search engine have to include the forum.
I think I have two options:
- Index every new forum post both in MySQL and in the 
Lucene index (storing the fields I want to show in the results, for 
example author, title, date, ...).
This means I would have an almost complete copy of the MySQL data in my Lucene index.
- Or do the search with Lucene and then run an SQL query in the 
servlet. But then how do I show the results? I can't show the Lucene 
results first and the forum results afterwards.
Any ideas?
Thanks
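With the second option, Lucene scores and SQL results are not on a common scale, so the two lists cannot simply be sorted together. One simple workaround, our suggestion and not something from the thread, is to interleave the two ranked lists round-robin. The sketch below does this for lists of result IDs. The class name and the ID strings are hypothetical. The first option avoids the problem, because a single Lucene search then ranks forum posts and files together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Round-robin interleaving of two ranked result lists, skipping duplicates. */
class ResultInterleaver {

    static List<String> interleave(List<String> first, List<String> second) {
        Set<String> merged = new LinkedHashSet<>(); // keeps first-seen order, ignores repeats
        int n = Math.max(first.size(), second.size());
        for (int i = 0; i < n; i++) {
            if (i < first.size()) merged.add(first.get(i));
            if (i < second.size()) merged.add(second.get(i));
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> luceneHits = Arrays.asList("file:manual.pdf", "file:faq.html", "file:notes.txt");
        List<String> forumHits = Arrays.asList("forum:topic-12", "forum:topic-3");
        System.out.println(interleave(luceneHits, forumHits));
    }
}
```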




Search results

2005-01-10 Thread Daniel Cortes
I'm back with my questions :)
I've indexed a lot of documents (.txt, .doc, .pdf, .java, .html, 
.htm) using modified versions of the examples in Lucene in Action :D.
The document schema is:

Field                Type             Tokenized  Stored  Indexed
path/name_document   Field.Keyword    no         yes     yes
title                Field.Text       yes        yes     yes
author               Field.Keyword    no         yes     yes
summary              Field.Text       yes        yes     yes
keys                 Field.UnStored   yes        no      yes
date                 Field.UnIndexed  no         yes     no
body                 Field.Text       yes        no      yes

I want to show the results like this:
title (or file name)      author
summary
path                      date
What do you think about this?
My question is: what should I show in the results when a document has no 
summary? The first lines of the document? Perhaps it's a silly 
question, but so far I haven't found a solution.
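In this schema body is indexed but not stored, so its text cannot be read back from a hit at search time. A common approach, which is our assumption and is not settled in the thread, is to compute an excerpt while indexing and store it, for example in the summary field when the document has none. The sketch below shows only the text logic. SummaryBuilder and displaySummary are hypothetical names.

```java
/** Chooses the text shown under a hit: the summary if there is one, else a body excerpt. */
class SummaryBuilder {

    static String displaySummary(String summary, String bodyText, int maxChars) {
        if (summary != null && !summary.trim().isEmpty()) {
            return summary.trim();
        }
        if (bodyText == null) {
            return "";
        }
        String text = bodyText.trim().replaceAll("\\s+", " "); // collapse newlines and repeated spaces
        if (text.length() <= maxChars) {
            return text;
        }
        int cut = text.lastIndexOf(' ', maxChars); // last space at or before the limit
        if (cut <= 0) {
            cut = maxChars;                          // one very long word: hard cut
        }
        return text.substring(0, cut) + "...";
    }

    public static void main(String[] args) {
        String body = "Lucene is a text search engine library.\nIt is written in Java.";
        System.out.println(displaySummary(null, body, 30));
    }
}
```

At indexing time, a call such as displaySummary(extractedSummary, bodyText, 200) would produce the value to store. The 200-character limit is arbitrary.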

PS: About the earlier thread on the Lucene in Action book: I bought Lucene 
in Action from Amazon, and the total price for me was $26.37.
PS2: When my book arrives I'll give you a rest from my questions.
Thanks for everything



time of indexer

2004-12-28 Thread Daniel Cortes
Hi everybody, and merry Christmas to all (especially to those who, like 
me, are working today instead of being with their families).

I don't understand why my indexing gives these bad results:
I indexed 112 PHP files as plain text
on this machine:
Pentium 4 2.4 GHz, 512 MB RAM, running Windows XP and Eclipse during indexing
Total search time: 80882 ms
The fields I use are:
doc.add(Field.Keyword(filename, file.getCanonicalPath()));
doc.add(Field.UnStored(body, bodyText));
doc.add(Field.Text(titulo, title));
What am I doing wrong?
Thanks
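About 80 seconds for 112 files seems high, but the thread does not show where the time goes. A first step is to time file reading and indexing separately. The sketch below is plain JDK code. IndexTimer and the two lambdas are hypothetical stand-ins for your own reading and Lucene-indexing steps. Two frequently reported causes in Lucene 1.x setups are worth checking, although neither is confirmed for this case: opening and closing an IndexWriter for every document instead of once per run, and calling optimize() after every document.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Measures text extraction and indexing time separately, per run. */
class IndexTimer {

    long extractNanos;
    long indexNanos;

    <F> void run(List<F> files, Function<F, String> extract, Consumer<String> index) {
        for (F file : files) {
            long t0 = System.nanoTime();
            String text = extract.apply(file);   // your code: read or parse the file
            long t1 = System.nanoTime();
            index.accept(text);                  // your code: build the Document and add it
            long t2 = System.nanoTime();
            extractNanos += t1 - t0;
            indexNanos += t2 - t1;
        }
    }

    public static void main(String[] args) {
        IndexTimer timer = new IndexTimer();
        // Stand-in lambdas; replace them with the real reading and indexing steps
        timer.run(Arrays.asList("a.php", "b.php"), name -> "<?php echo 'hola'; ?>", text -> { });
        System.out.printf("extract: %d ms, index: %d ms%n",
                timer.extractNanos / 1_000_000, timer.indexNanos / 1_000_000);
    }
}
```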


index question

2004-12-27 Thread Daniel Cortes
When you use Lucene to index files as a general search engine, which 
fields (or keys) do you index?
In my case the formats are html, pdf, doc, ppt and txt, and I'm thinking of 
using the fields author, title, url, content and modification date.
Anything else? Any recommendations?
Thanks,
and Merry Christmas to all.



Re: index question

2004-12-27 Thread Daniel Cortes
Thanks, Nader.
I need a general search over the documents; that's why I asked for your 
recommendations, because the fields are only there to show information in the results, 
typically like a Google search, for example:

search: casa ("house")
La casa roja ("The red house")
...había una vez una casa roja que tenía... ("...once upon a time there was a red house that had...")
htttp:\\go.to\casa   Modification date: 25-12-04
To do this, which fields and options (Keyword, Text, UnIndexed, UnStored) 
should I use?

Thanks
Nader Henein wrote:
It comes down to your search needs: do you need your documents to be 
searchable by these fields, or do you need a general search of the whole 
document? Your decisions will affect the size of the index and the speed 
of indexing and searching, so give it due thought: start from your GUI 
requirements and design the index that best meets your users' needs.

Nader


Re: index question

2004-12-27 Thread Daniel Cortes
Many thanks, Nader. I'll try it now and tell you the results.
Thanks
Nader Henein wrote:
OK, so you can index the whole document in one shot, but you should 
store certain fields, such as what you display in the search results, in 
the index to avoid a round trip to the DB.

So, for example, you would store title, synopsis, link, doc_id and 
date, and then just index what you want to be searchable. The reason 
you would store title in one field and index it again in another is that 
if you stem that field, it becomes useless for display purposes. So the 
logical representation of your index would look something like this:

<document>
   id               stored / indexed
   title            stored / un-indexed
   synopsis         stored / un-indexed
   date             stored / indexed
   full document    stemmed, indexed / un-stored
</document>
Enjoy
Nader Henein
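The sketch below maps each combination of stored/indexed/tokenized options to the Lucene 1.4 factory method that, as far as we know, provides it. It then prints the result for the logical layout above. It is plain JDK code that only returns method names as strings. FieldOptions is a hypothetical name, and "contents" stands in for the full-document field. Stemming is not a field flag in Lucene 1.4. It comes from the Analyzer applied to tokenized fields (for example one that uses PorterStemFilter).

```java
/**
 * Maps a field's stored / indexed / tokenized options to the Lucene 1.4
 * factory method that provides that combination (returned as a string only).
 */
class FieldOptions {

    static String luceneFactory(boolean stored, boolean indexed, boolean tokenized) {
        if (!stored && !indexed) {
            throw new IllegalArgumentException("field would be neither stored nor indexed");
        }
        if (tokenized && !indexed) {
            throw new IllegalArgumentException("only indexed fields can be tokenized");
        }
        if (stored && !indexed) return "Field.UnIndexed";      // display only: title, synopsis
        if (stored && !tokenized) return "Field.Keyword";      // exact values: id, date
        if (stored) return "Field.Text";                       // stored and searchable as words
        if (tokenized) return "Field.UnStored";                // searchable only: full text
        return "new Field(name, value, false, true, false)";   // one untokenized term, not stored
    }

    public static void main(String[] args) {
        // Logical layout from the reply above: name, stored, indexed, tokenized
        Object[][] layout = {
            {"id", true, true, false},
            {"title", true, false, false},
            {"synopsis", true, false, false},
            {"date", true, true, false},
            {"contents", false, true, true},
        };
        for (Object[] f : layout) {
            System.out.println(f[0] + " -> "
                    + luceneFactory((Boolean) f[1], (Boolean) f[2], (Boolean) f[3]));
        }
    }
}
```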


HTMLParser vs NekoHTML (indexing HTML files)

2004-12-27 Thread Daniel Cortes
Which do you prefer, and more importantly, why?
Someone told me that Neko is more powerful because of something 
related to XML, but I didn't understand it.
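Neither HTMLParser nor NekoHTML appears below. The XML point probably refers to NekoHTML being built on the Xerces XML parser, so it can hand you a DOM or SAX-style events. For comparison only, the JDK itself ships a lenient HTML 3.2 parser in javax.swing.text.html.parser. It is enough for a minimal sketch of the job either library does when indexing: pulling the title and the visible text out of a page so they can go into separate fields. JdkHtmlText is a hypothetical name. This old parser is likely less tolerant of real-world markup than the dedicated libraries, so treat it as an illustration and not a recommendation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Splits an HTML page into title and visible text with the JDK's Swing HTML parser. */
class JdkHtmlText extends HTMLEditorKit.ParserCallback {

    private final StringBuilder title = new StringBuilder();
    private final StringBuilder body = new StringBuilder();
    private boolean inTitle;

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.TITLE) inTitle = true;
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) inTitle = false;
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Text inside <title> goes to the title; everything else goes to the body
        (inTitle ? title : body).append(data).append(' ');
    }

    static JdkHtmlText parse(String html) {
        JdkHtmlText callback = new JdkHtmlText();
        try {
            // true: ignore any charset declared in a <meta> tag
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }

    String title() { return title.toString().trim(); }

    String text() { return body.toString().replaceAll("\\s+", " ").trim(); }

    public static void main(String[] args) {
        JdkHtmlText page = parse("<html><head><title>Casa</title></head>"
                + "<body><p>Once upon a time <b>a red house</b></p></body></html>");
        System.out.println(page.title() + " | " + page.text());
    }
}
```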





Which fields do you use in your indexes?

2004-12-23 Thread Daniel Cortes
I'm building a search engine for a website over files in different formats. I want to 
know which fields and attributes you use for this kind of search 
(tokenized, stored, etc.).
I'm thinking of creating the fields title, filename, contents and 
date_of_modification (I'm indexing the body of the HTML files as contents).
Do you add anything else?
Thanks to all
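The date_of_modification field has one catch: Lucene compares terms as strings. A date is therefore easiest to sort and range-query when it is written in a fixed-width form with the most significant part first. Lucene 1.4 provides DateField for this, and later releases added DateTools. The sketch below uses a plain yyyyMMdd string built with java.time instead, which is our own choice. IndexDates and toTerm are hypothetical names. Day resolution also keeps the number of distinct date terms small, which as far as we know helps range queries, since they expand to every term in the range.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Encodes a file's modification time as a fixed-width, sortable string term. */
class IndexDates {

    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Fixed width, most significant part first: string order equals date order
    static String toTerm(long millis) {
        return DAY.format(Instant.ofEpochMilli(millis));
    }

    public static void main(String[] args) {
        // For a real file this would be new java.io.File(path).lastModified()
        System.out.println(toTerm(System.currentTimeMillis()));
    }
}
```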



lucene-db

2004-12-22 Thread Daniel Cortes
I've found some websites that use lucene-db, but I have never seen this .jar.
Can someone tell me where to find information about it?
Can this API give me some tools to index the MySQL DB of a forum or 
wiki?
Thanks



Lucene working with a DB

2004-12-21 Thread Daniel Cortes
I've read in a lot of messages that Lucene can index a DB because it uses the 
InputStream type.
I don't understand how to do this. For example, if I have a forum on 
MySQL and a lot of files on my website, for every search I have to select the 
index I want to use, right? But I don't know how to make 
Lucene write an index of the information in the forum's DB 
(for example, MySQL).
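Lucene does not read from a database by itself. Your own code queries MySQL (for example over JDBC), turns each row into a Document and adds it to the index. The toy model below is plain JDK code, not Lucene, and shows the shape of that process: each row is indexed under its primary key, and a search returns keys you can look up in MySQL afterwards. ToyIndex, the crude tokenizer and the row contents are hypothetical. Forum rows and files can share one index (perhaps with a field recording the source), or live in separate indexes searched together, for example with Lucene's MultiSearcher.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy inverted index (not Lucene): each forum row is turned into text and
 * indexed under its primary key, so a search returns keys to look up in MySQL.
 */
class ToyIndex {

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void addRow(int postId, String title, String body) {
        String text = (title + " " + body).toLowerCase();
        for (String token : text.split("\\W+")) {   // crude tokenizer: split on non-word chars
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(postId);
            }
        }
    }

    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        // Rows as they might come back from a query over a hypothetical posts table
        index.addRow(1, "Lucene and MySQL", "How to index the forum");
        index.addRow(2, "PHP", "Calling Java from PHP");
        System.out.println(index.search("php"));
    }
}
```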



Number of documents

2004-12-20 Thread Daniel Cortes
I have to show my boss whether Lucene is the best option for building the search 
engine of a new portal.
I want to know how many documents you have in your index,
and how big your DB is.
The formats the portal has to support are html, jsp, txt, doc, 
pdf and ppt.

Another question I have:
I'm playing with the files from the book Lucene in Action, trying to use 
the example for handling document types. The data folder contains 5 files, and the 
created index contains five
documents, but the only one that contains any words in the index is the 
.html file.
Does everybody get the same result?



Re: LIMO problems

2004-12-13 Thread Daniel Cortes
Hi, I want to know which library you use to search PPT files.
Does POI support this?
Thanks


Search HTML Files

2004-12-13 Thread Daniel Cortes
I've been trying the Lucene demo apps for searching HTML files, and I 
want to know which problems or options are not implemented in this web 
application.
Thanks



Which file formats does Lucene support?

2004-12-02 Thread Daniel Cortes
Hi, I'm new to this mailing list and, as you can see, my English is 
terrible.
I'm doing a study to select the best technology for the search engine 
of a web application with about 1000 users/day.
I've read a little about Lucene, but I don't know which file types 
it can search.
If you can reply to me or point me to a page that explains this, I'd be grateful.
Thanks from a newbie
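Lucene itself does not parse file formats. It indexes the text (strings or Readers) you give it, so support for HTML, PDF, Word and so on comes from separate parser libraries. Lucene in Action, for example, uses PDFBox for PDF and Jakarta POI for Word files. The sketch below shows one way to organize this, which is our assumption: choose a text extractor by file extension. ExtractorRegistry and TextExtractor are hypothetical names. Only plain text is implemented, and the commented-out lines are placeholders, not real library calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Lucene indexes text, so each file format needs a text extractor first.
 * This registry picks an extractor by file extension.
 */
class ExtractorRegistry {

    interface TextExtractor {
        String extract(byte[] content);
    }

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns the extracted text, or null when no extractor handles this format. */
    String textOf(String fileName, byte[] content) {
        TextExtractor extractor = byExtension.get(extensionOf(fileName));
        return extractor == null ? null : extractor.extract(content);
    }

    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
        // registry.register("pdf", ...);  // would wrap a PDF parser library
        // registry.register("doc", ...);  // would wrap a Word parser library
        System.out.println(registry.textOf("readme.TXT", "hola".getBytes(StandardCharsets.UTF_8)));
    }
}
```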
