Re: Newbie questions

2005-02-14 Thread Erik Hatcher
On Feb 14, 2005, at 2:40 PM, Paul Jans wrote:
Hi again,
So is SqlDirectory recommended for use in a cluster to
work around the accessibility problem, or are people
using NFS or a standalone server instead?
Neither.  As far as I know, Berkeley DB is the only viable DB 
implementation currently.

Lucene has notoriously had issues with file locking over NFS.  Search
the archives for more details on this.

Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Newbie questions

2005-02-14 Thread Paul Jans
Hi again,

So is SqlDirectory recommended for use in a cluster to
work around the accessibility problem, or are people
using NFS or a standalone server instead?

Thanks in advance,
PJ




Re: Newbie questions

2005-02-11 Thread Paul Jans
I've already ordered Lucene in Action :)

> There is a LuceneRAR project that is still in its
> infancy here: 
> https://lucenerar.dev.java.net/

I will keep an eye on that for sure.

> You can also store a Lucene index in Berkeley DB
> (look at the 
> /contrib/db area of the source code repository)

We're already using Oracle, so would it be possible to
store the index there, thus giving each cluster node
easy access to it. I read about SqlDirectory in the
archives but it looks like it didn't make it to the
API and I don't see it on the contrib page.

I'm more concerned about making the index accessible
rather than transactional consistency, so NFS may be
another option like you mention. I'm curious to hear
about other systems which are clustered and how others
are doing this; lessons learnt and best practices etc.

Thanks again for the help. Lucene looks like a first
class tool.

PJ




Re: Newbie questions

2005-02-11 Thread Erik Hatcher
On Feb 11, 2005, at 1:36 PM, Erik Hatcher wrote:
Find me all users with (a CS degree and a GPA > 3.0)
or (a Math degree and a GPA > 3.5).
Some suggestions:  index degree as a Keyword field.  Pad GPA, so that 
all of them are the form #.# (or #.## maybe).  Numerics need to be 
lexicographically ordered, and thus padded.

With the right analyzer (see the AnalysisParalysis page on the wiki) 
you could use this type of query with QueryParser:

	degree:cs AND gpa:[3.0 TO 9.9]
oops, to be completely technically correct, use curly brackets to get > 
rather than >=

degree:cs AND gpa:{3.0 TO 9.9}
(I'll assume GPA's only go to 4.0 or 5.0 :)
Erik


Re: Newbie questions

2005-02-11 Thread Erik Hatcher
On Feb 10, 2005, at 5:00 PM, Paul Jans wrote:
A couple of newbie questions. I've searched the
archives and read the Javadoc but I'm still having
trouble figuring these out.
Don't forget to get your copy of "Lucene in Action" too :)
1. What's the best way to index and handle queries
like the following:
Find me all users with (a CS degree and a GPA > 3.0)
or (a Math degree and a GPA > 3.5).
Some suggestions:  index degree as a Keyword field.  Pad GPA, so that 
all of them are the form #.# (or #.## maybe).  Numerics need to be 
lexicographically ordered, and thus padded.

With the right analyzer (see the AnalysisParalysis page on the wiki) 
you could use this type of query with QueryParser:

degree:cs AND gpa:[3.0 TO 9.9]
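The padding advice can be illustrated in plain Java, independent of Lucene (the pad helper below is hypothetical, not a Lucene API; it just shows why fixed-width values make lexicographic order agree with numeric order):

```java
import java.util.Arrays;
import java.util.Locale;

// Why pad numerics? Range queries compare index terms as strings,
// so "10.0" sorts before "3.5" unless every value has the same width.
public class GpaPadding {

    // Format a GPA as a fixed-width #.## string so that lexicographic
    // order matches numeric order.
    public static String pad(double gpa) {
        return String.format(Locale.US, "%.2f", gpa);
    }

    public static void main(String[] args) {
        // Unpadded widths sort incorrectly as strings.
        String[] raw = {"3.5", "10.0", "2.75"};
        Arrays.sort(raw);
        System.out.println(Arrays.toString(raw)); // [10.0, 2.75, 3.5]

        // Uniform-width values sort in true numeric order.
        String[] padded = {pad(3.5), pad(2.75), pad(4.0)};
        Arrays.sort(padded);
        System.out.println(Arrays.toString(padded)); // [2.75, 3.50, 4.00]
    }
}
```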
2. What are the best practices for using Lucene in a
clustered J2EE environment? A standalone index/search
server or storing the index in the database or
something else ?
There is a LuceneRAR project that is still in its infancy here: 
https://lucenerar.dev.java.net/

You can also store a Lucene index in Berkeley DB (look at the 
/contrib/db area of the source code repository)

However, most projects do fine with "cruder" techniques such as sharing 
the Lucene index on a common drive and ensuring that locking is 
configured to use the common drive also.

Erik


Newbie questions

2005-02-10 Thread Paul Jans
Hi,

A couple of newbie questions. I've searched the
archives and read the Javadoc but I'm still having
trouble figuring these out. 

1. What's the best way to index and handle queries
like the following: 

Find me all users with (a CS degree and a GPA > 3.0)
or (a Math degree and a GPA > 3.5).

2. What are the best practices for using Lucene in a
clustered J2EE environment? A standalone index/search
server or storing the index in the database or
something else ?

Thank you in advance,
PJ







Re: Newbie Questions: Site Scoping, Page Type Filtering/Sorting, Localization, Clustering

2004-05-31 Thread Erik Hatcher
On May 30, 2004, at 10:34 PM, Sasha Haghani wrote:
I am a newbie to Lucene and I'm considering using it in an upcoming 
project.
I've read through the documentation but I still have a number of 
questions:
I'll do my best with some pointers below...
1. SEGMENTING AN INDEX & QUERIES BY SITE SCOPE
In my use case, I have a number of logical websites backed by the same
underlying content store.  A Document may ultimately end up "belonging"
to one or more logical sites, but at a distinct URL for each.  The
simplistic solution is to maintain indices for each logical site, but
this will result in some unwanted duplication and the need to update
multiple indices on "shared" content changes.  Other than that, can
anyone suggest approaches for how to segment a single index to
accommodate multiple logical sites and allow queries within a particular
site's scope?  Are fields the solution?  How should the distinct
per-site URLs be managed?
I don't think there is a definitive "best" way to do this.  Per-site 
indexes is one option.  Using a "site" field is another.  Queries for a 
particular site could be done either by using QueryFilter or by 
wrapping all queries in a BooleanQuery with a required TermQuery for 
the "site".

Sites could share documents by simply adding multiple 
Field.Keyword("site", site) to the documents.
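The multi-valued "site" field idea can be sketched in plain Java (no Lucene involved; the document-as-Map representation and field names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of the suggestion above: each document carries one or more
// "site" keyword values, and a per-site query keeps only documents
// tagged with that site. Illustrative only -- not the Lucene API.
public class SiteFilter {

    // A document here is just field name -> list of values.
    public static List<Map<String, List<String>>> bySite(
            List<Map<String, List<String>>> docs, String site) {
        List<Map<String, List<String>>> hits = new ArrayList<>();
        for (Map<String, List<String>> doc : docs) {
            if (doc.getOrDefault("site", List.of()).contains(site)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> shared = Map.of(
                "title", List.of("Shared FAQ"),
                "site", List.of("siteA", "siteB")); // indexed once, visible on both
        Map<String, List<String>> only = Map.of(
                "title", List.of("Site B news"),
                "site", List.of("siteB"));

        System.out.println(bySite(List.of(shared, only), "siteA").size()); // 1
        System.out.println(bySite(List.of(shared, only), "siteB").size()); // 2
    }
}
```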

2. LOCALIZED CONTENT
I understand that at its core, Lucene can support content from any
locale and character set supported by Java.  What is the best way of
implementing Lucene to handle a content base which includes numerous
locales?  One index per locale, or should all Documents be placed in a
single index and tagged with a "locale" field?  Or is there another
approach altogether?
Again, there isn't really a "best" way, I don't think.  How does the 
locale situation relate to the previously mentioned site separation?  A 
"locale" field is a perfectly reasonable way to go also.  I don't know 
of any other approach.

3. DOCUMENT URLS
Is the URL at which the original document can be retrieved (i.e., for
linking search results to the original doc) generally stored as a
non-indexed, non-tokenized, stored Field in the Document?
It depends on whether you want to query for it or not.  Field.Keyword 
if you want to be able to query for it.  Field.UnIndexed if you want it 
with the attributes you specified.

4. QUERY FILTERING & SORTING BY FIELD VALUE
In my application I have a pretty typical need to distinguish between
different document types (e.g., FAQs, Articles, Reviews, etc.) in order
to allow the user to restrict their results to particular types of
documents or to sort results by type.  Are fields again the solution for
this?  Can Queries filter or sort results/hits on exact field values
(i.e., non-tokenized field values)?
Fields are generally the solution :)  What else is there?  Documents 
have Fields.  Fields are where you put metadata about documents.  A 
document type makes perfect sense to put in a field.

QueryFilter or the BooleanQuery AND trick mentioned above would allow 
you to narrow results down to a particular set of types.  Sorting works 
on exact values, yes, and you can write your own sorting implementation 
if lexicographic or numeric sorting are not sufficient which could key 
off external information if needed.  To sort on a field, it needs to be 
indexed and non-tokenized (stored is irrelevant).  There must be only a 
single term for that field in a document.  Check the Javadocs for the 
Sort class for more details on the sorting requirements.

5. DEPLOYING LUCENE IN A CLUSTERED WEB-APP ENVIRONMENT
How is Lucene to be deployed in a clustered web-app environment?  Do all
cluster nodes require access to a networked filesystem containing the
index files, or is there another solution?  How is concurrency managed
when the index is being incrementally updated?
This is entirely up to you to manage.  I'm sure developers building 
solutions with Lucene have employed all sorts of various architectures.

Concurrency is managed via lock files that need to be shared among apps 
interacting with the index.  The short answer is only a single process 
(but multiple threads sharing an IndexWriter) can index at a time.  You 
would probably want to build some sort of queuing infrastructure and 
have a single indexer, or index into separate indexes and merge them.

Any answers and suggestions are much appreciated.  Thanks.
I hope this helps some.
Erik


Newbie Questions: Site Scoping, Page Type Filtering/Sorting, Localization, Clustering

2004-05-30 Thread Sasha Haghani
Hi there,
 
I am a newbie to Lucene and I'm considering using it in an upcoming project.
I've read through the documentation but I still have a number of questions:
 
1. SEGMENTING AN INDEX & QUERIES BY SITE SCOPE
In my use case, I have a number of logical websites backed by the same
underlying content store.  A Document may ultimately end up "belonging"
to one or more logical sites, but at a distinct URL for each.  The
simplistic solution is to maintain indices for each logical site, but this
will result in some unwanted duplication and the need to update multiple
indices on "shared" content changes.  Other than that, can anyone suggest
approaches for how to segment a single index to accommodate multiple logical
sites and allow queries within a particular site's scope?  Are fields the
solution?  How should the distinct per-site URLs be managed?
 
2. LOCALIZED CONTENT
I understand that at its core, Lucene can support content from any locale
and character set supported by Java.  What is the best way of implementing
Lucene to handle a content base which includes numerous locales?  One index
per locale or should all Documents be placed in a single index and tagged
with a "locale" field?  Or is there another approach altogether?
 
3. DOCUMENT URLS
Is the URL at which the original document can be retrieved generally (i.e.,
for linking search results to the original doc) stored as a non-indexed,
non-tokenized, stored Field in the Document?
 
4. QUERY FILTERING & SORTING BY FIELD VALUE
In my application I have a pretty typical need to distinguish between
different document types (e.g., FAQs, Articles, Reviews, etc.) in order to
allow the user to restrict their results to particular types of documents or
to sort results by type.  Are fields again the solution for this?  Can
Queries filter or sort results/hits on exact field values (i.e.,
non-tokenized field values)?
 
5. DEPLOYING LUCENE IN A CLUSTERED WEB-APP ENVIRONMENT
How is Lucene to be deployed in a clustered web-app environment?  Do all
cluster nodes require access to a networked filesystem containing the index
files or is there another solution?  How is concurrency managed when the
index is being incrementally updated?
 
Any answers and suggestions are much appreciated.  Thanks.
 
--Daniel


Re: Newbie Questions

2003-08-27 Thread Erik Hatcher
On Tuesday, August 26, 2003, at 02:51  PM, Mark Woon wrote:
Ah, I've been testing out something similar to the latter.  I've been 
adding multiple values on the same key.  Won't this have the same 
effect?  I've been assuming that if I do

doc.add(Field.Keyword("content", "value1"));
doc.add(Field.Keyword("content", "value2"));
And did a search on the "content" field for either value, I'd get a 
hit, and it seems to work.  This way, I figure I'd be able to 
differentiate between values that I want tokenized and values that I 
don't.

Is there a difference between this and building a StringBuffer 
containing all the values and storing that as a single field-value?
There is a big difference between using Field.Text and Field.Keyword, 
yes.  It all depends on how you want things tokenized (or not).  
Field.Keyword does not tokenize (via the Analyzer), but Field.Text does.
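A toy sketch of that difference, in plain Java rather than Lucene's actual analyzer classes (the lowercase-and-whitespace-split "analyzer" below is a crude stand-in, not what StandardAnalyzer really does):

```java
import java.util.List;

// Toy illustration of Field.Keyword vs Field.Text: a keyword field
// becomes exactly one index term, untouched; a text field is run
// through an analyzer and becomes several normalized terms.
public class Tokenizing {

    // Keyword: the whole value is a single term.
    public static List<String> keywordTerms(String value) {
        return List.of(value);
    }

    // Text: crude stand-in analyzer -- lowercase, split on whitespace.
    public static List<String> textTerms(String value) {
        return List.of(value.toLowerCase().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(keywordTerms("Breast Cancer")); // [Breast Cancer]
        System.out.println(textTerms("Breast Cancer"));    // [breast, cancer]
    }
}
```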

	Erik



RE: Newbie Questions

2003-08-26 Thread Gregor Heinrich
Hi Mark.

Sorry, it's rc1 really which is out. But if you go to the cvs server, then
you'll find the rc2-dev version.

Multiple calls to Document.add with the same field result in their text
being "treated as though appended for the purposes of search" (API doc).

Can you try out whether there's a difference between the cases you
mention? I don't know, but I'd be interested as well ;-)

Gregor







Re: Newbie Questions

2003-08-26 Thread Mark Woon
Gregor Heinrich wrote:

ad 1: MultiFieldQueryParser is what you might want: you can specify the
fields to run the query on. Alternatively, the practice of duplicating 
the
contents of all separate fields in question into one additional merged 
field
has been suggested, which enables you to use QueryParser itself.

Ah, I've been testing out something similar to the latter.  I've been 
adding multiple values on the same key.  Won't this have the same 
effect?  I've been assuming that if I do

doc.add(Field.Keyword("content", "value1"));
doc.add(Field.Keyword("content", "value2"));
And did a search on the "content" field for either value, I'd get a hit, 
and it seems to work.  This way, I figure I'd be able to differentiate 
between values that I want tokenized and values that I don't.

Is there a difference between this and building a StringBuffer 
containing all the values and storing that as a single field-value?


ad 2: Depending on the Analyzer you use, the query is normalised, i.e.,
stemmed (remove suffixes from words) and stopword-filtered (remove highly
frequent words). Have a look at StandardAnalyzer.tokenStream(...) to 
see how
the different filters work. In the analysis package the 1.3rc2 Lucene
distribution has a Porter stemming algorithm: PorterStemmer.

There's an rc2 out?  Where??  I just checked the Lucene website and only 
see rc1.

Thanks everyone for all the quick responses!

-Mark



Re: Newbie Questions

2003-08-26 Thread Erik Hatcher
On Tuesday, August 26, 2003, at 12:53  AM, Mark Woon wrote:
1) How can I search all fields at the same time?  The QueryParser 
seems to only search one specific field.
The common thing I've done and seen others do is glue all the fields 
together into a master searchable field named something like "contents" 
or "keywords" (be sure to put a space in between text so it can be 
tokenized properly).
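That gluing step might look something like this in plain Java (field names invented; not Lucene code):

```java
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the "master field" trick: concatenate every field's text,
// separated by spaces so adjacent tokens don't run together, and index
// the result as a single searchable "contents" field.
public class MasterField {

    public static String contents(Map<String, String> fields) {
        return fields.values().stream().collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of(
                "title", "Lucene in Action",
                "author", "Erik Hatcher");
        // Words from both title and author are now findable in one field.
        System.out.println(contents(doc));
    }
}
```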

2) How can I automatically default all searches into fuzzy mode?  I 
don't want my users to have to know that they must add a "~" at the 
end of all their terms.
Your description of searches for "cancer" finding "cancerous" isn't 
really what the fuzzy query is about.  What you're after, I think, is 
more the stemming algorithms used during the analysis phase.  Have a 
look at the SnowballAnalyzer in the Lucene sandbox.  There is a little 
bit about it in the article I wrote for java.net: 
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html - it 
definitely sounds like more work in the analysis phase is what you're 
after.
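To illustrate what a stemming analyzer buys you, here is a deliberately crude sketch in plain Java; real stemmers use the Porter/Snowball algorithms, and the two suffix rules below are invented for illustration only:

```java
// Toy sketch of what a stemming filter does at analysis time: map
// inflected forms to a shared root so a query for "cancer" matches
// documents containing "cancerous". NOT a real stemmer.
public class ToyStemmer {

    public static String stem(String word) {
        String w = word.toLowerCase();
        if (w.endsWith("ous")) return w.substring(0, w.length() - 3);
        if (w.endsWith("s"))   return w.substring(0, w.length() - 1);
        return w;
    }

    public static void main(String[] args) {
        System.out.println(stem("cancerous")); // cancer
        System.out.println(stem("cancers"));   // cancer
        // Index terms and query terms are both stemmed, so they meet
        // in the middle without the user typing any ~ operators.
        System.out.println(stem("Cancer").equals(stem("cancerous"))); // true
    }
}
```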

	Erik



RE: Newbie Questions

2003-08-26 Thread Aviran Mordo
1. You need to use MultiFieldQueryParser
2. I think you should use PorterStemFilter instead of fuzzy query
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/PorterStemFilter.html




RE: Newbie Questions

2003-08-26 Thread Gregor Heinrich
Hi Mark,

short answers to your questions:

ad 1: MultiFieldQueryParser is what you might want: you can specify the
fields to run the query on. Alternatively, the practice of duplicating the
contents of all separate fields in question into one additional merged field
has been suggested, which enables you to use QueryParser itself.

ad 2: Depending on the Analyzer you use, the query is normalised, i.e.,
stemmed (remove suffixes from words) and stopword-filtered (remove highly
frequent words). Have a look at StandardAnalyzer.tokenStream(...) to see how
the different filters work. In the analysis package the 1.3rc2 Lucene
distribution has a Porter stemming algorithm: PorterStemmer.

Have fun,

Gregor




Newbie Questions

2003-08-26 Thread Mark Woon
Hi all...

I've been playing with Lucene for a couple days now and I have a couple 
questions I'm hoping someone can help me with.  I've created a Lucene 
index with data from a database that's in several different fields, and 
I want to set up a web page where users can search the index.  Ideally, 
all searches should be as google-like as possible.  In Lucene terms, I 
guess this means the query should be fuzzy.  For example, if someone 
searches for "cancer" then I'd like to get back all resuls with any form 
of the word cancer in the term ("cancerous", "breast cancer", etc.).

So far, I seem to be having two problems:

1) How can I search all fields at the same time?  The QueryParser seems 
to only search one specific field.

2) How can I automatically default all searches into fuzzy mode?  I 
don't want my users to have to know that they must add a "~" at the end 
of all their terms.

Thanks,
-Mark




RE: Newbie Questions

2002-04-08 Thread Armbrust, Daniel C.

The way that we have done this (and this isn't necessarily the best way, it
was just the solution we came up with) is that we store all dates and
numbers as strings, but formatted in such a way that when they are
alphabetized, they will be in the right order.

The Lucene Date Filtering mechanism was useless to us, because it doesn't
allow dates before 1970.  

We stored all of our dates as strings in a format of year month day, this
way they sorted in the proper order.
Then you can write your own date filter, which is basically a cut-and-paste
from Lucene's DateFilter.

We also had an age field, and to make it sort properly, we had to zero-pad
all of the ages, like

003
050
101

This way they sort properly, and you can write an age filter (again a cut
and paste from date filter) that will let you search for ages > 50.
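Dan's formatting scheme can be sketched in plain Java (helper names are made up; this is not Lucene code):

```java
import java.util.Arrays;
import java.util.Locale;

// Store dates as "yyyyMMdd" and ages zero-padded to a fixed width, so
// plain string comparison gives chronological / numeric order -- and
// works fine for dates before 1970.
public class SortableFields {

    public static String dateKey(int year, int month, int day) {
        return String.format(Locale.US, "%04d%02d%02d", year, month, day);
    }

    public static String ageKey(int age) {
        return String.format(Locale.US, "%03d", age); // 3 -> "003"
    }

    public static void main(String[] args) {
        String[] dates = {dateKey(1978, 6, 13), dateKey(1969, 12, 31), dateKey(2005, 2, 14)};
        Arrays.sort(dates);
        System.out.println(Arrays.toString(dates)); // [19691231, 19780613, 20050214]

        // "ages > 50" becomes a simple string comparison against "050".
        System.out.println(ageKey(101).compareTo(ageKey(50)) > 0); // true
    }
}
```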

Oh, and to apply more than one filter at a time (the way we did it) you will
need the Chainable Filter class, which I think is now on the contributions
page, but was also in the mailing archives in the last 2 weeks.

Dan






Newbie Questions

2002-04-07 Thread Chris Withers

Hi there,

I'm new to Lucene and have what will hopefully be a couple of simple questions.

1. Can I index numbers with Lucene? If so, ints or floats or ?

2. Can I index dates with Lucene?

In either case, is there any way I can sort the results returned by a search on
these fields?
Also, can I search for only documents which have been indexed with a range in
one of these fields?

For example: I only want documents where the 'cost' field is between 1000 and
2000 and where the date of manufacture was prior to 13th June 1978.

cheers,

Chris





newbie questions

2001-10-23 Thread David Bonilla
I'm trying to implement Lucene in my application but I'm really a newbie.

1) If I want to create an index in the directory e:\Lucene, must I just do
writer = new IndexWriter("E:/Lucene", null, true); ?

2) How exactly can I create an index in a database? Can anybody send a
sample?

3) Talking about the boolean third parameter in IndexWriter: if I write
writer = new IndexWriter("E:/Lucene", null, false); and the index doesn't
exist... is the index created anyway?
(I must use it to control whether the index has already been written or not.)

Thanks a lot!

David Bonilla Fuertes
THE BIT BANG NETWORK
http://www.bit-bang.com
Profesor Waksman, 8, 6º B
28036 Madrid
SPAIN
Tel.: (+34) 914 577 747
Móvil: 656 62 83 92
Fax: (+34) 914 586 176