Re: Atomicity in Lucene operations

2004-10-19 Thread Nader Henein
As soon as I've cleaned up the code, I'll publish it; it needs a little
more documentation as well.

Nader
Roy Shan wrote:
Maybe you can contribute it to sandbox?
On Mon, 18 Oct 2004 08:31:30 -0700 (PDT), Yonik Seeley
[EMAIL PROTECTED] wrote:
 

Hi Nader,
I would greatly appreciate it if you could CC me on
the docs or the code.
Thanks!
Yonik
--- Nader Henein [EMAIL PROTECTED] wrote:
   

It's pretty integrated into our system at this point. I'm working on
packaging it and cleaning up my documentation, and then I'll make it
available. I can give you the documents, and if you still want the code
I'll slap together a rough copy for you and ship it across.
Nader Henein
Roy Shan wrote:
 

Hello, Nader:
I am very interested in how you implement the atomicity. Could you
send me a copy of your code?
Thanks in advance.
Roy
   

   


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
   


 




Range Query

2004-10-19 Thread Karthik N S

Hi

Guys

Apologies.



I have a Text field, 'ItemPrice', which I use to store prices as numeric
values such as 10, 25.25, 50.00.

Suppose I want to find the items whose price falls between two values,

ex -
 Contents:shoes +ItemPrice:[10.00 TO 50.60]


I get results outside the range that was requested. [This may be because
the query compares the ASCII values instead of the numeric values.]

Am I missing something in the query syntax, or is this the wrong way to
construct the query?

Please, somebody advise me ASAP.  :(

Thx in advance




  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







RE: Range Query

2004-10-19 Thread Chuck Williams
Range queries use a lexicographic (dictionary) order.  So, assuming all
your values are positive, you need to ensure that the integer part of
each number has a fixed number of digits (pad with leading 0's).  The
fractional part should be fine, although 1.0 will follow 1.  If you have
negative numbers you need to pad an extra 0 on the left of the
positives, start the negatives with -, and invert the magnitude of the
negatives (so they go in the other order).

Your actual example below should work as is, except that 10 will not be
in the range since 10.00 is strictly after 10.  However, this won't work
without the padding if any of your prices has an integer part of other
than exactly two digits (e.g., 10 is before 6, but after 06).
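
A minimal sketch of the ordering problem and the padding fix, in plain Java with no Lucene involved. The class name, the helper names and the three-digit width are assumptions made for illustration; String.compareTo stands in for the lexicographic term comparison described above.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class PaddedPriceDemo {
    // Assumption: no price has more than three integer digits.
    private static final DecimalFormat FORMAT =
        new DecimalFormat("000.00", DecimalFormatSymbols.getInstance(Locale.ROOT));

    /** Zero-pads a non-negative price to a fixed width, e.g. 6 becomes 006.00. */
    static String pad(double price) {
        return FORMAT.format(price);
    }

    /** Inclusive range test using dictionary (lexicographic) order. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded terms: 100 is wrongly matched, 10 is wrongly missed.
        System.out.println(inRange("100", "10.00", "50.60"));
        System.out.println(inRange("10", "10.00", "50.60"));

        // Padded terms and padded bounds give the numeric answer.
        System.out.println(inRange(pad(100), pad(10), pad(50.60)));
        System.out.println(inRange(pad(10), pad(10), pad(50.60)));
    }
}
```

The same padding has to be applied both to the values stored in the index and to the bounds typed into the query, otherwise the two sides are compared at different widths.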

Chuck




Search Help in word doc

2004-10-19 Thread Natarajan.T
Hi FFI,

 

I am indexing multiple document types (Word, Excel, HTML, PPT, PDF); at
indexing time there is no problem.

In my search results, the contents (description) show up with small boxes
(this happens only with Word documents).

I think this is happening because of some special characters
(such as bullets and symbols).

How can I rectify this problem?

 

Regards,

Natarajan.

 



RE: QueryParsing

2004-10-19 Thread Rupinder Singh Mazara
hi erik and everyone else

 ok i will buy the book ;)
but this still does not solve the problem of
 why String x = "\"jakarta apache\"~100"; is being translated as a
PhraseQuery
  FULL_TEXT:"jakarta apache"~100

 is the correct query being formed? or is there something wrong with the
 Proximity Search topic in the URL
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html


 Regards

 Rupinder

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: 18 October 2004 21:05
To: Lucene Users List
Subject: Re: QueryParsing


QueryParser does not (currently) support SpanQuery's.  PhraseQuery is
what you'll always get with double-quoted strings.  However, you can
customize the behavior and get a SpanQuery instead by subclassing and
overriding getPhraseQuery.  In fact, this is an example I wrote for
Lucene in Action.

   Erik


On Oct 18, 2004, at 2:39 PM, Rupinder Singh Mazara wrote:

 hi all

  i have a question regarding the QueryParser and Proximity Searches
  I executed the following piece of code

 String x = "\"jakarta apache\"~100";
 QueryParser parser = new QueryParser("FULL_TEXT", new StandardAnalyzer());
 parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
 Query query = parser.parse(x);
 System.out.println(query.getClass() + " - " + query.toString());

 IndexReader indexReader = IndexReader.open(new File(luceneroot));

 query = query.rewrite(indexReader);
 System.out.println(query.getClass() + " - " + query.toString());

 in both System.out.println calls I get the following result
 class org.apache.lucene.search.PhraseQuery - FULL_TEXT:"jakarta apache"~100

 is this correct? I was expecting to see a SpanQuery being formed at the
 second println statement


 I have taken this from the example in
 http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

 If I remove the quotes I see a QueryParsing error which tells me that the
 Similarity should be between 0.0 and 1.0,
 which is as expected

 please let me know if I missed something


 Regards


 Rupinder





RE: Search Help in word doc

2004-10-19 Thread Cocula Remi

Seen that.
I use the Character.isISOControl() function to identify and remove these characters.





RE: Search Help in word doc

2004-10-19 Thread Natarajan.T
Hi Remi,

Thanks for your response...
Pls send me the jar name with sample code.

Thanks,
Natarajan.






RE: QueryParsing

2004-10-19 Thread Morus Walter
Rupinder Singh Mazara writes:
 hi erik and everyone else
 
  ok i will buy the book ;)
 but this still does not solve the problem of
  why String x = "\"jakarta apache\"~100"; is being translated as a
 PhraseQuery
   FULL_TEXT:"jakarta apache"~100
 
  is the correct query being formed? or is there something wrong with the
  Proximity Search topic in the URL
 http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
 
A proximity search is done by a PhraseQuery with a slop.
The slop makes the PhraseQuery perform a proximity search (so you can
argue that the name is problematic).
That's what the query parser creates.

SpanQueries were introduced later. Maybe you can get the effect of a
proximity search with SpanQueries too, but that's not handled by the query
parser.

Morus




RE: Search Help in word doc

2004-10-19 Thread Cocula Remi
This sample code changes undesired characters into underscores.


Document doc = ...;

// Copy the stored field, replacing every ISO control character with "_".
char[] cs = doc.get("content").toCharArray();
StringBuffer sb = new StringBuffer();
for (int j = 0; j < cs.length; j++)
{
    if (!Character.isISOControl(cs[j]))
    {
        sb.append(cs[j]);
    }
    else
    {
        sb.append("_");
    }
}

System.out.println(sb.toString());
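
The same idea as a self-contained helper that can be run without an index. This is only a sketch: the class and method names are made up for illustration, and the input string stands in for the text of a stored field.

```java
final class ControlCharCleaner {
    /** Returns a copy of text with every ISO control character replaced. */
    static String replaceControlChars(String text, char replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(Character.isISOControl(c) ? replacement : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A bell character and a vertical tab, as might leak out of a Word file.
        String raw = "Item\u0007one\u000Btwo";
        System.out.println(replaceControlChars(raw, '_'));
    }
}
```

Note that Character.isISOControl() is also true for tab, carriage return and line feed, so line breaks are replaced as well unless they are excluded explicitly.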




RE: QueryParsing

2004-10-19 Thread Rupinder Singh Mazara
thank you Morus

this makes things very clear to me 

Regards

Rupinder





RE: Search Help in word doc

2004-10-19 Thread Natarajan.T
Ok Thanks Remi




RE: Search Help in word doc

2004-10-19 Thread Natarajan.T
Are you doing this at indexing time or at search time?




RE: Search Help in word doc

2004-10-19 Thread Natarajan.T
Ok, Thanks a lot...

-Original Message-
From: Cocula Remi [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 19, 2004 3:14 PM
To: Lucene Users List
Subject: RE: Search Help in word doc

In my case, at search time.
But it is probably best to do it at indexing time.





Re: Arabic analyzer

2004-10-19 Thread Pierrick Brihaye
Hi,
Scott Smith wrote:
Is anyone aware of an open source (non-GPL; i.e., free for commercial
use) Arabic analyzer for Lucene?
Unfortunately (for you), my Arabic Analyzer for Java 
(http://savannah.nongnu.org/projects/aramorph) is GPL-ed.

 Does Arabic really require a stemmer
as well (some of the reading I've seen on the web would suggest that a
stemmer is almost a necessity with Arabic to get anything useful where
it is not with other languages).
IMHO, stemming *is* a necessity in Arabic, since the language involves
prefixing, suffixing and infixing, as well as a few written, yet very
frequent, word aggregations.

Good luck,
--
Pierrick Brihaye
mailto:[EMAIL PROTECTED]


using optimize and addDocument concurrently.

2004-10-19 Thread Stephen Halsey
Hi,

My basic question is whether it is possible to continue to add documents to an index 
in one Thread while running a long running optimization of the index (approx 30 mins) 
in another thread.  I'm using Lucene version 1.4.2.  The concurrency matrix at 
http://www.jguru.com/faq/view.jsp?EID=913302 shows that if you use the same 
IndexWriter object you can do concurrent writes and optimization.  When I try it in my 
program the addDocuments wait until the optimization has finished, so in this respect 
it is Thread safe, but the operations cannot be performed at the same time.  Our 
problem is that the index needs to be continually kept up to date with new news 
articles, but also needs to be regularly optimized to keep it fast.  If I cannot 
update and optimize one index at the same time the best way I can see of doing this is 
maintaining multiple identical indexes and offlining, optimizing, letting them catch 
up-to-date and re-onlining them.  Does that sound best to you?

Thanks a lot in advance


Steve

Null or no analyzer

2004-10-19 Thread Rupinder Singh Mazara
Hi All

  I have a question regarding the selection of Analyzers during query parsing.


  I have three fields in my index: db_id, full_text and subject.
  All three are indexed; however, while indexing I told Lucene to
index db_id and subject but not tokenize them.

  I want to provide a single search box in my application for searching
documents.
  A query can look like  motor cross rally  and will get fed to
QueryParser to do the relevant parsing.

  However, if the user enters  John Kerry  subject:Elections 2004  I want to
make sure that no analyzer is used for the subject field. How can that be
done?

  This is because I expect the users to know the subject from a list of
controlled vocabularies, and I am searching for
 documents that have the exact subject. I tried using the
PerFieldAnalyzerWrapper, but how do I get hold of an Analyzer that
 does nothing but pass the text through to the Searcher?







MultiSearcher object question

2004-10-19 Thread Jeff Munson
I've just indexed over 600,000 documents (index size = 12GB) and have a
simple servlet to search the index.  I am using the MultiSearcher object
(I will add more indexes in the future) in a servlet to test searching.
I have noticed that the instantiation of my MultiSearcher object is
taking about 5 seconds.  As a solution, I have created the MultiSearcher
object and stored it in the Application context so I create it once and
access it subsequent times.  

My question is, is this a recommended practice?  If I have 1000 users
concurrently searching, will this approach cause problems?  What do
others do in web applications using the MultiSearcher object? 




RE: using optimize and addDocument concurrently.

2004-10-19 Thread Aad Nales
Steve,

The behavior that you describe is as expected. I have tackled a similar
problem to yours by creating a proxy object that acts as a gatekeeper to
all IndexReader, IndexSearcher and IndexWriter operations. With fully
synchronized access to all methods of the proxy you will not run into
any problems. Every time I need to perform something with the writer, I
close the searcher etc.

As to regular optimization, I tend to reindex now and again with a
completely separate writer and replace the index by moving it to the new
location. This BTW has also become a method in my proxy object.
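
A rough sketch of the gatekeeper idea, under heavy assumptions: the class name and all method names are hypothetical, and an in-memory list of strings stands in for the real index so that the example runs without Lucene. It only shows the shape of the approach, where every operation goes through one object whose methods are all synchronized, and a rebuilt index is swapped in through that same object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Gatekeeper sketch: every index operation goes through one synchronized proxy. */
final class IndexGatekeeper {
    // Hypothetical stand-in for the on-disk index: a list of document texts.
    private List<String> index = new ArrayList<>();

    /** Adds a document; callers block while another operation is running. */
    synchronized void addDocument(String text) {
        index.add(text);
    }

    /** Returns the documents that contain the given term. */
    synchronized List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : index) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Swaps in an index that was rebuilt elsewhere with a separate writer. */
    synchronized void replaceIndex(List<String> rebuilt) {
        index = new ArrayList<>(rebuilt);
    }

    public static void main(String[] args) {
        IndexGatekeeper gate = new IndexGatekeeper();
        gate.addDocument("lucene news article");
        gate.addDocument("weather report");
        System.out.println(gate.search("news").size());

        gate.replaceIndex(Arrays.asList("rebuilt news one", "rebuilt news two"));
        System.out.println(gate.search("news").size());
    }
}
```

Because every method holds the same lock, a long-running operation still blocks the others; the design buys safety and a single place to swap indexes, not concurrency.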

Hope this helps,
Cheers,
Aad







Thesaurus ...

2004-10-19 Thread Patricio Galeas
Hello,
I'm a new user of Lucene, and I would like to use it to create a Thesaurus.
Do you have any idea how to do this?  Thanks!

kind regards
P.Galeas








RE: Null or no analyzer

2004-10-19 Thread Aviran
You can use WhitespaceAnalyzer

Aviran
http://aviran.mordos.com




Downloading Full Copies of Web Pages

2004-10-19 Thread Luciano Barbosa
Hi folks,
I want to download full copies of web pages and store them locally, with
the hyperlink structure kept as local directories. I tried to use
Lucene, but I've realized that it doesn't have a crawler.
Does anyone know of software that does this?
Thanks,



Re: Range Query

2004-10-19 Thread Jonathan Hager
That is exactly right.  It is searching the ASCII.  To solve it I pad
my price using a method like this:

  /**
   * Pads the price so that all prices have the same number of characters
   * and can be compared lexicographically.
   *
   * @param price the price to format; may be null
   * @return the zero-padded price, or null if price is null
   */
  public static String formatPriceAsString(Double price) {
    if (price == null) {
      return null;
    }
    return PRICE_FORMATTER.format(price.doubleValue());
  }

where PRICE_FORMATTER contains enough digits for your largest number.

  private static final DecimalFormat PRICE_FORMATTER =
      new DecimalFormat("000.00");

When searching I also pad the query term.  I looked into hooking into
QueryParser, but since the lower and upper prices in my application are
separate inputs, I chose to handle them without hooking into the
QueryParser.
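
On the query side, padding the two bounds before building the range clause could look roughly like this. It is a sketch only: the class name, the priceRange helper and the width check are assumptions, not part of the code above, and the field name is taken from the original question.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class PriceRangeQueryDemo {
    private static final String PATTERN = "000.00";
    private static final DecimalFormat PRICE_FORMATTER =
        new DecimalFormat(PATTERN, DecimalFormatSymbols.getInstance(Locale.ROOT));

    /** Pads a price, rejecting values that do not fit the fixed width. */
    static String formatPrice(double price) {
        String padded = PRICE_FORMATTER.format(price);
        if (padded.length() != PATTERN.length()) {
            // Negative or too-large values would break the lexicographic order.
            throw new IllegalArgumentException("price out of range: " + price);
        }
        return padded;
    }

    /** Builds an inclusive range clause with both bounds padded. */
    static String priceRange(String field, double lower, double upper) {
        return field + ":[" + formatPrice(lower) + " TO " + formatPrice(upper) + "]";
    }

    public static void main(String[] args) {
        System.out.println(priceRange("ItemPrice", 10.00, 50.60));
    }
}
```

The width check is there because a value such as 1000 would format to seven characters and silently sort in the wrong place.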

Jonathan





Re: Thesaurus ...

2004-10-19 Thread David Spencer
Erik Hatcher wrote:
Have a look at the WordNet contribution in the Lucene sandbox 
repository.  It could be leveraged for part of a solution.
It's something I contributed.
Relevant links are:
http://jakarta.apache.org/lucene/docs/lucene-sandbox/
http://www.tropo.com/techno/java/lucene/wordnet.html
Basically it uses the Lucene index as a kind of associative array to map
words to their synonyms using the thesaurus from WordNet, so a key like,
say, "fast" will have mappings to "quick" and "rapid". This can then be
used for query expansion.
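
To make the idea concrete, here is a small sketch of synonym-based query expansion. It is not the sandbox code: a plain HashMap stands in for the Lucene index used as an associative array, and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SynonymExpander {
    // Stand-in for the index: maps a word to its synonyms.
    private final Map<String, List<String>> synonyms = new HashMap<>();

    void add(String word, String... syns) {
        synonyms.put(word, Arrays.asList(syns));
    }

    /** Rewrites each word that has synonyms as an OR group. */
    String expand(String query) {
        StringBuilder sb = new StringBuilder();
        for (String word : query.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            List<String> syns = synonyms.getOrDefault(word, Collections.emptyList());
            if (syns.isEmpty()) {
                sb.append(word);
            } else {
                sb.append('(').append(word);
                for (String s : syns) {
                    sb.append(" OR ").append(s);
                }
                sb.append(')');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SynonymExpander expander = new SynonymExpander();
        expander.add("fast", "quick", "rapid");
        System.out.println(expander.expand("fast car"));
    }
}
```

The expanded string uses the usual query-parser grouping, so it can be handed to a parser in place of the user's original query.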

An example of this expansion in use is here:
http://www.hostmon.com/rfc/advanced.jsp

Erik