RE: BOOLEAN EXCEPTION APPSERVER SOLUTION

2005-02-13 Thread Karthik N S

Hi Guys

Apologies

The forum was correct.

The CLASS LOADING problem was (or may have been) a bug with jdk1.4.1
and Tomcat 5.5.3 on Gentoo Linux.

So I switched to jdk1.4.2, and everything seems to be in order as of
now.



Thx for the advice
With regards
karthik


-Original Message-
From: Ronnie [mailto:[EMAIL PROTECTED]
Sent: Friday, February 11, 2005 4:37 PM
To: Lucene Users List
Subject: Re: BOOLEAN EXCEPTION APPSERVER


Do a search for Lucene JARs, something like:
# find $TOMCAT_HOME/ -name 'lucene*.jar'

Replace $TOMCAT_HOME with the correct directory of your Tomcat installation.
Also check the CLASSPATH of the user running Tomcat.
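The same check can be done from Java, for example on a Windows server that has no `find`. The following is a minimal sketch that assumes Java 8 or later for `Files.walk`, and `LuceneJarFinder` is a hypothetical name. It matches the same `lucene*.jar` pattern as the command above. JARs in the JRE's `lib/ext` directory are visible to every class loader as well, so that directory may be worth scanning too.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists every lucene*.jar file below a directory. */
final class LuceneJarFinder {

    /** Rough equivalent of: find ROOT -name 'lucene*.jar' */
    static List<Path> find(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.startsWith("lucene") && name.endsWith(".jar");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the Tomcat installation directory (the value of $TOMCAT_HOME)
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path jar : find(root)) {
            System.out.println(jar); // each printed path is a Lucene JAR Tomcat might load
        }
    }
}
```

The lucene-demos JAR also matches this pattern. Two copies of the core lucene-1.4.3.jar in different directories is the case to look for.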

/Ronnie


RE: BOOLEAN EXCEPTION APPSERVER

2005-02-11 Thread Karthik N S
Hi

   I removed Lucene1.4.3.jar from the webapp dir, and the following
exception was raised:


Feb 11, 2005 3:48:26 PM org.apache.catalina.core.ApplicationContext log
SEVERE: Error configuring application listener of class com.controlnet.servertool.WebContextReporter
java.lang.NoClassDefFoundError: org/apache/lucene/analysis/Analyzer
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:1590)
at java.lang.Class.getConstructor0(Class.java:1762)
at java.lang.Class.newInstance0(Class.java:276)
at java.lang.Class.newInstance(Class.java:259)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3546)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4031)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:755)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:739)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:886)
at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:849)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:474)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1079)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:310)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1011)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:718)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1003)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:437)
at org.apache.catalina.core.StandardService.start(StandardService.java:450)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:2009)
at org.apache.catalina.startup.Catalina.start(Catalina.java:538)

So this means I have only one copy of Lucene in the classpath of Tomcat 5,

and the same exceptions also occur on Windows 2000 / Gentoo Linux
servers.

Please Help

Thx in advance

-Original Message-
From: Miles Barr [mailto:[EMAIL PROTECTED]
Sent: Friday, February 11, 2005 2:51 PM
To: Lucene Users List
Subject: Re: BOOLEAN EXCEPTION APPSERVER


On Fri, 2005-02-11 at 12:20 +0530, Karthik N S wrote:
 I am getting this error on 'EVERY FIRST SEARCH after startup of
 the WEBSERVER',

 and I have declared the following code only once in the method of
 execution:


 <%@ page import="org.apache.lucene.search.BooleanQuery" %>
 BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE);

 The exception is as follows:

 Feb 11, 2005 12:16:42 PM org.apache.catalina.core.StandardWrapperValve invoke
 SEVERE: Servlet.service() for servlet jsp threw exception
 java.lang.LinkageError: duplicate class definition: org/apache/lucene/search/BooleanQuery
 at java.lang.ClassLoader.defineClass0(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:502)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:123)
 at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1626)
 at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:850)
 at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1299)
 at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1181)
 at org.apache.jasper.servlet.JasperLoader.loadClass(JasperLoader.java:148)
 at org.apache.jasper.servlet.JasperLoader.loadClass(JasperLoader.java:69)

 OS         = Gentoo Linux
 Java       = 1.4.1
 RAM        = 256 MB
 Web server = Tomcat 5.5.3

It looks like the class definition is being loaded twice. But if it's
being done by different classloaders it should be fine. You might have
two different versions of Lucene being loaded. Tomcat uses several
classloaders depending on where it finds the JAR file:

http://jakarta.apache.org/tomcat/tomcat-5.5-doc/class-loader-howto.html

Make sure you only have one copy of the Lucene JAR visible to Tomcat.
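The JDK's reflection API can show which copy of a class is actually loaded and how many copies are visible. The following is a minimal diagnostic sketch, and the class `ClassOrigin` is hypothetical. `describe` reports the defining class loader and the code source location. `locations` lists every URL at which a loader and its parents can see a resource. Two or more URLs for `org/apache/lucene/search/BooleanQuery.class` would point at duplicate Lucene JARs.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostics for "duplicate class definition" problems. */
final class ClassOrigin {

    /** Reports which class loader defined a class and where its bytes came from. */
    static String describe(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        if (loader == null) {
            // Core JDK classes are defined by the bootstrap loader (represented as null)
            return c.getName() + " <- bootstrap class loader";
        }
        CodeSource source = c.getProtectionDomain().getCodeSource();
        String where = (source == null || source.getLocation() == null)
                ? "unknown location"
                : source.getLocation().toString();
        return c.getName() + " <- " + loader + " from " + where;
    }

    /** Every URL at which the loader (including its parents) can see a resource. */
    static List<URL> locations(ClassLoader loader, String resourceName) throws IOException {
        return Collections.list(loader.getResources(resourceName));
    }
}
```

From a JSP you could, for instance, print `ClassOrigin.describe(BooleanQuery.class)` and `ClassOrigin.locations(Thread.currentThread().getContextClassLoader(), "org/apache/lucene/search/BooleanQuery.class")`. How Tomcat's web-app loader combines its own and its parents' resources depends on its delegation settings, so treat the output as a hint rather than proof.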



--
Miles Barr [EMAIL PROTECTED]
Runtime Collective Ltd.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





RE: BOOLEAN EXCEPTION APPSERVER

2005-02-11 Thread Karthik N S
Hi

Apologies.

  When I said 'defined another BooleanQuery class' I meant actually
  writing another class with the name
  org.apache.lucene.search.BooleanQuery. I'm guessing this isn't the case.

No, none of my packages start with or are named like Lucene's packages.
[I use the Eclipse IDE,
 and it would definitely indicate the name conflict if this were the case]


I will come back after switching the JDK from 1.4.1 to 1.4.2.

Any more ideas posted to the forum will be of great help.

Thx in advance



-Original Message-
From: Miles Barr [mailto:[EMAIL PROTECTED]
Sent: Friday, February 11, 2005 4:03 PM
To: Lucene Users List
Subject: RE: BOOLEAN EXCEPTION APPSERVER


On Fri, 2005-02-11 at 15:50 +0530, Karthik N S wrote:
 Hi

I have one JSP [Query.jsp] which constructs a query something like below:

   +CLOTHS +(+SHOES SOCKS) +(PANTS SHIRTS) -COTTON AND itemPrice:[0010 TO 0020]


 That's odd. You haven't defined another BooleanQuery class, have you?

   So for the itemPrice range I use the BooleanQuery.

When I said 'defined another BooleanQuery class' I meant actually
writing another class with the name
org.apache.lucene.search.BooleanQuery. I'm guessing this isn't the case.

I'm afraid I'm out of ideas. Maybe, as a last-ditch attempt, you could try
switching JVMs?


--
Miles Barr [EMAIL PROTECTED]
Runtime Collective Ltd.





REPLACE USING ANALYZERS

2005-02-02 Thread Karthik N S



Hi Guys

Apologies.

I would like to know if there are any analyzers out there which can give
me the required o/p as shown below:

1)
I/p = "+~shoes -~nike"
O/p = "+shoes -nike"

2)
I/p = +(+"~shoes -~nike")
O/p = +(+"shoes -nike")

3)
I/p = +~shoes -~nike
O/p = +shoes -nike

[ Note:- I am using the JavaScript tool available from the Lucene
Contributions site to build Advanced Search with a synonym factor ]

Thx in advance
WITH WARM REGARDS HAVE A NICE DAY [ N.S.KARTHIK]


RE: REPLACE USING ANALYZERS

2005-02-02 Thread Karthik N S
Hi Erik

   Oops, forgot!

  What about if the I/p is:


 I/p = +~shoes~  -~nike~  or  +(+~shoes~ -~nike~)   or  +~shoes~ -~nike~

  Using replaceAll would not solve the problem,

  since the fuzzy searches in QueryParser would then not return hits for
the equivalents.


|:(

thx in advance
Karthik
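A regular expression with lookarounds can drop the leading synonym marker while keeping QueryParser's own `~` operators, which a blanket `replaceAll("~", "")` cannot do. The following sketch assumes the synonym marker is always a `~` placed directly before a word that starts with a letter. `SynonymMarkerStripper` is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: removes the leading "~" synonym marker, keeps QueryParser's "~" operators. */
final class SynonymMarkerStripper {

    // Matches a "~" that is NOT preceded by a word character and IS followed by a letter,
    // i.e. the marker in "+~shoes". A trailing fuzzy "shoes~", a fuzzy similarity
    // "shoes~0.5" and a phrase slop "\"a b\"~2" are left untouched.
    private static final Pattern LEADING_TILDE = Pattern.compile("(?<!\\w)~(?=\\p{L})");

    static String strip(String query) {
        return LEADING_TILDE.matcher(query).replaceAll("");
    }
}
```

For example, `strip("+~shoes~ -~nike~")` gives `+shoes~ -nike~`. This keeps the fuzzy operators that the plain `replaceAll` would have removed.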




-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 02, 2005 3:50 PM
To: Lucene Users List
Subject: Re: REPLACE USING ANALYZERS



On Feb 2, 2005, at 4:12 AM, Karthik N S wrote:

 Hi Guys

 Apologies.

 I would like to know if there are any analyzers out there which can give
 me the required o/p as shown below

Sure:

string.replaceAll("~", "")

:)


 1)

 I/p   =  +~shoes  -~nike

  O/p  =  +shoes  -nike

  

 2)

 I/p    = +(+~shoes -~nike)

 O/p   = +(+shoes -nike)

  

 3)

 I/p   =  +~shoes -~nike

 O/p  =  +shoes  -nike

  

 [ Note:- I am using the JavaScript tool available from the Lucene
 Contributions site to build Advanced Search with a synonym factor ]

  

 Thx in advance

 WITH WARM REGARDS
  HAVE A NICE DAY
 [ N.S.KARTHIK]





LUCENE + EXCEPTION

2005-01-24 Thread Karthik N S




Hi Guys
Apologies..

On STANDALONE usage of UPDATION/DELETION/ADDITION of documents into the
MergerIndex, my code runs PERFECTLY without any problems.
But when the same code is plugged into a WEBAPP on TOMCAT with a servlet
running in SINGLE THREAD MODE, frequently I get the error below:

java.io.IOException: read past EOF
at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:218)
at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:356)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:323)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:64)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
at org.apache.lucene.search.Hits.<init>(Hits.java:43)
at org.apache.lucene.search.Searcher.search(Searcher.java:33)
at org.apache.lucene.search.Searcher.search(Searcher.java:27)

Somebody please tell me why this is happening.
O/s    = Gentoo
JAVA   = JDK 1.4.2
WEBAPP = TOMCAT
Lucene = 1.4.3
Thx in advance
Karthik

WITH WARM REGARDS HAVE A NICE DAY [ N.S.KARTHIK]


RE: LUCENE + EXCEPTION

2005-01-24 Thread Karthik N S
Hi

  OK, I still get the exception, even if I try to have a single servlet
instance [maybe via an authentication process], and I made sure that
Lucene's MergerIndexing is controlled by a single initiation...



  But even without any shared resources, the exception pops up frequently:



java.io.IOException: read past EOF
at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:218)
at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:356)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:323)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:64)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
at org.apache.lucene.search.Hits.<init>(Hits.java:43)
at org.apache.lucene.search.Searcher.search(Searcher.java:33)
at org.apache.lucene.search.Searcher.search(Searcher.java:27)



  Please help me

 [ I could not find any solution for this on the Lucene forum; maybe I am
the only one with the issue ]

Karthik

-Original Message-
From: Chris Lamprecht [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 25, 2005 9:48 AM
To: Lucene Users List
Subject: Re: LUCENE + EXCEPTION


Hi Karthik,

If you are talking about SingleThreadModel (i.e. your servlet
implements javax.servlet.SingleThreadModel), this does not guarantee
that two different instances of your servlet won't be run at the same
time.  It only guarantees that each instance of your servlet will only
be run by one thread at a time.  See:

http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/servlet/SingleThreadModel.html

If you are accessing a shared resource (a lucene index), you'll have
to prevent concurrent modifications somehow other than
SingleThreadModel.
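Chris's point suggests using a single lock shared by every servlet instance. That serializes index modifications no matter how many instances the container creates. Below is a minimal sketch with a hypothetical `IndexGuard` class. A `static` `ReentrantReadWriteLock` lets searches run concurrently but gives a writer exclusive access. It assumes every add/delete/merge and every search in the web app goes through the guard. A `static` field exists once per class loader, so it does not protect against another web app or another JVM touching the same index directory. `java.util.concurrent` arrived in Java 5. On the JDK 1.4 used here, a `synchronized` block on one shared object is the nearest equivalent.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/**
 * Hypothetical guard shared by ALL servlet instances of the web app (the lock is static),
 * so at most one thread modifies the index at a time, whatever SingleThreadModel does.
 */
final class IndexGuard {
    private static final ReadWriteLock LOCK = new ReentrantReadWriteLock();

    private IndexGuard() {}

    /** Runs an index modification (add/delete/merge) with exclusive access. */
    static void write(Runnable modification) {
        LOCK.writeLock().lock();
        try {
            modification.run();
        } finally {
            LOCK.writeLock().unlock();
        }
    }

    /** Runs a search; many searches may run together, but never during a write. */
    static <T> T read(Supplier<T> search) {
        LOCK.readLock().lock();
        try {
            return search.get();
        } finally {
            LOCK.readLock().unlock();
        }
    }
}
```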

I think they've finally deprecated SingleThreadModel in the latest
(2.4) servlet spec.

-chris


 On STANDALONE usage of UPDATION/DELETION/ADDITION of documents into the
 MergerIndex, my code runs PERFECTLY without any problems.

 But when the same code is plugged into a WEBAPP on TOMCAT with a servlet
 running in SINGLE THREAD MODE, frequently I get the error below

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: help in indexing

2005-01-19 Thread Karthik N S
Hi

 Probably you need to use the Luke tool to peek inside your index. Use it,
 then come back for more help.


Karthik


-Original Message-
From: chetan minajagi [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 12:05 PM
To: lucene-user@jakarta.apache.org
Subject: help in indexing


Hi,

It might seem elementary to most of you.
I am trying to build a search tool for internal use using Lucene.
I have used the following:
 .pdf  --  PDFBox
 .html --  Lucene demo file (HTMLDocument)
 .xls  --  POI

The indexing seems to work without throwing any errors.
But when I try to search, I always end up with zero hits.
I have tried using the same string that I see (System.out.print(Document)),
but in vain.
Can somebody let me know where and what could be wrong?
Regards,
Chetan





RE: QUERYPARSIN BOOSTING

2005-01-12 Thread Karthik N S
Hi Guys

Apologies...

If somebody has been closely watching GOOGLE, it boosts WEBSITES in
paid categories based on search words.

Can this [ boosting the full WEBSITE ] be achieved in Lucene's search based
on the search word?

If so, please explain / give examples.

with regards
karthik



-Original Message-
From: Chuck Williams [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 11, 2005 2:00 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: RE: QUERYPARSIN  BOOSTING


Karthik,

I don't think the boost in your example does much since you are using an
AND query, i.e. all hits will have to contain both vendor:nike and
contents:shoes.  If you used an OR, then the boost would put nike
products above (non-nike) shoes, unless there was some other factor that
causes score of contents:shoes to be 10x greater than that of
vendor:nike.  It's a good idea to look at the results of explain() when
analyzing what's happening with scoring, tuning your boosts and your
Similarity.
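Following Chuck's suggestion, the query can keep `contents:shoes` required and make the vendor clause optional. The boost then reorders the shoe hits instead of filtering them. Below is a small, hypothetical helper that builds that QueryParser string. Field names are case-sensitive, so `Contents` in the original query would not match a field indexed as `contents`.

```java
/** Hypothetical helper: one required clause plus one optional, boosted clause in QueryParser syntax. */
final class BoostedQueryText {

    static String requiredPlusBoosted(String field, String term,
                                      String boostField, String boostTerm, float boost) {
        // "+" makes the first clause required; the second clause has no prefix,
        // so it is optional and only adds score (scaled by ^boost) when it matches.
        return "+" + field + ":" + term + " " + boostField + ":" + boostTerm + "^" + formatBoost(boost);
    }

    private static String formatBoost(float boost) {
        // Print 10.0f as "10" but keep fractional boosts such as 0.2f as "0.2"
        return boost == Math.rint(boost) ? String.valueOf((int) boost) : String.valueOf(boost);
    }
}
```

`requiredPlusBoosted("contents", "shoes", "vendor", "nike", 10f)` yields `+contents:shoes vendor:nike^10`. As Chuck says, `explain()` is the way to check whether 10 is actually enough to move the Nike hits to the top.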

Chuck

   -Original Message-
   From: Nader Henein [mailto:[EMAIL PROTECTED]
   Sent: Tuesday, January 11, 2005 12:21 AM
   To: Lucene Users List
   Subject: Re: QUERYPARSIN  BOOSTING
  
From the text on the Lucene Jakarta site:
   http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

   Lucene provides the relevance level of matching documents based on the
   terms found. To boost a term use the caret, "^", symbol with a boost
   factor (a number) at the end of the term you are searching. The higher
   the boost factor, the more relevant the term will be.

   Boosting allows you to control the relevance of a document by
   boosting its term. For example, if you are searching for

   jakarta apache

   and you want the term "jakarta" to be more relevant boost it using
   the ^ symbol along with the boost factor next to the term. You would
   type:

   jakarta^4 apache

   This will make documents with the term jakarta appear more relevant.
   You can also boost Phrase Terms as in the example:

   "jakarta apache"^4 "jakarta lucene"

   By default, the boost factor is 1. Although the boost factor must be
   positive, it can be less than 1 (e.g. 0.2)

   Regards.

   Nader Henein
  
  
   Karthik N S wrote:
  
   Hi Guys
   
   
   
   Apologies...
   
   This question may have been asked a million times on this forum; I need
   some clarifications.

   1) FieldType = keyword, name = vendor

   2) FieldType = text, name = contents

   Question:

   1) How do I construct a query which would make hits available for the
   VENDOR appear first?

   2) If boosting is to be applied, how?

   3) Is the query constructed below correct?
   
   +Contents:shoes +((vendor:nike)^10)
   
   
   
   Please Advise.
   Thx in advance.
   
   
   WITH WARM REGARDS
   HAVE A NICE DAY
   [ N.S.KARTHIK]
   
   
   
  



SYNONYM + GOOGLE

2005-01-10 Thread Karthik N S


Hi Guys

Apologies

Does Lucene have synonym functionality like Google's?

If you search Google using '~shoes', it returns hits based on the
synonyms.

[ I know there is a WordNet-based synonym Lucene package in the sandbox:

http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/   ]

Can this be achieved in Lucene? If so, how???



Thx in Advance
Karthik






















WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]






RE: SYNONYM + GOOGLE

2005-01-10 Thread Karthik N S
Hi Erik

Apologies...

I may be a little off-topic for this forum, but I may be able to help you
with the next edition of Lucene in Action.


I was working with the Java WordNet Library (JWNL), and while fiddling with
its APIs I found something interesting:

the code attached to this mail gets more synonyms than the indexed WordNet
format available in the Lucene in Action zip file.


1) It needs the WordNet 2.0 dictionary installed

2) jwnl.jar from SourceForge

[ http://sourceforge.net/project/showfiles.php?group_id=33824&package_id=33975&release_id=196864 ]


After successful compilation, type "watch":

ORIGINAL  : watch OR analog_watch OR digital_watch OR hunter OR
hunting_watch OR pendulum_watch OR pocket_watch OR stem-winder OR
wristwatch OR wrist_watch

FORMATTED : watch OR analog watch OR digital watch OR hunter OR
hunting watch OR pendulum watch OR pocket watch


Check this out; maybe you will come up with brilliant ideas.
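The FORMATTED output has a catch. Once underscores become spaces, QueryParser parses `analog watch` in an OR query as two separate terms, so the query would also match documents containing only `analog`. The following is a minimal sketch that quotes multi-word lemmas as phrases instead. `SynonymQueryText` is a hypothetical name, and it takes the lemma strings as plain input rather than calling JWNL.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: turns WordNet lemmas (which join words with "_") into QueryParser syntax. */
final class SynonymQueryText {

    static String toQuery(List<String> lemmas) {
        StringJoiner query = new StringJoiner(" OR ");
        for (String lemma : lemmas) {
            String words = lemma.replace('_', ' ');
            // Quote multi-word lemmas so QueryParser treats them as phrases;
            // unquoted, "analog watch" would become two independent terms.
            query.add(words.indexOf(' ') >= 0 ? "\"" + words + "\"" : words);
        }
        return query.toString();
    }
}
```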



with regards
Karthik

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, January 10, 2005 5:19 PM
To: Lucene Users List
Subject: Re: SYNONYM + GOOGLE



On Jan 10, 2005, at 5:33 AM, Karthik N S wrote:
 If you search Google using '~shoes', it returns hits based on the
 synonyms.

 [ I know there is a WordNet-based synonym Lucene package in the
 sandbox:

 http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/   ]

 Can this be achieved in Lucene? If so, how???

Yes, it can be achieved.  Not quite synonyms, but various forms of the
same word can be found in this example, like this search for similar
(see the highlighted variations):

http://www.lucenebook.com/search?query=similar

This is accomplished using the Snowball stemmer filter found in the
sandbox.   For synonyms, you have lots of options.  In Lucene in Action
I demonstrate custom analyzers that inject synonyms using the WordNet
database (from the sandbox).  From the source code distribution of LIA:

% ant SynonymAnalyzerViewer
Buildfile: build.xml

SynonymAnalyzerViewer:
  [echo]
  [echo]   Using a custom SynonymAnalyzer, two fixed strings are
  [echo]   analyzed with the results displayed.  Synonyms, from the
  [echo]   WordNet database, are injected into the same positions
  [echo]   as the original words.
  [echo]
  [echo]   See the Analysis chapter for more on synonym injection and
  [echo]   position increments.  The Tools and extensions chapter covers
  [echo]   the WordNet feature found in the Lucene sandbox.
  [echo]
 [input] Press return to continue...

  [echo] Running lia.analysis.synonym.SynonymAnalyzerViewer...

  [java] 1: [quick] [warm] [straightaway] [spry] [speedy] [ready] [quickly] [promptly] [prompt] [nimble] [immediate] [flying] [fast] [agile]
  [java] 2: [brown] [brownness] [brownish]
  [java] 3: [fox] [trick] [throw] [slyboots] [fuddle] [fob] [dodger] [discombobulate] [confuse] [confound] [befuddle] [bedevil]
  [java] 4: [jumps]
  [java] 5: [over] [o] [across]
  [java] 6: [lazy] [faineant] [indolent] [otiose] [slothful]
  [java] 7: [dogs]

...

The phrase analyzed was "The quick brown fox jumps over the lazy dogs."
Why no synonyms for "jumps" and "dogs"?  WordNet has synonyms for
"jump" and "dog", but not the plural forms.  Stemming would be a
necessary step in achieving full synonym look-up, though this would
need to be done carefully as the stem of a word is not necessarily a
real word itself, so you'd probably want to stem the synonym database
also to ensure accurate lookup.
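The position injection Erik describes can be modelled without Lucene. Each original word advances the position by 1, and each injected synonym gets an increment of 0 so it shares the position of the word it came from. The sketch below uses hypothetical `SynonymInjector`/`Tok` types and a hand-made synonym map standing in for WordNet. In a real Lucene analyzer this would be a `TokenFilter` that emits synonym `Token`s with their position increment set to 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical model of synonym injection at the same token position. */
final class SynonymInjector {

    /** A term plus its position increment relative to the previous token. */
    static final class Tok {
        final String term;
        final int posIncr;
        Tok(String term, int posIncr) { this.term = term; this.posIncr = posIncr; }
        @Override public String toString() { return term + "(+" + posIncr + ")"; }
    }

    /** Emits each word with increment 1, followed by its synonyms with increment 0. */
    static List<Tok> inject(List<String> words, Map<String, List<String>> synonyms) {
        List<Tok> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Tok(w, 1));
            for (String syn : synonyms.getOrDefault(w, Collections.<String>emptyList())) {
                out.add(new Tok(syn, 0));
            }
        }
        return out;
    }

    /** Absolute positions implied by the increments (first token at position 0). */
    static List<Integer> positions(List<Tok> toks) {
        List<Integer> pos = new ArrayList<>();
        int p = -1;
        for (Tok t : toks) {
            p += t.posIncr;
            pos.add(p);
        }
        return pos;
    }
}
```

As Erik notes, a lookup keyed on "jump" would still miss "jumps" unless both the input and the synonym table are stemmed the same way.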

Also notice the semantically incorrect synonyms that appear for the
animal fox (confuse, for example).  Be careful!  :)

Erik



RE: Please help - installation problem

2005-01-10 Thread Karthik N S
Hi

   I think you need to add one more line at the end of the path settings:

   JAVA_HOME=/home/JDK..

   and finally

   export JAVA_HOME TOMCAT_HOME CLASSPATH ANT_HOME PATH

   Once you have done this, type echo $CLASSPATH to check that the JAR files
are available for compilation / interpretation.


 Can I also know if tomcat version 3.2.4 is sufficient for Lucene to run?

  Any relevant Tomcat that works with JDK 1.4.2 is sufficient for Lucene to
execute...




with regards
Karthik



-Original Message-
From: jac jac [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 11, 2005 8:21 AM
To: lucene-user@jakarta.apache.org
Subject: Please help - installation problem


Hi all,

This is jac here, and I am in urgent need of installing Lucene on a
Unix machine. However, I am not sure where to set the paths, because I am
unfamiliar with Unix and am a newbie to Java as well.

I have installed Lucene on Windows and it works, but I can't
understand why it doesn't work on Unix..

The following paths are what I have entered:
Can someone please check for me?

-  PATH=.:$PATH:$ANT_HOME/bin
- TOMCAT_HOME=/home/jac/jakarta-tomcat-3.2.4
- ANT_HOME=/home/jac/apache-ant-1.6.2
- CLASSPATH=$TOMCAT_HOME/webapps/luc/WEB-INF/lib/lucene-1.4.3.jar:$TOMCAT_HOME/webapps/luc/WEB-INF/lib/lucene-demos-1.4.3

Can I also know if tomcat version 3.2.4 is sufficient for Lucene to run?

Thanks in advance!

Regards,
jac





QUERYPARSIN BOOSTING

2005-01-10 Thread Karthik N S


Hi Guys



Apologies...

This question may have been asked a million times on this forum; I need
some clarifications.

1) FieldType = keyword, name = vendor

2) FieldType = text, name = contents

Question:

1) How do I construct a query which would make hits available for the
VENDOR appear first?

2) If boosting is to be applied, how?

3) Is the query constructed below correct?

+Contents:shoes +((vendor:nike)^10)



Please Advise.
Thx in advance.


WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]






INDEXREADER + MAXDOC

2005-01-04 Thread Karthik N S


Hi Guys

Apologies...



Using the integer returned by the IndexReader.maxDoc() API,

is it possible to get the VALUES from the various field types?

ex:-   'docs.get("contents") at IndexReader.maxDoc()'



If so, how...??




WITH WARM REGARDS 
HAVE A NICE DAY 
[ N.S.KARTHIK] 






RE: INDEXREADER + MAXDOC

2005-01-04 Thread Karthik N S
Hi Erik

Apologies...

  I would like to EXTRACT the DATA from the various fields of the last
document [as you said]

  Ex: at IndexReader.maxDoc = 100

  doc.get("Content") == "ISBN100"
  doc.get("name")    == "LUCENE IN ACTION"
  doc.get("author")  == "Erik Hatcher"
  .

This is my requirement.

Please
With regards
Karthik


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 04, 2005 5:10 PM
To: Lucene Users List
Subject: Re: INDEXREADER + MAXDOC



On Jan 4, 2005, at 5:19 AM, Karthik N S wrote:
 Using the integer returned by the IndexReader.maxDoc() API,

 is it possible to get the VALUES from the various field types?

 ex:-   'docs.get("contents") at IndexReader.maxDoc()'



 If so, how...??

Just to be sure I understand... you want the last document in the
index?  IndexReader.document(n) will give you this.
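Document numbers run from 0 to `maxDoc() - 1`. The last document is therefore number `maxDoc() - 1` (99 when `maxDoc()` is 100), and it may have been deleted. The following is a sketch of a helper (hypothetical `LastDocument`) that walks backwards to the last live number. The deletion check is passed in so the example runs without Lucene.

```java
import java.util.function.IntPredicate;

/** Hypothetical helper: finds the highest document number that is not deleted. */
final class LastDocument {

    /**
     * @param maxDoc    one greater than the largest document number
     * @param isDeleted tells whether a document number has been deleted
     * @return the last live document number, or -1 if there is none
     */
    static int lastLiveDoc(int maxDoc, IntPredicate isDeleted) {
        for (int n = maxDoc - 1; n >= 0; n--) {
            if (!isDeleted.test(n)) {
                return n;
            }
        }
        return -1;
    }
}
```

With a Lucene `IndexReader reader`, the intended call would be along the lines of `int n = LastDocument.lastLiveDoc(reader.maxDoc(), reader::isDeleted);` followed by `reader.document(n).get("name")` when `n >= 0`. The method reference needs Java 8. On JDK 1.4 the same loop can call `reader.isDeleted(n)` directly.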

Erik





RE: New Highlighter features + api

2005-01-02 Thread Karthik N S
Hi

  Mark,

Apologies..



Can you please tell the forum where to find the
JavaDoc API for the Highlighter package you have created?


Thx in Advance
[ WISHING YOU AND THE FORUM 'A HAPPY NEW YEAR' ]
KARTHIK




-Original Message-
From: markharw00d [mailto:[EMAIL PROTECTED]
Sent: Monday, January 03, 2005 4:14 AM
To: Lucene Users List
Subject: New Highlighter features


The Highlighter package in CVS has been updated with the following new 
features:

* GradientFormatter is a new formatter that can be used to change the 
colour intensity of matching terms, based on their score. I have found 
this to be a useful way of visualizing the basis of query matches, 
especially when the query was derived automatically, e.g. in a
MoreLikeThis style of query.

* The QueryScorer class has a new constructor that takes an IndexReader 
which is used to provide term scores based on scarcity (idf score).  
Using this with the new GradientFormatter ensures the most important 
terms are highlighted most strongly.

* New class TokenSources offers methods to produce a TokenStream from 
indexes using the new TermVector features (saving the cost of 
reanalyzing to produce TokenStreams for the highlighter).

TokenStreams, Formatters and Scorers are all pluggable elements of the
main Highlighter class, so these are just new extensions around the
existing core functionality. See the Javadocs for further details on how
to use these components.
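The idea behind a gradient formatter can be sketched as linear interpolation between two colours according to a term's score. This is a hypothetical `GradientSketch`, not the Highlighter's own `GradientFormatter` code. The score could be, for instance, an idf-based weight like the one the new `QueryScorer` constructor provides.

```java
/**
 * Hypothetical sketch: scale a term's highlight colour between two RGB
 * endpoints according to its score. Not the Highlighter's actual source.
 */
final class GradientSketch {

    /** Returns "#RRGGBB": minRgb at score 0, maxRgb at score >= maxScore. */
    static String colourFor(float score, float maxScore, int minRgb, int maxRgb) {
        float t = maxScore <= 0f ? 0f : Math.max(0f, Math.min(1f, score / maxScore));
        int r = lerp((minRgb >> 16) & 0xFF, (maxRgb >> 16) & 0xFF, t);
        int g = lerp((minRgb >> 8) & 0xFF, (maxRgb >> 8) & 0xFF, t);
        int b = lerp(minRgb & 0xFF, maxRgb & 0xFF, t);
        return String.format("#%02X%02X%02X", r, g, b);
    }

    /** Linear interpolation between two 0-255 channel values. */
    private static int lerp(int from, int to, float t) {
        return Math.round(from + (to - from) * t);
    }
}
```

For example, with white (`0xFFFFFF`) for unimportant terms and red (`0xFF0000`) for the best one, a term scoring half the maximum gets `#FF8080`.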

Cheers
Mark




SNOWBALL STEMMER + BOOSTING

2004-12-22 Thread Karthik N S


Hi Guys

Apologies..

Using Analysis Paralysis on the Snowball stemmer [ using
StandardAnalyzer.ENGLISH_STOP_WORDS and StopAnalyzer.ENGLISH_STOP_WORDS ]
from

http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html?page=last#thread

for the string 'jakarta^4 apache',

both cases return something like this:
=
org.apache.lucene.analysis.snowball.SnowballAnalyzer:

[JAKARTHA] [4] [APACHE]

=


I wonder what happened to the BOOSTING SYMBOL '^', and if the same string
is used with QueryParser.parse(), what hits would be returned???
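The analyzer only ever sees text. In the Analysis Paralysis demo the whole string, `^` included, goes to the analyzer, which treats `^` as punctuation and keeps `4` as a token. QueryParser instead consumes `^4` as a boost and analyzes only `jakarta`. The following is a deliberately simplified, hypothetical model of that separation. It is not QueryParser and ignores fields, phrases and operators.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: split "term^boost" into text (for the analyzer) and a boost. */
final class BoostSplitter {

    static final class Clause {
        final String text;
        final float boost;
        Clause(String text, float boost) { this.text = text; this.boost = boost; }
    }

    static List<Clause> split(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int caret = token.lastIndexOf('^');
            if (caret > 0 && caret < token.length() - 1) {
                try {
                    float boost = Float.parseFloat(token.substring(caret + 1));
                    clauses.add(new Clause(token.substring(0, caret), boost));
                    continue;
                } catch (NumberFormatException notABoost) {
                    // not a number after '^': keep the whole token as plain text
                }
            }
            clauses.add(new Clause(token, 1.0f)); // default boost is 1
        }
        return clauses;
    }
}
```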

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]






MergerIndex + Searchables

2004-12-21 Thread Karthik N S
Hi Guys

Apologies...


I have several merged indexes [MGR1, MGR2, MGR3].

To search across these merged indexes I use the following code:
IndexSearcher[] indexToSearch = new IndexSearcher[CNTINDXDBOOK];

for (int all = 0; all < CNTINDXDBOOK; all++) {
    indexToSearch[all] = new IndexSearcher(INDEXEDBOOKS[all]);
    System.out.println(all + " ADDED TO SEARCHABLES " + INDEXEDBOOKS[all]);
}

MultiSearcher searcher = new MultiSearcher(indexToSearch);


Question:

During a search, how can I display which MGR a relevant document ID
originated from?

[Something like this: search word 'ISBN12345' is available from
MGRx]



  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]
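As far as I know, the 1.4 MultiSearcher already has public subSearcher(int n) and subDoc(int n) methods. If so, INDEXEDBOOKS[searcher.subSearcher(hits.id(i))] would name the index a hit came from. Please verify this against your Lucene version before relying on it.

The JDK-only sketch below models the mapping those methods perform. Each sub-index owns a contiguous block of global document numbers, and a binary search over the block start positions finds the owning sub-index. SubIndexLocator is a hypothetical class written only for this illustration.

```java
import java.util.Arrays;

// Hypothetical model: maps a global doc number to (sub-index, local doc).
class SubIndexLocator {
    // starts[i] = first global doc number belonging to sub-index i.
    private final int[] starts;

    SubIndexLocator(int[] maxDocs) {
        starts = new int[maxDocs.length];
        int total = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = total;
            total += maxDocs[i];
        }
    }

    // Index of the sub-index holding global doc n. This assumes every
    // sub-index is non-empty, so starts contains no duplicate values.
    int subSearcher(int n) {
        int pos = Arrays.binarySearch(starts, n);
        // Exact match: n is the first doc of that sub-index. Otherwise
        // binarySearch returns -(insertionPoint) - 1, and the owner is the
        // last start <= n, i.e. insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Document number of global doc n inside its own sub-index.
    int subDoc(int n) {
        return n - starts[subSearcher(n)];
    }
}
```

Suppose the sub-indexes hold 100, 50 and 200 documents. Then global document 120 belongs to sub-index 1, where it is document 20.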







RE: NUMERIC RANGE BOOLEAN

2004-12-16 Thread Karthik N S
Hi Erik


Apologies.


Yes, as I told you in my earlier mail,

we have to get all the hits in the range,

   so 0.99 cents must ALWAYS stay 0.99 cents, since we do the price
comparison from the consumer's point of view.


I hope I have answered your question.


With regards
Karthik

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 16, 2004 5:24 PM
To: Lucene Users List
Subject: Re: NUMERIC RANGE BOOLEAN


On Dec 16, 2004, at 5:03 AM, Morus Walter wrote:
 Erik Hatcher writes:

 TooManyClauses exception occurs when a query such as a RangeQuery
 expands to more than 1024 terms.  I don't see how this could be the
 case in the query you provided - are you certain that is the query
 that
 generated the error?

 Why not: the terms might be 0003 0003.1 0003.11 ...

 So the question is, what do his terms look like...

Ah, good point! So, Karthik, what are the values of those terms?

Pragmatically, do you really need to do a range involving the cents of
a price?

Erik
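Erik's question points at a common remedy. Index prices as fixed-width, zero-padded strings, so that lexicographic term order matches numeric order, and use no finer granularity than the range search actually needs.

The sketch assumes prices are held as non-negative integer cents. The field widths (7 and 5 digits) and the PriceTerms class are illustrative choices, not anything from the thread.

```java
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper: fixed-width price terms at two granularities.
class PriceTerms {
    // Whole cents, zero-padded to 7 digits: 399 cents -> "0000399".
    static String centsTerm(long cents) {
        return String.format(Locale.ROOT, "%07d", cents);
    }

    // Whole currency units, zero-padded to 5 digits: 399 cents -> "00003".
    static String unitTerm(long cents) {
        return String.format(Locale.ROOT, "%05d", cents / 100);
    }

    // Number of distinct indexed terms an inclusive range would expand to.
    static int expansion(TreeSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, true, upper, true).size();
    }
}
```

Suppose every cent value from 3.00 to 20.00 is present in the index. A range over cent-level terms would then expand to 1701 terms, while the same range over whole-unit terms expands to only 18.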







RE: Indexing with Lucene 1.4.3

2004-12-16 Thread Karthik N S

Hi there

Apologies.



   If you are using IndexHTML from the demo.jar package, which is
available in Lucene1.4.3.zip,

 then you had better check the file extensions of your files; they may be
filtered out of the indexing process

 due to this code in IndexHTML.java:
 
  } else if (file.getPath().endsWith(".html") || // index .html files
  file.getPath().endsWith(".htm") || // index .htm files
  file.getPath().endsWith(".txt")) { // index .txt files
 


If your extensions are among the 'endsWith' options, then you have
successfully indexed your 6000 documents.

Try the Luke monitoring software available from the Jakarta Lucene web
site and check the same.

[Hint: try SearchFiles.class from Lucene1.4.3.zip to search
the documents you have indexed successfully.]


with regards
Karthik
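A quick way to check Karthik's suggestion is to run your own file names through the same test. The sketch below copies only the extension check quoted from IndexHTML.java; the demo's other logic is not modelled. DemoExtensionFilter is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mirroring only the quoted endsWith(...) check.
class DemoExtensionFilter {
    static boolean wouldIndex(String path) {
        return path.endsWith(".html") || path.endsWith(".htm") || path.endsWith(".txt");
    }

    // Paths the quoted check would skip.
    static List<String> skipped(List<String> paths) {
        List<String> out = new ArrayList<String>();
        for (String p : paths) {
            if (!wouldIndex(p)) {
                out.add(p);
            }
        }
        return out;
    }
}
```

String.endsWith is case-sensitive, so a file named PAGE.HTML would be skipped by this check as written.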






-Original Message-
From: Hetan Shah [mailto:[EMAIL PROTECTED]
Sent: Friday, December 17, 2004 12:30 AM
To: Lucene Users List
Subject: Indexing with Lucene 1.4.3


Hello,

I have been trying to index around 6000 documents using IndexHTML from
1.4.3 and at the end of indexing in my index directory I only have 3 files.
segments
deletable and
_5en.cfs

Can someone tell me what is going on and where are the actual index
files? How can I resolve this issue?
Thanks.
-H







NUMERIC RANGE BOOLEAN

2004-12-16 Thread Karthik N S
Hi Guys

Apologies.




Can somebody please tell me why this is happening, and is there any workaround for
it?


Constructed String : +bags +itemPrice:[0003 TO 0020]

Query String: +contents:bags +itemPrice:[0003 TO 0020]

org.apache.lucene.search.BooleanQuery$TooManyClauses
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:79)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:71)
at org.apache.lucene.search.RangeQuery.rewrite(RangeQuery.java:99)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:243)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:166)







  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]






RE: NUMERIC RANGE BOOLEAN

2004-12-16 Thread Karthik N S
Hi Erik

Yes, this is happening in our case,

and we are using Lucene 1.4.3 with the same system configuration as in my earlier mail.


With regards
Karthik

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 16, 2004 3:17 PM
To: Lucene Users List
Subject: Re: NUMERIC RANGE BOOLEAN


On Dec 16, 2004, at 4:07 AM, Karthik N S wrote:
 Can somebody please tell me why this is happening, and is there any workaround
 for
 it?


 Constructed String : +bags +itemPrice:[0003 TO 0020]

 Query String: +contents:bags +itemPrice:[0003 TO 0020]

 org.apache.lucene.search.BooleanQuery$TooManyClauses

TooManyClauses exception occurs when a query such as a RangeQuery 
expands to more than 1024 terms.  I don't see how this could be the 
case in the query you provided - are you certain that is the query that 
generated the error?

Erik
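Morus Walter's explanation can be checked with a small model. The stack trace shows RangeQuery.rewrite adding clauses to a BooleanQuery, and Erik gives the limit as 1024 terms. The sketch below counts the indexed terms that fall lexicographically between the bounds, and fails in the same way when there are too many.

RangeExpansion and its constant are hypothetical names. The term list in the usage check is invented to mimic values like 0003, 0003.1 and 0003.11.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model of a RangeQuery rewrite: one BooleanQuery clause per
// indexed term between the bounds, failing above the clause limit.
class RangeExpansion {
    // The default BooleanQuery clause limit Erik quotes.
    static final int MAX_CLAUSE_COUNT = 1024;

    static NavigableSet<String> expand(TreeSet<String> indexedTerms, String lower, String upper) {
        // Inclusive on both ends, like [0003 TO 0020]. The comparison is
        // lexicographic string order, not numeric order.
        NavigableSet<String> clauses = indexedTerms.subSet(lower, true, upper, true);
        if (clauses.size() > MAX_CLAUSE_COUNT) {
            throw new IllegalStateException("too many clauses: " + clauses.size());
        }
        return clauses;
    }
}
```

Every term such as "0005.0001" sorts between "0003" and "0020". An index with a few thousand distinct fractional prices in that span is therefore enough to exceed the limit, even though the query itself looks small.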






LUCENE1.4.1 - LUCENE1.4.2 - LUCENE1.4.3 Exception

2004-12-14 Thread Karthik N S
Hi Guys


Can somebody please tell me what this exception I am getting means?

System specifications:

O/S Linux Gentoo
Appserver Apache Tomcat/4.1.24
JDK build 1.4.2_03-b02
Lucene 1.4.1, 1.4.2, 1.4.3

Note: this exception is displayed on every second query after Tomcat is
started.


java.io.IOException: Stale NFS file handle
at java.io.RandomAccessFile.readBytes(Native Method)
at java.io.RandomAccessFile.read(RandomAccessFile.java:307)
at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:420)
at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:220)
at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:142)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:143)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:137)
at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:253)
at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:69)
at org.apache.lucene.search.Similarity.idf(Similarity.java:255)
at org.apache.lucene.search.TermQuery$TermWeight.sumOfSquaredWeights(TermQuery.java:47)
at org.apache.lucene.search.Query.weight(Query.java:86)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
at org.apache.lucene.search.MultiSearcherThread.run(ParallelMultiSearcher.java:251)





  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







RE: HITCOLLECTOR+SCORE+DELIMMA

2004-12-13 Thread Karthik N S
Hi

Vikas Gupta


Since Erik replied to my last mail: can a FILTER be built to
fetch scores between 0.2f and 1.0f?

Can you please spare me some code for it?

[Sorry for the spelling mistakes; my mail client does not have a spell checker.]
With regards
Karthik




-Original Message-
From: Vikas Gupta [mailto:[EMAIL PROTECTED]
Sent: Monday, December 13, 2004 3:17 PM
To: Lucene Users List
Subject: RE: HITCOLLECTOR+SCORE+DELIMA


 On Dec 10, 2004, at 7:39 AM, Karthik N S wrote:
  I am still in delima on How to use the HitCollector for returning
  Hits hits
  between scores  0.2f to 1.0f ,
 
  There is not a simple example for the same, yet lot's of talk on usage
  for
  the same on the form.

1) I am not 100% sure about this but it might work.

Add a score-range check to the HitCollector in IndexSearcher.java::search():

  // inherit javadoc
  public TopDocs search(Query query, Filter filter, final int nDocs)
       throws IOException {
    Scorer scorer = query.weight(this).scorer(reader);
    if (scorer == null)
      return new TopDocs(0, new ScoreDoc[0]);

    final BitSet bits = filter != null ? filter.bits(reader) : null;
    final HitQueue hq = new HitQueue(nDocs);
    final int[] totalHits = new int[1];
    scorer.score(new HitCollector() {
        public final void collect(int doc, float score) {
          if (score > 0.0f &&                     // ignore zeroed buckets
              (score > 0.2f && score < 1.0f) &&   // added: raw (unnormalized) score window
              (bits == null || bits.get(doc))) {  // skip docs not in bits
            totalHits[0]++;
            hq.insert(new ScoreDoc(doc, score));
          }
        }
      });
    // ... rest of search() unchanged



2) Filter examples are in the Lucene in Action book, Chapter 5. I wrote an
example as well:



String query = "odyssey";

// Main query: docs whose "content" field contains the word
// (add(clause, required = true, prohibited = false) in the 1.4 API).
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("content", query)), true, false);

// Filter query: docs whose "H2" field contains the word.
BooleanQuery bqf = new BooleanQuery();
bqf.add(new TermQuery(new Term("H2", query)), true, false);

Filter f = new QueryFilter(bqf);

IndexReader reader = IndexReader.open(new File(dir,
index).getCanonicalPath());
Searcher luceneSearcher = new
org.apache.lucene.search.IndexSearcher(reader);
luceneSearcher.setSimilarity(new NutchSimilarity());

//Logically the following would be executed as follows: Find all
//the docs matching bq. Select the ones which match bqf
Hits hits = luceneSearcher.search(bq, f);

System.out.println("query: " + query);

System.out.println("Total hits: " + hits.length());

3) delima is spelled as dilemma


-Vikas Gupta






RE: HITCOLLECTOR+SCORE+DELIMMA

2004-12-13 Thread Karthik N S
Hi Erik


What exactly do you mean by this:


We've emphasized numerous times that calling hits.doc(i) is a resource
hit.  Don't do it for documents you aren't going to show.  To filter by
score, use hits.score(i) first.

 I am a bit confused. Do you mean to say replace

   hits.doc(i)

with

  hits.score(i)?



Also

 Ah, so you are accessing every document to get this field information.
 It is incorrect that you cannot filter prior to getting hits.  You have
 a couple of options in filtering by a field value - use a QueryFilter
 or simply AND a RangeQuery to the original query.


Since the portal we are building is an e-commerce one, we have to search for a
search word across

  (7) x 1000 x 15000 documents, get most of the relevant hits (wherever
the score is between 0.5 and 1.0)

  and then sort the adjacent fields 'Vendors' and 'Price' in ASC order.


 In such a case we cannot use a RangeQuery without knowing in advance what
exactly the consumer wants.


 Is it not possible to have a generalized Filter in future versions of the API
to inject some minor factors prior to

 getting the Hits returned?


Thx in advance
Karthik



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 3:44 PM
To: Lucene Users List
Subject: Re: HITCOLLECTOR+SCORE+DELIMMA



On Dec 13, 2004, at 11:16 PM, Karthik N S wrote:
  time [a simple search for 'handbags' returned 1,60,000 hits and the time
 taken
 was 440 secs in the production environment; maybe our
  coding is poor, but we are constantly improving the process].

If your searches are taking 440 seconds, you have something more
fundamentally wrong.  You are either doing some large
wildcard/range/fuzzy expansions or you're accessing every document from
all your hits.  Is the searcher.search() method taking that long?  I
bet not.  Or rather is it the iteration over the Hits that is killing
the search time, which is what I suspect?

We've emphasized numerous times that calling hits.doc(i) is a resource
hit.  Don't do it for documents you aren't going to show.  To filter by
score, use hits.score(i) first.
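Erik is not suggesting that hits.doc(i) be replaced with hits.score(i). The idea is to read the cheap score first and load the document only for hits you will actually show. Because scores arrive in descending order with Hits (as Erik notes elsewhere in this thread), the loop can stop at the first score below the lower bound.

The sketch models this with plain arrays. ScoreFirstLoop and its loadDoc stand-in are hypothetical, and the 0.5 to 1.0 window is taken from Karthik's mail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "check the score before loading the document".
class ScoreFirstLoop {
    int docsLoaded = 0;

    // Stand-in for the expensive hits.doc(i) call; counts how often it runs.
    String loadDoc(int i) {
        docsLoaded++;
        return "doc-" + i;
    }

    // descendingScores must be sorted highest first, as Hits scores are.
    List<String> collect(float[] descendingScores, float min, float max) {
        List<String> kept = new ArrayList<String>();
        for (int i = 0; i < descendingScores.length; i++) {
            float s = descendingScores[i];   // cheap, like hits.score(i)
            if (s < min) {
                break;                       // every later score is lower still
            }
            if (s <= max) {
                kept.add(loadDoc(i));        // expensive, like hits.doc(i)
            }
        }
        return kept;
    }
}
```

With scores 1.0, 0.7, 0.5, 0.4, ... and a window of 0.5 to 1.0, only three documents are loaded. The loop never touches the rest.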

  {O/S Linux Gentoo, RAM 1GB, Lucene 1.4.1, appserver = Tomcat 5, and
 Blackdown Java 1.4.2 with args -XX:+UseParallelGC for

  garbage collection}

Please narrow your code down to a clean, succinct example that you can
post.  It is difficult to help you without details of your code (but
let me emphasize again - it needs to be clean and succinct so it is
quick for us to get a handle on).

  To be one step ahead, we also have adjacent fields 'Vendor'
 and 'Price', which we have to compare for
  Best/Poor/Least results. So we have to limit the hits
 accordingly, since the Lucene API does not provide any way to
  inject this limiting facility *prior* to getting the hits.

Ah, so you are accessing every document to get this field information.
It is incorrect that you cannot filter prior to getting hits.  You have
a couple of options in filtering by a field value - use a QueryFilter
or simply AND a RangeQuery to the original query.

Erik
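The filter option Erik mentions can be pictured as a bit set with one bit per document. In Lucene 1.4 a Filter's bits(IndexReader) returns a java.util.BitSet. As the IndexSearcher code quoted elsewhere in this thread shows, documents whose bit is clear are skipped.

The sketch below models building such bits from a price field and intersecting them with the query's matches. PriceFilterModel and the int[] prices array are stand-ins, not Lucene API.

```java
import java.util.BitSet;

// Hypothetical model: a filter as one bit per document, ANDed with query hits.
class PriceFilterModel {
    // Bit d is set when prices[d] lies in [min, max]. prices[d] stands in
    // for the price field value of document d.
    static BitSet priceBits(int[] prices, int min, int max) {
        BitSet bits = new BitSet(prices.length);
        for (int doc = 0; doc < prices.length; doc++) {
            if (prices[doc] >= min && prices[doc] <= max) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Documents that match the query AND pass the filter.
    static BitSet apply(BitSet queryMatches, BitSet filterBits) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(filterBits);
        return result;
    }
}
```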







RE: HITCOLLECTOR+SCORE+DELIMMA

2004-12-13 Thread Karthik N S
Hi Erik

Apologies...




 In this mail:
http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]
che.orgmsgNo=11254

 I have already told you that doc.get() was coming in batches for a mere
hit count of '4000', and this is happening in real

 time [a simple search for 'handbags' returned 1,60,000 hits and the time taken
was 440 secs in the production environment; maybe our

 coding is poor, but we are constantly improving the process].


 {O/S Linux Gentoo, RAM 1GB, Lucene 1.4.1, appserver = Tomcat 5, and
Blackdown Java 1.4.2 with args -XX:+UseParallelGC for

 garbage collection}


 To be one step ahead, we also have adjacent fields 'Vendor'
and 'Price', which we have to compare for

 Best/Poor/Least results. So we have to limit the hits
accordingly, since the Lucene API does not provide any way to

 inject this limiting facility *prior* to getting the hits.


 [Excuse me, Nader Henein, I am on the Lucene-Users forum, NOT the
Lucene-Developers forum,

  so we expect at least some help.]


With Warm Regards
Karthik




-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, December 13, 2004 6:39 PM
To: Lucene Users List
Subject: Re: HITCOLLECTOR+SCORE+DELIMMA



On Dec 13, 2004, at 6:58 AM, Karthik N S wrote:
 "Iterate over Hits" returns large hit counts, and iterating over Hits
 for
 scores consumes time,

 so how do I limit my search to scores between [X.xf and Y.yf] before getting the
 Hits?

Why do you need to do this *prior* to getting Hits?

You have yet to justify what you're asking.  I almost guarantee you
that navigating Hits in the way I said will be as fast as you need it
to be.

Erik






RE: HITCOLLECTOR+SCORE+DELIMA

2004-12-12 Thread Karthik N S

Hi Guys

Apologies.


So you say I have to build a Filter to collect all the scores between the 2
bounds [0.2f to 1.0f],


so the API for this would be

 Hits hit = search(Query query, Filter filtertoGetScore)


 But while writing the Filter, the score again depends on Hits: score =
hits.score(x);



 How do I solve this, or am I going about it the wrong way?


Any simple source for this will be greatly appreciated.  :)

Thx in advance



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, December 10, 2004 6:54 PM
To: Lucene Users List
Subject: Re: HITCOLLECTOR+SCORE+DELIMA


On Dec 10, 2004, at 7:39 AM, Karthik N S wrote:
 I am still in delima on How to use the HitCollector for returning
 Hits hits
 between scores  0.2f to 1.0f ,

 There is not a simple example for the same, yet lot's of talk on usage
 for
 the same on the form.

Unfortunately there isn't a clean way to stop a HitCollector - it will
simply collect all hits.

Also, scores are _not_ normalized when passed to a HitCollector, so you
may get scores > 1.0.  Hits, however, does normalize and you're
guaranteed that scores will be <= 1.0.  Hits are in descending score
order, so you may just want to use Hits and filter based on the score
provided by hits.score(i).

Erik
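Erik's point about normalization can be made concrete. To my recollection of the 1.4 Hits source, when the top raw score exceeds 1.0 every score is divided by that top score; otherwise scores are left untouched. A HitCollector sees the raw values. Treat that rule as an assumption to check against the source of your version. HitsNormalizer is a hypothetical class.

```java
// Hypothetical model of the score normalization Hits applies (as recalled
// from the 1.4 source; verify against your version). A HitCollector sees
// the raw scores instead.
class HitsNormalizer {
    // rawDescending must be sorted highest first. The input is not modified.
    static float[] normalize(float[] rawDescending) {
        float[] out = rawDescending.clone();
        if (out.length > 0 && out[0] > 1.0f) {
            float norm = 1.0f / out[0];
            for (int i = 0; i < out.length; i++) {
                out[i] *= norm;
            }
        }
        return out;
    }

    // Counts scores inside the inclusive window [min, max].
    static int countInWindow(float[] scores, float min, float max) {
        int n = 0;
        for (float s : scores) {
            if (s >= min && s <= max) {
                n++;
            }
        }
        return n;
    }
}
```

Consider raw scores of 4.0, 2.0 and 0.4. Only 0.4 falls in a raw window of 0.2 to 1.0. After normalization the scores become 1.0, 0.5 and 0.1, and two of them fall in that window. This is why a fixed threshold behaves differently in a HitCollector than in Hits.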







HITCOLLECTOR+SCORE+DELIMA

2004-12-10 Thread Karthik N S

Hi guys

Apologies.



I am still in a dilemma on how to use the HitCollector for returning Hits
with scores between 0.2f and 1.0f.

There is no simple example for this, yet lots of talk about its usage
on the forum.

Please, somebody spare a bit of code (your intelligence) on this forum.





Thx in advance
Karthik



  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







SEARCH +HITS+LIMIT

2004-12-08 Thread Karthik N S
Hi Guys

Apologies...



One question for the forum [especially Erik]:


1) I have a MERGED index with 100,000 files indexed into it (Content is
one of the fields, of type 'Text').

2) A search for a simple word, Camera, returns 6000 hits.

3) Since the search runs in a webapp, a simple JSP is used to
display the content.


Question

How do I display the contents of the hits incrementally?

[Each time, hit the merged index again with an incremented X value.]
This would avoid the Out of Memory problem caused by prefetching all the hits in
one go.

Ex:

Total hits 6000

1st page - hits returned (1 to 25)
2nd page - hits returned (26 to 50)
.
.
.
.

Nth page - hits returned (5976 to 6000)

Hint: this is similar to the SQL query  SELECT * FROM LUCENE LIMIT 10, 5



  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]
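The page arithmetic itself is simple. The sketch assumes 25 hits per page, as in the example above, and 1-based page numbers. HitPager is a hypothetical helper.

The usual pattern is to call hits.doc(i) only for i in [start, end) of the requested page. As far as I know, Hits loads documents on demand rather than all at once, so this keeps memory use bounded by the page size.

```java
// Hypothetical helper: page-window arithmetic for showing hits page by page.
class HitPager {
    // Zero-based index of the first hit on a 1-based page.
    static int start(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    // Exclusive end index, clipped to the total number of hits.
    static int end(int page, int pageSize, int totalHits) {
        return Math.min(start(page, pageSize) + pageSize, totalHits);
    }

    // Number of pages needed, rounding up for a partial last page.
    static int pageCount(int pageSize, int totalHits) {
        return (totalHits + pageSize - 1) / pageSize;
    }
}
```

For 6000 hits there are 240 pages. Page 240 covers indexes 5975 to 5999, which are hits 5976 to 6000 when counted from 1.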







RE: LUCENE + 1.4.2

2004-12-06 Thread Karthik N S
Hi
Erik



Apologies...


So this means that the issues in 1.4.1 and 1.4.2 are fixed in 1.4.3 as of
now?

1) So you say we can safely move our code under development
from 1.4.1 to 1.4.3?

2) Do we need to reindex everything we indexed with 1.4.1, or can we just
replace the jar with the 1.4.3 one?

3) Can we also have an announcement on the forum from time to time,
whenever new final versions are released?



Thx in advance.
Karthik


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, December 06, 2004 3:39 PM
To: Lucene Users List
Subject: Re: LUCENE + 1.4.2



On Dec 6, 2004, at 1:22 AM, Karthik N S wrote:
 I am not able to find the FINAL Lucene 1.4.2 source anywhere on
 http://jakarta.apache.org/lucene/docs/index.html

 Can somebody please reply to the forum with the URL?

Actually Lucene 1.4.3 is now available and I recommend you use it
instead, through the official Jakarta binary downloads.  We're still
tidying up some loose ends on it before officially announcing it, but
you can find it here:

http://www.apache.org/dist/jakarta/lucene/binaries/

We did not release 1.4.1 or 1.4.2 properly, so those binaries are not
available there currently.  Whether we retroactively put them there is
still undecided.

Erik







LUCENE + 1.4.2

2004-12-05 Thread Karthik N S
Hi Guys.



Apologies...



I am not able to find the FINAL Lucene 1.4.2 source anywhere on
http://jakarta.apache.org/lucene/docs/index.html

Can somebody please reply to the forum with the URL?




  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







RE: UNIQUE FIELD NAMES + SEARCH

2004-12-02 Thread Karthik N S

Hi Erik


Apologies...


 Thx, that source worked perfectly.

 Wow, that really cleared a huge boulder for me.

 .. :|

with regards
Karthik 

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 02, 2004 4:23 PM
To: Lucene Users List
Subject: Re: UNIQUE FIELD NAMES + SEARCH


On Dec 2, 2004, at 2:13 AM, Karthik N S wrote:
 In my index, I have a Keyword field 'FILE_NAME'. It captures
 UNIQUE
 FOLDER NAMES [starting with B1, B2, B3...] during the indexing process.

 Can somebody please tell me how to display ALL the FOLDER NAMES from
 the
 field 'FILE_NAME' without any search word?

I guess if you keep asking, someone will eventually answer :)

Here's an example I use to get all categories from the Lucene index 
that drives my blog at http://www.blogscene.org/erik

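 Set categories = new TreeSet();

 IndexReader reader = IndexReader.open(indexDir);
 try {
     // Position the enumeration at the first term of the "category" field.
     TermEnum terms = reader.terms(new Term("category", ""));
     try {
         // Terms are sorted by field, then text: stop at the first term of
         // another field, or at the end of the index (term() == null).
         Term term = terms.term();
         while (term != null && "category".equals(term.field())) {
             categories.add(term.text());

             if (!terms.next()) {
                 break;
             }
             term = terms.term();
         }
     } finally {
         terms.close();
     }
 } finally {
     reader.close();
 }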

IndexReader is enumerating all the terms in the category field 
(you'll use your filename field name instead).
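Erik's loop relies on the term dictionary being sorted by field and then by text. Seeking to (field, "") therefore lands on the field's first term, and the walk can stop at the first term of a different field.

The JDK-only sketch below models that walk with a TreeSet. TermDictionaryModel and its NUL-separated keys are assumptions made for illustration; they assume no field name or value contains a NUL character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of a term dictionary sorted by (field, text).
class TermDictionaryModel {
    // Keys are field + NUL + text. Assuming neither part contains NUL,
    // sorting these strings orders entries by field first, then by text.
    private final TreeSet<String> terms = new TreeSet<String>();

    void add(String field, String text) {
        terms.add(field + '\0' + text);
    }

    // Like reader.terms(new Term(field, "")) followed by the next() loop.
    List<String> allValues(String field) {
        String prefix = field + '\0';
        List<String> values = new ArrayList<String>();
        for (String entry : terms.tailSet(prefix)) {   // seek to (field, "")
            if (!entry.startsWith(prefix)) {
                break;                                 // reached another field
            }
            values.add(entry.substring(prefix.length()));
        }
        return values;
    }
}
```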

 [Can I use 'B*' to search exclusively on that field?]

Sure, but that would require that you walk every document returned from 
the search and pull its filename field.  This would be vastly slower 
than the above code that goes directly to the terms.

My apologies for not replying sooner on this.

Erik







RE: GETVALUES +SEARCH

2004-12-01 Thread Karthik N S
Hi
  Erik

Apologies.


  We create an ArrayList object, load all the hit values into it and
return
  it for display on a servlet. On the servlet we track the
server-side ArrayList
  for the required number of displays.

 [At any time we have to have all the hit values loaded into the ArrayList;
we cannot compromise on this.]


  We observed that doc.get() was not continuous for a hit count of 4000 and
was coming
  in batches.


 So any new API features will definitely help us.


With regards
Karthik


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 01, 2004 4:04 PM
To: Lucene Users List
Subject: Re: GETVALUES +SEARCH


On Dec 1, 2004, at 12:41 AM, Karthik N S wrote:
Is there any API in Lucene which can retrieve all the searched
 values in a
 single fetch

into some sort of an 'Array' WITHOUT using the looping
 process below? [This would make

the search and display faster.]

  for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      String path = doc.get("path");
      // ...
  }

Are you really showing *all* results at one time?  Or just the first
several?  Iterating over all hits and retrieving each Document is often
unwise and generally unnecessary if only the first 20 or so are shown
at first.

I don't know of a simpler way to get all the path values in your
example.  Perhaps a HitCollector is more to your liking?  Though it
probably would not speed anything up for you.

Erik







UNIQUE FIELD NAMES + SEARCH

2004-12-01 Thread Karthik N S
Hi Guys
Apologies



In my index, I have a Keyword field 'FILE_NAME'. It captures UNIQUE
FOLDER NAMES [starting with B1, B2, B3...] during the indexing process.

Can somebody please tell me how to display ALL the FOLDER NAMES from the
field 'FILE_NAME' without any search word?


[Can I use 'B*' to search exclusively on that field?]



Thx in Advance






  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







SEARCH CRITERIA

2004-11-30 Thread Karthik N S

Hi Guys

Apologies.


On Yahoo and AltaVista, searching for a word like 'kid' returns the
results along with

suggestions like the following:


   Also try: kid rock, kid games, star wars kid, karate kid   More...



  How can I obtain similar search suggestions using Lucene?


Thx in advance


Warm regards
Karthik






GETVALUES +SEARCH

2004-11-30 Thread Karthik N S

Hi Guys


Apologies.




In the search API [class org.apache.lucene.document.Document],

will 'public final String[] getValues(String name)' return
all the docs without looping through them?

Please explain with an example.



Thx in advance



  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







RE: GETVALUES +SEARCH

2004-11-30 Thread Karthik N S
Hi Guys


Apologies...



   Is there any API in Lucene which can retrieve all the searched values in a
single fetch

   into some sort of an 'Array' WITHOUT using the looping
process below? [This would make

   the search and display faster.]

 for (int i = 0; i < hits.length(); i++) {
     Document doc = hits.doc(i);
     String path = doc.get("path");
     // ...
 }



Thx in Advance
Karthik


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 30, 2004 8:06 PM
To: Lucene Users List
Subject: Re: GETVALUES +SEARCH



On Nov 30, 2004, at 7:10 AM, Karthik N S wrote:
 In the search API [class
 org.apache.lucene.document.Document],

 will 'public final String[] getValues(String name)' return
 all the docs without looping through them?

getValues(fieldName) returns a String[] of the values of the field.
It's similar to get(fieldName).  If you index a field multiple
times:

doc.add(Field.Keyword("keyword", "one"));
doc.add(Field.Keyword("keyword", "two"));

get("keyword") will return "one", but getValues("keyword") will
return a String[] {"one", "two"}.

If you want to retrieve all documents, use IndexReader's various API
methods.

Erik
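Erik's description can be modelled without Lucene. MultiValuedDoc is a hypothetical stand-in for Document: get(name) returns the first value added, and getValues(name) returns all of them in order.

Both methods work on the fields of one document. Collecting a field from every hit therefore still means looping over the hits. Returning an empty array for a missing field is a choice made for this sketch; check your version's Javadoc for what Lucene returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Document holding a multi-valued field.
class MultiValuedDoc {
    private final List<String> names = new ArrayList<String>();
    private final List<String> values = new ArrayList<String>();

    void add(String name, String value) {
        names.add(name);
        values.add(value);
    }

    // First value added under this name, or null if there is none.
    String get(String name) {
        int i = names.indexOf(name);
        return i < 0 ? null : values.get(i);
    }

    // All values under this name, in the order they were added
    // (empty array when the field is absent, a choice made for this sketch).
    String[] getValues(String name) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equals(name)) {
                out.add(values.get(i));
            }
        }
        return out.toArray(new String[out.size()]);
    }
}
```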







UNIQUE FILE SEARCH

2004-11-28 Thread Karthik N S


Hi Guys


Apologies.



I have an index in which one of the fields is of field type 'Keyword'.

To this field I add UNIQUE file names.



On search, how can I display all the file names without any search keyword?



Thx in Advance.




  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







MERGERINDEX + SOLUTION

2004-11-23 Thread Karthik N S

Hi Guys

Apologies



I have a MERGERINDEX [merged from 1000 subindexes].


The question is:

Does somebody have any solution for repairing the merged index [in
case of corruption]?

If so, please let the forum know about it, so developers like us can use
it.


Thx in Advance





  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







X-Thread Message

2004-11-18 Thread Karthik N S

Hi  Guys


Apologies



Does anybody have any improved suggestions for the thread

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]
he.orgmsgId=1992830


I am still in a dilemma.



  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]






RE: COUNT SUBINDEX [IN MERGERINDEX]

2004-11-17 Thread Karthik N S
Hi Guys


Apologies..

I am still confused.. ;(


Let me make the question simpler.


   When searching an index without any search word, I would like to
count the total number of documents present in it.

   [I only have fields of type 'Field.Keyword', which store the unique
filename.]

   Will IndexReader.termDocs(term) give me the count for this?
   If so, how do I use it, please?

  Thx in advance.
Karthik



-Original Message-
From: Paul Elschot [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 17, 2004 2:02 PM
To: [EMAIL PROTECTED]
Subject: Re: COUNT SUBINDEX [IN MERGERINDEX]


On Wednesday 17 November 2004 07:10, Karthik N S wrote:
 Hi guy's


 Apologies.


   So a merged index is again a single index [an addition of subindexes...].

  In that case, if one of the fields is of type 'Field.Keyword'
 which is unique across the subindexes [before merging],

  and I want to count this unique field in a merged index [after it has
 been
 merged], how do I do this, please?

IndexReader.numDocs() will give the number of docs in an index.

Lucene has no direct support for unique fields. After merging, if the
same unique field value occurs in both source indexes, the merged
index will contain two documents with that value.
In case one wants to merge into unique field values, the non unique
values in one of the source indexes need to be deleted before merging.

See IndexReader.termDocs(term) on how to get the document numbers
for (unique) terms via a TermDocs, and IndexReader.delete(docNum)
for deleting docs.

Regards,
Paul.
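Paul's point can be illustrated with a JDK-only model in which each source index is a map from document number to the value of its unique key field. Summing the map sizes mirrors adding up numDocs(). The deletion step plays the role of IndexReader.termDocs(term) plus IndexReader.delete(docNum) before merging. MergeModel is a hypothetical class and does not touch a real index.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: merging never de-duplicates a "unique" key field.
class MergeModel {
    // Like summing numDocs() over the source indexes: duplicates are kept.
    static int mergedNumDocs(List<Map<Integer, String>> indexes) {
        int total = 0;
        for (Map<Integer, String> index : indexes) {
            total += index.size();
        }
        return total;
    }

    // Before merging, delete from `second` every doc whose key also occurs
    // in `first`. Returns how many docs were deleted.
    static int deleteDuplicates(Map<Integer, String> first, Map<Integer, String> second) {
        Set<String> seen = new HashSet<String>(first.values());
        int deleted = 0;
        for (Iterator<Map.Entry<Integer, String>> it = second.entrySet().iterator(); it.hasNext();) {
            if (seen.contains(it.next().getValue())) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }
}
```

Take two three-document sub-indexes that both contain filename3. A plain merge reports 6 documents. If the duplicate is deleted from the second index first, the merged count is 5.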







RE: COUNT SUBINDEX [IN MERGERINDEX]

2004-11-16 Thread Karthik N S
Hi guy's


Apologies.


  So a merged index is again a single index [an addition of subindexes...].

 In that case, if one of the fields is of type 'Field.Keyword'
which is unique across the subindexes [before merging],

 and I want to count this unique field in a merged index [after it has been
merged], how do I do this, please?

  Ex
  SubIndex1 = filename1, filename2, filename3

  SubIndex2 = filename4, filename5, filename6

 MergerIndex1 = filename1, filename2, filename3, filename4, filename5, filename6

[From MergerIndex] Count = 6


Something like the above.



Thx in Advance




-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 17, 2004 10:30 AM
To: Lucene Users List
Subject: Re: COUNT SUBINDEX [IN MERGERINDEX]


Once the index is merged there is only 1 index - there are no
subindices.

Otis

--- Karthik N S [EMAIL PROTECTED] wrote:



 Hi Guys,


 Apologies .



 Can somebody tell me which API to use to count the number of
 subindexes
 in a MERGED index?



 Thx in Advance





   WITH WARM REGARDS
   HAVE A NICE DAY
   [ N.S.KARTHIK]













RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Karthik N S
Hi Guys

Apologies.


  I am NOT using the sorting code

  hits = multiSearcher.search(query, new Sort(new SortField("filename",
SortField.STRING)));

 but multiSearcher.search(query)

 in the core files setup, and I am still getting the error.



 More advice required.


Karthik



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 10, 2004 12:46 PM
To: Lucene Users List
Subject: Re: Lucene1.4.1 + OutOf Memory


There is a memory leak in the sorting code of Lucene 1.4.1.
1.4.2 has the fix!

--- Karthik N S [EMAIL PROTECTED] wrote:


 Hi
 Guys

 Apologies..



 History

 1st type:  4 subindexes + MultiSearcher + search on
 Content field
 only, for 2000 hits


=
 Exception [Too many files open]





 2nd type:  40 merged indexes [1000 subindexes each] +
 MultiSearcher
 /ParallelSearcher + search on Content field only, for 2
 hits


=
 Exception [OutOfMemory]



 System config [same for both types]

 AMD processor [high-end, single]
 RAM 1GB
 O/S Linux (Gentoo)
 Appserver Tomcat 5.05
 JDK [IBM Blackdown-1.4.1-01 (== JDK 1.4.1)]

 Index contains 15 fields
 Search
 done only on 1 field
 Retrieve 11 corresponding fields
 3 fields are for debug details


 Switched from the 1st type to the 2nd type

 Can somebody suggest why this is happening?

 Thx in advance




   WITH WARM REGARDS
   HAVE A NICE DAY
   [ N.S.KARTHIK]








RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Karthik N S
Hi Guys,


Apologies.


  That's why somebody on the forum asked me to switch to:


 40 merged indexes [1000 subindexes each] + MultiSearcher /
ParallelMultiSearcher + search on the Content field only, for 2 hits

  The problem of too many open files was solved, since now there were only 40
MergerIndexes [1 MergerIndex has 1000 subindexes]

  instead of 4 subindexes.

 Now I am getting an Out of Memory exception.


  Any idea on how to solve this problem?



Thx in Advance






-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 10, 2004 2:16 PM
To: Lucene Users List
Subject: RE: Lucene1.4.1 + OutOf Memory



"Too many open files" means either:
- the searcher object is not closed after query execution, or
- too few file handles are available

Regards
J.
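J.'s first point can be shown with a stdlib-only model: each open searcher holds file handles until close() is called, so opening one per query without closing it leaks handles. FakeSearcher and SearchPatterns below are hypothetical stand-ins, not Lucene classes. Lucene 1.4's IndexSearcher predates java.lang.AutoCloseable, so with the real class the equivalent is calling close() in a finally block; opening one searcher and sharing it across queries avoids the repeated opens altogether.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a searcher that holds a file handle while open. */
class FakeSearcher implements AutoCloseable {
    static final AtomicInteger OPEN_HANDLES = new AtomicInteger();

    FakeSearcher() {
        OPEN_HANDLES.incrementAndGet(); // like opening the index files
    }

    int search(String query) {
        return query.length(); // placeholder result
    }

    @Override
    public void close() {
        OPEN_HANDLES.decrementAndGet(); // like IndexSearcher.close()
    }
}

class SearchPatterns {
    /** Leaky: opens a new searcher per query and never closes it. */
    static int leakySearch(String query) {
        FakeSearcher searcher = new FakeSearcher();
        return searcher.search(query);
    }

    /** Safe: the searcher is closed even if search() throws. */
    static int closingSearch(String query) {
        try (FakeSearcher searcher = new FakeSearcher()) {
            return searcher.search(query);
        }
    }
}
```

After five calls to closingSearch the handle count is back where it started; after five calls to leakySearch it has grown by five, which is how a busy server eventually hits the open-files limit.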






RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Karthik N S
Hi Guys,


Apologies.


 Yes, Erik.

  Since the day I switched from Lucene 1.3.1 to Lucene 1.4.1 we have been using
the compound file format:


writer.setUseCompoundFile(true);


Some more advice, please.


Thx in advance

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 10, 2004 3:05 PM
To: Lucene Users List
Subject: Re: Lucene1.4.1 + OutOf Memory


On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:

 Hi
 Guys

 Apologies..

No need to apologize for asking questions.

 History

 Ist type :  4  subindexes   +  MultiSearcher  + Search on Content
 Field

You've got 40,000 indexes aggregated under a MultiSearcher and you're
wondering why you're running out of memory?!  :O

 Exception  [ Too many Files Open ]

Are you using the compound file format?

Erik





RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Karthik N S
 files


 not being deleted after 1.4.1. Not sure if that could cause the
 problems
 you're experiencing.

 Regards
 Daniel






 Well, it seems not to be files, it looks more like those
 SegmentTermEnum
 objects accumulating in memory.
 I've seen some discussion on these objects in the
 developer-newsgroup
 that had taken place some time ago.
 I am afraid this is some kind of runaway caching I have to deal with.
 Maybe not  correctly addressed in this newsgroup, after all...

 Anyway: any idea if there is an API command to re-init caches?

 Thanks,

 Daniel






LUCENE + DATA RETRIVAL

2004-11-09 Thread Karthik N S

Hi guys,


Apologies...


Has anyone on the forum attempted to retrieve data from, and index,
Macromedia FLASH based files?
If there is an example, please distribute it; it may be useful for
other developers.


Thx in advance






  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







Lucene1.4.1 + OutOf Memory

2004-11-09 Thread Karthik N S

Hi
Guys

Apologies..



History

1st type: 4 subindexes + MultiSearcher + search on the Content field
only, for 2000 hits

  => Exception [ Too many open files ]


2nd type: 40 merged indexes [1000 subindexes each] + MultiSearcher
/ParallelMultiSearcher + search on the Content field only, for 2 hits

  => Exception [ Out of memory ]


System config [same for both types]

AMD processor [high-end, single]
RAM 1GB
O/s Linux (Gentoo)
Appserver Tomcat 5.05
JDK [ IBM Blackdown-1.4.1-01 (== JDK 1.4.1) ]

The index contains 15 fields:
search is done on only 1 field,
11 corresponding fields are retrieved,
3 fields are for debug details.


Switched from the 1st type to the 2nd type.

Can somebody suggest why this is happening?

Thx in advance




  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







UPDATION+MERGERINDEX

2004-11-07 Thread Karthik N S

Hi Guys


Apologies.


a) 

1) SEARCH FOR THE SUBINDEX IN AN OPTIMISED MERGED INDEX
2) DELETE THE FOUND SUBINDEX FROM THE OPTIMISED MERGERINDEX
3) OPTIMISE THE MERGERINDEX
4) ADD A NEW VERSION OF THE SUBINDEX TO THE MERGERINDEX
5) OPTIMISE THE MERGERINDEX



b)

1) SEARCH FOR THE SUBINDEX IN AN OPTIMISED MERGED INDEX
2) DELETE THE FOUND SUBINDEX FROM THE OPTIMISED MERGERINDEX
3) ADD A NEW VERSION OF THE SUBINDEX TO THE MERGERINDEX
4) OPTIMISE THE MERGERINDEX


 a OR b: WHICH IS THE BETTER CHOICE?



THX IN ADVANCE


  WITH WARM REGARDS 
  HAVE A NICE DAY 
  [ N.S.KARTHIK] 
 






INDEXREADER + DELETE + LUCENE1.4.1

2004-11-04 Thread Karthik N S



Hi Guys,

Apologies.



There seems to be an unresolved bug [or I may be doing something
wrong] in IndexReader.delete(int docNum).

Here is the code:

indexSearcher = null;
indexDirectory = null;
indexReader = null;
indexDirectory = FSDirectory.getDirectory("/root/MERGEDINDEX/MERGER_1", false);
indexReader = IndexReader.open(indexDirectory);

IndexReader.unlock(indexDirectory);
indexSearcher = new IndexSearcher(indexReader);
query = new TermQuery(new Term(fieldName, FiledValue));
hits = indexSearcher.search(query);


if (hits.length() > 0) {

    for (int k = 0; k < hits.length(); k++) {
        PRINTDBG_.append("QUERY : " + query.toString() + "\n" +
                         "FIELD NAME : " + fieldName + "\n" +
                         "FIELD VALUE: " + FiledValue + "\n" +
                         "TOTAL HITS : " + hits.length() + "\n" +
                         "DELETING : " + k);

        // k is the hit's position in Hits; delete() takes a document number
        indexReader.delete(k);
    }
}

indexReader.close();
indexSearcher.close();
indexDirectory.close();

System.out.println("Debugger : " + PRINTDBG_);
indexReader = null;
indexSearcher = null;
indexDirectory = null;

// optimization
indexDirectory = FSDirectory.getDirectory(pathMergeIndex, false);
IndexWriter writer = new IndexWriter(indexDirectory, analyzer, false);
writer.mergeFactor = mergeFactorVal_;
writer.maxMergeDocs = maxMergeDocsVal_;
writer.optimize();
writer.close();

indexDirectory = null;
writer = null;

In spite of using a new IndexReader for every deletion of documents and
every optimization, 'indexReader.delete(k)' does not seem to work.

Configuration history

a) 1 MergerIndex = 1000 subindexes [ fieldName = Keyword field type ]

b) O/s Windows

c) AMD processor

d) Lucene 1.4.1

e) JDK 1.4.2

Can somebody please suggest alternatives?



  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]
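One likely explanation for the code above: IndexReader.delete(int docNum) takes an internal document number, while the loop passes k, the position of the hit in Hits. In Lucene 1.4 the document number of the k-th hit is hits.id(k), and IndexReader.delete(Term) deletes every document containing a term in a single call, which suits a Keyword field. Because Hits fetches results lazily, collecting the ids before deleting is the more cautious order. The stdlib-only model below (DeleteByHit is hypothetical, not Lucene code) uses a BitSet of deleted document numbers to show how the two loops differ.

```java
import java.util.BitSet;

/**
 * Hypothetical model of deleting search hits: hitDocIds[k] plays the role of
 * hits.id(k), and the returned BitSet marks deleted document numbers.
 */
class DeleteByHit {
    /** Loop from the post: passes the hit position k as if it were a doc number. */
    static BitSet deleteByPosition(int[] hitDocIds, int maxDoc) {
        BitSet deleted = new BitSet(maxDoc);
        for (int k = 0; k < hitDocIds.length; k++) {
            deleted.set(k); // like indexReader.delete(k)
        }
        return deleted;
    }

    /** Corrected loop: ids are collected first, then each id is deleted. */
    static BitSet deleteByDocId(int[] hitDocIds, int maxDoc) {
        int[] ids = hitDocIds.clone(); // like copying hits.id(k) for every k up front
        BitSet deleted = new BitSet(maxDoc);
        for (int id : ids) {
            deleted.set(id); // like indexReader.delete(id)
        }
        return deleted;
    }
}
```

If the matching documents are numbers 3, 7 and 9, the position-based loop deletes documents 0, 1 and 2 instead, which would look exactly like "delete does not work" for the intended documents.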







FW: Searchable Solutions Please

2004-11-02 Thread Karthik N S
Hi

Guys,

Apologies.


I am a little confused about searching.


If the search word 'kid' is supposed to return kid, kid's,
kidoos and children:

   1) Do I need to use a combination of more than one analyzer? If
so, how?
   2) Is there an alternative modification to be made to the simple searcher
methods?




Thx in advance.



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 28, 2004 8:55 PM
To: [EMAIL PROTECTED]
Subject: RE: Searchable Solutions Please


A quick pointer..

What you want to look at is using a stemming implementation. Look, for
example, at the FAQ and docs related to the PorterStemFilter and writing
a custom analyzer
(http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q17).

There is a lot of information on this, but you'll need to use the same
analyzer for indexing and querying, and this would be more or less English only.

-George
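Following George's pointer, a Lucene 1.4 stemming analyzer is typically an Analyzer subclass whose tokenStream(String fieldName, Reader reader) returns new PorterStemFilter(new LowerCaseTokenizer(reader)), used for both indexing and querying. Stemming alone will not map "junior" or "children" to "kid"; that needs a separate synonym step. The stdlib-only sketch below is a toy suffix stripper plus a hand-made synonym table, not the Porter algorithm; ToyAnalyzer and its SYNONYMS entries are hypothetical. It only shows why running the same analysis on documents and on queries lets "kids watches" match "kid's" and "junior watches".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical toy analyzer; NOT Porter stemming, only an illustration. */
class ToyAnalyzer {
    // Hypothetical synonym table mapping words to one shared root
    private static final Map<String, String> SYNONYMS = new HashMap<>();

    static {
        SYNONYMS.put("children", "kid");
        SYNONYMS.put("junior", "kid");
    }

    /** Strips a possessive and a few plural endings, then applies synonyms. */
    static String stem(String word) {
        if (word.endsWith("'s")) {
            word = word.substring(0, word.length() - 2); // kid's -> kid
        }
        if (word.endsWith("ches") || word.endsWith("shes") || word.endsWith("xes")) {
            word = word.substring(0, word.length() - 2); // watches -> watch
        } else if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            word = word.substring(0, word.length() - 1); // kids -> kid
        }
        String root = SYNONYMS.get(word);
        return root != null ? root : word;
    }

    /** Lower-cases, splits on whitespace and stems each token. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(stem(raw));
            }
        }
        return tokens;
    }
}
```

Here "Kids Watches", "junior watches" and "kid watch" all analyze to [kid, watch], so a query and a document meet on the same terms only because both went through the same analyzer.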

 -Original Message-
 From: Karthik N S [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 28, 2004 1:47 AM
 To: LUCENE
 Subject: Searchable Solutions Please


 Hi Guys


 Apologies


 Using Lucene search, how can hits for the following be
 acquired?

 Search word = 'kids watches'
 Returned hits should include: kid's, kid watch, junior watches


 Solutions, please.


 Thx in advance






   WITH WARM REGARDS
   HAVE A NICE DAY
   [ N.S.KARTHIK]







LUCENE INDEX STATISTICS

2004-10-28 Thread Karthik N S


Hi Guys

Apologies.


Can somebody provide approximate statistics for the following factors in the
development and deployment of Lucene? [It may be useful for professional
developers.]

a) Index creation

1) X [say 100 million] documents of Y [kilobytes] each,
with Z fields
 Hardware requirements [ RAM / OS / processor / hard disk space /
other specific details ]
 Software [ JDK version / Lucene version / appserver version ]


 2) X [say 100 million] documents, to create merged indexes
  Hardware requirements [ RAM / OS / processor / hard disk space /
other specific details ]
  Software [ JDK version / Lucene version / appserver version ]


b) Searching on indexes [ 2 persons searching per second ]

1) X [say 100 million] documents of Y [kilobytes] each,
with Z fields
Hardware requirements [ RAM / OS / processor / hard disk space /
other specific details ]
Software [ JDK version / Lucene version / appserver version ]


 2) X [say 100 million] merged indexes
 Hardware requirements [ RAM / OS / processor / hard disk space /
other specific details ]
 Software [ JDK version / Lucene version / appserver version ]



Thx in Advance
Karthik


  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







RE: Range Query

2004-10-21 Thread Karthik N S
Hi Guys

Apologies

Please correct me if I am wrong:

with reference to
http://issues.apache.org/eyebrowse/ReadMsg?listId=30&msgNo=7103

I will have to re-index all my 1 million subindexes with the 'Price'
field padded to a standard number of '0's,

so that I can use the modified code while searching to find the range.


[ Is there any other way to handle this only during the search process? ]


 Please, some more advice. :(


Thx in advance.





-Original Message-
From: Chuck Williams [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 20, 2004 8:06 PM
To: Lucene Users List
Subject: RE: Range Query


Karthik,

It is all spelled out in a Lucene HowTo here:
http://wiki.apache.org/jakarta-lucene/SearchNumericalFields

Have fun with it,

Chuck




Analysis Re visited

2004-10-21 Thread Karthik N S

Hi

Guys

Apologies...

  Can somebody tell me what I have been doing wrong with the Lucene
basics? :( [6 months 15 days]

 Using Lucene 1.4.1, O/s Win/Linux, RAM 1GB


 I used a modified version of StandardAnalyzer.java [ called
GrammerAnalyzer.java ] that adds the symbols '$,@,#,'

 and also added this analyzer to AnalysisDemo.java, available
from the web site


http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html?page=last#thread

1) When analyzing '$100.50', AnalysisDemo returned '[100.00]' for the
analyzer used.


2) So I used the same analyzer for both indexing and searching.


3) After hacking Luke's source [adding the same GrammerAnalyzer],
  available from http://www.getopt.org/luke/,

 I looked at the document containing the same value and was surprised
to find '$100.50' instead of 100.50.


Please, somebody advise me.

Thx in advance


  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







RE: Range Query

2004-10-20 Thread Karthik N S
Hi

   Jonathan


  "When searching I also pad the query term" ???

   When exactly are you handling this [ during the indexing process as well, or
only during the search process ]?

   Can you please be specific?

   [ If time permits, can you please send me sample code for
the same? ]

   . :)


 Thx in advance


-Original Message-
From: Jonathan Hager [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 20, 2004 3:31 AM
To: Lucene Users List
Subject: Re: Range Query


That is exactly right.  It is searching the ASCII.  To solve it I pad
my price using a method like this:

  /**
   * Pads the price so that all prices are the same number of characters and
   * can be compared lexicographically.
   * @param price the price to pad, or null
   * @return the zero-padded price, or null if price is null
   */
  public static String formatPriceAsString(Double price) {
    if (price == null) {
      return null;
    }
    return PRICE_FORMATTER.format(price.doubleValue());
  }

where PRICE_FORMATTER contains enough digits for your largest number.

  private static final DecimalFormat PRICE_FORMATTER =
      new DecimalFormat("000.00");

When searching I also pad the query term.  I looked into hooking into
QueryParser, but since the lower/upper prices for my application are
different inputs, I chose to handle them without hooking into the
QueryParser.

Jonathan
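Jonathan's approach can be checked with a small stdlib-only sketch: unpadded price strings compare character by character, so "100" falls inside [10.00 TO 50.60], while zero-padded strings sort in numeric order. The same pad() must be applied when indexing the ItemPrice field and when building the query bounds, which is why existing documents have to be re-indexed. PricePadding is hypothetical; the six-digit integer width (wider than the "000.00" pattern above, which keeps a fixed width only below 1000) and the Locale.US symbols (to guarantee a '.' separator) are assumptions, and negative prices are not handled.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical helper: zero-pads prices so string order matches numeric order. */
class PricePadding {
    // Assumed width: six integer digits, enough for prices up to 999999.99
    private static final DecimalFormat PRICE_FORMATTER =
        new DecimalFormat("000000.00", DecimalFormatSymbols.getInstance(Locale.US));

    /** Used both when indexing ItemPrice and when building query bounds. */
    static String pad(double price) {
        synchronized (PRICE_FORMATTER) { // DecimalFormat is not thread-safe
            return PRICE_FORMATTER.format(price);
        }
    }

    /** Character-by-character range test, the way a range over string terms compares. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    /** Builds a query-syntax range with both bounds padded like the indexed values. */
    static String rangeQuery(String field, double lower, double upper) {
        return field + ":[" + pad(lower) + " TO " + pad(upper) + "]";
    }
}
```

For example, rangeQuery("ItemPrice", 10.00, 50.60) yields ItemPrice:[000010.00 TO 000050.60], and a price of 100 indexed as 000100.00 no longer falls inside it.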


On Tue, 19 Oct 2004 12:35:06 +0530, Karthik N S
[EMAIL PROTECTED] wrote:

 Hi

 Guys

 Apologies.

 I have a text field 'ItemPrice', and I am using it to store prices
 as numbers such as 10, 25.25, 50.00.

 If I am supposed to find the range between 2 prices,

 ex -
  Contents:shoes +ItemPrice:[10.00 TO 50.60]

 I get results other than the range that was queried [this may
 be due to the query comparing the ASCII values instead of numeric values].

 Am I missing something in the query syntax, or is this the wrong way
 to construct the query?

 Please, somebody advise me ASAP. :(

 Thx in advance

   WITH WARM REGARDS
   HAVE A NICE DAY
   [ N.S.KARTHIK]




RE: Downloading Full Copies of Web Pages

2004-10-20 Thread Karthik N S
Hi


Try

Nutch [ http://www.nutch.org/docs/en/about.html ]; it uses
Lucene underneath :)





-Original Message-
From: Luciano Barbosa [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 20, 2004 3:06 AM
To: [EMAIL PROTECTED]
Subject: Downloading Full Copies of Web Pages


Hi folks,
I want to download full copies of web pages and store them locally, as
well as their hyperlink structure as local directories. I tried to use
Lucene, but I've realized that it doesn't have a crawler.
Does anyone know of software that does this?
Thanks,




TestRangeQuery.java

2004-10-20 Thread Karthik N S

Hi

Does anybody have trouble compiling TestRangeQuery.java in the Eclipse
3.0 IDE?

[
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/src/test/org/apache/lucene/search ]

There seems to be an error in:


doc.add(new Field("id", "id" + docCount, Field.Store.YES,
Field.Index.UN_TOKENIZED));
doc.add(new Field("content", content, Field.Store.NO,
Field.Index.TOKENIZED));



The compiler error, with Lucene 1.4.1 on Windows, is that
Field.Store.YES is not found.





Thx in Advance


  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







Range Query

2004-10-19 Thread Karthik N S

Hi

Guys

Apologies.



I have a text field 'ItemPrice', and I am using it to store prices
as numbers such as 10, 25.25, 50.00.

If I am supposed to find the range between 2 prices,

ex -
 Contents:shoes +ItemPrice:[10.00 TO 50.60]


I get results other than the range that was queried [this may be
due to the query comparing the ASCII values instead of numeric values].

Am I missing something in the query syntax, or is this the wrong way to
construct the query?

Please, somebody advise me ASAP. :(

Thx in advance




  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







Multi + Parallel

2004-10-14 Thread Karthik N S
Hi

Apologies..


  Can somebody give me approximate answers [ which is the better choice ]:

  a search of 10,000 subindexes using MultiSearcher

  or

 a search on one single merged index [ merged from 10,000 subindexes ]?


a) Subindexes: 10,000 ( future)

b) Fields to be searched = 4

c) Fields present in the index = 15

d) RAM = 1GB

 e) O/s Linux [ clustered environment ]

 f) Processor: AMD [probably high-end]

 g) Web server: Tomcat 5.0.x




  1) Which would be faster?

  2) If neither, what might be a probable solution?


Karthik




-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 13, 2004 3:53 PM
To: Lucene Users List
Subject: Re: Multi + Parallel 


On Oct 13, 2004, at 3:14 AM, Karthik N S wrote:
 I was curious to know the difference between ParallelMultiSearcher and
 MultiSearcher:

 1) Is the internal working of these the same or
 different?

They are different internally.  Externally they should return identical 
results and not appear different at all.

Internally, ParallelMultiSearcher searches each index in a separate 
thread (searches wait until all threads finish before returning).   In 
MultiSearcher, each index is searched serially.

You will not likely see a benefit to using ParallelMultiSearcher unless 
your environment is specialized to accommodate multi-threading 
(multiple CPU's, indexes on separate drives that can operate 
independently, etc).

 2) In terms of time, do these differ when searching the same number of
 fields / words?

 3) What are the features of each API?

There is no external difference to using either implementation.  
Benchmark searches using both and see what is best, but generally 
MultiSearcher will be better in most environments as it avoids the 
overhead of starting up and managing multiple threads.

Erik





Multi + Parallel

2004-10-13 Thread Karthik N S


Hi
 Guys

Apologies..


I was curious to know the difference between ParallelMultiSearcher and
MultiSearcher:

1) Is the internal working of these the same or different?

2) In terms of time, do these differ when searching the same number of fields
/ words?

3) What are the features of each API?


Thx in advance


  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







RE: Too many Open Files + lucene 1.4.1 + Linux O/s

2004-10-13 Thread Karthik N S
Hi


Apologies for the long wait.


   On my Linux system, ulimit -a reports:


core file size   (blocks, -c) 0
data seg size  (kbytes, -d) unlimited
file size(blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files(-n) 1024
pipe size  (512 bytes, -p) 8
stack size   (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes(-u) 1983
virtual memory (kbytes, -v) unlimited


The 'Too many open files' problem happens on every second search
that is done.

I think, as you say, open files (-n) 1024 should be
increased.


More advice will be gratefully accepted.

Thx in advance





-Original Message-
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]
Sent: Sunday, October 03, 2004 5:08 AM
To: Lucene Users List
Subject: Re: Too many Open Files + lucene 1.4.1 + Linux O/s


Karthik N S wrote:

Hi Luceners,


Apologies.


The other day I was trying to search using the luceneweb version
with Lucene1-4-1.zip on O/s = Linux, J2SDK version 1.4.2_03-b02,

with roughly 500 documents (715116 KB) indexed using
Lucene1.4-final.jar and writer.setUseCompoundFile(true);


Here are a couple of possibilities:
- the setUseCompoundFile(true) will only apply to indexes created (or
optimized) after the option is set.
  All pre-existing indexes will still be in the multi-file format.
- number of documents does not directly impact the number of files
needed by Lucene. If the index is
  really in a compound file format (see above), and is optimized, you
will need a fixed number of file handles.
  Even if the index is in a multi-file format, the number of files
needed depends on the number of indexed *fields* in the index (not
documents).
- do you get the error on the first and every search or only once in a
while? Perhaps when there are lots of
  concurrent users? Perhaps after you've done X searches?
- check your OS-level setting for the number of open files. This is
shell/system-dependent somewhat, but
   ulimit -a should get you started. The number of open files should
be large enough to allow for all files
   and sockets that your application needs to open. In a typical
server-side Java app setting this value should
   be around 8000. Defaults are much smaller, so unless you have changed
this, this may be the answer.
- look into lsof utility. It can display all file handles in use by a
given process. This is a good tool to
  troubleshoot too many open files issues.
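Dmitry's points multiply out quickly when one searcher spans many indexes. The arithmetic can be sketched as below; OpenFileEstimate is hypothetical, and the per-segment file counts passed to it are assumptions to replace with what ls or lsof shows for your own index directories. Roughly, a compound, optimized index contributes about one file per index, while an unoptimized multi-file index contributes several files per segment, growing with the number of indexed fields.

```java
/**
 * Hypothetical back-of-the-envelope estimate of file handles held by one
 * searcher over many indexes. All counts are caller-supplied assumptions.
 */
class OpenFileEstimate {
    /** Handles ~= indexes x segments per index x open files per segment. */
    static long estimateHandles(int indexes, int segmentsPerIndex, int filesPerSegment) {
        return (long) indexes * segmentsPerIndex * filesPerSegment;
    }

    /**
     * Multi-file segment: an assumed fixed number of files plus one per
     * indexed field (Dmitry notes the count depends on fields, not documents).
     */
    static int multiFilePerSegment(int fixedFiles, int indexedFields) {
        return fixedFiles + indexedFields;
    }

    /** True if the estimate plus other handles (sockets, jars, logs) fits under ulimit -n. */
    static boolean fitsUnder(long searcherHandles, long otherHandles, long ulimitN) {
        return searcherHandles + otherHandles <= ulimitN;
    }
}
```

With a limit of 1024, as in the ulimit output above, 40 optimized compound indexes (about 40 handles) fit comfortably, while a hypothetical 1000 unoptimized multi-file indexes with 3 segments each, 5 fixed files and 15 indexed fields come to 60,000 handles.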

Good luck.
Dmitry.





IndexHTML parser + Constructer

2004-10-01 Thread Karthik N S


Hi


Apologies .

Can somebody please tell me how to include a constructor within
'org.apache.lucene.demo.html.HtmlParser.java',
so that the constructor reads a String argument, strips the HTML
tags and returns the String without tags?
Currently 'org.apache.lucene.demo.html.HtmlParser.java' accepts the
full path of the file and then reads
the content to strip the tags.




Thx in Advance
Karthik
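If your copy of the demo parser has the JavaCC-generated HTMLParser(java.io.Reader) constructor, wrapping the String in a java.io.StringReader may be all that is needed; check the generated source first, since that constructor is an assumption here. Otherwise, a String can be stripped of tags without the demo parser at all. The regex-based sketch below is naive and hypothetical (TagStripper is not part of Lucene): it drops script and style blocks and any <...> tag, decodes only a few entities, and will mishandle comments containing '>' or badly malformed markup.

```java
import java.util.regex.Pattern;

/** Hypothetical, naive tag stripper working on a String rather than a file. */
class TagStripper {
    // Whole <script>...</script> and <style>...</style> blocks, case-insensitive
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Any remaining <...> tag
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String strip(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a few common entities; &amp; last so "&amp;lt;" becomes "&lt;"
        text = text.replace("&lt;", "<").replace("&gt;", ">")
                   .replace("&quot;", "\"").replace("&nbsp;", " ")
                   .replace("&amp;", "&");
        return SPACES.matcher(text).replaceAll(" ").trim();
    }
}
```

For example, strip("<b>Hi</b> there") returns "Hi there". Tags are removed before entities are decoded, so an escaped "&lt;b&gt;" in the text survives as a literal "<b>".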


-Original Message-
From: Daniel Naber [mailto:[EMAIL PROTECTED]
Sent: Saturday, September 25, 2004 12:47 AM
To: Lucene Users List
Subject: Re: demo IndexHTML parser breaks unicode?


On Friday 24 September 2004 19:58, Fred Toth wrote:

 I've got unicode in my source HTML. In particular, within meta tags,
 and it's getting broken by the indexer. Note that I'm not trying to
 query on any of this, just store and retrieve document titles with
 unicode characters.

Please try again with the code from CVS, Christoph Goller committed a fix
for this problem (at least I think it was this problem) 1-3 weeks ago.

Regards
 Daniel

--
http://www.danielnaber.de




Too many Open Files + lucene 1.4.1 + Linux O/s

2004-09-29 Thread Karthik N S


Hi Luceners,


Apologies.


The other day I was trying to search using the luceneweb version
with Lucene1-4-1.zip on O/s = Linux, J2SDK version 1.4.2_03-b02,

with roughly 500 documents (715116 KB) indexed using
Lucene1.4-final.jar and writer.setUseCompoundFile(true);

My intention was to search across all 500 documents using
MultiFieldQueryParser.


I have replaced 'QueryParser.parse(srchkey, fildtpe[i], analyzer)' with

   'MultiFieldQueryParser.parse(SEARCHKEYS, fildtpe, analyzer)'

and

hits = searcher.search(query) with hits = multiSearcher.search(query, new
Sort(new SortField("filename", SortField.STRING)));

I am getting the 'Too many open files' exception.

Can somebody help me with a solution?

 [I have also inserted the reference JSP file]

java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:828)
at org.apache.lucene.store.FSDirectory$1.obtain(FSDirectory.java:307)
at org.apache.lucene.store.Lock.obtain(Lock.java:53)
at org.apache.lucene.store.Lock$With.run(Lock.java:108)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:38)
at org.apache.jsp.results_jsp._jspService(results_jsp.java:130)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:210)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:295)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:241)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:256)
at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2415)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:171)
at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
at org.apache.coyote.tomcat4.CoyoteAdapter.service(CoyoteAdapter.java:223)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:594)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:392)
at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:565)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:619)
at java.lang.Thread.run(Thread.java:534)
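Reading the trace, one plausible contributor: `IndexSearcher.<init>` is called from `results_jsp._jspService`, so every request to the results page may open a fresh searcher (and, with a MultiSearcher, a set of index files per sub-index) that is not closed before the next request arrives. A common remedy is to open the searcher once and reuse it across requests; raising the per-process limit (`ulimit -n`) only postpones the problem if searchers leak. The sketch below shows the open-once pattern with plain JDK types so it compiles on its own. `SharedResource` is a hypothetical helper, not a Lucene class; in a webapp `T` would be the `IndexSearcher` or `MultiSearcher`, the opener would construct it (wrapping its `IOException`), and the closer would call its `close()`.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Lucene): opens an expensive resource once,
 * hands the same instance to every caller, and closes it on request.
 */
class SharedResource<T> {
    private final Supplier<T> opener;  // e.g. builds the IndexSearcher, wrapping its IOException
    private final Consumer<T> closer;  // e.g. calls searcher.close(), wrapping its IOException
    private T instance;
    private int openCount;

    SharedResource(Supplier<T> opener, Consumer<T> closer) {
        this.opener = opener;
        this.closer = closer;
    }

    /** Returns the shared instance, opening it only if none is open yet. */
    synchronized T get() {
        if (instance == null) {
            instance = opener.get();
            openCount++;
        }
        return instance;
    }

    /** Closes the shared instance, e.g. after the index is rebuilt; the next get() reopens it. */
    synchronized void close() {
        if (instance != null) {
            closer.accept(instance);
            instance = null;
        }
    }

    /** How many times the resource has actually been opened. */
    synchronized int openCount() {
        return openCount;
    }
}
```

Closing while another request is still searching is not handled here; a real application would close the old searcher only when the index has been rebuilt and no searches are in flight.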



  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]




MultiSearcher + Sort

2004-09-23 Thread Karthik N S


Guys


Apologies


Am I doing something wrong, or is there a bug in Lucene on Linux when using
MultiSearcher with Sort?

Could somebody please reply ASAP?

I tested both Lucene-1.4-final.jar and Lucene-1.4.1.jar:

hits = multiSearcher.search(query, sortField);


The exception is raised on Linux only [on Windows it works perfectly].


Query String  : (contents:gifts contents:articles) (path:gifts
path:articles) (modified:gifts modified:articles) (filename:gifts
filename:articles) (bookid:gifts bookid:articles) (creation:gifts
creation:articles) (chapNme:gifts chapNme:articles) (itmName:gifts
itmName:articles) (urltext:gifts urltext:articles) (itemCode:gifts
itemCode:articles) (itemPrice:gifts itemPrice:articles) (pageid:gifts
pageid:articles)

--- EXCEPTION START-
The Exception Raised file = SearchCreateArrayDataFiles.createArray1
Centralized Boolean Factor =false
  SYSTEM IS STOPPING COMPILATION
-- EXCEPTION END-

---

java.lang.RuntimeException: no terms in field bookid - cannot determine sort type
at org.apache.lucene.search.FieldCacheImpl.getAuto(FieldCacheImpl.java:319)
at org.apache.lucene.search.FieldSortedHitQueue.comparatorAuto(FieldSortedHitQueue.java:326)
at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:167)
at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:118)
at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:141)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
at org.apache.lucene.search.Hits.<init>(Hits.java:51)
at org.apache.lucene.search.Searcher.search(Searcher.java:41)

---
-


/*at com.controlnet.indexing.search.SearchCreateArrayDataFiles.createArray1(SearchCreateArrayDataFiles.java:263)
  *at com.controlnet.indexing.search.SearchCreateArrayDataFiles.main(SearchCreateArrayDataFiles.java:308)
  */
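The message comes from `FieldCacheImpl.getAuto`: with an AUTO sort type, Lucene 1.4 looks at the first indexed term of the sort field to guess whether to sort it as int, float or string, and throws when there is no term to look at (here for `bookid`). If the Linux and Windows indexes were built from different data, that alone could explain why only one platform fails. Naming the type explicitly, as the earlier post already does with `new SortField(filename, SortField.STRING)`, avoids the guess. The sketch below is a simplified, JDK-only model of that detection rule; `SortTypeGuess` and `guess` are hypothetical names, and the real code also builds the field cache.

```java
import java.util.List;

/** Simplified, hypothetical model of Lucene 1.4's AUTO sort-type detection (not the real class). */
class SortTypeGuess {
    enum Kind { INT, FLOAT, STRING }

    /**
     * Guesses how to sort a field from its first indexed term, trying the types
     * in the same order: int, then float, then string.
     */
    static Kind guess(String field, List<String> sortedTerms) {
        if (sortedTerms.isEmpty()) {
            // Same situation as the RuntimeException in the trace: nothing to inspect.
            throw new RuntimeException("no terms in field " + field + " - cannot determine sort type");
        }
        String first = sortedTerms.get(0).trim();
        try {
            Integer.parseInt(first);
            return Kind.INT;
        } catch (NumberFormatException notAnInt) {
            try {
                Float.parseFloat(first);
                return Kind.FLOAT;
            } catch (NumberFormatException notAFloat) {
                return Kind.STRING;
            }
        }
    }
}
```

In the application itself this corresponds to building the `Sort` from a `SortField` with an explicit type, e.g. `new SortField("bookid", SortField.STRING)` (or `SortField.INT` if the ids are numeric), so no term has to be inspected.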




  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







displaying 'pages' of search results...

2004-09-22 Thread Karthik N S
Hi,

Could you share the searcher.search(query, hitCollector) code [the lightweight
paging API] on the forum? Somebody like me may need it.


    ; )

Karthik

-Original Message-
From: Praveen Peddi [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 22, 2004 1:24 AM
To: Lucene Users List
Subject: Re: displaying 'pages' of search results...


The way we do it is: get all the document ids, cache them, and then get the
first 50, the second 50 documents, etc. We wrote a lightweight paging API on top
of Lucene. We call searcher.search(query, hitCollector); our
HitCollectorImpl implements the collect method and collects only the
document id.
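A minimal sketch of the approach Praveen describes, assuming the collector only needs to remember document ids. In real code the class would extend `org.apache.lucene.search.HitCollector` and be passed to `searcher.search(query, collector)`; here it is plain Java so it compiles on its own, and `DocIdPager`, `page` and `totalHits` are hypothetical names. Two caveats: a HitCollector is not guaranteed to receive hits in score order, so pages that should follow relevance need the cached ids (or id/score pairs) sorted first, and cached ids are only meaningful while the same index reader stays open.

```java
import java.util.Arrays;

/**
 * Hypothetical lightweight pager: remembers only document ids, then serves
 * fixed-size pages from that cache without re-running the query.
 */
class DocIdPager {
    private int[] ids = new int[64];
    private int size;

    /** Same shape as HitCollector.collect(int doc, float score); the score is ignored here. */
    public void collect(int doc, float score) {
        if (size == ids.length) {
            ids = Arrays.copyOf(ids, size * 2);  // grow the id cache as hits arrive
        }
        ids[size++] = doc;
    }

    int totalHits() {
        return size;
    }

    /** Returns the ids on the given zero-based page; an empty array past the end. */
    int[] page(int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        int from = (int) Math.min((long) pageNumber * pageSize, size);
        int to = (int) Math.min((long) from + pageSize, size);
        return Arrays.copyOfRange(ids, from, to);
    }
}
```

For each page, only the ids on that page would then be turned into documents (e.g. with `searcher.doc(id)`), which keeps the per-page cost small.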

Praveen


- Original Message -
From: Chris Fraschetti [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, September 21, 2004 3:33 PM
Subject: displaying 'pages' of search results...


I was wondering what the best way was to go about returning, say,
 1,000,000 results, divided up into 50-element sections, and then
 accessing them as the first 50, the second 50, etc.

 Is there a way to keep the query around so that Lucene doesn't need to
 search again, or would the search be cached so that no delay arises?

 Just looking for some ideas and possibly some implementation issues...



 --
 ___
 Chris Fraschetti
 e [EMAIL PROTECTED]




RE: ANT +BUILD + LUCENE

2004-09-14 Thread Karthik N S
Hi Erik,


   1) Using Ant and build.xml, I want to run
org.apache.lucene.demo.IndexFiles to create an index folder.

   2) The problem is that the same build.xml has to be used on every OS
to create the index.

   3) Lucene1-4-final.jar is in a different directory on each OS...

Note: the paths of Lucene_home and of the input and output directories are also
OS-specific; they should be in build.xml and
should be selected by something like this:


 <condition property="isWindows">
   <os family="windows" />
 </condition>

   or

 <condition property="isUnix">
   <os family="unix" />
 </condition>


I hope you understand the situation. :{


With regards
Karthik



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 14, 2004 7:37 PM
To: Lucene Users List
Subject: Re: ANT +BUILD + LUCENE


I'm not following what you want very clearly, but there is an index
task in Lucene's Sandbox.

Please post what you are trying, and I'd be happy to help once I see
the details.

Erik

On Sep 12, 2004, at 4:44 PM, Karthik N S wrote:

 Hi

 Guys


 Apologies..


 My task is to build the index folder using Lucene and a simple
 build.xml for Ant.

 The problem: the same build.xml should be used on different
 operating systems
 [Windows / Linux].

 The glitch is that the jar files, such as Lucene-1.4.jar and other
 jars, are not in the same directory on each OS.
 The input/output paths for the indexer's source and target may also vary.


 Could somebody please help me? :(



 with regards
 Karthik




   WITH WARM REGARDS
   HAVE A NICE DAY
   [ N.S.KARTHIK]







ANT +BUILD + LUCENE

2004-09-13 Thread Karthik N S
Hi

Guys


Apologies..


My task is to build the index folder using Lucene and a simple
build.xml for Ant.

The problem: the same build.xml should be used on different operating systems
[Windows / Linux].

The glitch is that the jar files, such as Lucene-1.4.jar and other jars,
are not in the same directory on each OS.
The input/output paths for the indexer's source and target may also vary.


Could somebody please help me? :(



with regards
Karthik




  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







Lucene Minor Version ????

2004-08-31 Thread Karthik N S


Hi Guys


Apologies...




I was just curious to know

whether Lucene-1.4.1-final.jar is a minor version change of Lucene1-4-final.jar
or not.

;{

Thx in Advance


  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







RE: Time to index documents

2004-08-25 Thread Karthik N S
Hi Hetan,


   That is the major problem with the non-standardized tags in the HTML documents
  you are indexing; it results in the lag in the indexing process.


   You could tweak the HTMLParser.jj file in the '/demo/html' directory of lucene.zip
   [you need some knowledge of JavaCC for this].



Karthik

-Original Message-
From: Hetan Shah [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 26, 2004 3:01 AM
To: Lucene Users List
Subject: Time to index documents


Hello all,

Is there a way to reduce the indexing time taken when the indexer is
indexing about 30,000 + files. It is roughly taking around 6-7 hours to
do this. I am using IndexHTML class to create the index out of HTML files.

Another issue that I see is every once in a while I get the following
output on the screen.

adding ../31/1104852.html
Parse Aborted: Encountered \ at line 7, column 1.
Was expecting one of:
 ArgName ...
 = ...
 TagEnd ...

Any suggestions on preventing this from happening?

Thanks in advance.
-H





RE: integrationofLucene and PDF box

2004-08-24 Thread Karthik N S
Hi Santosh,

Many people have worked in this area...
Look through the forum threads one by one and you may come across some example code
that does something similar...


 Karthik

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 24, 2004 11:40 AM
To: Lucene Users List
Subject: integrationofLucene and PDF box


Has anybody integrated Lucene with PDFBox?
Can we do it by changing the code in IndexFiles.java or
IndexHTML.java?

regards
Santosh kumar


---SOFTPRO DISCLAIMER--



Information contained in this E-MAIL and any attachments are

confidential being  proprietary to SOFTPRO SYSTEMS  is 'privileged'

and 'confidential'.



If you are not an intended or authorised recipient of this E-MAIL or

have received it in error, You are notified that any use, copying or

dissemination  of the information contained in this E-MAIL in any

manner whatsoever is strictly prohibited. Please delete it immediately

and notify the sender by E-MAIL.



In such a case reading, reproducing, printing or further dissemination

of this E-MAIL is strictly prohibited and may be unlawful.



SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment

hereto is free from computer viruses or other defects.



The opinions expressed in this E-MAIL and any ATTACHEMENTS may be

those of the author and are not necessarily those of SOFTPRO SYSTEMS.








RE: pdfboxhelp

2004-08-22 Thread Karthik N S
Hi


To begin with, try to build the indexes offline [outside the Tomcat container].
When indexing is complete, point your search at the real path of the
offline index folder, start Tomcat, and then use the search.
As you experiment with it, you will become comfortable with the
indexing/search requirements.   ; [

Karthik

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:55 PM
To: Lucene Users List
Subject: Re: pdfboxhelp


Yes, I did the same.
I copied all the classes into the classes folder, but
now when I build the index using IndexHTML, the PDFs are not added to
the index; only text and HTML files are.
What changes should I make to IndexHTML.java to index PDFs?
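A possible reason only text and HTML files end up in the index: as far as I recall, the demo's `IndexHTML.indexDocs` only hands files ending in `.html`, `.htm` or `.txt` to `HTMLDocument`, so a `.pdf` is silently skipped. If that matches your copy, the change would be an extra branch for `.pdf` that calls PDFBox's `LucenePDFDocument.getDocument(file)` (the call from Don's snippet) and adds the resulting Document to the writer. The JDK-only sketch below models just that routing decision; `FileRouter` and `Handler` are hypothetical names, and unlike the demo it compares extensions case-insensitively.

```java
import java.util.Locale;

/** Hypothetical helper: decides how an indexing loop would treat a file, by extension. */
class FileRouter {
    enum Handler { HTML, PDF, SKIP }

    static Handler route(String path) {
        String p = path.toLowerCase(Locale.ROOT);
        if (p.endsWith(".html") || p.endsWith(".htm") || p.endsWith(".txt")) {
            return Handler.HTML;  // the kinds of files the demo passes to HTMLDocument
        }
        if (p.endsWith(".pdf")) {
            return Handler.PDF;   // the extra branch: hand the file to PDFBox instead
        }
        return Handler.SKIP;      // everything else is not indexed
    }
}
```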
- Original Message -
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:54 PM
Subject: RE: pdfboxhelp


 Hi

 1) If you are using the jar file with a web interface for JSP/servlet development,
 place the jar file in webapps/<your application>/WEB-INF/lib
 and also correct the classpath for the present modification.

 2) Create your own package, put all your Java files in it, and copy them
 to /WEB-INF/classes/<your package>


 Then use the same..;{


 Karthik

 -Original Message-
 From: Santosh [mailto:[EMAIL PROTECTED]
 Sent: Saturday, August 21, 2004 4:31 PM
 To: Lucene Users List
 Subject: Re: pdfboxhelp


 Thanks, Natarajan and Karthik,

 I corrected the classpath,

 but where should I write your code?
 Should I write it in IndexHTML.java, which comes with Lucene,
 or somewhere else?
 One more thing:
 I kept the pdfbox jar file in the classpath. Is this enough, or do I have to
 build PDFBox?

 Thank you
 - Original Message -
 From: Natarajan.T [EMAIL PROTECTED]
 To: 'Lucene Users List' [EMAIL PROTECTED]
 Sent: Saturday, August 21, 2004 3:20 PM
 Subject: RE: pdfboxhelp


  Hi Santhosh,
 
  Try out the code below (pdfbox.jar must be in your classpath):
 
  public String getContent(InputStream reader) throws IOException {
      PDDocument pdDoc = null;
      String pdftext = "";
      try {
          PDFParser parser = new PDFParser(reader);
          parser.parse();
          pdDoc = parser.getPDDocument();
          if (pdDoc.isEncrypted()) {
              DecryptDocument decryptor = new DecryptDocument(pdDoc);
              decryptor.decryptDocument();
          }
          PDFTextStripper stripper = new PDFTextStripper();
          pdftext = stripper.getText(pdDoc);
          // document metadata (title, author, ...), if you also want to index it
          PDDocumentInformation info = pdDoc.getDocumentInformation();
      } catch (Exception err) {
          System.out.println(err.getMessage());
      } finally {
          if (pdDoc != null) {
              pdDoc.close(); // pdDoc is still null if parsing failed
          }
      }
      return pdftext;
  }
 
  Natarajan.
 
  -Original Message-
  From: Santosh [mailto:[EMAIL PROTECTED]
  Sent: Saturday, August 21, 2004 3:14 PM
  To: Lucene Users List
  Subject: Re: pdfboxhelp
 
  Hi Don,
 
  Your idea is nice, but whenever I write the following code in
  IndexHTML.java of Lucene
 
 
  import org.pdfbox.searchengine.lucene.*;
 
  File pdfFile = new File("/path/to/the/file.pdf");
 
  // Below returns the parsed PDF file as a Lucene Document object.
  Document doc = LucenePDFDocument.getDocument(pdfFile);
 
  I am getting the following error:
 
  package org.pdfbox.searchengine.lucene does not exist
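"package ... does not exist" is a compile-time message: javac could not find the PDFBox classes on the classpath it was given. A small check like the one below, compiled and run with the same classpath settings, shows which classpath the JVM actually sees and whether a given class is loadable. `ClasspathCheck` is a hypothetical helper; the default class name is the one from Don's snippet.

```java
/** Hypothetical diagnostic: reports whether a class can be loaded with the current classpath. */
class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        String[] names = args.length > 0
                ? args
                : new String[] {"org.pdfbox.searchengine.lucene.LucenePDFDocument"};
        for (String name : names) {
            System.out.println(name + (isOnClasspath(name) ? " : found" : " : NOT found"));
        }
    }
}
```

If the class is found at runtime but javac still complains, the compiler is probably being invoked with a different classpath (for example an explicit `-classpath` that overrides the CLASSPATH variable).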
 
  I have downloaded the PDFBox source code and kept the jar file in the
  classpath. Please help me with this.
  - Original Message -
  From: Don Vaillancourt
  To: Lucene Users List
  Sent: Friday, August 20, 2004 7:37 PM
  Subject: Re: pdfboxhelp
 
 
Here is the super simple code required.
 
import org.pdfbox.searchengine.lucene.*;
 
File pdfFile = new File("/path/to/the/file.pdf");
 
// Below returns the parsed PDF file as a Lucene Document object.
Document doc = LucenePDFDocument.getDocument(pdfFile);
 
Santosh wrote:
 
  Exactly, that is what I need.
  - Original Message -
  From: Don Vaillancourt
  To: Lucene Users List
  Sent: Friday, August 20, 2004 6:39 PM
  Subject: Re: pdfboxhelp
 
 
What are your intentions with PDFBox?
 
Do you want to use it to index PDF files?
 
Santosh wrote:
 
  hi,
 
  I have downloaded the PDFBox zip, but I am unsure where to
  start. How can I check it with the demo? I don't see any help document in this
  download; please help me.
 
 
  regards
  Santosh kumar
  SoftPro Systems
  Hyderabad
 
 
  The harder you train in peace, the lesser you bleed in war
 

RE: pdfboxhelp

2004-08-22 Thread Karthik N S
Hi Santosh

  I think your PDFBox setup is using the log4j package; try setting the classpath
to include the log4j.jar path.

 [Is it just a WARNING or an ERROR that you are getting?]

  Send me your configuration and let me help you with it ; [


Karthik

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:11 AM
To: Lucene Users List
Cc: Ben Litchfield
Subject: Re: pdfboxhelp


hi Karthik,

I have downloaded PDFBox and kept the PDFBox jar file in the classpath, but when I
type the following command at the command prompt, I get the error:

D:\setups\searchEngine\PDFBox-0.6.6\src>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt
log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly

Why am I getting this error? Please help.



RE: pdfboxhelp

2004-08-22 Thread Karthik N S
Hi Santosh

  Hold on, it's Monday and I am running behind schedule with my job...
I will reply to you sometime around noon.


 Karthik

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:51 AM
To: Lucene Users List
Subject: Fw: pdfboxhelp


hi Karthik,
did you find any solution? Should I send the PDF to you?
- Original Message -
From: Santosh [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:23 AM
Subject: Re: pdfboxhelp


 hi Karthik,
  I kept log4j in the classpath. Here is my CLASSPATH variable:

 CLASSPATH

 .;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar

 Please check the error.
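With a classpath this long, a mistyped or stale entry is easy to miss. One way to audit it is to split the value on the platform's path separator and check that each entry exists on disk. `ClasspathAudit` is a hypothetical helper; note that an entry existing does not by itself prove the expected classes are inside it.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: lists classpath entries that do not exist on disk. */
class ClasspathAudit {
    static List<String> missingEntries(String classpath, String separator) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty() && !new File(trimmed).exists()) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Audit the classpath this JVM was started with, or a value passed as args[0].
        String cp = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        List<String> missing = missingEntries(cp, File.pathSeparator);
        if (missing.isEmpty()) {
            System.out.println("All classpath entries exist.");
        } else {
            for (String entry : missing) {
                System.out.println("missing: " + entry);
            }
        }
    }
}
```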


RE: pdf search

2004-08-20 Thread Karthik N S
hi

What is it that you intend to search, and what are these 'own search words'?

First, explain your requirement properly to the forum to get the intended
results.



with regards
Karthik

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Friday, August 20, 2004 5:59 PM
To: Lucene Users List
Subject: pdf search


Hi,

I am new to Lucene.

I have downloaded the zip file. Now, how can I give my own list of words to Lucene?
In the demo I saw that Lucene automatically creates an index when we run the
Java program, but I want to give my own search words. How is that possible?
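In case it helps with the question above: in Lucene, indexing and searching are separate steps. The demo's IndexFiles builds the index from your files; your own words are not given to the indexer but supplied afterwards as a query (as far as I recall, the demo's SearchFiles class reads queries typed at the console). The JDK-only toy below models that split with a tiny inverted index; `ToyIndex` is a conceptual illustration with hypothetical names, not Lucene's API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Conceptual model only (not Lucene's API): index documents once, then search any words. */
class ToyIndex {
    // word -> ids of the documents containing it (an "inverted index")
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexing step: the demo's IndexFiles plays this role for real files. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Search step: this is where your own search words go. */
    Set<Integer> search(String word) {
        Set<Integer> hits = postings.get(word.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.emptySet() : Collections.unmodifiableSet(hits);
    }
}
```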


regards
Santosh kumar
SoftPro Systems
Hyderabad


The harder you train in peace, the lesser you bleed in war




RE: Index Size

2004-08-19 Thread Karthik N S
Guys,

   Are you optimizing the index before closing it?

  If not, try it...  :}



karthik




-Original Message-
From: Honey George [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 1:00 PM
To: Lucene Users List
Subject: Re: Index Size


Hi,
 Please check for hidden files in the index folder. If
you are using Linux, do something like

ls -al <index folder>

I am also facing a similar problem where the index
size is greater than the data size. In my case there
were some hidden temporary files which Lucene
creates.
They were taking half of the total size.

My problem is that after deleting the temporary files,
the index size is the same as the data size. That
again seems to be a problem. I have yet to find out the
reason..
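To compare index size with data size reliably, hidden files have to be counted too. The JDK-only sketch below (a hypothetical `DirSize` helper, Java 8+ NIO) sums the sizes of all regular files under a directory, dot-files included, much as `du` would; it can be run on both the data folder and the index folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical helper: total bytes of all regular files under a directory, hidden files included. */
class DirSize {
    static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(DirSize::sizeOf)
                        .sum();
        } catch (UncheckedIOException e) {
            throw e.getCause(); // unwrap the IOException raised while walking or sizing
        }
    }

    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + totalBytes(Paths.get(arg)) + " bytes");
        }
    }
}
```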

Thanks,
   george


 --- Rob Jose [EMAIL PROTECTED] wrote:
 Hello
 I have indexed several thousand (52 to be exact)
 text files and I keep running out of disk space to
 store the indexes.  The size of the documents I have
 indexed is around 2.5 GB.  The size of the Lucene
 indexes is around 287 GB.  Does this seem correct?
 I am not storing the contents of the file, just
 indexing and tokenizing.  I am using Lucene 1.3
 final.  Can you guys let me know what you are
 experiencing?  I don't want to go into production
 with something that I should be configuring better.


 I am not sure if this helps, but I have a temp index
 and a real index.  I index the file into the temp
 index, and then merge the temp index into the real
 index using the addIndexes method on the
 IndexWriter.  I have also set the production writer
 setUseCompoundFile to true.  I did not set this on
 the temp index.  The last thing that I do before
 closing the production writer is to call the
 optimize method.

 I would really appreciate any ideas to get the index
 size smaller if it is at all possible.

 Thanks
 Rob








RE: Restoring a corrupt index

2004-08-19 Thread Karthik N S
Hi George,

   Do you think the same would work for MERGED indexes?
   Can you please suggest a solution?


  Karthik

-Original Message-
From: Honey George [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 2:08 PM
To: Lucene Users List
Subject: RE: Restoring a corrupt index


This is what I did.

There are 2 classes in the Lucene source which are not
public and therefore cannot be accessed from outside
the package. The classes are
1. org.apache.lucene.index.SegmentInfos
   - collection of segments
2. org.apache.lucene.index.SegmentInfo
   - represents a single segment

I took these two files and moved them to a separate folder.
Then I created a class with the following code fragment.

// Needs org.apache.lucene.store.Directory and org.apache.lucene.store.FSDirectory.
// SegmentInfos and SegmentInfo are package-private, so this class has to be
// compiled in package org.apache.lucene.index (or against copies of those classes).
public void displaySegments(String indexDir)
throws Exception
{
    Directory dir =
        (Directory)FSDirectory.getDirectory(indexDir, false);
    SegmentInfos segments = new SegmentInfos();
    segments.read(dir);

    StringBuffer str = new StringBuffer();
    int size = segments.size();
    str.append("Index Dir = " + indexDir);
    str.append("\nTotal Number of Segments " + size);

    str.append("\n--");
    for (int i = 0; i < size; i++)
    {
        str.append("\n");
        str.append((i + 1) + ". ");
        str.append(((SegmentInfo)segments.get(i)).name);
    }

    str.append("\n--");

    System.out.println(str.toString());
}


public void deleteSegment(String indexDir, String segmentName)
throws Exception
{
    Directory dir =
        (Directory)FSDirectory.getDirectory(indexDir, false);
    SegmentInfos segments = new SegmentInfos();
    segments.read(dir);

    int size = segments.size();
    String name = null;
    boolean found = false;
    for (int i = 0; i < size; i++)
    {
        name = ((SegmentInfo)segments.get(i)).name;
        if (segmentName.equals(name))
        {
            found = true;
            // drop the entry from the in-memory list; the segment's files stay on disk
            segments.remove(i);
            System.out.println("Deleted the segment with name " + name
                + " from the segments file");
            break;
        }
    }
    if (found)
    {
        // rewrite the segments file without the removed entry
        segments.write(dir);
    }
    else
    {
        System.out.println("Invalid segment name: " + segmentName);
    }
}

Use the displaySegments() method to display the
segments and deleteSegment to delete the corrupt
segment.

Thanks,
  George

 --- Karthik N S [EMAIL PROTECTED] wrote:
 Hi Guys


In our situation we would be indexing millions and
 millions of information documents,

   with many gigabytes of data indexed, and finally
 they would be put into a MERGED INDEX, categorized
 accordingly.

   There may be a possibility of corruption, so
 please do post the code
 for reference.


  Thx
 Karthik


 -Original Message-
 From: Honey George [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, August 18, 2004 5:51 PM
 To: Lucene Users List
 Subject: Re: Restoring a corrupt index


 Thanks Erik, that worked. I was able to remove the
 corrupt index and now it looks like the index is OK.
 I
 was able to view the number of documents in the
 index.
 Before that I was getting the error,
 java.io.IOException: read past EOF

 I am yet to find out how my index got corrupted.
 There
 is another thread going on about this topic,

http://www.mail-archive.com/[EMAIL PROTECTED]/msg03165.html

 If anybody is facing similar problem and is
 interested
 in the code I can post it here.

 Thanks,
   George



  --- Erik Hatcher [EMAIL PROTECTED]
 wrote:
  The details of the segments file (and all the
  others) are freely
  available here:
 
 
 

http://jakarta.apache.org/lucene/docs/fileformats.html
 
  Also, there is Java code in Lucene, of course,
 that
  manipulates the
  segments file which could be leveraged (although
  probably package
  scoped and not easily usable in a standalone
 repair
  tool).
 
  Erik
 
 
  On Aug 18, 2004, at 6:50 AM, Honey George wrote:
 
   Looks like the problem is not with the hex editor;
   even in UltraEdit (I had access to a Windows box) I am
   seeing the same display. The problem is I am not able
   to identify where a record starts with just 1 record
   in the file.
  
   Need to try some alternate approach.
  
   Thanks,
 George








AnalyZer HELP Please

2004-08-18 Thread Karthik N S


Hi Guys

  Finally, after a lot of experimentation, I came to know that

a word such as 'new' that is already present in the Analyzer's stop-word list

will not return any hits [even when enclosed in quotes \"],

such as "New Year".


   That's really interesting :(


Thx
Karthik


  


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 7:35 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please


On Aug 17, 2004, at 9:47 AM, Karthik N S wrote:
 I did as Erik replied in his mail,
 and searched for the complete phrase \"New Year\",
 but the QueryParser still returns hits for Year only.

 [ The Analyzer I use has 555 English stop words, with "new" present
 in it ]

No wonder!

 That's when I checked the analyzers to verify.
 If you look at the analyzers' output list,
 GrammerAnalyzer is the one that has 555 English STOPWORDS.

 Do you think this is a bug in my code?

Whether this is a bug or not is really for your users to determine :) 
  But it is absolutely the expected behavior.  QueryParser analyzes the 
expression too.  Even if you somehow changed QueryParser, if you never 
indexed the word "new" then you certainly cannot expect to search on it
and find it.

Erik
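
Erik's point can be seen in a toy, stdlib-only sketch (not Lucene code): the same stop list is applied to the document when it is indexed and to the query when QueryParser analyzes it, so "new" never reaches the index and the phrase "new year" shrinks to just "year". Tokens are assumed to be already lowercased, and the small stop list here is a hypothetical stand-in for the 555-word list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration, not Lucene code: one stop list used at index time and
// at query time means a stop word can never be found by a search.
class StopWordSketch {
    // Drops every token that appears in the stop list.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<String>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // hypothetical stop list containing "new"
        Set<String> stop = new HashSet<String>(Arrays.asList("new", "the", "a"));
        // index time: document text "new year party"
        List<String> indexed = removeStopWords(Arrays.asList("new", "year", "party"), stop);
        // query time: the phrase "new year"
        List<String> query = removeStopWords(Arrays.asList("new", "year"), stop);
        System.out.println(indexed); // [year, party]
        System.out.println(query);   // [year]
    }
}
```

Removing "new" from the stop list, and re-indexing, is what would let the phrase match again.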





RE: Restoring a corrupt index

2004-08-18 Thread Karthik N S
Hi Guys


   In our situation we would be indexing millions and millions of information
documents,

  with many gigabytes of data indexed, and finally they would be put into a
MERGED INDEX, categorized accordingly.

  There may be a possibility of corruption, so please do post the code
for reference.


 Thx
Karthik


-Original Message-
From: Honey George [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 18, 2004 5:51 PM
To: Lucene Users List
Subject: Re: Restoring a corrupt index


Thanks Erik, that worked. I was able to remove the
corrupt index and now it looks like the index is OK. I
was able to view the number of documents in the index.
Before that I was getting the error,
java.io.IOException: read past EOF

I am yet to find out how my index got corrupted. There
is another thread going on about this topic,
http://www.mail-archive.com/[EMAIL PROTECTED]/msg03165.html

If anybody is facing similar problem and is interested
in the code I can post it here.

Thanks,
  George



 --- Erik Hatcher [EMAIL PROTECTED] wrote:
 The details of the segments file (and all the
 others) are freely
 available here:



http://jakarta.apache.org/lucene/docs/fileformats.html

 Also, there is Java code in Lucene, of course, that
 manipulates the
 segments file which could be leveraged (although
 probably package
 scoped and not easily usable in a standalone repair
 tool).

   Erik


 On Aug 18, 2004, at 6:50 AM, Honey George wrote:

  Looks like the problem is not with the hex editor;
  even in UltraEdit (I had access to a Windows box) I am
  seeing the same display. The problem is I am not able
  to identify where a record starts with just 1 record
  in the file.
 
  Need to try some alternate approach.
 
  Thanks,
George









AnalyZer HELP Please

2004-08-17 Thread Karthik N S

Hey Guys.

Apologies..


Some small help needed.

When I run the analyzers on the phrase "New Year" (with quotes) using the
Lucene 1.4 final jar on Windows 2000,
why is the SimpleAnalyzer splitting it into 2 words?

Or


am I missing something here?



Analyzing  New  Year 
org.apache.lucene.analysis.WhitespaceAnalyzer:

[] [New] [+] [Year] [] 

org.apache.lucene.analysis.SimpleAnalyzer:

[new] [year] 

org.apache.lucene.analysis.StopAnalyzer:

[new] [year] 

org.apache.lucene.analysis.standard.StandardAnalyzer:

[new] [year] 

com.controlnet.indexing.analyzers.GrammerAnalyzer:

[year] 





  WITH WARM REGARDS 
  HAVE A NICE DAY 
  [ N.S.KARTHIK] 
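
SimpleAnalyzer is documented as a LetterTokenizer followed by a LowerCaseFilter, so every non-letter character, including the quote mark, simply separates tokens. The rough stdlib sketch below mimics that behaviour; the class name is hypothetical and this is not Lucene's own code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SimpleAnalyzer: split on any non-letter
// (quotes, spaces, digits, punctuation) and lowercase what remains.
class SimpleAnalyzerSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // a non-letter ends the current token and is itself discarded
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("\"New Year\"")); // [new, year]
    }
}
```

The quotes never survive tokenization; it is QueryParser, not the analyzer, that interprets them as a phrase.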
 






RE: AnalyZer HELP Please

2004-08-17 Thread Karthik N S
Hi

Erik

  Apologies...

  What I meant to say was: a phrase such as New Year (in quotes, i.e. \")
  passed to QueryParser.parse(word, contents, analyzer) should return hits
for the full phrase,
  but it did not.

 So I did a quick run of the analyzers and
 found that they were splitting the phrase:

  New Year  =  [New]  [Year]


 Am I doing something wrong here?




Thx in advance.
Karthik

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 6:18 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please


This is what analyzers do.  I don't know of any analyzer that deals
with quotes in the way you're requesting, by keeping the contents
together as a complete token.  You'll have to write your own variant
that does this.

QueryParser, however, uses quotes to denote a phrase query, and will
query for the words together.  Perhaps this is sufficient for your
needs?

Erik

On Aug 17, 2004, at 8:40 AM, Karthik N S wrote:


 Hey Guys.

 Apologies..


 Some small help needed.

 When I run the analyzers on the phrase "New Year" (with quotes) using the
 Lucene 1.4 final jar on Windows 2000,
 why is the SimpleAnalyzer splitting it into 2 words?

 Or


 am I missing something here?



 Analyzing  New  Year 
 org.apache.lucene.analysis.WhitespaceAnalyzer:

 [] [New] [+] [Year] []

 org.apache.lucene.analysis.SimpleAnalyzer:

 [new] [year]

 org.apache.lucene.analysis.StopAnalyzer:

 [new] [year]

 org.apache.lucene.analysis.standard.StandardAnalyzer:

 [new] [year]

 com.controlnet.indexing.analyzers.GrammerAnalyzer:

 [year]





   WITH WARM REGARDS
   HAVE A NICE DAY
   [ N.S.KARTHIK]







RE: AnalyZer HELP Please

2004-08-17 Thread Karthik N S
Hi
Patrick

I did as Erik replied in his mail,
and searched for the complete phrase \"New Year\",
but the QueryParser still returns hits for Year only.

[ The Analyzer I use has 555 English stop words, with "new" present in it ]

That's when I checked the analyzers to verify.
If you look at the analyzers' output list,
GrammerAnalyzer is the one that has 555 English STOPWORDS.

Do you think this is a bug in my code?

Thx
Karthik



-Original Message-
From: Patrick Burleson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 6:55 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please


Karthik,

What you would want to do with the split tokens (New and Year)
is create a PhraseQuery containing a Term object for each token.
This should do what you want. As Erik said, QueryParser would have
done this internally, but only if you actually sent in the quotes...not
just New Year, but \"New Year\".

Patrick
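
In Lucene 1.4 Patrick's suggestion would look roughly like: new PhraseQuery(), then pq.add(new Term("contents", "new")) and pq.add(new Term("contents", "year")), where the field name "contents" is an assumption. The idea behind an exact (slop 0) phrase match can be sketched with the standard library alone: record each term's positions in the field and require the phrase terms at consecutive positions. This is a hypothetical illustration, not Lucene's implementation, and it can only match if every phrase word was actually indexed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of exact phrase matching over token positions.
class PhraseMatchSketch {
    // term -> positions at which it occurs in the field
    static Map<String, List<Integer>> positions(List<String> fieldTokens) {
        Map<String, List<Integer>> index = new HashMap<String, List<Integer>>();
        for (int pos = 0; pos < fieldTokens.size(); pos++) {
            String term = fieldTokens.get(pos);
            List<Integer> list = index.get(term);
            if (list == null) {
                list = new ArrayList<Integer>();
                index.put(term, list);
            }
            list.add(pos);
        }
        return index;
    }

    // true if the phrase terms occur at consecutive positions somewhere
    static boolean matchesPhrase(Map<String, List<Integer>> index, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        List<Integer> starts = index.get(phrase.get(0));
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < phrase.size() && ok; i++) {
                List<Integer> p = index.get(phrase.get(i));
                ok = p != null && p.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```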

On Tue, 17 Aug 2004 18:53:01 +0530, Karthik N S
[EMAIL PROTECTED] wrote:
 Hi

 Erik

   Apologies...

   What I meant to say was: a phrase such as New Year (in quotes, i.e.
 \")
   passed to QueryParser.parse(word, contents, analyzer) should return hits
 for the full phrase,
   but it did not.

  So I did a quick run of the analyzers and
  found that they were splitting the phrase:

   New Year  =  [New]  [Year]

  Am I doing something wrong here?

 Thx in advance.
 Karthik



 -Original Message-
 From: Erik Hatcher [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, August 17, 2004 6:18 PM
 To: Lucene Users List
 Subject: Re: AnalyZer HELP Please

 This is what analyzers do.  I don't know of any analyzer that deals
 with quotes in the way you're requesting, by keeping the contents
 together as a complete token.  You'll have to write your own variant
 that does this.

 QueryParser, however, uses quotes to denote a phrase query, and will
 query for the words together.  Perhaps this is sufficient for your
 needs?

 Erik

 On Aug 17, 2004, at 8:40 AM, Karthik N S wrote:

 
  Hey Guys.
 
  Apologies..
 
 
  Some small help needed.
 
  When I run the analyzers on the phrase "New Year" (with quotes) using the
  Lucene 1.4 final jar on Windows 2000,
  why is the SimpleAnalyzer splitting it into 2 words?
 
  Or
 
 
  am I missing something here?
 
 
 
  Analyzing  New  Year 
  org.apache.lucene.analysis.WhitespaceAnalyzer:
 
  [] [New] [+] [Year] []
 
  org.apache.lucene.analysis.SimpleAnalyzer:
 
  [new] [year]
 
  org.apache.lucene.analysis.StopAnalyzer:
 
  [new] [year]
 
  org.apache.lucene.analysis.standard.StandardAnalyzer:
 
  [new] [year]
 
  com.controlnet.indexing.analyzers.GrammerAnalyzer:
 
  [year]
 
 
 
 
 
WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
 
 
 
 



RE: AnalyZer HELP Please

2004-08-17 Thread Karthik N S
Hi Guys

Apologies..

   Correct me if I am wrong...

   During the indexing process, if the Analyzer has the word 'new' in its
'STOPWORD' array, this word is prevented from being indexed
  (stopped from indexing).

  Then, during search, I would not get a hit on the phrase
"New Year",
  since the word 'new' is in the STOPWORD array...
  [even if the phrase is surrounded by \"]



With regards
Karthik



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 7:35 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please


On Aug 17, 2004, at 9:47 AM, Karthik N S wrote:
 I did as Erik replied in his mail,
 and searched for the complete phrase \"New Year\",
 but the QueryParser still returns hits for Year only.

 [ The Analyzer I use has 555 English stop words, with "new" present
 in it ]

No wonder!

 That's when I checked the analyzers to verify.
 If you look at the analyzers' output list,
 GrammerAnalyzer is the one that has 555 English STOPWORDS.

 Do you think this is a bug in my code?

Whether this is a bug or not is really for your users to determine :)
  But it is absolutely the expected behavior.  QueryParser analyzes the
expression too.  Even if you somehow changed QueryParser, if you never
indexed the word "new" then you certainly cannot expect to search on it
and find it.

Erik





HitCollector

2004-08-13 Thread Karthik N S

Hello

Could somebody please explain to me how to use a HitCollector with a simple
Searcher.search(query) to obtain only hits with scores between 0.02456f and 1.0f?


Thx in advance



  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]
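
In Lucene 1.4 one would subclass the abstract HitCollector class, whose callback is collect(int doc, float score), and pass it to searcher.search(query, collector). The sketch below keeps only documents whose score falls inside a range. To stay compilable without the Lucene jar it declares its own stand-in interface, ScoreCallback, which is hypothetical. As far as I know, the raw scores a HitCollector receives are not normalized the way Hits normalizes them, so fixed bounds such as 1.0f may need adjusting; with Hits, which is sorted by descending score, one could instead stop iterating once hits.score(i) drops below the lower bound.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Lucene's HitCollector.collect(int doc, float score),
// declared here only so the sketch compiles on its own.
interface ScoreCallback {
    void collect(int doc, float score);
}

// Keeps the document numbers whose score lies within [min, max].
class ScoreRangeCollector implements ScoreCallback {
    private final float min;
    private final float max;
    final List<Integer> docs = new ArrayList<Integer>();

    ScoreRangeCollector(float min, float max) {
        this.min = min;
        this.max = max;
    }

    public void collect(int doc, float score) {
        if (score >= min && score <= max) {
            docs.add(doc);
        }
    }
}
```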







Highlighter package updated with overlapping token support

2004-07-26 Thread Karthik N S
Hi
   Mark

 Apologies


  Can you please provide a URL on your main website page from which users
can download the new version
 of the Highlighter package (jar / zip format)?

 [ Because some of the developers may not have access to
 CVS downloads (organizational restrictions) from the Lucene sandbox ]



Thx in advance

with regards
Karthik

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 27, 2004 2:28 AM
To: [EMAIL PROTECTED]
Subject: Highlighter package updated with overlapping token support


I have updated the Highlighter code in CVS to support tokenizers that
generate overlapping tokens.

The JUnit test rig has a new example test that uses a SynonymTokenizer
which generates multiple tokens
in the same position for the same input token, e.g. the token "football" is
expanded into the tokens "soccer", "footie" and "football".
The Formatter interface had to be changed to take a new TokenGroup object
instead of a single token but
I doubt any code changes in clients are required because most people use the
default Formatter implementation and haven't
created their own  implementations.

Cheers
Mark
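
As I read Mark's note, tokens that overlap in the original text (such as synonyms injected for the same word) are handed to the Formatter together as one TokenGroup. Below is a rough stdlib sketch of that grouping by overlapping character offsets; the Tok class and the grouping rule are simplifying assumptions, not the Highlighter's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group tokens whose character offsets overlap,
// e.g. "soccer", "footie" and "football" all covering the word "football".
class TokenGroupSketch {
    static final class Tok {
        final String text;
        final int start; // start offset in the original text
        final int end;   // end offset (exclusive)

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    static List<List<String>> group(List<Tok> tokens) {
        List<List<String>> groups = new ArrayList<List<String>>();
        int groupEnd = -1;
        for (Tok t : tokens) {
            if (!groups.isEmpty() && t.start < groupEnd) {
                // overlaps the current group: add to it and widen its end
                groups.get(groups.size() - 1).add(t.text);
                groupEnd = Math.max(groupEnd, t.end);
            } else {
                List<String> g = new ArrayList<String>();
                g.add(t.text);
                groups.add(g);
                groupEnd = t.end;
            }
        }
        return groups;
    }
}
```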




RE: Large index files

2004-07-23 Thread Karthik N S
Hi

  I think (a) would be the better choice [I have done it on Linux with up to
7GB; it's considerably faster than doing the same on the Win2000 platform]


with regards
Karthik

-Original Message-
From: Rupinder Singh Mazara [mailto:[EMAIL PROTECTED]
Sent: Friday, July 23, 2004 5:55 PM
To: Lucene Users List
Subject: Large index files


Hi all

  I am using Lucene to index a large dataset; it so happens that 10% of this data
yields indexes of
  400MB, so in all likelihood the index may grow to 7GB.

  My deployment will be on a Linux/Tomcat system. Which would be the better
solution?
  a) create one large index and hope Linux does not mind
  b) generate 7-10 indexes based on some criteria and glue them together
using MultiReader; in this case, might I cross the maximum file handle limit of
Tomcat?

 regards










RE: Extracting Lucene onto Tomcat

2004-07-21 Thread Karthik N S
hi


  Just copy the lucene.war file into the Tomcat webapps directory, and then
start Tomcat.

 In the browser, typing   http://localhost:8080/luceneweb   will serve you the
pages.


  But first you have to index your directory for the web module to serve you
searchable hits.
  I think there should be some information in the Lucene package itself on
doing this.


with regards
Karthik

-Original Message-
From: Zilverline info [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 7:56 PM
To: Lucene Users List
Subject: Re: Extracting Lucene onto Tomcat


Hi Ian,

Depending on what you want to do, you could also follow the installation
instructions on http://www.zilverline.org. It describes how to install
zilverline, but the same goes for the lucene war.

Hope this helps,

   Michael Franken

Ian McDonnell wrote:

Also, another silly question: do I need to set up a WAR on the server?


--- Ian McDonnell [EMAIL PROTECTED] wrote:
Well, when I extracted it, it created the org/apache/lucene directories in
the public_html directory. When I try to compile any of the source it just
throws numerous errors. I've got the classpath set to web-inf/classes.

Have I extracted it to the wrong directory?


--- Erik Hatcher [EMAIL PROTECTED] wrote:
On Jul 21, 2004, at 8:10 AM, Ian McDonnell wrote:


Are the package information and import paths ready to deploy on a Tomcat
server? I tried extracting Lucene on the server, but when I compile
files, it just throws numerous "no class definition" errors and errors
relating to the package.



Huh?  Lucene certainly deploys just fine in Tomcat web applications (in
a WAR under WEB-INF/lib).  Could you elaborate on what you mean here?

   Erik





Score Range....

2004-07-16 Thread Karthik N S

Hey Guys

Apologies..


I have a silly question.

On the returned hits, how would one be able to get only the scores between a
lower and an upper limit value,

  say X > 0.4 and X < 1.0?


Do you think this will work?


with regards
Karthik



  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







Search +QueryParser+Score

2004-07-15 Thread Karthik N S

  Hey Guys

 Apologies.

 I have a question.

Is there any API available in Lucene 1.4 to set a score limit of 1.0f or
less
   BEFORE running the QueryParser search, so that only hits matching that
score setting are returned?



with regards
Karthik






HOWTO USE SORT on QUERY PARSER :)

2004-07-14 Thread Karthik N S
Hey

  Guys

Apologies...

Gee, that's so simple the way you have explained it to me. Thanks a lot.


Please correct me if I am wrong:

1)

So you are telling me that for hits on the field FIELD_CONTENTS, the relevant hits can
be sorted with respect to the field FIELD_DATE

[where FIELD_DATE and FIELD_CONTENTS are field names in Lucene]...


2)
  To run the JUnit tests, do I need to download all the files from CVS [will
there be a build.xml within CVS] to run and execute the tests?


with regards
Karthik


-Original Message-
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 12:08 PM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(


Example:
query = QueryParser.parse(queryString, FIELD_CONTENTS, analyzer);
Sort sort = new Sort();
// sort on FIELD_DATE; true = reverse (descending) order
sort.setSort(FIELD_DATE, true);
//hits = searcher.search(query, sort);
hits = multiSearcher.search(query, sort);
...
FIELD_DATE - indexed (and, for sorting, untokenized) field.

Regards,
Vladimir

On Wed, 14 Jul 2004 12:02:33 +0530
  Karthik N S [EMAIL PROTECTED] wrote:
Hey
   Guys

Apologies

   Before running the build.xml for the JUnit test files, do I need
to
download all the files present in the search folder
of the Lucene CVS test tree in order to get the output results?

With regards
Karthik



-Original Message-
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 11:38 AM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(


It is a config problem.
Run build.xml -> [Run Ant...] -> Run unit tests.
Vladimir.

On Wed, 14 Jul 2004 11:27:25 +0530
  Karthik N S [EMAIL PROTECTED] wrote:
Hi
Guys

Apologies

I am using the Eclipse 3.0 IDE, so when I run this file within the IDE I
am not
able to VIEW the output results.
[ Till now I have no idea how to set up and run the JUnit
tests or view
their output ]

Please give me some tips on this.

With regards
Karthik

-Original Message-
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 11:12 AM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(


Hi!

 From CVS ->
jakarta-lucene/src/test/org/apache/lucene/search/TestSort.java
Run it as a unit test (  :-(   ->   :-))

Best regards,
Vladimir.

On Tue, 13 Jul 2004 15:31:18 +0530
  Karthik N S [EMAIL PROTECTED] wrote:
Hey

  Guys

Apologies

   Can somebody please explain to me, with a simple source example,
 how to
use SORT with the QueryParser [Lucene 1.4]?
  [ I am confused by the code snippet in the CVS test case ]



with regards
Karthik

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 13, 2004 2:29 AM
To: [EMAIL PROTECTED]
Subject: Re: Could search results give an idea of which field matched


See the explain functionality in the Javadocs and previous threads.
 You can
ask Lucene to explain why it got the results it did for a given hit.

 [EMAIL PROTECTED] 07/12/04 04:52PM 
I search the index on multiple fields. Could the search results also
tell me which field matched so that the document was selected? From
what
I can tell, only the document number and a score are returned. Is
there
a way to also find out which field(s) of the document matched
the
query?



Sildy








Search Result + Highlighter

2004-07-14 Thread Karthik N S
Hi Guys

  Some weeks back I had reported a problem regarding search on an indexed
file using the Highlighter.

  The Highlighter used to display   [Pad] or  [0] between words  (the
field is of type Field.Text and stores the HTML summary)

  [ I am using a CustomAnalyzer which is similar to StandardAnalyzer with
555 ENGLISH_STOP_WORDS ]

  If anybody has looked into this matter for a patch, please
let me know.



with regards
Karthik

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 1:06 AM
To: Lucene Users List
Subject: Re: Search Result


Look at the Term Highlighter here:

http://jakarta.apache.org/lucene/docs/lucene-sandbox/
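
The idea behind a query-dependent summary, showing the text around where the search word occurs rather than the first lines of the document, can be sketched with the standard library. This is only an illustration under simple assumptions (plain case-insensitive substring search, a fixed number of context characters, hypothetical class name); the sandbox Highlighter does this properly using the analyzer's tokens and fragment scoring.

```java
// Hypothetical sketch, not the Highlighter's code: build the text shown
// under a hit from the area around the first match of the query word.
class SnippetSketch {
    // Case-insensitive search that keeps indexes aligned with the original text.
    static int indexOfIgnoreCase(String text, String word) {
        for (int i = 0; i + word.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, word, 0, word.length())) {
                return i;
            }
        }
        return -1;
    }

    // context = number of characters to keep on each side of the match
    static String snippet(String text, String word, int context) {
        int hit = indexOfIgnoreCase(text, word);
        if (hit < 0) {
            // no match: fall back to the start of the document
            return text.substring(0, Math.min(text.length(), 2 * context));
        }
        int from = Math.max(0, hit - context);
        int to = Math.min(text.length(), hit + word.length() + context);
        String prefix = from > 0 ? "..." : "";
        String suffix = to < text.length() ? "..." : "";
        return prefix + text.substring(from, to) + suffix;
    }
}
```

For example, snippet("aaaa bbbb sample cccc dddd", "sample", 5) yields "...bbbb sample cccc...". Note that the text must have been stored (for example in a stored field) to be able to cut a snippet from it at search time.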


On Jul 13, 2004, at 2:32 PM, Hetan Shah wrote:

 I think I have not explained my question correctly. What is happening
 is when I show the result on a page the text below the link as shown
 below.

 Test Page for Apache Installation
 http://dev-server.sfbay:8880/docs/sample.htm
 Sample content

 Jakarta Lucene - Lucene Sandbox
 http://dev-server.sfbay:8880/docs/lucene-sandbox/index.html
 [Jakarta Lucene] About Overview Powered by Lucene Who We Are Mailing
 Lists Resources FAQ (Official) jGuru FAQ Getting Started Query Syntax
 File Formats Javadoc Contributions Articles, etc. Benchmark


 In the first example the search term "sample" occurs at the beginning
 of the page and so it shows up in the text below the link. In the
 second example the keyword "sample" shows up somewhere later in the
 document and so it does not show up in the text below the link. What
 can I do so that in all cases the text below the link always has the
 piece of the document where the keyword is found?

 thanks in advance.

 -H

 Hetan Shah wrote:

 What I am trying to figure out is. In my search result which is
 returned by the

 Document doc = hits.doc(i);
 String textToShow = doc.get("summary");

 The summary field seems to contain only the first few lines of the
 document. How can I make it to contain the piece that matches the
 query string?

 Thanks.
 -H

 Hetan Shah wrote:

 David,

 Do you know, in the demo code, how do I override or change this
 value so that I get to see the appropriate chunk of the document? Would
 this change make the actual result show the relevant section of
 the document?

 Sorry to sound so ignorant, I am very new at the whole search
 technology, getting to learn a lot from a great supportive
 community.

 Thanks,
 -H
 David Spencer wrote:

 Hetan Shah wrote:

 My search results are only displaying the top portion of the
 indexed documents. It does match the query in the later part of
 the document. Where should I look to change the code in demo3 of
 default 1.3 final distribution. In general if I want to show the
 block of document that matches with the query string which classes
 should I use?




 Sounds like this:

 http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH


 Thanks guys.
 -H





Lucene 1.3 final to 1.4final problem

2004-07-08 Thread Karthik N S
Hey

Dev Guys

Apologies 

I have a Quick Problem...

  The number of hits on a set of documents indexed using 1.3-final is not the
same on the 1.4-final version.
  [ The only modification done to the source is that I have upgraded my
CustomAnalyzer based on the StopAnalyzer available in 1.4 ]
  Does doing this affect the results?


  Somebody please explain.


with regards
Karthik




-Original Message-
From: Alex Aw Seat Kiong [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 9:50 AM
To: Lucene Users List
Subject: upgrade from Lucene 1.3 final to 1.4rc3 problem


Hi!

I'm currently using Lucene 1.3 final, and everything was working fine.
But after I upgraded from Lucene 1.3 final to 1.4rc3 (simply overwriting
the lucene-1.4-final.jar to lucene-1.4-rc3.jar and re-compiling),
we could re-compile successfully, but when we try to index a document
it gives the error below:
java.lang.NullPointerException
at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:126)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
What is wrong? Please help.

Thanks.

Regards,
Alex








Lucene 1.3 final to 1.4final problem

2004-07-08 Thread Karthik N S

Hey

Dev Guys

Apologies


 Can somebody explain to me

  why, for the input word TA, StopAnalyzer.java returns [ta]
instead of [ta]?

  TA  == [ta]   instead of  [ta]

  $125.96  === [125.95] instead of [$125.95]

  Is there something I have been missing?


 with regards
Karthik




-Original Message-
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 08, 2004 11:59 AM
To: Lucene Users List
Subject: Lucene 1.3 final to 1.4final problem


Hey

Dev Guys

Apologies 

I have a Quick Problem...

  The number of hits on a set of documents indexed using 1.3-final is not the
same on the 1.4-final version.
  [ The only modification done to the source is that I have upgraded my
CustomAnalyzer based on the StopAnalyzer available in 1.4 ]
  Does doing this affect the results?


  Somebody please explain.


with regards
Karthik




-Original Message-
From: Alex Aw Seat Kiong [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 9:50 AM
To: Lucene Users List
Subject: upgrade from Lucene 1.3 final to 1.4rc3 problem


Hi!

I'm currently using Lucene 1.3 final, and everything was working fine.
But after I upgraded from Lucene 1.3 final to 1.4rc3 (simply overwriting
the lucene-1.4-final.jar to lucene-1.4-rc3.jar and re-compiling),
we could re-compile successfully, but when we try to index a document
it gives the error below:
java.lang.NullPointerException
at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:126)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
What is wrong? Please help.

Thanks.

Regards,
Alex








RE: Search Hit Score

2004-07-07 Thread Karthik N S
Hey Ype

 Apologies .


 I would be more interested in the boost/weight factor in terms of the query
rather than fields.

 Please explain with example source.

With regards
Karthik


-Original Message-
From: Ype Kingma [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 12:08 PM
To: [EMAIL PROTECTED]
Subject: Re: Search  Hit Score


On Wednesday 07 July 2004 08:25, Ype Kingma wrote:

 For a single term query, one can iterate through
 IndexReader.termDocs(Term) and store the document numbers by
 TermDocs.docFreq().

That should be TermDocs.freq()

Oops,
Ype
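Ype's suggestion of IndexReader.termDocs(Term) plus TermDocs.freq() also fits the grouping question: walk the postings for the search term and bucket each document number by its in-document frequency. To keep the sketch runnable without a Lucene jar, Postings below is a hypothetical stand-in that mirrors TermDocs' next()/doc()/freq() methods. With Lucene, the same loop would run over reader.termDocs(new Term(field, word)), which should be closed afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for Lucene's TermDocs enumeration (next/doc/freq). */
interface Postings {
    boolean next();   // advance to the next matching document
    int doc();        // current document number
    int freq();       // occurrences of the term in the current document
}

class GroupByFreq {
    /** Buckets document numbers by how often the term occurs in each document. */
    static Map<Integer, List<Integer>> group(Postings postings) {
        Map<Integer, List<Integer>> byFreq = new TreeMap<>();
        while (postings.next()) {
            byFreq.computeIfAbsent(postings.freq(), f -> new ArrayList<>()).add(postings.doc());
        }
        return byFreq;
    }

    /** Array-backed Postings used for the demo: docs[i] contains the term freqs[i] times. */
    static Postings fromArrays(int[] docs, int[] freqs) {
        return new Postings() {
            int i = -1;
            public boolean next() { return ++i < docs.length; }
            public int doc() { return docs[i]; }
            public int freq() { return freqs[i]; }
        };
    }

    public static void main(String[] args) {
        Postings p = fromArrays(new int[] {0, 3, 7, 9}, new int[] {2, 3, 2, 1});
        System.out.println(group(p)); // {1=[9], 2=[0, 7], 3=[3]}
    }
}
```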





RE: Search Hit Score

2004-07-07 Thread Karthik N S

Hey

 Dev Guys

 Apologies



  Can somebody explain to me how to retrieve all hits available per indexed
document?

   To explain in detail:


   A physical search on a single document would list 3 places where a certain
word occurs.

   So if I am supposed to retrieve all 3 occurrences from the same field
using Lucene ...

   How should I handle the query? Please explain with a simple source example.
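Lucene 1.4 exposes per-document occurrences through IndexReader.termPositions(Term). For each matching document, TermPositions.freq() gives the occurrence count, and nextPosition() returns each occurrence's token position. The plain-Java sketch below does not use Lucene. It only illustrates what those positions are, using a simple letter-only, lowercasing tokenizer. That tokenizer is an assumption, and your analyzer may split text differently. Positions are token indexes, not character offsets.

```java
import java.util.ArrayList;
import java.util.List;

class TermPositionsSketch {
    /** Returns the 0-based token positions at which word occurs in text. */
    static List<Integer> positions(String text, String word) {
        List<Integer> result = new ArrayList<>();
        String target = word.toLowerCase();
        int pos = 0;
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) continue;        // skip the empty leading piece, if any
            if (token.toLowerCase().equals(target)) result.add(pos);
            pos++;
        }
        return result;
    }

    public static void main(String[] args) {
        String field = "Lorem ipsum, lorem dolor sit amet. LOREM again.";
        System.out.println(positions(field, "lorem")); // [0, 2, 6]
    }
}
```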


 with regards
  Karthik








Search Hit Score

2004-07-06 Thread Karthik N S

Hi
Dev Guys

Apologies.

I have 3 questions for you.

1)
  I have a situation here where I am supposed to group unique indexed
documents
  depending upon the number of hits per document.

  To briefly explain this:

  All documents with n hits for a search word would be grouped under
Category A,

 and all documents with n+1 hits for the same search word should be
grouped under Category B.

 Can Lucene provide some means internally to handle this situation?


2) What is this weight/boost factor available for the hits, and how can it
be used effectively?


3) Is there anything in the Lucene core that reveals the version number of
the currently used jar files,

   something like "java -version" on the command prompt displaying the
version?
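For question 3, one general Java approach is to read the Implementation-Version that a jar's manifest may declare, via java.lang.Package.getImplementationVersion(). This is not a Lucene-specific API. Whether a given Lucene jar's manifest sets this attribute is an assumption to check. When the attribute is absent the call returns null, so the sketch falls back to "unknown". With Lucene on the classpath you would pass a Lucene class such as org.apache.lucene.index.IndexReader.

```java
/** Reports the Implementation-Version declared in the manifest of the jar that holds a class. */
class JarVersion {
    static String versionOf(Class<?> cls) {
        Package pkg = cls.getPackage();                       // may be null for some class loaders
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null) ? "unknown" : version;       // attribute may be missing from the manifest
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name, e.g. org.apache.lucene.index.IndexReader
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + versionOf(Class.forName(name)));
    }
}
```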





with regards
Karthik




-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 06, 2004 4:22 PM
To: Lucene Users List
Subject: Re: Latest StopAnalyzer.java


On Jul 6, 2004, at 2:53 AM, Morus Walter wrote:
 Karthik N S writes:

 Can somebody tell me where I can find the latest copy of
 StopAnalyzer.java
 which can be used with Lucene1_4-final?
 I am not able to find it in the Lucene Sandbox.

 [My company prohibits me from using CVS.]

 There is no lucene 1.4 final but
 org.apache.lucene.analysis.StopAnalyzer
 is part of the lucene core.

Actually Doug did create Lucene 1.4 final:

http://jakarta.apache.org/lucene/docs/index.html

I'll try to squeeze in some time today to make it more official by
ensuring the binaries are mirrored and such.

Erik





Upgrade from Lucene 1.3 final to 1.4 problem

2004-07-06 Thread Karthik N S
Hey

Apologies

  Same with me too...

  The number of hits on a set of documents indexed using 1.3-final is not the same
on the 1.4-final version.
  [The only modification made to the source is that I upgraded my
CustomAnalyzer based on the StopAnalyzer available in 1.4.]
  Does doing this affect the performance?


  Somebody please explain.


with regards
Karthik

-Original Message-
From: Alex Aw Seat Kiong [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 9:50 AM
To: Lucene Users List
Subject: upgrade from Lucene 1.3 final to 1.4rc3 problem


Hi!

I'm currently using Lucene 1.3 final, and everything was working fine.
But after I upgraded from Lucene 1.3 final to 1.4rc3 (simply overwriting
the lucene-1.4-final.jar to lucene-1.4-rc3.jar and recompiling), the code
recompiles successfully, but when I try to index documents
it gives the error below:
java.lang.NullPointerException
at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:126)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
What is wrong? Please help.

Thanks.

Regards,
Alex








Using Highlighter in web Demo

2004-06-29 Thread Karthik N S

Hello Developers,

   I am NOT able to find the API [1.3-final or 1.4rc4] with the import
details for
QueryScorer scorer = new QueryScorer(query);


I was just curious to compile and execute the same...:)



Karthik







-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 29, 2004 5:52 PM
To: Lucene Users List
Subject: Re: Using Highlighter in web Demo


On Jun 28, 2004, at 5:18 PM, Hetan Shah wrote:
 Is it possible to use the highlighter successfully in the demos, the web
 demo to be specific? Has anyone out there tried it? If so, can they
 explain to me how to go about it? Any code sample is very much
 appreciated.

Straight from Lucene in Action:

import java.io.FileWriter;
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.highlight.Fragmenter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

public class HighlightIt {
  private static final String text =
      "Contrary to popular belief, Lorem Ipsum is " +
      "not simply random text. It has roots in a piece of " +
      "classical Latin literature from 45 BC, making it over " +
      "2000 years old. Richard McClintock, a Latin professor " +
      "at Hampden-Sydney College in Virginia, looked up one " +
      "of the more obscure Latin words, consectetur, from " +
      "a Lorem Ipsum passage, and going through the cites " +
      "of the word in classical literature, discovered the " +
      "undoubtable source. Lorem Ipsum comes from sections " +
      "1.10.32 and 1.10.33 of \"de Finibus Bonorum et " +
      "Malorum\" (The Extremes of Good and Evil) by Cicero, " +
      "written in 45 BC. This book is a treatise on the " +
      "theory of ethics, very popular during the " +
      "Renaissance. The first line of Lorem Ipsum, \"Lorem " +
      "ipsum dolor sit amet..\", comes from a line in " +
      "section 1.10.32.";  // from http://www.lipsum.com/

  public static void main(String[] args) throws IOException {
    // Check the argument count first: args[0] throws on an empty array
    if (args.length != 1) {
      System.err.println("Usage: HighlightIt <filename>");
      System.exit(-1);
    }
    String filename = args[0];

    //TermQuery query = new TermQuery(new Term("f", "ipsum"));
    PhraseQuery query = new PhraseQuery();
    query.add(new Term("f", "lorem"));
    query.add(new Term("f", "ipsum"));
    QueryScorer scorer = new QueryScorer(query);
    SimpleHTMLFormatter formatter =
        new SimpleHTMLFormatter("<span class=\"highlight\">",
            "</span>");
    Highlighter highlighter = new Highlighter(formatter, scorer);
    Fragmenter fragmenter = new SimpleFragmenter(50);
    highlighter.setTextFragmenter(fragmenter);

    // Tokenize the text the same way the query terms were produced
    TokenStream tokenStream = new StandardAnalyzer()
        .tokenStream("f", new StringReader(text));

    // Up to 5 best fragments, joined with "..."
    String result =
        highlighter.getBestFragments(tokenStream, text, 5, "...");

    FileWriter writer = new FileWriter(filename);
    writer.write("<html>");
    writer.write("<style>\n" +
        ".highlight {\n" +
        " background: yellow;\n" +
        "}\n" +
        "</style>");
    writer.write("<body>");
    writer.write(result);
    writer.write("</body></html>");
    writer.close();
  }
}

I just added the PhraseQuery in there instead of the TermQuery that is
commented out.  Highlighter works well with phrases too (although it
highlights each term individually, not the breadth of the phrase
itself).  The above code runs as its usage statement says: give
it a filename, and it saves an HTML file that shows the terms highlighted in
yellow.
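Erik notes that each query term is highlighted individually rather than the phrase as a whole. The plain-Java sketch below illustrates that behavior. It does not use the Highlighter package, and its tokenizer rule and highlight helper are assumptions. It wraps every letter run that matches a query term in the same span tags the example uses, so "ipsum" is marked even where it is not part of the phrase "Lorem ipsum".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TermHighlightSketch {
    /** Wraps each letter-only token whose lowercase form is in terms with a highlight span. */
    static String highlight(String text, Set<String> terms) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (Character.isLetter(text.charAt(i))) {
                int start = i;
                while (i < text.length() && Character.isLetter(text.charAt(i))) i++;
                String token = text.substring(start, i);
                if (terms.contains(token.toLowerCase())) {
                    out.append("<span class=\"highlight\">").append(token).append("</span>");
                } else {
                    out.append(token);
                }
            } else {
                out.append(text.charAt(i++));         // copy punctuation and whitespace unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> phraseTerms = new HashSet<>(Arrays.asList("lorem", "ipsum"));
        // Each term gets its own span, even where the two form the phrase "Lorem ipsum".
        System.out.println(highlight("Lorem ipsum dolor, ipsum.", phraseTerms));
    }
}
```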

Erik





RE: Using Highlighter in web Demo

2004-06-29 Thread Karthik N S
Oh! 

So silly of me


Apologies Please 

Thx for the same


-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 29, 2004 7:27 PM
To: Lucene Users List
Subject: RE: Using Highlighter in web Demo


It sounds like you don't have the Highlighter Jar in your CLASSPATH at
compile-time.

1. Get Highlighter from CVS
2. Build a Jar using Ant and Highlighter's build.xml
3. Add the resulting Jar to CLASSPATH
4. Compile your code (the one that imports QueryScorer)

That should work.
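Related to the CLASSPATH diagnosis above, Class.forName offers a quick runtime check that a class such as org.apache.lucene.search.highlight.QueryScorer can be loaded. This only checks the runtime classpath. The compile-time classpath passed to javac is separate. The helper name isLoadable is hypothetical.

```java
class ClasspathCheck {
    /** True if the named class can be found by this class's loader (without initializing it). */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;                               // missing jar, or a dependency of it is missing
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.lucene.search.highlight.QueryScorer";
        System.out.println(name + (isLoadable(name) ? " found" : " NOT on the classpath"));
    }
}
```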

Otis


--- Karthik N S [EMAIL PROTECTED] wrote:
 Hi
 Otis
 
 I am not able to compile the code because the import statement for
 the line below is not available in the API:
 QueryScorer scorer = new QueryScorer(query);
 
 
 
 with regards
 Karthik
 
 
 -Original Message-
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, June 29, 2004 6:55 PM
 To: Lucene Users List
 Subject: Re: Using Highlighter in web Demo
 
 
 Karthik, I don't understand your question.
 Sorting was only added in 1.4* versions, if I recall correctly.
 There was no sorting in 1.3.
 
 Otis
 
 

Delete Indexed from Merged Document

2004-06-23 Thread Karthik N S
Guys

   Has somebody out there tried DELETING/UPDATING INDEXED files in a
MERGED index?
  If so, please explain how to do this.


with regards
Karthik




-Original Message-
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 23, 2004 9:24 AM
To: Lucene Users List
Subject: RE: Delete Indexed from Merged Document


Hi

   Otis

   The link you have specified shows how to update an indexed file [deleting
the old one and then adding the new one].

  But my question, to be more specific, is:

  When we have MERGED more than 2 indexed files [using
writer.addIndexes(luceneDirs)], how do we
   delete one of the indexed files from the MERGED index in
order to insert a new, updated one?

  Please include a sample code snippet in this regard.


with regards
Karthik

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 22, 2004 12:52 PM
To: Lucene Users List
Subject: Re: Delete Indexed from Merged Document


Hello Karthik,

Here is the answer: http://www.jguru.com/faq/view.jsp?EID=492423

Otis

--- Karthik N S [EMAIL PROTECTED] wrote:


   Dev Guys

   Apologies Please

 How do I DELETE an indexed document from a MERGED index file?

Can somebody write me some code snippets on this... please

 With Regards
 Karthik








RE: Delete Indexed from Merged Document

2004-06-23 Thread Karthik N S

Hi
Mr. Wolf, what is this:

// remove the document from index
int docID = hits.id(0);

 and can I increment the 0 in the brackets ...for deletion?
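On incrementing the 0: in Lucene 1.4, Hits.id(n) is valid for 0 <= n < hits.length(), so deleting every match means visiting each index. A cautious pattern is to collect all ids first and delete afterwards, since Hits may fetch results lazily. IndexReader.delete(Term), which removes every document containing the term, is another 1.4 option. The sketch below uses a hypothetical HitList stand-in so that it runs without Lucene.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two Hits methods used here. */
interface HitList {
    int length();
    int id(int n);   // internal document number of the n-th hit, 0 <= n < length()
}

class DeleteAllHits {
    /** Copies every hit's document number before anything is deleted. */
    static List<Integer> collectIds(HitList hits) {
        List<Integer> ids = new ArrayList<>();
        for (int n = 0; n < hits.length(); n++) {
            ids.add(hits.id(n));
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] docNumbers = {4, 11, 17};
        HitList hits = new HitList() {
            public int length() { return docNumbers.length; }
            public int id(int n) { return docNumbers[n]; }
        };
        // With Lucene, each collected id would then be passed to indexReader.delete(id).
        for (int id : collectIds(hits)) {
            System.out.println("would delete doc " + id);
        }
    }
}
```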


Thx in advance

Karthik

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 23, 2004 5:33 PM
To: [EMAIL PROTECTED]
Subject: AW: Delete Indexed from Merged Document


Hello,
 Karthik N S [mailto:[EMAIL PROTECTED]

Has somebody out there tried DELETING/UPDATING INDEXED files in a
 MERGED index?
   If so, please explain how to do this.
Of course you can delete or update a document in a merged index.
It works the same way as for all other indexes. You need a
unique key (e.g. the file name or URI), indexed for searching,
to find the right document, because the internal document numbers
change after merging indexes, or after deleting documents and
optimizing an index. Using this key you can search for the document
and remove it. It doesn't matter whether your index was created by
merging several indexes or not.
Example:
/* Create index: */
Document document = new Document();
// the "filename" value must be unique for each document!
document.add(Field.Keyword("filename", file_name));
document.add(Field.Text("content", file_content));
writer.addDocument(document);
/* ... */
writer.close();

/* Update or remove document: use the file name to find the original
   document and remove it from the index */
FSDirectory indexDirectory = FSDirectory.getDirectory(indexPath, false);
IndexReader indexReader = IndexReader.open(indexDirectory);
IndexSearcher indexSearcher = new IndexSearcher(indexReader);
// create query and search for document using its filename
TermQuery query = new TermQuery(new Term("filename", file_name));
Hits hits = indexSearcher.search(query);
if (hits.length() > 0) {
  // remove the document from index
  int docID = hits.id(0);
  indexReader.delete(docID);
}
// else: this is a new file or already removed, so we can simply add it.
indexSearcher.close();
indexReader.close();
indexDirectory.close();
// now open an IndexWriter for the same index and add the updated file
// as new document
/* done */
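Wolf-Dietrich's point can be seen in a toy in-memory model: internal document numbers change after merging, deleting, or optimizing, while a stored unique key does not. The ToyIndex class below is hypothetical and is not Lucene's index format. It numbers documents by list position, so compacting after a delete renumbers later documents, yet a lookup by the "filename" key still finds the right one. Its update method follows the same delete-then-add pattern as the example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy index: document numbers are list positions, as after a merge or optimize. */
class ToyIndex {
    private final List<String> filenames = new ArrayList<>();   // unique key per document

    void add(String filename) { filenames.add(filename); }

    /** Current internal number of the document with this key, or -1 if absent. */
    int docNumber(String filename) { return filenames.indexOf(filename); }

    /** Deletes by unique key; remaining documents are renumbered (compacted). */
    boolean deleteByKey(String filename) { return filenames.remove(filename); }

    /** Update = delete the old version by key, then add the new one. */
    void update(String filename) {
        deleteByKey(filename);
        add(filename);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("a.txt");
        index.add("b.txt");
        index.add("c.txt");
        System.out.println(index.docNumber("c.txt")); // 2
        index.deleteByKey("b.txt");
        System.out.println(index.docNumber("c.txt")); // 1: same document, new number
    }
}
```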
Hope it helps. Regards,
Wolf-Dietrich Materna



