How to close the wrapped directory implementation

2010-09-17 Thread Pulkit Singhal
With RAMDirectory we have the option of providing another Directory
implementation such as FSDirectory that can be wrapped and loaded into
memory:

Directory directory = new RAMDirectory(
    FSDirectory.open(new File(fileDirectoryName)));

But after building the index, if I close the IndexWriter, the data
is still available for searches through the Directory bean, but nothing
ever gets written to the disk!

Is this a bug? Is there a workaround?

Or is this by design? Is the RAMDirectory constructor only meant to
read in data from the passed in argument? Or is it supposed to keep it
around and update it when closing?
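
For reference, the workaround I would hope for looks roughly like this
(a sketch only, assuming Lucene 3.x, where the static
Directory.copy(src, dest, closeDirSrc) is available):

import java.io.File;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

Directory fsDir = FSDirectory.open(new File(fileDirectoryName));
Directory ramDir = new RAMDirectory(fsDir); // load the disk index into RAM

// ... build the index against ramDir, then close the IndexWriter ...

// nothing has been persisted so far; explicitly copy the RAM index to disk
Directory.copy(ramDir, fsDir, true); // true also closes ramDir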

Please write back, Thanks!
- Pulkit




Checksum and transactional safety for lucene indexes

2010-09-20 Thread Pulkit Singhal
Hello Everyone,

What happens if:
a) a Lucene index gets written halfway to the disk and then something goes wrong?
b) the index gets corrupted on the file system?

When we open that directory location again using FSDirectory implementations:
a) Is there any provision for the code to clean out the previous file
and start a new index file because the older one was corrupted and
didn't match the checksum?
b) Or can we check that the # of documents that can be found in the
underlying index is now ZERO because they can't be parsed properly?
How can we do this?

- Pulkit




How to count entries in an index file?

2010-09-24 Thread Pulkit Singhal
Hello Everyone,

I want to load the indexed data from the file system using FSDirectory.
But I also want to be sure whether something was actually loaded or
whether a new, empty directory was created and returned to me.
How can I count the # of entries in the Directory object returned to me?

Thanks!
- Pulkit




Re: How to count entries in an index file?

2010-09-24 Thread Pulkit Singhal
Is using IndexReader.numDocs() on the Directory instance the only way
to count the indexed entries?
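
For context, this is roughly what I have now (a sketch; the path and
read-only flag are just illustrative):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

Directory dir = FSDirectory.open(new File(fileDirectoryName));
IndexReader reader = IndexReader.open(dir, true); // read-only
try {
    System.out.println("numDocs = " + reader.numDocs()); // live docs
    System.out.println("maxDoc  = " + reader.maxDoc());  // includes deleted
} finally {
    reader.close();
}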





Re: Checksum and transactional safety for lucene indexes

2010-09-24 Thread Pulkit Singhal
In order to determine the integrity of an index, I found that the
easiest way was to call IndexReader.open(directory) and, if there were
any problems with the data, catch the exceptions and create a new
index.
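
A minimal sketch of that approach (Lucene 3.x; the recovery step and the
analyzer are just illustrative):

import java.io.IOException;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

IndexReader reader = null;
try {
    reader = IndexReader.open(directory, true);
    // opened cleanly; safe to search
} catch (CorruptIndexException e) {
    // damaged index: recreate it from scratch
    new IndexWriter(directory, analyzer,
        true /* create */, IndexWriter.MaxFieldLength.UNLIMITED).close();
} catch (IOException e) {
    // no index at this location (or it is unreadable)
}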

I also see that the API offers IndexReader.indexExists() ... would
that be a better alternative?
Or would I miss out on being able to check whether the index is corrupt
if I limit myself to that invocation only?

Thanks!

- Pulkit

On Tue, Sep 21, 2010 at 12:53 AM, Lance Norskog  wrote:
> If an index file is not completely written to disk, it never becomes
> available. Lucene has a file describing the current active index segments.
> It writes all new files to the disk, and changes the description file
> (segments.gen) only after that.
>
> If the index files are corrupted, all bets are off. Usually the data
> structures are damaged and Lucene throws CorruptIndexExceptions, NPE or
> array out-of-bounds exceptions. There is no checksumming of the index files.
>
> Lance




Re: Use of Lucene to store data from RSS feeds

2010-10-15 Thread Pulkit Singhal
When you ask:
a) will each feed form a Lucene document, or
b) will each database row form a Lucene document
I'm inclined to say that it really depends on what type of aggregation
tool or logic you are using.

I don't know if "Tika" does it but if there is a tool out there that
can be pointed to a feed and tweaked to spit out documents with each
field having the settings that you want then you can go with that
approach. But if you are already parsing the feed and storing the raw
data into a database table then there is no reason that you can't
leverage that. From a database-row perspective, you have already done a
good deal of work to collect the data and break it down into chunks
that Lucene can happily index as separate fields in a document.

By the way, I think there are tools that read from the database
directly too, but I won't try to make things too complicated.

The way I see it, if you were to use the row at this moment and index
the 4 columns as fields ... plus set the feed body to be
ANALYZED (why don't I see the feed body in your database table?) ...
then Lucene range queries on the date/time field could possibly return
some results. I am not sure how to get keyword frequencies, but if the
analyzed tokens that Lucene keeps in its index sort of represent
the keywords that you are talking about, then I do know that Lucene's
inverted index tracks, per token, how many occurrences of it there
are ... maybe someone else on the list can comment on how to extract
that info in a query.

Sounds doable.
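
For illustration, indexing one such row might look like this (a sketch;
the variable names are made up, the field names are taken from your
table, and the analyzer choice is assumed):

import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
doc.add(new Field("feed_url", feedUrl,
    Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("title_element_text", titleText,
    Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("description_element_text", descriptionText,
    Field.Store.YES, Field.Index.ANALYZED));
// store the timestamp in a range-query-friendly form
doc.add(new Field("polling_date_time",
    DateTools.dateToString(pollingDateTime, DateTools.Resolution.MINUTE),
    Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.addDocument(doc);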

On Thu, Oct 14, 2010 at 10:17 AM,   wrote:
> Hello
>
> I would like to store data retrieved hourly from RSS feeds in a database or 
> in Lucene so that the text can be easily
> indexed for word frequencies.
>
> I need to get the text from the title and description elements of RSS items.
>
> Ideally, for each hourly retrieval from a given feed, I would add a row to a 
> table in a dataset made up of the
> following columns:
>
> feed_url, title_element_text, description_element_text, polling_date_time
>
> From this, I can look up any element in a feed and calculate keyword 
> frequencies based upon the length of time required.
>
> This can be done as a database table and hashmaps used to calculate word 
> frequencies. But can I do this in Lucene to
> this degree of granularity at all? If so, would each feed form a Lucene 
> document or would each 'row' from the
> database table form one?
>
> Can anyone advise?
>
> Thanks
>
> Martin O'Shea.
> --
>
>
>
>
>




Re: Lucene index update

2010-10-27 Thread Pulkit Singhal
Looks interesting. What is the merit in having a second index in order to
keep the document id the same? Perhaps I have misunderstood; I just want to
understand your motivation here.

On Wed, Oct 20, 2010 at 2:57 PM, Nilesh Vijaywargiay  wrote:

> I've written a blog regarding a work around for updating index in Lucene
> using parallel reader. It's explained with results and pictures.
>
> It would be great if you have a look at it. The link:
> http://the10minutes.blogspot.com/2010/10/lucene-index-update.html
>
> Thanks
> Nilesh
>


Re: Lucene index update

2010-10-27 Thread Pulkit Singhal
But why do you feel the need to have a parallel reader that combines result
sets across two indices based on docId?

On Thu, Oct 28, 2010 at 12:17 AM, Nilesh Vijaywargiay <
nilesh.vi...@gmail.com> wrote:

> Pulkit,
> Parallel reader takes the union of all fields for a given id. Thus if I
> want
> to add a field or modify a field of a document which has id 2 in index1, I
> need to create a document with id 2 in index2 with the fields I want to
> add/modify. Thus parallel reader would treat them as fields of a single
> document.
> Now if I give doc.getFields() for that document then it would list fields
> from index1 and index2.


Re: Implementing indexing of Versioned Document Collections

2010-11-10 Thread Pulkit Singhal
1) You can attach a byte-array "Payload" to every occurrence of a term
during indexing. It is stored at each term position and can be retrieved
during searching. You may want to consider taking this approach rather
than writing bitvectors to a text file. If you feel that I should have
read your thesis summary more closely and don't know what the heck I'm
talking about ... then I politely yield :)
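
To make (1) concrete, here is a minimal sketch of a payload-attaching
TokenFilter (Lucene 3.x API; computing the version bitvector for the
current document is assumed to happen elsewhere):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.index.Payload;

// Hypothetical filter: stores the same bitvector as a payload at every
// term position of the document currently being indexed.
public final class VersionBitvectorFilter extends TokenFilter {
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
    private final byte[] versionBits; // assumed computed per document elsewhere

    public VersionBitvectorFilter(TokenStream input, byte[] versionBits) {
        super(input);
        this.versionBits = versionBits;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        payloadAtt.setPayload(new Payload(versionBits));
        return true;
    }
}

At search time the payload bytes come back through
TermPositions.getPayload().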

2) Can I add a field after I already started writing the document through
IndexWriter?
Yes, you can. You can add new documents that have more fields than the ones
you added in the past. You can also "update" (internally it does a delete
then add) your document to have more fields at a later point in time.
Although in your use-case I simply didn't see why you would need to go back
and add more fields.

3) How would I do this?
a) Well, if you are simply adding documents, keep adding new ones with more
fields and Lucene won't complain. If you use stored fields then you may
notice inconsistencies when you pull documents in your search results,
because some documents will have additional stored fields while others will
not ... I am not sure how you would want to handle that.
b) If you are updating, then my personal approach would be to fetch the
document via a search that is unique enough to just get you that one doc
back. In order to do this, people usually make sure to store a unique id
when indexing the document. Then, using that document, create a new one with
all the values of the old one plus the new fields you want to add. Delete
the old doc. Add the new doc.
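
A rough sketch of (b), assuming a stored unique "uid" field (Lucene 3.x;
IndexWriter.updateDocument does the delete-then-add for you, and the
field names are made up):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

// fetch the one existing doc by its unique id
TopDocs hits = searcher.search(new TermQuery(new Term("uid", uid)), 1);
Document oldDoc = searcher.doc(hits.scoreDocs[0].doc);

// rebuild it: old (stored) values plus the new field
Document newDoc = new Document();
for (Fieldable f : oldDoc.getFields()) {
    newDoc.add(f);
}
newDoc.add(new Field("extra", value, Field.Store.YES, Field.Index.ANALYZED));

// delete-then-add in one call, keyed on the unique id
writer.updateDocument(new Term("uid", uid), newDoc);

Note that only stored fields survive the round-trip, which is exactly the
inconsistency caveat from (a).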

Good luck!

- Pulkit

On Tue, Nov 9, 2010 at 5:30 PM, Alex vB  wrote:

>
> Hello everybody,
>
> I would like to implement the paper "Compact Full-Text Indexing of
> Versioned
> Document Collections" [1] from Torsten Suel for my diploma thesis in
> Lucene.
> The basic idea is to create a two-level index structure. On the first level
> a document is identified by document ID with a posting list entry if the
> term exists at least in one version. For every posting on the first level
> with term t we have a bitvector on the second one. These bitvectors contain
> as many bits as there are versions for one document, and bit i is set to 1
> if version i contains term t or otherwise it remains 0.
>
> http://lucene.472066.n3.nabble.com/file/n1872701/Unbenannt_1.jpg
>
> This little picture is just for demonstration purposes. It shows a posting
> list for the term car and is composed of 4 document IDs. If a hit is found
> in document 6 another look-up is needed on the second level to get the
> corresponding versions (version 1, 5, 7, 8, 9, 10 from 10 versions at all).
>
> At the moment I am using wikipedia (simplewiki dump) as source with a
> SAXParser and can resolve each document with all its versions from the XML
> file (Fields are Title, ID, Content(seperated for each version)). My
> problem
> is that I am unsure how to connect the second level with the first one and
> how to store it. The key points that are needed:
> - Information from posting list creation to create the bitvector (term ->
> doc -> versions)
> - Storing the bitvectors
> - Implementing search on second level
>
> For the first steps I disabled term frequencies and positions because the
> paper isn't handling them. I would be happy to get any running version at
> all. :)
> At the moment I can create bitvectors for the documents. I realized this
> with a HashMap in TermsHashPerField where I grab the
> current
> term in add() (I hope this is the correct location for retrieving the
> inverted list's terms). Anyway I can create the correct bitvectors and write
> them into a text file.
> Excerpt of bitvectors from the article "April":
> april :     110110111011
> never :     0010
> ayriway :   010110111011
> inclusive : 1000
>
> Next step would be storing all bitvecors in the index. At first glance I
> like to use an extra field to store the created bitvectors permanent in the
> index. It seems to be the easiest way for a first implementation without
> accessing the low level functions of Lucene. Can I add a field after I
> already started writing the document through IndexWriter? How would I do
> this? Or are there any other suggestions for storing? Another idea is to
> expand the index format of Lucene but this seems a little bit too difficult
> for me. Maybe I could write this information into my own file. Could
> anybody po

IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
Hello,

1) On Windows, I often shut down my application server (which has active
IndexWriters open) using the ctrl+c keys.
2) I inspect my directories on the file system I see that the write.lock
file is still there.
3) I start the app server again, and do some operations that would require
IndexWriters to write to the same directories again and it works!

I don't understand why I do not run into any exceptions.
I mean, there is already a lock file present, which should prevent the
IndexWriters from getting access to the directories ... no?
I should be happy, but I'm not, because other folks do get exceptions
when they bounce their servers, and I'm unable to reproduce the problem
so I can't help them.

Any clues? Anyone?

Thank You,
- Pulkit


Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
I do not actually take the trouble to specify what Lock Factory to use,
hmmm.

Are you suggesting that because I'm using FSDirectory.open() in my code, I
get a locking scheme that works ... while on other machines, other folks
get one that runs into issues and throws
java.nio.channels.OverlappingFileLockException?
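
For what it's worth, pinning the lock factory explicitly would look
roughly like this (a sketch; I believe FSDirectory.open also accepts a
LockFactory in 3.x, and the path variable is made up):

import java.io.File;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NativeFSLockFactory;

FSDirectory dir = FSDirectory.open(
    new File(fileDirectoryName), new NativeFSLockFactory());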

- Pulkit

On Wed, Nov 10, 2010 at 11:21 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Likely you are using NativeFSLockFactory?
>
> In which case, a leftover lock file does not mean the index is in fact
> locked, since the OS will [correctly] release the lock on process
> exit.
>
> Mike


Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
Thanks Uwe, that helps explain why the lock file is still there.

The last piece of the puzzle is why someone may see exceptions such as the
following from time to time:

java.nio.channels.OverlappingFileLockException
    at sun.nio.ch.FileChannelImpl$SharedFileLockTable.checkList(FileChannelImpl.java:1176)
    at sun.nio.ch.FileChannelImpl$SharedFileLockTable.add(FileChannelImpl.java:1078)
    at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:878)
    at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
    at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:236)
    at org.apache.lucene.store.Lock.obtain(Lock.java:72)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1041)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:864)

I suppose this means that the OS itself hasn't released the lock even after
I shut down my application server and restarted it.
Am I right?

Or is there something else that can possibly be the culprit (in anyone's
experience) that I can investigate?

- Pulkit

On Wed, Nov 10, 2010 at 12:57 PM, Uwe Schindler  wrote:

> This is because Lucene uses Native Filesystem Locks. The lock file itself
> is just a placeholder which is not cleaned up on Ctrl-C. The lock is not the
> file itself, it's *on* the file.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de


Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
You know that really confuses me. I've heard that stated a few times and
every time I just felt that it couldn't possibly be right. Maybe it was
meant in some very specific manner, because otherwise wouldn't all Windows
OSs be off-limits to Lucene?

On Wed, Nov 10, 2010 at 2:40 PM, Uwe Schindler  wrote:

> Are you using NFS as the filesystem? NFS is incompatible with Lucene :-)
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de


Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
Ah exactly the kind of wake-up call that I was looking for! Thank You :)

On Wed, Nov 10, 2010 at 3:01 PM, Steven A Rowe  wrote:

> NFS[1] != NTFS[2]
>
> [1] NFS: <http://en.wikipedia.org/wiki/Network_File_System_%28protocol%29>
> [2] NTFS: <http://en.wikipedia.org/wiki/NTFS>


Re: Delete Document from Index. How?

2010-11-12 Thread Pulkit Singhal
I looked at the 2.2 API and those methods should be there, so the
NoSuchMethodException makes no sense.
Are you absolutely sure that your integration between PHP & Java is set up
properly and that you really are using 2.2?
Could there be multiple versions of the Lucene jars in your classpath, such
that older ones might be getting in your way and you wouldn't really be
accessing the 2.2 jars?

On Thu, Nov 11, 2010 at 1:54 AM, dian puma  wrote:

> Hi All,
>
> I'm struggling with Lucene on deleting a specific document from the index.
> I've read the book Lucene in Action to see how to do it.
> There are 2 ways to delete documents from index, using
> IndexWriter.deleteDocuments(term) OR IndexReader.deleteDocuments.
> CMIIW
>
> FYI, I use PHP/Java Bridge and Lucene 2.2,
> But my code below didn't work, generating error:
>
> Exception occured: [[o:Exception]:"java.lang.Exception: Invoke failed:
> [[o:IndexWriter]]->deleteDocuments([o:Term]). Cause:
> java.lang.NoSuchMethodException: deleteDocuments([o:Term]).
> Candidates: [] Responsible VM: 1.6.0...@http://java.sun.com/"; at: #-5
> php.java.bridge.JavaBridge.Invoke(JavaBridge.java:1045) #-4
> php.java.bridge.Request.handleRequest(Request.java:342) #-3
> php.java.bridge.Request.handleRequests(Request.java:388) #0
> Java.inc(161): java_ThrowExceptionProxyFactory->getProxy(4, true) #1
> Java.inc(314): java_Arg->getResult(true) #2 Java.inc(320):
> java_Client->getWrappedResult(true) #3 Java.inc(499):
> java_Client->getResult() #4 Java.inc(743):
> java_Client->invokeMethod(2, 'deleteDocuments', Array) #5
> Java.inc(861): java_JavaProxy->__call('deleteDocuments', Array) #6
> [internal function]: Java->__call('deleteDocuments', Array)
>
> = snippet code with IndexWriter =
> $directory = dirname(__FILE__)."/../indexLucene/";
> $path = getcwd()."/txtfiles/testing.txt";
>
> if (strlen($path) > 0) {
>   //delete
>   echo "Delete [".$path."]";
>   $analyzer_idx = new
> Java('org.apache.lucene.analysis.standard.StandardAnalyzer');
>   $writer_idx = new
> java("org.apache.lucene.index.IndexWriter",$directory, $analyzer_idx,
> false);
>   echo java_values($writer_idx->docCount());
>   $term = new Java('org.apache.lucene.index.Term','pathfile',$path);
>   $writer_idx->deleteDocuments($term);
>   $writer_idx->close();
> }//end if
>
>
> Then, I tried to use IndexReader, instead.
> But, even worst. It failed to echo the numDocs or just to close the reader.
>
> error message:
> Exception occured: [[o:Exception]:"java.lang.Exception: Invoke failed:
> [[c:IndexReader]]->numDocs. Cause: java.lang.NoSuchMethodException:
> numDocs()
>
> === snippet code with IndexReader 
> $directory = dirname(__FILE__)."/../indexLucene/";
> $path = getcwd()."/txtfiles/testing.txt";
> $reader = new Java('org.apache.lucene.index.IndexReader');
> $reader->open($directory);
> //echo java_values($reader->numDocs());
> $reader->close();
>
> Hopefully someone would help me for this.
> Thanks in advance
> --
> Dian Puma
>
>
>


KeywordAnalyzer and Boosting

2010-11-17 Thread Pulkit Singhal
Greetings!

When using KeywordAnalyzer to index a field which has the
Field.Index.ANALYZED option selected:

Does the use of KeywordAnalyzer automatically mean that there is no
point in trying to set the index-time boosts on that field in the
document because it will be treated as a full token but without any
NORMS?

Or does the fact that I used ANALYZED instead of
ANALYZED_NO_NORMS make some difference, so that I can get the index-time
boost information stored the way I want,
even though I'm using a KeywordAnalyzer ...?

Thanks for reading through my confusing question :)

- Pulkit




Re: KeywordAnalyzer and Boosting

2010-11-17 Thread Pulkit Singhal
Based on my experimentation and what it says in the Lucene 2nd edition book:
"Using a KeywordAnalyzer on special fields during indexing would
eliminate the use of Index.NOT_ANALYZED_NO_NORMS during indexing and
replace it with Index.ANALYZED."

I guess that there is no way to use KeywordAnalyzer during indexing
and get NORMS.

So much for being elegant; if someone has a way to make it happen,
please let me know.

Thanks.





Re: KeywordAnalyzer and Boosting

2010-11-18 Thread Pulkit Singhal
Thanks Ian,

Yup that would do the trick for me, it seems.

Also, I would like to say that the following also worked; I only
realized it after I went through the scores coming from my results
step by step:

KeywordAnalyzer + Index.ANALYZED (index-time norms were present)
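
For reference, a sketch of the two variants (Lucene 3.x; the field name
and value are made up):

import org.apache.lucene.document.Field;

// Option A: KeywordAnalyzer at the writer + ANALYZED on the field --
// this is the combination where I saw index-time norms preserved
Field f1 = new Field("sku", value, Field.Store.YES, Field.Index.ANALYZED);
f1.setBoost(2.0f); // index-time boost, folded into the norm

// Option B (Ian's suggestion): force norms on explicitly,
// independent of which Index option is used
Field f2 = new Field("sku", value, Field.Store.YES, Field.Index.NOT_ANALYZED);
f2.setOmitNorms(false);
f2.setBoost(2.0f);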

Cheers!

On Thu, Nov 18, 2010 at 4:10 AM, Ian Lea  wrote:
> Have you tried explicitly setting norms on/off the way you want with
> Field.setOmitNorms(boolean)?
>
>
> --
> Ian.




How to combine QueryParser and Wildcard search

2010-11-18 Thread Pulkit Singhal
Hello,

I was wondering if there is any API call in Lucene that allows
something like the following:

Step 1: Take the user input
"hello world" you are beautiful

Step 2: QueryParser does its thing:
defaultField:"hello world" defaultField:you defaultField:are
defaultField:beautiful

Step 3: And somehow the desired transformation happens next:
defaultField:"hello world" defaultField:you* defaultField:are*
defaultField:beautiful*

How can I tell QueryParser to throw in the wildcard "*" expression
where applicable?

You can see that I'm not expecting it to use "*" with anything that
will turn into a phrase query:
"hello world" becomes defaultField:"hello world"
Just the rest of the tokens:
defaultField:you* defaultField:are* defaultField:beautiful*
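
In case it helps to see it spelled out, here is the kind of
post-processing I could imagine hand-rolling (a sketch; as far as I know
there is no built-in QueryParser hook for this, and it assumes the parse
produced a BooleanQuery):

import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

Query parsed = new QueryParser(Version.LUCENE_30, "defaultField", analyzer)
    .parse("\"hello world\" you are beautiful");
BooleanQuery rewritten = new BooleanQuery();
for (BooleanClause clause : ((BooleanQuery) parsed).clauses()) {
    Query q = clause.getQuery();
    if (q instanceof TermQuery) {
        // bare term -> prefix ("*"-style) query
        rewritten.add(new PrefixQuery(((TermQuery) q).getTerm()),
            clause.getOccur());
    } else {
        rewritten.add(q, clause.getOccur()); // phrase queries pass through
    }
}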

Thanks!




Re: uncorrect results

2010-11-18 Thread Pulkit Singhal
Wow, you live in a really great country and attend an awesome
university where they have classes like "Text Analytics". I'm gonna
send my kid there to study :)

In all seriousness I think the problem may be with how you are
collecting your results.

I find this very amusing:
> 80. 896889 phrase occurs 0 times

How can it claim the phrase occurs zero times and still return that
document as a result? Weird.

Have you tried removing all other docs, leaving only the one problem
child in there, indexing just that, and seeing what comes back?

On Wed, Nov 17, 2010 at 1:19 PM, Jan  wrote:
> thats what i figured...i can't find out what i'm doing wrong though ;)
>
> so the query is "experiment" (i know not really a phrase...but the
> assignment requested precisely so). The program constructs the following
> query
>
> +(AbstractText:"experiment" ArticleTitle:"experiment")
>
> which looks good to me. the results look like this:
>
> Found 95 hits.
> 1. 19810 phrase occurs 3 times
> 2. 587340 phrase occurs once
> ...
> 80. 896889 phrase occurs 0 times
> ...
> 95. 900325 phrase occurs once
>
> so here is the document 896889
> PMID
>        896889
> ArticleTitle
>        Estrogen-induced sexual receptivity and localization of 3H-estradiol in
> brains of female mice: effects of 5 alpha-reduced androgens, progestins
> and cyproterone acetate.
> AbstractText
>        Sexual receptivity induced in ovariectomized CD-1 mice with chronic
> daily administration of estradiol benzoate (E2 B) was blocked by
> concurrent administration of the 5 alpha-reduced androgen,
> dihydrotestosterone (DHT). Receptivity was restored in these females
> with progesterone-, but not with dihydroprogesterone-priming 6 hr prior
> to testing. Delaying the DHT injections until 12 hr after the E2 B
> injections greatly reduced its inhibitory properties. Receptivity in E2
> B-primed females was also blocked by concurrent treatment with
> cyproterone acetate and 3 alpha-, but not 3 beta-adrostanediol.
> Pretreatment with DHT, or 3 alpha- or 3 beta-androstanediol failed to
> consistently affects 3H-estradiol accumulation in crude nuclear and
> supernatant fractions from brain and pituitary
>
> so apart from doing something wrong while indexing/analyzing (the text
> above is from the xml, but i double checked...it is put in teh index
> with these textfragments) or so, the token "experiment" does not even
> occur. thats what baffles me.
>
> thanks for the very quick reaction
> jan
>
> Am Mittwoch, den 17.11.2010, 12:57 -0500 schrieb Donna L Gresh:
>> As it is probably more likely that you're doing something incorrect than
>> that Lucene is reporting incorrect results :), it might help if you
>> reported the exact query that is being submitted to the IndexSearcher, and
>> then showing us the document that was incorrectly returned. My guess is
>> that either looking at the query itself will immediately reveal the
>> problem to you, or that the query in combination with the document and
>> knowledge of which analyzers you are using will reveal the problem-
>>
>> Donna
>>
>>
>> Jan  wrote on 11/17/2010 11:47:49 AM:
>>
>> > [image removed]
>> >
>> > uncorrect results
>> >
>> > Jan
>> >
>> > to:
>> >
>> > java-user
>> >
>> > 11/17/2010 11:51 AM
>> >
>> > Please respond to java-user
>> >
>> > Hi,
>> > i have an assignment in my Text Analytics class. I am supposed to create
>> > an index and search it. The corpus is a PubMed-like XML file. it is
>> > possible to query terms (programcall a few terms) and phrases
>> > (programcall "a phrase").
>> > When a phrase is queried the program should answer how often the phrase
>> > occured.
>> > The problem is, on certain queries the IndexSearcher returns some
>> > documents that do not have that particular query in its fields.
>> > I'd be delighted if someone could tell me what i am doing wrong.
>> > See the source code at my github repo
>> >
>> https://github.com/jangingnicht/TextAnalytics2/tree/master/src/textanalytics2/
>>
>> >
>> > Thanks in advance
>> > jan
>> >
>> > PS: I use Lucene 3.0.2 and the OpenJDK Runtime Environment (IcedTea6
>> > 1.8.2) on an 64 bit Linux machine.
>> > [attachment "signature.asc" deleted by Donna L Gresh/Watson/IBM]
>
>




Re: uncorrect results

2010-11-18 Thread Pulkit Singhal
I briefly looked at your code, and there is no way that I'm right about
this, but I'll say it anyway:
every single field you index doesn't have any NORMS, so how will the
scoring happen?
It probably happens based on the matches at query time, but it's not
like you are specifying any boosts in your query.
Lucene has a complex scoring formula that I don't claim to fully
understand ... but what if somehow (stay with me, don't shoot the
messenger), due to the fact that you have no NORMS at all, the results
being collected somehow give a score to a document that doesn't have
a match at all and therefore present it in the results?

Just a theory (a bad one perhaps) ... but one which can be easily
blown away by using ANALYZED in your indexer and then trying again.

- Pulkit


Dismax in Lucene

2010-11-20 Thread Pulkit Singhal
Hello,

I heard Yonik talk about a better dismax query parser for Solr, so I
was wondering whether Lucene already has this functionality contributed
to its contrib modules.

- Pulkit




Spell Checker for Non English languages

2011-01-06 Thread Pulkit Singhal
Hello,

I was wondering if anyone on this mailing list has ever compiled a
list of algorithms for various non-English languages that work well
with the lucene-spellchecker contrib module.

For example, with English, a spellchecker index built using
ngrams and then searched using LevensteinDistance works well. But
would this work for Chinese, Japanese or Korean just as well?
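
The English setup I have in mind looks roughly like this (a sketch using
the Lucene 3.x spellchecker contrib; the directory and field name are
made up):

import java.io.File;
import org.apache.lucene.search.spell.LevensteinDistance;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.FSDirectory;

SpellChecker spell = new SpellChecker(FSDirectory.open(new File(spellDir)));
spell.setStringDistance(new LevensteinDistance());
// build the ngram index from the terms of an existing field
spell.indexDictionary(new LuceneDictionary(reader, "contents"));
String[] suggestions = spell.suggestSimilar("beutiful", 5);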

I can't be sure since I'm not a native speaker for any of those
languages and I do not want to make any assumptions.

Therefore, I was wondering if folks on this list could point me to
some lucene wiki page or other source that talks about what works for
which languages in terms of spell checking. It doesn't have to be
strictly related to the spellchecker module, if someone has done work
separately to get better results, please let me know about that too.

Thanks!




Where to find non-English dictionaries, thesaurus, synonyms

2011-01-06 Thread Pulkit Singhal
Hello,

What's a good source to get dictionaries (for spelling corrections) and/or
thesauri (for synonyms) that can be used with Lucene for non-English
languages such as French, Chinese, Korean etc.?

For example, the wordnet contrib module is based on the data set
provided by the Princeton based wordnet system but I'm wondering where
the Lucene users go for similar reliable source for other languages?

Thanks!




Works on Windows, crashes on Linux

2011-02-07 Thread Pulkit Singhal
Hello Folks,

I'm using Lucene 3.0, my code runs fine on Windows but when I test it
on Linux, I run into the following stack trace:

java.io.FileNotFoundException: /opt/apache/tomcat/webapps/myapp/luceneData/backend_IP/en_US/_1.fdt (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.<init>(SimpleFSDirectory.java:180)
    at org.apache.lucene.store.NIOFSDirectory.createOutput(NIOFSDirectory.java:74)
    at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:65)
    at org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:66)
    at org.apache.lucene.index.StoredFieldsWriter.finishDocument(StoredFieldsWriter.java:144)
    at org.apache.lucene.index.StoredFieldsWriter$PerDoc.finish(StoredFieldsWriter.java:193)
    at org.apache.lucene.index.DocumentsWriter$WaitQueue.writeDocument(DocumentsWriter.java:1443)
    at org.apache.lucene.index.DocumentsWriter$WaitQueue.add(DocumentsWriter.java:1462)
    at org.apache.lucene.index.DocumentsWriter.finishDocument(DocumentsWriter.java:1082)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:776)
    at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:751)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1928)

Things I tried:
1) At first I thought that it had something to do with permissions, but
the permissions on the directory where all my indexes are nested and
stored are 755, so I don't see how that could be an issue.
2) Then I realized that I was using Directory directory =
FSDirectory.open(dir) to create the directory that I passed to the
IndexWriter ... and the value of my dir was
...luceneData/backend_IP/en_US/ ... now the two directories
backend_IP/en_US are really placeholders ... the real names are known
to me just before I call FSDirectory.open() ... so I thought that the
problem might be that on Windows the backend_IP/en_US directories get
created auto-magically by the FSDirectory.open() implementation ...
but on Linux I need to create them manually before calling
FSDirectory.open() ... so I tried that, but that didn't help either.
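
Concretely, my attempt at (2) looked roughly like this (a sketch; the
variable names are made up):

import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

File indexDir = new File(luceneDataRoot, backendIp + "/" + locale);
// pre-create the nested directories before handing the path to Lucene
if (!indexDir.exists() && !indexDir.mkdirs()) {
    throw new IOException("could not create " + indexDir);
}
Directory directory = FSDirectory.open(indexDir);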

Now I'm out of clues.

Anyone have any advice?

- Pulkit




Lucene index limit

2011-03-24 Thread Pulkit Singhal
Is there some sort of default limit imposed on Lucene indexes?
I try to index 50k or 60k documents, but when I use Luke to go inside
the index and check the total # of entries indexed, it shows that
there are only 32768 entries.
It seems like some sort of limit ... what should I look at to adjust
this behavior?

Thanks,
- Pulkit




Update Document based on Query instead of Term

2011-04-13 Thread Pulkit Singhal
Lucene's IndexWriter allows users to update documents by Term via this
method signature:
void updateDocument(Term term, Document doc)

But what about updating them by Query? Like so:
void updateDocument(Query query, Document doc)

1) How can this be done? As far as I know there is no such method
signature right now. In my data there is no way for me to uniquely
identify a document for update without matching more than one term, so
I really need to use a Query.
2) Would it be better to change the initial underlying indexing
process so that the multiple fields (that can identify a unique
document) are concatenated into one string and indexed? Then that
field could be used for identification when doing updates, as in the
sketch after this list. Is this the common practice?
3) Or, would it be better to use the Query I have to search for the
docID that the Lucene index assigns to the data inside it and then try
to update based on that? Will I get in trouble here somehow?
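
A rough sketch of option 2, assuming a dedicated "uid" field built by
concatenating the identifying fields (the field names are made up):

import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;

// at indexing time: derive one unique key per document
String uid = vendor + "|" + sku + "|" + locale; // hypothetical identifying fields
doc.add(new Field("uid", uid, Field.Store.NO, Field.Index.NOT_ANALYZED));
writer.addDocument(doc);

// later: update by that single term (a delete-then-add internally)
writer.updateDocument(new Term("uid", uid), newDoc);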

Any tips are appreciated.

Thanks!
- Pulkit
