indexing array fields

2016-09-03 Thread Cam Bazz
Hello, I need to index arrays of long, usually long[20], i.e. 20 in length. It's been a while since I worked with Lucene; the last time was probably before version 3. I read https://lucene.apache.org/core/6_2_0/core/org/apache/lucene/document/Field.html There are SortedDocValuesField and SortedSetDocValuesF

do i need a key if not going to query by key or update the document

2016-09-12 Thread Cam Bazz
Hello, Do I need to add a key, if I will not be a. updating the document b. will not fetch the document by key? What could be the possible downside of not using a key that uniquely identifies the document? I am building a log processor, and all I will do is sort and iterate. Best regards, C.

Re: do i need a key if not going to query by key or update the document

2016-09-12 Thread Cam Bazz
ents are otherwise > tiny, to add one if you don't really need it. > > Mike McCandless > > http://blog.mikemccandless.com > > On Mon, Sep 12, 2016 at 5:42 AM, Cam Bazz wrote: > > Hello, > > > > Do I need to add a key, if I will not be > > > >

simple facet search

2016-09-18 Thread Cam Bazz
Hello, I have a field called timeSlot in my documents, basically representing an hour. When a query is made, I would like to make a graph of how many doc hits correspond to each timeSlot, sort it and display a chart of it. I am simply using term queries, to query StringFields, and here is my re

FacetResult getTopChildren

2016-09-19 Thread Cam Bazz
Hello, FacetResult getTopChildren returns the top N facets, however I need to return facets where count is above a certain threshold, for example return all facets that had counts > 10. Is there a way to accomplish this? I have been looking over the API docs and could not find it. I could maybe g
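One possible workaround, sketched under assumptions rather than taken from the thread: request more children than you expect to need and filter by count on the client side. The plain Map below is only a stand-in for the label/count pairs of a facet result, and the names FacetThreshold and aboveThreshold are hypothetical, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FacetThreshold {
    // Keeps only the labels whose count is strictly above the threshold,
    // ordered by descending count. The input map is a stand-in for the
    // label/count pairs returned by a facet request.
    static Map<String, Integer> aboveThreshold(Map<String, Integer> counts, int threshold) {
        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > threshold) {
                kept.add(e);
            }
        }
        // Highest counts first, as a chart of top time slots would want.
        kept.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : kept) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

With a threshold of 10, a label with exactly 10 hits is dropped, which matches the "counts > 10" wording of the question.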

Keyword analyzer will turn query to lowercase

2016-09-22 Thread Cam Bazz
Hello, I am indexing userAgent fields found in apache logs. Indexing and querying everything with KeywordAnalyzer - But I found something strange: IndexSearcher searcher = new IndexSearcher(reader); Analyzer q_analyzer = new KeywordAnalyzer(); QueryParser pars

Re: Keyword analyzer will turn query to lowercase

2016-09-22 Thread Cam Bazz
) { this.lowercaseExpandedTerms = lowercaseExpandedTerms; } The query parser lowercases a query only if it is a wildcard, prefix, fuzzy, or range query, and this can be turned off with parser.setLowercaseExpandedTerms(false); Which solved my problem, Best regards, C. On Thu, Sep 22, 2016 at 5:01 PM, Cam Bazz

getting a random doc from index

2008-08-29 Thread Cam Bazz
Hello, how could I select a random document out of a document collection inside a Lucene index? Best regards, -C.B.
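A minimal sketch of one way to do this, assuming document numbers run from 0 to maxDoc - 1 and that some of them may be deleted: draw a random number and retry if it lands on a deleted document. The isDeleted predicate stands in for the reader's deleted-document check; RandomDocPicker and pick are hypothetical names.

```java
import java.util.Random;
import java.util.function.IntPredicate;

class RandomDocPicker {
    // Picks a random live document number in [0, maxDoc), or -1 if every
    // document is deleted. isDeleted is a stand-in for a reader's
    // deleted-document check.
    static int pick(int maxDoc, IntPredicate isDeleted, Random random) {
        if (maxDoc <= 0) {
            return -1;
        }
        // Rejection sampling: cheap and uniform when few documents are deleted.
        for (int attempt = 0; attempt < 32; attempt++) {
            int doc = random.nextInt(maxDoc);
            if (!isDeleted.test(doc)) {
                return doc;
            }
        }
        // Fallback: scan from a random offset so the method always terminates.
        // This path is no longer perfectly uniform.
        int start = random.nextInt(maxDoc);
        for (int i = 0; i < maxDoc; i++) {
            int doc = (int) (((long) start + i) % maxDoc);
            if (!isDeleted.test(doc)) {
                return doc;
            }
        }
        return -1;
    }
}
```

The fallback scan only matters for indexes where most documents are deleted; for a healthy index the first few draws almost always succeed.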

lucene based tagging structure

2008-09-01 Thread Cam Bazz
Hello, Recently I developed an interest in making a lucene based structure for tagging. As we all know lucene's update is not real-time and one has to delete a document prior to updating it. I have been googling for different approaches to a lucene based tagging structure, and I stumbled upon ht

string similarity measures

2008-09-04 Thread Cam Bazz
Hello, This came up before, but if we were to make a swear word filter, string edit distances are no good. For example, a word like `shot` is confused with `shit`. There is also a problem with words like hitchcock. Apparently I need something like soundex or double metaphone. The thing is - these are

Re: Realtime Search for Social Networks Collaboration

2008-09-04 Thread Cam Bazz
Hello Jason, I have been trying to do this for a long time on my own. Keep up the good work. What I tried was a document cache using apache collections, and before an index write/delete I would sync the cache with the index. I am waiting for lucene 2.4 to proceed (delete by query). Best. On Wed, Sep

Re: string similarity measures

2008-09-04 Thread Cam Bazz
ver shingles? Best, On Thu, Sep 4, 2008 at 4:12 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > > On 4 Sep 2008 at 14.38, Cam Bazz wrote: > > > Hello, >> This came up before but - if we were to make a swear word filter, string >> edit distances are no good. for exampl

Re: string similarity measures

2008-09-04 Thread Cam Bazz
at 5:02 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > > On 4 Sep 2008 at 15.54, Cam Bazz wrote: > > yes, I already have a system for users reporting words. they fall on an >> operator screen and if operator approves, or if 3 other people marked it >> as >> curse, t

lucene ram buffering

2008-09-04 Thread Cam Bazz
hello, I was reading the performance optimization guides then I found : writer.setRAMBufferSizeMB() combined with: writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH); this can be used to flush automatically so if the ram buffer size is over a certain limit it will flush. now the question:

ramdisks

2008-09-04 Thread Cam Bazz
hello, anyone using ramdisks for storage? there is ramsam and there is also fusion io. but they are kinda expensive. any other alternatives I wonder? Best.

Re: ramdisks

2008-09-05 Thread Cam Bazz
> On Thu, 2008-09-04 at 17:58 +0200, Cam Bazz wrote: > > anyone using ramdisks for storage? there is ramsam and there is also > fusion > > io. but they are kinda expensive. any other alternatives I wonder? > > We've done some comparisons of RAM (Lucene RAM

query to return docs that has a certain field

2008-09-11 Thread Cam Bazz
Hello, Let's say we have different document types, and one type of document only contains field A. How can I make a query so that I get all the documents that only have field A? There is a get all documents query, but that would get all the documents whether they contain field A or not. Is there

Re: patching lucene-1314

2008-09-15 Thread Cam Bazz
[EMAIL PROTECTED]> wrote: > I usually do: > cd > patch -p 0 -i > > See also the HowToContribute page on the wiki. > > > On Sep 15, 2008, at 7:38 AM, Cam Bazz wrote: > >> Hello, >> >> To patch for lucene-1314 what must I do? >> >>

instantiated index in 2.4

2008-09-15 Thread Cam Bazz
Hello, I have been looking at instantiated index in the trunk. Does this come with a searcher? Are the adds reflected directly to the index? Or is it just an experimental thing only with reader and writer? Best.

2.4 questions

2008-09-15 Thread Cam Bazz
Hello, I see that IndexWriter.flush() is deprecated in 2.4. What do we use instead? Also I used to make a: try { nodeWriter = new IndexWriter(nodeDir, true, analyzer, false); } catch(FileNotFoundException e) { nodeWriter = new IndexWriter(nodeDir, true, analyzer,

IndexWriter commit

2008-09-15 Thread Cam Bazz
Hello, What is the difference between flush in <2.4 and commit? Also, I have been looking over the docs, and they mention commit(long), but there is no commit(long) method, only commit(). Best.

IndexSearcher.search

2008-09-15 Thread Cam Bazz
Hello, What is the new preferred way of running a query? I understand Hits will be deprecated. So how do we do it the new way? With a hit collector? Best.

IndexReader.isDeleted

2008-09-15 Thread Cam Bazz
Hello, I would like to take advantage of isDeleted. If I delete a document from the index and do not commit, and the index searcher is not reinstantiated, how can I check if a document is marked for deletion? I tried it both with commit() and without committing; isDeleted(mydeleteddocid) always returns f

Re: IndexWriter commit

2008-09-15 Thread Cam Bazz
g) is a > private method that should never have been in the javadocs. Thanks for > raising this! > > Mike > > Cam Bazz wrote: > >> Hello, >> >> What is the difference between flush in <2.4 and commit? >> >> Also I have been looking over docs, a

more on isDeleted

2008-09-15 Thread Cam Bazz
Hello, Here is what I am trying to do: dir = FSDirectory.getDirectory("/test"); writer = new IndexWriter(dir, analyzer, true, new IndexWriter.MaxFieldLength(2)); writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH); Document da = new Document(); da.ad

Re: 2.4 questions

2008-09-15 Thread Cam Bazz
certain criteria. Best. On Mon, Sep 15, 2008 at 10:05 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Cam Bazz wrote: > >> Hello, >> >> I see that IndexWriter.flush() is depreciated in 2.4. What do we use? > > Looks like you already found it, but the j

Re: more on isDeleted

2008-09-15 Thread Cam Bazz
. On Mon, Sep 15, 2008 at 10:20 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > You'll have to open a new IndexReader after the delete is committed. > > An IndexReader (or IndexSearcher) only searches the point-in-time snapshot > of the index as of when it was ope

Re: 2.4 questions

2008-09-15 Thread Cam Bazz
Well, I did not understand this. So there is no way of using the new constructor and specifying autoCommit = false? Best On Mon, Sep 15, 2008 at 10:30 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Cam Bazz wrote: > >> However the documentation states that autoCom

Re: IndexWriter commit

2008-09-15 Thread Cam Bazz
>> still in the OS's write cache when it crashed. >> >> But the guarantee only holds if the underlying storage system is "honest" >> about fsync(), ie, it truly flushes all written bytes for that file to disk >> before returning. >> >> Mike >>

Re: 2.4 questions

2008-09-15 Thread Cam Bazz
Out of curiosity, and somewhat unrelated to this thread: when can we expect to see 2.4? It seems much has changed, so people would want to port their code? Best. On Mon, Sep 15, 2008 at 10:56 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Cam Bazz wrote: > >

Re: more on isDeleted

2008-09-15 Thread Cam Bazz
buffered deletes down to docID. Those deletes > that are against existing segments in the index will be flushed at that > point to those segments; the deletes that apply only to buffered docs will > be held in RAM and used by the RAMIndexSearcher that searches IndexWriter's > buff

Re: more on isDeleted

2008-09-15 Thread Cam Bazz
n Mon, Sep 15, 2008 at 11:09 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > It will return true if the provided docID was deleted, by term or query or > docID (due to exception, privately) prior to when you asked IndexWriter to > give you a "realtime" IndexReader. >

Re: instantiated index in 2.4

2008-09-15 Thread Cam Bazz
Hello Karl; This is good good good news. It works. However, I added a document like doc.add(new Field("f", "a", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS)); and then searched. The score is 0.3~ for the found document. should not it be 1.0? also it will find when searched for "f","b" o

warming up searchers

2008-09-15 Thread Cam Bazz
Hello, What kind of query is best to warm up a searcher? How many searches should I do? Are we supposed to search for things we know exist, or is it better to run queries for things we know don't exist? Best. -C.B.

TopDocs question

2008-09-15 Thread Cam Bazz
Hello, Would it do any harm to call searcher.search(query, Integer.MAX_VALUE)? I just need to run a query to get the number of hits in this case, but I don't know what the max hits will be. Also, is topdocs.totalHits the same as topdocs.scoreDocs.length? Best. -C.A.

Re: TopDocs question

2008-09-15 Thread Cam Bazz
Yes, I looked into implementing a custom collector that would return number of hits, but - I could not. collect() can not access anything that is final, and final can not be incremented. Any ideas? Best. On Tue, Sep 16, 2008 at 6:05 AM, Daniel Noll <[EMAIL PROTECTED]> wrote: > Cam B
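A common Java workaround for the "final cannot be incremented" problem is a final one-element array (or an AtomicInteger): the reference is final, but its contents are mutable. The sketch below illustrates only that idiom; the Collector interface and search method are hypothetical stand-ins with the same collect(doc, score) shape as a hit collector, not Lucene classes.

```java
class HitCounting {
    // Hypothetical stand-in for a hit-collecting callback.
    interface Collector {
        void collect(int doc, float score);
    }

    // Hypothetical stand-in for a search: reports each matching doc once.
    static void search(int[] matchingDocs, Collector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);
        }
    }

    static int countHits(int[] matchingDocs) {
        // The array reference is final, but its single element is mutable,
        // so the anonymous class is allowed to increment it.
        final int[] count = new int[1];
        search(matchingDocs, new Collector() {
            @Override
            public void collect(int doc, float score) {
                count[0]++;
            }
        });
        return count[0];
    }
}
```

Another option is to give the collector a named subclass with an ordinary int field and read that field after the search finishes.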

Re: IndexSearcher.search

2008-09-15 Thread Cam Bazz
In cases where we don't know the possible number of hits -- and wanting to test the new 2.4 way of doing things, could I use custom hitcollectors for everything? Any performance penalty for this? From what I understand both TopDocCollector and TopDocs will try to allocate an array of Integer.MAX_V

Re: TopDocs question

2008-09-15 Thread Cam Bazz
Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: Cam Bazz <[EMAIL PROTECTED]> >> To: java-user@lucene.apache.org >> Sent: Monday, September 15, 2008 11:25:39 PM >> Subject: Re: TopDocs

Re: Phrase Query

2008-09-15 Thread Cam Bazz
I noticed this was because I was using a KeywordAnalyzer. Is it possible to write a document with different analyzers in different fields? Best. On Tue, Sep 16, 2008 at 8:33 AM, Cam Bazz <[EMAIL PROTECTED]> wrote: > Hello, > > Lets say I have two documents, both containing field

Phrase Query

2008-09-15 Thread Cam Bazz
Hello, Let's say I have two documents, both containing field F. document 0 has the string "a b" as F document 1 has the string "b a" as F I am trying to make a phrasequery like: PhraseQuery pq = new PhraseQuery(); pq.add(new Term("F", "a")); pq.add(new Term("F", "b"));

Re: TopDocCollector & Paging

2008-09-17 Thread Cam Bazz
And how about queries that need starting position, like hits between 100 and 200? could we pass something to the collector that will count between 0 to 100 and then get the next 100 records? Best. On Wed, Sep 17, 2008 at 5:16 PM, Erick Erickson <[EMAIL PROTECTED]> wrote: > Doesn't TopDocCollecto
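The usual way to page, sketched here under assumptions: ask for the top (start + size) hits and throw away the first start of them. The list below stands in for hits already sorted best-first; HitPaging and page are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class HitPaging {
    // Returns the hits ranked [start, start + size) from a list that is
    // already sorted best-first, i.e. the top (start + size) results.
    static <T> List<T> page(List<T> topHits, int start, int size) {
        if (start < 0 || size < 0) {
            throw new IllegalArgumentException("start and size must be non-negative");
        }
        int from = Math.min(start, topHits.size());
        // Use long arithmetic so start + size cannot overflow.
        int to = (int) Math.min((long) start + size, topHits.size());
        return new ArrayList<>(topHits.subList(from, to));
    }
}
```

For hits 100 to 200 that means collecting the top 200 and keeping the last 100; the cost grows with how deep the page is, not with the page size alone.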

Re: Phrase Query

2008-09-17 Thread Cam Bazz
CTED]> wrote: > Are the terms stopwords? > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: Cam Bazz <[EMAIL PROTECTED]> >> To: java-user@lucene.apache.org >> Sent: Tuesday

Re: Some SSD results to share

2008-09-17 Thread Cam Bazz
fusionio.com has the SSD killer. Not that expensive either, just twice or triple the ssd. Best. On Tue, Sep 16, 2008 at 2:16 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > Related, I've been considering filesystem based filters on SSD. That ought > to be rather fast, consume no memory and be as si

Re: IndexSearcher.search

2008-09-18 Thread Cam Bazz
one moment: the top doc collector is based on some sort of queue, I assume. What kind of queue is that? does it sort based on score, or whichever doc comes first. best. On Wed, Sep 17, 2008 at 9:43 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : Well, it turns out the theoretical maximum f

triplet store

2008-09-29 Thread Cam Bazz
Has anyone tried to implement a triplet store with lucene? Best, -C.B.

Re: triplet store

2008-09-29 Thread Cam Bazz
for instance one described in: http://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/rusher.html On Mon, Sep 29, 2008 at 4:04 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote: > What is that? > > On Mon, Sep 29, 2008 at 8:51 AM, Cam Bazz <[EMAIL PROTECTED]> wrote

Re: Hiring etiquette

2008-10-19 Thread Cam Bazz
How can we get on to that list? Best, On Mon, Oct 20, 2008 at 1:58 AM, Hasan Diwan <[EMAIL PROTECTED]> wrote: > 2008/10/19 Mark Miller <[EMAIL PROTECTED]>: >> You might instead limit your email to those that have agreed to be contacted >> at http://wiki.apache.org/lucene-java/Support > > FWIW, th

PrefixQuery question

2008-01-08 Thread Cam Bazz
Hello, I am having a problem with PrefixQuery: I have a field named item_title which is indexed as: doc.add(new Field("item_title", item_title.trim().toLowerCase(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES)); and I am forming my query like: PrefixQuery pq = new PrefixQuery((

lucene as a graph store

2008-01-15 Thread Cam Bazz
Hello; I like to use lucene as a graph store. The graph representation is a list of edges. Consider the code below: final int commitCount = 16 * 1024; final int numObj = 1024 * 1024; Analyzer analyzer = new KeywordAnalyzer(); FSDirectory directory = FSDirectory.g

Re: lucene as a graph store

2008-01-15 Thread Cam Bazz
does bring some terms, etc. into > memory, and you may have a look at the FieldCache. > > -Grant > > On Jan 15, 2008, at 7:17 AM, Cam Bazz wrote: > > > Hello; > > > > I like to use lucene as a graph store. The graph representation is a > > list of > >

IndexWriter.DISABLE_AUTO_FLUSH

2008-01-15 Thread Cam Bazz
Hello; Has IndexWriter.DISABLE_AUTO_FLUSH been deprecated? I am using lucene core 2.2.0 and although it is in the documentation I cannot access IndexWriter.DISABLE_AUTO_FLUSH. Best, C.B.

IndexWriter.optimize()

2008-01-15 Thread Cam Bazz
Hello, I have been running some experiments on lucene. To speed up index time, I have disabled autocommit, and I flush the indexwriter each 512 objects. So far I have tried with 256,512,1024,and 2048 and I have seen a really incredible speed difference indexing. However, if I the time required to

Re: lucene as a graph store

2008-01-15 Thread Cam Bazz
query. Best. On Jan 15, 2008 6:22 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Hi, > > > - Original Message ---- > From: Cam Bazz <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Tuesday, January 15, 2008 8:50:07 AM > Subject: Re: luc

Re: lucene as a graph store

2008-01-16 Thread Cam Bazz
number of edges (degrees?) between any 2 > nodes? > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > ----- Original Message > From: Cam Bazz <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Tuesday, January 15, 2008 11:34:20

NumberTools

2008-01-16 Thread Cam Bazz
Hello, When storing fields to serve as id's - is it better to use NumberTools.longToString(id) or just store the id as a field? I have noticed when using NumberTools to store number as a string, this makes range queries easier, however - you end up storing a long string. Considering millions of id
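The trade-off in this question can be illustrated with a small sketch. It shows only the general idea of a fixed-width encoding whose string order matches numeric order; it is not NumberTools' actual format, it handles non-negative values only, and SortableLong is a hypothetical name.

```java
import java.util.Locale;

class SortableLong {
    // Encodes a non-negative long as a fixed-width, zero-padded decimal
    // string so that lexicographic order matches numeric order.
    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        // Long.MAX_VALUE has 19 decimal digits, so 19 is always wide enough.
        return String.format(Locale.ROOT, "%019d", value);
    }

    // Reverses encode(); leading zeros are ignored by the parser.
    static long decode(String encoded) {
        return Long.parseLong(encoded);
    }
}
```

Plain "7" sorts after "10" as a string, while the padded forms sort in numeric order, which is what makes range queries over string terms work. The price is the longer term stored per id.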

IndexSearcher and RAMDirectory

2008-01-17 Thread Cam Bazz
Hello, I understand after writing some documents in an index with an indexwriter, the IndexSearcher object has to be reinstantiated for it to find newly instantiated objects. And this reinstantiation of IndexSearcher is costly from what I understand. I am working on a caching scheme that will allo

delete a document from indexwriter

2008-01-18 Thread Cam Bazz
Hello, How do I delete a specific document from an indexwriter? I understand there is deleteDocuments(term) which deletes all the documents matching the term. But what if I want to delete a specific document that is identified by more than one term? I can search the document with a boolean query, and then g

Re: delete a document from indexwriter

2008-01-21 Thread Cam Bazz
'd like to > make this option available someday in IndexWriter, but doing so now > (when there is no way to get a "reliable" docID) seems too dangerous... > > Mike > > Cam Bazz wrote: > > > Hello, > > > > How do I delete a specific document from an i

Clarification about IndexWriter.deleteDocuments and flush.

2008-01-21 Thread Cam Bazz
Hello, When we delete documents from index - will it autoflush when count of deleted documents reach a certain value. I am controlling my own flush operation, and I have disabled autoflush by: writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH); But I have taken a peek at the IndexWriter

Re: delete a document from indexwriter

2008-01-21 Thread Cam Bazz
> You can also use Solr, which provides "delete by query". > > Mike > > Cam Bazz wrote: > > > Hello Mike; > > > > How about deleting by a compound term? > > > > for example if I have a document with two fields srcId and dstId > > and I wa

Re: Clarification about IndexWriter.deleteDocuments and flush.

2008-01-21 Thread Cam Bazz
the source to lucene made makes me think of extensions. Nice code. Best, On Jan 21, 2008 4:47 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Cam Bazz wrote: > > > Hello, > > > > When we delete documents from index - will it autoflush when count of >

Re: delete a document from indexwriter

2008-01-21 Thread Cam Bazz
using a reader, it will acquire the write.lock, > which will fail if you have another writer open on that index). > > Mike > > Cam Bazz wrote: > > > Hello Michael; > > > > how can I construct a chain where both reader and writer at the > > same state? > >

Re: delete a document from indexwriter

2008-01-22 Thread Cam Bazz
> > Do you have a specific use case in mind here? I think we'd like to > make this option available someday in IndexWriter, but doing so now > (when there is no way to get a "reliable" docID) seems too dangerous... > > Mike > > Cam Bazz wrote: > > > Hel

HitCollector

2008-01-22 Thread Cam Bazz
Hello, Could someone show me a concrete example of how to use HitCollector? I have documents which have a field category. When I run a query, I need to sort results by category as well as count how many hits are there for a given category. I understand: searcher.search(Query, new HitCollector()

stange exception while indexing

2008-01-24 Thread Cam Bazz
Does anyone have any idea about the error I got while indexing? Best Regards, -C.B. Exception in thread "main" java.io.IOException: background merge hit exception: _kq:C962870 _kr:C2591 into _ks [optimize] at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1749) at org.apach

Re: stange exception while indexing

2008-01-24 Thread Cam Bazz
<[EMAIL PROTECTED]> wrote: > > That means that one of the merges, which run in the background by > default with 2.3, hit an unhandled exception. > > Did you see another exception logged / printed to stderr before this > one? > > Mike > > Cam Bazz wrote: > > >

TermEnum trick

2008-01-25 Thread Cam Bazz
Hello, How do we get the TermEnum trick? I could not figure it out. basically, I have a field called category, and I like to learn what different values the category field takes. (sort of like unique in sql) Best Regards, -C.B.

Re: TermEnum trick

2008-01-25 Thread Cam Bazz
5 PM, Erick Erickson <[EMAIL PROTECTED]> wrote: > Can you show us what you've tried? > > Erick > > On Jan 25, 2008 10:49 AM, Cam Bazz <[EMAIL PROTECTED]> wrote: > > > Hello, > > > > How about getting which documents have that term as a bitset?

Re: TermEnum trick

2008-01-25 Thread Cam Bazz
} > list.add(term.text()); > } while (theTerms.next()); > > > On Jan 25, 2008 10:24 AM, Cam Bazz <[EMAIL PROTECTED]> wrote: > > > Hello, > > > > How do we get the TermEnum trick? I could not figure it out. basically, >

IndexSearcher and Multiple Threads

2008-01-28 Thread Cam Bazz
Hello, Is IndexSearcher thread-safe? I made a simple HTTP server using Grizzly, as described in http://jlorenzen.blogspot.com/2007/06/using-grizzly-to-create-simple-http.html which submits queries to a single instance of IndexSearcher, and I get some errors (when I query with more than one thread) suc

hitcollector and sort

2008-01-28 Thread Cam Bazz
Hello, How can I use a hit collector and sort object in query? I looked at the API and sort is only usable with hits. Is it even possible? since hitcollector returns a bitset - how do we do the ordering? Best, -C.B.

document Id question, again

2008-01-31 Thread Cam Bazz
Hello; If no document is ever deleted nor updated from an index, will the document id change? under which circumstances will the document ids change, apart from delete? Best Regards, -C.B.

ParallelReader question

2008-02-04 Thread Cam Bazz
Hello, When using a parallel reader with two indexes lets say, when we call a document with id, is it the combined fields of a document from the two indexes that return? The documentation was not clear on that one, except the document(int n, FieldSelector fs) method. Best, -C.B.

Re: appending field to an existing index

2008-02-04 Thread Cam Bazz
Hello, I have read the parallel reader doc. It says it must have the same number of documents as the other index. When we are using a writer - searcher combination, how can we integrate this parallel reader into game. Simply, I have some documents, and I just like to mark them, in an efficient wa

DefaultIndexAccessor

2008-02-04 Thread Cam Bazz
Hello, Regarding https://issues.apache.org/jira/browse/LUCENE-1026 , this seems very interesting. I have read the discussion on the page, but I could not figure out which set of files is the latest. Is it the IndexAccessor-1.26.2008.zip file? I will read through the code, make my own tests, and s

Re: DefaultIndexAccessor

2008-02-04 Thread Cam Bazz
ent, and an app that adds docs will be a bit more responsive, e.g. it > won't hang as Readers are being reopened. > > I also have to bring the AccessProvider classes back. No easy way to use > your own custom Readers without it...I shouldn't have stripped it out. > > - Mark

Re: DefaultIndexAccessor

2008-02-04 Thread Cam Bazz
> a finally block. Batch load multiple docs, but if you're just randomly > adding > a doc, get the Writer, add it, and then release the Writer in a finally > block. If you are batch loading a million docs and you want to be able > to see them > as they are added: get the writer and add

matching products with suggest feature

2008-02-13 Thread Cam Bazz
Hello; I am trying to make a product matcher based on lucene's ngram based suggest. I did some changes so that instead of giving the speller a dictionary I feed it with a List. For example lets say I have "HP NC4400 EY605EA CORE 2 DUO T5600 1.83GHz/512MB/80GB/12.1'' NOTEBOOK" and I index it with

Re: matching products with suggest feature

2008-02-13 Thread Cam Bazz
e you > add more terms than what exists, it won't find anything. > > On Feb 13, 2008 6:54 PM, Cam Bazz <[EMAIL PROTECTED]> wrote: > > > Hello; > > > > I am trying to make a product matcher based on lucene's ngram based > > suggest. > > I did s

Re: matching products with suggest feature

2008-02-14 Thread Cam Bazz
de, then for the first it > will suggest "abcde" but for the second it won't suggest it because the > ngrams produced are "abc" and "bce" .. and "bce" does not appear in > "abcde". > > Am I right? If not, can you elaborate more on t

query question

2008-02-19 Thread Cam Bazz
Hello, I have a tokenized field where I store some info. Lets say I have "abc 1234" and "abc 678" When the user searches for "abc1234" how can I find "abc 1234" ? Best. -C.B.
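One possible approach, sketched under assumptions and not confirmed by the thread: normalize both the indexed text and the user's query the same way, inserting a space at every letter/digit boundary, so "abc1234" and "abc 1234" produce the same tokens. AlnumSplitter and normalize are hypothetical names; in a real index this logic would live in the analyzer used at both index and query time.

```java
import java.util.Locale;

class AlnumSplitter {
    // Inserts a space wherever a letter is directly followed by a digit or
    // a digit by a letter, then collapses whitespace and lowercases.
    static String normalize(String text) {
        String split = text.replaceAll(
                "(?<=\\p{L})(?=\\p{N})|(?<=\\p{N})(?=\\p{L})", " ");
        return split.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}
```

The same rule also splits strings such as product codes into letter and digit runs, which may or may not be wanted for other fields.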

Re: IndexReader deleteDocument

2008-03-17 Thread Cam Bazz
Hello Erick, Has anyone found a way for deleting a document with a query? I understand it can be deleted via terms, but I need to delete a document with two terms, that is the only way I can identify my document is by looking at two terms not one. best. On Fri, Mar 14, 2008 at 4:58 PM, Erick Eri

Re: IndexReader deleteDocument

2008-03-17 Thread Cam Bazz
riter. > > Mike > > Cam Bazz wrote: > > > Hello Erick, > > > > Has anyone found a way for deleting a document with a query? I > > understand it > > can be deleted via terms, but I need to delete a document with two > > terms, > > that is the only w

Re: IndexReader deleteDocument

2008-03-17 Thread Cam Bazz
> files in the index to stable storage (assuming your IO system doesn't > "lie" on fsync). > > Mike > > On Mar 17, 2008, at 4:33 AM, Cam Bazz wrote: > > > Nice. Thanks. > > > > will the 2.4 have commit improvements that we previously talked about?

Re: IndexReader deleteDocument

2008-03-17 Thread Cam Bazz
bytes are not > actually written to stable storage. If you have such a device that > lies then Lucene 2.4 won't be able to guarantee index consistency on > crash/power outage. > > Mike > > Cam Bazz wrote: > > > Hello, > > > > What do you mean by I

Re: IndexReader deleteDocument

2008-03-17 Thread Cam Bazz
what you mean by "same thread". Maybe you meant "same > index"? > > Yes, if the IndexReader reopens. > > IndexWriter.commit() makes the changes visible to readers, and makes > the changes durable to os/computer crash or power outage. > > Mike > > Cam

Re: IndexReader deleteDocument

2008-03-17 Thread Cam Bazz
wrote: > > It's a hard drive issue. When you call fsync, the OS asks the hard > drive to sync. > > Mike > > Cam Bazz wrote: > > > Hello, > > > > I understand the issue. But I have not understood - is this > > hardware related > > issue - i.e a

Re: IndexReader deleteDocument

2008-03-17 Thread Cam Bazz
IndexReader. IndexReader > still searches only a point in time. > > Mike > > Cam Bazz wrote: > > > yes, I meant the same index. > > > > I thought with the new changes - the index reader would see the > > changes > > without re-opening. > > It wo

using hitcollector and scoring at the same time

2008-03-20 Thread Cam Bazz
Hello, I recently changed my query logic. Before, I was getting a hits object, and now I am using a bitSet with a hitcollector. The reason for using bitSet is document caching, and being able to count how many hits belong to which categories. Although my new logic works, I have noticed that now t

document scoring

2008-03-20 Thread Cam Bazz
Hello, I am querying an index by using custom boost factors for each field. Usually a query looks like: fieldA:"term1"^0.2 fieldB:"term2"^4 when I get scores from HitCollector, they are not necessarily between 0 and 1. How can I normalize these scores? Best. -C.A.
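A minimal sketch of one normalization, assuming all you need is scores scaled relative to the best hit of the same query: divide every score by the maximum score. ScoreNormalizer is a hypothetical name, and the float array stands in for the scores gathered by the collector.

```java
class ScoreNormalizer {
    // Scales scores so the best hit gets 1.0. Scores are returned unchanged
    // when the maximum is not positive, to avoid dividing by zero.
    static float[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = max > 0f ? scores[i] / max : scores[i];
        }
        return out;
    }
}
```

Scores normalized this way are comparable only within one result set; a 0.5 from one query says nothing about a 0.5 from another.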

text extraction from pdf

2008-05-14 Thread Cam Bazz
Hello All, Any suggestions for extracting text from PDF? I have tried pdfbox, and it works nicely; however, if the PDF is structured, it won't provide good results. For example consider the pdf: P1 Lorem Ipsum Bla bla P3 Lorem2 Ipsum2 P1 bla bla P2 bla bla bla P

Re: text extraction from pdf

2008-05-15 Thread Cam Bazz
Hello Bill, The problem I am having is that some of them have multiple columns and multiple word boxes. Does the xpdf patch extract different columns and wordboxes? Best, -C.B. On Wed, May 14, 2008 at 6:35 PM, Bill Janssen <[EMAIL PROTECTED]> wrote: > > > the unix program pdf2text can convert keep

Re: How to add PageRank score with lucene's relevant score in sorting

2008-05-29 Thread Cam Bazz
Hello, little off topic, but how did you obtain the pagerank for each page. did you calculate it, or have you obtained it with some other way while getting a specific site. Best. On Thu, May 29, 2008 at 3:28 PM, 过佳 <[EMAIL PROTECTED]> wrote: > thanks Glen , we have tried it , but the bottleneck

fieldNorm and fieldValueUniqueness

2008-06-11 Thread Cam Bazz
Hello, When you look at the fields of a document with Luke, there is a norm column. I have not been able to figure out what that is. The reason I am asking is that I am trying to build a uniqueness model. My Index is structured as follows: classID, textID, K, V classID is a given class. textID

Re: fieldNorm and fieldValueUniqueness

2008-06-11 Thread Cam Bazz
yes, figured it out. thanks. how about checking for uniqueness? Best. On Wed, Jun 11, 2008 at 5:39 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > > On 11 Jun 2008 at 16.04, Cam Bazz wrote: > >> >> When you look at the fields of a document with Luke, there is a norm >

lucene wildcard query with stop character

2008-06-12 Thread Cam Bazz
Hello, Imagine I have the following documents having keys A A>B A>B>C A>B>D A>B>C>D now Imagine a query with keyword analyzer and a wildcard: A>B>* which will bring me A>B>C , A>B>D and A>B>C>D but I just want to get A>B>C and A>B>D so can I make a query like A>B>* but does not have the > cha
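One way to get only the direct children, sketched under assumptions: run the prefix or wildcard match as before, then post-filter the matching keys and drop any that contain the separator again after the prefix. The list of keys below stands in for the key field values of the matching documents; DirectChildren is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class DirectChildren {
    // Keeps the keys that are exactly one level below parent, i.e. they
    // start with parent + separator and contain no further separator.
    static List<String> directChildren(List<String> keys, String parent, char separator) {
        String prefix = parent + separator;
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)
                    && key.length() > prefix.length()
                    && key.indexOf(separator, prefix.length()) < 0) {
                out.add(key);
            }
        }
        return out;
    }
}
```

An alternative that avoids post-filtering would be to index the depth of each key in a separate field and add it to the query as a required clause; that is a design suggestion, not something stated in the thread.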

Re: lucene wildcard query with stop character

2008-06-12 Thread Cam Bazz
ote: > I assume you want all of your queries to function in this way? > > If so, you could just translate the * character into a ? at search time, > which should give you the functionality you are asking for. > > Unless I'm missing something. > > Matt > >

uniqueWords, and termDocs

2008-06-23 Thread Cam Bazz
Hello, I need to be able to select a random word out of all the words in my index. How can I do this through termDocs()? Also, I need to get a list of unique words as well. Is there a way to ask Lucene for this? Best Regards, -C.B.
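A term enumeration (IndexReader.terms() in Lucene of that era) visits each distinct term once, so walking it already yields the unique word list. For the random word, a sketch assuming you can only iterate the terms and do not know their count in advance: reservoir sampling with a reservoir of size one. The Iterator stands in for the term enumeration; RandomTerm is a hypothetical name.

```java
import java.util.Iterator;
import java.util.Random;

class RandomTerm {
    // Reservoir sampling with a reservoir of size one: after seeing n terms,
    // each of them is the current choice with probability 1/n.
    // Returns null when there are no terms. The int counter limits this
    // sketch to fewer than Integer.MAX_VALUE terms.
    static String pick(Iterator<String> terms, Random random) {
        String chosen = null;
        int seen = 0;
        while (terms.hasNext()) {
            String term = terms.next();
            seen++;
            // Replace the current choice with probability 1/seen.
            if (random.nextInt(seen) == 0) {
                chosen = term;
            }
        }
        return chosen;
    }
}
```

This needs a single pass and constant memory, at the cost of always reading the whole term list.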

boolean query or

2008-07-08 Thread Cam Bazz
Hello, Is it possible to make a boolean query where a word is equal to fieldA or fieldB? in other words, I like to search a word in two fields, if word passes in fieldA or fieldB, then it is a hit. Best, -C.B.

lucene delete by query

2008-07-23 Thread Cam Bazz
Hello, wasn't there a Lucene delete by query feature coming up? I remember something like that, but I could not find any references. Best regards, -c.b.
