Hi,
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a
Are Wildcard, Prefix, and Fuzzy queries case sensitive?
Unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy queries
are not passed through the Analyzer, which is the component that pe
Hi,
: I want "Überraschung" to be found by
:
: Überr*
: Ueberr*
:
: So the best I can do is to do the normalisation manually (not by an
: analyzer) before the indexing/searching process?
Or use an Analyzer at index time that puts both the UTF-8 version of the
string and the Latin-1 version of the st
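The manual normalisation discussed above can be sketched without Lucene at all: a minimal string-folding helper (the class name and the exact fold table are my assumptions) that maps German umlauts to ASCII transliterations, applied identically to the indexed text and the query terms:

```java
import java.util.Map;

public class UmlautFolder {
    // Assumed fold table: German umlauts and ß mapped to their
    // common ASCII transliterations.
    private static final Map<String, String> FOLDS = Map.of(
            "ä", "ae", "ö", "oe", "ü", "ue",
            "Ä", "Ae", "Ö", "Oe", "Ü", "Ue",
            "ß", "ss");

    // Fold a string so "Überraschung" and "Ueberraschung" normalise
    // to the same term before indexing or searching.
    public static String fold(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            String c = String.valueOf(in.charAt(i));
            out.append(FOLDS.getOrDefault(c, c));
        }
        return out.toString();
    }
}
```

Folding both the document text and the prefix query makes Überr* and Ueberr* land on the same indexed term.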
Hi,
Here we use Lucene to index our emails, currently 500,000 documents.
When searching the body with a WildcardQuery, the problem arises.
I did some profiling with JProfiler. I see that the more BooleanClause
instances are used,
the more memory is required during the search.
Most memory is used by instances
Hi Rob,
For indexing e-mail, I recommend that you tokenise the e-mail addresses into
fragments and query on the fragments as whole terms rather than using
wildcards.
[example]
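The archive elides the original example; here is a minimal sketch of the fragmenting idea (the class name and the split-on-`@`-and-`.` scheme are my assumptions, not the original poster's code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EmailTokenizer {
    // Hypothetical fragmenting scheme: lowercase the address and
    // split on '@' and '.', so "Rob.Jones@example.co.uk" yields the
    // whole terms [rob, jones, example, co, uk]. Each fragment is
    // indexed as its own term.
    public static List<String> fragments(String address) {
        List<String> out = new ArrayList<>();
        for (String part : address.toLowerCase(Locale.ROOT).split("[@.]")) {
            if (!part.isEmpty()) out.add(part);
        }
        return out;
    }
}
```

With each fragment indexed as its own term, a query for "jones" or "example" matches as a whole term, with no wildcard expansion at all.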
Hmm, for email addresses this isn't a big problem here.
The real problem is the query on the body part of an email, wh
Hi,
I indexed emails, and now I want to restrict the search functionality for
users so they can only search for emails sent to/from them.
I know the email address of the user, so my plan is to do it in the following
way:
The user enters some search parameters, and they are combined into a query.
This is a mi
Hi,
This sounds good. As for the code injection, it is up to you to sanitize
the request before it goes to Lucene, probably by filling the email
field yourself and not relying on the user input for the email address.
I hoped I wouldn't have to sanitize the user input, because the email
address query is ANDed
Damien McCarthy schrieb:
Hi Joe,
It would probably be cleaner to use a QueryFilter rather than doing the AND.
Take a look at
http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/search/QueryFilter
.html
OK, if it's not too slow I'll go this way.
Also I'm not sure that using the se
Hi,
Hi Joe,
It might be possible when you append the restriction before parsing the
user query with the QueryParser, but I'm not sure. I recommend first
parsing the query, and then constructing a BooleanQuery with the parsed
user query and the e-mail term both as must.
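That combination can be sketched against the Lucene API of this era roughly as follows; the field names ("body", "to") and the analyzer choice are assumptions, and `userInput`, `userEmail`, and `searcher` stand in for the application's own variables:

```
// Parse the user's query first...
QueryParser parser = new QueryParser("body", new StandardAnalyzer());
Query userQuery = parser.parse(userInput);

// ...then AND in the e-mail restriction as a separate MUST clause,
// so it can never be altered by the user's query syntax.
BooleanQuery restricted = new BooleanQuery();
restricted.add(userQuery, BooleanClause.Occur.MUST);
restricted.add(new TermQuery(new Term("to", userEmail)),
               BooleanClause.Occur.MUST);

Hits hits = searcher.search(restricted);
```

Because the restriction is added after parsing, query-syntax injection in `userInput` cannot escape the e-mail clause.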
yes, that's the
Hi,
I am not sure, so I need your opinion on these two questions:
Is it safe to search an index while it's being optimized by another Java
process?
Is it safe to add documents to an index while it's being optimized by
another Java process?
Hi,
1. Yes, it is safe to search while another process is optimizing or adding
documents to the index.
2. No, you cannot add documents to an index while it is being optimized. You
can only have one instance of IndexWriter working on an index.
HTH
Yes it did, thanks!
-
your time!
Regards,
Joe
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Thanks for your reply Koji! Your suggestion worked fine. I thought
adding a field named "contents" to a document, even though it contains
a field already named "contents" would NOT do anything. But looks like
I am wrong!
Thank you for your kind help! :)
Regards,
Joe
On Mon,
> i.e.
>
> id  text
> 1   User not found
> 3   Address not found
> 4   Fatal error
>
>
> Regards,
> Benzion.
>
>
>
--
Joe Scanlon
jscan...@element115.net
Mobile: 603 459 3242
Office: 312 445 0018
Thanks for the replies. Here is why I need the subreader (or subsearcher in
earlier Lucene versions):
I have multiple collections of documents, say broken out by years (it's more
complex than this, but this illustrates the use case):
Collection1 >>> D:/some folder/2009/*.pdf
> -Original Message-
> From: Devon H. O'Dell [mailto:devon.od...@gmail.com]
> Sent: Tuesday, August 30, 2011 8:04 PM
> To: java-user@lucene.apache.org
> Subject: Re: No subsearcher in Lucene 3.3?
>
> 2011/8/30 Joe MA :
> > When searching a single collection, no proble
Hello,
I'm new to Lucene and I am having some trouble figuring out the right way
to use a SearcherTaxonomyManager for NRT faceted search. Assuming I set up
the STM with a reopen thread:
// Index Writer
Directory indexDir = FSDirectory.open(new File(indexDirectoryPath));
IndexWriterCon
Hello,
I'm attempting to set up a master/slave arrangement between two servers where
the master uses a SearcherTaxonomyManager to index and search, and the slave
is read-only - using just an IndexSearcher and TaxonomyReader.
So far I am able to publish new IndexAndTaxonomyRevisions on the master and
cking the commit data / index
epoch to see if taxonomy directory had been inadvertently replaced.
Thanks again,
Joe
On Fri, Nov 1, 2013 at 12:29 PM, Shai Erera wrote:
> Opened https://issues.apache.org/jira/browse/LUCENE-5320.
>
> Shai
>
>
> On Fri, Nov 1, 2013 at 4:59 PM,
be if input == ILLEGAL_STATE_READER?
Regards,
Joe
r.setReader(Tokenizer.java:89)
at
org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:307)
at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:145)
at LuceneStemmer.stem(LuceneStemmer.java:28)
at LuceneStemmerTest.stem(LuceneStemmerTest.java:16)
Thanks.
Regards,
Joe
On Thu, Mar 20, 2014 at 1:40
ings. There is a second method in Analyzer that
> takes a String to analyze (instead of Reader). This one uses an optimized
> workflow internally.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetap
terminal is
there a specific way it should be run?
Thanks,
Joe Cabrera
I was able to get the demo jars built using Ant and Ivy. It might be a good
idea to include in the documentation a reference to Ant and Ivy and exactly
which targets should be used.
Cheers,
On Tue, Apr 22, 2014 at 8:10 AM, Joe Cabrera wrote:
> Hi. I am trying to run the demo as specified
Greetings,
I am trying to use Lucene to search large documents, and return the pages
where a term (or terms) is matched. For example, say I am indexing 500 auto
manuals, each with around 1,000 pages. So if the user searched for
"Taurus" and "flat" and "tire", a good result could be "2006 Ford Ta
esn't update its
associated StoredField value. What am I missing here?
I would highly appreciate your help!
Regards,
Joe
Hi,
Could anyone help with my issue described below? If I'm not posting on the
right mailing list please direct me to the correct one.
Many thanks,
Joe
On Mon, Jun 12, 2017 at 3:05 PM, Joe Ye wrote:
> Hi,
>
> I have a few NumericDocValuesField fields and also added separate
ociated stored field? Is there
anything similar/equivalent to useDocValuesAsStored in Lucene core? We're
trying to use docValues to avoid a full update (delete + create new)...
Yet, we still need to retrieve the updated values.
Regards,
Joe
On Mon, Jun 19, 2017 at 4:16 PM, Michael McCandless &l
to get the correct
value for the target docId (and I'm not sure why 4 values here)? What did I
miss/do wrong? Could you point me in the right direction (with examples)
please?
Many thanks,
Joe
On Tue, Jun 20, 2017 at 12:14 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
Thanks very much Mike! That's very helpful! I got
MultiDocValues.getNumericValues
to work.
A follow up question: what's the best way/how do I retrieve binaryDocValues?
Regards,
Joe
On Fri, Jun 23, 2017 at 11:00 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> Try
t to check the existence of the document before each docValue
update.
Kind regards,
Joe
sk? If so, what happens if a crash occurs
before those updates are committed?
Many thanks,
Joe
On Tue, Jul 4, 2017 at 10:53 PM, Trejkaz wrote:
> On Tue, 4 Jul 2017 at 22:39, Joe Ye wrote:
>
> > Hi,
> >
> > I'm using Lucene core 6.6.
> >
Greetings,
I have an index where I import documents such as PowerPoint, PDF, and so forth.
One nice feature I added is that for each document, I store a thumbnail of the
first page as an encoded String (uuencode) using a stored, not-indexed field.
This thumbnail gets displayed when the user fi
Greetings,
Has anyone looked into using Redis or some other in-memory cache with Lucene?
It seems that ElasticSearch may do this. Are there advantages to doing this
versus, say, the RAMDirectory class?
Thanks in advance,
J
Greetings,
I would like to search for items based on 'calculated' terms.
Specifically, say I am using Lucene to search a collection of tasks, with
fields "start_date" and "end_date", among others.
The question to solve is:
"Find all tasks that took longer than 100 days".
So the easy answer
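One common way out is to compute the derived value once at index time and store it as its own field. A minimal sketch, using only java.time; the class, method, and field names are assumptions:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class TaskDuration {
    // Compute the derived "duration in days" once, at index time,
    // from the task's ISO-formatted start_date and end_date values.
    public static long durationDays(String startDate, String endDate) {
        return ChronoUnit.DAYS.between(
                LocalDate.parse(startDate), LocalDate.parse(endDate));
    }
}
```

The computed value can then be indexed as a numeric field (say, a hypothetical "duration_days") and "took longer than 100 days" becomes an ordinary range query, with no calculation needed at search time.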
r.open(IndexReader.java:141)
> at org.apache.lucene.index.IndexReader.open(IndexReader.java:136)
Looks like you don't have permission to access the Lucene index file(s).
--
Joe Attardi
[EMAIL PROTECTED]
http://thinksincode.blogspot.com/
ex - do you get the same
error then?
Uttam,
To unsubscribe you need to send an email to
[EMAIL PROTECTED]
On 10/9/07, Barik, Uttam <[EMAIL PROTECTED]> wrote:
>
>
>
>
> Regards,
> Uttam Kumar Barik
>
> IT Services
> Fidelity Bus
Hello everybody,
I know tons of words have been written about this issue, but I'm just not
clear enough about it.
I have these facts:
1. My query is always one letter and *, e.g. M*
2. I always want to get at most 200 results, no more!
3. I don't want to fix this issue by setting maxClauseCount
I jus
arch, Mathematical Sciences Department
> IBM T.J. Watson Research Center
> (914) 945-2472
> http://www.research.ibm.com/people/g/donnagresh
> [EMAIL PROTECTED]
>
>
> "Joe K" <[EMAIL PROTECTED]> wrote on 04/10/2008 08:53:06 AM:
>
> > Hello everybody,
> >
> > Hello everybody,
> > I know there was written a tons of words about
bits.set(termDocs.doc());
>}
>} else {
>break;
>}
>} while (enumerator.next());
>} finally {
> termDocs.close();
>enumerator.close();
>
ts in the index, although the content of those documents will be
substantially less. I can also do this in one index and not search
indexes ParallelReader-style. What are people's gut feelings on how
this approach will impact the indexing and search performance in terms
of both speed and memor
in the worst case (my
second example) searches are 50% slower, but in almost all other cases
they're quite a bit faster.
Hope this helps,
Joe
nges to the Lucene core?
Thanks,
Joe
posix_fadvise (fd, 0, 0, POSIX_FADV_SEQUENTIAL);
// tell the kernel you will need the whole file.
posix_fadvise (fd, 0, 0, POSIX_FADV_WILLNEED);
I don't know offhand if Java binds these APIs, though.
Joe
---
Hi,
Benson Margulies wrote:
My experience tonight is that the stock 1.9-based Luke won't open my 2.0
indices. So I fixed up a version of the source.
I've been seeing this too.
Anyone else want it?
That would be great, if you don't mind. A jar would be ni
My work is to index keywords from a document. In my case, the document is
made up of HTML tags, which I don't want to index.
For example:
Input Document:
You are welcome
Testing text
Expected Keywords:
keywords:You
keywords:are
keywords:welcome
keywords:Testing
keywords:text
Is there
y index
and set its bit.
It probably doesn't scale to millions of matches, but it scales pretty
well to tens of thousands. I'd suggest breaking down into smaller
indexes if you can, and running this process across each of them
My task is to index lots of documents with different fields. Some of the
fields are tokenized and are going to be sorted on later, when a result
list needs to be sorted by a particular field. Unfortunately, Lucene
complains about sorting on a tokenized field.
So is there any way to get around it?
Thank
that field? There's no good
> *general* answer that I've been able to see.
>
> So I suspect you really want to do something that's not
> document sorting, and if you'd make a clearer statement of
> what you're trying to accomplish I'm sure you'd ge
few times on the various lists.
What constitutes warming up a searcher? Simply running a dummy query?
Joe
Greetings,
I would like to include the number of possible hits in my queries, for example,
"found 18 hits out of a possible 245,000 documents". I am assuming that
IndexReader.numDocs() is the best way to get this value.
However, I would like to use a filter as part of the query. What is the
most e
een by a reader on
the same index (flush may happen only after the add)."
Thanks for your help..
Hi Erick,
I'm guessing that your problem is what gets indexed. What analyzer
are you using when indexing? One that breaks words apart on, say,
periods?
I am using the StandardAnalyzer. When I do a test query using Luke, it
returns the object I'm looking for. The query I use is:
id:"com.mycomp
om
working?
Thanks...
On 7/3/07, Joe Attardi <[EMAIL PROTECTED]> wrote:
Hi Erick,
I'm guessing that your problem is what gets indexed. What analyzer
> are you using when indexing? One that breaks words apart on, say
Hi Chris,
That did it! Thanks for the help. I should have read the javadocs for
Field.Index more closely!
Thanks to everyone else for their input too.
On 7/3/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
It sounds lik
I don't want to get ahead of myself. One at
a time... :)
Appreciate any help you all might have!
ored/not indexed ("Joe's Devices")? Or
can I accomplish this case-insensitive "contains" search some other way -
would I have to write a custom Analyzer, or something?
Thanks in advance!
>
> It does sound very strange to me, to default to a WildCardQuery! Suppose I
> am looking for "bold", I am getting hits for "old".
I know - but that's what the requirements dictate. A better example might be
a MAC or IP address, where someone might be searching for a string in the
middle - like,
Following up on my recent question. It has been suggested to me that I can
run the query text through an Analyzer without using the QueryParser. For
example, if I know what field to be searched I can create a PrefixQuery or
WildcardQuery, but still want to process the search text with the same
Anal
So then would I just concatenate the tokens together to form the query text?
On 7/30/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> Would this work?
>
> TokenStream ts = StandardAnalyzer.tokenStream();
should I just index
a MAC address as UN_TOKENIZED ?
Thanks
On 7/30/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
>
>
> >
> > So then would I just concatenate the tokens together to form
> > the query te
You are probably using the StandardAnalyzer which removes stop words such as
"and".
On 8/1/07, masz-wow <[EMAIL PROTECTED]> wrote:
>
>
> I understand that only document that has been indexed will be able
ken per octet ("00", "17", "fd", "14",
"d3", "2a"). Many searches will be for partial IPs or MACs ("192.168",
"00:17:fd", etc).
Are either of these methods of indexing the addresses (single token vs
per-octet token)
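The per-octet option can be sketched as a trivial splitter (the class name is an assumption; in practice this would sit inside a custom Tokenizer):

```java
import java.util.Arrays;
import java.util.List;

public class AddressOctets {
    // Split an IP or MAC address into one token per octet, so a
    // partial address like "00:17:fd" can be matched as a sequence
    // of whole terms instead of a leading-wildcard query.
    public static List<String> octets(String address) {
        return Arrays.asList(address.split("[.:]"));
    }
}
```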
Hi Erick,
First, consider using your own analyzer and/or breaking the IP addresses
> up by substituting ' ' for '.' upon input.
Do you mean breaking the IP up into one token for each segment, like ["192",
"168", "1", "100"] ?
> But on to your question. Please post what you mean by
> "a large n
On 8/1/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> Use a SpanNearQuery with a slop of 0 and specify true for ordering.
> What that will do is require that the segments you specify must appear
> in order with no gaps. You have to construct this yourself since there's
> no support for SpanQueri
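Assuming the address was indexed one token per octet as discussed above, Erick's suggestion can be sketched like this (the field name "ip" is an assumption):

```
// One SpanTermQuery per octet of the partial address "192.168"...
SpanQuery[] octets = new SpanQuery[] {
    new SpanTermQuery(new Term("ip", "192")),
    new SpanTermQuery(new Term("ip", "168"))
};
// ...combined with slop 0 and inOrder=true: the octets must appear
// adjacent and in this order anywhere in the field.
SpanNearQuery partialIp = new SpanNearQuery(octets, 0, true);
```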
Hello,
I've been asked to devise some way to discover and correct data in Lucene
indexes that have been "corrupted." The word "corrupt", in this case, has a
few different meanings, some of which strike me as exceedingly difficult to
grok. What concerns me are the cases where we don't know that
We're planning on using encryption at the filesystem level (whole-disk
encryption) and, to be honest, I don't have a mechanism that can produce the
changes I'm talking about. Neither does my boss, unfortunately ;) He came
along one day and asked, "how do we know when data changed on disk without
Problem:
I can add one or multiple TermQuerys to the BooleanQuery for searching, and I
am getting Hits when I perform the search on various indexes. If I add a
PhraseQuery to the BooleanQuery on a search, I get zero hits.
Some background information:
Indexing using StandardAnalyzer.
I
you need to specify it from the command line,
i.e., java org.apache.lucene.demo.IndexFile 'type in your starting directory
here'
On 3/14/06, Miki Sun <[EMAIL PROTECTED]> wrote:
>
> I think I did. I modified these code:
>
> // create a directory to write the indices to
> static final File INDEX_DIR =
All-
We are looking for someone with search experience (we leverage Lucene)
to lead a small team of developers as described below. If you are
interested, send your resume to [EMAIL PROTECTED] Thanks.
Joe
Job Title: Technical Lead/Engineering Manager - Ariba Content
Summary:
Ariba
I'm trying to do a search on ( Java PHP C++ ) with
Lucene 1.9. I am using a MultiFieldQueryParser to
parse with StandardAnalyzer. Before I parse the string
I clean up the search string and it looks like this (
Java PHP C\+\+ ). The query is only searching on "c"
and not "c++". Any ideas as to what
Hi Clive,
Lucene is a general purpose search engine. If you need crawling
capabilities on top of Lucene take a look at Nutch:
http://lucene.apache.org/nutch/
On 6/29/06, Clive. <[EMAIL PROTECTED]> wrote:
Hi,
I am working on adding a search feature to a web site that uses single
database dri
Lucene uses this lock to ensure the index does not become
corrupt when IndexReaders and IndexWriters are working on the same index.
What are the conditions that cause corruption? If there is just one
writer and multiple readers, is that safe?
---