Re: high memory usage by indexreader

2013-03-22 Thread Ian Lea
I did ask if there was anything else relevant you'd forgotten to mention ...

How fast are general file operations on the NFS files?  Your times are
still extremely long and my guess is that your network/NFS setup is
to blame.

Can you run your code on the server that is exporting the index, if
only for comparison?

Your attachment didn't make it to the list.  In this context, any
sample code that is too big to cut and paste into an email message is
too big anyway.  If necessary cut it down to a trivial example.

But verify performance against a local index first.


--
Ian.


On Thu, Mar 21, 2013 at 10:37 PM, ash nix  wrote:
> Hi Ian,
>
> Thanks for your reply.
> The index is on NFS and there is no storage local or near to the machine.
> The operating system is CentOS 6.3 with Linux 2.6. The machine has 16 GB of memory.
> By initializing the IndexReader, I mean opening the IndexReader.
>
> I timed my operations using System.currentTimeMillis and ran the
> process a couple of times.
> Opening the IndexReader takes 1.5 minutes at minimum and 2.5 minutes at
> max.
> A boolean AND query of 2-4 terms took 56 seconds on average.
>
> Apart from that, I found a major bottleneck in my process (the updateScores call).
>
> Do the IndexReader open time and search time look okay to you?
> My dataset is going to grow and there will be a lot more documents
> with more fields.
> I am attaching the code which performs the search.
>
> Thanks,
> Ashwin
>
>
> On Thu, Mar 21, 2013 at 6:43 AM, Ian Lea  wrote:
>> That number of docs is far more than I've ever worked with but I'm
>> still surprised it takes 4 minutes to initialize an index reader.
>>
>> What exactly do you mean by initialization?  Show us the code that
>> takes 4 minutes.
>>
>> What version of lucene?  What OS?  What disks?
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Mar 20, 2013 at 6:21 PM, ash nix  wrote:
>>> Thanks Ian.
>>>
>>> The number of documents in the index is 381,153,828.
>>> The data set size is 1.9TB.
>>> The index size of this dataset is 290G. It is a single index.
>>> The following fields are indexed for each document:
>>>
>>> 1. Document id: a StoredField, generally around 128 chars or
>>> more.
>>> 2. Text: a TextField, not stored.
>>> 3. Title: a TextField, not stored.
>>> 4. Anchor: a TextField, not stored.
>>> 5. Timestamp: a DoubleDocValuesField, not stored. Actually this
>>> should be a DoubleField and I need to fix it.
>>>
>>> Initialization of the IndexReader at the start of search takes approximately 4
>>> min.
>>> After initialization, I execute a series of boolean AND queries
>>> of 2-3 terms. Each search result is dumped, with some information on
>>> score and doc id, to an output file.
>>>
>>> The resident size (RES) of the process is 1.7 GB.
>>> The total virtual memory (VIRT) is 307 GB.
>>>
>>> Do you think this is okay?
>>> Do you think I should use Solr instead of Lucene core?
>>>
>>> I have timestamps for the documents, so I can split into multiple
>>> indexes sorted chronologically.
>>>
>>> Thanks,
>>> Ashwin
>>>
>>> On Wed, Mar 20, 2013 at 1:43 PM, Ian Lea  wrote:
 Searching doesn't usually use that much memory, even on large indexes.

 What version of lucene are you on?  How many docs in the index?  What
 does a slow query look like (q.toString()) and what search method are
 you calling?  Anything else relevant you forgot to tell us?


 Or google "lucene sharding" if you are determined to split the index.


 --
 Ian.


 On Wed, Mar 20, 2013 at 5:12 PM, ash nix  wrote:
> Hi Everybody,
>
> I have created a single compound index which is 250 GB in size.
> I open a single IndexReader to search simple boolean queries.
> The process is consuming a lot of memory and search is painfully slow.
>
> It seems that I will have to create multiple indexes and have multiple
> index readers.
> Can anyone suggest a good blog or documentation on creating multiple
> indexes and performing parallel search?
>
> --
> Thanks,
> A
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>


>>>
>>>
>>>
>>> --
>>> Thanks,
>>> A
>>>
>>>
>>

question about document-frequency in score

2013-03-22 Thread Nicole Lacoste
Hi

I am trying to figure out if the document-frequency of a term is used in
calculating the score.  Is it per field?  Or is it independent of the field?

Thanks

Niki



Multi-value fields in Lucene 4.1

2013-03-22 Thread Chris Bamford
Hi,

If I index several similar values in a multivalued field (e.g. many authors to 
one book), is there any way to know which of these matched during a query?
e.g.

  Book "The art of Stuff", with authors "Bob Thingummy" and "Belinda Bootstrap"

If we queried for +(author:Be*) and matched this document, is there a way of 
drilling down and identifying the specific sub-field that actually triggered 
the match ("Belinda Bootstrap") ?  I was wondering what the lowest granularity 
of matching actually is - document / field / sub-field ...

I am happy to index with term vectors and positions if it helps.

Thanks,

- Chris


Re: PayloadFunctions don't work the same since 4.1

2013-03-22 Thread jimtronic
Thanks for the response. I wrote some new custom payload functions to verify
that I'm getting the value correctly and I think I am, but I did unearth
this clue.

In the docs below, the score should be the sum of all the payloads for the
term "bing". It appears to be using the value for the first term/payload it
sees for every term it finds.

  {
"id":"3",
"foo_ap":["bing|7 bing|9",
  "bing|9 bing|7"],
"score":28.0},
  {
"id":"2",
"foo_ap":["bing|9",
  "bing|7"],
"score":18.0},
  {
"id":"4",
"foo_ap":["bing|9 bing|7"],
"score":18.0},
  {
"id":"1",
"foo_ap":["bing|9"],
"score":9.0}
]

Now, the question is whether this is a storage problem or a retrieval
problem...




--
View this message in context: 
http://lucene.472066.n3.nabble.com/PayloadFunctions-don-t-work-the-same-since-4-1-tp4049947p404.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: question about document-frequency in score

2013-03-22 Thread Simon Willnauer
All statistics in Lucene are per field, and so is document frequency.
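To see what per-field means for scoring: the default TFIDFSimilarity in 4.x computes idf as 1 + ln(numDocs / (docFreq + 1)), using the docFreq of the term in that particular field. A toy sketch with made-up frequencies:

```java
public class IdfDemo {
    // Default Lucene 4.x TFIDFSimilarity idf formula.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 1000;
        // The same term can be rare in one field and common in another;
        // because docFreq is tracked per field, its idf differs per field.
        System.out.println("idf(title, df=5):   " + idf(5, numDocs));
        System.out.println("idf(body,  df=400): " + idf(400, numDocs));
    }
}
```

So a term that is rare in the title field but common in the body field boosts title matches more, even within one query.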

simon

On Fri, Mar 22, 2013 at 10:48 AM, Nicole Lacoste  wrote:
> Hi
>
> I am trying to figure out if the document-frequency of a term is used in
> calculating the score.  Is it per field?  Or is it independent of the field?
>
> Thanks
>
> Niki
>




Re: Getting documents from suggestions

2013-03-22 Thread Bratislav Stojanovic
OK, I've played with all these solutions and basically only one gave me
satisfying results. Using build()
with a TermFreqPayload argument gave me horrible performance, because it
takes more than 5 mins
to iterate through all terms in the index and filter them based on the
doc id. Not sure if this nested
loop can be further optimized, but my index is barely 30MB and I have
around 300K terms.

It turns out that Jack Krupansky's answer was the way to go. I built an
AnalyzingSuggester using a
LuceneDictionary, which is really fast, and then filter the suggestions further
by issuing a query to
the index. Here's the code in case anyone is interested:

// generate analyzing suggestions, reusing the existing analyzer
this.as = new AnalyzingSuggester(analyzer);

as.load(new FileInputStream(new File(suggsPath)));
if (as.sizeInBytes() == 0) {
    logger.info("Building analyzer suggester...");
    as.build(new LuceneDictionary(reader, "contents"));
    as.store(new FileOutputStream(new File(suggsPath)));
}

// now, in the servlet, fire a query for each suggestion
List<LookupResult> suggs = as.lookup(q, false, 10); // do not pass true as the second param!
logger.info("Found " + suggs.size() + " suggestions");
List<LookupResult> filtered = new ArrayList<LookupResult>();
for (LookupResult sug : suggs) {
    if (searchSugg(sug.key.toString(), uid)) {
        filtered.add(sug);
    }
}
logger.info("Found " + filtered.size() + " filtered suggestions");
-

public boolean searchSugg(String q, long uid) {
    ...
    if (q == null) {
        logger.warn("Query is null");
        return false;
    }
    if (q.isEmpty()) {
        logger.warn("Query is empty");
        return false;
    }
    Date start = new Date();
    String qStr = q.trim();
    //Query query = parser.parse(qStr);
    BooleanQuery query = new BooleanQuery();
    query.add(new BooleanClause(new TermQuery(new Term("contents", qStr)),
        BooleanClause.Occur.MUST));
    BytesRef ref = new BytesRef();
    NumericUtils.longToPrefixCoded(uid, 0, ref);
    query.add(new BooleanClause(new TermQuery(new Term("userid", ref)),
        BooleanClause.Occur.MUST));
    logger.info("Searching for: " + query.toString("contents"));

    TopDocs results = searcher.search(query, 1);
    ScoreDoc[] hits = results.scoreDocs;

    int numTotalHits = results.totalHits;
    logger.info(numTotalHits + " total matching documents");
    Date end = new Date();
    long qTime = end.getTime() - start.getTime();
    logger.info("Search took " + qTime + " ms");

    return numTotalHits > 0;

    ...


On Sat, Mar 16, 2013 at 8:54 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Sat, Mar 16, 2013 at 7:47 AM, Bratislav Stojanovic
>  wrote:
> > Hey Mike,
> >
> > Is this what I should be looking at?
> >
> https://builds.apache.org/job/Lucene-Artifacts-trunk/javadoc/suggest/org/apache/lucene/search/suggest/analyzing/package-summary.html
> >
> > Not sure how to call build(), i.e. what to pass as a parameter...Any
> > examples?
> > Where to specify my payload (which is "id" long field from the index)?
>
> build() takes a TermFreqPayload iterator, which iterates over the
> weight/input text/payload that you provide.
>
> Have a look at AnalyzingSuggesterTest, eg testKeywordWithPayloads.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
>


-- 
Bratislav Stojanovic, M.Sc.


Re: Multi-value fields in Lucene 4.1

2013-03-22 Thread Jack Krupansky
I don't think there is a way of identifying which of the values of a
multivalued field matched. But... I haven't checked the code to be
absolutely certain whether there isn't some expert way.

Also, realize that multiple values could match, such as if you queried for
"B*".


-- Jack Krupansky

-Original Message- 
From: Chris Bamford

Sent: Friday, March 22, 2013 5:57 AM
To: java-user@lucene.apache.org
Subject: Multi-value fields in Lucene 4.1

Hi,

If I index several similar values in a multivalued field (e.g. many authors 
to one book), is there any way to know which of these matched during a 
query?

e.g.

 Book "The art of Stuff", with authors "Bob Thingummy" and "Belinda 
Bootstrap"


If we queried for +(author:Be*) and matched this document, is there a way of 
drilling down and identifying the specific sub-field that actually triggered 
the match ("Belinda Bootstrap") ?  I was wondering what the lowest 
granularity of matching actually is - document / field / sub-field ...


I am happy to index with term vectors and positions if it helps.

Thanks,

- Chris 






Re: PayloadFunctions don't work the same since 4.1

2013-03-22 Thread Duke DAI
Most likely, the cause is what I said. I guess that when you convert the
payload bytes to a number you didn't use payload.offset to locate the right
start of the bytes. Before 4.1, the payload happened to start at offset 0.
Since 4.1, you must use the offset and length to get the correct bytes.
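To make that concrete, here is a small self-contained sketch (plain Java, no Lucene classes needed; decodeInt mirrors PayloadHelper's big-endian decoding, and the shared-buffer layout is a made-up example):

```java
public class PayloadOffsetDemo {
    // Big-endian int decoding, mirroring PayloadHelper's byte order.
    static int decodeInt(byte[] bytes, int offset) {
        return ((bytes[offset]     & 0xFF) << 24)
             | ((bytes[offset + 1] & 0xFF) << 16)
             | ((bytes[offset + 2] & 0xFF) << 8)
             |  (bytes[offset + 3] & 0xFF);
    }

    public static void main(String[] args) {
        // Since 4.1 a payload may sit inside a shared buffer: here the
        // first four bytes belong to a different payload (value 9) and
        // "our" payload (value 7) starts at offset 4.
        byte[] sharedBuffer = {0, 0, 0, 9, 0, 0, 0, 7};
        int payloadOffset = 4;

        int wrong = decodeInt(sharedBuffer, 0);             // ignores the offset
        int right = decodeInt(sharedBuffer, payloadOffset); // honors the offset
        System.out.println("wrong=" + wrong + ", right=" + right);
    }
}
```

Reading from index 0 always returns the first payload in the buffer, which matches the "value of the first term/payload repeated" symptom described above.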

Best regards,
Duke
If not now, when? If not me, who?


On Fri, Mar 22, 2013 at 6:37 PM, jimtronic  wrote:

> Thanks for the response. I wrote some new custom payload functions to
> verify
> that I'm getting the value correctly and I think I am, but I did unearth
> this clue.
>
> In the docs below, the score should be the sum of all the payloads for the
> term "bing". It appears to be using the value for the first term/payload it
> sees for every term it finds.
>
>   {
> "id":"3",
> "foo_ap":["bing|7 bing|9",
>   "bing|9 bing|7"],
> "score":28.0},
>   {
> "id":"2",
> "foo_ap":["bing|9",
>   "bing|7"],
> "score":18.0},
>   {
> "id":"4",
> "foo_ap":["bing|9 bing|7"],
> "score":18.0},
>   {
> "id":"1",
> "foo_ap":["bing|9"],
> "score":9.0}
> ]
>
> Now, the question is whether this is a storage problem or a retrieval
> problem...
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/PayloadFunctions-don-t-work-the-same-since-4-1-tp4049947p404.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
>


Lucene reliability as primary store

2013-03-22 Thread Pablo Guerrero
Hi all,

I'm evaluating using Lucene for some data that would not be stored anywhere
else, and I'm concerned about reliability. Having a database store the
data in addition to Lucene would be a problem, so I want to know if Lucene
is reliable enough.

Reading this article,
http://blog.mikemccandless.com/2012/03/transactional-lucene.html I think
that all committed data would be safe (at least as safe as in, for example,
MySQL on the same machine) in the event of JVM crash or system crash. Is
that true?

As an example, if I have an index with some data already committed, A, and
the JVM crashes during a commit of data B, could the index be corrupted, or
will it just ignore B? If it's corrupted, will CheckIndex be able to recover
at least all the data in A? Will this also hold in the case of a power
shutdown, where the OS buffers are lost but there is no disk corruption?

Thank you in advance,
Pablo


Re: Multi-value fields in Lucene 4.1

2013-03-22 Thread Michael McCandless
You might be able to get close if you use PostingsHighlighter: it
tells you the offset of each matched Passage, and you can correlate
that to which field value (assuming you stored the multi-valued
fields).

You must index offsets into your postings.

But there are caveats ... if you use positional queries,
PostingsHighlighter will find highlights that didn't necessarily match
the query ... and if you use MultiTermQueries (author:Be*) you have to
pre-rewrite this otherwise PH won't highlight the terms ...

Mike McCandless

http://blog.mikemccandless.com

On Fri, Mar 22, 2013 at 5:57 AM, Chris Bamford
 wrote:
> Hi,
>
> If I index several similar values in a multivalued field (e.g. many authors 
> to one book), is there any way to know which of these matched during a query?
> e.g.
>
>   Book "The art of Stuff", with authors "Bob Thingummy" and "Belinda 
> Bootstrap"
>
> If we queried for +(author:Be*) and matched this document, is there a way of 
> drilling down and identifying the specific sub-field that actually triggered 
> the match ("Belinda Bootstrap") ?  I was wondering what the lowest 
> granularity of matching actually is - document / field / sub-field ...
>
> I am happy to index with term vectors and positions if it helps.
>
> Thanks,
>
> - Chris




Field.Index deprecation ?

2013-03-22 Thread jeffthorne
I am new to Lucene and going through the Lucene in Action 2nd edition book. I
have a quick question on the best way to add fields to a document now that
Field.Index is deprecated.

Here is what I am doing, and what most examples online suggest:

doc.add(new Field("id", dbID, Store.YES,
Field.Index.NOT_ANALYZED_NO_NORMS));

What is the new recommended way to set index properties on fields now that
Field.Index is going away? I can't seem to find anything online.

Thanks for the help,
Jeff






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-Index-deprecation-tp4050068.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Accent insensitive analyzer

2013-03-22 Thread Jerome Blouin
Hello,

I'm looking for an analyzer that allows performing accent-insensitive search in
Latin languages. I'm currently using the StandardAnalyzer but it doesn't
fulfill this need. Could you please point me to the one I need to use? I've
checked the javadoc for the various analyzer packages but can't find one. Do I
need to implement my own analyzer?

Regards,
Jerome



Re: Field.Index deprecation ?

2013-03-22 Thread Michael McCandless
We badly need Lucene in Action 3rd edition!

The easiest approach is to use one of the new XXXField classes under
oal.document, eg StringField for your example.

If none of the existing XXXFields "fit", you can make a custom
FieldType, tweak all of its settings, and then create a Field from
that.
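For the "id" field above, that looks roughly like this (a sketch against the Lucene 4.x API; StringField is indexed as a single un-analyzed token with norms omitted, which matches NOT_ANALYZED_NO_NORMS):

```java
// Simple case: StringField = indexed, not analyzed, norms omitted
doc.add(new StringField("id", dbID, Field.Store.YES));

// Or spelled out with a custom FieldType, if you need to tweak settings:
FieldType ft = new FieldType();
ft.setStored(true);
ft.setIndexed(true);
ft.setTokenized(false);  // not analyzed
ft.setOmitNorms(true);   // no norms
ft.freeze();             // optional: make the type immutable
doc.add(new Field("id", dbID, ft));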

Mike McCandless

http://blog.mikemccandless.com

On Fri, Mar 22, 2013 at 11:22 AM, jeffthorne  wrote:
> I am new to Lucene and going through the Lucene in Action 2nd edition book. I
> have a quick question on the best way to add fields to a document now that
> Field.Index is deprecated.
>
> Here is what I am doing and what most example online suggest:
>
> doc.add(new Field("id", dbID, Store.YES,
> Field.Index.NOT_ANALYZED_NO_NORMS));
>
> What is the new recommended way to set Index properties on Fields with
> Field.Index going away? Can't seem to find anything online.
>
> Thanks for the help,
> Jeff
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Field-Index-deprecation-tp4050068.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>




Re: Accent insensitive analyzer

2013-03-22 Thread Jack Krupansky

Try the ASCIIFoldingFilter:
https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html
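If you just want to see the underlying idea, the JDK alone can approximate it: Unicode-decompose the text, then strip the combining marks. (ASCIIFoldingFilter does far more via hand-built character mappings; this sketch, with a hypothetical foldAccents helper, only shows the gist.)

```java
import java.text.Normalizer;

public class AccentFold {
    // Decompose to NFD ('é' becomes 'e' + combining acute accent),
    // then strip the combining marks (Unicode category M).
    static String foldAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(foldAccents("café crème naïve")); // cafe creme naive
    }
}
```

In Lucene you would do this folding inside the analysis chain rather than on raw strings, so the same transformation is applied at index time and at query time.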

-- Jack Krupansky

-Original Message- 
From: Jerome Blouin

Sent: Friday, March 22, 2013 12:22 PM
To: java-user@lucene.apache.org
Subject: Accent insensitive analyzer

Hello,

I'm looking for an analyzer that allows performing accent insensitive search 
in latin languages. I'm currently using the StandardAnalyzer but it doesn't 
fulfill this need. Could you please point me to the one I need to use? I've 
checked the javadoc for the various analyzer packages but can't find one. Do 
I need to implement my own analyzer?


Regards,
Jerome





RE: Accent insensitive analyzer

2013-03-22 Thread Jerome Blouin
I understand that I can't configure it on an analyzer, so on which class can I
apply it?

Thanks,
Jerome

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Friday, March 22, 2013 12:38 PM
To: java-user@lucene.apache.org
Subject: Re: Accent insensitive analyzer

Try the ASCIIFoldingFilter:
https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html

-- Jack Krupansky

-Original Message-
From: Jerome Blouin
Sent: Friday, March 22, 2013 12:22 PM
To: java-user@lucene.apache.org
Subject: Accent insensitive analyzer

Hello,

I'm looking for an analyzer that allows performing accent insensitive search in 
latin languages. I'm currently using the StandardAnalyzer but it doesn't 
fulfill this need. Could you please point me to the one I need to use? I've 
checked the javadoc for the various analyzer packages but can't find one. Do I 
need to implement my own analyzer?

Regards,
Jerome







Re: Accent insensitive analyzer

2013-03-22 Thread Jack Krupansky

Start with the StandardTokenizer:
https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html

-- Jack Krupansky

-Original Message- 
From: Jerome Blouin

Sent: Friday, March 22, 2013 12:53 PM
To: java-user@lucene.apache.org
Subject: RE: Accent insensitive analyzer

I understand that I can't configure it on an analyzer so on which class can 
I apply it?


Thank,
Jerome

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, March 22, 2013 12:38 PM
To: java-user@lucene.apache.org
Subject: Re: Accent insensitive analyzer

Try the ASCIIFoldingFilter:
https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html

-- Jack Krupansky

-Original Message-
From: Jerome Blouin
Sent: Friday, March 22, 2013 12:22 PM
To: java-user@lucene.apache.org
Subject: Accent insensitive analyzer

Hello,

I'm looking for an analyzer that allows performing accent insensitive search 
in latin languages. I'm currently using the StandardAnalyzer but it doesn't 
fulfill this need. Could you please point me to the one I need to use? I've 
checked the javadoc for the various analyzer packages but can't find one. Do 
I need to implement my own analyzer?


Regards,
Jerome










Segment file clean-up and codecs

2013-03-22 Thread Ravikumar Govindarajan
Most of us writing custom codecs use the segment name as a handle and push data
to a different storage backend.

Would it be possible to get a hook in the codec APIs for when obsolete segment
files are cleaned up after merges?

Currently, this is always implemented as a hack.

--
Ravi


Re: Accent insensitive analyzer

2013-03-22 Thread SUJIT PAL
Hi Jerome,

How about this one?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory

Regards,
Sujit

On Mar 22, 2013, at 9:22 AM, Jerome Blouin wrote:

> Hello,
> 
> I'm looking for an analyzer that allows performing accent insensitive search 
> in latin languages. I'm currently using the StandardAnalyzer but it doesn't 
> fulfill this need. Could you please point me to the one I need to use? I've 
> checked the javadoc for the various analyzer packages but can't find one. Do 
> I need to implement my own analyzer?
> 
> Regards,
> Jerome
> 





RE: Accent insensitive analyzer

2013-03-22 Thread Jerome Blouin
Thanks. I'll check that later.

-Original Message-
From: Sujit Pal [mailto:sujitatgt...@gmail.com] On Behalf Of SUJIT PAL
Sent: Friday, March 22, 2013 2:52 PM
To: java-user@lucene.apache.org
Subject: Re: Accent insensitive analyzer

Hi Jerome,

How about this one?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory

Regards,
Sujit

On Mar 22, 2013, at 9:22 AM, Jerome Blouin wrote:

> Hello,
> 
> I'm looking for an analyzer that allows performing accent insensitive search 
> in latin languages. I'm currently using the StandardAnalyzer but it doesn't 
> fulfill this need. Could you please point me to the one I need to use? I've 
> checked the javadoc for the various analyzer packages but can't find one. Do 
> I need to implement my own analyzer?
> 
> Regards,
> Jerome
> 







Re: Segment file clean-up and codecs

2013-03-22 Thread Simon Willnauer
can you send this to d...@lucene.apache.org?

simon

On Fri, Mar 22, 2013 at 7:52 PM, Ravikumar Govindarajan
 wrote:
> Most of us, writing custom codec use segment-name as a handle and push data
> to a different storage
>
> Would it be possible to get a hook in the codec APIs, when obsolete segment
> files are cleaned up after merges?
>
> Currently, this is always implemented as a hack
>
> --
> Ravi




Re: Lucene reliability as primary store

2013-03-22 Thread Simon Willnauer
On Fri, Mar 22, 2013 at 2:00 PM, Pablo Guerrero  wrote:
> Hi all,
>
> I'm evaluating using Lucene for some data that would not be stored anywhere
> else, and I'm concerned about reliabilty. Having a database storing the
> data in addition to Lucene would be a problem, and I want to know if Lucene
> is reliable enough.
>
> Reading this article,
> http://blog.mikemccandless.com/2012/03/transactional-lucene.html I think
> that all committed data would be safe (at least as safe as in, for example,
> MySQL on the same machine) in the event of JVM crash or system crash. Is
> that true?

Yes, that is true. Yet a commit in Lucene is still pretty expensive;
apps like ElasticSearch or Solr use a journal / transaction log to
overcome this.

>
> As an example, if I have an index with some data already committed, A, and
> the JVM crashes during a commit of data B, could the index be corrupted, or
> will just ignore B? If it's corrupted, will CheckIndex be able to recover,
> at least all data in A? Will it be also true in the case of a power
> shutdown, where the OS buffers are lost, but there is no disk corruption?

Unless there is a bug, the index will not be corrupted and B is
ignored / lost. CheckIndex will not be able to recover your lost docs;
it will only delete broken segments if you ask it to do so. Once you
commit and Lucene has returned successfully you should also survive a
power outage. If your disk is broken then your index will likely be
broken too.

simon
>
> Thank you in advance,
> Pablo




Re: Field.Index deprecation ?

2013-03-22 Thread Simon Willnauer
On Fri, Mar 22, 2013 at 5:28 PM, Michael McCandless
 wrote:
> We badly need Lucene in Action 3rd edition!
go mike go!!!

;)
>
> The easiest approach is to use one of the new XXXField classes under
> oal.document, eg StringField for your example.
>
> If none of the existing XXXFields "fit", you can make a custom
> FieldType, tweak all of its settings, and then create a Field from
> that.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Mar 22, 2013 at 11:22 AM, jeffthorne  wrote:
>> I am new to Lucene and going through the Lucene in Action 2nd edition book. I
>> have a quick question on the best way to add fields to a document now that
>> Field.Index is deprecated.
>>
>> Here is what I am doing and what most example online suggest:
>>
>> doc.add(new Field("id", dbID, Store.YES,
>> Field.Index.NOT_ANALYZED_NO_NORMS));
>>
>> What is the new recommended way to set Index properties on Fields with
>> Field.Index going away? Can't seem to find anything online.
>>
>> Thanks for the help,
>> Jeff
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Field-Index-deprecation-tp4050068.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>
>




Re: Field.Index deprecation ?

2013-03-22 Thread Michael McCandless
On Fri, Mar 22, 2013 at 3:14 PM, Simon Willnauer
 wrote:
> On Fri, Mar 22, 2013 at 5:28 PM, Michael McCandless
>  wrote:
>> We badly need Lucene in Action 3rd edition!
> go mike go!!!

Once is enough :)

Mike McCandless

http://blog.mikemccandless.com




RE: Field.Index deprecation ?

2013-03-22 Thread Uwe Schindler
Come on! :-)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Friday, March 22, 2013 9:41 PM
> To: java-user@lucene.apache.org; simon.willna...@gmail.com
> Subject: Re: Field.Index deprecation ?
> 
> On Fri, Mar 22, 2013 at 3:14 PM, Simon Willnauer
>  wrote:
> > On Fri, Mar 22, 2013 at 5:28 PM, Michael McCandless
> >  wrote:
> >> We badly need Lucene in Action 3rd edition!
> > go mike go!!!
> 
> Once is enough :)
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 





RE: Field.Index deprecation ?

2013-03-22 Thread Igal Sapir
+1

I own a copy of 2nd Edition and will gladly purchase 3rd Edition when it's
out.

--
typos, misspels, and other weird words brought to you courtesy of my mobile
device and its auto-(in)correct feature.
On Mar 22, 2013 3:21 PM, "Uwe Schindler"  wrote:

> Come on! :-)
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Michael McCandless [mailto:luc...@mikemccandless.com]
> > Sent: Friday, March 22, 2013 9:41 PM
> > To: java-user@lucene.apache.org; simon.willna...@gmail.com
> > Subject: Re: Field.Index deprecation ?
> >
> > On Fri, Mar 22, 2013 at 3:14 PM, Simon Willnauer
> >  wrote:
> > > On Fri, Mar 22, 2013 at 5:28 PM, Michael McCandless
> > >  wrote:
> > >> We badly need Lucene in Action 3rd edition!
> > > go mike go!!!
> >
> > Once is enough :)
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
>
>
>
>