Re: Lucene vs Glimpse

2013-02-05 Thread Mathias Dahl
Jack, what you say sounds hopeful, but it also sounds like quite some work to define/select the correct analyzer for each type of programming language (we use SQL, PL/SQL, Java and C# mainly). Compare that to what I do now, which is just to throw all files at Glimpse and it makes them searchable in a

RE: Lucene vs Glimpse

2013-02-05 Thread Uwe Schindler
Glimpse seems to use something similar to StandardAnalyzer, so I would give it a try. For program code this should work quite well. To make the auto-phrases work (which might be a good idea here, too), enable this feature in the query parser (I am referring to the comment by Jack about
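A rough sketch of the auto-phrase idea mentioned above: when the analyzer splits a single query word into several tokens, those tokens are searched as a phrase (adjacent, in order) rather than as independent terms. In Lucene's classic query parser this behaviour is switched on with setAutoGeneratePhraseQueries(true). The Python below only imitates the idea; the regex tokenizer is an assumption and does not reproduce StandardAnalyzer's real rules, and all function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Hypothetical stand-in for an analyzer: lowercase runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    # Positional inverted index: term -> {doc_id: [positions]}
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def phrase_search(index, query_word):
    # Auto-phrase: all tokens of one query word must appear consecutively.
    terms = tokenize(query_word)
    if not terms:
        return set()
    hits = set()
    for doc_id, positions in index.get(terms[0], {}).items():
        for start in positions:
            if all(start + i in index.get(t, {}).get(doc_id, ())
                   for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits
```

With this, a query such as customer_order_tab matches only files where the three tokens occur next to each other, not every file that happens to contain "customer", "order" and "tab" somewhere.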

Info on core search classes

2013-02-05 Thread Igor Shalyminov
Hi! I wonder where one can get information about the current Lucene (v 4.1) core search classes - AtomicReader, CompositeReader, ReaderContexts - and how to use them properly for building custom search algorithms. Although Lucene in Action is really good, I can't find anything on these classes

How to implement Lucene

2013-02-05 Thread Álvaro Vargas Quezada
Hello, I want to implement a central index, and I heard about Lucene, so I would like to ask your help to install it and configure it. My OS is Windows 7/XP/Server 2008. If I could index just one database and make a search I would be happy. I would be grateful if you can send me any info about

Re: How to implement Lucene

2013-02-05 Thread Ian Lea
You're probably better off using Solr, which is tightly linked with Lucene. http://lucene.apache.org/solr/ I'm sure there are installation and getting started guides there. -- Ian.

Re: How to implement Lucene

2013-02-05 Thread VIGNESH S
Hi, for the basics of Lucene, such as how to create a Lucene index, look into the book Lucene in Action.

Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

2013-02-05 Thread saisantoshi
I am looking at the formats supported by the newer version of Tika (1.3) and am not sure which version(s) of Microsoft Office it supports (97/2000/2010/2013) for each of the formats below. http://tika.apache.org/1.3/formats.html#Microsoft_Office_document_formats Microsoft Word (also, does it support

Lucene vs RDBMS indexing at scale

2013-02-05 Thread Drew Kutcharian
Hey Guys, I'm trying to figure out what would be a better approach to indexing when it comes to a large number of records (say 1 billion). As far as queries: 1. Only support exact matches (a field is equal to some constant value) or range matches (a field is larger/smaller than some constant

Re: Lucene vs RDBMS indexing at scale

2013-02-05 Thread Stephen Howe
Part of the answer depends on what kind of records you have. For instance, are you dealing with a lot of numeric data? If you need all those functions and only want to support exact matches and basic boolean comparisons, then I'd go with an RDBMS instead of Lucene. You'll get better support for

Re: Lucene vs RDBMS indexing at scale

2013-02-05 Thread Drew Kutcharian
The records are mostly logging events where they will have: 1. a timestamp 2. the type of the event 3. potentially a set of key/value properties Then I would want to be able to slice and dice the records based on time (required), type and/or the key/values. In addition, I would want to have
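To make the query shapes in this thread concrete, here is a toy in-memory sketch: a required time range answered by binary search over sorted timestamps, plus optional exact matches on the event type and on key/value properties. It is neither Lucene nor an RDBMS and says nothing about behaviour at a billion records; the class name, tuple layout and half-open range are all assumptions made for illustration.

```python
import bisect

class EventIndex:
    """Toy index: range match on timestamp, exact match on type/properties."""

    def __init__(self, events):
        # events: iterable of (timestamp, event_type, properties) tuples
        self.events = sorted(events, key=lambda e: e[0])
        self.timestamps = [e[0] for e in self.events]

    def query(self, start, end, event_type=None, **props):
        # Range match: timestamps in [start, end), located by binary search.
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        result = []
        for event in self.events[lo:hi]:
            _, etype, properties = event
            # Exact match on the event type, if one was requested.
            if event_type is not None and etype != event_type:
                continue
            # Exact match on every requested key/value property.
            if all(properties.get(k) == v for k, v in props.items()):
                result.append(event)
        return result
```

A call such as EventIndex(events).query(start, end, event_type="login", user="a") slices by time first and then filters by type and properties, mirroring the "time (required), type and/or the key/values" description.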

Re: Lucene vs Glimpse

2013-02-05 Thread Mathias Dahl
Thanks for the input! Seems I should give this another chance using the hints you all sent me. I'll report back my findings here. /Mathias On Mon, Feb 4, 2013 at 7:01 PM, Mathias Dahl mathias.d...@gmail.com wrote: Hi, I have hacked together a small web front end to the Glimpse text indexing

Re: Lucene vs RDBMS indexing at scale

2013-02-05 Thread David Pilato
So you should probably ask your question on the Elasticsearch mailing list. I think some ES users already scale to x billion docs. Even though ES is Lucene-based, it adds features to scale out (sharding, routing...). HTH -- David

Re: Lucene vs Glimpse

2013-02-05 Thread Dawid Weiss
Here's another thought: if you desperately need complex searches then you could do a heuristic filtering to narrow down the search: use an analyzer that does some form of input splitting into terms (removing excess whitespace or even producing n-grams from the input), then do the same for the
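A minimal sketch of this kind of heuristic filtering, under stated assumptions: whitespace is collapsed and text lowercased, documents are indexed by character trigrams, and the query is split the same way (the snippet is cut off before saying what "the same" is applied to, so the query is assumed). The final exact substring check on the surviving candidates is also an assumption, and all names are hypothetical.

```python
from collections import defaultdict

def normalize(text):
    # Lowercase and collapse runs of whitespace to a single space.
    return " ".join(text.lower().split())

def ngrams(text, n=3):
    # Overlapping character n-grams of the normalized text.
    normalized = normalize(text)
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}

def build_ngram_index(docs, n=3):
    # n-gram -> set of doc ids containing it
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index

def search(docs, index, query, n=3):
    # Heuristic filter: a candidate must contain every n-gram of the query.
    # A query shorter than n yields no n-grams, so nothing is filtered out.
    candidates = set(docs)
    for gram in ngrams(query, n):
        candidates &= index.get(gram, set())
    # Verification (assumption): exact substring check on the candidates only.
    needle = normalize(query)
    return {d for d in candidates if needle in normalize(docs[d])}
```

The n-gram step can only produce false positives, never false negatives, which is why the more expensive exact check needs to run only on the narrowed-down candidate set.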