Re: Simple Web Search

2008-06-17 Thread Lukas Vlcek
Hi, If your content is stored in a database then you might also be interested in Compass (I have had a very positive experience with this product). Hibernate Search may be another interesting product for you (I don't have any experience with it, so I cannot comment). Lukas On Tue, Jun

Re: distributed lucene progress

2008-06-02 Thread Lukas Vlcek
FYI: Ning's code seems to be part of the Hadoop contrib package now. On Sat, May 31, 2008 at 5:35 AM, Matt Ronge <[EMAIL PROTECTED]> wrote: > > On May 21, 2008, at 3:19 PM, Otis Gospodnetic wrote: > > No, that's a separate project on SF, IIRC. >> > > I am also interested in distributed lucene. I

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread Lukas Vlcek
Does it make sense to consider using OpenOffice to convert from MS formats to PDF or HTML before indexing? Would this give me a lower failure rate than a pure POI approach? I don't care about formatting now; I care about content in the first place. Formatting would be important only in the case t

Re: Does Lucene save an offline version of web pages?

2008-04-27 Thread Lukas Vlcek
Hi, this sounds like a job for Nutch (one of the Lucene family projects). On Sun, Apr 27, 2008 at 8:26 PM, Legolas wood <[EMAIL PROTECTED]> wrote: > Hi > Thank you for reading my post. > I have to design a system with the following requirements, I think > Lucene or one of the projects which are based

Re: Compass

2008-01-22 Thread Lukas Vlcek
Hi, I am using Compass with Spring and JPA. It works pretty nicely. I don't store the index in a database; I use a traditional file-system-based Lucene index. Updates work very well but you have to be careful about proper mapping of your objects into the search engine (especially parent-child mappings). Regar

Nutch - Microsoft Search Server integration

2008-01-14 Thread Lukas Vlcek
Hi, Is it possible to integrate Nutch into MS Search Server via the OpenSearch API? (MS Search Server supports OpenSearch: http://www.microsoft.com/enterprisesearch/serverproducts/searchserver/features.aspx ) I think it should be possible to pass the user query from MS server to Nutch and integrate Nutch

Re: Wikia search goes live today

2008-01-08 Thread Lukas Vlcek
I should note that this technique is probably not easily applicable to current Lucene scoring mechanism without additional development. On 1/8/08, Lukas Vlcek <[EMAIL PROTECTED]> wrote: > > After checking the Lucene API of ParallelReader it seems that the star > score could be stor

Re: Wikia search goes live today

2008-01-08 Thread Lukas Vlcek
[EMAIL PROTECTED]> wrote: > > Lukas Vlcek wrote: > > So starring will be accommodated only during the indexing phase. Does it mean > it > > will be a pretty static value, not a dynamically changing variable... > correct? > > In other words if I add my stars to some docum

Re: Wikia search goes live today

2008-01-08 Thread Lukas Vlcek
ennis Kubes <[EMAIL PROTECTED]> wrote: > > Star ratings are being stored but not accounted for in the score as of > yet. The plan is to include them in future indexing scores. :) > > Dennis > > Mike Klaas wrote: > > On 7-Jan-08, at 11:49 PM, Lukas Vlcek wrote: > &

Re: Wikia search goes live today

2008-01-07 Thread Lukas Vlcek
antly in the future. Obviously I don't see the big picture but I think they don't have any other option than contributing back to the community if they mean it seriously. On Jan 8, 2008 8:49 AM, Lukas Vlcek <[EMAIL PROTECTED]> wrote: > This would be great! > > I am partic

Re: Wikia search goes live today

2008-01-07 Thread Lukas Vlcek
f they are misusing an ASF trademark (but, IANAL, so I don't > know) since they don't state that Nutch is a trademark of the ASF. > But, that is a discussion for somewhere else... > > > On Jan 7, 2008, at 8:13 AM, Grant Ingersoll wrote: > > > > >

Wikia search goes live today

2008-01-07 Thread Lukas Vlcek
Hi, I noticed that Wikia search goes live today (see http://www.devxnews.com/article.php/3719906). Does anybody know where I could find more technical information about their solution? Are they going to contribute their enhancements back to Lucene/Nutch/Hadoop code? My understanding is that as lon

ApacheCon 2008 Europe - Lucene stuff

2007-11-26 Thread Lukas Vlcek
Hi, Is anybody going to present anything about Lucene (and related technologies - Solr, Hadoop, ...) at ApacheCon 2008 Europe? Any training sessions, invited talks and/or a specific track? The conference pages (http://www.eu.apachecon.com/) do not contain any details yet. Regards, Lukas -- http:

Re: Lucene jdbc

2007-11-26 Thread Lukas Vlcek
AFAIK no. Lucene is a relevance-based query engine, not a relational engine like an SQL database. However, if you really want to use SQL on top of a Lucene index then there may be a way. You need to store the index in a database (see here

Re: Customized search with Lucene?

2007-10-25 Thread Lukas Vlcek
discussions in the > list about it. > > Any solution which is docid based (even your own new > query/weight/scorer) will have to deal with this issue. > > Another approach you may consider is to augment documents with > words of queries that users searched just before clicking

Re: Customized search with Lucene?

2007-10-24 Thread Lukas Vlcek
arch time, and can use > it to create a ValueSource. > > Doron > > "Lukas Vlcek" <[EMAIL PROTECTED]> wrote on 13/10/2007 08:53:47: > > > Hi, > > > > I am looking for an easy (~preferred) way of implementing > > customized search > > with Lucene.

Re: getting summary from lucene index

2007-10-15 Thread Lukas Vlcek
Hi, See the highlighter package in the Lucene/contrib folder. Regards, Lukas On 10/16/07, mic1099 <[EMAIL PROTECTED]> wrote: > > > I used nutch to index my application. I wanted to handle indexing myself > so i > used lucene api to index. > Everything went ok except of getting summary. Under the term summ

Customized search with Lucene?

2007-10-12 Thread Lukas Vlcek
Hi, I am looking for an easy (~preferred) way of implementing customized search with Lucene. What I mean by this is changing order of returned hits according to user profile. In simple words I would like to be able to tweak order of documents in Hits collection before it is presented to the client

Search in SharePoint Server 2007

2007-08-24 Thread Lukas Vlcek
Hi, Does anybody here have any experience with Search technology used in Microsoft Office SharePoint Server 2007? (More info can be found here: http://office.microsoft.com/en-us/sharepointserver/HA102261451033.aspx) I am particularly interested in some comparison to Lucene technology. Does it sup

Re: Question about highlighting returning nothing

2007-08-16 Thread Lukas Vlcek
tances as the criteria becomes > a bitset rather than a list of terms in the rewritten query. > > > Cheers > Mark > > > - Original Message > From: Lukas Vlcek <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Thursday, 16 August, 2007 4:06

Re: Question about highlighting returning nothing

2007-08-16 Thread Lukas Vlcek
empty returns from the highlighter. > > Donna L. Gresh > Services Research, Mathematical Sciences Department > IBM T.J. Watson Research Center > (914) 945-2472 > http://www.research.ibm.com/people/g/donnagresh > [EMAIL PROTECTED] > > > > > "Lukas Vlcek" <[EMAI

Re: Question about highlighting returning nothing

2007-08-15 Thread Lukas Vlcek
Donna, I have been investigating highlighters in Lucene a bit recently. What I have learned so far is that highlighting is a completely different task from the indexing/searching tandem. This simple fact is not obvious to a lot of people. In your particular case it would be helpful if yo

Re: How to keep user search history and how to turn it into information?

2007-08-13 Thread Lukas Vlcek
Enis, thanks for excellent answer! Lukas On 8/13/07, Enis Soztutar <[EMAIL PROTECTED]> wrote: > > Hi, > > Lukas Vlcek wrote: > > Enis, > > > > Thanks for your time. > > I gave a quick glance at Pig and it seems good (seems it is directly > based &

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
, ... etc) in Lucene community I am still wondering if one can use user search history data for such a purpose and, if the answer is yes, then how (practical examples are welcome). Lukas On 8/10/07, Enis Soztutar <[EMAIL PROTECTED]> wrote: > > > > Lukas Vlcek wrote: > > Hi Enis,

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
Hi Enis, On 8/10/07, Enis Soztutar <[EMAIL PROTECTED]> wrote: > > Hi, > > Lukas Vlcek wrote: > > Hi, > > > > I would like to keep user search history data and I am looking for some > > ideas/advices/recommendations. In general I would like to talk abo

Re: Lucene in large database contexts

2007-08-10 Thread Lukas Vlcek
Also you can look at Hibernate Search <http://search.hibernate.org/>. BR Lukas On 8/10/07, Lukas Vlcek <[EMAIL PROTECTED]> wrote: > > Hi, > did you have a chance to look at > Compass<http://www.opensymphony.com/compass/>? > It can do exactly what you want.

Re: Lucene in large database contexts

2007-08-10 Thread Lukas Vlcek
Hi, did you have a chance to look at Compass? It can do exactly what you want. Lukas On 8/10/07, Antonello Provenzano <[EMAIL PROTECTED]> wrote: > > Hi There! > > I've been working for a while on the implementation of a website > oriented to contents that woul

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
iddle tier to catch all events concerning Users Search / History/ > Retrieval History/Cache Management... > Thanks, > dt > www.ejinz.com > Search News > > - Original Message - > From: "Lukas Vlcek" <[EMAIL PROTECTED]> > To: > Sent: Friday, August

How to keep user search history and how to turn it into information?

2007-08-10 Thread Lukas Vlcek
Hi, I would like to keep user search history data and I am looking for some ideas/advice/recommendations. In general I would like to talk about methods of storing such data, its structure and how to turn it into valuable information. As for the structure: == For now I don't have exac

Re: Nested Fields

2007-08-09 Thread Lukas Vlcek
Hi, Have you checked Compass framework (built on top of Lucene)? This might be interesting for you: http://www.opensymphony.com/compass/versions/1.2M3/html/core-xsem.html BR Lukas On 8/10/07, Jeff French <[EMAIL PROTECTED]> wrote: > > > Spencer, it seems i

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-08-05 Thread Lukas Vlcek
hing > that is best done without the Highlighter. > > In summary , you should use Document.getFields (more efficient if you > are getting more than one field anyway) and get around the offset issues > above. > > - Mark > > Lukas Vlcek wrote: > > Mark, > > thank you f

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Lukas Vlcek
ioned is) but it will prob cause you problems down the road. I > will look into this further. > > - Mark > > Lukas Vlcek wrote: > > Hi Lucene experts, > > > > The following is a simple Lucene code which generates > > StringIndexOutOfBoundsException exceptio

Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Lukas Vlcek
Hi Lucene experts, The following is a simple piece of Lucene code which throws a StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0 release. Can anyone tell me what is wrong with this code? Is this a bug or a feature of Lucene? Any comments/hints highly welcome! In a nutshell I

Re: multi-field and wildcard query highlighter questions

2007-07-26 Thread Lukas Vlcek
Can this be the problem? My goal is to get all the ids from a document which match the query. For example, if the user provides a query like [111 333] then I would like to get [111 333]. I don't want to get anything like [111 222 333]. Any idea how to do that? - Mark > > Lukas Vlcek wrote: >
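
One possible reading of the requirement in the post above, sketched in plain Python rather than with the Lucene API. It assumes the id field is a whitespace-separated string and the query is a plain list of terms. The function name matching_ids is hypothetical and this is not how the Lucene Highlighter works internally.

```python
def matching_ids(field_value, query):
    """Return the ids in field_value that also occur in query, in field order."""
    # Assumption: both the stored field and the query are whitespace-separated ids.
    query_terms = set(query.split())
    return [token for token in field_value.split() if token in query_terms]


# A document holding ids 111, 222 and 333, queried for 111 and 333.
print(matching_ids("111 222 333", "111 333"))  # prints ['111', '333']
```

The id 222 is dropped because it does not occur in the query, which is the behaviour the post asks for.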

multi-field and wildcard query highlighter questions

2007-07-20 Thread Lukas Vlcek
Hi, I have two questions: 1) Is it possible to get some highlighted text when using wildcard query? (I am using query rewrite) I found that it works for queries like [prefix*suffix] or [prefix?suffix] but I was not able to get results for queries like [prefix*] 2) What kind of problems I should

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Lukas Vlcek
There is a handy class in contrib/misc.../ that will show you the most frequent terms in an index. Handy dandy. Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: Lukas Vlcek <[EMAIL
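
The idea in the reply above (look at the most frequent terms to find stop word candidates) can be illustrated with a small Python sketch. This is not the Lucene contrib class mentioned in the post. It simply counts lowercased, whitespace-separated tokens in a list of strings, and the function name most_frequent_terms is hypothetical.

```python
from collections import Counter


def most_frequent_terms(documents, top_n=10):
    """Count lowercased whitespace-separated tokens and return the top_n most common."""
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts.most_common(top_n)


docs = ["the cat sat on the mat", "the dog chased the cat"]
print(most_frequent_terms(docs, top_n=2))  # prints [('the', 4), ('cat', 2)]
```

Terms at the top of such a list are only candidates. Whether a frequent term is really a stop word still depends on the language and the collection.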

Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Lukas Vlcek
Hi, Can anybody point me to some references on how to create an ideal set of stop words? I know that this is more of a theoretical question, but how do Luceners determine which words should be excluded when creating Analyzers for new languages? And which technique was used for validation of stop

Re: Merging Indeces

2007-04-16 Thread Lukas Vlcek
Hi, try to look at Compass (http://www.opensymphony.com/compass/). It is built on top of Lucene but provides additional concepts (transactions is one of them). You might find this useful depending on your needs. Regards, Lukas On 4/16/07, Erick Erickson <[EMAIL PROTECTED]> wrote: See below.

Re: why Apache doesnt create a nice forum like the others???

2007-03-27 Thread Lukas Vlcek
Eric, How do you manage the Reply-to: field in your gmail? I always have to change the Reply-to field in Settings (which requires more than three clicks!) and since this is a manual (and tedious) process it can introduce mistakes (mis-addressed addresses). The problem is that I am signed up to more mail-l

Re: ensuring search String availability in the content returned by lucene

2007-03-12 Thread Lukas Vlcek
Hi, I am not sure if I can help you a lot but you can check how Nutch does this (although it does not do exactly what you want). See *org.apache.nutch.summary.basic.BasicSummarizer* or *org.apache.nutch.summary.lucene.LuceneSummarizer*. You should also check the Highlighter API ( http://lucene.apac

Re: Major Bioinformatics center adopts Lucene to help biologists search "everything"

2006-12-29 Thread Lukas Vlcek
Hi, Do you think you could give me some details about your index size? I am interested in the number of documents in all indices, how the indices are distributed (are they hosted on a single machine or spread all over the world), whether you used any other specific technology (Compass or other framework)..

Re: Indexing clarification , please advice

2006-12-14 Thread Lukas Vlcek
Hi, Maybe you could consider using Compass (http://www.opensymphony.com/compass/) which could help you in your situation. They claim that some actions (like updating the index very often) are handled very efficiently (due to caching, which is not a native part of the Lucene library). Regards, L

Re: Lucene on SQL 2005

2006-12-04 Thread Lukas Vlcek
Hi, You should consider using Compass. Lukas On 12/5/06, Saroj K M <[EMAIL PROTECTED]> wrote: Dear All, I am a new user to Lucene. I am having a requirement as follows. I am using SQL Server 2005 database, The Database having a Table named --- Prod

Re: lucene - general question

2006-12-04 Thread Lukas Vlcek
On 12/4/06, Eshwaramoorthy Babu <[EMAIL PROTECTED]> wrote: Hi Lukas, Thanks for your response. I was planning to search for 1st xml ID's in 2nd XML. so I thought of using lucene for search. Can you please suggest me some scripting solution. Is perl right solution? Thanks, Babu On

Re: lucene - general question

2006-12-03 Thread Lukas Vlcek
Hi Babu, Sorry but I don't see any point in using Lucene if you don't need search functionality. Also for parsing XML files I would consider using some scripting language (as opposed to a pure Java based solution). The reason is that scripting languages can be more effective when simplicity of resu

Re: Fwd: Hibernate Lucene trademark issues

2006-11-22 Thread Lukas Vlcek
tructure. > >> What I am trying to do with Hibernate Search is to keep the abstraction > as > >> light as possible. For advanced Lucene queries you'll have to use pure > >> Lucene APIs, which is possible / natural with Hibernate Search > >> > >> >

Fwd: Hibernate Lucene trademark issues

2006-11-17 Thread Lukas Vlcek
D] Hi Lukas, I'd be happy to answer your question, but I don't think Lucene dev is the appropriate area for that kind of discussion. let's move this discussion here http://forum.hibernate.org/viewforum.php?f=9 (or in the Lucene User list if you want to). Emmanuel Lukas Vlcek wrote:

Re: Sorting & SQL-Database

2006-07-03 Thread Lukas Vlcek
Hi, Looking at your problem I can think of one solution for small and *midsize* result sets. (And I have to say it may be similar to what Aleksander proposes). Write workaround query in the following form: select addfield from ( select addfield, generated_counter from table where id = 2 union

Re: Kneobase: open source, enterprise search

2006-05-02 Thread Lukas Vlcek
I was quickly looking at its web page earlier today and it looks good so far! Good news! However, I have one question: does Kneobase contain any kind of web crawler functionality (like Nutch) or do I have to feed it with all sources *manually*? How much of the web data gathering can be automated?

Re: how do I connect to the SVN repository to grab the latest source?

2006-01-03 Thread Lukas Vlcek
I use the following url: http://svn.apache.org/repos/asf/lucene/java/trunk and it works well for me. Lukas On 1/4/06, gekkokid <[EMAIL PROTECTED]> wrote: > if you're using Windows just download subversion from subversion.tigris.org > and install it - then just enter the command found on the lucene