Re: lucene - nutch to rss?

2005-04-13 Thread Michael Wechner
. Re the XSLT one then could offer various XSLTs in order to cover the various RSS and Atom formats (and other XMLs of course). Michi -- Michael Wechner Wyona Inc. - Open Source Content Management - Apache Lenya http://www.wyona.com http://lenya.apache.org [EMAIL

Re: Lucene job

2006-03-17 Thread Michael Wechner

Re: indexing emails

2006-06-19 Thread Michael Wechner
in the index (see span queries and term positions).

Re: HTML text extraction

2006-06-22 Thread Michael Wechner

Re: Tomcat Simple Example

2006-08-23 Thread Michael Wechner
Mag Gam wrote: Hi All, Does anyone have a simple Tomcat search/result example? you mean like the war file of Nutch? Michi I have 4 text files, i would like to index. Thanks

Re: Tomcat Simple Example

2006-08-24 Thread Michael Wechner
with the Lucene distribution... http://lucene.apache.org/java/docs/gettingstarted.html -Hoss

Re: Tomcat Simple Example

2006-08-24 Thread Michael Wechner
Erik Hatcher wrote: On Aug 24, 2006, at 3:29 AM, Michael Wechner wrote: As an alternative I would rather suggest that one generates a well-defined XML with JSP or a servlet and then applies an XSLT. If somebody is afraid of performance issues then one might want to consider generating

Re: Please Help me

2007-02-13 Thread Michael Wechner

Re: help!!!!

2007-03-11 Thread Michael Wechner
Michael thanks regards ashwin

Re: Adding attribute to index

2008-04-02 Thread Michael Wechner
-- Michael Wechner Wyona - Open Source Content Management - Yanel, Yulup http://www.wyona.com [EMAIL PROTECTED], [EMAIL PROTECTED] +41 44 272 91 61

Re: Appending to index

2008-04-07 Thread Michael Wechner
? IndexWriter.updateDocument(...) HTH Michael Thanks, Nitasha Walia Software Engineer, Cisco Systems

Re: sharing SearchIndexer

2008-09-26 Thread Michael Wechner
Ian Lea schrieb: Simon There is nothing in lucene to detect that an index has changed and automagically reopen an IndexReader. You can do the notification from your indexing thread, or every nnn mins, or whatever makes sense for your application. Note that IndexReader.reopen() does nothing

performance/scalability issues re filtering of protected search results

2008-11-10 Thread Michael Wechner
Hi We have about 1 mio documents and growing within a hierarchical order (3 to 20 deep) and about 3000 people accessing these nodes, whereas some people have access to certain branches and other people to other branches and some branches are shared. The access control of these nodes is

Re: performance/scalability issues re filtering of protected search results

2008-11-10 Thread Michael Wechner
about it, then I would very much appreciate any concrete URLs/pointers. Thanks Michael Best Erick On Mon, Nov 10, 2008 at 2:52 PM, Michael Wechner [EMAIL PROTECTED]wrote: Hi We have about 1 mio documents and growing within a hierarchical order (3 to 20 deep) and about 3000 people accessing

Re: first time using lucene

2009-01-21 Thread Michael Wechner
nitin gopi schrieb: Hello, I have recently downloaded Lucene. My project is to add LSI (Latent Semantic Indexing) to Lucene's indexing method, to improve the indexing of documents. I am totally new to this field. Please help me in this matter and guide me how to proceed in the

Re: Crawler

2009-01-30 Thread Michael Wechner
Jay Malaluan schrieb: Hi, You can check out Nutch at http://lucene.apache.org/nutch/. also see http://incubator.apache.org/projects/droids.html Cheers Michael Regards, Jay Joel Malaluan Haroldo Nascimento-2 wrote: Hi, There is any crawler that integrate with index lucene ?

Re: Lucene OpenCms search - Xpath notation?

2009-01-30 Thread Michael Wechner
Kesarkar, Dipak schrieb: Hi, I am using OpenCms 7.0.5 with Lucene search engine. I need to index XML content for which I have a following field configuration in the opencms-search.xml unfortunately I don't have any knowledge re OpenCMS, but I think you rather want to ask there (or

Re: Parsing large xml files

2009-05-22 Thread Michael Wechner
crack...@comcast.net schrieb: http://vtd-xml.sf.net - Original Message - From: Sithu D. Sudarsan sithu.sudar...@fda.hhs.gov To: java-user@lucene.apache.org Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific Subject: Parsing large xml files Hi, While trying

Re: Parsing large xml files

2009-05-22 Thread Michael Wechner
idea of breaking into smaller chunks have worked for now... Sincerely, Sithu D Sudarsan -Original Message- From: Michael Wechner [mailto:michael.wech...@wyona.com] Sent: Friday, May 22, 2009 4:48 AM To: java-user@lucene.apache.org Subject: Re: Parsing large xml files crack

Re: How to tune Analyzer for Text Extraction

2009-08-11 Thread Michael Wechner
xs2Abhishek schrieb: Hi, I am trying to make a decision on whether or not I can use Lucene for my requirements, which mainly include data tagging. I have to be able to parse or index a .txt file and then be able to extract text accordingly. For e.g if the input document has some text like:

[JOB] Java/Lucene/Nutch developer in Zurich, Switzerland

2010-01-12 Thread Michael Wechner
Dear Developers We are looking for Java/Lucene/Nutch developers with over 2-3 years of experience for a project we are currently working on. The location is Zurich, Switzerland onsite and the job is as employee or contractor. Please reply me privately with your contact details and

Re: Is it possible to update only selected fields in a document ?

2011-03-22 Thread Michael Wechner
On 3/22/11 8:40 AM, shrinath.m wrote: On Tue, Mar 22, 2011 at 12:39 PM, Anshum-2 [via Lucene] ml-node+2713899-1210341880-376...@n3.nabble.com wrote: No as of now, there's no way to do so. Thank you Anshum-2, how do you propose I do this ? I have thought of a way like this : - first get the

Re: Is it possible to update only selected fields in a document ?

2011-03-22 Thread Michael Wechner
On 3/22/11 10:09 AM, shrinath.m wrote: On Tue, Mar 22, 2011 at 1:37 PM, Michael Wechner [via Lucene] ml-node+2714008-984126374-376...@n3.nabble.com wrote: are you looking for something like http://hrycan.com/2009/11/26/updating-document-fields-in-lucene/ ? Precisely that. I am OK

Re: Lucene Simple Project

2011-06-19 Thread Michael Wechner
Am 18.06.11 19:05, schrieb Steven A Rowe: Hi Hamada, Do you know about the Lucene demo?: http://lucene.apache.org/java/3_2_0/demo.html also you might want to use http://code.google.com/p/luke/ in order to view your search index and check what fields it actually contains HTH Michael

How to use setWriteLockTimeout(long) when write.lock already exists

2011-12-13 Thread Michael Wechner
Hi According to http://www.gossamer-threads.com/lists/lucene/java-dev/37421 one cannot overwrite the default write lock timeout of 1000ms once a write.lock already exists (for example inside a multi-threaded web-application), because in order to use the method setWriteLockTimeout(long)

Re: is it possible to index wiki markup files?

2012-01-11 Thread Michael Wechner
Maybe Tika is also of help to you http://tika.apache.org/ HTH Michael Am 11.01.12 20:13, schrieb Reyna Melara: Hi, my name is Reyna Melara I'm a PhD student from Mexico, and I have a set of 11,051,447 files with txt extension but the content of each file is in fact in wiki format, I want and

Re: Storing Documents in Lucene

2013-03-29 Thread Michael Wechner
you also might like to consider Jackrabbit: http://jackrabbit.apache.org/ or Yarep: https://github.com/wyona/yarep which are both using Lucene for indexing, but the actual data storage is hidden by an abstraction layer and is configurable/customizable. HTH Michael Am 29.03.13 02:24,

Compiling and running Lucene/Solr based on github does not seem to work

2014-12-04 Thread Michael Wechner
Hi I have cloned the github version of Lucene/Solr yesterday https://github.com/apache/lucene-solr and was running ant compile ant test successfully. Also Jetty seems to startup fine, but when I access http://localhost:8983/solr/ then I receive HTTP ERROR: 503 Problem accessing

Re: Compiling and running Lucene/Solr based on github does not seem to work

2014-12-05 Thread Michael Wechner
thanks very much for your help. I will use the solr mailing list for future solr related questions. After running ant example ant run-example inside the solr folder, I was able to access http://localhost:8983/solr without a problem. I think it would make sense to change the main README and

Lucene FAQ as CSV to train DeepPavlov

2019-12-26 Thread Michael Wechner
Hi I would like to train "DeepPavlov FAQ" http://docs.deeppavlov.ai/en/master/features/skills/faq.html https://colab.research.google.com/github/deepmipt/dp_notebooks/blob/master/DP_autoFAQ.ipynb

Is deleting with IndexReader still possible?

2021-06-17 Thread Michael Wechner
Hi According to the FAQ one can delete documents using the IndexReader https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIdeletedocumentsfromtheindex? but when I look at the javadoc of Lucene version 8_8_2

Re: Is deleting with IndexReader still possible?

2021-06-17 Thread Michael Wechner
cool, thanks very much for your quick response and updating the FAQ! Am 17.06.21 um 10:28 schrieb Adrien Grand: Good catch Michael, removing from IndexReader has actually been removed a long time ago. I just edited the FAQ to correct this. On Thu, Jun 17, 2021 at 10:08 AM Michael Wechner

Re: Lucene/Solr and BERT

2021-05-19 Thread Michael Wechner
that index and search Glove vectors. My first impression was that indexing seems surprisingly slow, but it's entirely possible I'm doing something wrong. On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote: Hi I recently found the following articles re Lucene/Solr and BERT https://dmitry

Index backwards compatibility

2021-05-26 Thread Michael Wechner
Hi I am using Lucene 8.8.2 in production and I am currently doing some tests using 9.0.0-SNAPSHOT, whereas I have included lucene-backward-codecs, because in the log files it was asking me whether I have forgotten to include lucene-backward-codecs.jar org.apache.lucene

Re: Lucene/Solr and BERT

2021-05-26 Thread Michael Wechner
Hi Alex Thank you very much for your feedback and the various insights! Am 26.05.21 um 04:41 schrieb Alex K: Hi Michael and others, Sorry just now getting back to you. For your three original questions: - Yes, I was referring to the Lucene90Hnsw* classes. Michael S. had a thorough response.

Re: Lucene/Solr and BERT

2021-05-24 Thread Michael Wechner
ectors existing in Lucene) can support alternative KNN implementations. On Wed, May 19, 2021 at 12:22 PM Michael Wechner wrote: Hi Alex Just to make sure I understand better what the additions are about Am 21.04.21 um 17:21 schrieb Alex K: There were a couple additions recently merged into luc

Re: Lucene/Solr and BERT

2021-05-24 Thread Michael Wechner
sure the VectorFormat API (might still get renamed due to confusion with other kinds of vectors existing in Lucene) can support alternative KNN implementations. On Wed, May 19, 2021 at 12:22 PM Michael Wechner wrote: Hi Alex Just to make sure I understand better what the additions are about Am

Re: Index backwards compatibility

2021-05-27 Thread Michael Wechner
I have added a QnA https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-WhenIupradeLucene,forexamplefrom8.8.2to9.0.0,doIhavetoreindex? Hope that makes sense, otherwise let me know and I can correct/update :-) Am 26.05.21 um 23:56 schrieb Michael Wechner: using lucene

Re: Index backwards compatibility

2021-05-27 Thread Michael Wechner
that if possible you *should* update because the 8.x index may not be able to be read by the eventual 10 release. On Thu, May 27, 2021 at 7:52 AM Michael Wechner wrote: I have added a QnA https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-WhenIupradeLucene,forexamplefrom8.8.2to9.0.0

Re: Index backwards compatibility

2021-05-26 Thread Michael Wechner
: I think you need backward-codecs-9.0.0-SNAPSHOT there. It enables 9.0 to read 8.x indexes. On Wed, May 26, 2021 at 9:27 AM Michael Wechner wrote: Hi I am using Lucene 8.8.2 in production and I am currently doing some tests using 9.0.0-SNAPSHOT, whereas I have included lucene-backward-codecs

Re: Lucene/Solr and BERT

2021-05-27 Thread Michael Wechner
, and searching, performance, you should generally index as large a number of documents as possible before flushing. -Mike On Wed, May 26, 2021 at 9:43 AM Michael Wechner wrote: Hi Alex Thank you very much for your feedback and the various insights! Am 26.05.21 um 04:41 schrieb Alex K: Hi Michael

Re: Negation search help

2021-04-29 Thread Michael Wechner
Yes, it would be great if you could share code snippets. Maybe it will help others or maybe someone will have a suggestion to improve or an alternative. All the best Michael Am 29.04.21 um 14:35 schrieb amitesh116: Thank you Michael! I solved this requirement by setting the tokenStream at

Re: Negation search help

2021-04-28 Thread Michael Wechner
Hi Amitesh I don't have statistical proof, but I think it doesn't help on mailing lists with volunteers to write "I badly need some help", because it seems to me the contrary will happen, that people will not help at all. I think there are various reasons for this behaviour, which is

Re: Negation search help

2021-04-28 Thread Michael Wechner
Hi Amitesh Thanks for the more concrete examples. Unfortunately I do not know how to solve this better with Lucene itself in a more general context, but did you ever consider using BERT in combination with Lucene/Solr https://blog.google/products/search/search-language-understanding-bert/

Re: Use Case clarification

2021-04-05 Thread Michael Wechner
Hi The following FAQ might be a bit outdated, but nevertheless you should find some answers there as well https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ For example to answer your question 4) see

Re: Use Case clarification

2021-04-05 Thread Michael Wechner
rch engine as a personal project . On Mon, 5 Apr 2021, 10:57 Michael Wechner, wrote: Hi The following FAQ might be a bit outdated, but nevertheless you should find some answers there as well https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ For example to answer your question 4) s

Lucene/Solr and BERT

2021-04-21 Thread Michael Wechner
Hi I recently found the following articles re Lucene/Solr and BERT https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28 https://medium.com/swlh/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559 and would like to ask whether there might be more recent developments

Re: Lucene/Solr and BERT

2021-04-21 Thread Michael Wechner
. There are some test suites that index and search Glove vectors. My first impression was that indexing seems surprisingly slow, but it's entirely possible I'm doing something wrong. On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote: Hi I recently found the following articles re Lucene/Solr and BERT

Re: hello~~i have a question

2021-08-02 Thread Michael Wechner
I don't know either, whereas I searched a little and found various good explanations what segments are, e.g. https://www.alibabacloud.com/blog/analysis-of-lucene---basic-concepts_594672 but not in which order the segments are being read. I am not sure where in the code the segments are

Re: Search while typing (incremental search)

2021-10-08 Thread Michael Wechner
Michael On Thu, Oct 7, 2021 at 2:30 AM Michael Wechner wrote: Thanks very much for your feedback! I will try it :-) As I wrote I would like to add a summary to the Lucene FAQ (https://cwiki.apache.org/confluence/display/lucene/lucenefaq) Would the following questions make sense

Search while typing (incremental search)

2021-10-06 Thread Michael Wechner
Hi I am trying to implement a search with Lucene similar to what for example various "Note Apps" (e.g. "Google Keep" or "Samsung Notes") are offering, that with every new letter typed a new search is being executed. For example when I type "tes", then all documents are being returned

Re: Search while typing (incremental search)

2021-10-07 Thread Michael Wechner
ults (perhaps because you are gonna just sort by some custom document feature instead of relevance), then you can do that if you really want. You can use the n-gram/edge-ngram/shingle filters in the analysis package for that. On Wed, Oct 6, 2021 at 5:37 PM Michael Wechner wrote: Hi I am trying t
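The edge n-gram idea mentioned in this reply can be sketched in plain Java. This is only an illustration of what an edge n-gram filter emits for a single term, not Lucene's own implementation (in Lucene this would be done by an analysis filter such as EdgeNGramTokenFilter); the class name EdgeNGramSketch and the minGram/maxGram values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mimics what an edge n-gram filter emits for one term. */
final class EdgeNGramSketch {

    /** Returns the prefixes of term whose length lies in [minGram, maxGram]. */
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // Never ask for a prefix longer than the term itself.
        int longest = Math.min(maxGram, term.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prefixes that would be indexed for the term "testing".
        System.out.println(edgeNGrams("testing", 2, 5));
    }
}
```

If such prefixes are indexed for every term, then a partially typed input like "tes" matches as an ordinary term lookup, which is what makes a new search on every keystroke cheap. The price is a larger index, so the gram range is usually kept small.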

Re: Question about using Lucene to search source code

2021-12-20 Thread Michael Wechner
Hi Yuxin Can you provide a concrete example of a query and a document/code snippet? Thanks Michael Am 20.12.21 um 03:06 schrieb Yuxin Liu: Dear development community of Lucene: Hi from student research assistant Yuxin Liu. I'm using Lucene to build an index search for source code indexes

Autosuggest/Autocomplete: What are the best practices to build Suggester?

2021-11-18 Thread Michael Wechner
Hi I recently started to use the Autosuggest/Autocomplete package as suggested by Robert https://www.mail-archive.com/java-user@lucene.apache.org/msg51403.html which works very fine, thanks again for your help :-) But it is not clear to me what are the best practices building a suggester

Re: Autosuggest/Autocomplete: What are the best practices to build Suggester?

2021-11-18 Thread Michael Wechner
("contract search","",asList("a84581a3-302f-4b73-80d9-0e60da5238f9"),1)); entities.add(new Item("claims management system","",asList("a84581a3-302f-4b73-80d9-0e60da5238f9"),1)); suggester.build(new ItemIterator(entities.iterator())); ) I

Re: Search while typing (incremental search)

2021-10-27 Thread Michael Wechner
are according to Lucene 8.10.1 suggest API If you know any simple, recent examples, please let me know Thanks Michael Am 08.10.21 um 21:40 schrieb Michael Wechner: Am 08.10.21 um 18:49 schrieb Michael Sokolov: Thank you for offering to add to the FAQ! Indeed it should mention the suggester

Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Michael Wechner
Hi Together I would be interested to submit a proposal/presentation re Lucene's vector search, but would like to ask first whether somebody else wants to do this as well or might be interested to do this together? Thanks Michael Am 30.03.22 um 14:16 schrieb Rich Bowen: [You are receiving

Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-31 Thread Michael Wechner
for helping spread the word about Lucene's new vector search capabilities! On Thu, Mar 31, 2022 at 7:36 AM Michael Wechner wrote: ok :-) thanks! Anyway, if somebody would like to join re a "vector search" proposal, please let me know Michael Am 30.03.22 um 20:13 schrieb Anshum

Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Michael Wechner
e reviewed independently and if there is another proposals that clashes, the abstract would help the program committee pick the one (or both) that's best suited for the audience. Good luck! -Anshum On Wed, Mar 30, 2022 at 5:47 AM Michael Wechner wrote: Hi Together I would be interested to submit

Re: Need help on defining custom scorer in Lucene 9

2022-04-03 Thread Michael Wechner
Hi Lokesh IIUC each document (like for example a shop description) has a longitude and a latitude associated with it. The user search input consists of some keywords and the user's geo location. The keywords you use to search for the documents and the user's geo location you would like to use for

Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael Wechner
Hi Together You might be interested in this paper / article https://arxiv.org/abs/2308.14963 Thanks Michael

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-11-07 Thread Michael Wechner
vectors format like a delegator as described before. The responsibility was shifted to the codec, because there may be better alternatives to HNSW that have different limits especially with regard to performance during merging and query response times, e.g. BKD trees. Uwe Am 19.10.2023 um 10:53 sch

Re: When to use StringField and when to use FacetField for categorization?

2023-10-23 Thread Michael Wechner
to do the same thing in the faceting module, and maybe our documentation could be a bit more helpful. Cheers, -Greg On Fri, Oct 20, 2023 at 2:54 PM Michael Wechner wrote: thanks very much for this additional information, Marc! Am 20.10.23 um 20:30 schrieb Marc D'Mello: Just following up on Mike's

Re: How to replace deprecated document(i)

2023-09-24 Thread Michael Wechner
) - Shubham On Mon, Sep 25, 2023 at 1:59 AM Michael Wechner wrote: Hi I recently noticed that IndexReader.document(int) is deprecated, whereas my code is currently TopDocs topDocs = searcher.search(query, k); for (ScoreDoc scoreDoc : topDocs.scoreDocs) { Document doc

How to replace deprecated document(i)

2023-09-24 Thread Michael Wechner
Hi I recently noticed that IndexReader.document(int) is deprecated, whereas my code is currently TopDocs topDocs = searcher.search(query, k); for (ScoreDoc scoreDoc : topDocs.scoreDocs) { Document doc = indexReader.document(scoreDoc.doc); } How do I best replace document(int)? Thanks

Re: How to replace deprecated document(i)

2023-09-25 Thread Michael Wechner
); } Note that these StoredFields and TermVectors instances should only be consumed in the thread where they were acquired. For instance, it is illegal to share them across threads. Uwe Am 25.09.2023 um 07:53 schrieb Michael Wechner: Hi Shubham Great, thank you very much

Re: How to replace deprecated document(i)

2023-09-25 Thread Michael Wechner
that gives the missing information in 9.x Javadocs, too. Uwe Am 25.09.2023 um 11:02 schrieb Michael Wechner: you mean once per search request? I mean for example GET https://localhost:8080/search?q=Lucene and the following would be executed IndexReader reader = DirectoryReader.open

Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner
Hi I recently upgraded Lucene to 9.8.0 and was running tests with OpenAI's embedding model, which has the vector dimension 1536 and received the following error Field[vector]vector's dimensions must be <= [1024]; got 1536 whereas this worked previously with the hack to override the vector

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner
s. Uwe Am 19.10.2023 um 10:53 schrieb Michael Wechner: I forgot to mention that using the custom FieldType with vector dimension 1536 does work with Lucene 9.7.0 Thanks Michael Am 19.10.23 um 10:39 schrieb Michael Wechner: Hi I recently upgraded Lucene to 9.8.0 and was r

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner
I forgot to mention that using the custom FieldType with vector dimension 1536 does work with Lucene 9.7.0 Thanks Michael Am 19.10.23 um 10:39 schrieb Michael Wechner: Hi I recently upgraded Lucene to 9.8.0 and was running tests with OpenAI's embedding model, which has the vector

When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
Hi I have found the following simple Facet Example https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java whereas for a simple categorization of documents I currently use StringField, e.g. doc1.add(new StringField("category",

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
by different points/levels of your hierarchy. Mike McCandless http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner wrote: > Hi > > I have found the following simple Facet Example > > > https://github.com/

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
less http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner wrote: Hi I have found the following simple Facet Example https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java whereas for a simple categorization of d

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
omyWriter). It used to be that the "doc values" based faceting did not support arbitrary hierarchy, but I think that was fixed at some point. Mike McCandless http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner wrote: Hi Mike Thanks for your feedback! II

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
you need to create a TaxonomyWriter). It used to be that the "doc values" based faceting did not support arbitrary hierarchy, but I think that was fixed at some point. Mike McCandless http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner < michael.wech

Re: New user questions about demo, downloads, and IRC

2022-04-25 Thread Michael Wechner
Hi Bridger Inside https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0.tgz you should find modules/lucene-core-9.1.0.jar modules/lucene-queryparser-9.1.0.jar modules/lucene-analysis-common-9.1.0.jar modules/lucene-demo-9.1.0.jar I guess the documentation is not quite right. Re your

Re: New user questions about demo, downloads, and IRC

2022-04-25 Thread Michael Wechner
PR Thanks Michael Am 25.04.22 um 23:37 schrieb Michael Wechner: Hi Bridger Inside https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0.tgz you should find modules/lucene-core-9.1.0.jar modules/lucene-queryparser-9.1.0.jar modules/lucene-analysis-common-9.1.0.jar modules/lucene-dem

Re: New user questions about demo, downloads, and IRC

2022-04-26 Thread Michael Wechner
great, thanks! Am 26.04.22 um 21:48 schrieb Michael Sokolov: thanks, I fixed the doc! On Tue, Apr 26, 2022 at 9:13 AM Bridger Dyson-Smith wrote: Hi Michael - On Mon, Apr 25, 2022 at 5:38 PM Michael Wechner wrote: Hi Bridger Inside https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-20 Thread Michael Wechner
rch(query, k); Does that make sense to you? Thanks Michael Am 11.05.22 um 07:59 schrieb Michael Wechner: Hi Julie Cool, thanks! I try to apply it and if it works could create an example to the demo package. Will keep you posted :-) Thanks Michael Am 11.05.22 um 02:13 schrieb Julie Tibshi

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-24 Thread Michael Wechner
AM Michael Wechner Hi Julie I got it running and it seems to work fine so far :-) Re an example for the demo package, I guess this would go here https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/package-summary.html and I thought of something like

Re: Auto-complete in Lucene

2022-05-25 Thread Michael Wechner
we are using AnalyzingInfixSuggester but I would also be curious to know whether this is the best way :-) Thanks Michael Am 25.05.22 um 14:39 schrieb Anastasiya Tarasenko: Hi All, I have a question regarding auto-complete functionality in Lucene. On the StackOverflow the suggestion

Re: Multi-Value query test

2022-06-23 Thread Michael Wechner
Maybe I misunderstand the problem, but why don't you decouple showing the results from the results of the query? Am 23.06.22 um 14:03 schrieb Patrick Bernardina: How to test if a value in a multi-value field matches a specific query? Example of the problem: I've created a query to return all

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-10 Thread Michael Wechner
if you run into any questions/ issues while trying it out! Julie On Mon, May 9, 2022 at 8:08 AM Michael Wechner wrote: sorry for the URLs below. I have tested Twilio SendGrid as outgoing server and it just rewrote the URLs https://issues.apache.org/jira/browse/SOLR-15947 https://issues.apache.or

Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great I have found https://issues.apache.org/jira/browse/SOLR-15947 https://issues.apache.org/jira/browse/LUCENE-10382 and

Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great :-) I have found

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
-summary.html which I was not aware of, but disabled the tracking now and hope it will be ok now. Thanks Michael Am 09.05.22 um 15:12 schrieb Michael Wechner: Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great :-) I have found http://url7093

Re: How to filter KnnVectorQuery with multiple terms?

2022-09-01 Thread Michael Wechner
u can also pass a BooleanQuery with multiple terms or a combination of other queries, a numeric range,... or a fulltext query out of Lucene's query parsers. Uwe Am 31.08.2022 um 22:19 schrieb Michael Wechner: Hi Matt Thanks very much for your feedback! According to your links I will try Collec

How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Michael Wechner
Hi I am currently filtering a KnnVectorQuery as follows Query filter = new TermQuery(new Term(CLASSIFICATION_FIELD, classification)); query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, filter); but it is not clear to me how I can filter for multiple terms. Should I subclass MultiTermQuery

Re: How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Michael Wechner
a BooleanQuery.Builder. As noted in TermInSetQuery ( https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java#L62) multiple terms could be represented as a boolean query with Occur.SHOULD. ~Matt On Wed, Aug 31, 2022 at 11:15 AM Michael Wechner wrote

Will ApacheCon North America 2022 sessions also be published on YouTube?

2022-10-16 Thread Michael Wechner
Hi I just noticed that the ApacheCon Asia 2022 have been published on YouTube https://apachecon.com/ https://www.youtube.com/c/TheApacheFoundation/playlists Will this also happen for ApacheCon North America 2022? Thanks Michael

Re: Upgrading from 9.1.0. to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-10-01 Thread Michael Wechner
parameters? If so, there is no better way than what you are doing. Le sam. 1 oct. 2022, 12:31, Michael Wechner a écrit : Hi Adrien Thank you very much for your help! That was it :-) I completely forgot that I set this somewhere hidden inside my code. I made a note in the pom file, such that I should

Re: Upgrading from 9.1.0. to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-10-01 Thread Michael Wechner
? Thanks Michael Am 01.10.22 um 08:06 schrieb Adrien Grand: I would guess that you are configuring your IndexWriterConfig with a "Lucene91Codec" instance. You need to replace it with a "Lucene94Codec" instance. Le sam. 1 oct. 2022, 06:12, Michael Wechner a écrit : Hi I hav

Upgrading from 9.1.0. to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-09-30 Thread Michael Wechner
Hi I have just upgraded from 9.1.0 to 9.4.0 and compiling works fine, but when I run and re-index my data using KnnVectorField, then I receive the following exception: java.lang.UnsupportedOperationException: Old codecs may only be used for reading     at

Latency and recall re HNSW: Lucene versus Vespa

2022-10-01 Thread Michael Wechner
Hi Together I just read the following article, where the author compares Lucene and Vespa re HNSW https://bergum.medium.com/will-new-vector-databases-dislodge-traditional-search-engines-b4fdb398fb43 What is your take on "comparing Lucene and Vespa re HNSW latency and recall"? Thanks

Re: Latency and recall re HNSW: Lucene versus Vespa

2022-10-01 Thread Michael Wechner
is a compromise. We've known for centuries that "Odyous of olde been comparisonis, And of comparisonis engendyrd is haterede." On Sat, Oct 1, 2022 at 7:18 AM Michael Wechner wrote: Hi Together I just read the following article, where the author compares Lucene and Vespa re HNSW https://bergum.

Re: [ANNOUNCE] Apache Lucene 9.4.0 released

2022-09-30 Thread Michael Wechner
great, thank you very much! Just in time for ApacheCon :-) Am 01.10.22 um 00:09 schrieb Michael Sokolov: The Lucene PMC is pleased to announce the release of Apache Lucene 9.4.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a

Re: Question for SynonymQuery

2022-12-28 Thread Michael Wechner
Hi Anh The following Stackoverflow link might help https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene The following thread seems to confirm, that escaping the space with a backslash does not help

Re: Question for SynonymQuery

2023-01-02 Thread Michael Wechner
onymQuery; I have just used the standard QueryParser. Instead the synonym processing occurs in the indexing phase, which is not only simpler (one search pattern, one query), but also I think you would also find it gives you superior performance (because the synonym processing occurs once at indexing time

What exactly returns IndexReader.numDeletedDocs()

2022-12-07 Thread Michael Wechner
Hi I am using Lucene 9.4.2 vector search and everything seems to work fine, except that when I delete some documents from the index, then the method https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexReader.html#numDeletedDocs() always returns 0, whereas I would have
