Parallel SQL / calcite adapter
We are currently evaluating Calcite as a SQL facade for different data sources (JDBC, REST, Solr, ...). I didn't find a "native" Calcite adapter for Solr (http://calcite.apache.org/docs/adapter.html). Is it a good idea to use the Parallel SQL feature (over JDBC) to connect Calcite (or Apache Drill) to Solr? Any suggestions? Thanks, Kai Gülzau
StandardTokenizer vs. hyphens
Is there a StandardTokenizer implementation which does not break words on hyphens? I think it would be more flexible to retain hyphens and use a WordDelimiterFilterFactory (WDF) to split these tokens.
StandardTokenizer today:
doc1: email -> email
doc2: e-mail -> e|mail
doc3: e mail -> e|mail
query1: email -> doc1
query2: e-mail -> doc2,doc3
query3: e mail -> doc2,doc3
StandardTokenizer which keeps hyphens + WDF:
doc1: email -> email
doc2: e-mail -> e-mail|email|e|mail
doc3: e mail -> e|mail
query1: email -> doc1,doc2
query2: e-mail -> doc1,doc2,doc3
query3: e mail -> doc2,doc3
Any suggestions on how to configure or code the second behavior? Regards, Kai Gülzau
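The second behavior can be approximated in a few lines of plain Java. This is only a sketch of the desired token stream, not Lucene code: the class name HyphenTokenSketch and its tokenize method are made up for illustration, and whitespace splitting stands in for a tokenizer that keeps hyphens. In Solr, a WhitespaceTokenizer followed by a WordDelimiterFilterFactory with preserveOriginal, generateWordParts and catenateWords enabled might give a similar stream, but that is an assumption to check against your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: plain Java, no Lucene classes involved.
class HyphenTokenSketch {

    // Splits on whitespace only, so "e-mail" survives as one token, then
    // expands each hyphenated token into original, catenated form and parts.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields a single empty string
            }
            tokens.add(word); // original token, e.g. "e-mail" or "email"
            if (!word.contains("-")) {
                continue;
            }
            String catenated = word.replace("-", "");
            if (!catenated.isEmpty()) {
                tokens.add(catenated); // e.g. "email"
            }
            for (String part : word.split("-")) {
                if (!part.isEmpty()) {
                    tokens.add(part); // e.g. "e", "mail"
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("email"));  // [email]
        System.out.println(tokenize("e-mail")); // [e-mail, email, e, mail]
        System.out.println(tokenize("e mail")); // [e, mail]
    }
}
```

With this stream, a query for "email" shares a token with doc1 and doc2, which is the result listed for query1 in the second table.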
Keyword aware Tokenizer?
Does anybody know of a tokenizer which can be configured with (multiple) regular expressions to mark parts of the input text as keywords and which behaves like StandardTokenizer (or UAX29URLEmailTokenizer) otherwise?
Input: Does my order 4711.0815!-somecode_and.other(stuff) arrive on friday?
Tokens: does|my|order|4711.0815!-somecode_and.other(stuff)|arrive|on|friday
Any pointers? How would one code this? Regards, Kai Gülzau
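One way to picture the requested behavior is a two-pass split in plain Java: spans matching a keyword regex are emitted untouched, and the text between them is split into lowercased words. This is a sketch under assumptions, not an existing Lucene or Solr tokenizer: the class name, the method names and the example regex \d{4}\.\d{4}\S* are all hypothetical, and the simple letter/digit split only stands in for StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: plain Java, no Lucene classes involved.
class KeywordAwareTokenizerSketch {

    // Emits every keyword match as a single token and word-splits the rest.
    static List<String> tokenize(String text, List<String> keywordRegexes) {
        List<String> tokens = new ArrayList<>();
        if (keywordRegexes.isEmpty()) {
            addWords(text, tokens);
            return tokens;
        }
        // Combine all keyword regexes into one alternation; at any position
        // the first listed alternative that matches wins.
        StringBuilder alternation = new StringBuilder();
        for (String regex : keywordRegexes) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append("(?:").append(regex).append(')');
        }
        Matcher matcher = Pattern.compile(alternation.toString()).matcher(text);
        int position = 0;
        while (matcher.find()) {
            if (matcher.start() == matcher.end()) {
                continue; // ignore empty matches
            }
            addWords(text.substring(position, matcher.start()), tokens);
            tokens.add(matcher.group()); // keyword span, kept verbatim
            position = matcher.end();
        }
        addWords(text.substring(position), tokens);
        return tokens;
    }

    // Stand-in for the "normal" tokenizer: split on anything that is not a
    // letter or digit and lowercase the pieces.
    static void addWords(String text, List<String> tokens) {
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                tokens.add(word.toLowerCase(Locale.ROOT));
            }
        }
    }

    public static void main(String[] args) {
        String input = "Does my order 4711.0815!-somecode_and.other(stuff) arrive on friday?";
        // Assumed keyword pattern: four digits, a dot, four digits, then the rest of the run.
        List<String> patterns = Arrays.asList("\\d{4}\\.\\d{4}\\S*");
        System.out.println(String.join("|", tokenize(input, patterns)));
    }
}
```

Inside Lucene the same idea would have to be expressed as a Tokenizer that also sets offsets and a keyword attribute; the sketch leaves that part out.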
RE: Term Frequencies for Query Result
> i *think* you are saying that you want the sum of term frequencies for all
> terms in all matching documents -- but i'm not sure, because i don't see
> how TermVectorComponent is helping you unless you are iterating over every
> doc in the result set (ie: deep paging) to get the TermVectors for every
> doc ... it would help if you could explain what you mean by "counting all
> frequencies manually"
You are good at guessing :-) By "counting all frequencies manually" I mean collecting the term frequencies for each term while iterating over all documents.
>> I am looking for a way to get the top terms for a query result.
> you have to elaborate on exactly what you mean ... how are you defining
> "top terms for a query result" ? Are you talking about the most common
> terms in the entire result set of documents that match your query?
My goal is to show the most relevant keywords for some documents of the index. So "top terms for a query result" should be "top nouns for a filtered query". With faceting, "top" means "sorted by the count of docs containing the term". If I could get the sum of the term frequencies, my hope is to be able to distinguish between overly common terms and more relevant terms: something like a score for a term based on a filtered query. regards, Kai Gülzau
RE: which analyzer is used for facet.query?
OK, "problem" solved... In my tests I only reloaded the core "master" and queried the core "slave". So the config changes on "slave" were not in place :-\ Sorry guys! Kai
RE: How to make this work with SOLR ( LUCENE-2899 : Add OpenNLP Analysis capabilities as a module)
> I tried patching my SOLR 4.1 source , as well as a freshly downloaded
> SOLR trunk, to no avail. I guess I just need some tips on how and what
> to patch. I tried to patch the base directory as well as the lucene
> directory. If there's something I need to hack in the patch, do let
> me know.
Try to apply the patch to trunk within Eclipse. There you can see each file diff and manually change it while patching. I just ignored most of the javadoc and some other (nonfunctional) diffs and was able to produce some jars which are running (for my tests) in Solr 4.1. regards, Kai
copy Field / postprocess Fields after analyze / dynamic analyzer config
Is there a way to postprocess a field after analysis? By postprocessing I mean renaming, moving or appending fields. Some more information: My schema.xml contains several language-suffixed fields (nouns_de, ...). Each of these is analyzed in a language-dependent way: When I do a faceted search I have to include every field_lang combination since I do not know the language at query time: http://localhost:8983/solr/master/select?q=*:*&rows=0&facet=true&facet.field=nouns_de&facet.field=nouns_en&facet.field=nouns_fr&facet.field=nouns_nl ... So I have to merge all terms in my own business logic :-( Any idea / pointer on how to rename fields after analysis? This post says it's not possible with the current API: http://lucene.472066.n3.nabble.com/copyField-after-analyzer-td3900337.html Another approach would be to allow analyzer configuration depending on another field value (language). regards, Kai Gülzau
RE: which analyzer is used for facet.query?
> So it seems that facet.query is using the analyzer of type index. > Is it a bug or is there another analyzer type for the facet query? Nobody? Should I file a bug? Kai -Original Message- From: Kai Gülzau [mailto:kguel...@novomind.com] Sent: Tuesday, February 05, 2013 2:31 PM To: solr-user@lucene.apache.org Subject: which analyzer is used for facet.query? Hi all, which analyzer is used for the facet.query? This is my schema.xml: ... When doing a faceting search like: http://localhost:8983/solr/slave/select?q=*:*&fq=type:7&rows=0&wt=json&indent=true&facet=true&facet.query=albody_de:Klaus The UIMA whitespace tokenizer logs some infos: Feb 05, 2013 2:23:06 PM WhitespaceTokenizer process Information: "Whitespace tokenizer starts processing" Feb 05, 2013 2:23:06 PM WhitespaceTokenizer process Information: "Whitespace tokenizer finished processing" So it seems that facet.query is using the analyzer of type index. Is it a bug or is there another analyzer type for the facet query? Regards, Kai Gülzau
RE: Indexing nouns only with UIMA works - performance issue?
So with https://issues.apache.org/jira/browse/LUCENE-4749 it's possible to set the ModelFile? ??? Thanks, Kai -----Original Message----- From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] Sent: Monday, February 04, 2013 2:47 PM To: solr-user@lucene.apache.org Subject: Re: Indexing nouns only with UIMA works - performance issue? see an example at http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/contrib/uima/src/test-files/uima/uima-tokenizers-schema.xml?view=diff&r1=1442116&r2=1442117&pathrev=1442117 where the 'ngramsize' parameter is set, that's defined in AggregateSentenceAE.xml descriptor and is then set with the given actual value. HTH, Tommaso
which analyzer is used for facet.query?
Hi all, which analyzer is used for the facet.query? This is my schema.xml: ... When doing a faceting search like: http://localhost:8983/solr/slave/select?q=*:*&fq=type:7&rows=0&wt=json&indent=true&facet=true&facet.query=albody_de:Klaus the UIMA whitespace tokenizer logs some info:
Feb 05, 2013 2:23:06 PM WhitespaceTokenizer process Information: "Whitespace tokenizer starts processing"
Feb 05, 2013 2:23:06 PM WhitespaceTokenizer process Information: "Whitespace tokenizer finished processing"
So it seems that facet.query is using the analyzer of type index. Is it a bug or is there another analyzer type for the facet query? Regards, Kai Gülzau
RE: Indexing nouns only - UIMA vs. OpenNLP
Hi Lance, > About removing non-nouns: the OpenNLP patch includes two simple > TokenFilters for manipulating terms with payloads. The > FilterPayloadFilter lets you keep or remove terms with given payloads. yes, I used this already in the schema.xml > payloadList="NN,NNS,NNP,NNPS,FM" keepPayloads="true"/> > Works fine :-) But as Robert Muir stated in LUCENE-4345 I also think using types (and storing these optionally as payloads) would be a better approach. > http://code.google.com/p/universal-pos-tags/ Thanks for the pointer, used it to improve my english (brown) whitelist for UIMA :-) Regards, Kai Gülzau
Indexing nouns only with UIMA works - performance issue?
I now use the "stupid" way to use the German corpus for UIMA: copy + paste :-)
I modified Tagger-2.3.1.jar/HmmTagger.xml to use the German corpus ... file:german/TuebaModel.dat ... and saved it as Tagger-2.3.1.jar/HmmTaggerDE.xml.
The next step is to replace every occurrence of "HmmTagger" in lucene-analyzers-uima-4.1.0.jar/uima/AggregateSentenceAE.xml with "HmmTaggerDE" and save it as lucene-analyzers-uima-4.1.0.jar/uima/AggregateSentenceDEAE.xml.
This can be used in your schema.xml:
There should be a way to accomplish this via config though.
Last open issue: Performance!
First run via Admin GUI analyze, index value "Klaus geht in das Haus und sieht eine Maus." / query: "": ~ 5 seconds
Feb 01, 2013 11:01:00 AM WhitespaceTokenizer initialize Information: "Whitespace tokenizer successfully initialized"
Feb 01, 2013 11:01:02 AM WhitespaceTokenizer typeSystemInit Information: "Whitespace tokenizer typesystem initialized"
Feb 01, 2013 11:01:02 AM WhitespaceTokenizer process Information: "Whitespace tokenizer starts processing"
Feb 01, 2013 11:01:02 AM WhitespaceTokenizer process Information: "Whitespace tokenizer finished processing"
Feb 01, 2013 11:01:02 AM WhitespaceTokenizer initialize Information: "Whitespace tokenizer successfully initialized"
Feb 01, 2013 11:01:03 AM WhitespaceTokenizer typeSystemInit Information: "Whitespace tokenizer typesystem initialized"
Feb 01, 2013 11:01:03 AM WhitespaceTokenizer process Information: "Whitespace tokenizer starts processing"
Feb 01, 2013 11:01:03 AM WhitespaceTokenizer process Information: "Whitespace tokenizer finished processing"
Feb 01, 2013 11:01:03 AM WhitespaceTokenizer initialize Information: "Whitespace tokenizer successfully initialized"
Feb 01, 2013 11:01:05 AM WhitespaceTokenizer typeSystemInit Information: "Whitespace tokenizer typesystem initialized"
Feb 01, 2013 11:01:05 AM WhitespaceTokenizer process Information: "Whitespace tokenizer starts processing"
Feb 01, 2013 11:01:05 AM WhitespaceTokenizer process Information: "Whitespace tokenizer finished processing"
Second run via Admin GUI analyze, "Klaus geht in das Haus und sieht eine Maus." / query: "": ~ 4 seconds
Feb 01, 2013 11:07:31 AM WhitespaceTokenizer initialize Information: "Whitespace tokenizer successfully initialized"
Feb 01, 2013 11:07:32 AM WhitespaceTokenizer typeSystemInit Information: "Whitespace tokenizer typesystem initialized"
Feb 01, 2013 11:07:32 AM WhitespaceTokenizer process Information: "Whitespace tokenizer starts processing"
Feb 01, 2013 11:07:32 AM WhitespaceTokenizer process Information: "Whitespace tokenizer finished processing"
Feb 01, 2013 11:07:32 AM WhitespaceTokenizer initialize Information: "Whitespace tokenizer successfully initialized"
Feb 01, 2013 11:07:33 AM WhitespaceTokenizer typeSystemInit Information: "Whitespace tokenizer typesystem initialized"
Feb 01, 2013 11:07:33 AM WhitespaceTokenizer process Information: "Whitespace tokenizer starts processing"
Feb 01, 2013 11:07:33 AM WhitespaceTokenizer process Information: "Whitespace tokenizer finished processing"
Feb 01, 2013 11:07:33 AM WhitespaceTokenizer initialize Information: "Whitespace tokenizer successfully initialized"
Feb 01, 2013 11:07:34 AM WhitespaceTokenizer typeSystemInit Information: "Whitespace tokenizer typesystem initialized"
Feb 01, 2013 11:07:34 AM WhitespaceTokenizer process Information: "Whitespace tokenizer starts processing"
Feb 01, 2013 11:07:34 AM WhitespaceTokenizer process Information: "Whitespace tokenizer finished processing"
Initialized 3 times? I think some of the components are not reused while analyzing. Is this a known issue? Regards, Kai Gülzau
-----Original Message----- From: Kai Gülzau [mailto:kguel...@novomind.com] Sent: Thursday, January 31, 2013 6:48 PM To: solr-user@lucene.apache.org Subject: RE: Indexing nouns only - UIMA vs. OpenNLP
UIMA: I just found this issue https://issues.apache.org/jira/browse/SOLR-3013 Now I am able to use this analyzer for english texts and filter (un)wanted token types :-) Open issue -> How to set the ModelFile for the Tagger to "german/TuebaModel.dat" ??? Kai Gülzau
RE: Indexing nouns only - UIMA vs. OpenNLP
UIMA: I just found this issue https://issues.apache.org/jira/browse/SOLR-3013 Now I am able to use this analyzer for english texts and filter (un)wanted token types :-) Open issue -> How to set the ModelFile for the Tagger to "german/TuebaModel.dat" ??? OpenNLP: And a modified patch for https://issues.apache.org/jira/browse/LUCENE-2899 is now working with solr 4.1. :-) Any hints on which lib is more accurate on noun tagging? Any performance or memory issues (some OOM here while testing with 1GB via Analyzer Admin GUI)? Regards, Kai Gülzau -Original Message----- From: Kai Gülzau [mailto:kguel...@novomind.com] Sent: Thursday, January 31, 2013 2:19 PM To: solr-user@lucene.apache.org Subject: Indexing nouns only - UIMA vs. OpenNLP Hi, I am stuck trying to index only the nouns of german and english texts. (very similar to http://wiki.apache.org/solr/OpenNLP#Full_Example) First try was to use UIMA with the HMMTagger: /org/apache/uima/desc/AggregateSentenceAE.xml false false albody org.apache.uima.SentenceAnnotation coveredText albody2 - But how do I set the ModelFile to use the german corpus? - What about language identification? -- How do I use the right corpus/tagger based on the language? -- Should this be done in UIMA (how?) or via solr contrib/langid field mapping? - How to remove non nouns in the annotated field? Second try is to use OpenNLP and to apply the patch https://issues.apache.org/jira/browse/LUCENE-2899 But the patch seems to be a bit out of date. Currently I try to get it to work with solr 4.1. Any pointers appreciated :-) Regards, Kai Gülzau
Indexing nouns only - UIMA vs. OpenNLP
Hi, I am stuck trying to index only the nouns of German and English texts (very similar to http://wiki.apache.org/solr/OpenNLP#Full_Example).
First try was to use UIMA with the HMMTagger: /org/apache/uima/desc/AggregateSentenceAE.xml false false albody org.apache.uima.SentenceAnnotation coveredText albody2
- But how do I set the ModelFile to use the German corpus?
- What about language identification?
-- How do I use the right corpus/tagger based on the language?
-- Should this be done in UIMA (how?) or via Solr contrib/langid field mapping?
- How to remove non-nouns in the annotated field?
Second try is to use OpenNLP and to apply the patch https://issues.apache.org/jira/browse/LUCENE-2899 But the patch seems to be a bit out of date. Currently I am trying to get it to work with Solr 4.1. Any pointers appreciated :-) Regards, Kai Gülzau
Term Frequencies for Query Result
Hi, I am looking for a way to get the top terms for a query result. Faceting does not work since counts are measured as the number of documents containing a term and not as the overall count of a term in all found documents:
http://localhost:8983/solr/master/select?q=type%3A7&rows=1&wt=json&indent=true&facet=true&facet.query=type%3A7&facet.field=albody&facet.method=fc
"facet_counts":{
  "facet_queries":{
    "type:7":156},
  "facet_fields":{
    "albody":[
      "der",73,
      "in",68,
      "betreff",63,
      ...
Using http://wiki.apache.org/solr/TermVectorComponent and counting all frequencies manually seems to be the only solution for now:
http://localhost:8983/solr/tvrh/?q=type:7&tv.fl=albody&f.albody.tv.tf=true&wt=json&indent=true
"termVectors":[
  "uniqueKeyFieldName","ukey",
  "798_7_0",[
    "uniqueKey","798_7_0",
    "albody",[
      "der",[
        "tf",5],
      "die",[
        "tf",7],
      ...
Does anyone know a better and more efficient solution? Regards, Kai Gülzau
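The manual counting mentioned above amounts to summing the per-document "tf" values term by term and sorting by the total. A minimal client-side sketch in plain Java follows; it assumes the termVectors response has already been parsed into one term-to-tf map per document (the parsing itself, the class name and the method names are hypothetical and not part of Solr).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper; input maps stand for parsed term vectors.
class TermFrequencySum {

    // Adds up the tf values of every term over all documents.
    static Map<String, Long> sumTermFrequencies(List<Map<String, Integer>> termVectorsPerDoc) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Integer> doc : termVectorsPerDoc) {
            for (Map.Entry<String, Integer> entry : doc.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue().longValue(), Long::sum);
            }
        }
        return totals;
    }

    // Returns the n terms with the highest summed tf; ties are ordered alphabetically.
    static List<String> topTerms(Map<String, Long> totals, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("der", 5);
        first.put("die", 7);
        Map<String, Integer> second = new HashMap<>(); // made-up second document
        second.put("der", 2);
        second.put("haus", 1);
        List<Map<String, Integer>> docs = new ArrayList<>();
        docs.add(first);
        docs.add(second);
        System.out.println(topTerms(sumTermFrequencies(docs), 2)); // [der, die]
    }
}
```

This still requires fetching the term vectors of every matching document, so it does not remove the efficiency concern raised in the question.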
RE: How to update one field without losing the others?
I'm currently playing around with a branch 4x version (https://builds.apache.org/job/Solr-4.x/5/) but I can't get field updates to work. A simple GET test request http://localhost:8983/solr/master/update/json?stream.body={"add":{"doc":{"ukey":"08154711","type":"1","nbody":{"set":"mycontent" results in { "ukey":"08154711", "type":"1", "nbody":"{set=mycontent}"}] } All fields are stored. ukey is the unique key :-) type is a required field. nbody is a solr.TextField. Is there any (wiki/readme) pointer on how to test and use this feature correctly? What are the restrictions? Regards, Kai Gülzau -----Original Message----- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Saturday, June 16, 2012 4:47 PM To: solr-user@lucene.apache.org Subject: Re: How to update one field without losing the others? Atomic update is a very new feature coming in 4.0 (i.e. grab a recent nightly build to try it out). It's not documented yet, but here's the JIRA issue: https://issues.apache.org/jira/browse/SOLR-139?focusedCommentId=13269007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13269007 -Yonik http://lucidimagination.com
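For comparison, the documentation of released Solr 4.x versions shows atomic updates as a JSON array of documents in which the changed field carries a {"set": ...} map and the unique key identifies the document. The sketch below only builds such a payload as a string; whether this form would have worked on the nightly build discussed here is not known. The class and method names are hypothetical, and the escaping is deliberately minimal (quotes and backslashes only).

```java
// Hypothetical helper that only builds the request body; it does not talk to Solr.
class AtomicUpdatePayload {

    // Minimal JSON string escaping: backslashes and double quotes only.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a one-document atomic update that sets a single field.
    static String setField(String keyField, String keyValue, String field, String newValue) {
        return "[{\"" + escape(keyField) + "\":\"" + escape(keyValue) + "\",\""
                + escape(field) + "\":{\"set\":\"" + escape(newValue) + "\"}}]";
    }

    public static void main(String[] args) {
        // Field names and values taken from the post above.
        System.out.println(setField("ukey", "08154711", "nbody", "mycontent"));
        // [{"ukey":"08154711","nbody":{"set":"mycontent"}}]
    }
}
```

The payload would be sent as the body of a POST to the update handler with Content-Type application/json; a real client should use a JSON library instead of string concatenation.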
mailto: scheme aware tokenizer
Is there any analyzer out there which handles the mailto: scheme? UAX29URLEmailTokenizer seems to split at the wrong place: mailto:t...@example.org -> mailto:test example.org As a workaround I use mailto:"; replacement="mailto: "/> Regards, Kai Gülzau
RE: DIH Strange Problem
Do you use Java 6 update 29? There is a known issue with the latest MSSQL driver: http://blogs.msdn.com/b/jdbcteam/archive/2011/11/07/supported-java-versions-november-2011.aspx "In addition, there are known connection failure issues with Java 6 update 29, and the developer preview (non production) versions of Java 6 update 30 and Java 6 update 30 build 12. We are in contact with Java on these issues and we will update this blog once we have more information." Should work with update 28. Kai
-----Original Message----- From: Husain, Yavar [mailto:yhus...@firstam.com] Sent: Monday, November 28, 2011 1:02 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem
I figured out the solution, and Microsoft, not Solr, is the problem here :): I downloaded and built the latest Solr (3.4) from sources and finally hit the following lines of code in Solr (where I put my debug statements):

    if (url != null) {
      LOG.info("Yavar: getting handle to driver manager:");
      c = DriverManager.getConnection(url, initProps);
      LOG.info("Yavar: got handle to driver manager:");
    }

The call to DriverManager was not returning. Here was the error!! The driver we were using was the Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver, the jTDS JDBC driver, and installed that. Problem got fixed!!! So please follow these steps:
1. Download the jTDS JDBC driver from http://jtds.sourceforge.net/
2. Put the driver jar file into your Solr/lib directory where you had put the Microsoft JDBC driver.
3. In data-config.xml use this statement: driver="net.sourceforge.jtds.jdbc.Driver"
4. Also in data-config.xml specify the url like this: url="jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX"
5. Now run your indexing. It should solve the problem.
-----Original Message----- From: Husain, Yavar Sent: Thursday, November 24, 2011 12:38 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem
Hi Thanks for your replies. I carried out these 2 steps (they did not solve my problem):
1. I tried setting responseBuffering to adaptive. Did not work.
2. To check the database connection I wrote a simple Java program to connect to the database and fetch some results with the same driver that I use for Solr. It worked. So it does not seem to be a problem with the connection.
Now I am stuck where the Tomcat log says "Creating a connection for entity ." and does nothing. After this log line we usually get "getConnection() took x millisecond", but I don't get that; I can just see the time moving with no records getting fetched.
Original problem listed again: I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing was working perfectly fine. However, today when I started full indexing again, Solr got stuck at the line "Creating a connection for entity." There are no further messages after that. I can see that DIH is busy, and on the DIH console I can see "A command is still running". I can also see total rows fetched = 0 and total request made to datasource = 1, and the time is increasing, but it is not doing anything. This is the exact configuration that worked for me before. I am not really able to understand the problem here. Also, in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file. ... data-config.xml: . .
Logs: INFO: Server startup in 2016 ms Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6] Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1322041133719 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity SampleText with URL: jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday, November 23, 2011 7:36 PM To: solr-user@lucene.apache.org Subject: Re: DIH Strange Problem On 11/23/2011 5:21 AM, Chantal Ackermann wrote: > Hi Yavar, > > my experience with similar problems was that there was something wrong > with the database connection or the database. > > Chantal It's also possible tha
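The symptom in this thread is a call (DriverManager.getConnection) that never returns. One generic way to make such a hang visible outside Solr is to run the blocking call on a separate thread and wait for it with a timeout. The sketch below uses only java.util.concurrent; the class TimeoutProbe and the method completesWithin are hypothetical names, and the probe only reports the hang, it does not fix the driver problem.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical diagnostic helper, not part of Solr or DIH.
class TimeoutProbe {

    // Returns true if the task finishes within the timeout, false if it is
    // still running when the timeout expires.
    static boolean completesWithin(Callable<?> task, long timeoutMillis) {
        // Daemon thread, so a task that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "timeout-probe");
            thread.setDaemon(true);
            return thread;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // best effort; a blocked driver call may ignore interrupts
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed instead of hanging", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(completesWithin(() -> "ok", 1000));
        System.out.println(completesWithin(() -> {
            Thread.sleep(5000); // stands in for a call that hangs
            return "late";
        }, 200));
    }
}
```

With a JDBC driver on the classpath the probed task could be () -> DriverManager.getConnection(url, props), using the same URL and properties as in data-config.xml. DriverManager.setLoginTimeout(int) is another standard option, although individual drivers may not honor it.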
DIH -> how to collect added/error unique keys?
Hi *, I am using DataImportHandler to do imports on an INDEX_QUEUE table (UKEY | ACTION) using a custom Transformer which adds fields from various sources depending on the UKEY. Indexing works fine this way. But now I want to delete the rows from INDEX_QUEUE which were successfully updated. -> Is there a good "API way" to do this? Right now I'm using a custom RequestProcessor which collects the UIDs and calls a method on a singleton with access to the DB. It works but I hate these global singletons... :-(

    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      try {
        super.processAdd(cmd);
        addOK(doc);      // record the document as successfully added
      } catch (IOException e) {
        addError(doc);   // record the document as failed, then rethrow
        throw e;
      } catch (RuntimeException e) {
        addError(doc);
        throw e;
      }
    }

Any other suggestions? Regards, Kai Gülzau
RE: Jetty logging
Hi, remove slf4j-jdk14-1.6.1.jar from the war and repack it with slf4j-log4j12.jar and log4j-1.2.14.jar instead. ->http://wiki.apache.org/solr/SolrLogging Regards, Kai Gülzau -Original Message- From: darul [mailto:daru...@gmail.com] Sent: Thursday, November 03, 2011 11:26 AM To: solr-user@lucene.apache.org Subject: Jetty logging Hello everybody, I do not find a solution on how to configure jetty with sl4j and a log4j.properties file. In I have put : - log4j-1.2.14.jar - slf4j-api-1.3.1.jar in directory: - log4j.properties At the end, nothing append when running jetty. Do you have any ideas ? Thanks, Julien -- View this message in context: http://lucene.472066.n3.nabble.com/Jetty-logging-tp3476715p3476715.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: document update / nested documents / document join
I just found another feature/ticket to be able to update fields: https://issues.apache.org/jira/browse/SOLR-2753 https://issues.apache.org/jira/browse/LUCENE-1231 -> CSF Column Stride Fields This should work well with simple fields like category/date/...!? So I have 2 options: 1.) Introduce a rather complex logic on client side to form the right join query (or do join manually), which should, as you stated, work even with complex queries. 2.) Or do it straightforward, combine all docs to one and WAIT for one of the various "update field/doc" features to be realized. I think I'll give 1.) a try and wait for 2.) if I get into trouble. Regards, Kai Gülzau -Original Message- From: Thijs [mailto:vonk.th...@gmail.com] Sent: Monday, October 17, 2011 1:22 PM To: solr-user@lucene.apache.org Subject: Re: document update / nested documents / document join Hi, First. I'm not sure you know. But the join isn't like a join in a database it's more like select * from (set of documents that match query) where exists (set of documents that match join query) I have some complex (multiple join fq) in one call and that is fine, so I think this query may work also. other wise you could try something like: q=*:*&fq={!join+from=out_ticketid+to=ticketid}(category:bugfixes+OR+out_category:bugfixes)&fq={!join+from=out_ticketid+to=ticketid}(body:answer+OR+out_body:answer) My wish would also be that this where backported to 3.x. But if not we'll probably go live on 4.x Thijs On 17-10-2011 11:46, Kai Gülzau wrote: > Nobody? > > SOLR-139 seems to be the most popular issue but I don’t think this will be > resolved in near future (this year). Right? > > So I will try SOLR-2272 as a workaround, split up my documents in "static" > and " frequently updated" > and join them at query time. > > What is the exact join query to do a query like "category:bugfixes AND > body:answer" >matching "category:bugfixes" in doc1 and >matching "body:answer" in doc3 >with just returning "doc 1"?? 
> > I adopted the fieldnames of > doc 3: > type: out > out_ticketid: 1001 > out_body: this is my answer > out_category: other > > q={!join+from=out_ticketid+to=ticketid}(category:bugfixes+OR+out_categ > ory:bugfixes)+AND+(body:answer+OR+out_body:answer) > > > Writing this, I doubt this syntax is even possible!? > Additionally I'm not sure if trunk with SOLR-2272 is "production ready". > > The only way to do what I want in a released 3.x version is to do several > searches and joining the results manually. > e.g. > q=category:bugfixes -> doc1 -> ticketid: 1001 q=body:answers -> > doc3 -> ticket:1001 > -> result ticketid:1001 > > This I way I would lose benefits like faceted search etc. :-\ > > Any suggestions? > > > Regards, > > Kai Gülzau > > -Original Message- > From: Kai Gülzau [mailto:kguel...@novomind.com] > Sent: Thursday, October 13, 2011 4:52 PM > To: solr-user@lucene.apache.org > Subject: document update / nested documents / document join > > Hi *, > > i am a bit confused about what is the best way to achieve my requirements. > > We have a mail ticket system. A ticket is created when a mail is received by > the system: > > doc 1: > uid: 1001_in > ticketid: 1001 > type: in > body: I have a problem > category: bugfixes > date: 201110131955 > > This incoming document is static. While the ticket is in progress there is > another document representing the current/last state of the ticket. Some > fields of this document are updated frequently: > > doc 2: > uid: 1001_out > ticketid: 1001 > type: out > body: > category: bugfixes > date: 201110132015 > > a bit later (doc 2 is deleted/updated): > doc 3: > uid: 1001_out > ticketid: 1001 > type: out > body: this is my answer > category: other > date: 201110140915 > > I would like to do a boolean search spanning multiple documents like > "category:bugfixes AND body:answer". 
> > I think it's the same what was proposed by: > http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-sup > port-in-lucene > > So I dig into the deeps of Lucene and Solr tickets and now i am stuck > choosing the "right" way: > > https://issues.apache.org/jira/browse/LUCENE-2454 Nested Document > query support > https://issues.apache.org/jira/browse/LUCENE-3171 > BlockJoinQuery/Collector > https://issues.apache.org/jira/browse/LUCENE-1879 Parallel incremental > indexing > https://issues.apache.org/jira/browse/SOLR-139 Support > updateable/modifiable documents > https://issues.apache.org
RE: document update / nested documents / document join
Nobody? SOLR-139 seems to be the most popular issue but I don't think this will be resolved in the near future (this year). Right? So I will try SOLR-2272 as a workaround, split up my documents into "static" and "frequently updated" and join them at query time. What is the exact join query to do a query like "category:bugfixes AND body:answer", matching "category:bugfixes" in doc1 and matching "body:answer" in doc3, while returning just "doc 1"?? I adapted the field names of doc 3:
type: out
out_ticketid: 1001
out_body: this is my answer
out_category: other
q={!join+from=out_ticketid+to=ticketid}(category:bugfixes+OR+out_category:bugfixes)+AND+(body:answer+OR+out_body:answer)
Writing this, I doubt this syntax is even possible!? Additionally I'm not sure if trunk with SOLR-2272 is "production ready". The only way to do what I want in a released 3.x version is to do several searches and join the results manually, e.g.
q=category:bugfixes -> doc1 -> ticketid: 1001
q=body:answer -> doc3 -> ticketid: 1001
-> result ticketid: 1001
This way I would lose benefits like faceted search etc. :-\ Any suggestions? Regards, Kai Gülzau
-----Original Message----- From: Kai Gülzau [mailto:kguel...@novomind.com] Sent: Thursday, October 13, 2011 4:52 PM To: solr-user@lucene.apache.org Subject: document update / nested documents / document join Hi *, i am a bit confused about what is the best way to achieve my requirements. We have a mail ticket system. A ticket is created when a mail is received by the system: doc 1: uid: 1001_in ticketid: 1001 type: in body: I have a problem category: bugfixes date: 201110131955 This incoming document is static. While the ticket is in progress there is another document representing the current/last state of the ticket.
Some fields of this document are updated frequently: doc 2: uid: 1001_out ticketid: 1001 type: out body: category: bugfixes date: 201110132015 a bit later (doc 2 is deleted/updated): doc 3: uid: 1001_out ticketid: 1001 type: out body: this is my answer category: other date: 201110140915 I would like to do a boolean search spanning multiple documents like "category:bugfixes AND body:answer". I think it's the same what was proposed by: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene So I dig into the deeps of Lucene and Solr tickets and now i am stuck choosing the "right" way: https://issues.apache.org/jira/browse/LUCENE-2454 Nested Document query support https://issues.apache.org/jira/browse/LUCENE-3171 BlockJoinQuery/Collector https://issues.apache.org/jira/browse/LUCENE-1879 Parallel incremental indexing https://issues.apache.org/jira/browse/SOLR-139 Support updateable/modifiable documents https://issues.apache.org/jira/browse/SOLR-2272 Join If it is easily possible to update one field in a document i would just merge the two logical documents into one representing the whole ticket. But i can't see this is already possible. SOLR-2272 seems to be the best solution by now but feels like workaround. " I can't update a document field so i split it up in static and dynamic content and join both at query time." SOLR-2272 is committed to trunk/solr 4. Are there any planned release dates for solr 4 or a possible backport for SOLR-2272 in 3.x? I would appreciate any suggestions. Regards, Kai Gülzau
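The manual join described in the reply above (one search per condition, then keeping only the ticket IDs that appear in both results) is a set intersection on ticketid. A minimal client-side sketch in plain Java, assuming the ticketid values of each query's hits have already been collected; the class name, the method name and the IDs 1002 and 1003 are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical client-side helper; the inputs stand for the ticketid values
// returned by two separate Solr queries.
class ManualJoinSketch {

    // Keeps the ticket IDs present in both result sets, in the order of the first.
    static Set<String> ticketsMatchingBoth(Collection<String> first, Collection<String> second) {
        Set<String> result = new LinkedHashSet<>(first);
        result.retainAll(new HashSet<>(second));
        return result;
    }

    public static void main(String[] args) {
        // q=category:bugfixes -> ticket IDs of the matching documents
        Collection<String> byCategory = Arrays.asList("1001", "1002");
        // q=body:answer -> ticket IDs of the matching documents
        Collection<String> byBody = Arrays.asList("1001", "1003");
        System.out.println(ticketsMatchingBoth(byCategory, byBody)); // [1001]
    }
}
```

As noted in the message, this happens outside Solr, so facets and paging over the joined result would have to be rebuilt on the client as well.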
document update / nested documents / document join
Hi *, I am a bit confused about the best way to achieve my requirements. We have a mail ticket system. A ticket is created when a mail is received by the system:
doc 1:
uid: 1001_in
ticketid: 1001
type: in
body: I have a problem
category: bugfixes
date: 201110131955
This incoming document is static. While the ticket is in progress there is another document representing the current/last state of the ticket. Some fields of this document are updated frequently:
doc 2:
uid: 1001_out
ticketid: 1001
type: out
body:
category: bugfixes
date: 201110132015
A bit later (doc 2 is deleted/updated):
doc 3:
uid: 1001_out
ticketid: 1001
type: out
body: this is my answer
category: other
date: 201110140915
I would like to do a boolean search spanning multiple documents like "category:bugfixes AND body:answer". I think it's the same as what was proposed in: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
So I dug into the depths of the Lucene and Solr tickets and now I am stuck choosing the "right" way:
https://issues.apache.org/jira/browse/LUCENE-2454 Nested Document query support
https://issues.apache.org/jira/browse/LUCENE-3171 BlockJoinQuery/Collector
https://issues.apache.org/jira/browse/LUCENE-1879 Parallel incremental indexing
https://issues.apache.org/jira/browse/SOLR-139 Support updateable/modifiable documents
https://issues.apache.org/jira/browse/SOLR-2272 Join
If it were easily possible to update one field in a document I would just merge the two logical documents into one representing the whole ticket. But I can't see that this is possible yet. SOLR-2272 seems to be the best solution for now but feels like a workaround: "I can't update a document field so I split it up into static and dynamic content and join both at query time." SOLR-2272 is committed to trunk/Solr 4. Are there any planned release dates for Solr 4 or a possible backport of SOLR-2272 to 3.x? I would appreciate any suggestions. Regards, Kai Gülzau
RE: Multiple indexes
> > (for example if you need separate TFs for each document type).
>
> I wonder if in this precise case it wouldn't be pertinent to
> have a single index with the various document types, each
> having their own field set. Isn't TF calculated field by field?

Oh, you are right :)
So I will start testing with one "mixed type" index and perhaps use an IndexReaderFactory afterwards for comparison.

Thanks,
Kai Gülzau
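The point about field-by-field statistics can be illustrated with a small sketch. It assumes, as Lucene does, that a term is identified by its field name plus its text; the field names `mail_body` and `wiki_text` and the helper `field_term_stats` are hypothetical.

```python
from collections import Counter

# Hypothetical mixed-type index: each document type uses its own field names.
DOCS = [
    {"mail_body": "solr index problem"},
    {"mail_body": "index rebuilt"},
    {"wiki_text": "index index index"},
]

def field_term_stats(docs):
    """Count, per (field, term), total occurrences and the number of docs containing it."""
    tf, df = Counter(), Counter()
    for doc in docs:
        for field, text in doc.items():
            terms = text.split()
            tf.update((field, t) for t in terms)
            df.update((field, t) for t in set(terms))
    return tf, df

tf, df = field_term_stats(DOCS)
print(tf[("mail_body", "index")], df[("mail_body", "index")])  # 2 2
print(tf[("wiki_text", "index")], df[("wiki_text", "index")])  # 3 1
```

Because the counts are keyed by field, the frequent use of "index" in the wiki documents does not change the statistics for the mail documents, even though both types live in one index.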
RE: Multiple indexes
Are there any plans to support a kind of federated search in a future Solr version?

I think there are reasons to use separate indexes for each document type but do combined searches on these indexes (for example if you need separate TFs for each document type).

I am aware of http://wiki.apache.org/solr/DistributedSearch and a workaround to do federated search with sharding
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
but this seems to carry too much network and maintenance overhead.

Perhaps it is worth a try to use an IndexReaderFactory which returns a Lucene MultiReader!?
Is the IndexReaderFactory still experimental?
https://issues.apache.org/jira/browse/SOLR-1366

Regards,
Kai Gülzau

> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Wednesday, June 15, 2011 8:43 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Multiple indexes
>
> Next, however, I predict you're going to ask how you do a 'join' or
> otherwise query across both these cores at once though. You can't do
> that in Solr.
>
> On 6/15/2011 1:00 PM, Frank Wesemann wrote:
> > You'll configure multiple cores:
> > http://wiki.apache.org/solr/CoreAdmin
> >> Hi.
> >>
> >> How to have multiple indexes in SOLR, with different fields and
> >> different types of data?
> >>
> >> Thank you very much!
> >> Bye.
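For reference, the sharding workaround mentioned above comes down to one extra request parameter. The sketch below builds such a request in Python; the host, port and core names (`tickets`, `wiki`) are assumptions for illustration, and `distributed_query_url` is a hypothetical helper. Only the `shards` parameter itself comes from the DistributedSearch wiki page.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def distributed_query_url(base_url, shards, query, rows=10):
    """Build a select URL that asks one core to fan the query out to all shards."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated host:port/path entries, without the "http://" prefix.
        "shards": ",".join(shards),
    }
    return f"{base_url}/select?{urlencode(params)}"

url = distributed_query_url(
    "http://localhost:8983/solr/tickets",  # assumed host and core
    ["localhost:8983/solr/tickets", "localhost:8983/solr/wiki"],
    "body:answer",
)
print(url)
```

Every shard is queried over HTTP even when all cores live in the same JVM, which is the network overhead the mail refers to.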
RE: Is there anything like MultiSearcher?
Hi Roman,

have you solved your problem, and if so, how?

Regards,
Kai Gülzau

> -----Original Message-----
> From: Roman Chyla [mailto:roman.ch...@gmail.com]
> Sent: Saturday, February 05, 2011 4:50 PM
> To: solr-user@lucene.apache.org
> Subject: Is there anything like MultiSearcher?
>
> Dear Solr experts,
>
> Could you recommend some strategies or perhaps tell me if I approach
> my problem from a wrong side? I was hoping to use MultiSearcher to
> search across multiple indexes in Solr, but there is no such thing
> and MultiSearcher was removed according to this post:
> http://osdir.com/ml/solr-user.lucene.apache.org/2011-01/msg00250.html
>
> I thought I had two use cases:
>
> 1. maintenance - I wanted to build two separate indexes, one for
> fulltext and one for metadata (the docs have the unique ids) -
> indexing them separately would make things much simpler
> 2. ability to switch indexes at search time (ie. for testing purposes
> - one fulltext index could be built by Solr standard mechanism, the
> other by a rather different process - independent instance of lucene)
>
> I think the recommended approach is to use the Distributed search - I
> found a nice solution here:
> http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
> - however it seems to me, that data are sent over HTTP (5M from one
> core, and 5M from the other core being merged by the 3rd solr core?)
> and I would like to do it only for local indexes and without the
> network overhead.
>
> Could you please shed some light if there already exists an optimal
> solution to my use cases? And if not, whether I could just try to
> build a new SolrQuerySearcher that is extending lucene MultiSearcher
> instead of IndexSearcher - or you think there are some deeply rooted
> problems there and the MultiSearch-er cannot work inside Solr?
>
> Thank you,
>
> Roman
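The local merge step asked about in the quoted mail can be sketched like this. It is a plain Python illustration of merging per-index hit lists in process, not the Lucene MultiSearcher API; the index names, scores and the helper `merge_top_k` are hypothetical. The sketch also assumes the scores of the two indexes are directly comparable, which only holds if their term statistics are combined or happen to be similar.

```python
import heapq

def merge_top_k(result_lists, k):
    """Merge per-index hit lists [(score, doc_id), ...] into one global top-k list."""
    tagged = (
        (score, index_name, doc_id)
        for index_name, hits in result_lists.items()
        for score, doc_id in hits
    )
    # Highest scores first; no HTTP round trip, everything stays in one process.
    return heapq.nlargest(k, tagged, key=lambda hit: hit[0])

results = {
    "fulltext": [(2.4, "doc7"), (1.1, "doc3")],
    "metadata": [(1.9, "doc3"), (0.4, "doc9")],
}
for score, index_name, doc_id in merge_top_k(results, 3):
    print(f"{score:.1f} {index_name} {doc_id}")
```

Since both indexes share the same unique ids, a real implementation would probably also have to decide whether `doc3` from the two indexes counts as one hit or two; the sketch keeps them separate.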