Re: Solr Admin Interface, reworked - Go on? Go away?
Hey guys, you're completely right :) I will clean up the existing code a little and create a JIRA ticket. On Wed, Mar 2, 2011 at 11:32 PM, Chris Hostetter hossman_luc...@fucit.org wrote: If you run into any issues where you can't replicate something in the existing JSPs (or accomplish some new desirable functionality) because the info is not available from a request handler, don't hesitate to open feature request jiras to get the functionality added (and the folks with java know how can work on patches) Thanks Hoss. There is already one idea for the FieldAnalysisRequestHandler, which came up last week while I tried to build a new Analysis page. I will open a JIRA issue for that too. Regards Stefan
Re: Dismax, q, q.alt, and defaultSearchField?
Hi, Try q.alt={!dismax}banana -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 2. mars 2011, at 23.06, mrw wrote: We have two banks of Solr nodes with identical schemas. The data I'm searching for is in both banks. One has defaultSearchField set to field1, the other has defaultSearchField set to field2. We need to support both user queries and facet queries that have no user content. For the latter, it appears I need to use q.alt=*:*, so I am investigating also using q.alt for user content (e.g., q.alt=banana). I run the following query: q.alt=banana defType=dismax mm=1 tie=0.1 qf=field1+field2 On bank one, I get the expected results, but on bank two, I get 0 results. I noticed (via debugQuery=true), that when I use q.alt, it resolves using the defaultSearchField (e.g., field1:banana), not the value of the qf param. Therefore, I get different results. If I switched to using q for user queries and q.alt for facet queries, I would still get different results, because q would resolve against the fields in the qf param, and q.alt would resolve against the default search field. Is there a way to override this behavior in order to get consistent results? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Dismax-q-q-alt-and-defaultSearchField-tp2621061p2621061.html Sent from the Solr - User mailing list archive at Nabble.com.
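A minimal sketch of how the suggested request could be assembled, assuming the parameters from the original question (mm, tie, qf) are kept and only q.alt is changed to carry the {!dismax} local-params prefix. The host, port, handler path and the function name build_dismax_alt_url are assumptions for illustration, not something stated in the thread.

```python
from urllib.parse import urlencode


def build_dismax_alt_url(base_url, user_text, fields):
    """Build a select URL whose q.alt is parsed by dismax via local params."""
    params = [
        ("defType", "dismax"),
        # The {!dismax} prefix asks Solr to parse q.alt with the dismax parser,
        # so qf is used instead of the schema's defaultSearchField.
        ("q.alt", "{!dismax}" + user_text),
        ("qf", " ".join(fields)),
        ("mm", "1"),
        ("tie", "0.1"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; adjust to the real host and core.
url = build_dismax_alt_url(
    "http://localhost:8983/solr/select", "banana", ["field1", "field2"]
)
print(url)
```

Building the query string with urlencode also takes care of escaping the braces and the exclamation mark, which are easy to get wrong when pasting such a URL by hand.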
Selection Between Solr and Relational Database
Dear all, I started learning Solr two months ago, and so far my system runs well on a Solr cluster. I have a question about implementing one feature in my system. For retrieving documents by keyword, I believe Solr is faster than a relational database. However, for the following operations I guess its performance must be lower. Is that right? What I am trying to do is as follows. 1) All documents in Solr have one field that is used to differentiate them; different categories have different values in that field, e.g., Group; the documents are classified as news, sports, entertainment and so on. 2) Retrieve all documents by the field Group. 3) Besides Group, there is another field called CreatedTime. I will filter the documents retrieved by Group according to the value of CreatedTime. The filtered documents are the final results I need. I guess this operation performs worse than in a relational database, right? Could you please explain why? Best regards, Li Bing
Re: Boost function problem with disquerymax
You are right. It was not an indexed field, just stored. Thanks. 2011/3/2 Yonik Seeley yo...@lucidimagination.com On Wed, Mar 2, 2011 at 11:34 AM, Gastone Penzo gastone.pe...@gmail.com wrote: Hi, for search I use DisMax and I want to boost a field with the bf parameter, like: ...bf=boost_has_img^5 The boost_has_img field of my document is 3: <int name="boost_has_img">3</int> If I look at the results in debug query mode I see: 0.0 = (MATCH) FunctionQuery(int(boost_has_img)), product of: 0.0 = int(boost_has_img)=0 5.0 = boost 0.06543833 = queryNorm Why is the score 0 if the value is 3 and the boost is 5? Solr thinks the value of boost_has_img is 0 for that document. Is boost_has_img an indexed field? If so, verify that the value is actually 3 for that specific document. -Yonik http://lucidimagination.com -- Gastone Penzo Webster Srl www.webster.it www.libreriauniversitaria.it
perfect match in dismax search
How can I obtain a perfect match with a dismax query? E.g., I want to search for hello i love you with defType=dismax in the title field, and I want to get only results whose title is exactly hello i love you, with all these terms in this order, no fewer words and no others. How is this possible? I tried +(hello i love you), but a title such as hello i love you mum also matches, and I don't want that. Thanks -- Gastone Penzo Webster Srl www.webster.it www.libreriauniversitaria.it
Re: perfect match in dismax search
Use either the string fieldType or a field with very little analysis (KeywordTokenizer + LowercaseFilter). How can I obtain a perfect match with a dismax query? E.g., I want to search for hello i love you with defType=dismax in the title field, and I want to get only results whose title is exactly hello i love you, with all these terms in this order, no fewer words and no others. How is this possible? I tried +(hello i love you), but a title such as hello i love you mum also matches, and I don't want that. Thanks
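To see why a keyword-style field gives an exact match, here is a toy model in plain Python. It is not Solr's analysis code: keyword_lowercase only imitates the effect of KeywordTokenizer + LowercaseFilter (the whole value becomes one lowercased token), and whitespace_lowercase imitates an ordinary tokenized text field. All function names are made up for this illustration.

```python
def keyword_lowercase(text):
    """Toy stand-in for KeywordTokenizer + LowercaseFilter: one lowercased token."""
    return [text.lower()]


def whitespace_lowercase(text):
    """Toy stand-in for a tokenized text field: one lowercased token per word."""
    return text.lower().split()


def exact_match(query, title):
    """Match only when the single keyword tokens are identical."""
    return keyword_lowercase(query) == keyword_lowercase(title)


def all_terms_match(query, title):
    """Match when every query word occurs somewhere in the title."""
    return set(whitespace_lowercase(query)) <= set(whitespace_lowercase(title))


query = "hello i love you"
print(exact_match(query, "Hello I Love You"))          # same title, different case
print(exact_match(query, "hello i love you mum"))      # longer title
print(all_terms_match(query, "hello i love you mum"))  # tokenized field still matches
```

With the tokenized field, the longer title matches because all query words are present; with the single-token field, only an identical title (ignoring case) matches, which is the behaviour asked for.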
Re: Selection Between Solr and Relational Database
Well, an RDBMS can be very fast, but Solr using fq can be very fast as well. Just try fq=group:sports&fq=createdtime:<your time> Dear all, I started learning Solr two months ago, and so far my system runs well on a Solr cluster. I have a question about implementing one feature in my system. For retrieving documents by keyword, I believe Solr is faster than a relational database. However, for the following operations I guess its performance must be lower. Is that right? What I am trying to do is as follows. 1) All documents in Solr have one field that is used to differentiate them; different categories have different values in that field, e.g., Group; the documents are classified as news, sports, entertainment and so on. 2) Retrieve all documents by the field Group. 3) Besides Group, there is another field called CreatedTime. I will filter the documents retrieved by Group according to the value of CreatedTime. The filtered documents are the final results I need. I guess this operation performs worse than in a relational database, right? Could you please explain why? Best regards, Li Bing
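A small sketch of the two-filter request suggested above, assuming the field names group and createdtime from the reply and a date range for the time filter. The endpoint, the match-all main query and the function name build_filter_url are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_filter_url(base_url, group, created_from, created_to):
    """Build a select URL that restricts results with two separate fq filters."""
    params = [
        ("q", "*:*"),  # match everything; the filters do the restricting
        ("fq", "group:" + group),
        ("fq", "createdtime:[%s TO %s]" % (created_from, created_to)),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and time window.
url = build_filter_url(
    "http://localhost:8983/solr/select",
    "sports",
    "2011-03-01T00:00:00Z",
    "2011-03-03T00:00:00Z",
)
print(url)
```

Keeping the group filter and the time filter in separate fq parameters lets Solr cache each one on its own, so the group filter can be reused across requests with different time windows.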
Re: Solr TermsComponent: space in term
why was this thread left unanswered ? Is there no way to achieve what the Op had to say ? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-TermsComponent-space-in-term-tp1898889p2624203.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Admin Interface, reworked - Go on? Go away?
Hi, This is simply great! Bravo! This alone is worth including, but I also (of course) have some comments/ideas:

The links section on top:
* Move the links on top to the bottom, reserving the top for navigation.
* The send email could be changed to Community forum and, instead of linking to mailto:solr-user@lucene.apache.org, link to http://wiki.apache.org/solr/UsingMailingLists
* Add a link to IRC chat. http://webchat.freenode.net/?channels=#solr That would surely increase the activity on the channel :)
* Allow for custom links a la admin-extra.html. Include HTML code from ${solr.solr.home}/admin-links.html, letting people add links to their own support etc.
* Similarly for the top section, allow including HTML code from ${solr.solr.home}/admin-navi.html, where you may add links to your Master Solr or whatever.

Suggestion for new tabs for each core:
* Prototyping - pointing to the /browse Velocity GUI. Very useful!!
* CoreAdmin - buttons: reload core, remove core, rename...

In the System tab for each core, it would be great to show some key info:
* # docs
* Size of index (MB)
* Last add/delete timestamp
* Optimized status (with a button to optimize now)
* Button to reload core now (reloads config)

On the Query tab for each core:
* Add a button Delete docs matching this query (with a JavaScript popup box, are you sure? :)
* Add an input box for query type, setting the qt param
* Add some links below the input boxes, expanding by JavaScript:
  - dismax params
  - spatial params
  - spellcheck params
  - faceting params

Should there also be a tab above all cores, with host-wide stuff?
* Solr version
* Host name, port
* Solr HOME path
* Zookeeper info and link
* Core Admin (create new core)

Improve the admin-extra.html concept: Today, if the file admin-extra.html exists, it is included near the top of the current admin GUI. This can be useful, but in this new design it perhaps makes more sense to include the admin-extra.html contents in a widget box on each core. Then each organization can customize it and put links to their internal issue trackers etc.

Include a Dev/Test/Prod indication: It is common to have three different environments, one for test, one for development and one for live production. It happens now and then that you do the wrong action on the wrong server :( so a visual clue as to which environment you're in is very useful. I propose a simple solid bar on the very top which is RED for prod, YELLOW for test and GREEN for dev. Would it be possible to read a Java system property -Dsolr.environment=dev and set the color of such a top bar based on that?

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 2. mars 2011, at 21.47, Stefan Matheis wrote: Hi List, given the fact that my Java knowledge is sort of non-existent .. my idea was to rework the Solr Admin Interface. Compared to CouchDB's Futon or the MongoDB Admin-Utils .. not that fancy, but it was an idea a few weeks ago - and I would like to contribute something, a thing which has to be non-Java but not useless - hopefully ;) Actually it's completely work-in-progress .. but I'm interested in what you guys think. Right direction? Completely wrong, just drop it? http://files.mathe.is/solr-admin/01_dashboard.png http://files.mathe.is/solr-admin/02_query.png http://files.mathe.is/solr-admin/03_schema.png http://files.mathe.is/solr-admin/04_analysis.png http://files.mathe.is/solr-admin/05_plugins.png It's actually using one index.jsp to generate the basic frame, including cores and their navigation. Everything else is loaded via the existing SolrAdminHandler. Any questions, ideas, thoughts out there? Please, let me know :) Regards Stefan
Re: perfect match in dismax search
Hi, I'm working on a Filter which enables boundary match using syntax title:^hello I love you$ which will make sure that the match is exact. See SOLR-1980 (no working patch yet) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 3. mars 2011, at 11.07, Markus Jelsma wrote: Use either the string fieldType or a field with very little analysis (KeywordTokenizer + LowercaseFilter). How can I obtain a perfect match with a dismax query? E.g., I want to search for hello i love you with defType=dismax in the title field, and I want to get only results whose title is exactly hello i love you, with all these terms in this order, no fewer words and no others. How is this possible? I tried +(hello i love you), but a title such as hello i love you mum also matches, and I don't want that. Thanks
Date range query with mixed inclusive/exclusive
Is there any chance that https://issues.apache.org/jira/browse/LUCENE-996 will be backported to the 3x branch? I see that it's fixed in trunk, but it will be a while until it's in a release. How do people generally search for documents from, let's say, the year 2009? I thought it would be convenient to do something like: publication:[2009-01-01T00:00:00Z TO 2010-01-01T00:00:00Z} But there seems to be a bug that prevents this [...} kind of search. So do people generally search like this? publication:[2009-01-01T00:00:00Z TO 2009-12-31T23:59:59.999Z] /Tim
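Until mixed inclusive/exclusive bounds are available, the fully inclusive form can be generated rather than typed by hand. The sketch below assumes millisecond precision is enough and that the field holds UTC timestamps; the function name year_range_query is made up for this example.

```python
from datetime import datetime, timedelta

SOLR_SECONDS_FORMAT = "%Y-%m-%dT%H:%M:%S"


def year_range_query(field, year):
    """Return an inclusive range clause covering one whole calendar year."""
    start = datetime(year, 1, 1)
    # Last representable millisecond of the year: one millisecond before
    # the start of the following year.
    end = datetime(year + 1, 1, 1) - timedelta(milliseconds=1)
    start_text = start.strftime(SOLR_SECONDS_FORMAT) + "Z"
    end_text = end.strftime(SOLR_SECONDS_FORMAT) + ".%03dZ" % (end.microsecond // 1000)
    return "%s:[%s TO %s]" % (field, start_text, end_text)


print(year_range_query("publication", 2009))
# publication:[2009-01-01T00:00:00Z TO 2009-12-31T23:59:59.999Z]
```

Computing the upper bound from the start of the next year avoids hard-coding the last day and second of the year.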
Re: Solr TermsComponent: space in term
Is there no way to achieve what the Op had to say ? TermsComponent operates on indexed terms. One way to achieve multi-word suggestions is to use ShingleFilterFactory at index time.
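To illustrate the idea, here is a toy model of what shingling does to a piece of text. It is only a sketch in plain Python, not the ShingleFilterFactory implementation: it emits each single word plus each run of up to max_size adjacent words, so multi-word terms end up in the index and can be found by a prefix that contains a space. The function name shingles and its parameter are assumptions for this example.

```python
def shingles(text, max_size=2):
    """Return single words plus word n-grams of up to max_size adjacent words."""
    words = text.lower().split()
    terms = []
    for start in range(len(words)):
        for size in range(1, max_size + 1):
            if start + size <= len(words):
                terms.append(" ".join(words[start:start + size]))
    return terms


indexed_terms = shingles("Solr Terms Component")
print(indexed_terms)
# ['solr', 'solr terms', 'terms', 'terms component', 'component']

# A prefix lookup with a space can now match a multi-word term.
print([term for term in indexed_terms if term.startswith("terms c")])
# ['terms component']
```

The price is a larger term dictionary, since every adjacent word pair becomes an indexed term of its own.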
Re: Solr TermsComponent: space in term
iorixxx wrote: TermsComponent operates on indexed terms. One way to achieve multi-word suggestions is to use ShingleFilterFactory at index time. Thank you @iorixxx. Could you point me where I can find a good docs on how to do this ? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-TermsComponent-space-in-term-tp1898889p2624429.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr TermsComponent: space in term
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory On Thursday 03 March 2011 12:15:07 shrinath.m wrote: iorixxx wrote: TermsComponent operates on indexed terms. One way to achieve multi-word suggestions is to use ShingleFilterFactory at index time. Thank you @iorixxx. Could you point me where I can find a good docs on how to do this ? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-TermsComponent-space-in-term-tp1898889p2624429.html Sent from the Solr - User mailing list archive at Nabble.com. -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
adding a document using curl
I have read the various pages and used Curl a lot but i cannot figure out the correct command line to add a document to the example Solr instance. I have tried a few things however they seem to be for the file on the same server as solr, in my case I am pushing the document from a windows machine to Solr for indexing. Ta Ken
Re: adding a document using curl
Here's a complete example http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL On Thursday 03 March 2011 12:31:11 Ken Foskey wrote: I have read the various pages and used Curl a lot but i cannot figure out the correct command line to add a document to the example Solr instance. I have tried a few things however they seem to be for the file on the same server as solr, in my case I am pushing the document from a windows machine to Solr for indexing. Ta Ken -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Solr TermsComponent: space in term
Markus Jelsma-2 wrote: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory well, thank you Markus, Now My schema has the following : if I run a query like this : http://localhost:8983/solr/select?rows=0&q=c&facet=true&facet.field=text&facet.mincount=1&facet.prefix=com I get output saying : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 how do I restrict it to only those words present in the documents and not something like compliance w ? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-TermsComponent-space-in-term-tp1898889p2624547.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: adding a document using curl
On Thu, 2011-03-03 at 12:36 +0100, Markus Jelsma wrote: Here's a complete example http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL I should have been clearer. A rich text document, XML I can make work and a script is in the example docs folder http://wiki.apache.org/solr/ExtractingRequestHandler I also read the solr 1.4 book and tried samples in there, could not make them work. Ta On Thursday 03 March 2011 12:31:11 Ken Foskey wrote: I have read the various pages and used Curl a lot but i cannot figure out the correct command line to add a document to the example Solr instance. I have tried a few things however they seem to be for the file on the same server as solr, in my case I am pushing the document from a windows machine to Solr for indexing. Ta Ken
Re: adding a document using curl
Hi All, is there any Custom open source SOLR ADMIN application like what lucid imagination provides in its distribution. I am trying to create thing, however thinking it would be a reinventing of wheel. Request you to please redirect me, if there is any open source application that can be used. Waiting for your answer. / Pankaj Bhatt.
Custom SOLR ADMIN Application
Hi All, is there any Custom open source SOLR ADMIN application like what lucid imagination provides in its distribution. I am trying to create thing, however thinking it would be a reinventing of wheel. Request you to please redirect me, if there is any open source application that can be used. Waiting for your answer. / Pankaj Bhatt.
Re: AlternateDistributedMLT.patch not working
Hi all, I am currently working on this AlternateDistributedMLT patch. I've applied it manually on Solr 1.4 and solved some NullPointerException issues. It's now working properly. But I'm not sure about its behaviour, so I'll ask you, list: I saw that every MLT query for a doc in the result set runs only on its own shard (the one whose index contains the doc). This means that you can miss documents that are probably related to the doc but are not retrieved because they belong to other shards. Does that make sense? Is it the expected behaviour? If it is, I can submit the patch so that at least it works on Solr 1.4.0. Thanks, Edo On Wed, Feb 23, 2011 at 6:53 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Isha, The patch is out of date. You need to look at the patch and rejection and update your local copy of the code to match the logic from the patch, if it's still applicable to the version of Solr source code you have. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Isha Garg isha.g...@orkash.com To: solr-user@lucene.apache.org Sent: Tue, February 22, 2011 2:13:23 AM Subject: AlternateDistributedMLT.patch not working Hello, I tried to use SOLR-788 with Solr 1.4 so that distributed MLT works well. While working with this patch I got an error message like 1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/solr/handler/component/MoreLikeThisComponent.java.rej Can anybody help me out? Thanks! Isha Garg -- Edoardo Tosca Sourcesense - making sense of Open Source: http://www.sourcesense.com
Re: adding a document using curl
As an example, I run this in the same directory as the msword1.doc file: curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&literal.type=5" -F "file=@msword1.doc" The type literal is just part of my schema. Gary. On 03/03/2011 11:45, Ken Foskey wrote: On Thu, 2011-03-03 at 12:36 +0100, Markus Jelsma wrote: Here's a complete example http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL I should have been clearer. A rich text document, XML I can make work and a script is in the example docs folder http://wiki.apache.org/solr/ExtractingRequestHandler I also read the solr 1.4 book and tried samples in there, could not make them work. Ta
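Because the URL contains an ampersand, it has to be quoted on the command line. A small sketch that assembles the same curl invocation programmatically and prints it with safe shell quoting; it only builds the command and does not run it. The function name extract_curl_command is made up, and the literal names docid and type come from Gary's schema, so they will differ elsewhere.

```python
import shlex
from urllib.parse import urlencode


def extract_curl_command(base_url, file_name, literals):
    """Return the curl argument list for posting a local file to /update/extract."""
    query = urlencode([("literal." + name, value) for name, value in literals])
    # -F sends the file as multipart form data, so it is read on the machine
    # where curl runs and uploaded to Solr.
    return ["curl", base_url + "?" + query, "-F", "file=@" + file_name]


cmd = extract_curl_command(
    "http://localhost:8983/solr/core0/update/extract",
    "msword1.doc",
    [("docid", "74"), ("type", "5")],
)
# shlex.join quotes the URL so the shell does not treat '&' as a control operator.
print(shlex.join(cmd))
```

Since the file is uploaded by curl itself, this form should also fit the case where the document lives on a different machine than Solr; replace localhost with the Solr host.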
Re: Solr Admin Interface, reworked - Go on? Go away?
Picture the URI field above the response field, only half-screen. This facilitates breaking the query apart on different lines in order to debug it. When you have a lot of shards, fq clauses, etc., you end up with a very long URI that is difficult to get your head around and manipulate. We take queries from the logs, split them around parameters, take the shards out, put the shards back in, take the OLS labels out, put them back in, etc. With long, complex queries, it's essential to have a large work space to play in. :) Stefan Matheis wrote: mrw, you mean a field like here (http://files.mathe.is/solr-admin/02_query.png) on the right side, between meta-navigation and plain solr-xml response? actually it's just to display the computed url, but if so .. we could use a larger field for that, of course :) Regards Stefan Am 02.03.2011 22:31, schrieb mrw: Looks nice. Might be also worth it to create a page with large query field for pasting in complete URL-encoded queries that cross cores, etc. I did that at work (via ASP.net) so we could paste in queries from logs and debug them. We tend to use that quite a bit. Cheers Stefan Matheis wrote: Hi List, given that fact that my java-knowledge is sort of non-existing .. my idea was to rework the Solr Admin Interface. Compared to CouchDBs Futon or the MongoDB Admin-Utils .. not that fancy, but it was an idea few weeks ago - and i would like to contrib something, a thing which has to be non-java but not useless - hopefully ;) Actually it's completly work-in-progress .. but i'm interested in what you guys think. Right direction? Completly Wrong, just drop it? http://files.mathe.is/solr-admin/01_dashboard.png http://files.mathe.is/solr-admin/02_query.png http://files.mathe.is/solr-admin/03_schema.png http://files.mathe.is/solr-admin/04_analysis.png http://files.mathe.is/solr-admin/05_plugins.png It's actually using one index.jsp to generate to basic frame, including cores and their navigation. 
Everything else is loaded via existing SolrAdminHandler. Any Questions, Ideas, Thoughts outta there? Please, let me know :) Regards Stefan -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Admin-Interface-reworked-Go-on-Go-away-tp2620365p2620745.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Admin-Interface-reworked-Go-on-Go-away-tp2620365p2624956.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Understanding multi-field queries with q and fq
Yes, we're investigating dismax (with the qf param), but we're not sure it supports our syntax needs. The users want to put AND/OR/NOT in their queries, and we don't want to write a lot of code converting those queries into dismax (+/-/mm) format. So, until 3.1 (edismax) ships, we're also trying to get boolean queries to work across multiple fields with the standard query handler. I've seen quite a few unanswered or partially-answered posts on this list on getting boolean syntax right. I can tell it's a thorny issue. Robert Sandiford wrote: Have you looked at the 'qf' parameter? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com _ http://www.cosugi.org/ -Original Message- From: mrw [mailto:mikerobertsw...@gmail.com] Sent: Wednesday, March 02, 2011 2:28 PM To: solr-user@lucene.apache.org Subject: Re: Understanding multi-field queries with q and fq Anyone understand how to do boolean logic across multiple fields? Dismax is nice for searching multiple fields, but doesn't necessarily support our syntax requirements. eDismax appears to be not available until Solr 3.1. In the meantime, it looks like we need to support applying the user's query to multiple fields, so if the user enters led zeppelin merle we need to be able to do the logical equivalent of fq=field1:led zeppelin merle OR field2:led zeppelin merle Any ideas? :) mrw wrote: After searching this list, Google, and looking through the Pugh book, I am a little confused about the right way to structure a query. The Packt book uses the example of the MusicBrainz DB full of song metadata. What if they also had the song lyrics in English and German as files on disk, and wanted to index them along with the metadata, so that each document would basically have song title, artist, publisher, date, ..., All_Metadata (copy field of all metadata fields), Text_English, and Text_German fields? There can only be one default field, correct? 
So if we want to search for all songs containing (zeppelin AND (dog OR merle)) do we repeat the entire query text for all three major fields in the 'q' clause (assuming we don't want to use the cache): q=(+All_Metadata:zeppelin AND (dog OR merle)+Text_English:zeppelin AND (dog OR merle)+Text_German:(zeppelin AND (dog OR merle)) or repeat the entire query text for all three major fields in the 'fq' clause (assuming we want to use the cache): q=*:*&fq=(+All_Metadata:zeppelin AND (dog OR merle)+Text_English:zeppelin AND (dog OR merle)+Text_German:zeppelin AND (dog OR merle)) ? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Understanding-multi-field-queries-with-q-and-fq-tp2528866p2619700.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://lucene.472066.n3.nabble.com/Understanding-multi-field-queries-with-q-and-fq-tp2528866p2625068.html Sent from the Solr - User mailing list archive at Nabble.com.
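One way to apply the same boolean expression to several fields with the standard query parser is to wrap the expression in a field-scoped group, field:(...), once per field, so the field prefix covers the whole expression and not just its first term. The sketch below only builds the query string; it does not escape or validate the user's input, and the function name per_field_query is an assumption for illustration.

```python
def per_field_query(expression, fields, operator="OR"):
    """Repeat one boolean expression under each field using field:(...) grouping."""
    clauses = ["%s:(%s)" % (field, expression) for field in fields]
    return "(" + (" %s " % operator).join(clauses) + ")"


query = per_field_query(
    "zeppelin AND (dog OR merle)",
    ["All_Metadata", "Text_English", "Text_German"],
)
print(query)
# (All_Metadata:(zeppelin AND (dog OR merle)) OR Text_English:(zeppelin AND (dog OR merle)) OR Text_German:(zeppelin AND (dog OR merle)))
```

The resulting string can be sent either as q or as fq, depending on whether relevance scoring or filter caching matters more; joining with OR means a document qualifies when the full expression holds within at least one of the fields.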
Re: Solr TermsComponent: space in term
You need to remove EdgeNGramFilterFactory from your analyzer chain. --- On Thu, 3/3/11, shrinath.m shrinat...@webyog.com wrote: From: shrinath.m shrinat...@webyog.com Subject: Re: Solr TermsComponent: space in term To: solr-user@lucene.apache.org Date: Thursday, March 3, 2011, 1:41 PM Markus Jelsma-2 wrote: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory well, thank you Markus, Now My schema has the following : if I run a query like this : http://localhost:8983/solr/select?rows=0&q=c&facet=true&facet.field=text&facet.mincount=1&facet.prefix=com I get output saying : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 how do I restrict it to only those words present in the documents and not something like compliance w ? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-TermsComponent-space-in-term-tp1898889p2624547.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: adding a document using curl
On Thu, Mar 3, 2011 at 5:15 PM, Ken Foskey kfos...@tpg.com.au wrote: On Thu, 2011-03-03 at 12:36 +0100, Markus Jelsma wrote: Here's a complete example http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL I should have been clearer. A rich text document, XML I can make work and a script is in the example docs folder http://wiki.apache.org/solr/ExtractingRequestHandler I also read the solr 1.4 book and tried samples in there, could not make them work. [...] Please provide details on what exactly is not working for you, and the corresponding error message from the Solr logs. E.g., something like I tried posting ABC document to Solr, using XYZ commands, and include the part from the Solr logs relating to the exception that you get. After that, further details might be needed, but without the above it is nigh impossible to guess at what you are trying. Regards, Gora
Re: adding a document using curl
On Thu, Mar 3, 2011 at 5:31 PM, pankaj bhatt panbh...@gmail.com wrote: Hi All, is there any Custom open source SOLR ADMIN application like what lucid imagination provides in its distribution. I am trying to create thing, however thinking it would be a reinventing of wheel. Request you to please redirect me, if there is any open source application that can be used. Waiting for your answer. [...] Please do not hijack an existing thread, but start a new one if you want to discuss a new topic. On Hoss' behalf :-) http://people.apache.org/~hossman/#threadhijack Regards, Gora
Re: adding a document using curl
If you are using the ExtractingRequestHandler, you can also try using stream.file or stream.url, e.g. curl "http://localhost:8080/solr/core0/update/extract?stream.file=C:/777045.zip&literal.id=777045&literal.title=Test&commit=true" A more detailed explanation is at http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika Parameters with the literal. prefix set normal fields, and the content extracted from the document is stored in the text field by default. Regards, Jayendra On Thu, Mar 3, 2011 at 7:16 AM, Gary Taylor g...@inovem.com wrote: As an example, I run this in the same directory as the msword1.doc file: curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&literal.type=5" -F "file=@msword1.doc" The type literal is just part of my schema. Gary. On 03/03/2011 11:45, Ken Foskey wrote: On Thu, 2011-03-03 at 12:36 +0100, Markus Jelsma wrote: Here's a complete example http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL I should have been clearer. A rich text document, XML I can make work and a script is in the example docs folder http://wiki.apache.org/solr/ExtractingRequestHandler I also read the solr 1.4 book and tried samples in there, could not make them work. Ta
error in log INFO org.apache.solr.core.SolrCore - webapp=/solr path=/admin/ping params={} status=0 QTime=1
I am using solr under jboss, so this might be more of a jboss config issue, not really sure. But my logs keep getting spammed, because solr sends it as ERROR [STDERR] INFO org.apache.solr.core.SolrCore - webapp=/solr path=/admin/ping params={} status=0 QTime=1 Has anyone seen this and found a workaround to not send this as an Error? Thanks, Mike
Content-Type of XMLResponseWriter / QueryResponseWriter
Dear list, is there any deeper logic behind the fact that XMLResponseWriter is sending CONTENT_TYPE_XML_UTF8="application/xml; charset=UTF-8" ? I would assume (as would most browsers) that XML output is sent as text/xml and not application/xml. Or do you want the browser to call an XML editor with the result? Best regards, Bernd
deletedPKQuery does not perform with compound PK
Hello, I'm using a DIH to import documents from a database. Documents in the index represent a relationship between two entities, units and dealpoints (unit has dealpoint); thus document keys in the index refer to a compound SQL key. Full import works fine. In order to optimize the import process, I configured both the database and the DIH configuration file for delta-import. I added 3 more tables, updated by triggers: a table tracking modification time of units, another one tracking modification time of dealpoints, and the last one used to track deleted units having a dealpoint.

The uniqueKey field of the schema is defined as follows:

  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  ...
  <uniqueKey>id</uniqueKey>

Keys are generated by concatenating the unit id and the dealpoint id, separated by '-', in the SQL query. Below is a sample of the data-config.xml I'm using (the original one is quite huge and may be confusing):

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://somehost:3306/somedatabase" user="user" password="**" />
    <document name="unitdealpoints">
      <entity name="unitdealpoint"
              pk="unit_id,dealpoint_id"
              query="select concat_ws('-', cast(u.unit_id as char), cast(dp.deal_point_id as char)) as id, ... from unit u, deal_point dp, ... where ..."
              deltaQuery="select us.unit_id as unit_id, dps.deal_point_id as dealpoint_id from unit_state us, deal_point_state dps where us.unit_state_last_mod &gt; '${dataimporter.last_index_time}' or dps.deal_point_state_last_mod &gt; '${dataimporter.last_index_time}'"
              deltaImportQuery="select concat_ws('-', cast(u.unit_id as char), cast(dp.deal_point_id as char)) as id, ... from unit u, deal_point dp, ... where (u.unit_id = '${dataimporter.delta.unit_id}' or dp.deal_point_id = '${dataimporter.delta.dealpoint_id}') and ..."
              deletedPKQuery="select id from unit_deal_point_delete ...">
      </entity>
    </document>
  </dataConfig>

I specifically chose to track deleted entities in a dedicated (unit_deal_point_delete) table in order to avoid the known (and apparently unsolved) bugs described here: https://issues.apache.org/jira/browse/SOLR-1229?focusedCommentId=12722427&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12722427

The id field in the unit_deal_point_delete table has the exact same representation as the document keys. Below is an example of a trigger:

  create trigger unit_delete_before before delete on unit
  for each row
  begin
    insert ignore into unit_deal_point_delete (id)
      select concat_ws('-', cast(old.unit_id as char), cast(dpu.deal_point_id as char))
      from deal_point_unit dpu
      where dpu.unit_id = old.unit_id;
  end;

The delta and delta-import queries work fine, but the deletedPKQuery seems to always return 0 rows, although the unit_deal_point_delete table is obviously not empty. No errors are written to the logs, only:

  Mar 3, 2011 11:23:49 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
  INFO: Completed DeletedRowKey for Entity: unitdealpoints rows obtained : 0

I have tested it with versions 1.4.0 and 1.4.1 and the result is the same: documents are not deleted. What is the problem? Am I missing something? Kind regards -- Jerome Droz
Why is SolrDispatchFilter using 90% of the Time?
Hi, I'm working with a recent nightly build of Solr and I'm doing some serious ZooKeeper testing. I have NewRelic monitoring enabled on my Solr machines. When I look at the distribution of the response time I notice 'SolrDispatchFilter.doFilter()' is taking up 90% of the time. The other 10% is used by SolrSearcher and the QueryComponent. + Can anyone explain to me why SolrDispatchFilter is consuming so much time? ++ Can I do something to lower this number? (After all, SolrDispatchFilter must dispatch each time to the standard searcher.) Stijn Vanhoorelbeke
uniqueKey merge documents on commit
Hi, I have a unique key within my index, but rather than the default behaviour of overwriting, I am wondering if there is a method to merge the two different documents on commit of the second document. I have a test case which explains what I'd like to happen:
    @Test
    public void testMerge() throws SolrServerException, IOException {
        // First document: the key plus value1_i.
        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.addField("secid", "testid");
        doc1.addField("value1_i", 1);
        SolrAllSec.GetSolrServer().add(doc1);
        SolrAllSec.GetSolrServer().commit();
        // Second document: same key, but only value2_i.
        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.addField("secid", "testid");
        doc2.addField("value2_i", 2);
        SolrAllSec.GetSolrServer().add(doc2);
        SolrAllSec.GetSolrServer().commit();
        SolrQuery solrQuery = new SolrQuery();
        solrQuery = solrQuery.setQuery("secid:testid");
        QueryResponse response = SolrAllSec.GetSolrServer().query(solrQuery, METHOD.GET);
        List<SolrDocument> result = response.getResults();
        // Desired outcome: one document carrying both values.
        Assert.isTrue(result.size() == 1);
        Assert.isTrue(result.get(0).getFieldValue("value1_i") != null);
        Assert.isTrue(result.get(0).getFieldValue("value2_i") != null);
    }
Other than reading doc1 and adding the fields from doc2 and recommitting, is there another way? Thanks in advance, Tim
Re: Content-Type of XMLResponseWriter / QueryResponseWriter
Never use text/xml, that overrides any encoding declaration inside the XML file. http://ln.hixie.ch/?start=1037398795&count=1 http://www.grauw.nl/blog/entry/489 wunder == Lead Engineer, MarkLogic On Mar 3, 2011, at 7:30 AM, Bernd Fehling wrote: Dear list, is there any deeper logic behind the fact that XMLResponseWriter is sending CONTENT_TYPE_XML_UTF8=application/xml; charset=UTF-8 ? I would assume (as do most browsers) that XML output is received as text/xml and not application/xml. Or do you want the browser to call an XML editor with the result? Best regards, Bernd
Omit hour-min-sec in search?
Hi, Is there a way to omit hour-min-sec in a Solr date field during search? I have indexed a field using TrieDateField and it seems to use UTC format. The dates get stored as below: lastupdateddate: 2008-02-26T20:40:30.94Z I want to do a search based on just YYYY-MM-DD and omit T20:40:30.94Z. Not sure if it's feasible, just want to check if it's possible. Also, most of the data in our source doesn't have time information, hence we are very much interested in just storing the date without time, or, even if it's stored with some default timestamp, we want to search using just the date without the timestamp. Thanks, Barani
Re: Solr Admin Interface, reworked - Go on? Go away?
Hey Jan, On Thu, Mar 3, 2011 at 11:37 AM, Jan Høydahl jan@cominvent.com wrote: This alone is worthy including, but I also (of course) have some comments/ideas: [...] Really nice! i'll try to make a list of open todos / missing items and attach it to the JIRA-Ticket. Especially for the dismax- spatial-query-params, i would need some information about (not used until now) - but i think these are smaller problems, regarding the complete task : Regards Stefan
Re: SolrJ Tutorial
Dear Lance, Could you tell me where I can find the unit tests code? I appreciate so much for your help! Best regards, LB On Sat, Jan 22, 2011 at 3:58 PM, Lance Norskog goks...@gmail.com wrote: The unit tests are simple and show the steps. Lance On Fri, Jan 21, 2011 at 10:41 PM, Bing Li lbl...@gmail.com wrote: Hi, all, In the past, I always used SolrNet to interact with Solr. It works great. Now, I need to use SolrJ. I think it should be easier to do that than SolrNet since Solr and SolrJ should be homogeneous. But I cannot find a tutorial that is easy to follow. No tutorials explain the SolrJ programming step by step. No complete samples are found. Could anybody offer me some online resources to learn SolrJ? I also noticed Solr Cell and SolrJ POJO. Do you have detailed resources to them? Thanks so much! LB -- Lance Norskog goks...@gmail.com
Re: Omit hour-min-sec in search?
Not sure if there is a means of doing explicitly what you ask, but you could do a date range covering the whole day: +mydate:[YYYY-MM-DDT00:00:00Z TO YYYY-MM-DDT23:59:59Z] On Thu, Mar 3, 2011 at 9:14 AM, bbarani bbar...@gmail.com wrote: Hi, Is there a way to omit hour-min-sec in a Solr date field during search? I have indexed a field using TrieDateField and it seems to use UTC format. The dates get stored as below: lastupdateddate: 2008-02-26T20:40:30.94Z I want to do a search based on just YYYY-MM-DD and omit T20:40:30.94Z. Not sure if it's feasible, just want to check if it's possible. Also, most of the data in our source doesn't have time information, hence we are very much interested in just storing the date without time, or, even if it's stored with some default timestamp, we want to search using just the date without the timestamp. Thanks, Barani
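The whole-day range suggested above can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical and not part of Solr or SolrJ, and it assumes the field holds UTC timestamps in the ISO form shown in the question (the field name is taken from that message).

```java
import java.time.LocalDate;

// Hypothetical helper, not part of Solr or SolrJ.
final class DayRangeQuery {
    // Builds a range clause that spans one full UTC day, from midnight
    // up to the last millisecond of that day (both ends inclusive).
    static String forDay(String field, LocalDate day) {
        return field + ":[" + day + "T00:00:00Z TO " + day + "T23:59:59.999Z]";
    }

    public static void main(String[] args) {
        System.out.println(forDay("lastupdateddate", LocalDate.of(2008, 2, 26)));
    }
}
```

The resulting string still has to be URL-encoded when it is sent as a q or fq parameter.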
FilterQuery OR statement
Trying to figure out how I can run something similar to this for the fq parameter Field1 in ( 1, 2, 3 4 ) AND Field2 in ( 4, 5, 6, 7 ) I found some examples on the net that looked like this: fq=+field1:(1 2 3 4) +field2(4 5 6 7) but that yields no results.
Re: Dismax, q, q.alt, and defaultSearchField?
Thanks, Jan. It looks like what we need to do is use both q and q.alt, such that q.alt is always *:* and q is either empty for filter-only queries or has the user text. That seems to work. Jan Høydahl / Cominvent wrote: Hi, Try q.alt={!dismax}banana -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 2. mars 2011, at 23.06, mrw wrote: We have two banks of Solr nodes with identical schemas. The data I'm searching for is in both banks. One has defaultSearchField set to field1, the other has defaultSearchField set to field2. We need to support both user queries and facet queries that have no user content. For the latter, it appears I need to use q.alt=*:*, so I am investigating also using q.alt for user content (e.g., q.alt=banana). I run the following query: q.alt=banana defType=dismax mm=1 tie=0.1 qf=field1+field2 On bank one, I get the expected results, but on bank two, I get 0 results. I noticed (via debugQuery=true) that when I use q.alt, it resolves using the defaultSearchField (e.g., field1:banana), not the value of the qf param. Therefore, I get different results. If I switched to using q for user queries and q.alt for facet queries, I would still get different results, because q would resolve against the fields in the qf param, and q.alt would resolve against the default search field. Is there a way to override this behavior in order to get consistent results? Thanks!
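The arrangement described in this reply (q.alt always *:*, q present only when there is user text) could be sketched as a small parameter builder. The class name is hypothetical; the parameter names and values (defType, qf, mm, tie, q.alt) are the ones quoted in the thread, and how the map is turned into a request is left open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: assembles request parameters only.
final class DismaxParams {
    static Map<String, String> build(String userText) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "dismax");
        params.put("qf", "field1 field2");
        params.put("mm", "1");
        params.put("tie", "0.1");
        // Fallback used by dismax when q is absent: match every document.
        params.put("q.alt", "*:*");
        // Only send q when the user actually typed something.
        if (userText != null && !userText.trim().isEmpty()) {
            params.put("q", userText.trim());
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(build("banana")); // user query
        System.out.println(build(""));       // filter-only or facet-only query
    }
}
```

With this shape, user queries resolve against qf on both banks, and the filter-only case no longer depends on defaultSearchField.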
Re: FilterQuery OR statement
Trying to figure out how I can run something similar to this for the fq parameter Field1 in ( 1, 2, 3 4 ) AND Field2 in ( 4, 5, 6, 7 ) I found some examples on the net that looked like this: fq=+field1:(1 2 3 4) +field2(4 5 6 7) but that yields no results. Maybe your default operator is set to AND in schema.xml? If yes, try using +field2:(4 OR 5 OR 6 OR 7)
Re: uniqueKey merge documents on commit
Nope, there is not. On 3/3/2011 10:55 AM, Tim Gilbert wrote: Hi, I have a unique key within my index, but rather than the default behavour of overwriting I am wondering if there is a method to merge the two different documents on commit of the second document. I have a testcase which explains what I'd like to happen: @Test public void testMerge() throws SolrServerException, IOException { SolrInputDocument doc1 = new SolrInputDocument(); doc1.addField(secid, testid); doc1.addField(value1_i, 1); SolrAllSec.GetSolrServer().add(doc1); SolrAllSec.GetSolrServer().commit(); SolrInputDocument doc2 = new SolrInputDocument(); doc2.addField(secid, testid); doc2.addField(value2_i, 2); SolrAllSec.GetSolrServer().add(doc2); SolrAllSec.GetSolrServer().commit(); SolrQuery solrQuery = new SolrQuery(); solrQuery = solrQuery.setQuery(secid:testid); QueryResponse response = SolrAllSec.GetSolrServer().query(solrQuery, METHOD.GET); ListSolrDocument result = response.getResults(); Assert.isTrue(result.size() == 1); Assert.isTrue(result.contains(value1)); Assert.isTrue(result.contains(value2)); } Other than reading doc1 and adding the fields from doc2 and recommitting, is there another way? Thanks in advance, Tim
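Since there is no server-side merge, the read-modify-write approach Tim mentions is what remains. A minimal sketch of the merge step follows, using plain maps as a stand-in for SolrDocument and SolrInputDocument so that it runs without SolrJ; the class and method names are hypothetical. It assumes every field of the existing document is stored, because only stored fields can be read back and re-added.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: merges field maps on the client before re-adding.
final class DocMerge {
    // Returns the stored fields overlaid with the update's fields.
    // Both maps must carry the same value for the unique key field.
    static Map<String, Object> merge(Map<String, Object> stored,
                                     Map<String, Object> update,
                                     String keyField) {
        Object storedKey = stored.get(keyField);
        if (storedKey == null || !storedKey.equals(update.get(keyField))) {
            throw new IllegalArgumentException("unique keys differ or are missing");
        }
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(update); // fields in the update win on conflict
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> doc1 = new LinkedHashMap<>();
        doc1.put("secid", "testid");
        doc1.put("value1_i", 1);
        Map<String, Object> doc2 = new LinkedHashMap<>();
        doc2.put("secid", "testid");
        doc2.put("value2_i", 2);
        System.out.println(merge(doc1, doc2, "secid"));
    }
}
```

The merged map would then be copied into a new input document and added, which overwrites the old document under the same unique key.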
Re: FilterQuery OR statement
--- On Thu, 3/3/11, Ahmet Arslan iori...@yahoo.com wrote: From: Ahmet Arslan iori...@yahoo.com Subject: Re: FilterQuery OR statement To: solr-user@lucene.apache.org Date: Thursday, March 3, 2011, 8:05 PM Trying to figure out how I can run something similar to this for the fq parameter Field1 in ( 1, 2, 3 4 ) AND Field2 in ( 4, 5, 6, 7 ) I found some examples on the net that looked like this: fq=+field1:(1 2 3 4) +field2(4 5 6 7) but that yields no results. Maybe your default operator is set to AND in schema.xml? If yes, try using +field2:(4 OR 5 OR 6 OR 7) Actually you can use local params for that. http://wiki.apache.org/solr/LocalParams fq={!q.op=OR df=field1}1 2 3 4&fq={!q.op=OR df=field2}4 5 6 7
Re: FilterQuery OR statement
That worked, thought I tried it before, not sure why it didn't before. Also, is there a way to query without a q parameter? I'm just trying to pull back all of the field results where field1:(1 OR 2 OR 3) etc. so I figured I'd use the FQ param for caching purposes because those queries will likely be run a lot, but if I leave the Q parameter off i get a null pointer error. On Thu, Mar 3, 2011 at 11:05 AM, Ahmet Arslan iori...@yahoo.com wrote: Trying to figure out how I can run something similar to this for the fq parameter Field1 in ( 1, 2, 3 4 ) AND Field2 in ( 4, 5, 6, 7 ) I found some examples on the net that looked like this: fq=+field1:(1 2 3 4) +field2(4 5 6 7) but that yields no results. May be your default operator is set to AND in schema.xml? If yes, try using +field2(4 OR 5 OR 6 OR 7)
Location of Main Class in Solr?
I searched the SolrIndexSearcher.java file but there is no main class. I wanted to know where this class resides. Can I call this main class (if it exists) using command line options in a terminal, rather than through the war file? - Kumar Anurag
Re: Solr Admin Interface, reworked - Go on? Go away?
On 02.03.2011 23:48, Robert Muir wrote: On Wed, Mar 2, 2011 at 5:34 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Robert, even in this WIP-State? if so .. i'll try one tomorrow evening after work It's totally up to you, sometimes it can be useful to upload a partial or WIP solution to an issue: as Hoss mentioned it's a good way to get feedback and additional ideas while you work. There you go :) https://issues.apache.org/jira/browse/SOLR-2399
mixing version of solr
Hey all, I have a master slave using the same index folder, the master only writes, and the slave only reads. Is it possible to use different versions of solr for those two servers? Let's say i want to gain from the improved search speed of solr4.0 but since it's my production system, am not willing to index using it since it's not a stable release. Since the slave only reads, if it will crash i'll just restart it. Can i index using solr 1.4.1 and read the same index with solr 4.0? thanks
Re: mixing version of solr
No, that won't work as the index format has changed. On Thursday, 3 March 2011 at 20:03, Ofer Fort wrote: Hey all, I have a master slave using the same index folder, the master only writes, and the slave only reads. Is it possible to use different versions of solr for those two servers? Let's say i want to gain from the improved search speed of solr4.0 but since it's my production system, am not willing to index using it since it's not a stable release. Since the slave only reads, if it will crash i'll just restart it. Can i index using solr 1.4.1 and read the same index with solr 4.0? thanks
Limiting on dates in Solr
I am treating Solr as a NoSQL db that has great search capabilities. I am querying on a few fields: 1. text (default) 2. type (my own string field) 3. calibration (my own date field) I'd like to limit the results to only show the calibration using this query: calibration:[2011-03-03T00:00:00.000Z TO 2011-03-03T59:59:99.999Z] This mostly works, but a couple of different dates (March 5) seep into the March 3rd results. Is there any way to exclude the other dates, or at least have them return a lower ranking in the search? I've also tried: calibration:[2011-03-03T00:00:00.000Z TO 2011-03-03T59:59:99.999Z] AND NOT ( calibration:[* TO 2011-03-03T00:00:00.000Z] OR calibration:[2011-03-03T59:59:99.999Z TO *]) Which I found suggested on the stackoverflow web site. I've googled a good bit and nothing seems to be jumping out at me. No one else appears to be trying to do something similar, so I may just have unrealistic expectations of what a search engine will do. Thanks in advance! Steve
Re: FilterQuery OR statement
You might also consider splitting your two separate AND clauses into two separate fq's: fq=field1:(1 OR 2 OR 3 OR 4) fq=field2:(4 OR 5 OR 6 OR 7) That will cache the two separate clauses separately in the filter cache, which is probably preferable in general, without knowing more about your use characteristics. ALSO, instead of either supplying the OR explicitly as above, OR changing the default operator in schema.xml for everything, I believe it would work to supply it as a local param: fq={!q.op=OR}field1:(1 2 3 4) If you want to do that. AND, your question, can you search without a 'q'? No, but you can search with a 'q' that selects all documents, to be limited by the fq's. q=*:* On 3/3/2011 1:14 PM, Tanner Postert wrote: That worked, thought I tried it before, not sure why it didn't before. Also, is there a way to query without a q parameter? I'm just trying to pull back all of the field results where field1:(1 OR 2 OR 3) etc. so I figured I'd use the FQ param for caching purposes because those queries will likely be run a lot, but if I leave the Q parameter off i get a null pointer error. On Thu, Mar 3, 2011 at 11:05 AM, Ahmet Arslan iori...@yahoo.com wrote: Trying to figure out how I can run something similar to this for the fq parameter Field1 in ( 1, 2, 3 4 ) AND Field2 in ( 4, 5, 6, 7 ) I found some examples on the net that looked like this: fq=+field1:(1 2 3 4) +field2(4 5 6 7) but that yields no results. Maybe your default operator is set to AND in schema.xml? If yes, try using +field2:(4 OR 5 OR 6 OR 7)
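As a rough illustration of the advice above (one fq per field, explicit ORs, and a match-all q), the query string could be assembled like this. The class and method names are hypothetical; only the standard library is used, and the field names are the ones from the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of SolrJ.
final class FilterQueryUrl {
    // Builds e.g. field1:(1 OR 2 OR 3 OR 4), independent of the default operator.
    static String anyOf(String field, int... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(" OR ");
            sb.append(values[i]);
        }
        return sb.append(')').toString();
    }

    // One q that matches everything, then one fq parameter per filter,
    // so each filter can be cached on its own.
    static String queryString(String... filters) {
        StringBuilder sb = new StringBuilder("q=").append(encode("*:*"));
        for (String filter : filters) {
            sb.append("&fq=").append(encode(filter));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(queryString(anyOf("field1", 1, 2, 3, 4),
                                       anyOf("field2", 4, 5, 6, 7)));
    }
}
```

Keeping the two clauses in separate fq parameters means a later request that reuses only one of them can still hit its cached filter.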
Fwd: [Announce] Now Open: Call for Participation for ApacheCon North America
Begin forwarded message: From: Sally Khudairi s...@apache.org Date: March 3, 2011 3:10:17 PM EST To: annou...@apachecon.com Subject: [Announce] Now Open: Call for Participation for ApacheCon North America Reply-To: s...@apache.org
Call for Participation: ApacheCon North America 2011, 7-11 November 2011, Westin Bayshore, Vancouver, Canada. All submissions must be received by Friday, 29 April 2011 at midnight Pacific Time.
ApacheCon, the official conference, trainings, and expo of The Apache Software Foundation (ASF), heads to Vancouver, Canada, this November, with dozens of technical, business, and community-focused sessions for beginner, intermediate, and expert audiences. Now in its 11th year, the ASF develops and shepherds nearly 150 Top-Level Projects and new initiatives in the Apache Incubator and Labs. With hundreds of thousands of applications deploying ASF products and code contributions by more than 2,500 Committers from around the world, the Apache community is recognized as among the most robust, successful, and respected in Open Source.
This year's ApacheCon focuses on highly-relevant, professionally-directed presentations that demonstrate specific problems and real-world solutions. We welcome proposals --from developers and users alike-- in the areas of Apache and ...:
... Enterprise Solutions (from ActiveMQ to Axis2 to ServiceMix, OFBiz to Chemistry, the gang's all here!)
... Cloud Computing (Hadoop, Cassandra, HBase, CouchDB, and friends)
... Emerging Technologies + Innovation (Incubating projects such as Libcloud, Stonehenge, and Wookie)
... Community Leadership (mentoring and meritocracy, GSoC and related initiatives)
... Data Handling, Search + Analytics (Lucene, Solr, Mahout, OODT, Hive and friends)
... Pervasive Computing (Felix/OSGi, Tomcat, MyFaces Trinidad, and friends)
... Servers, Infrastructure + Tools (HTTP Server, SpamAssassin, Geronimo, Sling, Wicket and friends)
Submissions are open to anyone with relevant expertise: ASF affiliation is not required to present at, attend, or otherwise participate in ApacheCon. Whilst we encourage submissions that highlight the use of specific Apache solutions, we are unable to accept marketing/commercially-oriented presentations. Other proposals, such as panels, have been considered in the past; you are welcome to submit an alternate presentation, however, such sessions are accepted under exceptional circumstances. Please be as descriptive as possible, including names/bios of proposed panelists and any related details.
Accepted speakers (not co-presenters) qualify for general conference admission and a minimum of two nights lodging at the conference hotel. Additional hotel nights and travel assistance are possible, depending on the number of presentations given and type of assistance needed.
To submit a presentation proposal, please complete our ONLINE SUBMISSION FORM at http://na11.apachecon.com/proposals/new To be considered, proposals must be received by Friday, 29 April 2011 at midnight Pacific Time. Please email any questions regarding proposal submissions to cfp AT apachecon DOT com.
Key Dates:
3 March 2011 - CFP Opens
29 April 2011 - CFP Closes
20 May-30 June 2011 - Speaker Notifications and Confirmations
7-11 November 2011 - ApacheCon NA 2011
We look forward to seeing you in Vancouver! – The ApacheCon Planning team - To unsubscribe, e-mail: announce-unsubscr...@apachecon.com For additional commands, e-mail: announce-h...@apachecon.com
Re: Limiting on dates in Solr
Ugh. Of course. I fixed that a couple weeks ago, something must have crept back in! Thanks a mil! From: Andreas Kemkes a5s...@yahoo.com To: solr-user@lucene.apache.org Sent: Thu, March 3, 2011 4:12:02 PM Subject: Re: Limiting on dates in Solr 2011-03-03T59:59:99.999Z - shouldn't that be 2011-03-03T23:59:59.999Z From: Steve Lewis spiritualmecha...@yahoo.com To: solr-user@lucene.apache.org Sent: Thu, March 3, 2011 11:21:53 AM Subject: Limiting on dates in Solr I am treating Solr as a NoSQL db that has great search capabilities. I am querying on a few fields: 1. text (default) 2. type (my own string field) 3. calibration (my own date field) I'd like to limit the results to only show the calibration using this query: calibration:[2011-03-03T00:00:00.000Z TO 2011-03-03T59:59:99.999Z] This mostly works, but a couple of different dates (March 5) seep into the March 3rd results. Is there any way to exclude the other dates, or at least have them return a lower ranking in the search? I've also tried: calibration:[2011-03-03T00:00:00.000Z TO 2011-03-03T59:59:99.999Z] AND NOT ( calibration:[* TO 2011-03-03T00:00:00.000Z] OR calibration:[2011-03-03T59:59:99.999Z TO *]) Which I found suggested on the stackoverflow web site. I've googled a good bit and nothing seems to be jumping out at me. No one else appears to be trying to do something similar, so I may just have unrealistic expectations of what a search engine will do. Thanks in advance! Steve
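A quick way to catch this kind of slip before the query is sent is to run the bounds through a strict ISO-8601 parser. The sketch below uses java.time and a hypothetical class name; it is a client-side check only and says nothing about how Solr itself parses dates. A lenient parser rolling 59 hours over into March 5 would be consistent with the stray results reported above, though the thread does not confirm that this is what happened.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical client-side check, not part of Solr or SolrJ.
final class SolrDateCheck {
    // True only if the text is a valid ISO-8601 UTC instant,
    // e.g. hours 00-23 and minutes 00-59.
    static boolean isValidInstant(String text) {
        try {
            Instant.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidInstant("2011-03-03T23:59:59.999Z"));
        System.out.println(isValidInstant("2011-03-03T59:59:99.999Z"));
    }
}
```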
Out of memory while creating indexes
Hi All, I am trying to create indexes out of a 400MB XML file using the following command and I am running into an out of memory exception. $JAVA_HOME/bin/java -Xms768m -Xmx1024m -Durl=http://$SOLR_HOST:$SOLR_PORT/solr/customercarecore/update -jar $SOLRBASEDIR/dataconvertor/common/lib/post.jar $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml I am planning to bump up the memory and try again. Did anyone run into a similar issue? Any inputs would be very helpful to resolve the out of memory exception. I was able to create indexes with a small file but not with a large file. I am not using SolrJ. Thanks, Solr User
Max Document Size
Is there a maximum document size that Solr can handle? I'm trying to index documents greater than 15MB, but every time I do I get a random error. One of the other problems with what I'm documenting is that they are not in a human language. They are EDI documents (EDI is a B2B communication system that is similar in format to iCal formatted documents) and don't have many traditional word breaks but do have segment and element character breaks. I tried playing with the maxFieldLength parameter, but that doesn't seem to be helping (and, yes, I changed it in both places in solrconfig.xml). Has anyone had any similar problems with Solr? Sean Todd, Senior Software Developer, EDI Technical Operations, Build.com, Inc.
Re: Out of memory while creating indexes
On Fri, Mar 4, 2011 at 3:32 AM, Solr User solr...@gmail.com wrote: Hi All, I am trying to create indexes out of a 400MB XML file using the following command and I am running into an out of memory exception. Is this a single record in the XML file? If it is more than one, breaking it up into separate XML files, say one per record, should help. $JAVA_HOME/bin/java -Xms768m -Xmx1024m -Durl=http://$SOLR_HOST:$SOLR_PORT/solr/customercarecore/update -jar $SOLRBASEDIR/dataconvertor/common/lib/post.jar $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml I am planning to bump up the memory and try again. [...] If you give Solr enough memory this should work, but IMHO, it would be better to break up your input XML files if you can. Regards, Gora
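The "break it up" suggestion might look roughly like the following, assuming the individual <doc> elements have already been pulled out of the big file as strings (for a 400MB input that extraction would itself need a streaming parser, which is not shown). The class name and the batch size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: regroups <doc> elements into smaller <add> messages.
final class AddBatcher {
    // Wraps every run of up to perBatch docs in its own <add> element,
    // so each update request stays small.
    static List<String> batches(List<String> docs, int perBatch) {
        if (perBatch < 1) {
            throw new IllegalArgumentException("perBatch must be at least 1");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += perBatch) {
            int end = Math.min(start + perBatch, docs.size());
            StringBuilder sb = new StringBuilder("<add>");
            for (String doc : docs.subList(start, end)) {
                sb.append(doc);
            }
            out.add(sb.append("</add>").toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("<doc><field name=\"id\">1</field></doc>");
        docs.add("<doc><field name=\"id\">2</field></doc>");
        docs.add("<doc><field name=\"id\">3</field></doc>");
        for (String batch : batches(docs, 2)) {
            System.out.println(batch);
        }
    }
}
```

Each batch can then be written to its own file or posted separately, followed by a single commit at the end.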
Model foreign key type of search?
Hi there, I need some advice on how to implement this using solr: We have two tables: urls and bookmarks. - Each url has four fields: {guid, title, text, url} - One url will have one or more bookmarks associated with it. Each bookmark has these: {link.guid, user, tags, comment} I'd like to return matched urls based on not only the title, text from the url schema, but also some kind of aggregated popularity score based on all bookmarks for the same url. The popularity score should base on number/frequency of bookmarks that match the query. For example, a search for Paris. Let's say 15 out of 1000 people has bookmarked a tripadvisor.com page with Paris in tag or comments field; another 15 out of 20 people bookmarked www.ratp.info/orienter/cv/carteparis.php with Paris in it. I'd like to rank the later one, ie the metro planner higher. I am thinking of implementing org.apache.solr.search.ValueSourceParser which takes a guid and run a embedded query to get a score for this guid in the bookmark schema. This would probably requires two separated indexes to begin with. Keen to hear ideas on what's the best way to implement this and where I should start. Thanks, Alex
Re: SolrJ Tutorial
It comes with every Solr source code download, in the directory under src/test - Thanx: Grijesh http://lucidimagination.com
Re: Model foreign key type of search?
On Fri, Mar 4, 2011 at 10:24 AM, Alex Dong a...@trunk.ly wrote: Hi there, I need some advice on how to implement this using solr: We have two tables: urls and bookmarks. - Each url has four fields: {guid, title, text, url} - One url will have one or more bookmarks associated with it. Each bookmark has these: {link.guid, user, tags, comment} I'd like to return matched urls based on not only the title, text from the url schema, but also some kind of aggregated popularity score based on all bookmarks for the same url. The popularity score should base on number/frequency of bookmarks that match the query. [...] It is best not to think of Solr as a RDBMS, and not to try to graft RDBMS practices on to it. Instead, you should flatten your data, e.g., in the above, you could have: * Four single-valued fields: guid, title, text, url * Four multi-valued fields: bookmark_guid, bookmark_user, bookmark_tags, bookmark_comment Your index would contain one record per guid of the URL, and you would need to populate the multi-valued bookmark fields from all bookmark instances associated with that URL. Then one could either copy the relevant search fields to a full-text search field, and search only on that, or, e.g., search on bookmark_tags and bookmark_comment in addition to searching on title, and text. Regards, Gora
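The flattening described above can be sketched as follows: one index record per URL, with the bookmark data gathered into multi-valued fields. The field names follow the ones listed in the reply; the class names are hypothetical, a plain map stands in for the input document, and bookmark_guid is omitted here for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: denormalises a url row plus its bookmarks.
final class UrlDocFlattener {
    static final class Bookmark {
        final String user;
        final String tags;
        final String comment;

        Bookmark(String user, String tags, String comment) {
            this.user = user;
            this.tags = tags;
            this.comment = comment;
        }
    }

    // Single-valued fields come from the url row; each bookmark
    // contributes one value to every multi-valued bookmark_* field.
    static Map<String, Object> flatten(String guid, String title, String text,
                                       String url, List<Bookmark> bookmarks) {
        List<String> users = new ArrayList<>();
        List<String> tags = new ArrayList<>();
        List<String> comments = new ArrayList<>();
        for (Bookmark b : bookmarks) {
            users.add(b.user);
            tags.add(b.tags);
            comments.add(b.comment);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("guid", guid);
        doc.put("title", title);
        doc.put("text", text);
        doc.put("url", url);
        doc.put("bookmark_user", users);
        doc.put("bookmark_tags", tags);
        doc.put("bookmark_comment", comments);
        return doc;
    }

    public static void main(String[] args) {
        List<Bookmark> bookmarks = new ArrayList<>();
        bookmarks.add(new Bookmark("alice", "paris metro", "handy map"));
        bookmarks.add(new Bookmark("bob", "paris", "trip planning"));
        System.out.println(flatten("g1", "Metro planner", "page text",
                "http://example.com/", bookmarks));
    }
}
```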
Re: Model foreign key type of search?
Gora, thanks for the quick reply. Yes, I'm aware of the differences between Solr vs. DBMS. We've actually written some c++ analytical engine that can process through a billion tweets with multiple facets drill down. We may end up cook our own in the end but so far solr suites our needs quite well. The multi-lingual tokenizer and tika integration are all too addictive. What you're suggesting is exactly what I'm doing. Trying to use dynamic fields and copyTo to get all the information into one field, then run the search over that. However, this is not good enough. Allow me to elaborate this using the same Paris example again. Let's say two urls, first has 10 people bookmarked and second has 100. Let's say these two have roughly similar score if we squeeze them into one single field. Then I'd like to rank the one with more users higher. Another way to look at this is PageRank relies on the the number and anchor text of the incoming link, we're trying to use the number of people and their keywords/comments as a weight for the link. Alex On Fri, Mar 4, 2011 at 6:29 PM, Gora Mohanty g...@mimirtech.com wrote: On Fri, Mar 4, 2011 at 10:24 AM, Alex Dong a...@trunk.ly wrote: Hi there, I need some advice on how to implement this using solr: We have two tables: urls and bookmarks. - Each url has four fields: {guid, title, text, url} - One url will have one or more bookmarks associated with it. Each bookmark has these: {link.guid, user, tags, comment} I'd like to return matched urls based on not only the title, text from the url schema, but also some kind of aggregated popularity score based on all bookmarks for the same url. The popularity score should base on number/frequency of bookmarks that match the query. [...] It is best not to think of Solr as a RDBMS, and not to try to graft RDBMS practices on to it. 
Instead, you should flatten your data, e.g., in the above, you could have: * Four single-valued fields: guid, title, text, url * Four multi-valued fields: bookmark_guid, bookmark_user, bookmark_tags, bookmark_comment Your index would contain one record per guid of the URL, and you would need to populate the multi-valued bookmark fields from all bookmark instances associated with that URL. Then one could either copy the relevant search fields to a full-text search field, and search only on that, or, e.g., search on bookmark_tags and bookmark_comment in addition to searching on title, and text. Regards, Gora
Problem using Solr 4.0 in Java environment
Hi, I am using the facet.pivot feature of Solr 4.0. It works well and shows results in the browser. But when I used Solr 4.0 from Java, I got the following error: Exception in thread "main" java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V at org.apache.commons.logging.impl.SLF4JLocationAwareLog.trace(SLF4JLocationAwareLog.java:107) at org.apache.commons.httpclient.methods.PostMethod.clearRequestBody(PostMethod.java:152) at org.apache.commons.httpclient.methods.EntityEnclosingMethod.setRequestEntity(EntityEnclosingMethod.java:547) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:369) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) I have very little knowledge of Solr and Java. Please help me out. Thanks! Isha
Re: Content-Type of XMLResponseWriter / QueryResponseWriter
Hi Walter, many thanks! Bernd On 03.03.2011 17:01, Walter Underwood wrote: Never use text/xml; it overrides any encoding declaration inside the XML file. http://ln.hixie.ch/?start=1037398795count=1 http://www.grauw.nl/blog/entry/489 wunder == Lead Engineer, MarkLogic On Mar 3, 2011, at 7:30 AM, Bernd Fehling wrote: Dear list, is there any deeper logic behind the fact that XMLResponseWriter is sending CONTENT_TYPE_XML_UTF8=application/xml; charset=UTF-8? I would assume (as would most browsers) that XML output is sent as text/xml and not application/xml. Or do you want the browser to call an XML editor with the result? Best regards, Bernd
Re: Out of memory while creating indexes
post.jar is intended for demo purposes, not production use, so it doesn't surprise me you've managed to break it. Have you tried using curl to do the post? Upayavira On Thu, 03 Mar 2011 17:02 -0500, Solr User solr...@gmail.com wrote: Hi All, I am trying to create indexes out of a 400MB XML file using the following command, and I am running into an out of memory exception. $JAVA_HOME/bin/java -Xms768m -Xmx1024m -Durl=http://$SOLR_HOST:$SOLR_PORT/solr/customercarecore/update -jar $SOLRBASEDIR/dataconvertor/common/lib/post.jar $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml I am planning to bump up the memory and try again. Did anyone run into a similar issue? Any input would be very helpful to resolve the out of memory exception. I was able to create indexes with a small file but not with a large file. I am not using SolrJ. Thanks, Solr User --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
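Upayavira suggests curl; as a rough equivalent, here is a minimal Python sketch that posts a large file to an update URL while reading it from disk in blocks rather than loading it whole. It swaps curl for Python's standard http.client, and it assumes the client side is where memory runs out, which the thread does not confirm. The function name post_xml_file and its parameters are hypothetical.

```python
import http.client
import os


def post_xml_file(host, port, update_path, xml_path, block_size=64 * 1024):
    """POST the file at xml_path to http://host:port/update_path.

    The open file object is passed as the request body, so http.client
    reads and sends it block by block instead of holding it all in memory.
    Returns (status_code, response_body_bytes).
    """
    size = os.path.getsize(xml_path)
    conn = http.client.HTTPConnection(host, port, blocksize=block_size)
    try:
        with open(xml_path, "rb") as body:
            conn.request(
                "POST",
                update_path,
                body=body,
                headers={
                    "Content-Type": "application/xml; charset=UTF-8",
                    # An explicit length avoids chunked transfer encoding.
                    "Content-Length": str(size),
                },
            )
            response = conn.getresponse()
            return response.status, response.read()
    finally:
        conn.close()
```

For the setup in the question, a call might look like post_xml_file(solr_host, solr_port, "/solr/customercarecore/update", "CustomerData.xml"), where solr_host and solr_port are placeholders for the values behind $SOLR_HOST and $SOLR_PORT. Whether the server can take a 400MB update in one request is a separate question that this sketch does not address.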