Re: SOLR-236 Patch
Hi Sam, It seems that the patch is out of sync with the trunk again. Can you try patching with revision 955615? I'll update the patch shortly. Martijn

On 24 June 2010 09:49, Amdebirhan, Samson, VF-Group samson.amdebir...@vodafone.com wrote: Hi, Trying to apply the SOLR-236 patch to trunk I get what follows. Can anyone help me understand what I am missing?

svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk
patch -p0 -i SOLR-236-trunk.patch --dry-run
patching file solr/src/test/org/apache/solr/search/fieldcollapse/MyDocTermsIndex.java
patching file solr/src/java/org/apache/solr/handler/component/CollapseComponent.java
patching file solr/src/test/test-files/solr/conf/solrconfig-fieldcollapse.xml
patching file solr/src/java/org/apache/solr/search/fieldcollapse/collector/FieldValueCountCollapseCollectorFactory.java
patching file solr/src/java/org/apache/solr/search/fieldcollapse/collector/DocumentGroupCountCollapseCollectorFactory.java
can't find file to patch at input line 1068
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|Index: solr/src/java/org/apache/solr/search/DocSetHitCollector.java
|===
|--- solr/src/java/org/apache/solr/search/DocSetHitCollector.java (revision 922957)
|+++ solr/src/java/org/apache/solr/search/DocSetHitCollector.java (revision )

Regards Sam
XML DataImportHandler copy + resize pictures in localhost?
Hi, I'm adding documents to Solr via XML files and the DataImportHandler. In the XML file I've got some product picture links:

<picture>
  <picture_url>http://www.example.com/pic.jpg</picture_url>
</picture>

I would like to keep a local thumbnail of these pictures on the local server in order to avoid long external loading times. Example: Original picture: http://www.example.com/pic.jpg is 800x600px == conversion ==> Local picture: http://localhost/pic.jpg in 100x100px. Is there a way to do this? Thanks for your help. Marc
Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization
Hi Mitch, thanks for the answer and the link. The use case is to provide content-based recommendations for a single item, no matter where that item came from. So, this input (match) item is the best match, all more-like-this items are compared to it, and the ones that are the most alike would have the highest scores. (Meaning also that the most similar are probably not as good as recommendations because they are too similar. But that is a different story.) Again, I don't want to compare the scores of regular search results (e.g. from dismax) with those of mlt. I only want a way to show the user a kind of relevancy or similarity indicator (for example using a range of 10 stars) that would give a hint on how similar the mlt hit is to the input (match) item. Greetings from Munich ;-) Chantal On Thu, 2010-06-24 at 17:06 +0200, MitchK wrote: Chantal, have a look at http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/similar/MoreLikeThis.html to get an idea of what the MLT's score is based on. The problem is that you can't compare scores. The query for the normal result-response was maybe something like Bill Gates featuring Linus Torvald - The perfect OS song. The user now picks one of the returned documents and says he wants More like this - maybe because the topic concerned was okay, but the content was not enough, or whatever... But the sent query is totally different (as you can see in the link) - so that would be like comparing apples and oranges, since they do not use the same base. What would be the use case? Why is score-normalization needed? Kind regards from Germany, - Mitch
Re: performance sorting multivalued field
*There are lots of docs with the same value; I mention that because I supposed that having the same value has nothing to do with the number of un-inverted term instances. It does: I've been able to reproduce the error by setting different values for each field:

HTTP Status 500 - there are more terms than documents in field date, but it's impossible to sort on tokenized fields java.lang.RuntimeException: there are more terms than documents in field id, but it's impossible to sort on tokenized fields at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:706)...

But it's already fixed for the Lucene 2.9.4, 3.0.3, 3.1 and 4.0 versions: https://issues.apache.org/jira/browse/LUCENE-2142 -- View this message in context: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p921752.html Sent from the Solr - User mailing list archive at Nabble.com.
DIH - $deleteDocById
I seem to have a hard time getting $deleteDocById to work with the XPathEntityProcessor. Has anyone tested it and got it to work? Here's a snippet of the config:

..
<field column="id" xpath="/io/article/@id"/>
<field column="source" xpath="/io/article/secti...@homesection='yes']/@source"/>
..
<field column="unique_id" template="${document.source}_${document.id}"/>
<field column="$deleteDocById" regex="^^(published)$" repaceWith="${document.unique_id}" sourceColName="state"/>
..

Whenever I try to run a delta-import with a document that should be deleted from the index, it only updates the document in the index. The last line in the config above is based on a tip I found on the net; I'm unsure if it's correct. Any help would be appreciated. Regards, Ingar
Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization
Hi Chantal, Munich? Germany seems to be soo small :-). Chantal Ackermann wrote: I only want a way to show to the user a kind of relevancy or similarity indicator (for example using a range of 10 stars) that would give a hint on how similar the mlt hit is to the input (match) item. Okay, that's making more sense. Unfortunately, you cannot do that with Lucene with results that might fit your needs (as far as I know). Kind regards - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/MoreLikeThis-mlt-use-the-match-s-maxScore-for-result-score-normalization-tp919598p921942.html Sent from the Solr - User mailing list archive at Nabble.com.
[ANN] Solr 1.4.1 Released
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Apache Solr 1.4.1 has been released and is now available for public download! http://www.apache.org/dyn/closer.cgi/lucene/solr/ Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required. Solr 1.4.1 is a bug fix release for Solr 1.4 that includes many Solr bug fixes as well as Lucene bug fixes from Lucene 2.9.3. 
See all of the CHANGES here: http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.1/CHANGES.txt - - Mark Miller on behalf of the Solr team -BEGIN PGP SIGNATURE- Version: GnuPG/MacGPG2 v2.0.14 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJMJK3AAAoJED+/0YJ4eWrIrfAP/RLD7QvreOBFebICN/eiRzCH 1dHOt9Scn7qGQU4RvXZ8GQq37AuoRMgmgckntttFLCCD5w5A29/GxzyZbAoQDQ0B OkaHsYIcUuhbLq8QtlTjt+rK3gc6oxMoCRMJBS7DfUFUyROl6om4gpYAVem50qDy FfBdgRxp4VZ07E7VwmMvma03nSrKuvX0bwE8NXksaCAVsvkmi8Sh7aLMPPVHgsuD pbY8kB0hXCULJgs9ZAc2t6+T38+eV9wxJSeAktVlGAvNlYTavW2bxzF5wQk+kXCd DwGjdlU9/ebHdx3MHJyE0zXSl4rGFsy8zfh/ntk7UV7qklQ2jn5Ur18zLqv4vkb1 Ea78GpoqCZWlMGcRUSErtH33cGs4blo/kuJZj/VLrk6jxO4x4beUsAfRcM/YliJW Z6OuFtpcdVDjVl4aB2xbAMwDl2DXqgyNmlxs8vvqdRoDhN8wZ91raO0kkbrkzj1f 5gPD//Efx6RcrYtXAV3HKAwI7FLP8MhzFu1Y2FK2FY7DyFNmirad03+pB6bFs1xq ARU6pdeTYvv+PsWH3Keaw/L/nb0BYbU8R1sVhkvjm+S9gJ6cCcKJkeAkNgL+6QNm JPJ5VeXVFGVmwzQ5mE3j6qX1uDrEmLA2T5Dd7bssWtwveLoyfo0s7qezIfbRamnc T3iyCE6cuSU9CvCEqN+o =nBB9 -END PGP SIGNATURE-
Re: Recommended MySQL JDBC driver
On 18.05.2010, at 17:22, Shawn Heisey wrote: On 5/14/2010 12:40 PM, Shawn Heisey wrote: I downgraded to 5.0.8 for testing. Initially, I thought it was going to be faster, but it slows down as it gets further into the index. It now looks like it's probably going to take the same amount of time. On the server timeout thing - that's a setting you'd have to put in my.ini or my.cnf; there may also be a way to change it on the fly without restarting the server. I suspect that when you are running a multiple-query setup like yours, it opens multiple connections, and when one of them is busy doing some work, the others are idle. That may be related to the timeout with the older connector version. On my setup, I only have one query that retrieves records, so I'm probably not going to run into that. I could be wrong about how it works - you can confirm or refute this idea by looking at SHOW PROCESSLIST on your MySQL server while it's working. I was having no trouble with the 5.0.8 connector on 1.5-dev build 922440M, but then I upgraded the test machine to the latest 4.0 from trunk and ran into the timeout issue you described, so I am going back to the 5.1.12 connector. I just saw the message on the list about branch_3x in SVN, which looks like a better option than trunk. Any news on this topic? regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: [ANN] Solr 1.4.1 Released
Congrats on the release! Something seems to be wrong with the Solr 1.4.1 maven artifacts; there is an extra solr in the path. E.g. solr-parent-1.4.1.pom is at http://repo1.maven.org/maven2/org/apache/solr/solr/solr-parent/1.4.1/solr-parent-1.4.1.pom while it should be at http://repo1.maven.org/maven2/org/apache/solr/solr-parent/1.4.1/solr-parent-1.4.1.pom. The POMs seem to contain the correct maven artifact coordinates. Regards, Stevo. On Fri, Jun 25, 2010 at 3:23 PM, Mark Miller markrmil...@apache.org wrote: Apache Solr 1.4.1 has been released and is now available for public download! http://www.apache.org/dyn/closer.cgi/lucene/solr/ [...announcement snipped...] See all of the CHANGES here: http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.1/CHANGES.txt - Mark Miller on behalf of the Solr team
Re: [ANN] Solr 1.4.1 Released
Can a solr/maven dude look at this? I simply used the copy command on the release to-do wiki (sounds like it should be updated). If no one steps up, I'll try and straighten it out later. On 6/25/10 10:28 AM, Stevo Slavić wrote: Congrats on the release! Something seems to be wrong with the Solr 1.4.1 maven artifacts; there is an extra solr in the path. E.g. solr-parent-1.4.1.pom is at http://repo1.maven.org/maven2/org/apache/solr/solr/solr-parent/1.4.1/solr-parent-1.4.1.pom while it should be at http://repo1.maven.org/maven2/org/apache/solr/solr-parent/1.4.1/solr-parent-1.4.1.pom. The POMs seem to contain the correct maven artifact coordinates. Regards, Stevo. On Fri, Jun 25, 2010 at 3:23 PM, Mark Miller markrmil...@apache.org wrote: Apache Solr 1.4.1 has been released and is now available for public download! http://www.apache.org/dyn/closer.cgi/lucene/solr/ [...announcement snipped...] See all of the CHANGES here: http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.1/CHANGES.txt - Mark Miller on behalf of the Solr team
Re: SweetSpotSimilarity
Would someone mind explaining how this differs from the DefaultSimilarity? The difference is length normalization. The default one punishes long documents. The sweet one computes a constant norm for all lengths in the [min,max] range (the sweet spot), and smaller norm values for lengths outside this range. Documents shorter or longer than the sweet spot range are punished. See Section 4.1 of http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf Also, how would one replace the use of the DefaultSimilarity class with this one? I can't seem to find any such configuration in solrconfig.xml. It is in schema.xml: <similarity class="org.apache.lucene.search.SweetSpotSimilarity"/>
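To make the "constant norm inside the sweet spot, smaller outside" behavior concrete, here is a small numeric sketch. It follows my reading of SweetSpotSimilarity's length-norm formula; the min/max/steepness values below are made-up tuning parameters for illustration, not values taken from this thread or from Solr's defaults.

```python
import math

def sweet_spot_length_norm(num_terms, ln_min=1, ln_max=100, steepness=0.5):
    """Sketch of a sweet-spot length norm: 1.0 for any document whose
    length falls in [ln_min, ln_max], decreasing the further the length
    strays outside that range."""
    l = max(num_terms, 1)
    return 1.0 / math.sqrt(
        steepness * (abs(l - ln_min) + abs(l - ln_max) - (ln_max - ln_min)) + 1.0
    )
```

Inside the sweet spot the two absolute values sum to exactly (ln_max - ln_min), so the expression under the square root collapses to 1.0 and every such document gets the same norm; outside it, the norm falls off smoothly, which is the punishment described above.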
Debugging Queries
I have a query that is not returning the results I expect - as in, there are missing results. Given a document's ID, is there a way to dive into how the entity is stored in the index? Thanks.
Re: SweetSpotSimilarity
iorixxx wrote: it is in schema.xml: <similarity class="org.apache.lucene.search.SweetSpotSimilarity"/> Thanks. I'm guessing this is all or nothing, i.e. you can't use one similarity class for one request handler and another for a separate request handler. Is that correct? -- View this message in context: http://lucene.472066.n3.nabble.com/SweetSpotSimilarity-tp922546p922622.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SweetSpotSimilarity
Thanks. I'm guessing this is all or nothing, i.e. you can't use one similarity class for one request handler and another for a separate request handler. Is that correct? Correct. Also, a re-index is required: length norms are calculated and stored at index time.
RE: solr indexing takes a long time and is not responsive to abort command
Thanks for the response. I double-checked that we don't have the core open multiple times. The complete index size is about 200M (around 1,060,000 documents). During the indexing process, 26 files were created. The core admin interface indicated that no query or process was running after roughly 5 hours, but the Time Elapsed counter was still going. We have the indexDefaults settings as follows:

<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>

Do you think lowering mergeFactor to 5 and setting useCompoundFile to true would help? I'll try it out on Monday. Thanks again! -Original Message- From: Don Werve [mailto:d...@madwombat.com] Sent: Thursday, June 24, 2010 9:09 PM To: solr-user@lucene.apache.org Subject: Re: solr indexing takes a long time and is not responsive to abort command 2010/6/25 Ya-Wen Hsu y...@eline.com This situation doesn't happen consistently. When we only ran the problematic core, the indexing took significantly longer than usual (4 hrs - 11 hrs). It ran successfully in the end. When we ran indexing for all cores at the same time, the problematic core never finished indexing, so we had to kill the process. This has happened twice already. I'm running it in parallel again to see if the problem still persists. Off the top of my head: Have you accidentally opened this core multiple times within the same JVM? I had the same thing happen to me when I was testing out a Solr interface I had written under JRuby; that was loads of fun to track down... How physically large is the core ('du -sh' if you're on Unix), and how many files does the index contain? I've run into issues where frequent updates created a lot of index files, which slowed down all core access. If you've got a lot of index files, has the problem core been optimized?
Re: Debugging Queries
Frank: http://www.getopt.org/luke/ Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Frank A fsa...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, June 25, 2010 1:23:37 PM Subject: Debugging Queries I have a query that is not returning the results I expect - as in there are missing results. Is there a way given an ID to the index field to dive into how the entity is stored in the index? Thanks.
Re: XML DataImportHandler copy + resize pictures in localhost?
Marc, Why not use http://www.imagemagick.org/script/index.php to generate thumbnails separately from document indexing? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: scr...@asia.com scr...@asia.com To: solr-user@lucene.apache.org Sent: Fri, June 25, 2010 4:12:02 AM Subject: XML DataImportHandler copy + resize pictures in localhost? Hi, I'm adding documents to Solr via XML files and the DataImportHandler. In the XML file I've got some product picture links:

<picture>
  <picture_url>http://www.example.com/pic.jpg</picture_url>
</picture>

I would like to keep a local thumbnail of these pictures on the local server in order to avoid long external loading times. Example: Original picture: http://www.example.com/pic.jpg is 800x600px == conversion ==> Local picture: http://localhost/pic.jpg in 100x100px. Is there a way to do this? Thanks for your help. Marc
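To sketch the ImageMagick suggestion: a small Python helper that computes the thumbnail geometry and shells out to ImageMagick's convert. This is an illustrative sketch under stated assumptions, not a tested pipeline: it assumes ImageMagick is installed on the server and that the original image has already been downloaded locally; fetching http://www.example.com/pic.jpg and rewriting the picture_url field are left to the indexing side.

```python
import shutil
import subprocess

def thumb_size(width, height, box=100):
    """Scale (width, height) to fit inside a box x box square,
    preserving the aspect ratio."""
    scale = box / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))

def make_thumbnail(src, dest, box=100):
    """Generate dest from src via ImageMagick's `convert -thumbnail`,
    which applies the same fit-inside scaling as thumb_size()."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick's `convert` not found on PATH")
    subprocess.run(
        ["convert", src, "-thumbnail", f"{box}x{box}", dest],
        check=True,
    )
```

Note that for the 800x600 example above, fitting into 100x100 while preserving aspect ratio actually yields a 100x75 thumbnail; a hard 100x100 result would require cropping or distortion.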
Re: dataimport.properties is not updated on delta-import
Please note that Oracle (or the Oracle JDBC driver) converts column names to upper case even though you state them in lower case. If this is the case, then try to rewrite your query in the following form: select id as id, name as name from table On Thursday, June 24, 2010, warb w...@mail.com wrote: Hello again! Upon further investigation it seems that something is amiss with delta-import after all: the delta-import does not actually import anything (I thought it did when I ran it previously, but now I am not sure that was the case.) It does complete successfully as seen from the front-end (dataimport?command=delta-import). Also, the logs state that the import was successful (INFO: Delta Import completed successfully), but there are exceptions pertaining to some documents. The exception message is that the id field is missing (org.apache.solr.common.SolrException: Document [null] missing required field: id). Now, I have checked the column names in the table, the data-config.xml file and the schema.xml file, and they all have the column/field names written in lowercase and are even named exactly the same. Does Solr roll back delta-imports if one or more of the documents fail? -- View this message in context: http://lucene.472066.n3.nabble.com/dataimport-properties-is-not-updated-on-delta-import-tp916753p919609.html Sent from the Solr - User mailing list archive at Nabble.com.
Setting many properties for a multivalued field. Schema.xml ? External file?
Hi, I'm trying to index data containing a multivalued field picture that has three properties: url, caption and description:

<picture>
  <url/>
  <caption/>
  <description/>
</picture>

Thus, each indexed document might have many pictures, each of which has a url, a caption, and a description. I wonder whether it's possible to store this data using only schema.xml. I couldn't figure it out so far. Instead, I'm thinking of using an external file to store the properties of each picture, but I haven't tried this solution yet; waiting for your suggestions... Thanks, -Saïd
indexing xml document with literals
Does anyone know how to read in data from one or more of the example xml docs and ALSO store the filename and path from which it came? I.e., exampledocs/vidcard.xml contains:

<add>
  <doc>
    <field name="id">EN7800GTX/2DHTV/256M</field>
    <field name="name">ASUS Extreme N7800GTX/2DHTV (256 MB)</field>
  </doc>
  <doc>
    <field name="id">100-435805</field>
    <field name="name">ATI Radeon X1900 XTX 512 MB PCIE Video Card</field>
  </doc>
</add>

Two questions: once the data gets indexed by Solr, is there anything we can use to know that this data came from that file? I.e., what was the name and location of the file that holds the data? I need access to the path and filename of the xml file containing the entries when searching. And is there any way to append information to xml data being indexed through the query parameters, like there is with the ExtractingRequestHandler (e.g. literal.id=x&literal.filename=vidcard.xml), or does all this information have to be in the particular doc in question? Thanks so much for any help on this.
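One common workaround for the question above is to preprocess each example file and inject the filename as an extra field into every doc before posting it to Solr. Here is a hedged sketch of that preprocessing step; the field name "filename" is my own choice here, and it would also need a matching field entry in schema.xml.

```python
import xml.etree.ElementTree as ET

def add_filename_field(xml_text, filename):
    """Insert a <field name="filename"> into each <doc> of a Solr add file,
    so the source file's name is searchable after indexing.
    'filename' is a hypothetical field name that must exist in schema.xml."""
    root = ET.fromstring(xml_text)
    for doc in root.iter("doc"):
        f = ET.SubElement(doc, "field", name="filename")
        f.text = filename
    return ET.tostring(root, encoding="unicode")
```

The rewritten XML can then be posted to /update as usual (e.g. with post.jar or curl), and queries can filter on filename:vidcard.xml.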
Re: Setting many properties for a multivalued field. Schema.xml ? External file?
Saïd, Dynamic fields could help here; for example, imagine a doc with: id, pic_url_*, pic_caption_*, pic_description_*. See http://wiki.apache.org/solr/SchemaXml#Dynamic_fields So, for you:

<dynamicField name="pic_url_*" type="string" indexed="true" stored="true"/>
<dynamicField name="pic_caption_*" type="text" indexed="true" stored="true"/>
<dynamicField name="pic_description_*" type="text" indexed="true" stored="true"/>

Then you can add docs with an unlimited number of pic_(url|caption|description)_* fields, e.g.: id, pic_url_1, pic_caption_1, pic_description_1, pic_url_2, pic_caption_2, pic_description_2. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Saïd Radhouani r.steve@gmail.com To: solr-user@lucene.apache.org Sent: Fri, June 25, 2010 6:01:13 PM Subject: Setting many properties for a multivalued field. Schema.xml? External file? Hi, I'm trying to index data containing a multivalued field picture that has three properties: url, caption and description:

<picture>
  <url/>
  <caption/>
  <description/>
</picture>

Thus, each indexed document might have many pictures, each of which has a url, a caption, and a description. I wonder whether it's possible to store this data using only schema.xml. I couldn't figure it out so far. Instead, I'm thinking of using an external file to store the properties of each picture, but I haven't tried this solution yet; waiting for your suggestions... Thanks, -Saïd
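The numbered dynamic fields above can be produced on the client side with a small flattening step. This is a sketch: the pic_*_N names simply follow the dynamicField patterns suggested in this reply, and the plain dict stands in for whatever document object your Solr client library uses.

```python
def build_doc(doc_id, pictures):
    """Flatten a list of picture dicts (url/caption/description) into
    numbered dynamic fields matching the pic_*_N convention above."""
    doc = {"id": doc_id}
    for i, pic in enumerate(pictures, start=1):
        doc[f"pic_url_{i}"] = pic["url"]
        doc[f"pic_caption_{i}"] = pic["caption"]
        doc[f"pic_description_{i}"] = pic["description"]
    return doc
```

The downside of this layout is that querying "any caption" requires either enumerating pic_caption_1, pic_caption_2, ... or copyField-ing the pattern into a single multivalued search field.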