Re: Solr, How to index scripts *.sh and *.SQL
Same in Windows: just plain text files, no metadata, no headers.

alexei martchenko | Facebook http://www.facebook.com/alexeiramone | Linkedin http://br.linkedin.com/in/alexeimartchenko | Steam http://steamcommunity.com/id/alexeiramone/ | 4sq https://pt.foursquare.com/alexeiramone | Skype: alexeiramone | Github https://github.com/alexeiramone | (11) 9 7613.0966

2014-05-11 4:32 GMT-03:00 Gora Mohanty g...@mimirtech.com:

On 8 May 2014 12:25, Visser, Marc marc.viss...@ordina.nl wrote:

Hi All, Recently I have set up an image with Solr. My goal is to index and extract files on a Windows and Linux server. It is possible for me to index and extract data from multiple file types. This is done by the Solr Cell request handler. See the post.jar command below.

java -Dauto -Drecursive -jar post.jar Y:\
SimplePostTool version 1.5
Posting files to base url localhost:8983/solr/update..
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
0 files indexed.

Is it possible to index and extract metadata/content from file types like .sh and .sql? If it is possible I would like to know how, of course :)

Don't know about Windows, but on Linux these are just text files. What metadata are you referring to? Normally, a Linux text file only has content, unless you are talking about metadata such as obtained from: file cmd.sh

Regards, Gora
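The "0 files indexed" result is just auto mode skipping unknown file endings. A sketch of two possible workarounds, assuming the 4.x SimplePostTool's -Dfiletypes and -Dtype system properties (worth verifying against your version's post.jar -help output):

```shell
# add sh and sql to the endings considered in auto mode
java -Dauto -Drecursive -Dfiletypes=sh,sql -jar post.jar Y:\

# or bypass auto-detection and post everything as plain text
java -Dtype=text/plain -Drecursive -jar post.jar Y:\
```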
Re: Required fields
false

2014-03-21 17:17 GMT-03:00 Walter Underwood wun...@wunderwood.org:

What is the default value for the required attribute of a field element in a schema? I've just looked everywhere I can think of in the wiki, the reference manual, and the JavaDoc. Most of the documentation doesn't even mention that attribute. Once we answer this, it should be added to the documented attributes for field. wunder -- Walter Underwood wun...@wunderwood.org
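So fields are optional unless declared otherwise. A minimal schema.xml sketch (field names are illustrative):

```xml
<!-- required defaults to false; only id must be present in every document -->
<field name="id"    type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text"   indexed="true" stored="true"/> <!-- required="false" implied -->
```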
Re: Indexing large documents
Even the most unstructured data has to have some breakpoint. I've seen projects running Solr that indexed whole books: one document per chapter, plus a boosted synopsis doc. The question here is how you need to search and match those docs.

2014-03-18 23:52 GMT-03:00 Stephen Kottmann stephen_kottm...@h3biomedicine.com:

Hi Solr Users, I'm looking for advice on best practices when indexing large documents (100's of MB or even 1 to 2 GB text files). I've been hunting around on Google and the mailing list, and have found some suggestions of splitting the logical document up into multiple Solr documents. However, I haven't been able to find anything that seems like conclusive advice.

Some background... We've been using Solr with great success for some time on a project that is mostly indexing very structured data - i.e. mainly based on ingesting through DIH. I've now started a new project and we're trying to make use of Solr again - however, in this project we are indexing mostly unstructured data - PDFs, PowerPoint, Word, etc. I've not done much configuration - my Solr instance is very close to the example provided in the distribution, aside from some minor schema changes. Our index is relatively small at this point (~3k documents), and for initial indexing I am pulling documents from an HTTP data source, running them through Tika, and then pushing to Solr using SolrJ. For the most part this is working great... until I hit one of these huge text files, and then OOM on indexing. I've got a modest JVM - 4GB allocated. Obviously I can throw more memory at it, but it seems like maybe there's a more robust solution that would scale better.
Is splitting the logical document into multiple Solr documents best practice here? If so, what are the considerations or pitfalls of doing this that I should be paying attention to? I guess when querying I always need to use a group-by field to prevent multiple hits for the same document. Are there issues with term frequency, etc. that you need to work around? Really interested to hear how others are dealing with this. Thanks everyone! Stephen
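One common shape for the splitting approach (a sketch, not from the thread; the chunk size, overlap, and field names are assumptions) is to cut the extracted text into bounded chunks before indexing, carrying a shared group key so results can later be collapsed back to one logical document with Solr's grouping (group=true&group.field=group_id):

```python
def chunk_document(doc_id, text, max_chars=100_000, overlap=1_000):
    """Split one logical document into Solr-sized pieces.

    Each piece carries the parent id in group_id so queries can
    group/collapse on it; the overlap keeps phrases that straddle a
    boundary findable in at least one chunk."""
    chunks = []
    start = 0
    part = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append({
            "id": f"{doc_id}_{part}",   # unique per chunk
            "group_id": doc_id,          # shared across all chunks
            "content": text[start:end],
        })
        part += 1
        if end == len(text):
            break
        start = end - overlap
    return chunks
```

Each chunk then indexes as an ordinary small document, which also sidesteps the OOM during Tika extraction-to-index handoff.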
Re: [ANNOUNCE] Heliosearch 0.04
Chrome on Windows reports the latest Heliosearch download as probable malware and asks for a keep or discard. Norton says everything's OK with that file. Are you guys aware of this?

2014-03-14 12:58 GMT-03:00 Yago Riveiro yago.rive...@gmail.com:

Is it possible to switch between Solr 4.6.1 and Heliosearch in a transparent way?

On Fri, Mar 14, 2014 at 3:56 PM, Mike Murphy mmurphy3...@gmail.com wrote: This is fantastic! I tried swapping in Heliosearch for a customer that was having big garbage collection issues, and all of the big GC pauses simply disappeared! Now the problem - Heliosearch only has a pre-release out based on Solr trunk. Are there near-term plans for a more stable release that would be advisable for production use? --Mike

On Mon, Mar 10, 2014 at 1:04 PM, Yonik Seeley yo...@heliosearch.com wrote: Changes from the previous release are primarily off-heap FieldCache support for strings as well as all numerics (the previous release only had integer support). Benchmarks for string fields here: http://heliosearch.org/hs-solr-off-heap-fieldcache-performance Try it out here: https://github.com/Heliosearch/heliosearch/releases/ -Yonik http://heliosearch.org - native off-heap filters and fieldcache for solr

-- /Yago Riveiro
Re: ExtendedDismax and NOT operator
Just to clarify: is the actual URL properly space-escaped? http://localhost:8983/solr/distrib/select?q=term1%20NOT%20term2&start=0&rows=0&qt=edismax_basic&debugQuery=true

2014-02-07 12:40 GMT-02:00 Geert Van Huychem ge...@iframeworx.be:

Hi, This is my config:

<requestHandler name="edismax_basic" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">body</str>
    <str name="pf">title^30 introduction^15 body^10</str>
    <str name="ps">0</str>
  </lst>
</requestHandler>

Executing the following link: http://localhost:8983/solr/distrib/select?q=term1 NOT term2&start=0&rows=0&qt=edismax_basic&debugQuery=true gives me as debug info:

<str name="parsedquery">(+(DisjunctionMaxQuery((body:term1)) -DisjunctionMaxQuery((body:term2))) DisjunctionMaxQuery((title:term1 term2^30.0)) DisjunctionMaxQuery((introduction:term1 term2^15.0)) DisjunctionMaxQuery((body:term1 term2^10.0)))/no_coord</str>

My question is: why is term2 included in the phrase query part? Best, Geert Van Huychem
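One quick way to rule out hand-escaping mistakes is to build the query string programmatically (a sketch; the handler name and host come from the thread):

```python
from urllib.parse import urlencode

params = {
    "q": "term1 NOT term2",   # the space and the NOT must survive encoding
    "start": 0,
    "rows": 0,
    "qt": "edismax_basic",
    "debugQuery": "true",
}
# urlencode percent-encodes each value, turning spaces into '+'
url = "http://localhost:8983/solr/distrib/select?" + urlencode(params)
print(url)
```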
Re: Import data from mysql to sold
1) Yes, it's the JDBC connection URL. You can use a JNDI preconfigured datasource instead. It's all here: http://wiki.apache.org/solr/DataImportHandler
2) It's a mapping: column is the database column and name is your Solr destination field. You only need to specify name when the two differ. DIH looks like a seven-headed dragon the first time you see it, but by the end of the day you'll love it.

2014-02-04 rachun rachun.c...@gmail.com:

Please see the code below:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb01" user="root" password=""/>
  <document>
    <entity name="users" query="select id,firstname,username from users">
      <field column="id" name="user_id"/>
      <field column="firstname" name="user_firstname"/>
    </entity>
  </document>
</dataConfig>

My questions: 1. What is the url for (url="jdbc:mysql://localhost:3306/mydb01")? Does it mean my database URL? 2. Did I do it right with <field column="id" name="user_id"/>? I'm not sure whether name means the field in Solr. Thank you very much, Chun.
-- View this message in context: http://lucene.472066.n3.nabble.com/Import-data-from-mysql-to-sold-tp4114982p4115191.html Sent from the Solr - User mailing list archive at Nabble.com.
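Once the config is in place, the import is triggered through the DIH request handler (a sketch, assuming the handler is registered at /dataimport as in the wiki example):

```shell
# kick off a full import and commit when done
curl 'http://localhost:8983/solr/dataimport?command=full-import&commit=true'

# poll progress
curl 'http://localhost:8983/solr/dataimport?command=status'
```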
Re: Import data from mysql to sold
I've been using DIH to import large databases in XML file batches, and it's blazing fast.

2014-02-03 rachun rachun.c...@gmail.com:

Dear all gurus, I would like to import my data (MySQL, about 4 million rows) into Solr 4.6. What is the best way to do it? Please suggest. Million thanks, Chun.
-- View this message in context: http://lucene.472066.n3.nabble.com/Import-data-from-mysql-to-sold-tp4114982.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Apache Solr.
That's right, Solr doesn't import PDFs the way it imports XML. You'll need Tika to import binary/specific file types: http://tika.apache.org/1.4/formats.html

2014-02-03 Siegfried Goeschl sgoes...@gmx.at:

Hi Vignesh, a few keywords for further investigation: Solr Data Import Handler, Apache Tika, Apache PDFBox. Cheers, Siegfried Goeschl

On 03.02.14 09:15, vignesh wrote: Hi Team, I am Vignesh, using Apache Solr 3.6. I am able to index XML files and am now trying to index a PDF file, but am not able to index it. Can you give me the steps to carry out PDF indexing? It would be very useful. Kindly guide me through this process. Thanks & Regards, Vignesh.V, Ninestars Information Technologies Limited., 72, Greams Road, Thousand Lights, Chennai - 600 006. India. Landline: +91 44 2829 4226 / 36 / 56 X: 144 www.ninestars.in
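In practice the Tika integration is exposed through Solr Cell's ExtractingRequestHandler, so a PDF can be posted directly to it (a sketch; the literal.id value, file path, and form field name are placeholders):

```shell
# post a PDF through the extracting handler; Tika parses it server-side
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' \
  -F 'myfile=@/path/to/file.pdf'
```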
Re: Disabling Commit/Auto-Commit (SolrCloud)
Why don't you set both solrconfig commit intervals to very high values and issue an explicit commit command after your sparse, small updates? I've been doing this for ages and it works perfectly for me.

2014-01-31 Software Dev static.void@gmail.com:

Is there a way to disable commit/hard-commit at runtime? For example, we usually have our hard-commit and soft-commit set really low, but when we do bulk indexing we would like to disable this to increase performance. If there isn't an easy way of doing this, would simply pushing a new solrconfig to SolrCloud work?
Re: Disabling Commit/Auto-Commit (SolrCloud)
I didn't mean to disable it, just to put some high value there. I have a script that updates my Solr in batches of thousands, so I set my commit threshold to 100,000 because when it runs it updates 100,000 records in a short time. The other script updates in batches of hundreds and isn't so fast, so its internal loops issue a commit after X loops and/or when it finishes processing.

2014-01-31 Mark Miller markrmil...@gmail.com:

It's not a good idea to disable hard commit because the transaction log can grow without limit in RAM. Also, try some performance tests. I've never seen it matter if it's set to like a minute, both for bulk and NRT. As far as soft commit, you could turn it off and control visibility when adding docs via commitWithin. - Mark http://about.me/markrmiller

On Jan 31, 2014, at 12:45 PM, Software Dev static.void@gmail.com wrote: Is there a way to disable commit/hard-commit at runtime? ...
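The pattern both replies describe maps onto solrconfig.xml roughly like this (a sketch; the values are illustrative, and openSearcher="false" keeps the hard commit from opening a new searcher on every flush):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: bounds the transaction log without affecting visibility -->
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- ~1 minute, per Mark's suggestion -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft autocommit left out; control visibility per-request via commitWithin -->
</updateHandler>
```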
Re: Regarding Solr Faceting on the query response.
I believe it's not possible to facet only the page you are on; faceting is supposed to work only over the full result set. I never tried, but I've never seen a way this could be done.

2014-01-30 Mikhail Khludnev mkhlud...@griddynamics.com:

Hello. Do you mean setting http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount to 1, or do you want to facet only the returned page (rows) instead of the full result set (numFound)?

On Thu, Jan 30, 2014 at 6:24 AM, Nilesh Kuchekar kuchekar.nil...@gmail.com wrote: Yeah, it's a typo... I meant company:Apple. Thanks, Nilesh

On Jan 29, 2014, at 8:59 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: On Thu, Jan 30, 2014 at 3:43 AM, Kuchekar kuchekar.nil...@gmail.com wrote: company=Apple -- Did you mean company:Apple? Otherwise, that could be the issue. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch

-- Sincerely yours, Mikhail Khludnev, Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
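If per-page counts are really what's wanted, the usual workaround is to count client-side over just the rows returned (a sketch, not a Solr feature; the field name and documents are illustrative):

```python
from collections import Counter

def page_facet(docs, field):
    """Count field values over one page of results only.

    Unlike Solr's facet counts, which cover the whole result set
    (numFound), this reflects just the rows the client received."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    return dict(counts)

page = [{"company": "Apple"}, {"company": "Apple"}, {"company": "IBM"}, {}]
print(page_facet(page, "company"))
```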
Re: Solr Nutch
1) Plus, those files are binaries, sometimes with metadata; specific crawlers need to understand them. HTML is plain text.
2) Yes, different data schemes. Sometimes I replicate the same core and run A/B tests with different weights, filters, etc. And some people like to create CoreA and CoreB with the same schema, hammer CoreA with updates, commits and optimizes while CoreB serves searches, then swap. This produces faster searches.

2014-01-28 Jack Krupansky j...@basetechnology.com:

1. Nutch follows the links within HTML web pages to crawl the full graph of a web of pages.
2. Think of a core as an SQL table - each table/core has a different type of data.
3. SolrCloud is all about scaling and availability - multiple shards for larger collections and multiple replicas for both scaling of query response and availability if nodes go down.
-- Jack Krupansky

-----Original Message----- From: rashmi maheshwari Sent: Tuesday, January 28, 2014 11:36 AM To: solr-user@lucene.apache.org Subject: Solr Nutch

Hi, Question 1: When Solr can parse HTML and documents like doc, excel, pdf, etc., why do we need Nutch to parse HTML files? What is different? Question 2: When do we use multiple cores in Solr? Any practical business case when we need multiple cores? Question 3: When do we go for cloud? What is the meaning of implementing SolrCloud?

-- Rashmi Be the change that you want to see in this world! www.minnal.zor.org disha.resolve.at www.artofliving.org
Re: Synonyms and spellings
2) There are some synonym lists on the web; they aren't always complete, but I keep analyzing fields and tokens in order to polish my synonyms. And I like to use tools like http://www.visualthesaurus.com/ to aid me. Hope this helps :-)

2014-01-28 rashmi maheshwari maheshwari.ras...@gmail.com:

Hi, Question 1) Why do we use the spellings file under the Solr core conf folder? What spellings do we enter in this? Question 2) Implementing all synonyms is a tough thing. From where could I get a list of as many synonyms as we could see in Google search?
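For reference, whatever list you assemble ends up in the synonyms.txt format that SynonymFilterFactory consumes (the word groups below are illustrative):

```text
# comma-separated groups are treated as equivalent
laptop, notebook, portable computer
# explicit mappings rewrite the left-hand variants to the right-hand form
i-pod, i pod => ipod
```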
Re: Solr Nutch
Well, not even Google parses those. I'm not sure about Nutch, but in some crawlers (jsoup, I believe) there's an option to try to extract full URLs from plain text, so you can capture URLs in the form of someClickFunction('http://www.someurl.com/whatever'), or even ones in the middle of some paragraph. Sometimes it works beautifully; sometimes it misleads you into parsing URLs shortened with an ellipsis in the middle.

2014-01-28 rashmi maheshwari maheshwari.ras...@gmail.com:

Thanks all for the quick response. Today I crawled a webpage using Nutch. This page has many links, but all anchor tags have href="#" and JavaScript written on the onClick event of each anchor tag to open a new page. So the crawler didn't crawl any of those links, which open using the onClick event and have a # href value. How are these links crawled using Nutch?
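The plain-text URL extraction described above can be sketched with a regular expression (an illustration, not jsoup's or Nutch's actual implementation; someClickFunction comes from the thread's example):

```python
import re

# Pull http(s) URLs out of raw page text, including ones buried in
# JavaScript handlers like someClickFunction('http://...').
URL_RE = re.compile(r"https?://[^\s'\"<>)]+")

def extract_urls(text):
    return URL_RE.findall(text)

html = """<a href="#" onclick="someClickFunction('http://www.someurl.com/whatever')">go</a>
See http://example.com/page for details."""
print(extract_urls(html))
```

As the thread warns, this also happily "extracts" display-shortened URLs with an ellipsis in the middle, so results need validation.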
Re: boost a document which has a field not empty
Can you assign a doc boost at index time?

2011/9/21 Zoltan Altfatter altfatt...@gmail.com:

Hi, I have one entity called organisation. I am indexing the name to be able to search on it afterwards. I also store the website of the organisation. Some organisations have a website, some don't. Can I achieve that when searching for organisations, even if I have a match on their name, I will show first those which have a website? Thank you. Regards, Zoltan

-- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
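Index-time document boost is expressed through the boost attribute of the XML update format (a sketch for Solr 3.x; field names and the boost value are illustrative - a query-time bq/bf on a populated website field is the usual alternative):

```xml
<add>
  <!-- organisations with a website get a higher index-time boost -->
  <doc boost="2.0">
    <field name="id">org1</field>
    <field name="name">Acme Corp</field>
    <field name="website">http://acme.example</field>
  </doc>
  <doc>
    <field name="id">org2</field>
    <field name="name">Acme Ltd</field>
  </doc>
</add>
```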
Re: Schema fieldType y-m-d ?!?!
If you don't need date-specific functions and/or faceting, you can store it as an int, like 20110914, and parse it in your application, but I don't recommend it... As a rule of thumb, dates should be stored as dates; the millennium bug (Y2K bug) was all about 'saving some space', remember?
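For completeness, the int trick looks like this (a sketch; it keeps sorting and range queries working, because numeric order matches chronological order, but loses all date math):

```python
from datetime import date

def to_yyyymmdd(d):
    """Encode a date as a sortable integer, e.g. 2011-09-14 -> 20110914."""
    return d.year * 10000 + d.month * 100 + d.day

def from_yyyymmdd(n):
    """Decode the integer back into a date in the application layer."""
    return date(n // 10000, (n // 100) % 100, n % 100)

print(to_yyyymmdd(date(2011, 9, 14)))
```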
Re: how can we do the solr scheduling in windows o/s?
Under Administrative Tools, select Task Scheduler. New task, action: run a program/script; there you can call a Java command line like java -jar something.jar. The scheduler itself is pretty good, but the tasks it can perform are too few... it can, however, run Java programs via the command line.

2011/9/2 vighnesh svighnesh...@gmail.com:

Hi all, can anyone specify the procedure for Solr scheduling on Windows OS? http://wiki.apache.org/solr/DataImportHandler#HTTPPostScheduler I know this link, but I need a cron-job-like procedure on Windows. Regards, Ganesh.
-- View this message in context: http://lucene.472066.n3.nabble.com/how-can-we-do-the-solr-scheduling-in-windows-o-s-tp3303679p3303679.html Sent from the Solr - User mailing list archive at Nabble.com.
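The same task can be scripted rather than clicked through, via Windows' schtasks utility (a sketch; the task name, jar path, and schedule are placeholders):

```shell
schtasks /create /tn "solr-nightly-post" ^
  /tr "java -jar C:\solr\post.jar C:\data\docs" ^
  /sc daily /st 02:00
```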
Re: Image results in Solr Search
Never done that before, but as far as I know, Tika does that job: http://tika.apache.org/0.9/formats.html#Image_formats

2011/9/2 Jagdish Kumar jagdish.thapar...@hotmail.com:

Hi, I am trying indexing and searching of various types of files in Solr 3.3.0. I am able to index image files, but they fail to show in the results of any search operation. I am not aware of how Solr works for searching images - whether it is content-based or metadata-based, I am not sure. If any of you have done image searches with Solr, I request you to please help me out with this. Thanks, Jagdish
Re: Solr and Encoding Issue?
What does the Analysis page say? Put the words in both the field value (index) and query boxes and compare them, please. Have you tried to encode it manually in the URL, just in case?

2011/9/2 deniz denizdurmu...@gmail.com:

I am trying to implement multi-accented search on Solr... basically I am using ASCIIFoldingFilter to provide this feature... but I have a problem:

http://localhost:8983/solr/select/?q=*francois*&version=2.2&start=0&rows=10&indent=on
http://localhost:8983/solr/select/?q=*francois**&version=2.2&start=0&rows=10&indent=on
http://localhost:8983/solr/select/?q=*françois*&version=2.2&start=0&rows=10&indent=on

The three above work well and return correct results; however,

http://localhost:8983/solr/select/?q=*françois**&version=2.2&start=0&rows=10&indent=on

the link above returns 0 matching documents... Anybody has any ideas on this? Could it be because of an encoding issue?
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-Encoding-Issue-tp3303627p3303627.html Sent from the Solr - User mailing list archive at Nabble.com.
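One concrete thing to check when manually encoding: what the accented term looks like as UTF-8 percent-encoding, since a client sending the bytes in another charset makes Solr decode a different string (a sketch using Python's standard library):

```python
from urllib.parse import quote, unquote

term = "françois"

# UTF-8 percent-encoding, which Solr expects by default
encoded = quote(term)
print(encoded)

# decoding with the wrong charset mangles the term
latin1_bytes = term.encode("latin-1")
wrong = latin1_bytes.decode("utf-8", errors="replace")
print(wrong)
```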
Re: Question on functions
We put it here:

<requestHandler name="whatever" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">...</str>
    <str name="pf">...</str>
    <str name="bf">recip(ms(NOW,sear_dataupdate),3.16e-11,1,1)</str>
    ...

2011/9/1 Craig Stadler cstadle...@hotmail.com:

Regarding http://wiki.apache.org/solr/FunctionQuery#Date_Boosting, specifically recip(ms(NOW/HOUR,mydatefield),3.16e-11,1,1): I am using dismax, and I am very unsure where to put this or call the function... for example in the fq= param? In the q= param? Sample query: http://localhost:8983/solr/dismax/?q=george clooney&mm=48%25&debugQuery=off&indent=on&start=&rows=10 If I want to factor in score/date (a field called creationdate)... recip(ms(NOW/HOUR,creationdate),3.16e-11,1,1). Help! And thanks so much for any examples or help. -Craig
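The magic constant is clearer when worked out: ms() yields the document's age in milliseconds, recip(x,m,a,b) computes a/(m*x+b), and m = 3.16e-11 is roughly 1/(milliseconds per year), so a one-year-old document gets about half the boost of a brand-new one (a quick check, mirroring Solr's function on the client side):

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ~3.156e10

def recip(x, m=3.16e-11, a=1, b=1):
    """Solr's recip() function: a / (m*x + b)."""
    return a / (m * x + b)

# boost decays smoothly with document age
for years in (0, 1, 2, 5):
    print(years, round(recip(years * MS_PER_YEAR), 3))
```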
Re: Solr 3.3 dismax MM parameter not working properly
I'm printing a big bold cheatsheet about it and sticking it everywhere :-) I wish I could change this thread's subject to "alexei is not working properly" :-/

2011/8/30 Erick Erickson erickerick...@gmail.com:

Yep, that one takes a while to figure out; then I wind up re-figuring it out every time I have to change it <g>... Best, Erick

On Tue, Aug 30, 2011 at 6:36 PM, Alexei Martchenko ale...@superdownloads.com.br wrote:

Hmmm, I believe I discovered the problem. When you have something like this: 2<50% 6<-60%, you should read it from right to left and use the word MORE. MORE THAN SIX clauses: 60% are optional. MORE THAN TWO clauses (and that includes 3, 4, 5 and 6): half is mandatory. If you want a special rule for 2 terms, just add: 1<1 2<50% 6<-60% - MORE THAN ONE clause (i.e. 2) should match 1. NOW this makes sense!

2011/8/30 Alexei Martchenko ale...@superdownloads.com.br:

Anyone else struggling with dismax's MM parameter? ...
Re: Field grouping?
Yes - range facets: http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range

2011/8/31 Denis Kuzmenok forward...@ukr.net:

Hi. Suppose I have a field price with different values, and I want to get ranges for this field depending on doc count; for example, I want to get 5 ranges for 100 docs with 20 docs in each range, 6 ranges for 200 docs (about 34 docs in each range), etc. Is this possible with Solr?
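One caveat: range faceting buckets by fixed-width value intervals, not by equal doc counts, so it approximates what's asked rather than matching it exactly. The parameters look like this (a sketch; the field name and bounds are illustrative):

```shell
# five 100-unit price buckets with per-bucket counts
curl 'http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.range=price&facet.range.start=0&facet.range.end=500&facet.range.gap=100'
```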
Re: Solr Faceting DIH
I had the same problem with a database here, and we discovered that every item had its own product page, its own URL. So we decided that our unique id had to be the URL instead of SQL ids and id concatenations. Sometimes it works. You can store all the ids if you need them for something, but for unique ids, URLs work just fine.

2011/8/30 Erick Erickson erickerick...@gmail.com:

I'd really think carefully before disabling unique IDs. If you do, you'll have to manage the records yourself, so your next delta-import will add more records to your search result, even those that have been updated. You might do something like make the uniqueKey the concatenation of productid and attributeid, or whatever makes sense. Best, Erick

On Mon, Aug 29, 2011 at 5:52 PM, Aaron Bains aaronba...@gmail.com wrote:

Hello, I am trying to set up Solr faceting on products by using the DataImportHandler to import data from my database. I have set up my data-config.xml with the proper queries and schema.xml with the fields. After the import/index is complete I can only search one productid record in Solr. For example, of the three productid '10100039' records there are, I am only able to search for one of those. Should I somehow disable unique ids? What is the best way of doing this? Below is the schema I am trying to index:

+-----------+-------------+---------+------------+
| productid | attributeid | valueid | categoryid |
+-----------+-------------+---------+------------+
|  10100039 |      331100 |    1580 |          1 |
|  10100039 |      331694 |    1581 |          1 |
|  10100039 |    33113319 | 1537370 |          1 |
|  10100040 |      331100 |    1580 |          1 |
|  10100040 |      331694 | 1540230 |          1 |
|  10100040 |    33113319 | 1537370 |          1 |
+-----------+-------------+---------+------------+

Thanks!
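Erick's concatenated-key suggestion can also be done inside DIH itself with the TemplateTransformer (a sketch; the entity name and SQL follow the thread's schema, and the uid field would be declared as the uniqueKey):

```xml
<entity name="product_attr" transformer="TemplateTransformer"
        query="select productid, attributeid, valueid, categoryid from product_attributes">
  <!-- synthesize a unique key from the two id columns -->
  <field column="uid" template="${product_attr.productid}-${product_attr.attributeid}"/>
</entity>
```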
Solr 3.3 dismax MM parameter not working properly
Anyone else struggling with dismax's MM parameter? We're having a problem here: it seems that configs for 3 terms and more are being ignored by Solr and it falls back to the previous configs. If I use <str name="mm">3&lt;1</str> or <str name="mm">3&lt;100%</str>, I get the same results for a 3-term query. If I try <str name="mm">4&lt;25%</str> or <str name="mm">4&lt;100%</str>, I also get the same data for a 4-term query. I'm searching: windows service pack

<str name="mm">1&lt;100% 2&lt;50% 3&lt;100%</str> - 13000 results
<str name="mm">1&lt;100% 2&lt;50% 3&lt;1</str> - the same 13000 results
<str name="mm">1&lt;100% 2&lt;50%</str> - the very same 13000 results
<str name="mm">1&lt;100% 2&lt;100%</str> - 93 results; seems that here I get the all-3-required behaviour working.
<str name="mm">2&lt;100%</str> - same 93 results, just in case.
<str name="mm">2&lt;50%</str> - the very same 13000 results, as it should be.
<str name="mm">2&lt;-50%</str> - 1121 results (weird).

Then I tried to control 3-term queries:

<str name="mm">2&lt;-50% 3&lt;100%</str> - 1121, the same as 2&lt;-50%, ignoring the 3&lt; clause.
<str name="mm">2&lt;-50% 3&lt;1</str> - the same 1121 results, ignoring it again.

I'd like to accomplish something like this: <str name="mm">2&lt;1 3&lt;2 4&lt;3 8&lt;-50%</str>. Translating: 1 or 2 terms - 1 term; 3 - at least 2; 4 - at least 3; and 5, 6, 7, 8 terms - at least half, rounded up (5-3, 6-3, 7-4, 8-4). It seems Solr is only using the 1- and 2-term clauses. Thanks in advance, alexei
Re: Solr 3.3 dismax MM parameter not working properly
Hmmm, I believe I discovered the problem. When you have something like this: 2&lt;50% 6&lt;-60%, you should read it from right to left and use the word MORE: MORE THAN SIX clauses, 60% are optional; MORE THAN TWO clauses (and that includes 3, 4, 5 and 6), half is mandatory. If you want a special rule for 2 terms, just add 1&lt;1: 1&lt;1 2&lt;50% 6&lt;-60% - MORE THAN ONE clause (2) should match 1. NOW this makes sense!

2011/8/30 Alexei Martchenko ale...@superdownloads.com.br
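Read right to left as described above, the rule the thread settles on can be dropped into a dismax handler's defaults. A sketch; the handler name and surrounding layout are illustrative assumptions, the mm value is the one from the message:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- more than 1 clause: require 1; more than 2 clauses: require 50%;
         more than 6 clauses: 60% of the clauses are optional -->
    <str name="mm">1&lt;1 2&lt;50% 6&lt;-60%</str>
  </lst>
</requestHandler>
```

Note that `<` must be escaped as `&lt;` inside solrconfig.xml, which is easy to miss when copying mm specs from the wiki.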
Re: what is scheduling? why should we do this? how to achieve this?
Since Solr is basically an HTTP server, all you need is a scheduler that requests specific URLs. On Windows, you can try the Task Scheduler (I don't know its exact name in English); it's the clock icon in the Administrative Tools section. ColdFusion, for instance, has its own scheduler, and other languages such as PHP may have one you can use. Hope it helps.

2011/8/29 nagarjuna nagarjuna.avul...@gmail.com: Hi pravesh... I already saw the wiki page that you gave; from that I got the points about collection distribution etc., but I didn't get any link explaining the cron job process step by step for Windows. Can you please tell me how to do it for Windows? -- View this message in context: http://lucene.472066.n3.nabble.com/what-is-scheduling-why-should-we-do-this-how-to-achieve-this-tp3287115p3292221.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: getting old records in database
It depends on the case. We have a database here that updates very frequently, so we just added a field named syncid and set it to the index day. Every time the database updates, it sets syncid to the current day. After we perform a full database update, we tell Solr to delete all records whose syncid differs from the current one, i.e. the current day. It's an XML message with a delete query: -syncid:27 will delete all records not updated in the day-27 run. With databases that update constantly, it works. If anyone else knows another solution, please share.

2011/8/27 mss.mss mss.mss...@gmail.com: Hi, we developed a Solr instance connected to a database and are getting the records from the database. Now we deleted the records in the table, but I am still getting the old records in Solr. What do we have to do to solve this problem? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/getting-old-records-in-database-tp3288991p3288991.html
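The cleanup described above is a plain delete-by-query update message. A sketch using the thread's syncid field and day 27; the leading *:* is an assumption added to keep the purely negative query unambiguous:

```xml
<delete>
  <query>*:* -syncid:27</query>
</delete>
```

Post it to the core's update handler and follow with a `<commit/>` message for the deletions to become visible to searchers.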
Re: commas in synonyms.txt are not escaping
Gary, please post the entire field declaration so I can try to reproduce it here.

2011/8/26 Moore, Gary gary.mo...@ars.usda.gov: I have a number of chemical names containing commas which I'm mapping in index_synonyms.txt thusly:

2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562

According to the sample synonyms.txt, the comma above should be escaped, i.e. a\,a => b\,b. The problem is that, according to analysis.jsp, the commas are not being escaped. If I paste in 2,4-D-butotyl, there are no mappings. If I paste in 2\,4-D-butotyl, the mappings are done. This is verified by there being no mappings in the index; I assume there would be if 2\,4-D-butotyl actually appeared in a document. The filter I'm declaring in the index analyzer looks like this:

<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true" expand="true"/>

It doesn't seem to matter which tokenizer I use. This must be something simple that I'm not doing, but I am a bit stumped at the moment and would appreciate any tips. Thanks, Gary
Re: commas in synonyms.txt are not escaping
Gary, isn't your WordDelimiter removing your commas at query time? Have you tried it in the analyzer?

2011/8/26 Moore, Gary gary.mo...@ars.usda.gov: Here you go -- I'm just hacking the text field at the moment. Thanks, Gary

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true" expand="true"/>
    <!-- Case insensitive stop word removal. enablePositionIncrements=true
         ensures that a 'gap' is left to allow for accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!--<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" tokenizerFactory="solr.KeywordTokenizerFactory" expand="true"/>-->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

-----Original Message----- From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] Sent: Friday, August 26, 2011 10:30 AM To: solr-user@lucene.apache.org Subject: Re: commas in synonyms.txt are not escaping -- Gary, please post the entire field declaration so I can try to reproduce it here.
Re: hierarchical faceting in Solr?
Cheers, very good, congratulations.

2011/8/23 Naomi Dushay ndus...@stanford.edu: Chris Beer just did a revamp of the wiki page at: http://wiki.apache.org/solr/HierarchicalFaceting Yay Chris! - Naomi ( ... and I helped!)

On Aug 22, 2011, at 10:49 AM, Naomi Dushay wrote: Chris, is there a document somewhere on how to do this? If not, might you create one? I could even imagine such a document living on the Solr wiki ... this one has mostly ancient content: http://wiki.apache.org/solr/HierarchicalFaceting - Naomi
Re: Field type change / copy field
Have you tried something like this in your facet_year index analyzer?

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\d{4})" replacement="$1-01-01T00:00:00Z"/>

This can theoretically do the trick.

2011/8/24 Oliver Schihin oliver.schi...@unibas.ch: Hello list. My documents come with a field holding a date, always a year: <year>2008</year>. In the schema, this content is taken for a field year as an integer, and it will be searchable. Through a copyField instruction I copy the year to a facet_year field - you guessed it, to use it for faceting and to make range queries possible. Its field type is of the class solr.TrieDateField, which requires canonical date representation. Is there a way in Solr to extend the simple year to <facet_year>2008-01-01T00:00:00Z</facet_year>? Or do I have to solve the problem in preprocessing, before posting? Thanks, Oliver
Re: Problem using stop words
Funny thing is that the stopword files in the examples shown at http://wiki.apache.org/solr/LanguageAnalysis#Spanish actually use pipes and other terms. See the Spanish one at http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt - I had never seen this format before. Lucas, try using only one word per line: no pipes, no trailing spaces. You can use all the Spanish accents too. Don't forget to save the file encoded as UTF-8; you can do that in Eclipse, and even Windows Word can open and save txt files as UTF-8.

2011/8/22 Erick Erickson erickerick...@gmail.com: What does the admin/analysis page show? And if you're really putting the pipe symbol (|) in your stopwords file, I have no clue what Solr will make of it. The stopwords file format is usually just one word per line. I'm assuming your name of "string" for the field type is just a placeholder, or that you've replaced the example "string" fieldType, right? Best, Erick

On Mon, Aug 22, 2011 at 6:24 AM, Lucas Miguez lucas.mig...@gmail.com wrote: Hi, I am trying to use Spanish stop words, but the stop words are not working. Part of the schema.xml file:

<fieldtype name="string" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    <filter class="solr.StopFilterFactory" words="spanish_stop.txt" enablePositionIncrements="true" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    <filter class="solr.StopFilterFactory" words="spanish_stop.txt" enablePositionIncrements="true" ignoreCase="true"/>
  </analyzer>
</fieldtype>

A piece of the stopwords file:

de | from, of
la | the, her
que | who, that
el | the
en | in
y | and
a | to
los | the, them
del | de + el
se | himself, from him etc
las | the, them
por | for, by, etc
un | a
para | for
con | with
no | no
una | a
su | his, her
al | a + el
es | from SER
lo | him

Any idea? Thanks!
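For reference, the same list rewritten in the format Solr's StopFilterFactory actually expects - one word per line, with # starting comment lines, saved as UTF-8. A sketch of the beginning of the file; the glosses from the Snowball list are moved to comment lines of their own, since Solr's word-list loader does not support trailing inline comments:

```xml
# spanish_stop.txt in Solr format:
# one stop word per line, '#' starts a comment line, file saved as UTF-8
de
la
que
el
en
y
a
los
del
se
las
por
un
para
con
no
una
su
al
es
lo
```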
Re: Problem using stop words
That very txt file says: "A Spanish stop word list. Comments begin with vertical bar. Each stop word is at the start of a line." Solr's comments are #s, not pipes. The Brazilian stopwords file is kinda raw: http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/br/stopwords.txt

2011/8/22 Alexei Martchenko ale...@superdownloads.com.br
Re: Problem using stop words
No, I think you're right; I've never seen pipes as comments before...

2011/8/22 Erick Erickson erickerick...@gmail.com: Ahh, you're right. I was waaay off base there. So I guess the question is how you know the words aren't being removed? A common problem is to look at *stored* fields rather than what's actually in the inverted index. The TermsComponent can help here: http://wiki.apache.org/solr/TermsComponent Erick

On Mon, Aug 22, 2011 at 11:28 AM, Alexei Martchenko ale...@superdownloads.com.br wrote: That very txt file says: "A Spanish stop word list. Comments begin with vertical bar. Each stop word is at the start of a line." Solr's comments are #s, not pipes.
Re: How to implement Spell Checker using Solr?
What is the error?

2011/8/22 anupamxyz cse.anu...@gmail.com: The changes to solrconfig.xml are as follows:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.7</str>
    <float name="thresholdTokenFrequency">.0001</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
  <str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>

And for the request handler, I have incorporated the following changes:

<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.build">true</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

The same is failing while crawling. I have reverted my code for now, but I can try it once again and post the exception I have been getting while crawling. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-implement-Spell-Checker-using-Solr-tp3268450p3274069.html
Re: how can i develop client application with solr url using javascript?
Before setting up your Solr to respond directly to jQuery, did you manage to bulletproof it against unwanted deletes? How will you protect your database? Be careful before exposing Solr directly to 'the world'.

2011/8/22 nagarjuna nagarjuna.avul...@gmail.com: Hi everybody, I have a Solr URL which produces a JSON response format, and I would like to develop a client application using JavaScript with an auto-complete search field. Please send me any samples or sample code; I need to use my Solr URL in a JavaScript or jQuery file to implement the auto-complete search field. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/how-can-i-develop-client-application-with-solr-url-using-javascript-tp3275506p3275506.html
Re: hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters
Hi Koji, thanks, it's loading right now. I can't say it's really working though, but I believe those are other issues with FastVectorHighlighter.

2011/8/18 Koji Sekiguchi k...@r.email.ne.jp: (11/08/19 4:14), Alexei Martchenko wrote: Hi Koji, thanks for the reply. My fragmentsBuilder is defined directly in <config>. Solr 3.3 warns me <highlighting> is a deprecated form; do you think it is in the wrong place? -- Hi Alexei, yes, it is incorrect. What is deprecated is the <highlighting> tag directly under <config>. After 3.1, it needs to be under the <searchComponent> for HighlightComponent. Please consult solrconfig.xml in the 3.3 example. koji -- Check out Query Log Visualizer http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/
Re: ClassNotFoundException when trying to make spellcheck JaroWinkler work
Good knowledge for everybody; little mistakes like spaces, typos and missing commas make us lose so much time. Thanks for posting this.

2011/8/18 Mike Mander wicket-m...@gmx.de: Solution found. The original solrconfig.xml jarowinkler definition had some line breaks. If I write the definition on one line (no tabs, no line breaks), the server starts without the exception:

<str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>

Thanks for helping me, Mike

Hi Mike, is your config like this? Is queryAnalyzerFieldType matching the type of the field to be indexed? Is the field correct?

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">sear_spellterms</str>
    <str name="buildOnCommit">false</str>
    <str name="buildOnOptimize">true</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker_jarowinkler</str>
  </lst>
</searchComponent>

2011/8/17 Mike Mander wicket-m...@gmx.de: Hello, I get a ClassNotFoundException for JaroWinklerDistance when I start the Solr example server. I simply copied the server and uncommented the spellchecker in example/conf/solrconfig.xml; I did nothing else. I already googled but didn't get a hint. Can someone help me please? Thanks, Mike
Re: suggester issues
It can be done - I did it with shingles - but it's not the way it's meant to be used. The main problem with the suggester is that we want compound words and we never get them. I try to get "internet explorer", but when I start typing the second word, "internet e", the suggester never finds "explorer".

2011/8/18 oberman_cs ober...@civicscience.com: I was trying to deal with the exact same issue, with the exact same results. Is there really no way to feed a phrase into the suggester (spellchecker) without it splitting the input phrase into words? -- View this message in context: http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3265803.html
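The shingle workaround mentioned above can be sketched as a suggest field type whose index analyzer emits word n-grams, so "internet explorer" is stored as a single suggestable token. A sketch only; the field type name is illustrative, not from the thread:

```xml
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit 2-word shingles alongside single words, e.g.
         "internet explorer" -> "internet", "internet explorer", "explorer" -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <!-- keep the whole typed prefix as one token so "internet e"
         can match the stored shingle "internet explorer" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off is index size: every document contributes both its single words and its adjacent word pairs to the suggest dictionary.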
Re: hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters
Hi Koji, thanks for the reply. My fragmentsBuilder is defined directly in <config>. Solr 3.3 warns me <highlighting> is a deprecated form; do you think it is in the wrong place?

2011/8/17 Koji Sekiguchi k...@r.email.ne.jp: Alexei, from the log, I think Solr couldn't find the "colored" fragmentsBuilder defined in solrconfig.xml. Can you check the <fragmentsBuilder/> setting in <searchComponent><highlighting>...</highlighting></searchComponent> in solrconfig.xml? koji

(11/08/16 8:51), Alexei Martchenko wrote: I'm having some trouble trying to upgrade my old highlighter from the <highlighting><fragmenter><formatter> format (1.4 version, default config on the Solr website) to the new FastVectorHighlighter. I'm using Solr 3.3.0 with <luceneMatchVersion>LUCENE_33</luceneMatchVersion> in <config>. In my solrconfig.xml I added these lines to the default request handler:

<bool name="hl.useFastVectorHighlighter">true</bool>
<bool name="hl.usePhraseHighlighter">true</bool>
<bool name="hl.highlightMultiTerm">true</bool>
<str name="hl.fragmentsBuilder">colored</str>

and

<fragmentsBuilder name="colored" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

All I get is ('grave' means severe):

15/08/2011 20:44:19 org.apache.solr.common.SolrException log
GRAVE: org.apache.solr.common.SolrException: Unknown fragmentsBuilder: colored
    at org.apache.solr.highlight.DefaultSolrHighlighter.getSolrFragmentsBuilder(DefaultSolrHighlighter.java:320)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:508)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

The docs at http://wiki.apache.org/solr/HighlightingParameters say: hl.fragmentsBuilder - Specify the name of SolrFragmentsBuilder (http://wiki.apache.org/solr/SolrFragmentsBuilder). Solr3.1. This parameter makes sense
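Per Koji's correction, the fragmentsBuilder has to live inside the HighlightComponent's searchComponent rather than directly under <config>. A sketch of the 3.3-style layout, using the "colored" name and class from the thread; the tag values are trimmed to one color for brevity:

```xml
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <fragmentsBuilder name="colored"
                      class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
      <lst name="defaults">
        <str name="hl.tag.pre"><![CDATA[<b style="background:yellow">]]></str>
        <str name="hl.tag.post"><![CDATA[</b>]]></str>
      </lst>
    </fragmentsBuilder>
  </highlighting>
</searchComponent>
```

With this in place, `hl.fragmentsBuilder=colored` in the request handler defaults can resolve the name, and the "Unknown fragmentsBuilder: colored" exception goes away.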
Re: ClassNotFoundException when trying to make spellcheck JaraWinkler working
Hi Mike, is your config like this? Does queryAnalyzerFieldType match the type of the field to be indexed? Is the field correct?

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">sear_spellterms</str>
    <str name="buildOnCommit">false</str>
    <str name="buildOnOptimize">true</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker_jarowinkler</str>
  </lst>
</searchComponent>

2011/8/17 Mike Mander wicket-m...@gmx.de

Hello, I get a ClassNotFoundException for JaroWinklerDistance when I start the Solr example server. I simply copied the server and uncommented the spellchecker in example/conf/solr-config.xml. I did nothing else. I already googled but didn't get a hint. Can someone help me, please? Thanks, Mike

Stacktrace:

C:\Users\m.mander\Desktop\temp\apache-solr-3.3.0\example>java -jar start.jar
2011-08-17 14:55:20.379:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2011-08-17 14:55:20.462:INFO::jetty-6.1-SNAPSHOT
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/'
17.08.2011 14:55:20 org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
17.08.2011 14:55:20 org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: C:\Users\m.mander\Desktop\temp\apache-solr-3.3.0\example\solr\solr.xml
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
17.08.2011 14:55:20 org.apache.solr.core.CoreContainer init
INFO: New CoreContainer: solrHome=solr/ instance=22725577
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/'
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr\.\'
17.08.2011 14:55:20 org.apache.solr.core.SolrConfig initLibs
INFO: Adding specified lib dirs to ClassLoader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-3.1.jar' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-LICENSE-BSD_LIKE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-NOTICE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-LICENSE-BSD_LIKE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-NOTICE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-LICENSE-BSD_LIKE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-NOTICE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.
Re: Solr 1.4.1 vs 3.3 (Speed)
I'm doing the exact same migration... what I've accomplished so far:

1. In solrconfig.xml I put <luceneMatchVersion>LUCENE_33</luceneMatchVersion> as the first line in the config branch. Warnings go like crazy if you don't do that.
2. The highlighter shows a deprecated warning; I'm still working on that. It works, but I'd like to use the new FastVectorHighlighter, which I'm strugglin' to death with right now.
3. All my speed measurements come out exactly the same. Sometimes we lose 60ms, sometimes we gain 60ms, so it averages out. I'll rebuild the index from scratch to see differences, maybe today or later this week.
4. Since I had to turn on termVectors="true" termPositions="true" termOffsets="true" in 3 fields to use FastVectorHighlighter, I expect speed gains in highlighting.

2011/8/17 Samarendra Pratap samarz...@gmail.com

Hi, we are planning to migrate from Solr 1.4.1 to Solr 3.3, and I am doing a manual performance comparison. We have set up two different Solr installations (1.4.1 and 3.3) on different ports.

1. Both have the same index (old Lucene format) of around 20 GB with 10 million documents and 60 fields (40 fields with indexed=true).
2. Both processes have max 4GB memory allocated (-Xms2048m -Xmx4096m).
3. Both installations are on the same server (8 processor Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz, 8GB RAM, 64 bit Linux system).
4. We are running Solr 1.4.1 with the collapsing patch (SOLR-236-1_4_1.patch, https://issues.apache.org/jira/browse/SOLR-236).

When I pass exactly the same query to both servers one by one, Solr 1.4.1 is more efficient than Solr 3.3. Before I convert the index into LUCENE_33 format, I thought it would be good to get expert advice. Is there something I should look into deeply? Or could this be an effect of the old index format with the new version, and should it be ignored? When I used debugQuery=true, it clearly showed org.apache.solr.handler.component.CollapseComponent (Solr 1.4.1) taking noticeably less time than org.apache.solr.handler.component.QueryComponent (Solr 3.3).
I am testing this against simple queries without any faceting, highlighting, collapsing etc. (http://xxx.xxx:8983/solr/select/?q=Packaging%20Material,%20Supplies&qt=dismax&qf=category^4.0&qf=keywords^2.0&qf=title^2.0&qf=smalldesc&qf=companyname&qf=usercategory&qf=usrpcatdesc&qf=city&qs=10&pf=category^4.0&pf=keywords^3&pf=title^3&pf=smalldesc^1.5&pf=companyname&pf=usercategory&pf=usrpcatdesc&pf=city&ps=0&bq=type:[149%20TO%201500]^3&start=0&rows=50&fl=title,smalldesc,id&debugQuery=true) Any insights by the experts would be greatly appreciated! Thanks in advance. -- Regards, Samar

-- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
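As a sketch of point 4 in the first reply above (term vectors with positions and offsets are what the FastVectorHighlighter needs), a schema.xml field declaration might look like this; the field name and type here are illustrative, not taken from the thread:

```xml
<!-- schema.xml: termVectors/termPositions/termOffsets must all be true
     for a field to be highlighted with hl.useFastVectorHighlighter=true.
     The field and type names below are placeholders. -->
<field name="smalldesc" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Note that changing these attributes requires reindexing before the term vectors exist for old documents.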
Re: Spell Checker
It's not a file, it's a request handler. You add those in solrconfig.xml. Read here please: http://wiki.apache.org/solr/Suggester 2011/8/17 naeluh nae...@gmail.com Hi Dan, I saw this command - http://localhost:8983/solr/spell?q=ANYTHINGHERE&spellcheck=true&spellcheck.collate=true&spellcheck.build=true - I tried to issue it and got a 404 error saying I did not have the path /solr/spell. Should I add this file, and what type of file is it? I got to it via the post on Drupal - http://drupal.org/node/975132 Thanks! Nick -- View this message in context: http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262684.html Sent from the Solr - User mailing list archive at Nabble.com.
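For reference, the 404 on /solr/spell means solrconfig.xml has no handler registered under that path. A minimal sketch of such an entry follows; it assumes a spellcheck search component named "spellcheck" already exists in the config (as in the stock example solrconfig.xml), so treat the names as placeholders:

```xml
<!-- solrconfig.xml: registers the /spell path and wires in the
     spellcheck component (assumed to be defined elsewhere) -->
<requestHandler name="/spell" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

After editing solrconfig.xml, the core has to be reloaded (or Solr restarted) before the new path answers.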
Re: suggester issues
I have the very very very same problem. I could copy+paste your message as mine. I've discovered so far that bigger dictionaries work better for me, and that controlling the threshold is much better than avoiding indexing one or two fields. Of course I'm still polishing this. At this very moment I was looking into Shingles; are you using them? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory How are your fields? 2011/8/17 Kuba Krzemień krzemien.k...@gmail.com Hello, I am working on creating an auto-complete functionality for my platform, which indexes large amounts of text (title + contents) - there is too much data for a dictionary. I am using the latest version of Solr (3.3) and I am trying to take advantage of the Suggester functionality. Unfortunately, so far the outcome isn't that great. The Suggester works only for single words or whole phrases (depending on the tokenizer). When using the first option, I am unable to suggest any combined queries. For example, the suggestion for 'ne' will be 'new'. The suggestion for 'new y' will be two separate lists, one for 'new' and one for 'y'. What's worse, querying 'new AND y' gives the same results (also when using collate), which means that the returned suggestion may give no results - what makes sense separately often doesn't work combined. I need a way to find only those suggestions that will return results when doing an AND query (for example 'new AND york', 'new AND year', as long as they give results upon querying - 'new AND yeti' shouldn't be returned as a suggestion). When I use the second tokenizer and the suggestions return phrases, for 'ne' I will get 'new york' and 'new year', but for 'new y' I will get nothing. Also, for 'y' I will get nothing, so the issue remains. If someone has some experience working with the Suggester, or if someone has created a well-working auto-suggester based on Solr, please help me. I've been trying to find a solution for this for quite some time.
Yours sincerely, Jackob K
Re: suggester issues
I've been indexing and reindexing stuff here with Shingles. I don't believe it's the best approach. Results are interesting, but I believe it's not what the suggester is meant to be. I tried:

<fieldType name="textSuggestion" class="solr.TextField" positionIncrementGap="10" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

but I got compound words in the suggestion itself. If I query them like http://localhost:8983/solr/{mycore}/suggest/?q=dri I get:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="dri">
        <int name="numFound">6</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <str>drivers</str>
          <str>drivers nvidia</str>
          <str>drivers intel</str>
          <str>drivers nvidia geforce</str>
          <str>drive</str>
          <str>driver</str>
        </arr>
      </lst>
      <str name="collation">drivers</str>
    </lst>
  </lst>
</response>

but when I enter the second word, http://localhost:8983/solr/{mycore}/suggest/?q=drivers%20n it scrambles everything:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="drivers">
        <int name="numFound">4</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <arr name="suggestion">
          <str>drivers</str>
          <str>drivers nvidia</str>
          <str>drivers intel</str>
          <str>drivers nvidia geforce</str>
        </arr>
      </lst>
      <lst name="n">
        <int name="numFound">10</int>
        <int name="startOffset">8</int>
        <int name="endOffset">9</int>
        <arr name="suggestion">
          <str>nvidia</str>
          <str>net</str>
          <str>nvidia geforce</str>
          <str>network</str>
          <str>new</str>
          <str>n</str>
          <str>ninja</str>
        </arr>
      </lst>
      <str name="collation">drivers nvidia</str>
    </lst>
  </lst>
</response>

Although the collation seems fine for this, it's not exactly what the suggester is supposed to do. Any thoughts?

2011/8/17 Alexei Martchenko ale...@superdownloads.com.br I have the very very very same problem. I could copy+paste your message as mine. I've discovered so far that bigger dictionaries work better for me, and that controlling the threshold is much better than avoiding indexing one or two fields. Of course I'm still polishing this. At this very moment I was looking into Shingles; are you using them? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory How are your fields? 2011/8/17 Kuba Krzemień krzemien.k...@gmail.com Hello, I am working on creating an auto-complete functionality for my platform, which indexes large amounts of text (title + contents) - there is too much data for a dictionary. I am using the latest version of Solr (3.3) and I am trying to take advantage of the Suggester functionality. Unfortunately, so far the outcome isn't that great. The Suggester works only for single words or whole phrases (depending on the tokenizer). When using the first option, I am unable to suggest any combined queries. For example, the suggestion for 'ne' will be 'new'. The suggestion for 'new y' will be two separate lists, one for 'new' and one for 'y'. What's worse, querying 'new AND y' gives the same results (also when using collate), which means that the returned suggestion may give no results - what makes sense separately often doesn't work combined. I need a way to find only those suggestions that will return results when doing an AND query (for example 'new AND york', 'new AND year', as long as they give results upon querying - 'new AND yeti' shouldn't be returned as a suggestion). When I use the second tokenizer and the suggestions return phrases, for 'ne' I will get 'new york' and 'new year', but for 'new y' I will get nothing. Also, for 'y' I will get nothing, so the issue remains. If someone has some experience working with the Suggester, or if someone has created a well-working auto-suggester based on Solr, please help me. I've been trying to find a solution for this for quite some time. Yours sincerely, Jackob K
Re: Spell Checker
No, if you are trying to build a suggester (which it seems to be), please read the URL I sent you. You'll need to create the suggester itself, <searchComponent class="solr.SpellCheckComponent" name="suggest">, and the URL handler, <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">. In your case, to work on that URL, just rename it to <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/spell"> 2011/8/17 naeluh nae...@gmail.com So I add spellcheck.build=true to solrconfig.xml just anywhere and that will work? Thanks very much for your help -- View this message in context: http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262744.html
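Spelled out, the two pieces named above (the component and the handler) look roughly like the sketch below, modeled on the Suggester wiki page linked earlier in the thread. The field name, lookup implementation, and dictionary name are placeholders, not values from this thread:

```xml
<!-- solrconfig.xml: the suggester component; "name" here is the
     dictionary name referenced by spellcheck.dictionary below -->
<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">title</str> <!-- placeholder: the field to build suggestions from -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<!-- the URL handler; rename "/suggest" to "/spell" to serve that path -->
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```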
Re: Spell Checker
Config your XML properly, reload your core (or restart Solr), then commit. This spellchecker is configured to build on commit: <str name="buildOnCommit">true</str>. Every time you commit something, it will rebuild your dictionary based on the configuration you selected. 2011/8/17 naeluh nae...@gmail.com So I add spellcheck.build=true to solrconfig.xml just anywhere and that will work? Thanks very much for your help -- View this message in context: http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262744.html
Re: Solr spellcheck and multiple collations
Can you show us your schema and config? I believe that's how collation works: the best match, only one. 2011/8/17 Herman Kiefus herm...@angieslist.com After a bit of work, we have 'spellchecking' up and going and we are happy with the suggestions. I have not, however, ever been able to generate more than one collation query. Is there something simple that I have overlooked?
Re: Solr spellcheck and multiple collations
Thank you very much for this awesome config. I'm working on it as we speak.

2011/8/17 Herman Kiefus herm...@angieslist.com

If you only get one, best, collation then there is no point to my question; however, since you asked... The relevant sections:

solrconfig.xml -

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textDictionary</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">TermsDictionary</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <float name="thresholdTokenFrequency">0.0</float>
    <str name="comparatorClass">score</str>
  </lst>
</searchComponent>

Schema.xml -

<fieldType name="textCorrectlySpelled" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.KeepWordFilterFactory" words="correctly_spelled_terms.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="textDictionary" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <!-- No index-time analysis as that is done by the fields that source fields of this type -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>

<field name="CorrectlySpelledTerms" type="textCorrectlySpelled" indexed="false" stored="false" multiValued="true"/>
<field name="TermsDictionary" type="textDictionary" indexed="true" stored="false" multiValued="true"/>

<!-- Those fields that will have misspellings stripped before they are put into the dictionary -->
<copyField source="BusinessDescription" dest="CorrectlySpelledTerms"/>
<copyField source="Services" dest="CorrectlySpelledTerms"/>
<copyField source="ServiceArea" dest="CorrectlySpelledTerms"/>
<copyField source="City" dest="CorrectlySpelledTerms"/>
<copyField source="CategoryName" dest="CorrectlySpelledTerms"/>
<copyField source="MedicalSpecialtyDescription" dest="CorrectlySpelledTerms"/>
<copyField source="ReportComment" dest="CorrectlySpelledTerms"/>
<copyField source="ReportDescription" dest="CorrectlySpelledTerms"/>
<copyField source="ReportMediaDescription" dest="CorrectlySpelledTerms"/>
<copyField source="AdditionalReportInformationAnswer" dest="CorrectlySpelledTerms"/>

<!-- The dictionary source field -->
<!-- Those fields that are not spell checked but rather appear in the dictionary as is -->
<copyField source="Name" dest="TermsDictionary"/>
<copyField source="Contact" dest="TermsDictionary"/>
<!-- Plus the remainder of those fields that are spellchecked -->
<copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>

-----Original Message----- From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] Sent: Wednesday, August 17, 2011 5:34 PM To: solr-user@lucene.apache.org Subject: Re: Solr spellcheck and multiple collations Can you show us your schema and config? I believe that's how collation works: the best match, only one. 2011/8/17 Herman Kiefus herm...@angieslist.com After a bit of work, we have 'spellchecking' up and going and we are happy with the suggestions. I have not, however, ever been able to generate more than one collation query. Is there something simple that I have overlooked?
Re: Random + Boost?
To make random results I'd use something related to dates and milliseconds, not boosting. Lemme think about this... 2011/8/16 Ahmet Arslan iori...@yahoo.com This might seem odd, but is it possible to use boost with random ordering? That is, documents that get boosted are more likely to appear towards the top of the ordering (I only display page 1, say 30 documents). Does that make sense? I'm assuming that random ordering is, well, really random - so then it's not possible. But I figured I'd ask. My problem is that I want to display a random assortment of documents, but unfortunately certain types of documents far outnumber other types. So a random assortment ends up with 50% type A, 50% type B, C, D, E, F. So, I was thinking I would essentially boost types B, C, D, E, F until all types are approximately evenly represented in the random assortment. (Or alternatively, if the user has an affinity for type B documents, further boost type B documents so that they're more likely to be represented than other types). Anyone know if there's a way to do something like this in Solr? Sounds like you want to achieve diversity of results. Consider using http://wiki.apache.org/solr/FieldCollapsing Alternatively you can make use of RandomSortField with function queries. http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
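For the RandomSortField option mentioned above, the usual setup is a dynamic field whose name encodes the random seed; this sketch follows the pattern in the RandomSortField javadoc (the field names are the conventional example, not from this thread):

```xml
<!-- schema.xml: a dynamic random field; the digits in the field name
     act as the seed, so a new name gives a new ordering -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>
```

A request can then sort on a fresh seed each time, e.g. &sort=random_1234 desc; keeping the same seed keeps the ordering stable across paging, while changing the number reshuffles the results.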
Re: Unable to get multicore working
Let's try something simpler. My start.jar is in \apache-solr-3.3.0\example\ Here's my local config placed in \apache-solr-3.3.0\example\solr\

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="softwares01" instanceDir="softwares01"/>
  </cores>
</solr>

Create \apache-solr-3.3.0\example\solr\softwares01\conf\ and \apache-solr-3.3.0\example\solr\softwares01\data\

http://localhost:8983/solr/ should work and so should http://localhost:8983/solr/softwares01/admin/

2011/8/16 David Sauve dnsa...@gmail.com

I've been trying (unsuccessfully) to get multicore working for about a day and a half now. I'm nearly at wits' end and unsure what to do anymore. **Any** help would be appreciated. I've installed Solr using the solr-jetty packages on Ubuntu 10.04. The default Solr install seems to work fine. Now, I want to add three cores: live, staging, preview, to be used for the various states of the site. I've created a `solr.xml` file as follows and symlinked it into /usr/share/solr:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="preview" instanceDir="/home/webteam/config/search/preview" dataDir="/home/webteam/preview/data"/>
    <core name="staging" instanceDir="/home/webteam/config/search/staging" dataDir="/home/webteam/staging/data"/>
    <core name="live" instanceDir="/home/webteam/config/search/live" dataDir="/home/webteam/live/data"/>
  </cores>
</solr>

Now, when I try to view any cores, I get a 404 - Not found. In fact, I can't even view /solr/admin/ anymore after installing that `solr.xml` file. Also, /solr/admin/cores returns an XML file, but it looks to me like there are no cores listed. The output:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="status"/>
</response>

Finally, looking through the logs produced by Jetty doesn't seem to reveal any clues about what is wrong. There don't seem to be any errors in there, except the 404s. Long story short: I'm stuck.
Any suggestions on where to go with this? David
Re: Unable to get multicore working
AFAIK you're still seeing the single-core version. Where is your start.jar? Search for solr.xml; see how many you've got, please.

2011/8/16 David Sauve dnsa...@gmail.com

I've installed using aptitude, so I don't have an example folder (that I can find). /solr/ does work (but lists no cores). /solr/live/admin/ does not -- 404

On Tuesday, 16 August, 2011 at 1:13 PM, Alexei Martchenko wrote:

Let's try something simpler. My start.jar is in \apache-solr-3.3.0\example\ Here's my local config placed in \apache-solr-3.3.0\example\solr\

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="softwares01" instanceDir="softwares01"/>
  </cores>
</solr>

Create \apache-solr-3.3.0\example\solr\softwares01\conf\ and \apache-solr-3.3.0\example\solr\softwares01\data\

http://localhost:8983/solr/ should work and so should http://localhost:8983/solr/softwares01/admin/

2011/8/16 David Sauve dnsa...@gmail.com

I've been trying (unsuccessfully) to get multicore working for about a day and a half now. I'm nearly at wits' end and unsure what to do anymore. **Any** help would be appreciated. I've installed Solr using the solr-jetty packages on Ubuntu 10.04. The default Solr install seems to work fine. Now, I want to add three cores: live, staging, preview, to be used for the various states of the site. I've created a `solr.xml` file as follows and symlinked it into /usr/share/solr:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="preview" instanceDir="/home/webteam/config/search/preview" dataDir="/home/webteam/preview/data"/>
    <core name="staging" instanceDir="/home/webteam/config/search/staging" dataDir="/home/webteam/staging/data"/>
    <core name="live" instanceDir="/home/webteam/config/search/live" dataDir="/home/webteam/live/data"/>
  </cores>
</solr>

Now, when I try to view any cores, I get a 404 - Not found. In fact, I can't even view /solr/admin/ anymore after installing that `solr.xml` file. Also, /solr/admin/cores returns an XML file, but it looks to me like there are no cores listed. The output:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="status"/>
</response>

Finally, looking through the logs produced by Jetty doesn't seem to reveal any clues about what is wrong. There don't seem to be any errors in there, except the 404s. Long story short: I'm stuck. Any suggestions on where to go with this? David
Re: Unable to get multicore working
Is your solr.xml in usr/share/jetty/solr/solr.xml? Let's try this XML instead:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core01" instanceDir="core01"/>
    <core name="core02" instanceDir="core02"/>
    <core name="core03" instanceDir="core03"/>
  </cores>
</solr>

Can you see the logs? You should see something like this:

16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/'
16/08/2011 17:30:55 org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
16/08/2011 17:30:55 org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: usr/share/jetty/solr/solr.xml
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
16/08/2011 17:30:55 org.apache.solr.core.CoreContainer init
INFO: New CoreContainer: solrHome=solr/ instance=21357269
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/'
16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr\core01\'

2011/8/16 David Sauve dnsa...@gmail.com

Just the one `solr.xml`. The one I added (well, symlinked from my config folder -- I like to keep my configuration files organized so they can be managed by git). `start.jar` is in `usr/share/jetty/start.jar`.

On Tuesday, 16 August, 2011 at 1:33 PM, Alexei Martchenko wrote:

AFAIK you're still seeing the single-core version. Where is your start.jar? Search for solr.xml; see how many you've got, please.

2011/8/16 David Sauve dnsa...@gmail.com

I've installed using aptitude, so I don't have an example folder (that I can find). /solr/ does work (but lists no cores). /solr/live/admin/ does not -- 404

On Tuesday, 16 August, 2011 at 1:13 PM, Alexei Martchenko wrote:

Let's try something simpler. My start.jar is in \apache-solr-3.3.0\example\ Here's my local config placed in \apache-solr-3.3.0\example\solr\

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="softwares01" instanceDir="softwares01"/>
  </cores>
</solr>

Create \apache-solr-3.3.0\example\solr\softwares01\conf\ and \apache-solr-3.3.0\example\solr\softwares01\data\

http://localhost:8983/solr/ should work and so should http://localhost:8983/solr/softwares01/admin/

2011/8/16 David Sauve dnsa...@gmail.com

I've been trying (unsuccessfully) to get multicore working for about a day and a half now. I'm nearly at wits' end and unsure what to do anymore. **Any** help would be appreciated. I've installed Solr using the solr-jetty packages on Ubuntu 10.04. The default Solr install seems to work fine. Now, I want to add three cores: live, staging, preview, to be used for the various states of the site. I've created a `solr.xml` file as follows and symlinked it into /usr/share/solr:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="preview" instanceDir="/home/webteam/config/search/preview" dataDir="/home/webteam/preview/data"/>
    <core name="staging" instanceDir="/home/webteam/config/search/staging" dataDir="/home/webteam/staging/data"/>
    <core name="live" instanceDir="/home/webteam/config/search/live" dataDir="/home/webteam/live/data"/>
  </cores>
</solr>

Now, when I try to view any cores, I get a 404 - Not found. In fact, I can't even view /solr/admin/ anymore after installing that `solr.xml` file. Also, /solr/admin/cores returns an XML file, but it looks to me like there are no cores listed. The output:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="status"/>
</response>

Finally, looking through the logs produced by Jetty doesn't seem to reveal any clues about what is wrong. There don't seem to be any errors in there, except the 404s. Long story short: I'm stuck. Any suggestions on where to go with this? David
Re: Migration from Autonomy IDOL to SOLR
This might be a longshot but... Adobe is deprecating Verity in the ColdFusion engine. Version 9 ships both engines, but I believe CF10 will only have Solr bundled. IDOL is the-new-Verity since Autonomy acquired Verity. Although Adobe wraps Solr to work like the old Verity, there might be some info from people who migrated from Verity to Solr a few years ago. Sorry for not helping much, but sometimes these little bits of information lead to something. 2011/8/15 Arcadius Ahouansou arcad...@menelic.com Hello. We have a couple of applications running on half a dozen Autonomy IDOL servers. Currently, all the features we need are supported by Solr. We have done some internal testing and realized that Solr would do a better job. So, we are investigating all possibilities for a smooth migration from IDOL to Solr. I am looking for advice from people who went through something similar. Ideally, we would like to keep most of our legacy code unchanged and have a kind of query-translation layer plugged into our app if possible. - Is there a lib available? - Any thoughts? Thanks. Arcadius. -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
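I'm not aware of an off-the-shelf library for the query-translation layer Arcadius asks about; such layers usually start as something small and grow. A purely hypothetical sketch of the idea, translating IDOL's fieldtext MATCH syntax into a Solr filter clause (only bare MATCH operators are handled; the lowercased field names and quoting are assumptions, not a complete translator):

```python
import re

def idol_fieldtext_to_fq(fieldtext):
    """Translate simple IDOL fieldtext expressions such as
    MATCH{BOOK}:CATEGORY into a Solr fq clause.

    Hypothetical sketch: handles only bare MATCH{value}:FIELD terms,
    no boolean combinations, ranges, or other IDOL operators.
    """
    clauses = []
    for value, field in re.findall(r"MATCH\{([^}]+)\}:(\w+)", fieldtext):
        clauses.append('%s:"%s"' % (field.lower(), value))
    return " AND ".join(clauses)

print(idol_fieldtext_to_fq("MATCH{BOOK}:CATEGORY"))
# -> category:"BOOK"
```

A real layer would need to cover IDOL's full operator set and relevance semantics, which is where most of the migration effort tends to go.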
hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters
I'm having some trouble trying to upgrade my old highlighter from the highlighter/fragmenter/formatter config (the 1.4 version, default config on the Solr website) to the new FastVectorHighlighter. I'm using Solr 3.3.0 with <luceneMatchVersion>LUCENE_33</luceneMatchVersion> in the config. In my solrconfig.xml I added these lines to the default request handler:

<bool name="hl.useFastVectorHighlighter">true</bool>
<bool name="hl.usePhraseHighlighter">true</bool>
<bool name="hl.highlightMultiTerm">true</bool>
<str name="hl.fragmentsBuilder">colored</str>

and

<fragmentsBuilder name="colored" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

All I get is ('GRAVE' means severe):

15/08/2011 20:44:19 org.apache.solr.common.SolrException log
GRAVE: org.apache.solr.common.SolrException: Unknown fragmentsBuilder: colored
    at org.apache.solr.highlight.DefaultSolrHighlighter.getSolrFragmentsBuilder(DefaultSolrHighlighter.java:320)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:508)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

The docs at http://wiki.apache.org/solr/HighlightingParameters say: hl.fragmentsBuilder specifies the name of a SolrFragmentsBuilder (http://wiki.apache.org/solr/SolrFragmentsBuilder). Solr3.1 (http://wiki.apache.org/solr/Solr3.1). This parameter makes sense for the FastVectorHighlighter (http://wiki.apache.org/solr/FastVectorHighlighter) only.
SolrFragmentsBuilder (http://wiki.apache.org/solr/SolrFragmentsBuilder) respects the hl.tag.pre/post parameters:

<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

-- *Alexei*
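Once the fragmentsBuilder is actually picked up, the comma-separated hl.tag.pre values are cycled so that different highlighted terms get different colors. The effect can be illustrated in a few lines (a toy re-implementation of the cycling idea, not Solr's code; the tag list is shortened to three colors):

```python
import itertools
import re

PRE_TAGS = ['<b style="background:yellow">',
            '<b style="background:magenta">',
            '<b style="background:coral">']
POST_TAG = "</b>"

def highlight(text, terms):
    """Wrap each distinct term in its own color, cycling through PRE_TAGS
    the way a multi-valued hl.tag.pre list is consumed (toy version)."""
    tags = itertools.cycle(PRE_TAGS)
    tag_for = {term: next(tags) for term in terms}  # term -> assigned pre tag
    for term, pre in tag_for.items():
        text = re.sub(r"\b%s\b" % re.escape(term), pre + term + POST_TAG, text)
    return text

out = highlight("indie music, more indie", ["indie", "music"])
print(out)
```

Every occurrence of the same term keeps the same color, while distinct terms rotate through the tag list.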
Re: strip html from data
There are still <h3> tags inside the data. Although I believe there are fewer than before, I cannot prove that. Fact is, there are still HTML tags inside the data. Any other ideas what the problem could be? 2011/7/25 Markus Jelsma markus.jel...@openindex.io You've got three analyzer elements; I wonder what that would do. You need to add the char filter to the index-time analyzer. On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote: Hi there, I am trying to strip HTML tags from the data before adding the documents to the index. To do that I altered schema.xml like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<fields>
  <field name="text" type="text" indexed="true" stored="true" required="false"/>
</fields>

Unfortunately this does not work; the HTML tags like <h3> are still present after restarting and reindexing. I also tried HTMLStripTransformer, but this did not work either. Has anybody an idea how to get this done? Thank you in advance for any hint.
Merlin -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350 -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
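What solr.HTMLStripCharFilterFactory does at index time can be approximated with the standard library, which is also a handy way to pre-strip markup before posting documents in the first place (a rough stand-in using Python's html.parser, not Solr's actual stripper):

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collect only text content, dropping tags such as <h3>."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_tags(html):
    """Return the text of an HTML fragment with all tags removed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.chunks)

print(strip_tags("<h3>Title</h3> body text"))  # -> Title body text
```

Note that a char filter only affects the indexed tokens; the stored value (what you see returned in search results) keeps its tags, which is a common reason people think the filter "did not work".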
Re: how to ignore case in solr search field?
Here's an example. Since I only query this for spelling, I can lowercase both at index and query time.

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="10" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

2011/8/10 nagarjuna nagarjuna.avul...@gmail.com Hi, please help me: how do I ignore case while searching in Solr? Ex: I need the same results for the keywords abc, ABC, aBc, AbC and all the other cases. Thank you in advance -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-ignore-case-in-solr-search-field-tp3242967p3242967.html Sent from the Solr - User mailing list archive at Nabble.com. -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
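The reason this works is that solr.LowerCaseFilterFactory appears in both the index-time and the query-time analyzer, so both sides normalize to the same form before matching. A minimal sketch of that idea:

```python
def analyze(text):
    """Mimic a tokenizer followed by a lowercase filter, applied
    identically at index time and at query time."""
    return [token.lower() for token in text.split()]

# Index time: terms stored lowercased.
index_terms = set(analyze("ABC Company Ltd"))

# Query time: any casing of the query normalizes to the same term.
for query in ["abc", "ABC", "aBc", "AbC"]:
    assert analyze(query)[0] in index_terms

print("all casings match")
```

If the filter were present on only one side, an uppercase query would miss lowercase index terms, which is the usual cause of case-sensitivity bugs in Solr schemas.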
Re: bug in termfreq? was Re: is it possible to do a sort without query?
are you boosting your docs? 2011/8/8 Jason Toy jason...@gmail.com I am trying to test out and compare different sorts and scoring. When I use dismax to search for indie music with: qf=all_lists_text&q=indie+music&defType=dismax&rows=100 I see some stuff that seems irrelevant, meaning in the top results I see only 1 or 2 mentions of indie music, but when I look further down the list I do see other docs that have more occurrences of indie music. So I want to test by comparing the different queries versus seeing a list of docs ranked specifically by the count of occurrences of the phrase indie music. On Mon, Aug 8, 2011 at 2:19 PM, Markus Jelsma markus.jel...@openindex.io wrote: Dismax queries can. But sort=termfreq(all_lists_text,'indie+music') is not using dismax. Apparently the termfreq function can not? I am not familiar with the termfreq function. It simply returns the TF of the given _term_ as it is indexed for the current document. Sorting on TF like this seems strange, as by default queries are already sorted that way since TF plays a big role in the final score. To understand why you'd need to reindex, you might want to read up on how Lucene actually works, to get a basic understanding of how different indexing choices affect what is possible at query time. Lucene In Action is a pretty good book. On 8/8/2011 5:02 PM, Jason Toy wrote: Aren't Dismax queries able to search for phrases using the default index (which is what I am using)? If I can already do phrase searches, I don't understand why I would need to reindex to be able to access phrases from a function. On Mon, Aug 8, 2011 at 1:49 PM, Markus Jelsma markus.jel...@openindex.io wrote: Alexei, thank you, that does seem to work. My sort results seem to be totally wrong though; I'm not sure if it's because of my sort function or something else. My query consists of: sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 And I get back 4571232 hits. That's normal, you issue a catch-all query.
Sorting should work but.. All the results don't have the phrase indie music anywhere in their data. Does termfreq not support phrases? No, it is TERM frequency, and indie music is not one term. I don't know how this function parses your input, but it might not understand your + escape and think it's one term consisting of exactly that. If not, how can I sort specifically by termfreq of a phrase? You cannot. What you can do is index multiple terms as one term using the shingle filter. Take care: it can significantly increase your index size and number of unique terms. On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko ale...@superdownloads.com.br wrote: You can use the standard query parser and pass q=*:* 2011/8/8 Jason Toy jason...@gmail.com I am trying to list some data based on a function I run, specifically termfreq(post_text,'indie music'), and I am unable to do it without passing data to the q parameter. Is it possible to get a sorted list without searching for any terms? -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533 -- - sent from my mobile 6176064373 -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
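The shingle filter Markus mentions concatenates adjacent tokens into single terms, which is what makes a phrase like "indie music" addressable as one term by termfreq. A sketch of two-token shingling (a toy version of what solr.ShingleFilterFactory produces; the underscore separator is an arbitrary choice here):

```python
def shingles(tokens, size=2, sep="_"):
    """Emit n-token shingles from a token stream; with size=2 the
    phrase 'indie music' becomes the single term 'indie_music'."""
    return [sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

tokens = "i love indie music a lot".split()
print(shingles(tokens))
# Once such single terms are indexed, a function like
# termfreq(field, 'indie_music') can count the phrase per document.
```

As the thread warns, every adjacent pair becomes a term, so the number of unique terms (and the index size) grows quickly.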
Re: is it possible to do a sort without query?
You can use the standard query parser and pass q=*:* 2011/8/8 Jason Toy jason...@gmail.com I am trying to list some data based on a function I run, specifically termfreq(post_text,'indie music'), and I am unable to do it without passing data to the q parameter. Is it possible to get a sorted list without searching for any terms? -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)
I'd try solr.PhoneticFilterFactory; it usually converges these slight differences... schmidt, smith and schmid will be something like XMDT. 2011/8/1 thomas tom.erfu...@googlemail.com Hi, we have several entries in our database our customer would like to find when using a not exactly matching search string. The problem is kind of related to spelling correction and synonyms. But instead of single entries in synonyms.txt we would like an automatic solution for this group of problems: when searching for the name schmid we also want to find documents with the name schmidt included. There are analogous names like hildebrand and hildebrandt, and more. That is the reason we'd like to find an automatic solution for this group of words. We already use the following filters in our index chain:

<filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary_de.txt"/>
<filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>

Unfortunately the German stemmer does not handle such problems. Nor is this a problem related to compound words. Does anyone know of a solution? Maybe it's possible to set up a filter rule to automatically extend words ending with the letter d with the letter t in the query chain? Or the other direction: remove t letters after d letters in the index chain. Thanks a lot Thomas -- View this message in context: http://lucene.472066.n3.nabble.com/German-language-specific-problem-automatic-Spelling-correction-automatic-Synonyms-tp3216278p3216278.html Sent from the Solr - User mailing list archive at Nabble.com. -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
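The phonetic-matching idea can be illustrated with classic Soundex (Solr's PhoneticFilterFactory defaults to DoubleMetaphone, which produces codes like the XMDT mentioned above; Soundex is used here only because it fits in a few lines and collapses the same name variants):

```python
def soundex(word):
    """Classic 4-character American Soundex: schmid, schmidt and smith
    all collapse to the same code, so they match each other."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    out, prev = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":  # h and w do not reset the previous code
            prev = code
    return (out + "000")[:4]

print(soundex("schmid"), soundex("schmidt"), soundex("smith"))
# -> S530 S530 S530
```

Applied at both index and query time (which is what the filter factory does), a search for schmid then also finds schmidt without any synonym entries.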
Re: Query on multi valued field
have you tried multi:1 and multi:2 and multi:3 ? 2011/7/29 rajini maski rajinima...@gmail.com Hi All, I have a specific requirement for the multi-valued field type. The requirement is as follows: there is a multivalued field in each document which can have multiple elements or a single element. For example, consider that the following are the documents matched for, say, q=*:*

*DOC1*
<doc>
  <arr name="multi">
    <str>1</str>
  </arr>
</doc>

*DOC2*
<doc>
  <arr name="multi">
    <str>1</str>
    <str>3</str>
    <str>4</str>
  </arr>
</doc>

*DOC3*
<doc>
  <arr name="multi">
    <str>1</str>
    <str>2</str>
  </arr>
</doc>

The query is to get only those documents which have multiple elements for that multivalued field, i.e., docs 2 and 3 should be returned from the above set. Is there any way to achieve this? Awaiting reply, Thanks Regards, Rajani -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
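The and-query suggestion only works when the values are known in advance. A common workaround is to compute the element count at index time into a separate field and filter on it with a range query. A sketch of that preprocessing step (the multi_count field name is hypothetical and would need to exist in the schema):

```python
def with_count(doc):
    """Add a multi_count field so queries can filter on element count,
    e.g. fq=multi_count:[2 TO *] for docs with multiple values.
    (multi_count is a hypothetical field name.)"""
    doc = dict(doc)  # don't mutate the caller's document
    doc["multi_count"] = len(doc.get("multi", []))
    return doc

docs = [{"id": 1, "multi": ["1"]},
        {"id": 2, "multi": ["1", "3", "4"]},
        {"id": 3, "multi": ["1", "2"]}]
indexed = [with_count(d) for d in docs]
hits = [d["id"] for d in indexed if d["multi_count"] >= 2]
print(hits)  # -> [2, 3], matching the docs asked for above
```

The count is fixed at index time, so documents must be re-posted whenever the multivalued field changes.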
Re: Solr Incremental Indexing
I always have a field in my databases called datelastmodified, so whenever I update a record, I set it to getdate() (the MSSQL function) and then fetch all the latest records ordered by that field. 2011/7/29 Mohammed Lateef Hussain mohammedlateefh...@gmail.com Hi, I need some help with a Solr incremental indexing approach. I have built my Solr index using the SolrJ API and now want to update the index whenever any change has been made in the database. My requirement is not to use DB triggers to call any update events. I want to update my index on the fly whenever my application updates any record in the database. Note: my indexing logic to get the required data from the DB is somewhat complex and involves many tables. Please suggest how I can proceed here. Thanks Lateef -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
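The datelastmodified pattern can be sketched end to end with an in-memory database: persist the timestamp of the last indexing run, then select only the rows modified after it and push just those to Solr (sqlite3 stands in for MSSQL here, and CURRENT_TIMESTAMP for getdate(); the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT, datelastmodified TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", [
    (1, "old doc", "2011-07-01 10:00:00"),
    (2, "updated doc", "2011-07-29 09:30:00"),
])

# Timestamp persisted after the previous indexing pass.
last_run = "2011-07-28 00:00:00"

# Delta query: only rows touched since the last run need re-indexing.
rows = conn.execute(
    "SELECT id, title FROM posts "
    "WHERE datelastmodified > ? ORDER BY datelastmodified",
    (last_run,),
).fetchall()
print(rows)  # -> [(2, 'updated doc')]
```

After pushing the rows to Solr, the application stores the new high-water mark as last_run, so each pass only re-indexes what actually changed, with no triggers involved.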
Re: schema.xml changes, need re-indexing ?
I believe you're fine with that. No need to reindex the whole Solr database. 2011/7/27 Charles-Andre Martin charles-andre.mar...@sunmedia.ca Hi, We currently have a big index in production. We would like to add 2 non-required fields to our schema.xml:

<field name="myfield" type="boolean" indexed="true" stored="true" required="false"/>
<field name="myotherfield" type="string" indexed="true" stored="true" required="false" multiValued="true"/>

I made some tests: - I stopped Tomcat - I changed the schema.xml - I started Tomcat The data was still there and I was able to add new documents with these 2 fields. So far, it looks like I won't need to re-index all my data. Am I right? Do I need to re-index all my data, or in this case am I fine? Thank you! Charles-André Martin -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Re: schema.xml changes, need re-indexing ?
I always run http://localhost:8983/solr/admin/cores?action=RELOAD&core=corename in the browser when I want to reload Solr and see any changes in the config XMLs. 2011/7/27 François Schiettecatte fschietteca...@gmail.com I have not seen this mentioned anywhere, but I found a useful 'trick' to restart Solr without having to restart Tomcat. All you need to do is 'touch' the solr.xml in the solr.home directory. It can take a few seconds, but Solr will restart and reload any config. Cheers François On Jul 27, 2011, at 2:56 PM, Alexei Martchenko wrote: I believe you're fine with that. No need to reindex the whole Solr database. 2011/7/27 Charles-Andre Martin charles-andre.mar...@sunmedia.ca Hi, We currently have a big index in production. We would like to add 2 non-required fields to our schema.xml:

<field name="myfield" type="boolean" indexed="true" stored="true" required="false"/>
<field name="myotherfield" type="string" indexed="true" stored="true" required="false" multiValued="true"/>

I made some tests: - I stopped Tomcat - I changed the schema.xml - I started Tomcat The data was still there and I was able to add new documents with these 2 fields. So far, it looks like I won't need to re-index all my data. Am I right? Do I need to re-index all my data, or in this case am I fine? Thank you! Charles-André Martin -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533