Re: Matching + and &
On 24 November 2011 15:18, Tomasz Wegrzanowski tomasz.wegrzanow...@gmail.com wrote:
On 22 November 2011 14:28, Jan Høydahl jan@cominvent.com wrote:
Why do you need spaces in the replacement? Try pattern="\+" replacement="plus" - it will cause the transformed charstream to contain as many tokens as the original and avoid the highlighting crash.

I tried that, it still crashes. Replacing it with a single character, including a single non-ASCII character, doesn't cause a crash. I'm sort of tempted to just reuse some CJK character, and synonym-filter it to mean plus.

In case anybody else runs into this problem, I found a solution. The only thing that works and doesn't seem to crash Solr is CJK expansions:

<!-- they're not random, that's just what these characters mean -->
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement="加"/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="&amp;" replacement="和"/>

Followed by un-CJK-ing in a synonym filter:

# General rules
加 => plus
和 => and
# And any special synonyms you want:
r and d, r 和 d => r and d, research and development
s and p, s 和 p => s and p, standard and poor's
at and t, at 和 t => at and t, american telephone and telegraph

The user never sees these CJK characters; they only exist for a brief time within the Solr pipeline to make the tokenizer happy. I also tried private-use Unicode characters, but they're ignored by the tokenizer.
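A minimal Java sketch of a different way to attack the same problem - mapping "+" and "&" to placeholder words on the client side before the text ever reaches Solr - assuming you control the indexing and query clients. The class name, mapping table and example string are illustrative, not part of the thread:

import java.util.LinkedHashMap;
import java.util.Map;

public class SymbolNormalizer {
    // Symbols to map to words before sending text to Solr.
    // The replacement words are assumptions; align them with your synonym rules.
    private static final Map<String, String> SYMBOLS = new LinkedHashMap<String, String>();
    static {
        SYMBOLS.put("+", " plus ");
        SYMBOLS.put("&", " and ");
    }

    public static String normalize(String text) {
        String out = text;
        for (Map.Entry<String, String> e : SYMBOLS.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        // collapse the extra spaces introduced by the replacements
        return out.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("R&D budget + travel"));  // prints: R and D budget plus travel
    }
}

This sidesteps the tokenizer entirely, at the cost of repeating the normalization in every client; the charFilter/synonym route above keeps the logic inside Solr.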
Tidying files after optimize. Is a service restart mandatory?
Hello. Brief question: How can I clean up excess files after performing an optimize, without restarting the Tomcat service?

Detail follows: I've been running several Solr cores for approx 12 months and have recently noticed the disk usage of one of them is growing considerably faster than the rate at which documents are being added.
- 1,200,000 docs 12 months ago used a 45 GB index
- 1,700,000 docs today use an 87 GB index
- There may have been _some_ deletions, almost certainly < 100,000
- The documents are of a broadly uniform style, approx 1000 words
So, approximately 45% growth in documents has grown the disk usage by approx 100%. I took a server out of production (I've 1 master, 7 slaves) and did the following:
- I ran http://server/corename/update?stream.body=<optimize/> on this core, which added 49.4 GB to the index folder
- No previously existing files were deleted
- I restarted the Tomcat service
- ONLY the files generated by the optimize remained. All older files were deleted.
This is the result I want, but not quite the method I'd prefer. How can I get to this position without restarting the service? Many thanks in advance for any advice you can give.
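For reference, the same optimize can be issued from SolrJ rather than through stream.body; a minimal sketch, assuming a SolrJ 1.4/3.x client and an illustrative core URL. As the message above describes, this triggers the merge but does not by itself force the old files to be released:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeCore {
    public static void main(String[] args) throws Exception {
        // URL of the core to optimize (illustrative)
        SolrServer server = new CommonsHttpSolrServer("http://server:8080/solr/corename");
        // waitFlush=true, waitSearcher=true: block until the optimized index is live
        server.optimize(true, true);
    }
}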
Re: Unable to index documents using DataImportHandler with MSSQL
Right. This is REALLY weird - I've now started from scratch on another machine (this time Windows 7), and got _exactly_ the same problem!?

On Mon, Nov 28, 2011 at 7:37 AM, Husain, Yavar yhus...@firstam.com wrote:
Hi Ian
I am having exactly the same problem that you are having, on Win 7 and 2008 Server: http://lucene.472066.n3.nabble.com/DIH-Strange-Problem-tc3530370.html
I still have not received any replies which could solve my problem till now. Please do let me know if you have arrived at some solution for your problem. Thanks.
Regards, Yavar

-Original Message-
From: Ian Grainger [mailto:i...@isfluent.com]
Sent: Friday, November 25, 2011 10:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Unable to index documents using DataImportHandler with MSSQL

Update on this: I've established:
* It's not a problem in the DB (I can index from this DB into a Solr instance on another server)
* It's not Tomcat (I get the same problem in Jetty)
* It's not the schema (I have simplified it to one field)
That leaves SolrConfig.xml and data-config. The only thing changed in SolrConfig.xml is adding:

<lib dir="D:/Software/Solr/example/solr/dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="D:/Software/Solr/example/solr/dist/" regex="apache-solr-clustering-\d.*\.jar" />
<lib dir="D:/Software/Solr/example/solr/dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">D:/Software/Solr/example/solr/conf/data-config.xml</str>
  </lst>
</requestHandler>

And data-config.xml is pretty much as attached - except simpler. Any help or any advice on how to diagnose would be appreciated!

On Fri, Nov 25, 2011 at 12:29 PM, Ian Grainger i...@isfluent.com wrote:
Hi - I have copied my Solr config from a working Windows server to a new one, and it can't seem to run an import. They're both using Win Server 2008 and SQL 2008 R2. This is the data importer config:

<dataConfig>
  <dataSource type="JdbcDataSource" name="ds1" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost;databaseName=DB" user="Solr" password="pwd"/>
  <document name="datas">
    <entity name="data" dataSource="ds1" pk="key"
        query="EXEC SOLR_COMPANY_SEARCH_DATA"
        deltaImportQuery="SELECT * FROM Company_Search_Data WHERE [key]='${dataimporter.delta.key}'"
        deltaQuery="SELECT [key] FROM Company_Search_Data WHERE modify_dt > '${dataimporter.last_index_time}'">
      <field column="WorkDesc_Comments" name="WorkDesc_Comments_Split" />
      <field column="WorkDesc_Comments" name="WorkDesc_Comments_Edge" />
    </entity>
  </document>
</dataConfig>

I can use MS SQL Profiler to watch the Solr user log in successfully, but then nothing. It doesn't seem to even try and execute the stored procedure. Any ideas why this would be working on one server and not on another? FTR the only thing in the Tomcat catalina log is:

org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity data with URL: jdbc:sqlserver://localhost;databaseName=CATLive

--
Ian
i...@isfluent.com
+44 (0)1223 257903
--
Ian
i...@isfluent.com
+44 (0)1223 257903
Re: highlighting on range query
I tried this URL:
http://localhost:8983/solr/select?q=rangefld:[5000%20TO%206000]&fl=lily.id,rangefld&hl=on&rows=5&wt=json&indent=on&hl.fl=*,rangefld&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.useFastVectorHighlighter=false

and the output is:

{
  "responseHeader":{
    "status":0,
    "QTime":4,
    "params":{
      "hl.highlightMultiTerm":"true",
      "fl":"lily.id,rangefld",
      "indent":"on",
      "hl.useFastVectorHighlighter":"false",
      "q":"rangefld:[5000 TO 6000]",
      "hl.fl":"*,rangefld",
      "wt":"json",
      "hl.usePhraseHighlighter":"true",
      "hl":"on",
      "rows":"5"}},
  "response":{"numFound":64,"start":0,"docs":[
      {"lily.id":"UUID.c5f00cd3-343a-47c1-ab16-ace104b2540f","rangefld":5948},
      {"lily.id":"UUID.ed69ece0-1b24-4829-afb6-22eb242939f2","rangefld":5749},
      {"lily.id":"UUID.afa0c654-2f26-4c5b-9fda-8b51c5ec080d","rangefld":5739},
      {"lily.id":"UUID.d92b405d-f41e-4c85-9014-1b89a986ec42","rangefld":5783},
      {"lily.id":"UUID.102adde5-cbff-4ca6-acb1-426bb14fb579","rangefld":5753}]},
  "highlighting":{
    "UUID.c5f00cd3-343a-47c1-ab16-ace104b2540f":{},
    "UUID.ed69ece0-1b24-4829-afb6-22eb242939f2":{},
    "UUID.afa0c654-2f26-4c5b-9fda-8b51c5ec080d":{},
    "UUID.d92b405d-f41e-4c85-9014-1b89a986ec42":{},
    "UUID.102adde5-cbff-4ca6-acb1-426bb14fb579":{}}}

Why is rangefld not coming back in the highlighting result?

On Mon, Nov 28, 2011 at 12:47 PM, Ahmet Arslan iori...@yahoo.com wrote:
Any other suggestion, as these suggestions are not working.
Could it be that you are using FastVectorHighlighter? What happens when you add hl.useFastVectorHighlighter=false to your search URL?

--
Thanks & Regards
Rahul Mehta
RE: Unable to index documents using DataImportHandler with MSSQL
Hi Ian I downloaded and build latest Solr (3.4) from sources and finally hit following line of code in Solr (where I put my debug statement) : if(url != null){ LOG.info(Yavar: getting handle to driver manager:); c = DriverManager.getConnection(url, initProps); LOG.info(Yavar: got handle to driver manager:); } The call to Driver Manager was not returning. Here was the error!! The Driver we were using was Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called jTDS jDBC driver and installed that. Problem got fixed!!! So please follow the following steps: 1. Download jTDS jDBC driver from http://jtds.sourceforge.net/ 2. Put the driver jar file into your Solr/lib directory where you had put Microsoft JDBC driver. 3. In the data-config.xml use this statement: driver=net.sourceforge.jtds.jdbc.Driver 4. Also in data-config.xml mention url like this: url=jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX 5. Now run your indexing. It should solve the problem. Regards, Yavar -Original Message- From: Ian Grainger [mailto:i...@isfluent.com] Sent: Monday, November 28, 2011 4:11 PM To: Husain, Yavar Cc: solr-user@lucene.apache.org Subject: Re: Unable to index documents using DataImportHandler with MSSQL Right. This is REALLY weird - I've now started from scratch on another machine (this time Windows 7), and got _exactly_ the same problem !? On Mon, Nov 28, 2011 at 7:37 AM, Husain, Yavar yhus...@firstam.com wrote: Hi Ian I am having exactly the same problem what you are having on Win 7 and 2008 Server http://lucene.472066.n3.nabble.com/DIH-Strange-Problem-tc3530370.html I still have not received any replies which could solve my problem till now. Please do let me know if you have arrived at some solution for your problem. Thanks. Regards, Yavar -Original Message- From: Ian Grainger [mailto:i...@isfluent.com] Sent: Friday, November 25, 2011 10:59 PM To: solr-user@lucene.apache.org Subject: Re: Unable to index documents using DataImportHandler with MSSQL Update on this: I've established: * It's not a problem in the DB (I can index from this DB into a Solr instance on another server) * It's not Tomcat (I get the same problem in Jetty) * It's not the schema (I have simplified it to one field) That leaves SolrConfig.xml and data-config. Only thing changed in SolrConfig.xml is adding: lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-cell-\d.*\.jar / lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-clustering-\d.*\.jar / lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / requestHandler name=/dataimport class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configD:/Software/Solr/example/solr/conf/data-config.xml/str /lst /requestHandler And data-config.xml is pretty much as attached - except simpler. Any help or any advice on how to diagnose would be appreciated! On Fri, Nov 25, 2011 at 12:29 PM, Ian Grainger i...@isfluent.com wrote: Hi I have copied my Solr config from a working Windows server to a new one, and it can't seem to run an import. They're both using win server 2008 and SQL 2008R2. 
This is the data importer config dataConfig dataSource type=JdbcDataSource name=ds1 driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://localhost;databaseName=DB user=Solr password=pwd/ document name=datas entity name=data dataSource=ds1 pk=key query=EXEC SOLR_COMPANY_SEARCH_DATA deltaImportQuery=SELECT * FROM Company_Search_Data WHERE [key]='${dataimporter.delta.key}' deltaQuery=SELECT [key] FROM Company_Search_Data WHERE modify_dt '${dataimporter.last_index_time}' field column=WorkDesc_Comments name=WorkDesc_Comments_Split / field column=WorkDesc_Comments name=WorkDesc_Comments_Edge / /entity /document /dataConfig I can use MS SQL Profiler to watch the Solr user log in successfully, but then nothing. It doesn't seem to even try and execute the stored procedure. Any ideas why this would be working one server and not on another? FTR the only thing in the tomcat catalina log is: org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity data with URL: jdbc:sqlserver://localhost;databaseName=CATLive -- Ian i...@isfluent.com +44 (0)1223 257903 -- Ian i...@isfluent.com +44 (0)1223 257903 ** This message may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally privileged. If you are not the intended addressee, or the person responsible for delivering it to
RE: DIH Strange Problem
I figured out the solution, and Microsoft, not Solr, is the problem here :)

I downloaded and built the latest Solr (3.4) from sources and finally hit the following line of code in Solr (where I put my debug statements):

if (url != null) {
    LOG.info("Yavar: getting handle to driver manager:");
    c = DriverManager.getConnection(url, initProps);
    LOG.info("Yavar: got handle to driver manager:");
}

The call to DriverManager was not returning. Here was the error!! The driver we were using was the Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called the jTDS JDBC driver and installed that. Problem got fixed!!!

So please follow these steps:
1. Download the jTDS JDBC driver from http://jtds.sourceforge.net/
2. Put the driver jar file into your Solr/lib directory where you had put the Microsoft JDBC driver.
3. In data-config.xml use this statement: driver="net.sourceforge.jtds.jdbc.Driver"
4. Also in data-config.xml specify the url like this: url="jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX"
5. Now run your indexing. It should solve the problem.

-Original Message-
From: Husain, Yavar
Sent: Thursday, November 24, 2011 12:38 PM
To: solr-user@lucene.apache.org; Shawn Heisey
Subject: RE: DIH Strange Problem

Hi
Thanks for your replies. I carried out these 2 steps (it did not solve my problem):
1. I tried setting responseBuffering to adaptive. Did not work.
2. To check the database connection I wrote a simple Java program to connect to the database and fetch some results with the same driver that I use for Solr. It worked. So it does not seem to be a problem with the connection.
Now I am stuck where the Tomcat log says "Creating a connection for entity ..." and does nothing. I mean, after this log we usually get the "getConnection() took x milliseconds" line, however I don't get that; I can just see the time moving with no records getting fetched.

Original problem listed again:
I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and all was working perfectly fine. However, today when I started full indexing again, Solr halts/sticks at the line "Creating a connection for entity". There are no further messages after that. I can see that DIH is busy, and on the DIH console I can see "A command is still running"; I can also see total rows fetched = 0 and total requests made to datasource = 1, and the time is increasing, however it is not doing anything. This is the exact configuration that worked for me before. I am not really able to understand the problem here. Also, in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file.
...
data-config.xml:

<dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders" user="testUser" password="password"/>
<document>
. .
Logs: INFO: Server startup in 2016 ms Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6] Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1322041133719 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity SampleText with URL: jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday, November 23, 2011 7:36 PM To: solr-user@lucene.apache.org Subject: Re: DIH Strange Problem On 11/23/2011 5:21 AM, Chantal Ackermann wrote: Hi Yavar, my experience with similar problems was that there was something wrong with the database connection or the database. Chantal It's also possible that your JDBC driver might be trying to buffer the entire result set. There's a link on the wiki specifically for this problem on MS SQL server. Hopefully it's that, but Chantal could be right too. http://wiki.apache.org/solr/DataImportHandlerFaq Here's the URL to the specific paragraph, but it's likely that it won't survive the email trip in a clickable form:
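A standalone connectivity check along the lines of the test program mentioned above makes it obvious whether the hang is in the driver or in DIH; a rough sketch, assuming the jTDS jar is on the classpath and that the host, database name and credentials are placeholders (the jTDS documentation uses the lower-case jdbc:jtds: URL prefix):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        // Load the jTDS driver explicitly (not required on JDBC 4, but harmless)
        Class.forName("net.sourceforge.jtds.jdbc.Driver");
        String url = "jdbc:jtds:sqlserver://localhost:1433;databaseName=XXX";
        Connection c = DriverManager.getConnection(url, "testUser", "password");
        Statement st = c.createStatement();
        ResultSet rs = st.executeQuery("SELECT 1");
        while (rs.next()) {
            System.out.println("Connected, got: " + rs.getInt(1));
        }
        rs.close();
        st.close();
        c.close();
    }
}

If this returns immediately with jTDS but hangs with the Microsoft driver on the same machine, the problem is in the driver/JVM combination rather than in DIH or the data-config.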
Using data import handler to clean up db
Hi, I am using a MySQL db to store all my data. I had finished configuring my data import handler to get data into Solr and then realized I needed to take care of deletes. This is what I did to handle deletes:
1) a MySQL table 'DeletedContentMapping' with the deleted id's
2) deletedPkQuery - to fetch all id's from that table.
The problem I face now is how to remove data from the 'DeletedContentMapping' table. I used postImportDeleteQuery to issue a delete but it doesn't seem to work. I know a better solution would be to add a timestamp field to the 'DeletedContentMapping' table, but that is not possible as the tables cannot be changed. Thanks for the replies in advance.
--
View this message in context: http://lucene.472066.n3.nabble.com/Using-data-import-handler-to-clean-up-db-tp3542026p3542026.html
Sent from the Solr - User mailing list archive at Nabble.com.
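As I understand it, postImportDeleteQuery is a Solr delete query applied to the index after the import, not SQL run against the source database, which would explain why it does not empty the MySQL table. If the goal is simply to purge the processed rows once each import finishes, one option (assuming an external step you run after the delta-import; connection details and table name below are placeholders mirroring the thread) is a small JDBC cleanup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PurgeDeletedMapping {
    public static void main(String[] args) throws Exception {
        // MySQL connection details are illustrative
        Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "solr", "secret");
        try {
            // Remove the ids that deletedPkQuery has already fed to Solr
            PreparedStatement ps = c.prepareStatement("DELETE FROM DeletedContentMapping");
            int removed = ps.executeUpdate();
            System.out.println("Purged " + removed + " rows");
            ps.close();
        } finally {
            c.close();
        }
    }
}

Run it only after confirming the delta-import completed successfully; otherwise deletes that Solr never saw would be lost.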
help no segment in my lucene index!!!
Hi all, after a power supply interruption my Lucene index (about 28 GB) looks like this:

18/11/2011 20:29  2.016.961.997 _3d.fdt
18/11/2011 20:29      1.816.004 _3d.fdx
18/11/2011 20:29             89 _3d.fnm
18/11/2011 20:30    197.323.436 _3d.frq
18/11/2011 20:30      1.816.004 _3d.nrm
18/11/2011 20:30    358.016.461 _3d.prx
18/11/2011 20:30        637.604 _3d.tii
18/11/2011 20:30     48.565.519 _3d.tis
18/11/2011 20:31        454.004 _3d.tvd
18/11/2011 20:31  1.695.380.935 _3d.tvf
18/11/2011 20:31      3.632.004 _3d.tvx
18/11/2011 23:33  2.048.500.822 _6g.fdt
18/11/2011 23:33      3.032.004 _6g.fdx
18/11/2011 23:33             89 _6g.fnm
18/11/2011 23:34    221.593.644 _6g.frq
18/11/2011 23:34      3.032.004 _6g.nrm
18/11/2011 23:34    350.136.996 _6g.prx
18/11/2011 23:34        683.668 _6g.tii
18/11/2011 23:34     52.224.328 _6g.tis
18/11/2011 23:36        758.004 _6g.tvd
18/11/2011 23:36  1.758.786.158 _6g.tvf
18/11/2011 23:36      6.064.004 _6g.tvx
19/11/2011 03:29  1.966.167.843 _9j.fdt
19/11/2011 03:29      3.832.004 _9j.fdx
19/11/2011 03:28             89 _9j.fnm
19/11/2011 03:30    222.733.606 _9j.frq
19/11/2011 03:30      3.832.004 _9j.nrm
19/11/2011 03:30    324.722.843 _9j.prx
19/11/2011 03:30        715.441 _9j.tii
19/11/2011 03:30     54.488.546 _9j.tis

without any segments files! I tried to fix it with the CheckIndex utility in Lucene, but I got the following message:

ERROR: could not read any segments file in directory
org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@E:\recover_me lockFactory=org.apache.lucene.store.NativeFSLockFactory@5d36d1d7: files: [_3d.fdt, _3d.fdx, _3d.fnm, _3d.frq, _3d.nrm, _3d.prx, _3d.tii, _3d.tis, _3d.tvd, _3d.tvf, _3d.tvx, _6g.fdt, _6g.fdx, _6g.fnm, _6g.frq, _6g.nrm, _6g.prx, _6g.tii, _6g.tis, _6g.tvd, _6g.tvf, _6g.tvx, _9j.fdt, _9j.fdx, _9j.fnm, _9j.frq, _9j.nrm, _9j.prx, _9j.tii, _9j.tis, _9j.tvd, _9j.tvf, _9j.tvx, _cf.cfs, _cm.fdt, _cm.fdx, _cm.fnm, _cm.frq, _cm.nrm, _cm.prx, _cm.tii, _cm.tis, _cm.tvd, _cm.tvf, _cm.tvx, _ff.fdt, _ff.fdx, _ff.fnm, _ff.frq, _ff.nrm, _ff.prx, _ff.tii, _ff.tis, _ff.tvd, _ff.tvf, _ff.tvx, _ii.fdt, _ii.fdx, _ii.fnm, _ii.frq, _ii.nrm, _ii.prx, _ii.tii, _ii.tis, _ii.tvd, _ii.tvf, _ii.tvx, _lc.cfs, _ll.fdt, _ll.fdx, _ll.fnm, _ll.frq, _ll.nrm, _ll.prx, _ll.tii, _ll.tis, _ll.tvd, _ll.tvf, _ll.tvx, _lo.cfs, _lp.cfs, _lq.cfs, _lr.cfs, _ls.cfs, _lt.cfs, _lu.cfs, _lv.cfs, _lw.fdt, _lw.fdx, _lw.tvd, _lw.tvf, _lw.tvx, _m.fdt, _m.fdx, _m.fnm, _m.frq, _m.nrm, _m.prx, _m.tii, _m.tis, _m.tvd, _m.tvf, _m.tvx]
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:712)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:593)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:327)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:995)

Is there a way to recover this index?

Cheers
Rob
fuzzy search with prefix
Hi All, I am making a fuzzy search in my Solr application like this:
q:squre~0.6
I want some prefix of the term to be excluded from fuzzy matching; say, in this example, I want my fuzzy query not to try to match 'squ', and only the rest of the term to go through the fuzzy search. I am doing it by combining a wildcard query with the fuzzy query, like this:
q:squre~0.6 AND squ*
I want to know: is there any better way of doing this? From what I have read, we can set a prefix length on a fuzzy query for the number of characters we don't want to fuzzy-match, but I didn't find anything on how to set it in my Solr fuzzy query. Thanks in Advance. Meghana
--
View this message in context: http://lucene.472066.n3.nabble.com/fuzzy-search-with-prefix-tp3542064p3542064.html
Sent from the Solr - User mailing list archive at Nabble.com.
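The prefix-length setting the message refers to does exist at the Lucene level in FuzzyQuery; as far as I know the Solr 3.x query syntax does not expose it, so using it inside Solr would mean a small custom query-parser plugin. A minimal Lucene sketch of the setting itself, with an illustrative field name:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

public class FuzzyWithPrefix {
    public static void main(String[] args) {
        // minimumSimilarity 0.6, prefixLength 3: the first three characters ("squ")
        // must match exactly; only the remainder of the term is fuzzy-matched.
        FuzzyQuery q = new FuzzyQuery(new Term("text", "squre"), 0.6f, 3);
        System.out.println(q);
    }
}

The wildcard-AND trick above approximates the same effect purely in query syntax, but the prefixLength version is cheaper because it never enumerates terms outside the prefix.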
make fuzzy search for phrase
Hi All, I am doing fuzzy search in my Solr; it works well for a single term, but when searching for phrases I get either a flood of data or very little data. Is there any good way of getting a satisfactory amount of data with decent accuracy?
1) q:kenny zemanski : 9 records
2) keny~0.7 zemansi~0.7 AND ken* : 22948 records
I want to get an amount of data that is accurate and somewhere near my actual results. By requiring more accuracy than 0.7, I get very little data and none of it matches my desired result. Anybody have any idea? Any help much appreciated. Meghana
--
View this message in context: http://lucene.472066.n3.nabble.com/make-fuzzy-search-for-phrase-tp3542079p3542079.html
Sent from the Solr - User mailing list archive at Nabble.com.
[newbie] solrj SolrQuery indent response
Hi List, I am new to the Solr and Lucene world. I have a simple question. I wrote the code segment below and it works.

public class SolrjTest {
    public static void main(String[] args) throws MalformedURLException, SolrServerException {
        ClassPathXmlApplicationContext c = new ClassPathXmlApplicationContext("/patrades-search-solrj-test-beans.xml");
        SolrServer server = (SolrServer) c.getBean("solrServer");
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*")
             .setFacet(true);
        QueryResponse rsp = server.query(query);
        System.err.println(rsp.toString());
    }
}

I want to indent the response string, but couldn't find any answer. I looked at the book, the mail archive and Google. The most relevant link I found is http://wiki.apache.org/solr/SimpleFacetParameters but I don't know how to use it. The API is hard to read, by the way.

regards,
-Halil AĞIN
RE: DIH Strange Problem
Do you use Java 6 update 29? There is a known issue with the latest mssql driver: http://blogs.msdn.com/b/jdbcteam/archive/2011/11/07/supported-java-versions-november-2011.aspx In addition, there are known connection failure issues with Java 6 update 29, and the developer preview (non production) versions of Java 6 update 30 and Java 6 update 30 build 12. We are in contact with Java on these issues and we will update this blog once we have more information. Should work with update 28. Kai -Original Message- From: Husain, Yavar [mailto:yhus...@firstam.com] Sent: Monday, November 28, 2011 1:02 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem I figured out the solution and Microsoft and not Solr is the problem here :): I downloaded and build latest Solr (3.4) from sources and finally hit following line of code in Solr (where I put my debug statement) : if(url != null){ LOG.info(Yavar: getting handle to driver manager:); c = DriverManager.getConnection(url, initProps); LOG.info(Yavar: got handle to driver manager:); } The call to Driver Manager was not returning. Here was the error!! The Driver we were using was Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called jTDS jDBC driver and installed that. Problem got fixed!!! So please follow the following steps: 1. Download jTDS jDBC driver from http://jtds.sourceforge.net/ 2. Put the driver jar file into your Solr/lib directory where you had put Microsoft JDBC driver. 3. In the data-config.xml use this statement: driver=net.sourceforge.jtds.jdbc.Driver 4. Also in data-config.xml mention url like this: url=jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX 5. Now run your indexing. It should solve the problem. -Original Message- From: Husain, Yavar Sent: Thursday, November 24, 2011 12:38 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem Hi Thanks for your replies. I carried out these 2 steps (it did not solve my problem): 1. I tried setting responseBuffering to adaptive. Did not work. 2. For checking Database connection I wrote a simple java program to connect to database and fetch some results with the same driver that I use for solr. It worked. So it does not seem to be a problem with the connection. Now I am stuck where Tomcat log says: Creating a connection for entity . and does nothing, I mean after this log we usually get the getConnection() took x millisecond however I dont get that ,I can just see the time moving with no records getting fetched. Original Problem listed again: I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and all was working perfectly fine. However today when I started full indexing again, Solr halts/stucks at the line Creating a connection for entity. There are no further messages after that. I can see that DIH is busy and on the DIH console I can see A command is still running, I can also see total rows fetched = 0 and total request made to datasource = 1 and time is increasing however it is not doing anything. This is the exact configuration that worked for me. I am not really able to understand the problem here. Also in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file. ... data-config.xml: dataSource type=JdbcDataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders user=testUser password=password/ document . . 
Logs: INFO: Server startup in 2016 ms Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6] Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1322041133719 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity SampleText with URL: jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday, November 23, 2011 7:36 PM To: solr-user@lucene.apache.org Subject: Re: DIH Strange Problem On 11/23/2011 5:21 AM, Chantal Ackermann wrote:
Re: [newbie] solrj SolrQuery indent response
I got one step further, but still no indent. I wrote the code segment below:

query.setQuery("marka_s:atak*")
     .setFacet(true)
     .setParam("indent", "on");

and here is the resulting query string: q=marka_s%3Aatak*&facet=true&indent=on

-halil agin.

On Mon, Nov 28, 2011 at 3:07 PM, halil halil.a...@gmail.com wrote:
Hi List, I am new to the Solr and Lucene world. I have a simple question. I wrote the code segment below and it works.

public class SolrjTest {
    public static void main(String[] args) throws MalformedURLException, SolrServerException {
        ClassPathXmlApplicationContext c = new ClassPathXmlApplicationContext("/patrades-search-solrj-test-beans.xml");
        SolrServer server = (SolrServer) c.getBean("solrServer");
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*")
             .setFacet(true);
        QueryResponse rsp = server.query(query);
        System.err.println(rsp.toString());
    }
}

I want to indent the response string, but couldn't find any answer. I looked at the book, the mail archive and Google. The most relevant link I found is http://wiki.apache.org/solr/SimpleFacetParameters but I don't know how to use it. The API is hard to read, by the way.

regards,
-Halil AĞIN
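The indent parameter only affects the raw response that Solr's response writer produces; SolrJ parses that response into objects (by default over the binary javabin format), so rsp.toString() will never come back indented no matter what parameters are set. A rough sketch of fetching the indented form directly over HTTP instead, assuming the default example URL and the query from this thread:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class RawIndentedResponse {
    public static void main(String[] args) throws Exception {
        String q = URLEncoder.encode("marka_s:atak*", "UTF-8");
        // wt=json (or wt=xml) plus indent=on is honoured by the response writer itself
        URL url = new URL("http://localhost:8983/solr/select?q=" + q
                + "&facet=true&wt=json&indent=on");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);   // pretty-printed response straight from Solr
        }
        in.close();
    }
}

If the structured SolrJ objects are what you actually need, keep using QueryResponse and format the output yourself; indent=on is mainly useful for eyeballing the raw response in a browser or a script.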
Search over multiple indexes
Hello, I'm trying to implement automatic document classification and store the classified attributes as an additional field in Solr document. Then the search goes against that field like q=classified_category:xyz. The document classification is currently implemented as an UpdateRequestProcessor and works quite well. The only problem: for each change in the classification algorithm every document has to be re-indexed which, of course, makes tests and experimentation difficult and binds resources (other than Solr) for several hours. So, my idea would be to store classified attributes in a meta-index and search over the main and meta indexes simultaneously. For example: main index has got fields like color and meta index has got classified_category. The query q=classified_category:xyz AND color:black should be then split over the main and meta index. This way, the classification could run on Solr over the main index and store classified fields in the meta index so that only Solr resources are bound. Has anybody already done something like that? It's a little bit like sharding but different in that each shard would process its part of the query and live in the same Solr instance. Regards, Valeriy
Re: Huge Performance: Solr distributed search
Hi all again. Thanks to all for your replies. On this weekend I'd made some interesting tests, and I would like to share it with you. First of all I made speed test of my hdd: root@LSolr:~# hdparm -t /dev/sda9 /dev/sda9: Timing buffered disk reads: 146 MB in 3.01 seconds = 48.54 MB/sec Then with iperf I had tested my network: [ 4] 0.0-18.7 sec 2.00 GBytes917 Mbits/sec Then, I tried to post my quesries using shard parameter with one shard, so my queries were like: http://localhost:8080/solr1/select/?q=(test)qt=requestShards http://localhost:8080/solr1/select/?q=%28test%29qt=requestShards where requestShards is: requestHandler name=requestShards class=solr.SearchHandler default=false lst name=defaults str name=echoParamsexplicit/str int name=rows10/int str name=shards127.0.0.1:8080/solr1 http://127.0.0.1:8080/solr1/str /lst /requestHandler Maybe its not correct, but: INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(genuflections)qt=requestShardsrows=2000}status=0 QTime=6525 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(tunefulness)qt=requestShardsrows=2000} status=0 QTime=20170 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(societal)qt=requestShardsrows=2000} status=0 QTime=44958 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(euchre's)qt=requestShardsrows=2000} status=0 QTime=32161 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(monogram's)qt=requestShardsrows=2000} status=0 QTime=85252 When I posted similar queries direct to solr1 without requestShards I had: INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(reopening)rows=2000} hits=712 status=0 QTime=10 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(housemothers)rows=2000} hits=0 status=0 QTime=446 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(harpooners)rows=2000} hits=76 status=0 QTime=399 INFO: [] webapp=/solr1 path=/select/ params={fl=*,scoreident=truestart=0q=(coaxing)rows=2000} hits=562 status=0 QTime=2820 INFO: [] webapp=/solr1 path=/select/ params={fl=*,scoreident=truestart=0q=(superstar's)rows=2000} hits=4748 status=0 QTime=672 INFO: [] webapp=/solr1 path=/select/ params={fl=*,scoreident=truestart=0q=(sedateness's)rows=2000} hits=136 status=0 QTime=923 INFO: [] webapp=/solr1 path=/select/ params={fl=*,scoreident=truestart=0q=(petrolatum)rows=2000} hits=8 status=0 QTime=6183 INFO: [] webapp=/solr1 path=/select/ params={fl=*,scoreident=truestart=0q=(everlasting's)rows=2000} hits=1522 status=0 QTime=2625 And finally I found a bug: https://issues.apache.org/jira/browse/SOLR-1524 https://issues.apache.org/jira/browse/SOLR-1524 Why is no activity on it? Its not actual? Today I wrote a bash script: #!/bin/bash ds=$(date +%s.%N) echo START: $ds ./data/east_2000 curl http://127.0.0.1:8080/solr1/select/?fl=*,scoreident=truestart=0q=(east)rows=2000 http://127.0.0.1:8080/solr1/select/?fl=*,scoreident=truestart=0q=%28east%29rows=2000-s -s-H 'Content-type:text/xml; charset=utf-8' ./data/east_2000 de=$(date +%s.%N) ddf=$(echo $de - $ds | bc) echo END: $de ./data/east_2000 echo DIFF: $ddf ./data/east_2000 Before runing a Tomcat I'd dropped cache: root@LSolr:~# echo 3 /proc/sys/vm/drop_caches Then I started Tomcat and run the script. Result is bellow: START: 1322476131.783146691 ?xml version=1.0 encoding=UTF-8? 
response lst name=responseHeaderint name=status0/intint name=QTime125/intlst name=paramsstr name=fl*,score/strstr name=identtrue/strstr name=start0/strstr name=q(east)/strstr name=rows2000/str/lst/lstresult name=response numFound=21439 start=0 maxScore=4.387605 ... /response END: 1322476180.262770244 DIFF: 48.479623553 File size is: root@LSolr:~# ls -l | grep east -rw-r--r-- 1 root root 1063579 Nov 28 12:29 east_2000 I'm using nmon to monitor a HDD activity. It was near 100% when I run the script. But when I tried to run it again the result was: DIFF: .063678709 and no much HDD activity at nmon. I can't undestand one thing: is this my huge hardware such as slow HDDor its a Solr troubles? And why is no activity on bug https://issues.apache.org/jira/browse/SOLR-1524 https://issues.apache.org/jira/browse/SOLR-1524 since 27/Oct/09 07:19? On 11/25/2011 10:02 AM, Dmitry Kan wrote: 45 000 000 per shard approx, Tomcat, caching was tweaked in solrconfig and shard given 12GB of RAM max. !-- Filter Cache Cache used by SolrIndexSearcher for filters (DocSets), unordered sets of *all* documents that match a query. When a new searcher is opened, its caches may be prepopulated or autowarmed using data from caches in the old searcher. autowarmCount is the number of items to prepopulate. For LRUCache, the autowarmed items will be
Fuzzy search with slop
Hi, Can I apply a fuzzy query and slop together, like q="hello world"~0.5~3? I am getting an error when applying it like this. I want to make both fuzzy search and slop work. How can I do this, can anybody help me? Thanks in Advance. Meghana
--
View this message in context: http://lucene.472066.n3.nabble.com/Fuzzy-search-with-slop-tp3542280p3542280.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: help no segment in my lucene index!!!
Which version of Solr/Lucene were you using when you hit power loss? There was a known bug that could allow power loss to cause corruption, but this was fixed in Lucene 3.4.0. Unfortunately, there is no easy way to recreate the segments_N file... in principle it should be possible and maybe not too much work but nobody has created such a tool yet, that I know of. Mike McCandless http://blog.mikemccandless.com On Mon, Nov 28, 2011 at 5:54 AM, Roberto Iannone roberto.iann...@gmail.com wrote: Hi all, after a power supply inperruption my lucene index (about 28 GB) looks like this: 18/11/2011 20:29 2.016.961.997 _3d.fdt 18/11/2011 20:29 1.816.004 _3d.fdx 18/11/2011 20:29 89 _3d.fnm 18/11/2011 20:30 197.323.436 _3d.frq 18/11/2011 20:30 1.816.004 _3d.nrm 18/11/2011 20:30 358.016.461 _3d.prx 18/11/2011 20:30 637.604 _3d.tii 18/11/2011 20:30 48.565.519 _3d.tis 18/11/2011 20:31 454.004 _3d.tvd 18/11/2011 20:31 1.695.380.935 _3d.tvf 18/11/2011 20:31 3.632.004 _3d.tvx 18/11/2011 23:33 2.048.500.822 _6g.fdt 18/11/2011 23:33 3.032.004 _6g.fdx 18/11/2011 23:33 89 _6g.fnm 18/11/2011 23:34 221.593.644 _6g.frq 18/11/2011 23:34 3.032.004 _6g.nrm 18/11/2011 23:34 350.136.996 _6g.prx 18/11/2011 23:34 683.668 _6g.tii 18/11/2011 23:34 52.224.328 _6g.tis 18/11/2011 23:36 758.004 _6g.tvd 18/11/2011 23:36 1.758.786.158 _6g.tvf 18/11/2011 23:36 6.064.004 _6g.tvx 19/11/2011 03:29 1.966.167.843 _9j.fdt 19/11/2011 03:29 3.832.004 _9j.fdx 19/11/2011 03:28 89 _9j.fnm 19/11/2011 03:30 222.733.606 _9j.frq 19/11/2011 03:30 3.832.004 _9j.nrm 19/11/2011 03:30 324.722.843 _9j.prx 19/11/2011 03:30 715.441 _9j.tii 19/11/2011 03:30 54.488.546 _9j.tis without any segment files! I tried to fix with CheckIndex utility in lucene, but I got the following message: ERROR: could not read any segments file in directory org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.a pache.lucene.store.MMapDirectory@E:\recover_me lockFactory=org.apache.lucene.sto re.NativeFSLockFactory@5d36d1d7: files: [_3d.fdt, _3d.fdx, _3d.fnm, _3d.frq, _3d .nrm, _3d.prx, _3d.tii, _3d.tis, _3d.tvd, _3d.tvf, _3d.tvx, _6g.fdt, _6g.fdx, _6 g.fnm, _6g.frq, _6g.nrm, _6g.prx, _6g.tii, _6g.tis, _6g.tvd, _6g.tvf, _6g.tvx, _ 9j.fdt, _9j.fdx, _9j.fnm, _9j.frq, _9j.nrm, _9j.prx, _9j.tii, _9j.tis, _9j.tvd, _9j.tvf, _9j.tvx, _cf.cfs, _cm.fdt, _cm.fdx, _cm.fnm, _cm.frq, _cm.nrm, _cm.prx, _cm.tii, _cm.tis, _cm.tvd, _cm.tvf, _cm.tvx, _ff.fdt, _ff.fdx, _ff.fnm, _ff.frq , _ff.nrm, _ff.prx, _ff.tii, _ff.tis, _ff.tvd, _ff.tvf, _ff.tvx, _ii.fdt, _ii.fd x, _ii.fnm, _ii.frq, _ii.nrm, _ii.prx, _ii.tii, _ii.tis, _ii.tvd, _ii.tvf, _ii.t vx, _lc.cfs, _ll.fdt, _ll.fdx, _ll.fnm, _ll.frq, _ll.nrm, _ll.prx, _ll.tii, _ll. tis, _ll.tvd, _ll.tvf, _ll.tvx, _lo.cfs, _lp.cfs, _lq.cfs, _lr.cfs, _ls.cfs, _lt .cfs, _lu.cfs, _lv.cfs, _lw.fdt, _lw.fdx, _lw.tvd, _lw.tvf, _lw.tvx, _m.fdt, _m. fdx, _m.fnm, _m.frq, _m.nrm, _m.prx, _m.tii, _m.tis, _m.tvd, _m.tvf, _m.tvx] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfo s.java:712) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfo s.java:593) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:327) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:995) There's a way to recover this index ? Cheers Rob
Re: highlighting on range query
and output is { responseHeader:{ status:0, QTime:4, params:{ hl.highlightMultiTerm:true, fl:lily.id,rangefld, indent:on, hl.useFastVectorHighlighter:false, q:rangefld:[5000 TO 6000], hl.fl:*,rangefld, I don't think hl.fl parameter accepts * value. Please try hl.fl=rangefld
Re: make fuzzy search for phrase
I am doing fuzzy search in my solr , its working good for signle term , but when searching for phrases i get either bulk of data or very less data. is there any good way for getting satisfactory amount of data with nice accuracy. 1) q:kenny zemanski : 9 recors 2) keny~0.7 zemansi~0.7 AND ken* : 22948 records. You can do it with https://issues.apache.org/jira/browse/SOLR-1604 q=keny~0.7 zemansi~0.7 AND ken*
Re: Fuzzy search with slop
Can I apply a fuzzy query and slop together, like q="hello world"~0.5~3? I am getting an error when applying it like this. I want to make both fuzzy search and slop work. How can I do this, can anybody help me?

It is possible with this plugin: https://issues.apache.org/jira/browse/SOLR-1604
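For anyone who cannot apply the SOLR-1604 patch directly, the underlying Lucene class is, I believe, ComplexPhraseQueryParser from the contrib modules, which allows fuzzy terms and slop inside a single quoted phrase. A hedged sketch, assuming Lucene 3.x on the classpath and an illustrative field and analyzer; the exact phrase syntax accepted may differ slightly between versions:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class ComplexPhraseExample {
    public static void main(String[] args) throws Exception {
        ComplexPhraseQueryParser parser = new ComplexPhraseQueryParser(
                Version.LUCENE_33, "text", new StandardAnalyzer(Version.LUCENE_33));
        // Fuzzy terms inside a phrase, with a slop of 3 on the whole phrase
        Query q = parser.parse("\"hello~0.5 world~0.5\"~3");
        System.out.println(q);
    }
}

The SOLR-1604 patch essentially wires this parser into Solr as a QParserPlugin so the same syntax can be used from a normal q parameter.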
how index words with their perfix in solr?
I use Solr 3.3. I want Solr to index words with their suffixes. When I index 'book' and 'books' and search for 'book', Solr shows any document that has 'book' or 'books', but when I index 'rain' and 'rainy' and search for 'rain', Solr only shows documents that have 'rain'. I want Solr to show any document that has 'rain' or 'rainy'. Please help me.
--
View this message in context: http://lucene.472066.n3.nabble.com/how-index-words-with-their-perfix-in-solr-tp3542300p3542300.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: highlighting on range query
Tried below url and got the same output. Any other suggestion . http://localhost:8983/solr/select?q=rangefld:[5000%20TO%206000]fl=lily.id,rangefldhl=onrows=5wt=jsonindent=onhl.fl=rangefldhl.highlightMultiTerm=truehl.usePhraseHighlighter=truehl.useFastVectorHighlighter=false On Mon, Nov 28, 2011 at 8:10 PM, Ahmet Arslan iori...@yahoo.com wrote: and output is { responseHeader:{ status:0, QTime:4, params:{ hl.highlightMultiTerm:true, fl:lily.id,rangefld, indent:on, hl.useFastVectorHighlighter:false, q:rangefld:[5000 TO 6000], hl.fl:*,rangefld, I don't think hl.fl parameter accepts * value. Please try hl.fl=rangefld -- Thanks Regards Rahul Mehta
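To rule out URL-construction problems, the same request can be assembled with SolrJ; a small sketch, assuming SolrJ 3.x and the field names from this thread (server URL is illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RangeHighlightQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("rangefld:[5000 TO 6000]");
        query.setFields("lily.id", "rangefld");
        query.setRows(5);
        query.setHighlight(true);
        query.addHighlightField("rangefld");
        // multi-term highlighting is what range/wildcard queries need
        query.set("hl.highlightMultiTerm", "true");
        query.set("hl.usePhraseHighlighter", "true");
        QueryResponse rsp = server.query(query);
        System.out.println(rsp.getHighlighting());
    }
}

Note that highlighting generally needs a stored, text-analyzed field to mark up; if rangefld is a purely numeric/trie field, the highlighter may simply have nothing to emphasize, which would also explain the empty highlighting entries above.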
Re: how index words with their perfix in solr?
It looks like you are using the plural stemmer, you might want to look into using the Porter stemmer instead: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming François On Nov 28, 2011, at 9:14 AM, mina wrote: I use solr 3.3,I want solr index words with their suffixes. when i index 'book' and 'books' and search 'book', solr show any document that has 'book' or 'books' but when I index 'rain' and 'rainy' and search 'rain', solr show any document that has 'rain' but i whant that solr show any document that has 'rain' or 'rainy'.help me. -- View this message in context: http://lucene.472066.n3.nabble.com/how-index-words-with-their-perfix-in-solr-tp3542300p3542300.html Sent from the Solr - User mailing list archive at Nabble.com.
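A quick way to see exactly what a stemmer does to your terms, before changing the schema, is to run them through the filter in a few lines of Lucene. A minimal sketch, assuming Lucene 3.3 on the classpath and whitespace-separated input; the sample words mirror the question:

import java.io.StringReader;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StemCheck {
    public static void main(String[] args) throws Exception {
        String text = "book books rain rainy";
        TokenStream ts = new PorterStemFilter(
                new LowerCaseFilter(Version.LUCENE_33,
                        new WhitespaceTokenizer(Version.LUCENE_33, new StringReader(text))));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString());   // the form that would be indexed
        }
        ts.close();
    }
}

Solr's analysis.jsp page does the same job interactively. If a stemmer does not conflate a pair you care about, an index-time synonym entry (for example, rainy => rain) is the usual fallback.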
Re: help no segment in my lucene index!!!
Hi Michael, thx for your help :) 2011/11/28 Michael McCandless luc...@mikemccandless.com Which version of Solr/Lucene were you using when you hit power loss? I'm using Lucene 3.4. There was a known bug that could allow power loss to cause corruption, but this was fixed in Lucene 3.4.0. Unfortunately, there is no easy way to recreate the segments_N file... in principle it should be possible and maybe not too much work but nobody has created such a tool yet, that I know of. some hints about how could I write this code by myself ? Cheers Rob
Re: turning off solr server verbosity
Hi Ahmet, thanks. Is this not then a jetty setting? I'll search for that. RR Ahmet Arslan wrote: I have not managed to figure out how to prevent verbose output of the solr server. I assume the verbosity on the server side slows down the response and it would be preferable to turn it off? If anyone knows how to achieve this, advice would be appreciated. Fuad reported such improvement gained by disabling info log level. Here is the original post : http://search-lucene.com/m/VBFAXnwp6x1
Re: Unable to index documents using DataImportHandler with MSSQL
Hah, I've just come on here to suggest you do the same thing! Thanks for getting back to me - and interesting we both came up with the same solution! Now I have the problem that running a delta update updates the 'dataimport.properties' file - but then just re-fetches all the data regardless! Weird! On Mon, Nov 28, 2011 at 11:59 AM, Husain, Yavar yhus...@firstam.com wrote: Hi Ian I downloaded and build latest Solr (3.4) from sources and finally hit following line of code in Solr (where I put my debug statement) : if(url != null){ LOG.info(Yavar: getting handle to driver manager:); c = DriverManager.getConnection(url, initProps); LOG.info(Yavar: got handle to driver manager:); } The call to Driver Manager was not returning. Here was the error!! The Driver we were using was Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called jTDS jDBC driver and installed that. Problem got fixed!!! So please follow the following steps: 1. Download jTDS jDBC driver from http://jtds.sourceforge.net/ 2. Put the driver jar file into your Solr/lib directory where you had put Microsoft JDBC driver. 3. In the data-config.xml use this statement: driver=net.sourceforge.jtds.jdbc.Driver 4. Also in data-config.xml mention url like this: url=jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX 5. Now run your indexing. It should solve the problem. Regards, Yavar -Original Message- From: Ian Grainger [mailto:i...@isfluent.com] Sent: Monday, November 28, 2011 4:11 PM To: Husain, Yavar Cc: solr-user@lucene.apache.org Subject: Re: Unable to index documents using DataImportHandler with MSSQL Right. This is REALLY weird - I've now started from scratch on another machine (this time Windows 7), and got _exactly_ the same problem !? On Mon, Nov 28, 2011 at 7:37 AM, Husain, Yavar yhus...@firstam.com wrote: Hi Ian I am having exactly the same problem what you are having on Win 7 and 2008 Server http://lucene.472066.n3.nabble.com/DIH-Strange-Problem-tc3530370.html I still have not received any replies which could solve my problem till now. Please do let me know if you have arrived at some solution for your problem. Thanks. Regards, Yavar -Original Message- From: Ian Grainger [mailto:i...@isfluent.com] Sent: Friday, November 25, 2011 10:59 PM To: solr-user@lucene.apache.org Subject: Re: Unable to index documents using DataImportHandler with MSSQL Update on this: I've established: * It's not a problem in the DB (I can index from this DB into a Solr instance on another server) * It's not Tomcat (I get the same problem in Jetty) * It's not the schema (I have simplified it to one field) That leaves SolrConfig.xml and data-config. Only thing changed in SolrConfig.xml is adding: lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-cell-\d.*\.jar / lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-clustering-\d.*\.jar / lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / requestHandler name=/dataimport class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configD:/Software/Solr/example/solr/conf/data-config.xml/str /lst /requestHandler And data-config.xml is pretty much as attached - except simpler. Any help or any advice on how to diagnose would be appreciated! On Fri, Nov 25, 2011 at 12:29 PM, Ian Grainger i...@isfluent.com wrote: Hi I have copied my Solr config from a working Windows server to a new one, and it can't seem to run an import. They're both using win server 2008 and SQL 2008R2. 
This is the data importer config dataConfig dataSource type=JdbcDataSource name=ds1 driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://localhost;databaseName=DB user=Solr password=pwd/ document name=datas entity name=data dataSource=ds1 pk=key query=EXEC SOLR_COMPANY_SEARCH_DATA deltaImportQuery=SELECT * FROM Company_Search_Data WHERE [key]='${dataimporter.delta.key}' deltaQuery=SELECT [key] FROM Company_Search_Data WHERE modify_dt '${dataimporter.last_index_time}' field column=WorkDesc_Comments name=WorkDesc_Comments_Split / field column=WorkDesc_Comments name=WorkDesc_Comments_Edge / /entity /document /dataConfig I can use MS SQL Profiler to watch the Solr user log in successfully, but then nothing. It doesn't seem to even try and execute the stored procedure. Any ideas why this would be working one server and not on another? FTR the only thing in the tomcat catalina log is: org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity data with URL: jdbc:sqlserver://localhost;databaseName=CATLive -- Ian i...@isfluent.com +44 (0)1223 257903
RE: DIH Strange Problem
Thanks Kai for sharing this. Ian encountered the same problem so marking him in the mail too. From: Kai Gülzau [kguel...@novomind.com] Sent: Monday, November 28, 2011 6:55 PM To: solr-user@lucene.apache.org Subject: RE: DIH Strange Problem Do you use Java 6 update 29? There is a known issue with the latest mssql driver: http://blogs.msdn.com/b/jdbcteam/archive/2011/11/07/supported-java-versions-november-2011.aspx In addition, there are known connection failure issues with Java 6 update 29, and the developer preview (non production) versions of Java 6 update 30 and Java 6 update 30 build 12. We are in contact with Java on these issues and we will update this blog once we have more information. Should work with update 28. Kai -Original Message- From: Husain, Yavar [mailto:yhus...@firstam.com] Sent: Monday, November 28, 2011 1:02 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem I figured out the solution and Microsoft and not Solr is the problem here :): I downloaded and build latest Solr (3.4) from sources and finally hit following line of code in Solr (where I put my debug statement) : if(url != null){ LOG.info(Yavar: getting handle to driver manager:); c = DriverManager.getConnection(url, initProps); LOG.info(Yavar: got handle to driver manager:); } The call to Driver Manager was not returning. Here was the error!! The Driver we were using was Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called jTDS jDBC driver and installed that. Problem got fixed!!! So please follow the following steps: 1. Download jTDS jDBC driver from http://jtds.sourceforge.net/ 2. Put the driver jar file into your Solr/lib directory where you had put Microsoft JDBC driver. 3. In the data-config.xml use this statement: driver=net.sourceforge.jtds.jdbc.Driver 4. Also in data-config.xml mention url like this: url=jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX 5. Now run your indexing. It should solve the problem. -Original Message- From: Husain, Yavar Sent: Thursday, November 24, 2011 12:38 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem Hi Thanks for your replies. I carried out these 2 steps (it did not solve my problem): 1. I tried setting responseBuffering to adaptive. Did not work. 2. For checking Database connection I wrote a simple java program to connect to database and fetch some results with the same driver that I use for solr. It worked. So it does not seem to be a problem with the connection. Now I am stuck where Tomcat log says: Creating a connection for entity . and does nothing, I mean after this log we usually get the getConnection() took x millisecond however I dont get that ,I can just see the time moving with no records getting fetched. Original Problem listed again: I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and all was working perfectly fine. However today when I started full indexing again, Solr halts/stucks at the line Creating a connection for entity. There are no further messages after that. I can see that DIH is busy and on the DIH console I can see A command is still running, I can also see total rows fetched = 0 and total request made to datasource = 1 and time is increasing however it is not doing anything. This is the exact configuration that worked for me. I am not really able to understand the problem here. Also in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file. ... 
data-config.xml: dataSource type=JdbcDataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders user=testUser password=password/ document . . Logs: INFO: Server startup in 2016 ms Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6] Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1322041133719 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity SampleText with URL:
Re: DIH Strange Problem
Aha! That sounds like it might be it! On Mon, Nov 28, 2011 at 4:16 PM, Husain, Yavar yhus...@firstam.com wrote: Thanks Kai for sharing this. Ian encountered the same problem so marking him in the mail too. From: Kai Gülzau [kguel...@novomind.com] Sent: Monday, November 28, 2011 6:55 PM To: solr-user@lucene.apache.org Subject: RE: DIH Strange Problem Do you use Java 6 update 29? There is a known issue with the latest mssql driver: http://blogs.msdn.com/b/jdbcteam/archive/2011/11/07/supported-java-versions-november-2011.aspx In addition, there are known connection failure issues with Java 6 update 29, and the developer preview (non production) versions of Java 6 update 30 and Java 6 update 30 build 12. We are in contact with Java on these issues and we will update this blog once we have more information. Should work with update 28. Kai -Original Message- From: Husain, Yavar [mailto:yhus...@firstam.com] Sent: Monday, November 28, 2011 1:02 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem I figured out the solution and Microsoft and not Solr is the problem here :): I downloaded and build latest Solr (3.4) from sources and finally hit following line of code in Solr (where I put my debug statement) : if(url != null){ LOG.info(Yavar: getting handle to driver manager:); c = DriverManager.getConnection(url, initProps); LOG.info(Yavar: got handle to driver manager:); } The call to Driver Manager was not returning. Here was the error!! The Driver we were using was Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called jTDS jDBC driver and installed that. Problem got fixed!!! So please follow the following steps: 1. Download jTDS jDBC driver from http://jtds.sourceforge.net/ 2. Put the driver jar file into your Solr/lib directory where you had put Microsoft JDBC driver. 3. In the data-config.xml use this statement: driver=net.sourceforge.jtds.jdbc.Driver 4. Also in data-config.xml mention url like this: url=jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX 5. Now run your indexing. It should solve the problem. -Original Message- From: Husain, Yavar Sent: Thursday, November 24, 2011 12:38 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem Hi Thanks for your replies. I carried out these 2 steps (it did not solve my problem): 1. I tried setting responseBuffering to adaptive. Did not work. 2. For checking Database connection I wrote a simple java program to connect to database and fetch some results with the same driver that I use for solr. It worked. So it does not seem to be a problem with the connection. Now I am stuck where Tomcat log says: Creating a connection for entity . and does nothing, I mean after this log we usually get the getConnection() took x millisecond however I dont get that ,I can just see the time moving with no records getting fetched. Original Problem listed again: I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and all was working perfectly fine. However today when I started full indexing again, Solr halts/stucks at the line Creating a connection for entity. There are no further messages after that. I can see that DIH is busy and on the DIH console I can see A command is still running, I can also see total rows fetched = 0 and total request made to datasource = 1 and time is increasing however it is not doing anything. This is the exact configuration that worked for me. I am not really able to understand the problem here. 
Also in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file.
...
data-config.xml:

    <dataSource type="JdbcDataSource"
                driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders"
                user="testUser" password="password"/>
    <document>
    .
    .

Logs:

    INFO: Server startup in 2016 ms
    Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
    INFO: Starting Full Import
    Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute
    INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11
    Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
    INFO: Read dataimport.properties
    Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
    INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
    Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit
    INFO: SolrDeletionPolicy.onInit: commits:num=1
    commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6]
    Nov 23, 2011 4:11:27 PM
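Putting steps 3 and 4 from the reply above together, the switched dataSource element would look roughly like the sketch below. The server address, database name and credentials are placeholders rather than values from the thread:

    <dataSource type="JdbcDataSource"
                driver="net.sourceforge.jtds.jdbc.Driver"
                url="jdbc:jtds:sqlserver://localhost:1433;databaseName=SampleOrders"
                user="testUser" password="password"/>

The rest of the document/entity definitions in data-config.xml stay unchanged; only the driver class and the URL scheme differ from the Microsoft driver configuration shown above.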
Re: Tidying files after optimize. Is a service restart mandatory?
On 11/28/2011 3:26 AM, Jones, Graham wrote:

Hello. Brief question: How can I clean up excess files after performing an optimize, without restarting the Tomcat service? Detail follows: I've been running several SOLR cores for approx 12 months and have recently noticed the disk usage of one of them is growing considerably faster than the rate at which documents are being added.
- 1,200,000 docs 12 months ago used a 45 GB index
- 1,700,000 docs today use an 87 GB index
- There may have been _some_ deletions, almost certainly < 100,000
- The documents are of a broadly uniform style, approx 1000 words
So, approximately 45% growth in documents has grown the disk usage by approx 100%. I took a server out of production (I've 1 master, 7 slaves) and did the following. I ran http://server/corename/update?stream.body=<optimize/> on this core, which added 49.4 GB to the index folder. No previously existing files were deleted. I restarted the Tomcat service. ONLY the files generated by the optimize remained; all older files were deleted. This is the result I want, but not quite the method I'd prefer. How can I get to this position without restarting the service?

Based on this description, it seems likely that you are running Solr on Windows. On Windows, if you have a file open for any reason (even just reading) it's not possible to delete that file. Solr keeps the old index files open to serve queries until the new index is fully committed and ready to take over, which can often be quite a while in software terms.

On Unix/Linux, deleting a file just removes the link to that file in the filesystem directory. When the last link is gone, the space is reclaimed. When a program opens a file, the OS creates an internal link to that file. If you delete the file while it's still open, it is still there, but only accessible via the internal link. This is what happens during an optimize - the files are removed from the directory, but part of Solr still has them open until the newly created index is completely online and all queries against the old one are complete. Once they are closed, the OS reclaims the space. I'm fairly sure that there is little communication between the processes that serve queries and the processes that update and merge the index. I've checked previous messages on this.

If you can arrange to run the optimize a second time before any documents are added or deleted, it will complete instantaneously and the extra files will be deleted. If the index is changed at all between the two optimizes, it won't really help, as you'll have a new set of old files that won't get deleted.

I am not in a position to test it, but it's possible that issuing a RELOAD command to the CoreAdmin might also take care of deleting the old files. I'm pretty sure that such an action is potentially disruptive, but in my experience the index is back online within a second or two, much, much faster than a full restart. http://wiki.apache.org/solr/CoreAdmin#RELOAD

This has been a known problem for quite a while, but I do not believe that it is a major priority for most Solr users. Most people I've seen posting to this list do not run on Windows. I found the following bug filed on Solr: https://issues.apache.org/jira/browse/SOLR-1691

Thanks, Shawn
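For reference, the RELOAD that Shawn mentions is just an HTTP call to the CoreAdmin handler. Assuming the default admin path and a core actually named "corename" (both placeholders here), the request would look something like:

    curl "http://server:8080/solr/admin/cores?action=RELOAD&core=corename"

Whether this releases the old files on Windows is, as Shawn says, untested; this only shows the shape of the request described on the wiki page he links.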
Re: turning off solr server verbosity
Thanks. Is this not then a Jetty setting? I'll search for that.

I don't use Jetty, but there is a logging section here: http://wiki.apache.org/solr/SolrJetty
PatternTokenizer failure
Hi all, I'm trying to use PatternTokenizer and not getting the expected results. Not sure where the failure lies. What I'm trying to do is split my input on whitespace except in cases where the whitespace is preceded by a hyphen character. To do this I'm using a negative lookbehind assertion in the pattern, e.g. (?<!-)\s+.

Expected behavior:
foo bar - [foo,bar] - OK
foo \n bar - [foo,bar] - OK
foo- bar - [foo- bar] - OK
foo-\nbar - [foo-\nbar] - OK
foo- \n bar - [foo- \n bar] - FAILS

Here's a test case that demonstrates the failure:

    public void testPattern() throws Exception {
      Map<String,String> args = new HashMap<String,String>();
      args.put( PatternTokenizerFactory.GROUP, "-1" );
      args.put( PatternTokenizerFactory.PATTERN, "(?<!-)\\s+" );
      Reader reader = new StringReader("blah \n foo bar- baz\nfoo-\nbar- baz foo- \n bar");
      PatternTokenizerFactory tokFactory = new PatternTokenizerFactory();
      tokFactory.init( args );
      TokenStream stream = tokFactory.create( reader );
      assertTokenStreamContents(stream,
          new String[] { "blah", "foo", "bar- baz", "foo-\nbar- baz", "foo- \n bar" });
    }

This fails with the following output:

    org.junit.ComparisonFailure: term 4 expected:<foo- [\n bar]> but was:<foo- []>

Am I doing something wrong? Incorrect expectations? Or could this be a bug? Thanks, --jay
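For completeness, wiring that same pattern into a schema.xml field type would look roughly like the sketch below (the type name is made up). Note that inside an XML attribute the < of the lookbehind has to be escaped as &lt;:

    <fieldType name="text_hyphen_aware" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- split on whitespace unless it is immediately preceded by a hyphen -->
        <tokenizer class="solr.PatternTokenizerFactory" pattern="(?&lt;!-)\s+" group="-1"/>
      </analyzer>
    </fieldType>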
DirectSolrSpellChecker on request specified field.
Hi, can the DirectSolrSpellChecker be used for autosuggest but defer to request time the name of the field used to build the dictionary? That way I don't have to define spellcheckers specific to each field, which for me is not really possible as the fields I wish to spellcheck are dynamic fields. I could copy all dynamic fields into a single 'spellcheck' field, but then I could get false suggestions if I use it to get suggestions for a particular dynamic field and a returned term derives from a different field. Phil
Re: [newbie] solrj SolrQuery indent response
I'm not sure what you're really after here. Indent how? The indent parameter is there to make the reply readable; it really has nothing to do with printing the query. Could you show an example of what you want for output?

Best
Erick

On Mon, Nov 28, 2011 at 8:42 AM, halil halil.a...@gmail.com wrote:

I got one step further, but still no indent. I wrote the code segment below:

    query.setQuery( "marka_s:atak*" )
         .setFacet(true)
         .setParam("indent", "on");

and here is the resulting query string: q=marka_s%3Aatak*&facet=true&indent=on

-halil agin.

On Mon, Nov 28, 2011 at 3:07 PM, halil halil.a...@gmail.com wrote:

Hi List, I am new to the Solr and Lucene world. I have a simple question. I wrote the code segment below and it works.

    public class SolrjTest {
      public static void main(String[] args) throws MalformedURLException, SolrServerException {
        ClassPathXmlApplicationContext c =
            new ClassPathXmlApplicationContext("/patrades-search-solrj-test-beans.xml");
        SolrServer server = (SolrServer) c.getBean("solrServer");
        SolrQuery query = new SolrQuery();
        query.setQuery( "*:*" )
             .setFacet(true);
        QueryResponse rsp = server.query( query );
        System.err.println(rsp.toString());
      }
    }

I want to indent the response string, but couldn't find any answer. I looked at the book, the mail archive and Google. The most relevant link is below: http://wiki.apache.org/solr/SimpleFacetParameters but I don't know how to use it. The API is hard to read, by the way.

regards,
-Halil AĞIN
Re: how to apply fuzzy search with slop
Interestingly, Ahmet Arslan just answered a virtually identical question: "It is possible with this plugin. https://issues.apache.org/jira/browse/SOLR-1604"

Best
Erick

On Mon, Nov 28, 2011 at 9:09 AM, vrpar...@gmail.com vrpar...@gmail.com wrote:

Hello all, I want to search on a phrase with fuzzy, e.g. q=word1 word2~ and also want to apply slop to both words in the phrase, q=(word1 word2~)~2, but that doesn't work. How can I do this?

Thanks, Vishal Parekh

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-apply-fuzzy-search-with-slop-tp3542286p3542286.html
Sent from the Solr - User mailing list archive at Nabble.com.
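If that plugin (the ComplexPhraseQueryParser from SOLR-1604) is installed and registered in solrconfig.xml under a name such as complexphrase - an assumption, since the exact registration depends on how the patch is applied - then, as far as I understand the parser, the kind of query asked about would be written along these lines:

    q={!complexphrase}"word1~ word2~"~2

i.e. fuzzy terms inside the quoted phrase, with the phrase slop appended after the closing quote. Treat this as a sketch to verify against the JIRA issue rather than a tested example.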
Re: DirectSolrSpellChecker on request specified field.
technically it could? I'm just not sure if the current spellchecking apis allow for it? But maybe someone has a good idea on how to easily expose this. I think its a good idea. Care to open a JIRA issue? On Mon, Nov 28, 2011 at 1:31 PM, Phil Hoy p...@friendsreunited.co.uk wrote: Hi, Can the DirectSolrSpellChecker be used for autosuggest but defer to request time the name of the field to use to create the dictionary. That way I don't have to define spellcheckers specific to each field which for me is not really possible as the fields I wish to spell check are DynamicFields. I could copy all dynamic fields into a 'spellcheck' field but then I could get false suggestions if I use it to get suggestions for a particular dynamic field where a term returned derives from a different field. Phil -- lucidimagination.com
Re: Faceting is not Using Field Value Cache . . ?
: To Erick's Point: Can you be more specific than 'certain circumstances'?
:
: Can anyone provide an example of when fieldValueCache would be used?

Either FC or FVC is used most of the time -- which one is used depends on whether the field is multivalued or not, and on whether it's tokenized or not: i.e. max 1 term per doc == FC, else FVC. The "most of the time" depends on facet.method...

https://wiki.apache.org/solr/SimpleFacetParameters#facet.method

...if the enum method is used, then the filterCache is used. Yonik discussed a lot of these subtleties in his facet talk @ EuroCon...

http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/many-facets-apache-solr

: Christopher
:
: On 2:59 PM, Erick Erickson wrote:
: In addition to Samuel's comment, the filterCache is also used under
: certain circumstances
:
: Best
: Erick
:
: 2011/11/22 Samuel García Martínez samuelgmarti...@gmail.com:
: AFAIK, FieldValueCache is only used for faceting on tokenized fields.
: Maybe you are getting confused with FieldCache (
: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/FieldCache.html)?
: That is used for common facets (facet.method=fc on non-tokenized fields).
: Does this make any sense for you?
:
: On Tue, Nov 22, 2011 at 7:21 PM, CRB sub.scripti...@metaheuristica.com wrote:
:
: Seeing something odd going on with faceting . . . we execute facets with
: every query and yet the fieldValueCache is not being used:
:
: name: fieldValueCache
: class: org.apache.solr.search.FastLRUCache
: version: 1.0
: description: Concurrent LRU Cache(maxSize=10000, initialSize=10,
: minSize=9000, acceptableSize=9500, cleanupThread=false)
: stats: lookups : 0
: hits : 0
: hitratio : 0.00
: inserts : 0
: evictions : 0
: size : 0
: warmupTime : 0
: cumulative_lookups : 0
: cumulative_hits : 0
: cumulative_hitratio : 0.00
: cumulative_inserts : 0
: cumulative_evictions : 0
:
: I was under the impression the fieldValueCache was an implicit cache (if
: you don't define it, it will still exist).
:
: We are running Solr v3.3 (and NOT using {!cache=false}).
:
: Thoughts?
:
: --
: Un saludo,
: Samuel García.

-Hoss
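To make the distinction above concrete, the method can be forced per request (or per field with the f.<fieldname>. prefix); the field name here is only an example:

    # term counts come from the un-inverted field (FieldCache / fieldValueCache)
    .../select?q=*:*&facet=true&facet.field=category&facet.method=fc

    # term counts come from one filterCache entry per term
    .../select?q=*:*&facet=true&facet.field=category&facet.method=enum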
RE: DirectSolrSpellChecker on request specified field.
Added issue: https://issues.apache.org/jira/browse/SOLR-2926 Please let me know if more information needs adding to the JIRA issue.

Phil

-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: 28 November 2011 19:32
To: solr-user@lucene.apache.org
Subject: Re: DirectSolrSpellChecker on request specified field.

technically it could? I'm just not sure if the current spellchecking apis allow for it? But maybe someone has a good idea on how to easily expose this. I think its a good idea. Care to open a JIRA issue?

On Mon, Nov 28, 2011 at 1:31 PM, Phil Hoy p...@friendsreunited.co.uk wrote: Hi, Can the DirectSolrSpellChecker be used for autosuggest but defer to request time the name of the field to use to create the dictionary. That way I don't have to define spellcheckers specific to each field which for me is not really possible as the fields I wish to spell check are DynamicFields. I could copy all dynamic fields into a 'spellcheck' field but then I could get false suggestions if I use it to get suggestions for a particular dynamic field where a term returned derives from a different field. Phil

--
lucidimagination.com
Re: DirectSolrSpellChecker on request specified field.
On Mon, Nov 28, 2011 at 4:36 PM, Phil Hoy p...@friendsreunited.co.uk wrote: Added issue: https://issues.apache.org/jira/browse/SOLR-2926 Please let me know if more information needs adding to JIRA. Phil Thanks, I'll followup on the issue -- lucidimagination.com
Re: Huge Performance: Solr distributed search
Problem has been resolved. My disk subsystem had been the bottleneck for quick searches. I put my indexes in RAM and I now see very nice QTimes :) Sorry for your time, guys.

On Mon, Nov 28, 2011 at 4:02 PM, Artem Lokotosh arco...@gmail.com wrote:

Hi all again. Thanks to all for your replies. This weekend I made some interesting tests, and I would like to share them with you. First of all I made a speed test of my hdd:

    root@LSolr:~# hdparm -t /dev/sda9
    /dev/sda9:
    Timing buffered disk reads: 146 MB in 3.01 seconds = 48.54 MB/sec

Then with iperf I tested my network:

    [ 4] 0.0-18.7 sec 2.00 GBytes 917 Mbits/sec

Then I tried to post my queries using the shards parameter with one shard, so my queries were like:

    http://localhost:8080/solr1/select/?q=(test)&qt=requestShards

where requestShards is:

    <requestHandler name="requestShards" class="solr.SearchHandler" default="false">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">10</int>
        <str name="shards">127.0.0.1:8080/solr1</str>
      </lst>
    </requestHandler>

Maybe it's not correct, but:

    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(genuflections)&qt=requestShards&rows=2000} status=0 QTime=6525
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(tunefulness)&qt=requestShards&rows=2000} status=0 QTime=20170
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(societal)&qt=requestShards&rows=2000} status=0 QTime=44958
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(euchre's)&qt=requestShards&rows=2000} status=0 QTime=32161
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(monogram's)&qt=requestShards&rows=2000} status=0 QTime=85252

When I posted similar queries directly to solr1 without requestShards I had:

    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(reopening)&rows=2000} hits=712 status=0 QTime=10
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(housemothers)&rows=2000} hits=0 status=0 QTime=446
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(harpooners)&rows=2000} hits=76 status=0 QTime=399
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(coaxing)&rows=2000} hits=562 status=0 QTime=2820
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(superstar's)&rows=2000} hits=4748 status=0 QTime=672
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(sedateness's)&rows=2000} hits=136 status=0 QTime=923
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(petrolatum)&rows=2000} hits=8 status=0 QTime=6183
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(everlasting's)&rows=2000} hits=1522 status=0 QTime=2625

And finally I found a bug: https://issues.apache.org/jira/browse/SOLR-1524 Why is there no activity on it? Is it no longer relevant?
Today I wrote a bash script:

    #!/bin/bash
    ds=$(date +%s.%N)
    echo "START: $ds" >> ./data/east_2000
    curl "http://127.0.0.1:8080/solr1/select/?fl=*,score&ident=true&start=0&q=(east)&rows=2000" -s -H 'Content-type:text/xml; charset=utf-8' >> ./data/east_2000
    de=$(date +%s.%N)
    ddf=$(echo "$de - $ds" | bc)
    echo "END: $de" >> ./data/east_2000
    echo "DIFF: $ddf" >> ./data/east_2000

Before starting Tomcat I dropped the OS cache:

    root@LSolr:~# echo 3 > /proc/sys/vm/drop_caches

Then I started Tomcat and ran the script. The result is below:

    START: 1322476131.783146691
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader"><int name="status">0</int><int name="QTime">125</int><lst name="params"><str name="fl">*,score</str><str name="ident">true</str><str name="start">0</str><str name="q">(east)</str><str name="rows">2000</str></lst></lst><result name="response" numFound="21439" start="0" maxScore="4.387605"> ...
    </response>
    END: 1322476180.262770244
    DIFF: 48.479623553

The file size is:

    root@LSolr:~# ls -l | grep east
    -rw-r--r-- 1 root root 1063579 Nov 28 12:29 east_2000

I'm using nmon to monitor HDD activity. It was near 100% when I ran the script. But when I tried to run it again, the result was:

    DIFF: .063678709

and not much HDD activity in nmon. I can't understand one thing: is this my hardware, i.e. a slow HDD, or is it a Solr problem? And why has there been no activity on bug https://issues.apache.org/jira/browse/SOLR-1524 since 27/Oct/09 07:19?

On 11/25/2011 10:02 AM, Dmitry Kan wrote: 45 000 000 per shard approx, Tomcat, caching was tweaked in solrconfig and shard given 12GB of RAM max. <!-- Filter Cache Cache used by SolrIndexSearcher for filters (DocSets),
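For anyone who wants to reproduce the "indexes in RAM" step from the resolution above, one common way on Linux is a tmpfs mount. The mount point, size and paths below are illustrative only, and because tmpfs is volatile the copy has to be redone after every reboot:

    mkdir -p /mnt/solr-ram
    mount -t tmpfs -o size=40g tmpfs /mnt/solr-ram
    cp -a /path/to/solr/data/index /mnt/solr-ram/
    # then point <dataDir> in solrconfig.xml at /mnt/solr-ram and restart or reload the core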
Re: help no segment in my lucene index!!!
On Mon, Nov 28, 2011 at 10:49 AM, Roberto Iannone iann...@crmpa.unisa.it wrote:

Hi Michael, thx for your help :)

You're welcome!

2011/11/28 Michael McCandless luc...@mikemccandless.com
Which version of Solr/Lucene were you using when you hit the power loss?

I'm using Lucene 3.4.

Hmm, which OS/filesystem? Unexpected power loss (or OS crash, or JVM crash) in 3.4.0 should not cause corruption, as long as the IO system properly implements fsync. There was a known bug that could allow power loss to cause corruption, but this was fixed in Lucene 3.4.0.

Unfortunately, there is no easy way to recreate the segments_N file... in principle it should be possible, and maybe not too much work, but nobody has created such a tool yet, that I know of.

some hints about how could I write this code by myself ?

Well, you'd need to take a listing of all files, aggregate those into unique segment names, open a SegmentReader on each segment name, and from that SegmentReader reconstruct what you can (numDocs, delCount, isCompoundFile, etc.) about each SegmentInfo. Add all the resulting SegmentInfo instances into a new SegmentInfos and write it to the directory.

Was the index newly created in 3.4.x? If not (if you inherited segments from earlier Lucene versions) you might also have to reconstruct shared doc stores (stored fields, term vectors) files, which will be trickier...

Mike
Re: conditionally update document on unique id
I wanted something similar for a file crawler/uploader in C#, but I don't even want to upload the document if it already exists... I'm currently querying Solr first... Is this optimal, silly, or otherwise?

    var url = "http://solr/select?q=myid.doc&rows=0";
    var txt = webclient.DownloadString(url);
    if (txt.Contains("numFound=\"0\""))
    {
        //upload the file
    }

--
View this message in context: http://lucene.472066.n3.nabble.com/conditionally-update-document-on-unique-id-tp3119302p3543866.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: conditionally update document on unique id
oops... the query looks more like this:

    http://solr/select?q=id:myid.doc&rows=0

--
View this message in context: http://lucene.472066.n3.nabble.com/conditionally-update-document-on-unique-id-tp3119302p3543871.html
Sent from the Solr - User mailing list archive at Nabble.com.
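For anyone doing the same existence check from SolrJ instead of C#, a minimal sketch of the same idea - query on the unique key with rows=0 and test numFound - might look like the following. The server URL and the id value are placeholders, and this is an illustration rather than a recommendation over the C# approach:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ExistsCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder URL -- adjust to your Solr instance.
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery q = new SolrQuery("id:\"myid.doc\"");
            q.setRows(0); // we only need the count, not the stored documents

            QueryResponse rsp = server.query(q);
            if (rsp.getResults().getNumFound() == 0) {
                // document is not in the index yet -- upload it here
            }
        }
    }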
Re: Index a null text field
: I am indexing a table that has a field by the name of solr_keywords of type
: text in mysql. And it contains null values also. While creating the index in
: solr, this field is not getting indexed.

What exactly is the problem you are seeing?

If your documents are being indexed without error, but documents with a null in the solr_keywords database field are not getting any (stored or indexed) values in the resulting Solr index, then it sounds like everything is working properly. There is no concept of a "null" value in a Solr index. Documents either have a field value or they do not -- if you want to index the string "null" (or any other special string for that matter) when a document has no value for a field, then there are a few different ways to do that. The simplest in your case would probably be adding a "default" property on the field in your schema, or using something like the COALESCE function in your SQL.

-Hoss
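As a sketch of those two options side by side (the field name comes from the question; the table name and the choice of the literal string "null" are only illustrations):

    <!-- schema.xml: store the literal string "null" whenever a document supplies no value -->
    <field name="solr_keywords" type="text" indexed="true" stored="true" default="null"/>

    <!-- data-config.xml: or do the substitution in SQL instead -->
    <entity name="data" query="SELECT id, COALESCE(solr_keywords, 'null') AS solr_keywords FROM my_table"/>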
Incomplete logging on local machine
I'm stumped. For some reason on my local set up, Solr is not logging all that it should. None of the searches, updates, errors are logged at all. I just did a fresh install of Tomcat 7, Solr 3.5 and it's all the same. No logging. The *only* thing I change to the default configuration is the location of my Solr Home in web.xml When I go into solr/admin/logging, I can see the problem. Only three solr logging options are available. org.apache.solr org.apache.solr.servlet org.apache.solr.servlet.LogLevelSelection That's all. And they're all set to INFO. If I compare that to the production server, I can see there that there's several dozen other Solr logging categories. Why would these not be available to me on my local machine? I've combed the internet for anything, and no-one else seems to have this issue :( I'm running on a Mac (10.5.8), Tomcat 7, Solr 3.5 -- View this message in context: http://lucene.472066.n3.nabble.com/Incomplete-logging-on-local-machine-tp3543960p3543960.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: UpdateRequestProcessor - processCommit
: I'm assuming the processCommit method is called for each
: UpdateRequestProcessor chain class when the records are being committed to
: the Lucene index.

Not exactly. RequestHandlers that want to modify the index do so by asking the SolrCore for a processor chain (either by name, or just getting the default), and then they execute methods on that chain, passing in instances of UpdateCommand objects that model the type of index update they want to perform. The first element in the chain decides if/when to pass the UpdateCommand on to subsequent members of the chain, and most processor chains include a RunUpdateProcessorFactory instance, which is responsible for actually performing the update on the UpdateHandler used by the SolrCore.

Which means:

1) there is no guarantee that processCommit is called on every UpdateRequestProcessor in the chain -- that's entirely dependent on what comes before RunUpdateProcessorFactory in the chain.

2) RunUpdateProcessorFactory itself is what tells the underlying IndexWriter to commit (ie: processCommit is not a callback method invoked after the underlying commit happens -- you may be confusing UpdateRequestProcessor with the SolrEventListener API)

: I'm debugging the processor chain using the debug functionality in the
: dataimport.jsp page, and I have selected verbose and commit as options.
: When I import 10 records,
: the processAdd methods are getting called, but the processCommit methods
: aren't.
...
: I'm using SOLR 1.4

Hmmm... I can confirm the behavior you describe in Solr 1.4.1, but using Solr 3.5.0 I can see that the processCommit method is definitely getting called by DIH when using the "Debug Now" button of the DIH console with the commit checkbox checked (FWIW: I tested using the 'rss' core of example-DIH/solr and watching the logs for the log messages from LogUpdateProcessorFactory)

So please consider upgrading -- besides this evident fix, there have been a *TON* of other bug fixes and other improvements between Solr 1.4 and Solr 3.5.

-Hoss
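For readers who have not set one up, a chain along these lines in solrconfig.xml shows where RunUpdateProcessorFactory sits; the chain name is made up, and any custom processors would go before the log/run pair:

    <updateRequestProcessorChain name="mychain" default="true">
      <!-- custom UpdateRequestProcessorFactory instances would go here -->
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

Whether processCommit reaches a given processor depends, as described above, on everything earlier in the chain passing the command along.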
Re: solrQueryParser defaultOperator
: are you using either dismax or edismax? They don't respect
: the defaultOperator. Use the mm param to get this kind
: of behavior.

FWIW: that has not been true since Solr 3.1 ... mm's default value is now based on q.op (which gets its default from defaultOperator in the schema.xml)

But Erick's point is still valid: we need all the details of the request you are executing, and what the request handler config looks like, and what the debugQuery output for that request looks like, etc. before we can make a guess as to why you are getting the results you are getting.

-Hoss
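For anyone following the mm route Erick suggests, a request handler carrying it as a default might look like this sketch; the handler name and qf fields are placeholders:

    <requestHandler name="/search" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">title text</str>
        <!-- require all query terms to match, i.e. AND-like behaviour -->
        <str name="mm">100%</str>
      </lst>
    </requestHandler>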
Re: make fuzzy search for phrase
This seems to be the solution to my problem... I'll definitely try this. Thanks for your reply.

Meghana

--
View this message in context: http://lucene.472066.n3.nabble.com/make-fuzzy-search-for-phrase-tp3542079p3544239.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: how index words with their perfix in solr?
Thank you for your answer. I read it, and I use this filter in my schema.xml in Solr:

    <filter class="solr.PorterStemFilterFactory"/>

but this filter doesn't handle all words with their suffixes and prefixes. This means that when I search for 'rain', Solr doesn't show me any document that has 'rainy'.

--
View this message in context: http://lucene.472066.n3.nabble.com/how-index-words-with-their-perfix-in-solr-tp3542300p3544319.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Returning and faceting on some of the field's values
Well, here's something that might just work. Using the Solr 3.4+ facet.prefix parameter, as well as prefixing the values of the particular field I want to facet on with the node neighbor ID, I get what I need. Adding the field:

    <field name="n_directionalityFacet" type="string" indexed="true" stored="false" multiValued="true" omitNorms="true"/>

Then, for each value, I prefix it with "{nodeId}-". For example, using the focus node ID of ING:afa, I can get as a result document set all of the neighbors of that node ID. Then, I also tell Solr to facet using that same focus node ID prefix:

    http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type%3Anode&fq=n_neighborof_id%3AING\:afa&rows=0&facet=true&facet.mincount=1&facet.field=n_directionalityFacet&f.n_directionalityFacet.facet.prefix=ING%3Aafa

And, for that particular facet, I get only the values and counts relevant to the focus node ID:

    <lst name="facet_fields">
      <lst name="n_directionalityFacet">
        <int name="ING:afa-D">82</int>
        <int name="ING:afa-B">2</int>
        <int name="ING:afa-A">1</int>
        <int name="ING:afa-U">1</int>
      </lst>
    </lst>

My app can then take this response and remove the prefix before returning the values and counts to the client. It may inflate the size of the index some, but it sure beats my alternative proposals...

Cheers, Jeff

On Nov 26, 2011, at 1:22 PM, Jeff Schmidt wrote:

Hello: I'm still not finding much joy with this issue. For one, it looks like FacetComponent (via SimpleFacets.getFieldCacheCounts()) goes directly to the Lucene FieldCache (non-enum, multi-valued field, single string token) in order to get terms to count. So, even if it were possible for me to somehow modify the ResponseBuilder in between the QueryComponent and FacetComponent, that won't do much good. I'd rather not modify Solr/Lucene code and have a custom build (though that's not impossible in the short term), but QueryComponent does not provide sufficient access. I suppose I could further investigate going the RequestHandler route. But, let me know if this is crazy talk:

From what I can tell in org.apache.solr.request.SimpleFacets, line 366 (sorry, no SCM info in the source file, but it is from the 3.4.0 source distribution):

    FieldCache.StringIndex si = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName);
    final String[] terms = si.lookup;
    final int[] termNum = si.order;

SimpleFacets.getFieldCacheCounts() uses the response from the Lucene FieldCache to do its work. My thought is to use AspectJ to place "after" advice on the Lucene method (org.apache.lucene.search.FieldCacheImpl), to modify the response. I don't want to muck with the field cache itself. After all, the field values I don't want to count for this focusNodeId I may well want for another. Given the FieldCacheImpl method:

    // inherit javadocs
    public StringIndex getStringIndex(IndexReader reader, String field) throws IOException {
      return (StringIndex) caches.get(StringIndex.class).get(reader, new Entry(field, (Parser)null));
    }

it seems I could take the returned StringIndex instance and create a new filtered one, leaving the cached original intact. StringIndex (defined in FieldCache) is a public static class with a public constructor. Then SimpleFacets will facet on what I provided it. The other trick is to inform my aspect within Lucene just what the focusNodeId is, so it knows how to filter. This is request specific. I'm running Solr within Tomcat. I've not looked exhaustively into how Solr threading works.
But, if the current app server request thread is used synchronously to satisfy any given SolrJ request, then I could provide a SearchComponent that looked for some special parameter that indicates the focusNodeId of interest, and then place it in a ThreadLocal which the interceptor could pick up. If the ThreadLocal is not defined, then the interceptor does not filter (a definite scenario) and returns Lucene's StringIndex instance. If there is another thread involved in handling the request, then more investigation is needed. Any inside information would be appreciated. Or, firmly stated I should not go there would also be appreciated. :) Cheers, Jeff On Nov 21, 2011, at 4:31 PM, Jeff Schmidt wrote: Hello: Solr version: 3.4.0 I'm trying to figure out if it's possible to both return (retrieval) as well as facet on certain values of a multivalued field. The scenario is a life science app comprised of a graph of nodes (genes, chemicals etc.) and each node has a neighborhood consisting of one or more nodes with which it has a relationships defined as processes (inhibition, phosphorylation etc.). What I've done is add a number of multi-valued fields to each node consisting of the neighbor node ID (neighbor's document ID), process, and couple of other related items. For a given node, it'll have multiple neighbors, as well as multiple
Seek past EOF
Hi all

After upgrading to Solr 3.4 we are having trouble with replication. The setup is one indexing master with a few slaves that replicate the indexes once every night. The largest index is 20 GB and the master and slaves are in the same DMZ. Almost every night one of the indexes (17 in total) fails after the replication with a "seek past EOF" error.

    SEVERE: Error during auto-warming of key:org.apache.solr.search.QueryResultKey@bda006e3:java.io.IOException: seek past EOF
    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.seek(MMapDirectory.java:347)
    at org.apache.lucene.index.SegmentTermEnum.seek(SegmentTermEnum.java:114)
    at org.apache.lucene.index.TermInfosReader.seekEnum(TermInfosReader.java:203)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:273)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:210)
    at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:507)
    at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:309)
    at org.apache.lucene.search.TermQuery$TermWeight$1.add(TermQuery.java:56)
    at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:77)
    at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:82)

After a restart the errors are gone. Has anyone else seen this?

Thanks
Ruben Chadien
Re: [newbie] solrj SolrQuery indent response
Yes, you are right. I was trying to indent the Solr JSON response. Actually, the Solr JSON response is not exactly JSON; I couldn't understand the output format. But I found a solution to the problem, i.e. producing JSON and indenting the Solr result set. Here is the code segment:

    public class SolrjTest {
      public static void main(String[] args) throws Exception {
        ClassPathXmlApplicationContext c =
            new ClassPathXmlApplicationContext("/patrades-search-solrj-test-beans.xml");
        SolrServer server = (SolrServer) c.getBean("solrServer");
        SolrQuery query = new SolrQuery();
        query.setQuery( "marka_s:atak*" );
        System.err.println(query.toString());
        QueryResponse rsp = server.query( query );
        List<PatradesSolrBean> beans = rsp.getBeans(PatradesSolrBean.class);
        ObjectMapper om = new ObjectMapper();
        String s = om.defaultPrettyPrintingWriter().writeValueAsString(beans);
        System.err.println(s);
      }
    }

On Mon, Nov 28, 2011 at 9:10 PM, Erick Erickson erickerick...@gmail.com wrote: I'm not sure what you're really after here. Indent how? The indent parameter is to make the reply readable, it really has nothing to do with printing the query. Could you show an example of what you want for output? Best Erick

On Mon, Nov 28, 2011 at 8:42 AM, halil halil.a...@gmail.com wrote: I got one step further, but still no indent. I wrote this code segment: query.setQuery( "marka_s:atak*" ).setFacet(true).setParam("indent", "on"); and here is the resulting query string: q=marka_s%3Aatak*&facet=true&indent=on -halil agin.

On Mon, Nov 28, 2011 at 3:07 PM, halil halil.a...@gmail.com wrote: Hi List, I am new to the Solr and Lucene world. I have a simple question. I wrote the code segment below and it works. public class SolrjTest { public static void main(String[] args) throws MalformedURLException, SolrServerException { ClassPathXmlApplicationContext c = new ClassPathXmlApplicationContext("/patrades-search-solrj-test-beans.xml"); SolrServer server = (SolrServer) c.getBean("solrServer"); SolrQuery query = new SolrQuery(); query.setQuery( "*:*" ).setFacet(true); QueryResponse rsp = server.query( query ); System.err.println(rsp.toString()); } } I want to indent the response string, but couldn't find any answer. I looked at the book, the mail archive and Google. The most relevant link is below: http://wiki.apache.org/solr/SimpleFacetParameters but I don't know how to use it. The API is hard to read, by the way. regards, -Halil AĞIN