Urgent: Cannot index binary data stored in DB as BLOB type
Sir, please send me the data-config file to index binary data that is stored in the database as a BLOB type. Thanking you, Chandan
Re: Urgent: Cannot index binary data stored in DB as BLOB type
On 25 February 2014 14:27, Chandan khatua chand...@nrifintech.com wrote: Sir, please send me the data-config file to index binary data that is stored in the database as a BLOB type.

Are you paying attention to the follow-ups? I had already suggested possibilities, including the fact that Solr cannot automatically decide whether a blob contains rich text or not. Please do not start multiple threads for the same issue.

Regards, Gora
RE: Cannot index raw binary data stored in the database in BLOB format.
Hi Gora,

The column type in the DB is BLOB. It only stores binary data. If I do not use TikaEntityProcessor, then the following exception occurs:

at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
59163 [Thread-16] ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while processing: messages document : SolrInputDocument(fields: [id=2158]): org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: oracle.jdbc.driver.OracleBlobInputStream cannot be cast to java.util.Iterator
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:65)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:469)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:495)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
Caused by: java.lang.ClassCastException: oracle.jdbc.driver.OracleBlobInputStream cannot be cast to java.util.Iterator
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
... 10 more

I have used ClobTransformer in the data-config file as below, and even then it is not working:

<dataConfig>
  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//a.a.a.a:a/d11gr21" user="" password="a"/>
  <dataSource name="dastream" type="FieldStreamDataSource"/>
  <document>
    <entity name="messages" pk="x_MSG_PK" query="select * from table1" dataSource="db">
      <field column="x_MSG_PK" name="id"/>
      <entity name="message" transformer="ClobTransformer" dataSource="dastream"
              processor="TikaEntityProcessor" dataField="messages.MESSAGE" format="text">
        <field column="text" name="mxMsg" clob="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>

So, what changes do I need?

-Chandan

-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Monday, February 24, 2014 5:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Cannot index raw binary data stored in the database in BLOB format.

On 24 February 2014 15:34, Chandan khatua chand...@nrifintech.com wrote: Hi Gora! Your concern was "What is the type of the column used to store the binary data in Oracle?" The column type is BLOB in the DB. The column can also hold a rich text file.

Um, your original message said that it does *not* contain rich-text data. How do you tell whether it has rich-text data or not? For just a binary blob, the ClobTransformer should work, but you need the TikaEntityProcessor for rich-text data. If you do not know whether the data in the blob is rich text or not, you will need to roll your own solution to determine that.

Regards, Gora
Re: Fetching uniqueKey and other int quickly from documentCache?
I vaguely remember such a Jira issue but I can't find it now. Gregg, can you open an issue? A patch would be even better.

On Tue, Feb 25, 2014 at 8:28 AM, Gregg Donovan gregg...@gmail.com wrote: We fetch a large number of documents -- 1000+ -- for each search. Each request fetches only the uniqueKey, or the uniqueKey plus one secondary integer key. Despite this, we find that we spend a sizable amount of time in SolrIndexSearcher#doc(int docId, Set<String> fields). Time is spent fetching the two stored fields, LZ4 decoding, etc.

I would love to be able to tell Solr to always fetch these two fields from memory. We have them both in the fieldCache, so we're already spending the RAM. I've seen this asked previously [1], so it seems like a fairly common need, especially for distributed search. Any ideas? A few possible ideas I had:

--Check FieldCache#getCacheEntries() before going to stored fields.
--Give the documentCache config a list of fields it should load from the fieldCache.

Having an in-memory mapping from docId to uniqueKey has come up for us before. We've used a custom SolrCache maintaining that mapping to quickly filter over personalized collections. Maybe the uniqueKey should be more optimized out of the box? Perhaps a custom uniqueKey codec that also maintained the docId-to-uniqueKey mapping in memory?

--Gregg

[1] http://search-lucene.com/m/oCUKJ1heHUU1

--
Regards, Shalin Shekhar Mangar.
Re: Cannot index raw binary data stored in the database in BLOB format.
On 25 February 2014 14:54, Chandan khatua chand...@nrifintech.com wrote: Hi Gora, The column type in DB is BLOB. It only stores binary data. If I do not use TikaEntityProcessor, then the following exception occurs: [...] It is difficult to follow what you are doing when you say one thing, and seem to do another. You say above that you are not using TikaEntityProcessor but your DIH data configuration file shows that you are. Please start with one configuration, and show us the *exact* files in use, and the error from the Solr logs. Regards, Gora
RE: Cannot index raw binary data stored in the database in BLOB format.
Okay. Here is my data-config file:

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//1.2.3.4:1/d11gr21" user="" password=""/>
  <dataSource name="dastream" type="FieldStreamDataSource"/>
  <document>
    <entity name="messages" pk="X_MSG_PK" query="select * from table1" dataSource="db">
      <field column="X_MSG_PK" name="id"/>
      <entity name="message" transformer="ClobTransformer" dataSource="dastream"
              processor="TikaEntityProcessor" dataField="messages.MESSAGE" format="text">
        <field column="text" name="mxMsg" clob="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>

--

Solr.log file:

INFO - 2014-02-25 17:33:40.023; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/admin/mbeans params={cat=QUERYHANDLER&_=1393329819994&wt=json} status=0 QTime=1
INFO - 2014-02-25 17:33:40.094; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/admin/mbeans params={cat=QUERYHANDLER&_=1393329820083&wt=json} status=0 QTime=0
INFO - 2014-02-25 17:33:40.117; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/dataimport params={indent=true&command=status&_=1393329820089&wt=json} status=0 QTime=16
INFO - 2014-02-25 17:33:40.131; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/dataimport params={indent=true&command=show-config&_=1393329820084} status=0 QTime=29
INFO - 2014-02-25 17:33:42.026; org.apache.solr.handler.dataimport.DataImporter; Loading DIH Configuration: /dataconfig/data-config.xml
INFO - 2014-02-25 17:33:42.031; org.apache.solr.handler.dataimport.DataImporter; Data Configuration loaded successfully
INFO - 2014-02-25 17:33:42.033; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/dataimport params={optimize=false&indent=true&clean=true&commit=true&verbose=false&command=full-import&debug=false&wt=json} status=0 QTime=8
INFO - 2014-02-25 17:33:42.035; org.apache.solr.handler.dataimport.DataImporter; Starting Full Import
INFO - 2014-02-25 17:33:42.043; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/dataimport params={indent=true&command=status&_=1393329822040&wt=json} status=0 QTime=0
INFO - 2014-02-25 17:33:42.064; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Read dataimport.properties
INFO - 2014-02-25 17:33:42.092; org.apache.solr.search.SolrIndexSearcher; Opening Searcher@2a858a73 realtime
INFO - 2014-02-25 17:33:42.093; org.apache.solr.handler.dataimport.JdbcDataSource$1; Creating a connection for entity messages with URL: jdbc:oracle:thin:@//172.16.29.92:1521/d11gr21
INFO - 2014-02-25 17:33:42.113; org.apache.solr.handler.dataimport.JdbcDataSource$1; Time taken for getConnection(): 19
INFO - 2014-02-25 17:33:42.564; org.apache.solr.handler.dataimport.DocBuilder; Import completed successfully
INFO - 2014-02-25 17:33:42.564; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2014-02-25 17:33:42.867; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onCommit: commits: num=2
commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@C:\solr-4.5.1\example\multicore\CHESS_CORE\data\index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2c6d8073; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_l,generation=21}
commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@C:\solr-4.5.1\example\multicore\CHESS_CORE\data\index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2c6d8073; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_m,generation=22}
INFO - 2014-02-25 17:33:42.868; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 22
INFO - 2014-02-25 17:33:42.882; org.apache.solr.search.SolrIndexSearcher; Opening Searcher@558ea0cc main
INFO - 2014-02-25 17:33:42.886; org.apache.solr.core.QuerySenderListener; QuerySenderListener sending requests to Searcher@558ea0cc main{StandardDirectoryReader(segments_m:55:nrt _d(4.5.1):C80)}
INFO - 2014-02-25 17:33:42.889; org.apache.solr.core.QuerySenderListener; QuerySenderListener done.
INFO - 2014-02-25 17:33:42.889; org.apache.solr.core.SolrCore; [CHESS_CORE] Registered new searcher Searcher@558ea0cc main{StandardDirectoryReader(segments_m:55:nrt _d(4.5.1):C80)}
INFO - 2014-02-25 17:33:42.893; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2014-02-25 17:33:42.899; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Read dataimport.properties
INFO - 2014-02-25 17:33:42.901; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Wrote last indexed time to dataimport.properties
INFO - 2014-02-25
Re: Cannot index raw binary data stored in the database in BLOB format.
A few things:

1) If your database uses a BLOB, you should not use ClobTransformer; FieldStreamDataSource should be sufficient.

2) In a previous message, it showed that the converted/extracted document was empty (except for an HTML boilerplate wrapper). This was using the configuration I suggested. I'm guessing that TikaEntityProcessor is either receiving empty strings as source, or failing to extract the content of certain file formats. To test the latter, you could export one of the blobs to a file and run the stand-alone Tika app on it.

As to the possibility that TikaEntityProcessor is receiving empty strings as input: I had a similar issue, but with varchars. In my case, the reason was that I was running a really old version of Oracle, which would not work with recent versions of the Oracle support libraries.

Another thing that might be worth checking: you use "select * ..." as the main query. Have you tried explicitly listing the columns you're interested in? Something like "select X_MSG_PK, MESSAGE from table1" (a sketch along these lines follows below).

On Tue, Feb 25, 2014 at 1:11 PM, Chandan khatua chand...@nrifintech.com wrote: Okay. Here is my data-config file: [...]
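[Editor's note: for reference, here is a minimal, untested sketch of a data-config along the lines suggested above -- ClobTransformer and clob="true" dropped, an explicit column list, and TikaEntityProcessor reading the BLOB through FieldStreamDataSource. The connection URL, user, and password are placeholders, not values from the thread.]

<dataConfig>
  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//host:port/SID" user="..." password="..."/>
  <!-- Streams the BLOB column to Tika instead of materializing it as a string -->
  <dataSource name="dastream" type="FieldStreamDataSource"/>
  <document>
    <entity name="messages" pk="X_MSG_PK" dataSource="db"
            query="select X_MSG_PK, MESSAGE from table1">
      <field column="X_MSG_PK" name="id"/>
      <!-- Tika extracts plain text from whatever format the blob contains -->
      <entity name="message" processor="TikaEntityProcessor"
              dataSource="dastream" dataField="messages.MESSAGE" format="text">
        <field column="text" name="mxMsg"/>
      </entity>
    </entity>
  </document>
</dataConfig>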
Performance problem on Solr query on stemmed values
Hi,

I would like to know whether anyone has experienced this kind of phenomenon. We are having a performance problem with queries on stemmed values. I've documented the symptoms I'm currently facing:

Search on field "content" | Search on field "spell" | Highlighting (on content field) | Processing speed
active      | active      | active      | slow
active      | not active  | active      | fast
active      | active      | not active  | fast
not active  | active      | active      | slow
not active  | active      | not active  | fast

*Fast means 1000x faster than slow.

The field "content" is our index field, which holds the original text, and "spell" is the field with the stemmed values. According to my measurements, search on either field (stemmed or not stemmed) is really fast. But as soon as I add highlighting to the query, it takes far too long to process.

Best Regards, Erwin
programmatically disable/enable solr queryResultCache...
Is there any way to programmatically disable/enable the Solr queryResultCache? I am using SolrJ.

Thanks & Regards, Senthilnathan V
Re: Performance problem on Solr query on stemmed values
Right, highlighting may have to re-analyze the input in order to return the highlighted data. This will be significantly slower than the search, especially if you're returning a large number of rows. You can get better performance in highlighting by using FastVectorHighlighter. See: https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter

1000x is unusual, though, unless your fields are very large or you're returning a lot of documents.

Best, Erick

On Tue, Feb 25, 2014 at 5:23 AM, Erwin Gunadi festiva.s...@gmail.com wrote: [...]
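[Editor's note: a rough sketch of what FastVectorHighlighter needs, per the reference guide page linked above: the highlighted field must be indexed with term vectors, positions, and offsets, and the request must opt in. The field name "content" matches the thread, but the type and the rest of the field definition are illustrative, not from Erwin's schema; changing the field definition requires a re-index.]

<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

Then the request opts in per query:

q=some+query&hl=true&hl.fl=content&hl.useFastVectorHighlighter=true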
Re: programmatically disable/enable solr queryResultCache...
This seems like an XY problem: you're asking for specifics on doing something without any indication of _why_ you think it would help, nor are you explaining what problem you're having in the first place.

At any rate, the queryResultCache is unlikely to impact much. All it is is a map from the query to the first few document IDs (internal Lucene); see queryResultWindowSize in solrconfig.xml. It is quite lightweight: it does NOT store the entire result set, nor even the contents of the documents.

Best, Erick

On Tue, Feb 25, 2014 at 6:07 AM, Senthilnathan Vijayaraja senthilnat...@8kmiles.com wrote: Is there any way to programmatically disable/enable the Solr queryResultCache? I am using SolrJ. [...]
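[Editor's note: to make the configuration concrete -- there is no per-request switch for the queryResultCache in SolrJ; it is configured in solrconfig.xml and picked up on core reload. As far as I know, omitting the element disables the cache entirely. Sizes below are illustrative:]

<query>
  <!-- Comment this element out (and reload the core) to run without the cache -->
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="0"/>
  <!-- How many document ids are cached per query entry -->
  <queryResultWindowSize>20</queryResultWindowSize>
</query>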
SolrCloud: How to replicate shard of another machine for failover?
Hi,

tl;dr: I'm having trouble configuring SolrCloud 4.3.1 to replicate the shard of another machine. Basically, what it boils down to is the question of how to tell one Solr instance to replicate the shard of another machine. I thought the system property `-Dshard=2` would do the trick, but it doesn't do anything. What to do?

---

I want the following setup:

                      leader.host_1:7070
                     /
              shard1
             /       \
            /         replica.host_2:7071
  collection
            \         leader.host_2:7070
             \       /
              shard2
                     \
                      replica.host_1:7071

I want to run two logical instances (leader & replica) of Solr on each physical machine (host_1 & host_2). Everything is running, but each shard is replicated on the same physical machine, which doesn't work as a failover mechanism. So at the moment the layout is as follows:

                      leader.host_1:7070
                     /
              shard1
             /       \
            /         replica.host_1:7071
  collection
            \         leader.host_2:7070
             \       /
              shard2
                     \
                      replica.host_2:7071

I basically run the following commands on each machine. First on host_1:

host_1$ java -Djetty.home=/opt/solr -DnumShards=2 -Dcollection.configName=solrconfig.xml -DzkHost=localhost:2181 -Djetty.port=7070 -Dsolr.solr.home=/opt/solr -Dbootstrap_confdir=conf -cp <classpath> ...

host_1$ java -Djetty.home=/opt/solr-replica-1 -DnumShards=2 -Dshard=shard2 -Dcollection.configName=solrconfig.xml -DzkHost=localhost:2181 -Djetty.port=7071 -Dsolr.solr.home=/opt/solr-replica-1 -Dbootstrap_confdir=conf -cp <classpath> ...

Then on host_2:

host_2$ java -Djetty.home=/opt/solr -DnumShards=2 -Dcollection.configName=solrconfig.xml -DzkHost=localhost:2181 -Djetty.port=7070 -Dsolr.solr.home=/opt/solr -Dbootstrap_confdir=conf -cp <classpath> ...

host_2$ java -Djetty.home=/opt/solr-replica-1 -DnumShards=2 -Dshard=shard1 -Dcollection.configName=solrconfig.xml -DzkHost=localhost:2181 -Djetty.port=7071 -Dsolr.solr.home=/opt/solr-replica-1 -Dbootstrap_confdir=conf -cp <classpath> ...

Am I using the wrong configuration parameter? Is this behaviour possible (with Solr 4.3)?

Best regards, Oliver
Re: SolrCloud: How to replicate shard of another machine for failover?
Oliver,

You'll probably have better luck not supplying CLI arguments and instead creating your collection via the Collections API (https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateaCollection). Try removing -DnumShards and setting -Dcollection.configName to something abstract such as "collection1" rather than "solrconfig.xml", as you'll actually end up creating a directory in ZooKeeper called solrconfig.xml, which can get confusing. Something like:

http://localhost:7071/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=collection1

should fit what you're trying to accomplish.

Thanks, Greg

On Feb 25, 2014, at 9:09 AM, Oliver Schrenk oliver.schr...@gmail.com wrote: [...]
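[Editor's note: for completeness, the usual two-step sketch of this approach with the zkcli script that ships with Solr 4.x -- upload the config set under a name, then create the collection through the API instead of bootstrap_confdir system properties. The script path and config directory are assumptions based on a default layout:]

# upload the config set to ZooKeeper under the name "collection1"
./example/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 \
    -confdir /opt/solr/conf -confname collection1

# create a 2-shard collection with one replica per shard, allowing two cores per node
curl 'http://localhost:7071/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=collection1'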
Re: SolrCloud: How to replicate shard of another machine for failover?
I don't actually run these commands; everything is written down in either jetty.conf or solr.xml. I basically copy-pasted the output of `ps -ef | grep solr`.

Is the Collections API the only way to do this? At the moment this is a proof of concept, but for going to production I want to put this into Puppet, and I would feel more comfortable using configuration files than making a call to a web service.

On 25 Feb 2014, at 16:19, Greg Walters greg.walt...@answers.com wrote: [...]
Re: SolrCloud: How to replicate shard of another machine for failover?
On 2/25/2014 8:09 AM, Oliver Schrenk wrote: I want to run two logical instances (leader replica) of Solr on each physical machine (host_1 host_2). Everything is running but the shard is replicated on the same physical machine! Which doesn't work as a failover mechanism. So at the moment the layout is as follows: Don't run multiple instances of Solr on one machine. Instead, run one instance per machine and create the collection with the maxShardsPerNode parameter set to 2 or whatever value you need. Running multiple instances is a waste of memory, and Solr is perfectly capable of running multiple indexes (cores) on one instance. When there is one Solr instance per machine, SolrCloud will never put replicas on the same machine unless you specifically build them that way with the CoreAdmin API. The way you've set it up, SolrCloud just sees that you have four Solr instances. It does not know that they are on the same machine. As far as it is concerned, they are entirely separate. You might think that it should be able to see that they have the same hostname or IP address, but if we checked for that, we would lose a *lot* of flexibility that users demand. It would be impossible to set up test instances where they are all on the same machine. There are probably other networking scenarios that wouldn't function properly. Something that would be a good idea is an optional config flag that would make SolrCloud compare hostnames when building a collection and avoid putting replicas on nodes where the hostname matches. Whether to default this option to on or off is a whole separate discussion. Yet another whole separate discussion: You need three physical nodes for a redundant zookeeper, but I see only one host (localhost) in your zkHost parameter. Thanks, Shawn
CollapseQParserPlugin problem with ElevateComponent
I am having trouble with CollapseQParserPlugin showing duplicate groups when the search results contain a member of a grouped document but another member of that grouped document is defined in the elevate component. I have described the issue in more detail here: https://issues.apache.org/jira/browse/SOLR-5773

Any help is appreciated. Also, any hints as to how I can solve this problem myself would be great, as I'm having a bit of trouble understanding the code well enough to implement a fix.
Re: SolrCloud: How to replicate shard of another machine for failover?
Hi;

Nodes are assigned round-robin within the cluster. To get the layout you want, you should change the startup order of your Solr instances.

Thanks; Furkan KAMACI

2014-02-25 19:17 GMT+02:00 Shawn Heisey s...@elyograg.org: [...]
Autocommit, opensearchers and ingestion
Hi all,

I'm working with Solr 4.6.1 and I'm trying to tune my ingestion process. The ingestion runs a big DB query, does some ETL, and inserts via SolrJ. I have a 4-node cluster with 1 shard per node, running in Tomcat with -Xmx4096M. Each node has a separate instance of ZooKeeper on it, and the ingestion server has one as well. The Solr servers have 8 cores and 64 GB of total RAM. The ingestion server is a VM with 8 GB and 2 cores.

My ingestion code uses a few settings to control concurrency and batch size:

solr.update.batchSize=500
solr.threadCount=4

With this setup, I'm getting a lot of errors and the ingestion is taking much longer than it should. Every so often during the ingestion I get these errors on the Solr servers:

WARN shard1 - 2014-02-25 11:18:34.341; org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay tlog{file=/usr/local/solr_shard1/productCatalog/data/tlog/tlog.0014074 refcount=2} active=true starting pos=776774
WARN shard1 - 2014-02-25 11:18:37.275; org.apache.solr.update.UpdateLog$LogReplayer; Log replay finished. recoveryInfo=RecoveryInfo{adds=4065 deletes=0 deleteByQuery=0 errors=0 positionOfStart=776774}
WARN shard1 - 2014-02-25 11:18:37.960; org.apache.solr.core.SolrCore; [productCatalog] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
WARN shard1 - 2014-02-25 11:18:37.961; org.apache.solr.core.SolrCore; [productCatalog] Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
WARN shard1 - 2014-02-25 11:18:37.961; org.apache.solr.core.SolrCore; [productCatalog] Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
ERROR shard1 - 2014-02-25 11:18:37.961; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1575)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1346)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:592)

I cut threads down to 1 and batchSize down to 100, and the errors go away, but the upload time jumps up by a factor of 25.

My solrconfig.xml has:

<autoCommit>
  <maxDocs>${solr.autoCommit.maxDocs:1}</maxDocs>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
</autoSoftCommit>

I turned autowarmCount down to 0 for all the caches. What else can I tune to allow me to run bigger batch sizes and more threads in my upload script?

--
joel cohen, senior system engineer
e joel.co...@bluefly.com p 212.944.8000 x276
bluefly, inc. 42 w. 39th st. new york, ny 10018
www.bluefly.com | *fly since 2013...*
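[Editor's note: for context, a minimal SolrJ 4.x sketch of the batching approach described above. It relies on autoCommit/autoSoftCommit in solrconfig.xml rather than committing from the client; the ZooKeeper hosts, collection name, field names, and document loop are placeholders, not Joel's actual code.]

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkLoader {
    public static void main(String[] args) throws Exception {
        int batchSize = 500; // mirrors solr.update.batchSize above
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("productCatalog");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(batchSize);
        for (int i = 0; i < 100000; i++) { // stand-in for the DB/ETL loop
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("name", "product " + i);
            batch.add(doc);
            if (batch.size() >= batchSize) {
                server.add(batch);   // no explicit commit; autoCommit handles durability
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);       // flush the remainder
        }
        server.shutdown();
    }
}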
Re: Autocommit, opensearchers and ingestion
Hi;

You should read this: http://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

On the other hand, do you have 4 ZooKeeper instances as a quorum?

Thanks; Furkan KAMACI

2014-02-25 20:31 GMT+02:00 Joel Cohen joel.co...@bluefly.com: [...]
Wildcard search not working if the query contains numbers along with special characters.
Hi,

I have a very weird problem. Wildcard search works fine in every scenario but one: it doesn't seem to give any result for the query 1999/99*. I checked the debug output, and the query is formed perfectly:

<str name="rawquerystring">title_autocomplete:1999/99*</str>
<str name="querystring">title_autocomplete:1999/99*</str>
<str name="parsedquery">(+title_autocomplete:1999/99* ())/no_coord</str>
<str name="parsedquery_toString">+title_autocomplete:1999/99* ()</str>

This is my fieldType:

<fieldType name="text_general_Title" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please help me with this. Thanks.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Kashish,

What happens when you use this: q={!prefix f=title_autocomplete}1999/99

I suspect the '/' character is a special query parser character and therefore needs to be escaped.

Ahmet

On Tuesday, February 25, 2014 9:55 PM, Kashish itzz.me.kash...@gmail.com wrote: [...]
Re: Autocommit, opensearchers and ingestion
This blog by Erick will help you understand the different commit options and transaction logs, and it offers some recommendations for the ingestion process: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

On Tue, Feb 25, 2014 at 11:40 AM, Furkan KAMACI furkankam...@gmail.com wrote: [...]
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Ahmet,

Thanks for your reply. Yes, I pass my query this way: q=title_autocomplete:1999%2f99

I tried your way too. But no luck. :(
Re: Autocommit, opensearchers and ingestion
Gopal: I'm glad somebody noticed that blog!

Joel: For bulk loads it's a Good Thing to lengthen out your soft autocommit interval. A lot. Every second, poor Solr is trying to open up a new searcher while you're throwing lots of documents at it. That's what's generating the "too many searchers" problem, I'd guess. Soft commits are less expensive than hard commits with openSearcher=true (you're not doing that, and you shouldn't be), but soft commits aren't free: all the top-level caches are thrown away and autowarming is performed.

Also, I'd probably consider just leaving the maxDocs bit out of your hard commit; I find it rarely does all that much good. After all, even if you have to replay the transaction log, you're only talking 15 seconds here. (A sketch of bulk-load commit settings follows below.)

Best, Erick

On Tue, Feb 25, 2014 at 12:08 PM, Gopal Patwa gopalpa...@gmail.com wrote: [...]
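[Editor's note: concretely, a hedged sketch of what "lengthen the soft autocommit interval" might look like in solrconfig.xml during bulk loads. The 5-minute value is just an example; -1 disables soft autocommit entirely, and maxDocs is dropped per the advice above.]

<autoCommit>
  <!-- hard commit flushes the tlog; keep openSearcher=false -->
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- 300000 = documents become visible every 5 minutes instead of every second -->
  <maxTime>${solr.autoSoftCommit.maxTime:300000}</maxTime>
</autoSoftCommit>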
Re: Wildcard search not working if the query contains numbers along with special characters.
What does your admin/analysis page say happens for that field? And did you by any chance change your schema without reindexing everything?

Also, try the TermsComponent to see what tokens are actually _in_ your index. The schema browser on the admin page can help here too.

Best, Erick

On Tue, Feb 25, 2014 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote: [...]
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi,

By escaping I mean this: q=title_autocomplete:1999\/99*

It is different from URL encoding. See: http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters

If the prefix query parser didn't return what you want, then it must be something with the indexed terms. Can you give an example of the raw document text that you expect to retrieve with this query?

On Tuesday, February 25, 2014 10:15 PM, Kashish itzz.me.kash...@gmail.com wrote: Hi Ahmet, Thanks for your reply. Yes, I pass my query this way: q=title_autocomplete:1999%2f99 [...]
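[Editor's note: if the query string is being built in code, a small SolrJ sketch of the escaping approach Ahmet describes: escape the user text (which handles '/') and append the wildcard afterwards, so the '*' itself is not escaped. The field name matches the thread; the rest is illustrative.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

// "1999/99" becomes "1999\/99"; the '*' is appended unescaped to stay a wildcard
String prefix = ClientUtils.escapeQueryChars("1999/99");
SolrQuery query = new SolrQuery("title_autocomplete:" + prefix + "*");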
excludeIds in QueryElevationComponent (4.7)
Guys,

I've been testing out https://issues.apache.org/jira/browse/SOLR-5541 on 4.7RC4. I previously had an elevate.xml that elevated 3 documents for a specific query. My understanding is that I could, at runtime, exclude one of those. So I tried that like this:

http://localhost:8080/solr/ecommerce/search?q=canon&excludeIds=208464207

and now NONE of my documents are elevated. What I would have expected is that I'd have 2 elevated documents, but 208464207 would not be amongst them. Sadly, what happens is that now nothing is elevated.

Am I misunderstanding something, or should I open a JIRA? Looking at the source code, I can't immediately see what would be wrong.

Thanks, Lajos
Re: excludeIds in QueryElevationComponent (4.7)
Hit the send button too fast ...

What seems to be happening is that excludeIds or elevateIds ignore what's in elevate.xml. I would have expected (hoped) that they would layer on top of it, which makes a bit more sense, I think.

Thanks, Lajos

On 25/02/2014 22:58, Lajos wrote: [...]
Re: excludeIds in QueryElevationComponent (4.7)
: What seems to be happening is that excludeIds or elevateIds ignore
: what's in elevate.xml. I would have expected (hoped) that they would
: layer on top of it, which makes a bit more sense, I think.

That's not how it's implemented -- i believe Joel implemented it this way intentionally, because otherwise, if the elevate.xml said "elevate A,B and exclude X,Y", there would be no simple way to say "instead of what's in elevate.xml, i want to elevate X,Y and i don't want to exclude *anything*".

I made sure this was explicitly documented in the ref guide...

https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component#TheQueryElevationComponent-TheelevateIdsandexcludeIdsParameters

"If either one of these parameters is specified at request time, then the entire elevation configuration for the query is ignored."

-Hoss
http://www.lucidworks.com/
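[Editor's note: given these semantics, the behaviour Lajos was after -- keep two of the three elevated documents and drop 208464207 -- would be expressed by restating the ids to elevate rather than relying on elevate.xml. The two ids below are hypothetical stand-ins for the other documents in his elevate.xml:]

http://localhost:8080/solr/ecommerce/search?q=canon&elevateIds=111,222

No excludeIds is needed in that form, since 208464207 is simply not listed.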
Re: excludeIds in QueryElevationComponent (4.7)
Thanks Hoss, that makes sense. Anyway, I like the new paradigm better ... it allows for more intelligent elevation control.

Cheers, L

On 25/02/2014 23:26, Chris Hostetter wrote: [...]
Re: SolrCloud Startup
Jeff: Thanks. I have tried reload before, but it is not reliable (at least in 4.3.1): a few cores get initialized and a few don't (they show as just recovering or down), and hence I had to move away from it. Is it a known issue in 4.3.1?

Shawn, Otis, Erick: Yes, I have reviewed the page before and have given 1/4 of my memory to the JVM and the rest to the OS cache (15 GB heap, 45 GB for the rest; 60 GB machine in total). I have also reviewed the tlog files, and they are in the order of KB (4-10 or 30). I have SSDs, and the reads are hardly noticeable (in the order of 100 KB during that time frame). I have also disabled swap on all machines.

Regarding firstSearcher: it is currently set to externalFileLoader. What is the use of firstSearcher? I haven't played around with it.

Thanks, Nitin

On Mon, Feb 24, 2014 at 7:58 PM, Erick Erickson erickerick...@gmail.com wrote: What is your firstSearcher set to in solrconfig.xml? If you're doing something really crazy there, that might be an issue. But I think Otis' suggestion is a lot more probable. What are your autocommits configured to? Best, Erick

On Mon, Feb 24, 2014 at 7:41 PM, Shawn Heisey s...@elyograg.org wrote: Hi, I have a 4-node SolrCloud cluster with more than 50 collections of 4 shards each. Every time I want to make a schema change, I upload configs to ZooKeeper and then restart all nodes. However, the restart of each node is very slow and takes about 20-30 minutes per node. Is it recommended to set loadOnStartup=false and allow SolrCloud to lazy-load? Is there a way to make schema changes without restarting SolrCloud?

I'm on my phone, so getting a URL for you is hard. Search the wiki for SolrPerformanceProblems. There's a section there on slow startup. If that's not it, it's probably not enough RAM for the OS disk cache. That is also discussed on that wiki page. Thanks, Shawn
Re: SolrCloud Startup
Erick: My autocommit is set to trigger every 30 seconds with openSearcher=false. Autocommit for soft commits is disabled. On Tue, Feb 25, 2014 at 3:30 PM, KNitin nitin.t...@gmail.com wrote: [snip -- full message quoted above]
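For reference, a sketch of the commit settings Nitin describes, as they would appear in solrconfig.xml (values are illustrative):

  <autoCommit>
    <maxTime>30000</maxTime>                <!-- hard commit every 30 seconds -->
    <openSearcher>false</openSearcher>      <!-- do not open a new searcher on hard commit -->
  </autoCommit>
  <!-- soft commits disabled: omit autoSoftCommit, or set its maxTime to -1 -->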
Re: SolrCloud Startup
On 2/25/2014 4:30 PM, KNitin wrote: Jeff: Thanks. I have tried reload before, but it is not reliable (at least in 4.3.1). A few cores get initialized and a few don't (show as just recovering or down), and hence I had to move away from it. Is it a known issue in 4.3.1? With Solr 4.3.1, you are running into this bug with reloads under SolrCloud: https://issues.apache.org/jira/browse/SOLR-4805 The only way to recover from this bug is to restart Solr. The bug is fixed in 4.4.0 and later. [snip] Regarding firstSearcher, it is currently set to externalFileLoader. What is the use of firstSearcher? I haven't played around with it. I don't think it's a good idea to have extensive warming queries. I do exactly one query in firstSearcher and newSearcher: a query for all documents with zero rows, sorted on our most common sort field. This is designed purely to preload the sort data into the FieldCache. Thanks, Shawn
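A sketch of the minimal warming Shawn describes, assuming a solrconfig.xml listener; my_common_sort_field is a placeholder for your own most frequently used sort field, and the same <lst> can be repeated under a newSearcher listener:

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="sort">my_common_sort_field desc</str>
      </lst>
    </arr>
  </listener>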
Re: CollapseQParserPlugin problem with ElevateComponent
Hi David, Just read through your comments on the jira. Feel free to create a jira for this. The way this currently works is that if the elevated document is not the selected group head, then both the elevated document and the group head are in the result set. What you are suggesting is that the elevated document becomes the group head. We can discuss the best way to handle this on the new ticket. Joel Bernstein Search Engineer at Heliosearch On Tue, Feb 25, 2014 at 1:29 PM, dboychuck dboych...@build.com wrote: https://issues.apache.org/jira/browse/SOLR-5773 I am having trouble with CollapseQParserPlugin showing duplicate groups when the search results contain a member of a grouped document but another member of that grouped document is defined in the elevate component. I have described the issue in more detail here: https://issues.apache.org/jira/browse/SOLR-5773 Any help is appreciated. Also, any hints as to how I can solve this problem myself would be great, as I'm having a bit of trouble understanding the code to implement a fix.
Re: Wildcard search not working if the query contains numbers along with special characters.
Hi Ahmet/Erick, I tried escaping as well. Still no luck. The title I am looking for is: ARABIAN NIGHTS #01 (1999/99). I figured out that if I pass the query as *1999/99* (i.e. an asterisk not only at the end but at the beginning as well), it works. The problem is the parentheses. I can change my field type and add <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/> but this will show too many results in autocomplete. Is there any better way to handle this? Or should I pass an asterisk before and after the query? Thanks.
XML with duplicate element names
I'm trying to query XML documents stored in Riak 2.0, which has integrated Solr. My XML looks like this:

  <MainData>
    <Info>
      <Info name="Bob" city="Columbus" />
      <Info name="Joe" city="Cincinnati" />
    </Info>
  </MainData>

So a search in Riak might look something like this: q=MainData.Info.Info@name:Bob Now let's say I want to match all documents where name=Bob and city=Cincinnati for the same element. If I do something like the following: q=MainData.Info.Info@name:Bob AND MainData.Info.Info@city:Cincinnati I'll get a hit, even though that's not what I'm really looking for: I want Bob and Cincinnati matching in the same Info element. So, taking my example XML at the top of this post, how would I write the query to match a document where the MainData.Info.Info element has the attributes name=Joe and city=Cincinnati, i.e. the following line: <Info name="Joe" city="Cincinnati" /> I did try an fq that looked like this, figuring I could filter down to the element where name=Joe and then test whether city=Cincinnati, but it didn't work: q=MainData.Info.Info@city:Cincinnati&fq=MainData.Info.Info@name:Joe I'm obviously a noob here, so I apologize for my noobness in advance. Thanks!
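A guess at why the AND query cross-matches, assuming the indexer flattens element attributes into multivalued fields (a sketch, not the actual extractor output): the two Info elements would be indexed roughly as

  MainData.Info.Info@name: [Bob, Joe]
  MainData.Info.Info@city: [Columbus, Cincinnati]

so name:Bob AND city:Cincinnati matches because each value occurs somewhere in the document; once flattened this way, neither q nor fq can re-establish which name belonged to which city.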
Re: CollapseQParserPlugin problem with ElevateComponent
Hi Joel, Thank you for the reply. I created https://issues.apache.org/jira/browse/SOLR-5773 for this new feature. I was looking at the getBoostDocs() function, and if I understand it correctly, it iterates over the boosted Set<String> that is passed into the function, casting each element to a BytesRef and storing it in a HashSet. While I'm confused about what all the type conversion is actually doing, I can follow the logic somewhat. You then traverse the index and retrieve all of the terms for the unique id field of the schema. You then seek in the localBoosts HashSet for the current document, and if it is in the index you add it to boostDocs (to be returned from the function) as well as remove the document from the localBoosts HashSet. I don't think the document is actually removed from the result set in this function, however. I spent some hours today trying to decipher some of this code. I am very interested in understanding this code so that I can contribute back to this project, but I am finding it all a bit daunting. As always, your help is greatly appreciated, and thank you for the quick response. On Tue, Feb 25, 2014 at 5:38 PM, Joel Bernstein wrote: [snip -- full message quoted above] -- David Boychuck, Software Engineer Search, Team Lead, Build.com, Inc.
Re: Wildcard search not working if the query contains numbers along with special characters.
The admin/analysis page is your friend. Taking some time to get acquainted with that page will save you lots and lots and lots of time. In this case, you'd have seen that your input is actually tokenized as (1999/99), parentheses and all, as a _single_ token, so of course searching for 1999/99 wouldn't work. Searching for *1999/99* is generally a bad idea. It'll work, but it's a kludge. What you _do_ need to do is define your use cases. Let's assume that you _never_ want parentheses to be relevant. You could use PatternReplaceCharFilterFactory or PatternReplaceFilterFactory in both the index and query parts of your analysis chain to remove parens, or really any kinds of extraneous characters you decide are unimportant. But you need to decide what's important and enforce that. Best, Erick On Tue, Feb 25, 2014 at 7:28 PM, Kashish itzz.me.kash...@gmail.com wrote: [snip -- full message quoted above]
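A sketch of the charFilter approach Erick suggests; the fieldType name is made up, and the tokenizer is chosen so that 1999/99 survives as a single token (adapt both to your autocomplete use case):

  <fieldType name="text_noparens" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- strip parentheses before tokenization, applied at both index and query time -->
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[()]" replacement=""/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>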
Re: programmatically disable/enable solr queryResultCache...
Erick, Thanks for the response. Kindly have a look at my sample query: select?fl=city,$score&q=*:*&fq={!lucene q.op=OR df=city v=$cit}&cit=Chennai&sort=$score desc&score=norm($la,value,10)&la=8&b=1&c=2 Here score=norm($la,value,10), where norm is a custom function, so if I change la then $score should change. The first time it works fine, but if I change la alone and fire the query again, the results stay in the same order as the first query's results, which means sorting is not happening even though the score is different. But if I change cit=Chennai to cit=someCity, then I get results in the proper order, i.e. sorting works fine. "At any rate, queryResultCache is unlikely to impact much. All it is is a map containing the query and the first few document IDs (internal Lucene)." -- which means the query is the unique key and a list of document ids is the value mapped to that key. If I am not wrong, may I know how Solr builds these unique keys from queries? Does it build the key from only the common query parameters, or does it include all the parameters supplied by the user as part of the query (e.g. la=8&b=1&c=2)? Any clue? Thanks & Regards, Senthilnathan V On Tue, Feb 25, 2014 at 8:00 PM, Erick Erickson erickerick...@gmail.com wrote: This seems like an XY problem: you're asking for specifics on doing something without any indication of _why_ you think this would help, nor are you explaining what problem you're having in the first place. At any rate, queryResultCache is unlikely to impact much. All it is is a map containing the query and the first few document IDs (internal Lucene). See queryResultWindowSize in solrconfig.xml. It is quite lightweight: it does NOT store the entire result set, nor even the contents of the documents. Best, Erick On Tue, Feb 25, 2014 at 6:07 AM, Senthilnathan Vijayaraja senthilnat...@8kmiles.com wrote: Is there any way to programmatically disable/enable the Solr queryResultCache? I am using SolrJ. Thanks & Regards, Senthilnathan V
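On the original enable/disable question: as far as I know there is no per-request toggle for the queryResultCache; it is configured globally in solrconfig.xml, so disabling it means removing or commenting out the entry and reloading the core. A sketch of the stock configuration:

  <!-- remove or comment out this element to run without a queryResultCache -->
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

On the key question: as far as I understand the Solr source (see QueryResultKey), the cache key is derived from the parsed main query, the filter queries, and the sort -- not from the raw parameter string -- so a parameter like la only affects caching insofar as it changes what the query or sort parses to.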