Re: Hash range to shard assignment
That is in the pipeline; within the next 3-4 months for sure. On Mon, Sep 23, 2013 at 11:07 PM, lochri loc...@web.de wrote: Yes, actually that would be a very comfortable solution. Is that planned? And if so, when will it be released? -- View this message in context: http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204p4091591.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
Re: Hash range to shard assignment
Custom routers are an idea that has been floated around and would be easy to implement. It is just that we are reluctant to add yet another extension point. The point is that we are planning other features which would obviate the need for a custom router, such as splitting a shard by a query. Would that be a good enough solution for you? On Mon, Sep 23, 2013 at 2:52 PM, lochri loc...@web.de wrote: Thanks for the clarification. Still I would think it is sub-optimal to split shards when we don't actually know which mailboxes we split. It may create splits of small users which leads to unnecessary distribution of the smaller users. We thought about doing the routing ourselves. As far as I understood we can do distributed searches across multiple collections. What do you think about this option? For the ideal solution: when will custom routers be supported? Regards, Lochri -- View this message in context: http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204p4091503.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
Re: Hash range to shard assignment
This would need you to plug in your own router, which is not yet possible. But you can split that shard repeatedly and keep the number of users in that shard limited. On Fri, Sep 20, 2013 at 3:52 PM, lochri loc...@web.de wrote: Hello folks, we would like to have control of where certain hash values or ranges are located. The reason is that we want to shard per user, but we know ahead of time that one or more specific users could grow much faster than others. Therefore we would like to locate them on separate shards (which may be on the same server initially and can be moved out later). So my question: can we control the hash ranges and the hash-range-to-shard assignment in SolrCloud? Regards, Lochri -- View this message in context: http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
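For readers in the same situation, a rough sketch of the repeated-splitting approach, assuming Solr 4.3 or later where the Collections API supports it (collection and shard names are illustrative):

  http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mailboxes&shard=shard1

Each SPLITSHARD call divides the parent shard's hash range into two sub-shards, so a hot shard can be split again and one half moved to its own hardware, keeping the number of heavy users per shard bounded.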
Re: dataconfig to index ZIP Files
IIRC Zip files are not supported On Mon, Jul 1, 2013 at 10:30 PM, ericrs22 ericr...@yahoo.com wrote: To answer the previous Post: I was not sure what datasource=binaryFile I took it from a PDF sample thinking that would help. after setting datasource=null I'm still gett the same errors... dataConfig dataSource type=BinFileDataSource user=svcSolr password=SomePassword / document entity name=Archive processor=FileListEntityProcessor baseDir=E:\ArchiveRoot fileName=.zip$ recursive=true rootEntity=false dataSource=null onError=skip field column=fileSize name=size/ field column=file name=filename/ /entity /document /dataConfig the logs report this: INFO - 2013-07-01 16:45:57.317; org.apache.solr.handler.dataimport.DataImporter; Starting Full Import WARN - 2013-07-01 16:45:57.333; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Unable to read: dataimport.properties -- View this message in context: http://lucene.472066.n3.nabble.com/dataconfig-to-index-ZIP-Files-tp4073965p4074399.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
it is possible to create two separate root entities . one for full-import and another for delta. for the delta-import you can skip Cache that way On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber constantin.wol...@medicalcolumbus.de wrote: Hi, i searched for a solution for quite some time but did not manage to find some real hints on how to fix it. I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in a tomcat 6 container. My data import setup is basically the following: Data-config.xml: entity name=article dataSource=ds1 query=SELECT * FROM article deltaQuery=SELECT myownid FROM articleHistory WHERE modified_date gt; '${dih.last_index_time} deltaImportQuery=SELECT * FROM article WHERE myownid=${dih.delta.myownid} pk=myownid field column=myownid name=id/ entity name=supplier dataSource=ds2 query=SELECT * FROM supplier WHERE status=1 processor=CachedSqlEntityProcessor cacheKey=SUPPLIER_ID cacheLookup=article.ARTICLE_SUPPLIER_ID /entity entity name=attributes dataSource=ds1 query=SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes cacheKey=ARTICLE_ID cacheLookup=article.myownid processor=CachedSqlEntityProcessor /entity /entity Ok now for the problem: At first I tried everything without the Cache. But the full-import took a very long time. Because the attributes query is pretty slow compared to the rest. As a result I got a processing speed of around 150 Documents/s. When switching everything to the CachedSqlEntityProcessor the full import processed at the speed of 4000 Documents/s So full import is running quite fine. Now I wanted to use the delta import. When running the delta import I was expecting the ramp up time to be about the same as in full import since I need to load the whole table supplier and attributes to the cache in the first step. But when looking into the log file the weird thing is solr seems to refresh the Cache for every single document that is processed. So currently my delta-import is a lot slower than the full-import. I even tried to add the deltaImportQuery parameter to the entity but it doesn't change the behavior at all (of course I know it is not supposed to change anything in the setup I run). The following solutions would be possible in my opinion: 1. Is there any way to tell the config to ignore the Cache when running a delta import? That would help already because we are talking about the maximum of 500 documents changed in 15 minutes compared to over 5 million documents in total. 2. Get solr to not refresh the cash for every document. Best Regards Constantin Wolber -- - Noble Paul
Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
yes. that's right On Thu, Jun 20, 2013 at 8:16 PM, Constantin Wolber constantin.wol...@medicalcolumbus.de wrote: Hi, i may have been a little to fast with my response. After reading a bit more I imagine you meant running the full-import with the entity param for the root entity for full import. And running the delta import with the entity param for the delta entity. Is that correct? Regards Constantin -Ursprüngliche Nachricht- Von: Constantin Wolber [mailto:constantin.wol...@medicalcolumbus.de] Gesendet: Donnerstag, 20. Juni 2013 16:42 An: solr-user@lucene.apache.org Betreff: AW: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor Hi, and thanks for the answer. But I'm a little bit confused about what you are suggesting. I did not really use the rootEntity attribute before. But from what I read in the documentation as far as I can tell that would result in two documents (maybe with the same id which would probably result in only one document being stored) because one for each root entity. It would be great if you could just sketch the setup with the entities I provided. Because currently I have no idea on how to do it. Regards Constantin -Ursprüngliche Nachricht- Von: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com] Gesendet: Donnerstag, 20. Juni 2013 15:42 An: solr-user@lucene.apache.org Betreff: Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor it is possible to create two separate root entities . one for full-import and another for delta. for the delta-import you can skip Cache that way On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber constantin.wol...@medicalcolumbus.de wrote: Hi, i searched for a solution for quite some time but did not manage to find some real hints on how to fix it. I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in a tomcat 6 container. My data import setup is basically the following: Data-config.xml: entity name=article dataSource=ds1 query=SELECT * FROM article deltaQuery=SELECT myownid FROM articleHistory WHERE modified_date gt; '${dih.last_index_time} deltaImportQuery=SELECT * FROM article WHERE myownid=${dih.delta.myownid} pk=myownid field column=myownid name=id/ entity name=supplier dataSource=ds2 query=SELECT * FROM supplier WHERE status=1 processor=CachedSqlEntityProcessor cacheKey=SUPPLIER_ID cacheLookup=article.ARTICLE_SUPPLIER_ID /entity entity name=attributes dataSource=ds1 query=SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes cacheKey=ARTICLE_ID cacheLookup=article.myownid processor=CachedSqlEntityProcessor /entity /entity Ok now for the problem: At first I tried everything without the Cache. But the full-import took a very long time. Because the attributes query is pretty slow compared to the rest. As a result I got a processing speed of around 150 Documents/s. When switching everything to the CachedSqlEntityProcessor the full import processed at the speed of 4000 Documents/s So full import is running quite fine. Now I wanted to use the delta import. When running the delta import I was expecting the ramp up time to be about the same as in full import since I need to load the whole table supplier and attributes to the cache in the first step. But when looking into the log file the weird thing is solr seems to refresh the Cache for every single document that is processed. So currently my delta-import is a lot slower than the full-import. 
I even tried to add the deltaImportQuery parameter to the entity but it doesn't change the behavior at all (of course I know it is not supposed to change anything in the setup I run). The following solutions would be possible in my opinion: 1. Is there any way to tell the config to ignore the Cache when running a delta import? That would help already because we are talking about the maximum of 500 documents changed in 15 minutes compared to over 5 million documents in total. 2. Get solr to not refresh the cash for every document. Best Regards Constantin Wolber -- - Noble Paul -- - Noble Paul
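A minimal sketch of the two-root-entity layout being discussed, following the thread's tables and columns (the entity names articleFull/articleDelta and the trimmed-down queries are illustrative); the root entity to run is chosen with the entity request parameter:

  <document>
    <!-- full-import: sub-entity cached for speed -->
    <entity name="articleFull" dataSource="ds1" pk="myownid" query="SELECT * FROM article">
      <field column="myownid" name="id"/>
      <entity name="supplier" dataSource="ds2" processor="CachedSqlEntityProcessor"
              query="SELECT * FROM supplier WHERE status=1"
              cacheKey="SUPPLIER_ID" cacheLookup="articleFull.ARTICLE_SUPPLIER_ID"/>
    </entity>
    <!-- delta-import: plain per-row lookups, no cache -->
    <entity name="articleDelta" dataSource="ds1" pk="myownid"
            query="SELECT * FROM article"
            deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
            deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}">
      <field column="myownid" name="id"/>
      <entity name="supplier" dataSource="ds2"
              query="SELECT * FROM supplier WHERE status=1 AND SUPPLIER_ID=${articleDelta.ARTICLE_SUPPLIER_ID}"/>
    </entity>
  </document>

The full import would then be triggered with .../dataimport?command=full-import&entity=articleFull and the delta with .../dataimport?command=delta-import&entity=articleDelta.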
Re: Replication not working
can you check with the indexversion command on both mater and slave? pollInterval is set to 2 minutes. It is usually long . So you may need to wait for 2 mins for the replication to kick in On Tue, Jun 11, 2013 at 3:21 PM, thomas.poroc...@der.net wrote: Hi all, we have a setup with multiple cores, loaded via DataImportHandlers. Works fine so far. Now we are trying to get the replication working (for one core so far). But the automated replication is never happening. Manually triggered replication works! Environment: Solr 4.1 (also tried with 4.3) App-Server JBoss 4.3. Java 1.6 There are two JBoss instances running on different ports on the same box with their own solr.home directories. Configuration is done like described in the documentation: requestHandler name=/replication class=solr.ReplicationHandler lst name=master str name=enable${de.der.pu.solr.master.enable:false}/str str name=replicateAfterstartup/str str name=replicateAftercommit/str str name=replicateAfteroptimize/str str name=confFilesstopwords.txt, solrconfig.xml/str /lst lst name=slave str name=enable${de.der.pu.solr.slave.enable:false}/str str name=masterUrlhttp://localhost:30006/solr/${solr.core.name}/str str name=pollInterfall00:02:00/str /lst /requestHandler Basically it looks all fine from the admin-pages. The polling from the slave is going on but nothing happens. We have tried to delete slave index completely and restart both servers. Reimportet the master data several times and so on.. On the masters replication page I see: - replication enable: true - replicateAfter: commit, startup - confFiles: stopwords.txt, solrconfig.xml On slave side I see: -masters version 1370612995391 53 2.56 MB -master url: http://localhost:30006/solr/contacts -poling enable: true And master settings like on master side... When I enter http://localhost:30006/solr/contacts/replication?command=detailswt=json indent=true in the browser the response seems ok: { responseHeader:{ status:0, QTime:0}, details:{ indexSize:2.56 MB, indexPath:D:\\usr\\local\\phx-unlimited\\jboss\\solr_cache\\test_pto_ node1_solr\\contacts\\data\\index/, commits:[[ indexVersion,1370612995391, generation,53, filelist,[_1r.fdt, _1r.fdx, _1r.fnm, _1r.nvd, _1r.nvm, _1r.si, _1r_Lucene41_0.doc, _1r_Lucene41_0.pos, _1r_Lucene41_0.tim, _1r_Lucene41_0.tip, segments_1h]]], isMaster:true, isSlave:false, indexVersion:1370612995391, generation:53, master:{ confFiles:stopwords.txt, solrconfig.xml, replicateAfter:[commit, startup], replicationEnabled:true, replicableVersion:1370612995391, replicableGeneration:53}}, WARNING:This response format is experimental. It is likely to change in the future.} Any idea how we could go on? Regards Thomas -- - Noble Paul
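For reference, a slave section written with the element name the ReplicationHandler documents (pollInterval); the configuration quoted above spells it pollInterfall, which is worth double-checking, since a misspelled name would not be picked up. The values below follow the quoted setup:

  <lst name="slave">
    <str name="enable">${de.der.pu.solr.slave.enable:false}</str>
    <str name="masterUrl">http://localhost:30006/solr/${solr.core.name}</str>
    <str name="pollInterval">00:02:00</str>
  </lst>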
Re: Replication not working
You said polling is happening and nothing is replicated What do the logs say on slave (Set level to INFO) ? On Tue, Jun 11, 2013 at 4:54 PM, thomas.poroc...@der.net wrote: Calling indexversion on master gives: response lst name=responseHeader int name=status0/intint name=QTime0/int /lst long name=indexversion1370612995391/long long name=generation53/long /response On Slave: response lst name=responseHeaderint name=status0/int int name=QTime0/int/lst long name=indexversion0/long long name=generation1/long /response pollInterval is set to 2 minutes. It is usually long I know ;-) -Ursprüngliche Nachricht- Von: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com] Gesendet: Dienstag, 11. Juni 2013 13:16 An: solr-user@lucene.apache.org Betreff: Re: Replication not working can you check with the indexversion command on both mater and slave? pollInterval is set to 2 minutes. It is usually long . So you may need to wait for 2 mins for the replication to kick in On Tue, Jun 11, 2013 at 3:21 PM, thomas.poroc...@der.net wrote: Hi all, we have a setup with multiple cores, loaded via DataImportHandlers. Works fine so far. Now we are trying to get the replication working (for one core so far). But the automated replication is never happening. Manually triggered replication works! Environment: Solr 4.1 (also tried with 4.3) App-Server JBoss 4.3. Java 1.6 There are two JBoss instances running on different ports on the same box with their own solr.home directories. Configuration is done like described in the documentation: requestHandler name=/replication class=solr.ReplicationHandler lst name=master str name=enable${de.der.pu.solr.master.enable:false}/str str name=replicateAfterstartup/str str name=replicateAftercommit/str str name=replicateAfteroptimize/str str name=confFilesstopwords.txt, solrconfig.xml/str /lst lst name=slave str name=enable${de.der.pu.solr.slave.enable:false}/str str name=masterUrlhttp://localhost:30006/solr/${solr.core.name}/str str name=pollInterfall00:02:00/str /lst /requestHandler Basically it looks all fine from the admin-pages. The polling from the slave is going on but nothing happens. We have tried to delete slave index completely and restart both servers. Reimportet the master data several times and so on.. On the masters replication page I see: - replication enable: true - replicateAfter: commit, startup - confFiles: stopwords.txt, solrconfig.xml On slave side I see: -masters version 1370612995391 53 2.56 MB -master url: http://localhost:30006/solr/contacts -poling enable: true And master settings like on master side... When I enter http://localhost:30006/solr/contacts/replication?command=detailswt=json indent=true in the browser the response seems ok: { responseHeader:{ status:0, QTime:0}, details:{ indexSize:2.56 MB, indexPath:D:\\usr\\local\\phx-unlimited\\jboss\\solr_cache\\test_pto_ node1_solr\\contacts\\data\\index/, commits:[[ indexVersion,1370612995391, generation,53, filelist,[_1r.fdt, _1r.fdx, _1r.fnm, _1r.nvd, _1r.nvm, _1r.si, _1r_Lucene41_0.doc, _1r_Lucene41_0.pos, _1r_Lucene41_0.tim, _1r_Lucene41_0.tip, segments_1h]]], isMaster:true, isSlave:false, indexVersion:1370612995391, generation:53, master:{ confFiles:stopwords.txt, solrconfig.xml, replicateAfter:[commit, startup], replicationEnabled:true, replicableVersion:1370612995391, replicableGeneration:53}}, WARNING:This response format is experimental. It is likely to change in the future.} Any idea how we could go on? Regards Thomas -- - Noble Paul -- - Noble Paul
Re: Replication not working
I mean , the log when polling happens when from slave. Not when you issue a command. On Tue, Jun 11, 2013 at 5:28 PM, thomas.poroc...@der.net wrote: Log on slave: 2013-06-11 13:19:08,477 8385607 INFO [org.apache.solr.core.SolrCore] (http-0.0.0.0-31006-1:) [contacts] webapp=/solr path=/replication params={indent=truecommand=indexversionswt=json+} status=0 QTime=0 2013-06-11 13:19:08,477 8385607 DEBUG [org.apache.solr.servlet.SolrDispatchFilter] (http-0.0.0.0-31006-1:) Closing out SolrRequest: {indent=truecommand=indexversionswt=json+} 2013-06-11 13:22:27,017 8584147 INFO [org.apache.solr.core.SolrCore] (http-0.0.0.0-31006-1:) [contacts] webapp=/solr path=/replication params={command=indexversion} status=0 QTime=0 2013-06-11 13:22:27,017 8584147 DEBUG [org.apache.solr.servlet.SolrDispatchFilter] (http-0.0.0.0-31006-1:) Closing out SolrRequest: {command=indexversion} -Ursprüngliche Nachricht- Von: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com] Gesendet: Dienstag, 11. Juni 2013 13:41 An: solr-user@lucene.apache.org Betreff: Re: Replication not working You said polling is happening and nothing is replicated What do the logs say on slave (Set level to INFO) ? On Tue, Jun 11, 2013 at 4:54 PM, thomas.poroc...@der.net wrote: Calling indexversion on master gives: response lst name=responseHeader int name=status0/intint name=QTime0/int /lst long name=indexversion1370612995391/long long name=generation53/long /response On Slave: response lst name=responseHeaderint name=status0/int int name=QTime0/int/lst long name=indexversion0/long long name=generation1/long /response pollInterval is set to 2 minutes. It is usually long I know ;-) -Ursprüngliche Nachricht- Von: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com] Gesendet: Dienstag, 11. Juni 2013 13:16 An: solr-user@lucene.apache.org Betreff: Re: Replication not working can you check with the indexversion command on both mater and slave? pollInterval is set to 2 minutes. It is usually long . So you may need to wait for 2 mins for the replication to kick in On Tue, Jun 11, 2013 at 3:21 PM, thomas.poroc...@der.net wrote: Hi all, we have a setup with multiple cores, loaded via DataImportHandlers. Works fine so far. Now we are trying to get the replication working (for one core so far). But the automated replication is never happening. Manually triggered replication works! Environment: Solr 4.1 (also tried with 4.3) App-Server JBoss 4.3. Java 1.6 There are two JBoss instances running on different ports on the same box with their own solr.home directories. Configuration is done like described in the documentation: requestHandler name=/replication class=solr.ReplicationHandler lst name=master str name=enable${de.der.pu.solr.master.enable:false}/str str name=replicateAfterstartup/str str name=replicateAftercommit/str str name=replicateAfteroptimize/str str name=confFilesstopwords.txt, solrconfig.xml/str /lst lst name=slave str name=enable${de.der.pu.solr.slave.enable:false}/str str name=masterUrlhttp://localhost:30006/solr/${solr.core.name}/str str name=pollInterfall00:02:00/str /lst /requestHandler Basically it looks all fine from the admin-pages. The polling from the slave is going on but nothing happens. We have tried to delete slave index completely and restart both servers. Reimportet the master data several times and so on.. 
On the masters replication page I see: - replication enable: true - replicateAfter: commit, startup - confFiles: stopwords.txt, solrconfig.xml On slave side I see: -masters version 1370612995391 53 2.56 MB -master url: http://localhost:30006/solr/contacts -poling enable: true And master settings like on master side... When I enter http://localhost:30006/solr/contacts/replication?command=detailswt=json indent=true in the browser the response seems ok: { responseHeader:{ status:0, QTime:0}, details:{ indexSize:2.56 MB, indexPath:D:\\usr\\local\\phx-unlimited\\jboss\\solr_cache\\test_pto_ node1_solr\\contacts\\data\\index/, commits:[[ indexVersion,1370612995391, generation,53, filelist,[_1r.fdt, _1r.fdx, _1r.fnm, _1r.nvd, _1r.nvm, _1r.si, _1r_Lucene41_0.doc, _1r_Lucene41_0.pos, _1r_Lucene41_0.tim, _1r_Lucene41_0.tip
Re: LotsOfCores feature
Aleksey, it was a less than ideal situation, because we did not have a choice. We had external systems/scripts to manage this. A new custom implementation is being built on SolrCloud which would have taken care of most of those issues. SolrReplication is hidden once you move to cloud, but it will continue to work in the same way if you have a stand-alone deployment. On Mon, Jun 10, 2013 at 1:20 AM, Aleksey bitterc...@gmail.com wrote: Thanks Paul. Just a little clarification: You mention that you migrate data using built-in replication, but if you map and route users yourself, doesn't that mean that you also need to manage replication yourself? Your routing logic needs to be aware of how to map both replicas for each user, and if one host goes down, then it needs to distribute traffic that it was receiving over other hosts. Same thing for adding more hosts. I did a couple of quick searches and found mostly older wikis that say solr replication will change in the future. Would you be able to point me to the right one? - On Fri, Jun 7, 2013 at 8:34 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: We set it up like this + individual solr instances are setup + external mapping/routing to allocate users to instances. This information can be stored in an external data store + all cores are created as transient and loadonstart as false + cores come online on demand + as and when users data get bigger (or hosts are hot) they are migrated between less hit hosts using in built replication Keep in mind we had the schema for all users. Currently there is no way to upload a new schema to solr. On Jun 8, 2013 1:15 AM, Aleksey bitterc...@gmail.com wrote: Aleksey: What would you say is the average core size for your use case - thousands or millions of rows? And how sharded would each of your collections be, if at all? Average core/collection size wouldn't even be thousands, hundreds more like. And the largest would be half a million or so but that's a pathological case. I don't need sharding and queries that fan out to different machines. In fact I'd like to avoid that so I don't have to collate the results. The Wiki page was built not for Cloud Solr. We have done such a deployment where less than a tenth of cores were active at any given point in time. though there were tens of million indices they were split among a large no:of hosts. If you don't insist of Cloud deployment it is possible. I'm not sure if it is possible with cloud By Cloud you mean specifically SolrCloud? I don't have to have it if I can do without it. Bottom line is I want a bunch of small cores to be distributed over a fleet, each core completely fitting on one server. Would you be willing to provide a little more details on your setup? In particular, how are you managing the cores? How do you route requests to the proper server? If you scale the fleet up and down, does reshuffling of the cores happen automatically or is it an involved manual process? Thanks, Aleksey -- - Noble Paul
Re: LotsOfCores feature
The Wiki page was built not for Cloud Solr. We have done such a deployment where less than a tenth of cores were active at any given point in time. though there were tens of million indices they were split among a large no:of hosts. If you don't insist of Cloud deployment it is possible. I'm not sure if it is possible with cloud On Fri, Jun 7, 2013 at 12:38 AM, Aleksey bitterc...@gmail.com wrote: I was looking at this wiki and linked issues: http://wiki.apache.org/solr/LotsOfCores they talk about a limit being 100K cores. Is that per server or per entire fleet because zookeeper needs to manage that? I was considering a use case where I have tens of millions of indices but less that a million needs to be active at any time, so they need to be loaded on demand and evicted when not used for a while. Also since number one requirement is efficient loading of course I assume I will store a prebuilt index somewhere so Solr will just download it and strap it in, right? The root issue is marked as won;t fix but some other important subissues are marked as resolved. What's the overall status of the effort? Thank you in advance, Aleksey -- - Noble Paul
Re: SOLR CSV output in custom order
Have you tried explicitly giving the field names via the fl parameter? http://wiki.apache.org/solr/CommonQueryParameters#fl On Thu, Jun 6, 2013 at 12:41 PM, anurag.jain anurag.k...@gmail.com wrote: I want the output of the csv file in a proper order. When I use wt=csv it gives output in random order. Is there any way to get output in the proper format? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-CSV-output-in-custom-order-tp4068527.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
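A quick illustration (field names are made up): listing the fields explicitly fixes the column order, since the CSV writer emits columns in the order given in fl:

  http://localhost:8983/solr/select?q=*:*&wt=csv&fl=id,name,price,last_modified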
Re: LotsOfCores feature
We set it up like this + individual solr instances are setup + external mapping/routing to allocate users to instances. This information can be stored in an external data store + all cores are created as transient and loadonstart as false + cores come online on demand + as and when users data get bigger (or hosts are hot)they are migrated between less hit hosts using in built replication Keep in mind we had the schema for all users. Currently there is no way to upload a new schema to solr. On Jun 8, 2013 1:15 AM, Aleksey bitterc...@gmail.com wrote: Aleksey: What would you say is the average core size for your use case - thousands or millions of rows? And how sharded would each of your collections be, if at all? Average core/collection size wouldn't even be thousands, hundreds more like. And the largest would be half a million or so but that's a pathological case. I don't need sharding and queries than fan out to different machines. If fact I'd like to avoid that so I don't have to collate the results. The Wiki page was built not for Cloud Solr. We have done such a deployment where less than a tenth of cores were active at any given point in time. though there were tens of million indices they were split among a large no:of hosts. If you don't insist of Cloud deployment it is possible. I'm not sure if it is possible with cloud By Cloud you mean specifically SolrCloud? I don't have to have it if I can do without it. Bottom line is I want a bunch of small cores to be distributed over a fleet, each core completely fitting on one server. Would you be willing to provide a little more details on your setup? In particular, how are you managing the cores? How do you route requests to proper server? If you scale the fleet up and down, does reshuffling of the cores happen automatically or is it an involved manual process? Thanks, Aleksey
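A sketch of the per-core settings described above, using the pre-SolrCloud solr.xml core definitions of Solr 4.x (core names and the cache size are illustrative):

  <cores adminPath="/admin/cores" transientCacheSize="128">
    <core name="user00001" instanceDir="users/user00001" transient="true" loadOnStartup="false"/>
    <core name="user00002" instanceDir="users/user00002" transient="true" loadOnStartup="false"/>
    <!-- one entry per user; only up to transientCacheSize transient cores stay loaded at once -->
  </cores>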
Re: Upgrading from SOLR 3.5 to 4.2.1 Results.
Actually, it's pretty high end for most users. Rishi, you can post the real h/w details and our typical deployment: number of CPUs per node, number of disks per host, VMs per host, GC params, number of cores per instance. Noble Paul Sent from phone On 21 May 2013 01:47, Rishi Easwaran rishi.easwa...@aol.com wrote: No, we just upgraded to 4.2.1. With the size of our complex and the effort required to apply our patches and roll out, our upgrades are not that often. -Original Message- From: Noureddine Bouhlel nouredd...@ecotour.com To: solr-user solr-user@lucene.apache.org Sent: Mon, May 20, 2013 3:36 pm Subject: Re: Upgrading from SOLR 3.5 to 4.2.1 Results. Hi Rishi, Have you done any tests with Solr 4.3 ? Regards, Cordialement, BOUHLEL Noureddine On 17 May 2013 21:29, Rishi Easwaran rishi.easwa...@aol.com wrote: Hi All, Its Friday 3:00pm, warm sunny outside and it was a good week. Figured I'd share some good news. I work for AOL mail team and we use SOLR for our mail search backend. We have been using it since pre-SOLR 1.4 and strong supporters of SOLR community. We deal with millions of indexes and billions of requests a day across our complex. We finished full rollout of SOLR 4.2.1 into our production last week. Some key highlights: - ~75% Reduction in Search response times - ~50% Reduction in SOLR Disk busy , which in turn helped with ~90% Reduction in errors - Garbage collection total stop reduction by over 50% moving application throughput into the 99.8% - 99.9% range - ~15% reduction in CPU usage We did not tune our application moving from 3.5 to 4.2.1 nor update java. For the most part it was a binary upgrade, with patches for our special use case. Now going forward we are looking at prototyping SOLR Cloud for our search system, upgrade java and tomcat, tune our application further. Lots of fun stuff :) Have a great weekend everyone. Thanks, Rishi.
Re: javabin binary format specification
There is no spec documented anywhere. It is all in this single file: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java On Wed, Jul 25, 2012 at 6:47 PM, Ahmet Arslan iori...@yahoo.com wrote: Sorry, but I could not find any spec on the binary format SolrJ is using. Can you point me to a URL if any? Maybe this? https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/BinaryResponseWriter.java -- - Noble Paul
Re: @field for child object
no On Mon, Jul 4, 2011 at 3:34 PM, Kiwi de coder kiwio...@gmail.com wrote: hi, i wondering solrj @Field annotation support embedded child object ? e.g. class A { @field string somefield; @emebedded B b; } regards, kiwi -- - Noble Paul
Re: Re; DIH Scheduling
On Thu, Jun 23, 2011 at 9:13 PM, simon mtnes...@gmail.com wrote: The Wiki page describes a design for a scheduler, which has not been committed to Solr yet (I checked). I did see a patch the other day (see https://issues.apache.org/jira/browse/SOLR-2305) but it didn't look well tested. I think that you're basically stuck with something like cron at this time. If your application is written in java, take a look at the Quartz scheduler - http://www.quartz-scheduler.org/ It was considered and decided against. -Simon -- - Noble Paul
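A minimal example of the cron-based approach mentioned above (URL, core name and schedule are illustrative): a crontab entry that kicks off a delta-import every 15 minutes:

  */15 * * * * curl -s "http://localhost:8983/solr/db/dataimport?command=delta-import&clean=false" > /dev/null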
Re: Where is LogTransformer log file path??
it will be in the solr logs On Tue, Jun 21, 2011 at 2:18 PM, Alucard alucard...@gmail.com wrote: Hi all. I follow the steps of creating a LogTransformer in DataImportHandler wiki: entity name=office_address dataSource=jdbc pk=office_add_Key transformer=LogTransformer logLevel=debug logTemplate=office_add_Key: ${office_address.office_add_Key}, last_index_time: ${dataimporter.last_index_time} ... /entity The java statement that start Solr: java -Dremarks=solr:8983 -Djava.util.logging.config.file=logging.properties -jar start.jar logging.properties file content # Default global logging level: .level = DEBUG # Write to a file: handlers = java.util.logging.FileHandler # Write log messages in human readable format: java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter # Log to the logs subdirectory, with log files named solrxxx.log java.util.logging.FileHandler.pattern = logs/solr_log-%g.log java.util.logging.FileHandler.append = true java.util.logging.FileHandler.count = 10 java.util.logging.FileHandler.limit = 500 #Roughly 5MB So the log file (solr_log0.log) is there, startup message are properly logged. However, when I do a delta import, the message defined in logTemplate attribute is not logged. I have done some research but cannot find anything related to: LogTransformer file path/log path or so on... So, can anyone please tell me where are those messgae logged? Thank you in advance for any help. Ellery -- - Noble Paul
Re: Need help with DIH dataconfig.xml
Use TemplateTransformer dataConfig dataSource name = wld type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/wld user=root password=pass/ document name=variants entity name=III_1_1 query=SELECT * FROM `wld`.`III_1_1` transformer=TemplateTransformer field column=id template='${III_1_1.id}III_1_1}'/ field column=lemmatitel name=lemma / field column=vraagtekst name=vraagtekst / field column=lexical_variant name=variant / /entity entity name=III_1_2 query=SELECT * FROM `wld`.`III_1_2` field column=id name='${III_1_2_ + id}'/ field column=lemmatitel name=lemma / field column=vraagtekst name=vraagtekst / field column=lexical_variant name=variant / /entity /document /dataConfig On Wed, Jun 15, 2011 at 4:41 PM, MartinS martin.snijd...@gmail.com wrote: Hello, I want to perform a data import from a relational database. That all works well. However, i want to dynamically create a unique id for my solr documents while importing by using my data config file. I cant get it to work, maybe its not possible this way, but i thought i would ask you ll. (I set up schema.xml to use the field id as the unique id for solr documents) My solr config looks like this : dataConfig dataSource name = wld type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/wld user=root password=pass/ document name=variants entity name=III_1_1 query=SELECT * FROM `wld`.`III_1_1` field column=id name='${variants.name + id}'/ field column=lemmatitel name=lemma / field column=vraagtekst name=vraagtekst / field column=lexical_variant name=variant / /entity entity name=III_1_2 query=SELECT * FROM `wld`.`III_1_2` field column=id name='${III_1_2_ + id}'/ field column=lemmatitel name=lemma / field column=vraagtekst name=vraagtekst / field column=lexical_variant name=variant / /entity /document /dataConfig For a unique id I would like the concatenate the primary key of the table (Column id) with the table name. How can I do this ? Both ways as shown in the example data config don't work while importing. Any help is appreciated. Martin -- View this message in context: http://lucene.472066.n3.nabble.com/Need-help-with-DIH-dataconfig-xml-tp3066855p3066855.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
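For clarity, a sketch of the first entity with the template cleaned up (the stray closing brace removed) so that the document id becomes the table name plus the primary key; the same pattern would be repeated for III_1_2:

  <entity name="III_1_1" query="SELECT * FROM `wld`.`III_1_1`" transformer="TemplateTransformer">
    <field column="id" template="III_1_1_${III_1_1.id}"/>
    <field column="lemmatitel" name="lemma"/>
    <field column="vraagtekst" name="vraagtekst"/>
    <field column="lexical_variant" name="variant"/>
  </entity>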
Re: DIH entity threads
Sub entities can slow down indexing remarkably.What is that datasource? DB? then try using CachedSqlEntityProcessor On Tue, Jun 14, 2011 at 8:31 PM, Mark static.void@gmail.com wrote: Hello all, We are using DIH to index our data (~6M documents) and its taking an extremely long time (~24 hours). I am trying to find ways that we can speed this up. I've been reading through older posts and it's my understanding this should not take that long. One probably bottleneck is that we have a sub entity pulling in item descriptions from a separate datasource which we then strip html from. Before stripping the html we run it through JTidy. Our data-config looks something like this: http://pastie.org/2067011 I've heard about entity threads and I was wondering if this would help in my case? I haven't been able to find any good documentation on this. Another possible bottleneck is the the number of sub entities we have... 5 (only 1 of which is CachedSqlEntityProcessor). Any ideas? Thanks for the help -- - Noble Paul
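As a sketch of what a cached description sub-entity might look like (table and column names are hypothetical); note that CachedSqlEntityProcessor loads the whole child table into memory once instead of issuing one query per parent row:

  <entity name="item" query="SELECT * FROM items">
    <entity name="description" processor="CachedSqlEntityProcessor"
            query="SELECT ITEM_ID, DESCRIPTION FROM item_descriptions"
            cacheKey="ITEM_ID" cacheLookup="item.ID">
      <field column="DESCRIPTION" name="description"/>
    </entity>
  </entity>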
Re: Throttling replication
There is currently no way to throttle replication; it consumes all the available bandwidth. It would be a nice-to-have feature. On Thu, Sep 2, 2010 at 8:11 PM, Mark static.void@gmail.com wrote: Is there any way or forthcoming patch that would allow configuration of how much network bandwidth (and ultimately disk I/O) a slave is allowed during replication? We currently have the problem that while replicating, our disk I/O goes through the roof. I would much rather have the replication take 2x as long with half the disk I/O. Any thoughts? Thanks -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Dynamic dataConfig files in DIH
On Fri, Jun 11, 2010 at 11:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Is there a way to dynamically point which dataConfig file to use to import : using DIH without using the defaults hardcoded in solrconfig.xml? what do you mean by dynamically ? ... it's a query param, so you can specify the file name in the url when you issue the command. No, it is not. It is not reloaded for every request; we should enhance DIH to do so. But the whole data-config file can be sent as a request param and it works (this is used by the DIH debug mode). -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
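A rough illustration of sending the whole config along with the request, as the DIH debug mode does (core name and file are illustrative; the config has to be URL-encoded, which curl's --data-urlencode handles):

  curl "http://localhost:8983/solr/db/dataimport?command=full-import&debug=on" \
       --data-urlencode "dataConfig@./data-config.xml"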
Re: Solr DataConfig / DIH Question
this looks like a common problem. I guess DIH should handle this more gracefully. Instead of firing a query and failing it should not fire a query if any of the values are missing . This can b made configurable if needed On Sun, Jun 13, 2010 at 9:14 AM, Lance Norskog goks...@gmail.com wrote: This is a slow way to do this; databases are capable of doing this join and feeding the results very efficiently. The 'skipDoc' feature allows you to break out of the processing chain after the first query. It is used in the wikipedia example. http://wiki.apache.org/solr/DataImportHandler On Sat, Jun 12, 2010 at 6:37 PM, Holmes, Charles V. chol...@mitre.org wrote: I'm putting together an entity. A simplified version of the database schema is below. There is a 1-[0,1] relationship between Person and Address with address_id being the nullable foreign key. If it makes any difference, I'm using SQL Server 2005 on the backend. Person [id (pk), name, address_id (fk)] Address [id (pk), zipcode] My data config looks like the one below. This naturally fails when the address_id is null since the query ends up being select * from user.address where id = . entity name=person Query=select * from user.person entity name=address Query=select * from user.address where id = ${person.address_id} /entity /entity I've worked around it by using a config like this one. However, this makes the queries quite complex for some of my larger joins. entity name=person Query=select * from user.person entity name=address Query=select * from user.address where id = (select address_id from user.person where id = ${person.id}) /entity /entity Is there a cleaner / better way of handling these type of relationships? I've also tried to specify a default in the Solr schema, but that seems to only work after all the data is indexed which makes sense but surprised me initially. BTW, thanks for the great DIH tutorial on the wiki! Thanks! Charles -- Lance Norskog goks...@gmail.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: TikaEntityProcessor on Solr 1.4?
just copy the dih-extras jar file from the nightly should be fine On Sat, May 22, 2010 at 3:12 AM, Sixten Otto six...@sfko.com wrote: On Fri, May 21, 2010 at 5:30 PM, Chris Harris rygu...@gmail.com wrote: Actually, rather than cherry-pick just the changes from SOLR-1358 and SOLR-1583 what I did was to merge in all DataImportHandler-related changes from between the 1.4 release up through Solr trunk r890679 (inclusive). I'm not sure if that's what would work best for you, but it's one option. I'd rather, of course, not to have to build my own. But if I'm going to dabble in the source at all, it's just a slippery slope from the former to the latter. :-) (My main hesitation in doing so would be that I'm new enough to the code that I have no idea what core changes the trunk's DIH might also depend on. And my Java's pretty rusty.) How did you arrive at your patch? Just grafting the entire trunk/solr/contrib/dataimporthandler onto 1.4's code? Or did you go through Jira/SVN looking for applicable changesets? I'll be very interested to hear how your testing goes! Sixten -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: TikaEntityProcessor on Solr 1.4?
I guess it should work because Tika Entityprocessor does not use any new 1.4 APIs On Wed, May 19, 2010 at 1:17 AM, Sixten Otto six...@sfko.com wrote: Sorry to repeat this question, but I realized that it probably belonged in its own thread: The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, along with some other changes (like the binary DataSources) to support it. Obviously, there hasn't been an official release of Solr since then. Has anyone tried back-porting those changes to Solr 1.4? (I do see that the question was asked last month, without any response: http://www.lucidimagination.com/search/document/5d2d25bc57c370e9) The patches for these issues don't seem all that complex or pervasive, but it's hard for me (as a Solr n00b) to tell whether this is really all that's involved: https://issues.apache.org/jira/browse/SOLR-1583 https://issues.apache.org/jira/browse/SOLR-1358 Sixten -- - Noble Paul | Systems Architect| AOL | http://aol.com
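For anyone attempting the back-port, a minimal sketch of the TikaEntityProcessor configuration those patches add (the file path and field name are illustrative; whether it runs on a patched 1.4 build is exactly what would need testing):

  <dataConfig>
    <dataSource type="BinFileDataSource" name="bin"/>
    <document>
      <entity name="doc" processor="TikaEntityProcessor" dataSource="bin"
              url="/data/docs/sample.pdf" format="text">
        <field column="text" name="content"/>
      </entity>
    </document>
  </dataConfig>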
Re: Issue with delta import (not finding data in a column)
Are u reusing the context object? It may help if u can paste the relevant part of ur code On 10 May 2010 19:03, ahammad ahmed.ham...@gmail.com wrote: I have a Solr core that retrieves data from an Oracle DB. The DB table has a few columns, one of which is a Blob that represents a PDF document. In order to retrieve the actual content of the PDF file, I wrote a Blob transformer that converts the Blob into the PDF file, and subsequently reads it using PDFBox. The blob is contained in a DB column called DOCUMENT, and the data goes into a Solr field called fileContent, which is required. This works fine when doing full imports, but it fails for delta imports. I debugged my transformer, and it appears that when it attempts to fetch the blob stored in the column, it gets nothing back (i.e. null). Because the data is essentially null, it cannot retrieve anything, and cannot store anything into Solr. As a result, the document does not get imported. I am not sure what the problem is, because this only occurs with delta imports. Here is my data-config file: dataConfig dataSource driver=oracle.jdbc.driver.OracleDriver url=address user=user password=pass/ document name=table1 entity name=TABLE1 pk=ID query=select * from TABLE1 deltaImportQuery=select * from TABLE1 where ID ='${dataimporter.delta.ID}' deltaQuery=select ID from TABLE1 where (LASTMODIFIED to_date('${dataimporter.last_index_time}', '-mm-dd HH24:MI:SS')) transformer=BlobTransformer field column=ID name=id / field column=TITLE name=title / field column=FILENAME name=filename / field column=DOCUMENT name=fileContent blob=true/ field column=LASTMODIFIED name=lastModified / /entity /document /dataConfig Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-delta-import-not-finding-data-in-a-column-tp788993p788993.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom DIH EventListeners
nope. register any event listener and check for the context.currentProcess() to figure out what is the event On Thu, May 6, 2010 at 8:21 AM, Blargy zman...@hotmail.com wrote: I know one can create custom event listeners for update or query events, but is it possible to create one for any DIH event (Full-Import, Delta-Import)? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-EventListeners-tp780517p780517.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
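A sketch of how such a listener is registered (the class name is hypothetical): the same class can be attached to both import events in data-config.xml and then inspect Context.currentProcess() — FULL_DUMP versus DELTA_DUMP — to tell which command is running:

  <document onImportStart="com.example.dih.ImportAuditListener"
            onImportEnd="com.example.dih.ImportAuditListener">
    ...
  </document>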
Re: Custom DIH variables
you can use the core from this API and use EmbeddedSolrServer (part of solrj) . So the calls will be in-vm On Thu, May 6, 2010 at 6:08 AM, Blargy zman...@hotmail.com wrote: Thanks Noble this is exactly what I was looking for. What is the preferred way to query solr within these sorts of classes? Should I grab the core from the context that is being passed in? Should I be using SolrJ? Can you provide an example and/or provide some tutorials/documentation. Once again, thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p780332.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Custom DIH variables
OK, you can't write a variable, but you can write a function (an Evaluator). It will look something like ${dataimporter.functions.foo()} http://wiki.apache.org/solr/DataImportHandler#Custom_formatting_in_query_and_url_using_Functions On Wed, May 5, 2010 at 9:12 PM, Blargy zman...@hotmail.com wrote: Thanks Paul, that will certainly work. I was just hoping there was a way I could write my own class that would inject this value as needed instead of precomputing this value and then passing it along in the params. My specific use case is instead of using dataimporter.last_index_time I want to use something like dataimporter.updated_time_of_last_document. Our DIH is set up to use a bunch of slave databases and there have been problems with some documents getting lost due to replication lag. I would prefer to compute this value using a custom variable at runtime instead of passing it along via the params. Is that even possible? If not I'll have to go with your previous suggestion. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p779278.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
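A sketch of how a custom Evaluator is wired in, per the wiki page linked above (class and function names are hypothetical): register the class with a function element in data-config.xml, then call it wherever a variable would normally go:

  <dataConfig>
    <function name="lastDocTime" class="com.example.dih.LastDocTimeEvaluator"/>
    ...
    <entity name="item"
            deltaQuery="SELECT id FROM items WHERE updated_at &gt; '${dataimporter.functions.lastDocTime()}'">
      ...
    </entity>
  </dataConfig>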
Re: Custom DIH variables
You can use custom parameters from the request, like ${dataimporter.request.foo}; pass the value of foo as a request param, say foo=bar. On Wed, May 5, 2010 at 6:05 AM, Blargy zman...@hotmail.com wrote: Can someone please point me in the right direction (classes) on how to create my own custom DIH variable that can be used in my data-config.xml. So instead of ${dataimporter.last_index_time} I want to be able to create ${dataimporter.foo} Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p777696.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
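A quick illustration (the parameter name is arbitrary): reference the parameter in data-config.xml and supply it on the import URL:

  <entity name="item" query="SELECT * FROM items WHERE category='${dataimporter.request.category}'">

  http://localhost:8983/solr/dataimport?command=full-import&category=books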
Re: DIH: inner select fails when outter entity is null/empty
do an onError=skip on the inner entity On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, Here is a newbie DataImportHandler question: Currently, I have entities with entities. There are some situations where a column value from the outer entity is null, and when I try to use it in the inner entity, the null just gets replaced with an empty string. That in turn causes the SQL query in the inner entity to fail. This seems like a common problem, but I couldn't find any solutions or mention in the FAQ ( http://wiki.apache.org/solr/DataImportHandlerFaq ) What is the best practice to avoid or convert null values to something safer? Would this be done via a Transformer or is there a better mechanism for this? I think the problem I'm describing is similar to what was described here: http://search-lucene.com/m/cjlhtFkG6m ... except I don't have the luxury of rewriting the SQL selects. Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ -- - Noble Paul | Systems Architect| AOL | http://aol.com
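A sketch of that workaround applied to the schema from the thread: the inner entity is marked so a failing lookup for a null address_id does not abort the whole import (onError also accepts continue and abort):

  <entity name="person" query="select * from user.person">
    <entity name="address" onError="skip"
            query="select * from user.address where id = ${person.address_id}"/>
  </entity>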
Re: DIH best pratices question
On Sat, Mar 27, 2010 at 3:25 AM, Blargy zman...@hotmail.com wrote: I have a items table on db1 and and item_descriptions table on db2. The items table is very small in the sense that it has small columns while the item_descriptions table has a very large text field column. Both tables are around 7 million rows What is the best way to import these into one document? document entity name=item ... entity name=item_descriptions ... /entity /entity /document this is the right way Or document entity name=item_descriptions rootEntity=false entity name=item ... /entity /entity /document Or is there an alternative way? Maybe using the second way with a CachedSqlEntityProcessor for the item entity? I don't think CachedSqlEntityProcessor helps here. Any thoughts are greatly appreciated. Thanks! -- View this message in context: http://n3.nabble.com/DIH-best-pratices-question-tp677568p677568.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: expungeDeletes on commit in Dataimport
On Thu, Mar 25, 2010 at 10:14 PM, Ruben Chadien ruben.chad...@aspiro.com wrote: Hi, I know this has been discussed before, but is there any way to do expungeDeletes=true when the DataImportHandler does the commit? expungeDeletes=true is not used, but that does not mean that the doc does not get deleted. deleteDocByQuery does not do a commit; if you wish to commit you should do it explicitly. I am using the deleteDocByQuery in a Transformer when doing a delta-import and as discussed before the documents are not deleted until restart. Also, how do I know in a Transformer if it is running a Delta or Full Import? I tried looking at Context.currentProcess() but that gives me FULL_DUMP when doing a delta import...? The variable ${dataimporter.request.command} tells you which command is being run. Thanks! Ruben Chadien -- - Noble Paul
Re: ReplicationHandler reports incorrect replication failures
please create a bug On Fri, Mar 26, 2010 at 7:29 PM, Shawn Smith ssmit...@gmail.com wrote: We're using Solr 1.4 Java replication, which seems to be working nicely. While writing production monitors to check that replication is healthy, I think we've run into a bug in the status reporting of the ../solr/replication?command=details command. (I know it's experimental...) Our monitor parses the replication?command=details XML and checks that replication lag is reasonable by diffing the indexVersion of the master and slave indices to make sure it's within a reasonable time range. Our monitor also compares the first elements of indexReplicatedAtList and replicationFailedAtList lists to see if the last replication attempt failed. This is where we're having a problem with the monitor throwing false errors. It looks like there's a bug that causes successful replications to be considered failures. The bug is triggered immediately after a slave restarts when the slave is already in sync with the master. Each no-op replication attempt after restart is considered a failure until something on the master changes and replication has to actually do work. From the code, it looks like SnapPuller.successfulInstall starts out false on restart. If the slave starts out in sync with the master, then each no-op replication poll leaves successfulInstall set to false which makes SnapPuller.logReplicationTimeAndConfFiles log the poll as a failure. SnapPuller.successfulInstall stays false until the first time replication actually has to do something, at which point it gets set to true, and then everything is OK. Thanks, Shawn -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: XPath Processing Applied to Clob
keep in mind that the xpath is case-sensitive. paste a sample xml what is dataField=d.text it does not seem to refer to anything. where is the enclosing entity? did you mean dataField=doc.text. xpath=//BODY is a supported syntax as long as you are using Solr1.4 or higher On Thu, Mar 18, 2010 at 3:15 AM, Neil Chaudhuri nchaudh...@potomacfusion.com wrote: Incidentally, I tried adding this: datasource name=f type=FieldReaderDataSource / document entity dataSource=f processor=XPathEntityProcessor dataField=d.text forEach=/MESSAGE field column=body xpath=//BODY/ /entity /document But this didn't seem to change anything. Any insight is appreciated. Thanks. From: Neil Chaudhuri Sent: Wednesday, March 17, 2010 3:24 PM To: solr-user@lucene.apache.org Subject: XPath Processing Applied to Clob I am using the DataImportHandler to index 3 fields in a table: an id, a date, and the text of a document. This is an Oracle database, and the document is an XML document stored as Oracle's xmltype data type. Since this is nothing more than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. However, I don't want to index/store all the XML but instead just the XML within a set of tags. The XPath itself is trivial, but it seems like the XPathEntityProcessor only works for XML file content rather than the output of a Transformer. Here is what I currently have that fails: document entity name=doc query=SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d transformer=ClobTransformer field column=EFFECTIVE_DT name=effectiveDate / field column=ARCHIVE_ID name=id / field column=TEXT name=text clob=true entity name=text processor=XPathEntityProcessor forEach=/MESSAGE url=${doc.text} field column=body xpath=//BODY/ /entity /entity /document Is there an easy way to do this without writing my own custom transformer? Thanks. -- - Noble Paul | Systems Architect| AOL | http://aol.com
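Putting those points together, a sketch of how the pieces are usually wired (column and tag names follow the thread; the Oracle URL and credentials are omitted): the FieldReaderDataSource is set on the inner entity, dataField names the outer entity and its column alias, and both the column and the XPath are case-sensitive:

  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver" url="..." user="..." password="..."/>
  <dataSource name="f" type="FieldReaderDataSource"/>
  <document>
    <entity name="doc" dataSource="db" transformer="ClobTransformer"
            query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d">
      <field column="EFFECTIVE_DT" name="effectiveDate"/>
      <field column="ARCHIVE_ID" name="id"/>
      <field column="TEXT" clob="true"/>
      <entity name="body" dataSource="f" processor="XPathEntityProcessor"
              dataField="doc.TEXT" forEach="/MESSAGE">
        <field column="body" xpath="//BODY"/>
      </entity>
    </entity>
  </document>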
Re: Is it possible to use ODBC with DIH?
if you have a jdbc-odbc bridge driver , it should be fine On Sun, Mar 7, 2010 at 4:52 AM, JavaGuy84 bbar...@gmail.com wrote: Hi, I have a ODBC driver with me for MetaMatrix DB(Redhat). I am trying to figure out a way to use DIH using the DSN which has been created in my machine with that ODBC driver? Is it possible to spcify a DSN in DIH and index the DB? if its possible, can you please let me know the ODBC URL that I need to enter for Datasource in DIH data-config.xml? Thanks, Barani -- View this message in context: http://old.nabble.com/Is-it-possible-to-use-ODBC-with-DIH--tp27808016p27808016.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
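A sketch of what that dataSource entry could look like with the JDK's built-in bridge driver (the DSN name and credentials are whatever was configured on the machine):

  <dataSource type="JdbcDataSource" name="mm"
              driver="sun.jdbc.odbc.JdbcOdbcDriver"
              url="jdbc:odbc:MetaMatrixDSN"
              user="someUser" password="somePassword"/>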
Re: If you could have one feature in Solr...
On Fri, Mar 5, 2010 at 4:34 AM, Mark Miller markrmil...@gmail.com wrote: On 03/04/2010 05:56 PM, Chris Hostetter wrote: : The ability to read solr configuration files from the classpath instead of : solr.solr.home directory. Solr has always supported this. When SolrResourceLoader.openResourceLoader is asked to open a resource it first checks if it's an absolute path -- if it's not then it checks relative the conf dir (under whatever the instanceDir is, ie: Solr Home in a single core setup), then it checks relative the current working dir and if it still can't find it it checks via the current ClassLoader. that said: it's not something that a lot of people have ever taken advantage of, so it wouldn't suprise me if some features in Solr are buggy because they try to open files directly w/o utilizing openResourceLoader -- in particular a quick test of the trunk example using... java -Djetty.class.path=./solr/conf -Dsolr.solr.home=/tmp/new-solr-home -jar start.jar ...seems to suggest that QueryElevationComponent isn't using openResource to look for elevate.xml (i set solr.solr.home in that line so solr would *NOT* attempt to look at ./solr ... it does need some sort of Solr Home, but in this case it was a completley empty directory) -Hoss I've been trying to think of ways to tackle this. I hate getConfigDir - it lets anyone just get around the ResourceLoader basically. It would be awesome to get rid of it somehow - it would make ZooKeeperSolrResourceLoader so much easier to get working correctly across the board. Why not just get rid of it? Components depending on filesystems is a big headache. The main thing I'm hung up on is how to update a file - some code I've seen uses getConfigDir to update files eg you get the content of solrconfig, then you want to update it and reload the core. Most other things, I think are doable without getConfigDir. QueryElevationComponent is actually sort of simple to get around - we just need to add an exists method that return true/false if the resource exists. QEC just uses getConfigDir to a do an exists on the elevate.xml - if its not there, it looks in the data dir. -- - Mark http://www.lucidimagination.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: replication issue
The data/index.20100226063400 dir is a temporary dir and is created in the same dir where the index dir is located. I'm wondering if the symlink is causing the problem. Why don't you set the data dir as /raid/data instead of /solr/data? On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Hi, I am still having issues with the replication and wonder if things are working properly. So I have 1 master and 1 slave. On the slave, I deleted the data/index directory and data/replication.properties file and restarted solr. When the slave is pulling data from the master, I can see that the size of the data directory is growing r...@slr8:/raid/data# du -sh 3.7M . r...@slr8:/raid/data# du -sh 4.7M . and I can see that the data/replication.properties file got created and also a directory data/index.20100226063400 soon after index.20100226063400 disappears and the size of data/index is back to 12K r...@slr8:/raid/data/index# du -sh 12K . And when I look for the number of documents via the admin interface, I still see 0 documents so I feel something is wrong. One more thing, I have a symlink for /solr/data --- /raid/data Thank you for your help ! matt -- - Noble Paul | Systems Architect| AOL | http://aol.com
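For reference, the data directory can be pointed at the real path directly in solrconfig.xml, which removes the symlink from the picture (path taken from the thread):

  <dataDir>/raid/data</dataDir>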
Re: Using XSLT with DIH for a URLDataSource
this is the only one place this should be a problem.'xsl' is not a very commonly used attribute On Fri, Feb 26, 2010 at 10:46 AM, Lance Norskog goks...@gmail.com wrote: There could be a common 'open an url' utility method. This would help make the DIH components consistent. 2010/2/24 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: you are right. The StreamSource class is not throwing the proper exception Do we really have to handle this.? On Thu, Feb 25, 2010 at 9:06 AM, Lance Norskog goks...@gmail.com wrote: [Taken off the list] The problem is that the XSLT code swallows the real exception, and does not return it as the deeper exception. To show the right error, the code would open a file name or an URL directly. The problem is, the code has to throw an exception on a file or an URL and try the other, then decide what to do. try { URL u = new URL(xslt); iStream = u.openStream(); } catch (MalformedURLException e) { iStream = new FileInputStream(new File(xslt)); } TransformerFactory transFact = TransformerFactory.newInstance(); xslTransformer = transFact.newTransformer(new StreamSource(iStream)); On Mon, Feb 22, 2010 at 6:24 AM, Roland Villemoes r...@alpha-solutions.dk wrote: You're right! I was as simple (stupid!) as that, Thanks a lot (for your time .. very appreciated) Roland -Oprindelig meddelelse- Fra: noble.p...@gmail.com [mailto:noble.p...@gmail.com] På vegne af Noble Paul ??? ?? Sendt: 22. februar 2010 14:01 Til: solr-user@lucene.apache.org Emne: Re: Using XSLT with DIH for a URLDataSource The xslt file looks fine . is the location of the file correct ? On Mon, Feb 22, 2010 at 2:57 PM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi (thanks a lot) Yes, The full stacktrace is this: 22-02-2010 08:37:00 org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Error initializing XSL Processing Document # 1 at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:103) at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) Caused by: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:825) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:614) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:98) ... 24 more 22-02-2010 08:37:00
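For completeness, here is the open-URL-then-fall-back-to-file pattern Lance sketches above as a self-contained Java class with the imports filled in; xsltLocation is a placeholder for whatever the xsl attribute points at.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.net.MalformedURLException;
    import java.net.URL;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamSource;

    public class XsltLoader {
        public static Transformer load(String xsltLocation) throws Exception {
            InputStream in;
            try {
                // Try the location as a URL first
                in = new URL(xsltLocation).openStream();
            } catch (MalformedURLException e) {
                // Not a URL -- fall back to treating it as a plain file path
                in = new FileInputStream(new File(xsltLocation));
            }
            TransformerFactory factory = TransformerFactory.newInstance();
            return factory.newTransformer(new StreamSource(in));
        }
    }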
Re: If you could have one feature in Solr...
On Wed, Feb 24, 2010 at 7:18 PM, Patrick Sauts patrick.via...@gmail.com wrote: Synchronisation between the slaves to switch the new index at the same time after replication. I shall open an issue for this, and let us figure out how best it should be done: https://issues.apache.org/jira/browse/SOLR-1800
Re: Using XSLT with DIH for a URLDataSource
you are right. The StreamSource class is not throwing the proper exception Do we really have to handle this.? On Thu, Feb 25, 2010 at 9:06 AM, Lance Norskog goks...@gmail.com wrote: [Taken off the list] The problem is that the XSLT code swallows the real exception, and does not return it as the deeper exception. To show the right error, the code would open a file name or an URL directly. The problem is, the code has to throw an exception on a file or an URL and try the other, then decide what to do. try { URL u = new URL(xslt); iStream = u.openStream(); } catch (MalformedURLException e) { iStream = new FileInputStream(new File(xslt)); } TransformerFactory transFact = TransformerFactory.newInstance(); xslTransformer = transFact.newTransformer(new StreamSource(iStream)); On Mon, Feb 22, 2010 at 6:24 AM, Roland Villemoes r...@alpha-solutions.dk wrote: You're right! I was as simple (stupid!) as that, Thanks a lot (for your time .. very appreciated) Roland -Oprindelig meddelelse- Fra: noble.p...@gmail.com [mailto:noble.p...@gmail.com] På vegne af Noble Paul ??? ?? Sendt: 22. februar 2010 14:01 Til: solr-user@lucene.apache.org Emne: Re: Using XSLT with DIH for a URLDataSource The xslt file looks fine . is the location of the file correct ? On Mon, Feb 22, 2010 at 2:57 PM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi (thanks a lot) Yes, The full stacktrace is this: 22-02-2010 08:37:00 org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Error initializing XSL Processing Document # 1 at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:103) at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) Caused by: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:825) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:614) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:98) ... 24 more 22-02-2010 08:37:00 org.apache.solr.update.DirectUpdateHandler2 rollback My import feed (for testing is this): ?xml version='1.0' encoding='utf-8'? products product id='738' rank='10' brand id='48'![CDATA[World's Best]]/brandname![CDATA[Kontakt Cream-Special 4 x 10]]/name categories primarycategory='17' category id='7' name![CDATA[Jeans Bukser]]/name category id='17'
Re: error while using the DIH handler
can you paste the DIH part in your solrconfig.xml ? On Tue, Feb 23, 2010 at 7:01 PM, Na_D nabam...@zaloni.com wrote: yes i did check the location of the data-config.xml its in the folder example-DIH/solr/db/conf -- View this message in context: http://old.nabble.com/error-while-using-the-DIH-handler-tp27702772p2770.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Using XSLT with DIH for a URLDataSource
The xslt file looks fine . is the location of the file correct ? On Mon, Feb 22, 2010 at 2:57 PM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi (thanks a lot) Yes, The full stacktrace is this: 22-02-2010 08:37:00 org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Error initializing XSL Processing Document # 1 at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:103) at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) Caused by: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:825) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:614) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:98) ... 24 more 22-02-2010 08:37:00 org.apache.solr.update.DirectUpdateHandler2 rollback My import feed (for testing is this): ?xml version='1.0' encoding='utf-8'? products product id='738' rank='10' brand id='48'![CDATA[World's Best]]/brandname![CDATA[Kontakt Cream-Special 4 x 10]]/name categories primarycategory='17' category id='7' name![CDATA[Jeans Bukser]]/name category id='17' name![CDATA[Jeans]]/name /category /category category id='8' name![CDATA[Nyheder]]/name /category /categories description![CDATA[4 pakker med 10 stk. 
glatte kondomer, med reservoir og creme.]]/descriptionprice currency='SEK'310.70/pricesalesprice currency='SEK'233.03/salespricecolor id='227'![CDATA[4 x 10 kondomer]]/colorsize id='6'![CDATA[Large]]/sizeproductUrl![CDATA[http://www.website.se/butik/visvare.asp?id=738]]/productUrlimageUrl![CDATA[http://www.website.se/varebilleder/738_intro.jpg]]/imageUrllastmodified11-11-2008 15:10:31/lastmodified/product product id='320' rank='10' categories primarycategory='17' category id='7' name![CDATA[Jeans Bukser]]/name category id='17' name![CDATA[Jeans]]/name /category /category category id='8' name![CDATA[Nyheder]]/name /category /categories brand id='1'![CDATA[JBS]]/brandname![CDATA[JBS trusser]]/namecategory id='39'![CDATA[Trusser]]/categorydescription![CDATA[Gråmeleret JBS trusser model Classic med gylp.]]/descriptionprice currency='SEK'154.96/pricesalesprice currency='SEK'154.96/salespricecolor id='28'![CDATA[Gråmeleret]]/colorsize
Re: replications issue
What is the problem? Is the replication not happening after you do a commit on the master? Frequent polling is not a problem; frequent commits can slow down the system On Fri, Feb 19, 2010 at 2:41 PM, giskard gisk...@autistici.org wrote: Ciao, Uhm after some time a new index in data/index on the slave has been written with the ~size of the master index. the configuration on both master and slave is the same as the one on the SolrReplication wiki page enable/disable master/slave in a node requestHandler name=/replication class=solr.ReplicationHandler lst name=master str name=enable${enable.master:false}/str str name=replicateAftercommit/str str name=confFilesschema.xml,stopwords.txt/str /lst lst name=slave str name=enable${enable.slave:false}/str str name=masterUrlhttp://localhost:8983/solr/replication/str str name=pollInterval00:00:60/str /lst /requestHandler When the master is started, pass in -Denable.master=true and in the slave pass in -Denable.slave=true. Alternately, these values can be stored in a solrcore.properties file as follows #solrcore.properties in master enable.master=true enable.slave=false On 19 Feb 2010, at 03:43, Otis Gospodnetic wrote: giskard, Is this on the master or on the slave(s)? Maybe you can paste your replication handler config for the master and your replication handler config for the slave. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ From: giskard gisk...@autistici.org To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 12:16:37 PM Subject: replications issue Hi all, I've set up solr replication as described in the wiki. when i start the replication a directory called index.$numbers is created after a while it disappears and a new index.$othernumbers is created index/ remains untouched with an empty index. any clue? thank you in advance, Riccardo -- ciao, giskard -- ciao, giskard -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: @Field annotation support
On Fri, Feb 19, 2010 at 11:41 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Ok then, is this the correct class to support the @Field annotation? Because I have it on the path but it's not working. Yes, it is the right class. But what is not working? org\apache\solr\solr-solrj\1.4.0\solr-solrj-1.4.0.jar/org\apache\solr\client\solrj\beans\Field.class 2010/2/18 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: solrj jar On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support the annotation. What jar should we use to bring in this custom Solr annotation? -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
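A small hedged example of what a bean using the @Field annotation from solr-solrj-1.4.0.jar typically looks like and how it is sent to Solr; the Solr URL and field names are placeholders and assume matching fields exist in schema.xml.

    import org.apache.solr.client.solrj.beans.Field;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class AnnotatedBeanExample {
        public static class Item {
            @Field("id")    // maps this member to the Solr field named "id"
            String id;

            @Field          // with no argument, the member name is used as the field name
            String name;
        }

        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            Item item = new Item();
            item.id = "1";
            item.name = "example";
            server.addBean(item);   // SolrJ reads the @Field annotations via reflection
            server.commit();
        }
    }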
Re: @Field annotation support
solrj jar On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support the annotation. What jar should we use to bring in this custom Solr annotation? -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Preventing mass index delete via DataImportHandler full-import
On Wed, Feb 17, 2010 at 8:03 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I have a small worry though. When I call the full-import functions, can : I configure Solr (via the XML files) to make sure there are rows to : index before wiping everything? What worries me is if, for some unknown : reason, we have an empty database, then the full-import will just wipe : the live index and the search will be broken. I believe if you set clear=false when doing the full-import, DIH won't delete the entire index before it starts. It is clean=false, or use command=import instead of command=full-import. it probably makes the full-import slower (most of the adds wind up being deletes followed by adds) but it should prevent you from having an empty index if something goes wrong with your DB. the big catch is you now have to be responsible for managing deletes (using the XmlUpdateRequestHandler) yourself ... this bug looks like its goal is to make this easier to deal with (but it's not really clear to me what deletedPkQuery is ... it doesn't seem to be documented). https://issues.apache.org/jira/browse/SOLR-1168 -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
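As a worked example of the clean=false safeguard discussed above, here is a hedged SolrJ sketch that triggers a full-import without first wiping the index. It assumes the DataImportHandler is registered at /dataimport in solrconfig.xml, and the Solr URL is a placeholder; the same effect can be had by requesting the handler directly with command=full-import and clean=false.

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class FullImportNoClean {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("command", "full-import");
            params.set("clean", "false");    // keep existing docs if the DB query returns nothing
            params.set("commit", "true");
            QueryRequest request = new QueryRequest(params);
            request.setPath("/dataimport");  // assumes DIH is registered under this path
            request.process(server);
        }
    }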
Re: Solr 1.4: Full import FileNotFoundException
can we confirm that the user does not have multiple DIH configured? any request for an import, while an import is going on, is rejected On Sat, Feb 13, 2010 at 11:40 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : concurrent imports are not allowed in DIH, unless u setup multiple DIH instances Right, but that's not the issue -- the question is wether attemping to do so might be causing index corruption (either because of a bug or because of some possibly really odd config we currently know nothing about) : : I have noticed that when I run concurrent full-imports using DIH in Solr : : 1.4, the index ends up getting corrupted. I see the following in the log : : I'm fairly confident that concurrent imports won't work -- but it : shouldn't corrupt your index -- even if the DIH didn't actively check for : this type of situation, the underlying Lucene LockFactory should ensure : that one of the inports wins ... you'll need to tell us what kind of : Filesystem you are using, and show us the relevent settings from your : solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, : etc...) : : At worst you should get a lock time out exception. : : : But I looked at: : : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html : : : : and was under the impression that this issue was fixed in Solr 1.4. : : ...right, attempting to run two concurrent imports with DIH should cause : the second one to abort immediatley. : : : : : -Hoss : : : : : : -- : - : Noble Paul | Systems Architect| AOL | http://aol.com : -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr 1.4: Full import FileNotFoundException
concurrent imports are not allowed in DIH, unless u setup multiple DIH instances On Sat, Feb 13, 2010 at 7:05 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I have noticed that when I run concurrent full-imports using DIH in Solr : 1.4, the index ends up getting corrupted. I see the following in the log I'm fairly confident that concurrent imports won't work -- but it shouldn't corrupt your index -- even if the DIH didn't actively check for this type of situation, the underlying Lucene LockFactory should ensure that one of the inports wins ... you'll need to tell us what kind of Filesystem you are using, and show us the relevent settings from your solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, etc...) At worst you should get a lock time out exception. : But I looked at: : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html : : and was under the impression that this issue was fixed in Solr 1.4. ...right, attempting to run two concurrent imports with DIH should cause the second one to abort immediatley. -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DIH: delta-import not working
try this deltaImportQuery=select id, bytes from attachment where application = 'MYAPP' and id = '${dataimporter.delta.id}' be aware that the names are case sensitive . if the id comes as 'ID' this will not work On Tue, Feb 9, 2010 at 3:15 PM, Jorg Heymans jorg.heym...@gmail.com wrote: Hi, I am having problems getting the delta-import to work for my schema. Following what i have found in the list, jira and the wiki below configuration should just work but it doesn't. dataConfig dataSource name=ora driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@. user= password=/ dataSource name=orablob type=FieldStreamDataSource / document name=mydocuments entity dataSource=ora name=attachment pk=id query=select id, bytes from attachment where application = 'MYAPP' deltaImportQuery=select id, bytes from attachment where application = 'MYAPP' and id = '${dataimporter.attachment.id}' deltaQuery=select id from attachment where application = 'MYAPP' and modified_on gt; to_date('${dataimporter.attachment.last_index_time}', '-mm-dd hh24:mi:ss') field column=id name=attachmentId / entity dataSource=orablob processor=TikaEntityProcessor url=bytes dataField=attachment.bytes field column=text name=attachmentContents/ /entity /entity /document /dataConfig The sql generated in the deltaquery is correct, the timestamp is passed correctly. When i execute that query manually in the DB it returns the pk of the rows that were added. However no documents are added to the index. What am i missing here ?? I'm using a build snapshot from 03/02. Thanks Jorg -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Call URL, simply parse the results using SolrJ
you can also try URL urlo = new URL(url);// ensure that the url has wt=javabin in that NamedListObject namedList = new JavaBinCodec().unmarshal(urlo.openConnection().getInputStream()); QueryResponse response = new QueryResponse(namedList, null); On Mon, Feb 8, 2010 at 11:49 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Here's what I did to resolve this: XMLResponseParser parser = new XMLResponseParser(); URL urlo = new URL(url); InputStreamReader isr = new InputStreamReader(urlo.openConnection().getInputStream()); NamedListObject namedList = parser.processResponse(isr); QueryResponse response = new QueryResponse(namedList, null); On Mon, Feb 8, 2010 at 10:03 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: So here's what happens if I pass in a URL with parameters, SolrJ chokes: Exception in thread main java.lang.RuntimeException: Invalid base url for solrj. The base URL must not contain parameters: http://locahost:8080/solr/main/select?q=videoqt=dismax at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.init(CommonsHttpSolrServer.java:205) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.init(CommonsHttpSolrServer.java:180) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.init(CommonsHttpSolrServer.java:152) at org.apache.solr.util.QueryTime.main(QueryTime.java:20) On Mon, Feb 8, 2010 at 9:32 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Sorry for the poorly worded title... For SOLR-1761 I want to pass in a URL and parse the query response... However it's non-obvious to me how to do this using the SolrJ API, hence asking the experts here. :) -- - Noble Paul | Systems Architect| AOL | http://aol.com
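Put together as a compilable sketch, the javabin variant Noble suggests above looks roughly like this; the URL is a placeholder, and as noted the request must carry wt=javabin so the response body is actually in the javabin format.

    import java.io.InputStream;
    import java.net.URL;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.util.JavaBinCodec;
    import org.apache.solr.common.util.NamedList;

    public class RawUrlQuery {
        @SuppressWarnings("unchecked")
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://localhost:8080/solr/main/select?q=video&qt=dismax&wt=javabin");
            InputStream in = url.openConnection().getInputStream();
            try {
                // unmarshal() returns Object, so the NamedList cast is on the caller
                NamedList<Object> namedList = (NamedList<Object>) new JavaBinCodec().unmarshal(in);
                QueryResponse response = new QueryResponse(namedList, null);
                System.out.println("numFound=" + response.getResults().getNumFound());
            } finally {
                in.close();
            }
        }
    }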
Re: How to configure multiple data import types
are you referring to nested entities? http://wiki.apache.org/solr/DIHQuickStart#Index_data_from_multiple_tables_into_Solr On Mon, Feb 8, 2010 at 5:42 PM, stefan.ma...@bt.com wrote: I have got a dataimport request handler configured to index data by selecting data from a DB view I now need to index additional data sets from other views so that I can support other search queries I defined additional entity .. definitions within the document .. section of my data-config.xml But I only seem to pull in data for the 1st entity .. and not both Is there an xsd (or dtd) for data-config.xml schema.xml slrconfig.xml As these might help with understanding how to construct usable conf files Regards Stefan Maric BT Innovate Design | Collaboration Platform - Customer Innovation Solutions -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandlerException for custom DIH Transformer
On Mon, Feb 8, 2010 at 9:13 AM, Tommy Chheng tommy.chh...@gmail.com wrote: I'm having trouble making a custom DIH transformer in solr 1.4. I compiled the General TrimTransformer into a jar. (just copy/paste sample code from http://wiki.apache.org/solr/DIHCustomTransformer) I placed the jar along with the dataimporthandler jar in solr/lib (same directory as the jetty jar) do not keep in solr/lib it wont work. keep it in {solr.home}/lib Then I added to my DIH data-config.xml file: transformer=DateFormatTransformer, RegexTransformer, com.chheng.dih.transformers.TrimTransformer Now I get this exception when I try running the import. org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodException: com.chheng.dih.transformers.TrimTransformer.transformRow(java.util.Map) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.loadTransformers(EntityProcessorWrapper.java:120) I noticed the exception lists TrimTransformer.transformRow(java.util.Map) but the abstract Transformer class defines a two parameter method: transformRow(MapString, Object row, Context context)? -- Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http://tommy.chheng.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
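One way to sidestep the signature mismatch in that stack trace is to extend the DIH Transformer base class, which declares the two-argument transformRow method. A minimal trim transformer along those lines might look like the following sketch; compile it under whatever package you use and, as noted above, drop the jar in {solr.home}/lib rather than solr/lib.

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class TrimTransformer extends Transformer {
        @Override
        public Object transformRow(Map<String, Object> row, Context context) {
            // Trim every String column in the row before it is indexed
            for (Map.Entry<String, Object> entry : row.entrySet()) {
                if (entry.getValue() instanceof String) {
                    entry.setValue(((String) entry.getValue()).trim());
                }
            }
            return row;
        }
    }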
Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
unfortunately, no On Fri, Feb 5, 2010 at 2:23 PM, Jorg Heymans jorg.heym...@gmail.com wrote: dow, thanks for that Paul :-| I suppose schema validation for data-config.xml is already in Jira somewhere ? Jorg 2010/2/5 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com wrong datasource name=orablob type=FieldStreamDataSource / right dataSource name=orablob type=FieldStreamDataSource / On Thu, Feb 4, 2010 at 9:27 PM, Jorg Heymans jorg.heym...@gmail.com wrote: Hi, I'm having some troubles getting this to work on a snapshot from 3rd feb My config looks as follows dataSource name=ora driver=oracle.jdbc.OracleDriver url= / datasource name=orablob type=FieldStreamDataSource / document name=mydoc entity dataSource=ora name=meta query=select id, filename, bytes from documents field column=ID name=id / field column=FILENAME name=filename / entity dataSource=orablob processor=TikaEntityProcessor url=bytes dataField=meta.BYTES field column=text name=mainDocument/ /entity /entity /document and i get this stacktrace org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: bytes Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98) It seems that whatever is in the url attribute it is trying to execute as a query. So i thought i put url=select bytes from documents where id = ${meta.ID} but then i get a classcastexception. Caused by: java.lang.ClassCastException: org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1 at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233) Any ideas what is wrong with the config ? Thanks Jorg 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com There is no corresponding DataSurce which can be used with TikaEntityProcessor which reads from BLOB I have opened an issue.https://issues.apache.org/jira/browse/SOLR-1737 On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal ns...@columnit.com wrote: Hi, I am fairly new to Solr and would like to use the DIH to pull rich text files (pdfs, etc) from BLOB fields in my database. There was a suggestion made to use the FieldReaderDataSource with the recently commited TikaEntityProcessor. Has anyone accomplished this? This is my configuration, and the resulting error - I'm not sure if I'm using the FieldReaderDataSource correctly. If anyone could shed light on whether I am going the right direction or not, it would be appreciated. 
---Data-config.xml: dataConfig datasource name=f1 type=FieldReaderDataSource / dataSource name=orcle driver=oracle.jdbc.driver.OracleDriver url=jdbc:oracle:thin:un/p...@host:1521:sid / document entity dataSource=orcle name=attach query=select id as name, attachment from testtable2 entity dataSource=f1 processor=TikaEntityProcessor dataField=attach.attachment format=text field column=text name=NAME / /entity /entity /document /dataConfig -Debug error: response lst name=responseHeader int name=status0/int int name=QTime203/int /lst lst name=initArgs lst name=defaults str name=configtestdb-data-config.xml/str /lst /lst str name=commandfull-import/str str name=modedebug/str null name=documents/ lst name=verbose-output lst name=entity:attach lst name=document#1 str name=queryselect id as name, attachment from testtable2/str str name=time-taken0:0:0.32/str str--- row #1-/str str name=NAMEjava.math.BigDecimal:2/str str name=ATTACHMENToracle.sql.BLOB:oracle.sql.b...@1c8e807/str str-/str lst name=entity:253433571801723 str name=EXCEPTION org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :f1 available for entity :253433571801723 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(Da
Re: DataImportHandler - convertType attribute
On Wed, Feb 3, 2010 at 3:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: One thing I find awkward about convertType is that it is JdbcDataSource specific, rather than field-specific. Isn't the current implementation far too broad? it is feature of JdbcdataSource and no other dataSource offers it. we offer it because JDBC drivers have mechanism to do type conversion What do you mean by it is too broad? Erik On Feb 3, 2010, at 1:16 AM, Noble Paul നോബിള് नोब्ळ् wrote: implicit conversion can cause problem when Transformers are applied. It is hard for user to guess the type of the field by looking at the schema.xml. In Solr, String is the most commonly used type. if you wish to do numeric operations on a field convertType will cause problems. If it is explicitly set, user knows why the type got changed. On Tue, Feb 2, 2010 at 6:38 PM, Alexey Serba ase...@gmail.com wrote: Hello, I encountered blob indexing problem and found convertType solution in FAQhttp://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 I was wondering why it is not enabled by default and found the following comment http://www.lucidimagination.com/search/document/169e6cc87dad5e67/dataimporthandler_and_blobs#169e6cc87dad5e67in mailing list: We used to attempt type conversion from the SQL type to the field's given type. We found that it was error prone and switched to using the ResultSet#getObject for all columns (making the old behavior a configurable option – convertType in JdbcDataSource). Why it is error prone? Is it safe enough to enable convertType for all jdbc data sources by default? What are the side effects? Thanks in advance, Alex -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler - convertType attribute
On Wed, Feb 3, 2010 at 4:16 PM, Erik Hatcher erik.hatc...@gmail.com wrote: On Feb 3, 2010, at 5:36 AM, Noble Paul നോബിള് नोब्ळ् wrote: On Wed, Feb 3, 2010 at 3:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: One thing I find awkward about convertType is that it is JdbcDataSource specific, rather than field-specific. Isn't the current implementation far too broad? it is feature of JdbcdataSource and no other dataSource offers it. we offer it because JDBC drivers have mechanism to do type conversion What do you mean by it is too broad? I mean the convertType flag is not field-specific (or at least field overridable). Conversions occur on a per-field basis, but the setting is for the entire data source and thus all fields. Yes. it is true. First of all this is not very widely used, so fine tuning did not make sense Erik -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: java.lang.NullPointerException with MySQL DataImportHandler
On Thu, Feb 4, 2010 at 10:50 AM, Lance Norskog goks...@gmail.com wrote: I just tested this with a DIH that does not use database input. If the DataImportHandler JDBC code does not support a schema that has optional fields, that is a major weakness. Noble/Shalin, is this true? The problem is obviously not with DIH. DIH blindly passes on all the fields it could obtain from the DB. if some field is missing DIH does not do anything On Tue, Feb 2, 2010 at 8:50 AM, Sascha Szott sz...@zib.de wrote: Hi, since some of the fields used in your DIH configuration aren't mandatory (e.g., keywords and tags are defined as nullable in your db table schema), add a default value to all optional fields in your schema configuration (e.g., default = ). Note, that Solr does not understand the db-related concept of null values. Solr's log output SolrInputDocument[{keywords=keywords(1.0)={Dolce}, name=name(1.0)={Dolce amp; Gabbana Damp;G Neckties designer Tie for men 543}, productID=productID(1.0)={220213}}] indicates that there aren't any tags or descriptions stored for the item with productId 220213. Since no default value is specified, Solr raises an error when creating the index document. -Sascha Jean-Michel Philippon-Nadeau wrote: Hi, Thanks for the reply. On Tue, 2010-02-02 at 16:57 +0100, Sascha Szott wrote: * the output of MySQL's describe command for all tables/views referenced in your DIH configuration mysql describe products; ++--+--+-+-++ | Field | Type | Null | Key | Default | Extra | ++--+--+-+-++ | productID | int(10) unsigned | NO | PRI | NULL | auto_increment | | skuCode | varchar(320) | YES | MUL | NULL | | | upcCode | varchar(320) | YES | MUL | NULL | | | name | varchar(320) | NO | | NULL | | | description | text | NO | | NULL | | | keywords | text | YES | | NULL | | | disqusThreadID | varchar(50) | NO | | NULL | | | tags | text | YES | | NULL | | | createdOn | int(10) unsigned | NO | | NULL | | | lastUpdated | int(10) unsigned | NO | | NULL | | | imageURL | varchar(320) | YES | | NULL | | | inStock | tinyint(1) | YES | MUL | 1 | | | active | tinyint(1) | YES | | 1 | | ++--+--+-+-++ 13 rows in set (0.00 sec) mysql describe product_soldby_vendor; +-+--+--+-+-+---+ | Field | Type | Null | Key | Default | Extra | +-+--+--+-+-+---+ | productID | int(10) unsigned | NO | MUL | NULL | | | productVendorID | int(10) unsigned | NO | MUL | NULL | | | price | double | NO | | NULL | | | currency | varchar(5) | NO | | NULL | | | buyURL | varchar(320) | NO | | NULL | | +-+--+--+-+-+---+ 5 rows in set (0.00 sec) mysql describe products_vendors_subcategories; ++--+--+-+-++ | Field | Type | Null | Key | Default | Extra | ++--+--+-+-++ | productVendorSubcategoryID | int(10) unsigned | NO | PRI | NULL | auto_increment | | productVendorCategoryID | int(10) unsigned | NO | | NULL | | | labelEnglish | varchar(320) | NO | | NULL | | | labelFrench | varchar(320) | NO | | NULL | | ++--+--+-+-++ 4 rows in set (0.00 sec) mysql describe products_vendors_categories; +-+--+--+-+-++ | Field | Type | Null | Key | Default | Extra | +-+--+--+-+-++ | productVendorCategoryID | int(10) unsigned | NO | PRI | NULL | auto_increment | | labelEnglish | varchar(320) | NO | | NULL | | | labelFrench | varchar(320) | NO | | NULL | | +-+--+--+-+-++ 3 rows in set (0.00 sec) mysql describe product_vendor_in_subcategory; +---+--+--+-+-+---+ | Field | Type | Null | Key | Default |
Re: DataImportHandler delta-import confusion
try deltaImportQuery=select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.moment_id}' The key has to be same and in the same case On Tue, Feb 2, 2010 at 1:45 AM, Jon Drukman jdruk...@gmail.com wrote: First, let me just say that DataImportHandler is fantastic. It got my old mysql-php-xml index rebuild process down from 30 hours to 6 minutes. I'm trying to use the delta-import functionality now but failing miserably. Here's my entity tag: (some SELECT statements reduced to increase readability) entity name=moment query=select ... deltaQuery=select moment_id from moments where date_modified '${dataimporter.last_index_time}' deltaImportQuery=select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.MOMENTID}' pk=MOMENTID transformer=TemplateTransformer When I look at the MySQL query log I see the date modified query running fine and returning 3 rows. The deltaImportQuery, however, does not have the proper primary key in the where clause. It's just blank. I also tried changing it to ${moment.MOMENTID}. I don't really get the relation between the pk field and the ${dataimport.delta.whatever} stuff. Help please! -jsd- -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler delta-import confusion
Please do not hijack a thread. http://people.apache.org/~hossman/#threadhijack On Tue, Feb 2, 2010 at 11:32 PM, Leann Pereira le...@1sourcestaffing.com wrote: Hi Paul, Can you take me off this distribution list? Thanks, Leann From: noble.p...@gmail.com [noble.p...@gmail.com] On Behalf Of Noble Paul നോബിള് नोब्ळ् [noble.p...@corp.aol.com] Sent: Tuesday, February 02, 2010 2:12 AM To: solr-user@lucene.apache.org Subject: Re: DataImportHandler delta-import confusion try deltaImportQuery=select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.moment_id}' The key has to be same and in the same case On Tue, Feb 2, 2010 at 1:45 AM, Jon Drukman jdruk...@gmail.com wrote: First, let me just say that DataImportHandler is fantastic. It got my old mysql-php-xml index rebuild process down from 30 hours to 6 minutes. I'm trying to use the delta-import functionality now but failing miserably. Here's my entity tag: (some SELECT statements reduced to increase readability) entity name=moment query=select ... deltaQuery=select moment_id from moments where date_modified '${dataimporter.last_index_time}' deltaImportQuery=select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.MOMENTID}' pk=MOMENTID transformer=TemplateTransformer When I look at the MySQL query log I see the date modified query running fine and returning 3 rows. The deltaImportQuery, however, does not have the proper primary key in the where clause. It's just blank. I also tried changing it to ${moment.MOMENTID}. I don't really get the relation between the pk field and the ${dataimport.delta.whatever} stuff. Help please! -jsd- -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler - convertType attribute
implicit conversion can cause problem when Transformers are applied. It is hard for user to guess the type of the field by looking at the schema.xml. In Solr, String is the most commonly used type. if you wish to do numeric operations on a field convertType will cause problems. If it is explicitly set, user knows why the type got changed. On Tue, Feb 2, 2010 at 6:38 PM, Alexey Serba ase...@gmail.com wrote: Hello, I encountered blob indexing problem and found convertType solution in FAQhttp://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 I was wondering why it is not enabled by default and found the following comment http://www.lucidimagination.com/search/document/169e6cc87dad5e67/dataimporthandler_and_blobs#169e6cc87dad5e67in mailing list: We used to attempt type conversion from the SQL type to the field's given type. We found that it was error prone and switched to using the ResultSet#getObject for all columns (making the old behavior a configurable option – convertType in JdbcDataSource). Why it is error prone? Is it safe enough to enable convertType for all jdbc data sources by default? What are the side effects? Thanks in advance, Alex -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler problem - reading XML from a file
It clear that the xpaths provided won't fetch anything. because there is no data in those paths. what do you really wish to be indexed ? On Sun, Jan 31, 2010 at 10:30 AM, Lance Norskog goks...@gmail.com wrote: This DataImportHandler script does not find any documents in this HTML file. The DIH definitely opens the file, but the either the xpathprocessor gets no data or it does not recognize the xpaths described. Any hints? (I'm using Solr 1.5-dev, sometime recent.) Thanks! Lance xhtml-data-config.xml: dataConfig dataSource type=FileDataSource encoding=UTF-8 / document entity name=xhtml forEach=/html/head | /html/body processor=XPathEntityProcessor pk=id transformer=TemplateTransformer url=/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html field column=head_s xpath=/html/head/ field column=body_s xpath=/html/body/ /entity /document /dataConfig Sample data file: cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html ?xml version=1.0 encoding=UTF-8 ? html head meta content=en-US name=DC.language / /head body div id=header a href=ch05-tokenizers-filters-Solr1.4.htmlFirst/a span class=nolinkPrevious/span a href=ch05-tokenizers-filters-Solr1.41.htmlNext/a a href=ch05-tokenizers-filters-Solr1.460.htmlLast/a /div div dir=ltr id=content style=background-color:transparent h1 id=toc0 span class=SectionNumber1/span a id=RefHeading36402771/a a id=bkmRefHeading36402771/a Understanding Analyzers, Tokenizers, and Filters /h1 /div /body /html -- Lance Norskog goks...@gmail.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: replication setup
it is always recommended to paste your actual configuration and startup commands, instead of saying as described in wiki . On Tue, Jan 26, 2010 at 9:52 PM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Hi I have set up replication following the wiki I downloaded the latest apache-solr-1.4 release and exploded it in 2 different directories I modified both solrconfig.xml for the master the slave as described on the wiki page In both sirectory, I started solr from the example directory example on the master: java -Dsolr.solr.home=multicore -Djetty.host=0.0.0.0 -Djetty.port=8983 -DSTOP.PORT=8078 -DSTOP.KEY=stop.now -jar start.jar and on the slave java -Dsolr.solr.home=multicore -Djetty.host=0.0.0.0 -Djetty.port=8982 -DSTOP.PORT=8077 -DSTOP.KEY=stop.now -jar start.jar I can see core0 and core 1 when I open the solr url However, I don't see a replication link and the following url solr url / replication returns a 404 error I must be doing something wrong. I would appreciate any help ! thanks a lot matt -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: loading an updateProcessorChain with multicore in trunk
I guess . default=true should not be necessary if there is only one updateRequestProcessorChain specified . Open an issue On Fri, Jan 29, 2010 at 6:06 PM, Marc Sturlese marc.sturl...@gmail.com wrote: I am testing trunk and have seen a different behaviour when loading updateProcessors wich I don't know if it's normal (at least with multicore) Before I use to use an updateProcessorChain this way: requestHandler name=/update class=solr.XmlUpdateRequestHandler lst name=defaults str name=update.processormyChain/str /lst /requestHandler updateRequestProcessorChain name=myChain processor class=org.apache.solr.update.processor.CustomUpdateProcessorFactory / processor class=org.apache.solr.update.processor.LogUpdateProcessorFactory / processor class=org.apache.solr.update.processor.RunUpdateProcessorFactory / /updateRequestProcessorChain It does not work in current trunk. I have debuged the code and I have seen now UpdateProcessorChain is loaded via: public T T initPlugins(ListPluginInfo pluginInfos, MapString, T registry, ClassT type, String defClassName) { T def = null; for (PluginInfo info : pluginInfos) { T o = createInitInstance(info,type, type.getSimpleName(), defClassName); registry.put(info.name, o); if(info.isDefault()){ def = o; } } return def; } As I don't have default=true in the configuration, my custom processorChain is not used. Setting default=true makes it work: requestHandler name=/update class=solr.XmlUpdateRequestHandler lst name=defaults str name=update.processormyChain/str /lst /requestHandler updateRequestProcessorChain name=myChain default=true processor class=org.apache.solr.update.processor.CustomUpdateProcessorFactory / processor class=org.apache.solr.update.processor.LogUpdateProcessorFactory / processor class=org.apache.solr.update.processor.RunUpdateProcessorFactory / /updateRequestProcessorChain As far as I understand, if you specify the chain you want to use in here: requestHandler name=/update class=solr.XmlUpdateRequestHandler lst name=defaults str name=update.processormyChain/str /lst /requestHandler Shouldn't be necesary to set it as default. Is it going to be kept this way? Thanks in advance -- View this message in context: http://old.nabble.com/loading-an-updateProcessorChain-with-multicore-in-trunk-tp27371375p27371375.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Help using CachedSqlEntityProcessor
Thanks for pointing this out. The wiki had a problem fro a while and we could not update the documentation. It is updated here http://wiki.apache.org/solr/DataImportHandler#cached On Thu, Jan 28, 2010 at 6:31 PM, KirstyS kirst...@gmail.com wrote: Thanks, I saw that mistake and I have it working now!!! thank you for all your help. Out of interest, is the cacheKey and cacheLookup documented anywhere? Rolf Johansson-2 wrote: It's always a good thing if you can check the debug log (fx catalina.out) or run with debug/verbose to check how Solr runs trough the dataconfig. You've also made a typo in the pk and query, LinkedCatAricleId is missing a t. /Rolf Den 2010-01-28 11.20, skrev KirstyS kirst...@gmail.com: Okay, I changed my entity to look like this (have included my main entity as well): document name=ArticleDocument entity name=article pk=CmsArticleId query=Select * from vArticleSummaryDetail_SolrSearch (nolock) WHERE ArticleStatusId = 1 entity name=LinkedCategory pk=LinkedCatAricleId query=SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) processor=CachedSqlEntityProcessor cacheKey=LinkedCatArticleId cacheLookup=article.CmsArticleId /entity /entity /document BUT now the index is taking SO much longer Have I missed any other configurationg changes? Do I need to add anything into the solfconfig.xml file? Do I have my syntax completely wrong? Any help is greatly appreciated!!! -- View this message in context: http://old.nabble.com/Help-using-CachedSqlEntityProcessor-tp27337635p27355501.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr 1.4 Replication index directories
the index.20100127044500/ is a temp directory should have got cleaned up if there was no problem in replication (see the logs if there was a problem) . if there is a problem the temp directory will be used as the new index directory and the old one will no more be used.at any given point only one directory is used for the index. check the replication dashboard to check which one it is. Everything else can be deleted. On Fri, Jan 29, 2010 at 6:03 AM, mark angelillo li...@snooth.com wrote: Thanks, Otis. Responses inline. Hi, We're using the new replication and it's working pretty well. There's one detail I'd like to get some more information about. As the replication works, it creates versions of the index in the data directory. Originally we had index/, but now there are dated versions such as index.20100127044500/, which are the replicated versions. Each copy is sized in the vicinity of 65G. With our current hard drive it's fine to have two around, but 3 gets a little dicey. Sometimes we're finding that the replication doesn't always clean up after itself. I would like to understand this better, or to not have this happen. It could be a configuration issue. Some more specific questions: - Is it safe to remove the index/ directory (that doesn't have the date on it)? I think I tried this once and the whole thing broke, however maybe something else was wrong at the time. No, that's the real, live index, you don't want to remove that one. Yeah... I tried it once and remember things breaking. However nothing in this directory has been modified for over a week (since the last replication initialization). And I'm still sitting on 130GB of data for what is only 65GB on the master - Is there a way to know which one is the current one? (I'm looking at the file index.properties, and it seems to be correct, but sometimes there's a newer version in the directory, which later is removed) I think the index one is always current, no? If not, I imagine the admin replication page will tell you, or even the Statistics page. e.g. reader : SolrIndexReader{this=46a55e,r=readonlysegmentrea...@46a55e,segments=1} readerDir : org.apache.lucene.store.NIOFSDirectory@/mnt/solrhome/cores/foo/data/index reader : SolrIndexReader{this=5c3aef1,r=readonlydirectoryrea...@5c3aef1,refCnt=1,segments=9} readerDir : org.apache.lucene.store.NIOFSDirectory@/home/solr/solr_1.4/solr/data/index.20100127044500 - Could it be that the index does not finish replicating in the poll interval I give it? What happens if, say there's a poll interval X and replicating the index happens to take longer than X sometimes. (Our current poll interval is 45 minutes, and every time I'm watching it it completes in time.) you can keep a very small pollInterval and it is OK. if a replication is going on no new replication will be initiated till the old one completes I think only 1 replication will/should be happening at a time. Whew, that's comforting. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Fastest way to use solrj
how many fields are there in each doc? the binary format just reduces overhead. it does not touch/compress the payload 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I have 3 millon documents, each having 5000 chars. The xml file is about 15GB. The binary file is also about 15GB. I was a bit surprised about this. It doesn't bother me much though. At least it performs better. /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: if you write only a few docs you may not observe much difference in size. if you write large no:of docs you may observe a big difference. 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I got the binary format to work perfectly now. Performance is better than with xml. Thanks! Although, it doesn't look like a binary file is smaller in size than an xml file? /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/21 Tim Terlegård tim.terleg...@gmail.com: Yes, it worked! Thank you very much. But do I need to use curl or can I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't use BinaryWriter then I don't know how to do this. if your data is serialized using JavaBinUpdateRequestCodec, you may POST it using curl. If you are writing directly , use CommonsHttpSolrServer /Tim 2010/1/20 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/20 Tim Terlegård tim.terleg...@gmail.com: BinaryRequestWriter does not read from a file and post it Is there any other way or is this use case not supported? I tried this: $ curl host/solr/update/javabin -F stream.file=/tmp/data.bin $ curl host/solr/update -F stream.body=' commit /' Solr did read the file, because solr complained when the file wasn't in the format the JavaBinUpdateRequestCodec expected. But no data is added to the index for some reason. how did you create the file /tmp/data.bin ? what is the format? I wrote this in the first email. It's in the javabin format (I think). I did like this (groovy code): fieldId = new NamedList() fieldId.add(name, id) fieldId.add(val, 9-0) fieldId.add(boost, null) fieldText = new NamedList() fieldText.add(name, text) fieldText.add(val, Some text) fieldText.add(boost, null) fieldNull = new NamedList() fieldNull.add(boost, null) doc = [fieldNull, fieldId, fieldText] docs = [doc] root = new NamedList() root.add(docs, docs) fos = new FileOutputStream(data.bin) new JavaBinCodec().marshal(root, fos) /Tim JavaBin is a format. use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest updateRequest, OutputStream os) The output of this can be posted to solr and it should work -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Fastest way to use solrj
The binary format just reduces overhead. in your case , all the data is in the big text field which is not compressed. But overall, the parsing is a lot faster for the binary format. So you see a perf boost 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I have 6 fields. The text field is the biggest, it contains almost all of the 5000 chars. /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: how many fields are there in each doc? the binary format just reduces overhead. it does not touch/compress the payload 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I have 3 millon documents, each having 5000 chars. The xml file is about 15GB. The binary file is also about 15GB. I was a bit surprised about this. It doesn't bother me much though. At least it performs better. /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: if you write only a few docs you may not observe much difference in size. if you write large no:of docs you may observe a big difference. 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I got the binary format to work perfectly now. Performance is better than with xml. Thanks! Although, it doesn't look like a binary file is smaller in size than an xml file? /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/21 Tim Terlegård tim.terleg...@gmail.com: Yes, it worked! Thank you very much. But do I need to use curl or can I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't use BinaryWriter then I don't know how to do this. if your data is serialized using JavaBinUpdateRequestCodec, you may POST it using curl. If you are writing directly , use CommonsHttpSolrServer /Tim 2010/1/20 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/20 Tim Terlegård tim.terleg...@gmail.com: BinaryRequestWriter does not read from a file and post it Is there any other way or is this use case not supported? I tried this: $ curl host/solr/update/javabin -F stream.file=/tmp/data.bin $ curl host/solr/update -F stream.body=' commit /' Solr did read the file, because solr complained when the file wasn't in the format the JavaBinUpdateRequestCodec expected. But no data is added to the index for some reason. how did you create the file /tmp/data.bin ? what is the format? I wrote this in the first email. It's in the javabin format (I think). I did like this (groovy code): fieldId = new NamedList() fieldId.add(name, id) fieldId.add(val, 9-0) fieldId.add(boost, null) fieldText = new NamedList() fieldText.add(name, text) fieldText.add(val, Some text) fieldText.add(boost, null) fieldNull = new NamedList() fieldNull.add(boost, null) doc = [fieldNull, fieldId, fieldText] docs = [doc] root = new NamedList() root.add(docs, docs) fos = new FileOutputStream(data.bin) new JavaBinCodec().marshal(root, fos) /Tim JavaBin is a format. use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest updateRequest, OutputStream os) The output of this can be posted to solr and it should work -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
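For indexing through SolrJ (rather than hand-building javabin files), the usual way to get the binary format's parsing benefit is to switch the request writer. A brief hedged sketch, with the Solr URL and field names as placeholders:

    import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BinaryIndexing {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            // Send updates in the javabin format instead of XML
            server.setRequestWriter(new BinaryRequestWriter());

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "9-0");
            doc.addField("text", "Some text");
            server.add(doc);
            server.commit();
        }
    }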
Re: Help using CachedSqlEntityProcessor
cacheKey and cacheLookup are required attributes . On Thu, Jan 28, 2010 at 12:51 AM, KirstyS kirst...@gmail.com wrote: Thanks. I am on 1.4..so maybe that is the problem. Will try when I get back to work tomorrow. Thanks Rolf Johansson-2 wrote: I recently had issues with CachedSqlEntityProcessor too, figuring out how to use the syntax. After a while, I managed to get it working with cacheKey and cacheLookup. I think this is 1.4 specific though. It seems you have double WHERE clauses, one in the query and one in the where attribute. Try using cacheKey and cacheLookup instead in something like this: entity name=LinkedCategory pk=LinkedCatArticleId query=SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) processor=CachedSqlEntityProcessor cacheKey=LINKEDCATARTICLEID cacheLookup=article.CMSARTICLEID deltaQuery=SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), LastUpdateDate) '${dataimporter.article.last_index_time}' OR convert(varchar(50), PublishDate) '${dataimporter.article.last_index_time}' parentDeltaQuery=SELECT * from vArticleSummaryDetail_SolrSearch (nolock) field column=LinkedCategoryBC name=LinkedCategoryBreadCrumb/ /entity /Rolf Den 2010-01-27 12.36, skrev KirstyS kirst...@gmail.com: Hi, I have looked on the wiki. Using the CachedSqlEntityProcessor looks like it was simple. But I am getting no speed benefit and am not sure if I have even got the syntax correct. I have a main root entity called 'article'. And then I have a number of sub entities. One such entity is as such : entity name=LinkedCategory pk=LinkedCatAricleId query=SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}') processor=CachedSqlEntityProcessor WHERE=LinkedCatArticleId = article.CmsArticleId deltaQuery=SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}') AND (convert(varchar(50), LastUpdateDate) '${dataimporter.article.last_index_time}' OR convert(varchar(50), PublishDate) '${dataimporter.article.last_index_time}') parentDeltaQuery=SELECT * from vArticleSummaryDetail_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}') field column=LinkedCategoryBC name=LinkedCategoryBreadCrumb/ /entity As you can see I have added (for the main query - not worrying about the delta queries yet!!) the processor and the 'where' but not sure if it's correct. Can anyone point me in the right direction??? Thanks Kirsty -- View this message in context: http://old.nabble.com/Help-using-CachedSqlEntityProcessor-tp27337635p27345412.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
There is no corresponding DataSurce which can be used with TikaEntityProcessor which reads from BLOB I have opened an issue.https://issues.apache.org/jira/browse/SOLR-1737 On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal ns...@columnit.com wrote: Hi, I am fairly new to Solr and would like to use the DIH to pull rich text files (pdfs, etc) from BLOB fields in my database. There was a suggestion made to use the FieldReaderDataSource with the recently commited TikaEntityProcessor. Has anyone accomplished this? This is my configuration, and the resulting error - I'm not sure if I'm using the FieldReaderDataSource correctly. If anyone could shed light on whether I am going the right direction or not, it would be appreciated. ---Data-config.xml: dataConfig datasource name=f1 type=FieldReaderDataSource / dataSource name=orcle driver=oracle.jdbc.driver.OracleDriver url=jdbc:oracle:thin:un/p...@host:1521:sid / document entity dataSource=orcle name=attach query=select id as name, attachment from testtable2 entity dataSource=f1 processor=TikaEntityProcessor dataField=attach.attachment format=text field column=text name=NAME / /entity /entity /document /dataConfig -Debug error: response lst name=responseHeader int name=status0/int int name=QTime203/int /lst lst name=initArgs lst name=defaults str name=configtestdb-data-config.xml/str /lst /lst str name=commandfull-import/str str name=modedebug/str null name=documents/ lst name=verbose-output lst name=entity:attach lst name=document#1 str name=queryselect id as name, attachment from testtable2/str str name=time-taken0:0:0.32/str str--- row #1-/str str name=NAMEjava.math.BigDecimal:2/str str name=ATTACHMENToracle.sql.BLOB:oracle.sql.b...@1c8e807/str str-/str lst name=entity:253433571801723 str name=EXCEPTION org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :f1 available for entity :253433571801723 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(Da taImporter.java:279) at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl .java:93) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit yProcessor.java:97) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity ProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j ava:357) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j ava:383) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java :242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18 0) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte r.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java :389) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(D ataImportHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB ase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja va:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j ava:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan dler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2 16) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandler Collection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.jav a:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConne ction.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
Re: Fastest way to use solrj
if you write only a few docs you may not observe much difference in size. if you write large no:of docs you may observe a big difference. 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I got the binary format to work perfectly now. Performance is better than with xml. Thanks! Although, it doesn't look like a binary file is smaller in size than an xml file? /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/21 Tim Terlegård tim.terleg...@gmail.com: Yes, it worked! Thank you very much. But do I need to use curl or can I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't use BinaryWriter then I don't know how to do this. if your data is serialized using JavaBinUpdateRequestCodec, you may POST it using curl. If you are writing directly , use CommonsHttpSolrServer /Tim 2010/1/20 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/20 Tim Terlegård tim.terleg...@gmail.com: BinaryRequestWriter does not read from a file and post it Is there any other way or is this use case not supported? I tried this: $ curl host/solr/update/javabin -F stream.file=/tmp/data.bin $ curl host/solr/update -F stream.body=' commit /' Solr did read the file, because solr complained when the file wasn't in the format the JavaBinUpdateRequestCodec expected. But no data is added to the index for some reason. how did you create the file /tmp/data.bin ? what is the format? I wrote this in the first email. It's in the javabin format (I think). I did like this (groovy code): fieldId = new NamedList() fieldId.add(name, id) fieldId.add(val, 9-0) fieldId.add(boost, null) fieldText = new NamedList() fieldText.add(name, text) fieldText.add(val, Some text) fieldText.add(boost, null) fieldNull = new NamedList() fieldNull.add(boost, null) doc = [fieldNull, fieldId, fieldText] docs = [doc] root = new NamedList() root.add(docs, docs) fos = new FileOutputStream(data.bin) new JavaBinCodec().marshal(root, fos) /Tim JavaBin is a format. use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest updateRequest, OutputStream os) The output of this can be posted to solr and it should work -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Replication Handler Severe Error: Unable to move index file
On Fri, Jan 22, 2010 at 4:24 AM, Trey solrt...@gmail.com wrote: Unfortunately, when I went back to look at the logs this morning, the log file had been blown away... that puts a major damper on my debugging capabilities - so sorry about that. As a double whammy, we optimize nightly, so the old index files have completely changed at this point. I do not remember seeing an exception / stack trace in the logs associated with the SEVERE *Unable to move file* entry, but we were grepping the logs, so if it was outputted onto another line it could have possibly been there. I wouldn't really expect to see anything based upon the code in SnapPuller.java: /** * Copy a file by the File#renameTo() method. If it fails, it is considered a failure * p/ * Todo may be we should try a simple copy if it fails */ private boolean copyAFile(File tmpIdxDir, File indexDir, String fname, ListString copiedfiles) { File indexFileInTmpDir = new File(tmpIdxDir, fname); File indexFileInIndex = new File(indexDir, fname); boolean success = indexFileInTmpDir.renameTo(indexFileInIndex); if (!success) { LOG.error(Unable to move index file from: + indexFileInTmpDir + to: + indexFileInIndex); for (String f : copiedfiles) { File indexFile = new File(indexDir, f); if (indexFile.exists()) indexFile.delete(); } delTree(tmpIdxDir); return false; } return true; } In terms of whether this is an off case: this is the first occurrence of this I have seen in the logs. We tried to replicate the conditions under which the exception occurred, but were unable. I'll send along some more useful info if this happens again. In terms of the behavior we saw: It appears that a replication occurred and the Unable to move file error occurred. As a result, it looks like the ENTIRE index was subsequently replicated again into a temporary directory (several times, over and over). The end result was that we had multiple full copies of the index in temporary index folders on the slave, and the original still couldn't be updated (the move to ./index wouldn't work). Does Solr ever hold files open in a manner that would prevent a file in the index directory from being overridden? There is a TODO which says manual it try to copy if move (renameTo) fails. We never did it because we never observed renameTo failing. 2010/1/21 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com is it a one off case? do you observerve this frequently? On Thu, Jan 21, 2010 at 11:26 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: It's hard to tell without poking around, but one of the first things I'd do would be to look for /home/solr/cores/core8/index.20100119103919/_6qv.fnm - does this file/dir really exist? Or, rather, did it exist when the error happened. I'm not looking at the source code now, but is that really the only error you got? No exception stack trace? 
Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Trey solrt...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, January 20, 2010 11:54:43 PM Subject: Replication Handler Severe Error: Unable to move index file Does anyone know what would cause the following error?: 10:45:10 AM org.apache.solr.handler.SnapPuller copyAFile SEVERE: *Unable to move index file* from: /home/solr/cores/core8/index.20100119103919/_6qv.fnm to: /home/solr/cores/core8/index/_6qv.fnm This occurred a few days back and we noticed that several full copies of the index were subsequently pulled from the master to the slave, effectively evicting our live index from RAM (the linux os cache), and killing our query performance due to disk io contention. Has anyone experienced this behavior recently? I found an old thread about this error from early 2009, but it looks like it was patched almost a year ago: http://old.nabble.com/%22Unable-to-move-index-file%22-error-during-replication-td21157722.html Additional Relevant information: -We are using the Solr 1.4 official release + a field collapsing patch from mid December (which I believe should only affect query side, not indexing / replication). -Our Replication PollInterval for slaves checking the master is very small (15 seconds) -We have a multi-box distributed search with each box possessing multiple cores -We issue a manual (rolling) optimize across the cores on the master once a day (occurred ~ 1-2 hours before the above timeline) -maxWarmingSearchers is set to 1. -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Replication Handler Severe Error: Unable to move index file
is it a one off case? do you observerve this frequently? On Thu, Jan 21, 2010 at 11:26 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: It's hard to tell without poking around, but one of the first things I'd do would be to look for /home/solr/cores/core8/index.20100119103919/_6qv.fnm - does this file/dir really exist? Or, rather, did it exist when the error happened. I'm not looking at the source code now, but is that really the only error you got? No exception stack trace? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Trey solrt...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, January 20, 2010 11:54:43 PM Subject: Replication Handler Severe Error: Unable to move index file Does anyone know what would cause the following error?: 10:45:10 AM org.apache.solr.handler.SnapPuller copyAFile SEVERE: *Unable to move index file* from: /home/solr/cores/core8/index.20100119103919/_6qv.fnm to: /home/solr/cores/core8/index/_6qv.fnm This occurred a few days back and we noticed that several full copies of the index were subsequently pulled from the master to the slave, effectively evicting our live index from RAM (the linux os cache), and killing our query performance due to disk io contention. Has anyone experienced this behavior recently? I found an old thread about this error from early 2009, but it looks like it was patched almost a year ago: http://old.nabble.com/%22Unable-to-move-index-file%22-error-during-replication-td21157722.html Additional Relevant information: -We are using the Solr 1.4 official release + a field collapsing patch from mid December (which I believe should only affect query side, not indexing / replication). -Our Replication PollInterval for slaves checking the master is very small (15 seconds) -We have a multi-box distributed search with each box possessing multiple cores -We issue a manual (rolling) optimize across the cores on the master once a day (occurred ~ 1-2 hours before the above timeline) -maxWarmingSearchers is set to 1. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Fastest way to use solrj
2010/1/20 Tim Terlegård tim.terleg...@gmail.com: BinaryRequestWriter does not read from a file and post it Is there any other way or is this use case not supported? I tried this: $ curl host/solr/update/javabin -F stream.file=/tmp/data.bin $ curl host/solr/update -F stream.body=' commit /' Solr did read the file, because solr complained when the file wasn't in the format the JavaBinUpdateRequestCodec expected. But no data is added to the index for some reason. how did you create the file /tmp/data.bin ? what is the format? I wrote this in the first email. It's in the javabin format (I think). I did like this (groovy code): fieldId = new NamedList() fieldId.add(name, id) fieldId.add(val, 9-0) fieldId.add(boost, null) fieldText = new NamedList() fieldText.add(name, text) fieldText.add(val, Some text) fieldText.add(boost, null) fieldNull = new NamedList() fieldNull.add(boost, null) doc = [fieldNull, fieldId, fieldText] docs = [doc] root = new NamedList() root.add(docs, docs) fos = new FileOutputStream(data.bin) new JavaBinCodec().marshal(root, fos) /Tim JavaBin is a format. use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest updateRequest, OutputStream os) The output of this can be posted to solr and it should work -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Fastest way to use solrj
2010/1/19 Tim Terlegård tim.terleg...@gmail.com: There are a few ways to use solrj. I just learned that I can use the javabin format to get some performance gain. But when I try the binary format nothing is added to the index. This is how I try to use this: server = new CommonsHttpSolrServer(http://localhost:8983/solr;) server.setRequestWriter(new BinaryRequestWriter()) request = new UpdateRequest() request.setAction(UpdateRequest.ACTION.COMMIT, true, true); request.setParam(stream.file, /tmp/data.bin) request.process(server) Should this work? Could there be something wrong with the file? I haven't found a good reference for how to create a javabin file, but by reading the source code I came up with this (groovy code): BinaryRequestWriter does not read from a file and post it fieldId = new NamedList() fieldId.add(name, id) fieldId.add(val, 9-0) fieldId.add(boost, null) fieldText = new NamedList() fieldText.add(name, text) fieldText.add(val, Some text) fieldText.add(boost, null) fieldNull = new NamedList() fieldNull.add(boost, null) doc = [fieldNull, fieldId, fieldText] docs = [doc] root = new NamedList() root.add(docs, docs) fos = new FileOutputStream(data.bin) new JavaBinCodec().marshal(root, fos) I haven't found any examples of using stream.file like this with a binary file. Is it supported? Is it better/faster to use StreamingUpdateSolrServer and send everything over HTTP instead? Would code for that look something like this? while (moreDocs) { xmlDoc = readDocFromFileUsingSaxParser() doc = new SolrInputDocument() doc.addField(id, 9-0) doc.addField(text, Some text) server.add(doc) } To me it instinctively looks as if stream.file would be faster because it doesn't have to use HTTP and it doesn't have to create a bunch of SolrInputDocument objects. /Tim -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DIH delta import - last modified date
While invoking the delta-import you may pass the value as a request parameter. That value can be used in the query as ${dih.request.xyz}, where xyz is the request parameter name. On Wed, Jan 20, 2010 at 1:15 AM, Yao Ge yao...@gmail.com wrote: I am struggling with the concept of delta import in DIH. According to the documentation, the delta import will automatically record the last index time stamp and make it available to use for the delta query. However, in many cases the last_modified date time stamp in the database lags behind the current time, so the last index time stamp is not good for the delta query. Can I pick a different mechanism to generate last_index_time by using a time stamp computed from the database (such as from a column of the database)? -- View this message in context: http://old.nabble.com/DIH-delta-import---last-modified-date-tp27231449p27231449.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
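A rough sketch of that in data-config.xml (the entity, column, and parameter names are only illustrative):

<entity name="item" pk="id"
        query="select id, title, last_modified from item"
        deltaQuery="select id from item
                    where last_modified &gt; '${dih.request.lastMod}'">
  ...
</entity>

The value is then supplied on the delta-import request:

http://localhost:8983/solr/dataimport?command=delta-import&lastMod=2010-01-18T00:00:00Z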
Re: NullPointerException in ReplicationHandler.postCommit + question about compression
When you copy paste config from wiki, just copy what you need. excluding documentation and comments On Wed, Jan 13, 2010 at 12:51 AM, Stephen Weiss swe...@stylesight.com wrote: Hi Solr List, We're trying to set up java-based replication with Solr 1.4 (dist tarball). We are running this to start with on a pair of test servers just to see how things go. There's one major problem we can't seem to get past. When we replicate manually (via the admin page) things seem to go well. However, when replication is triggered by a commit event on the master, the master gets a NullPointerException and no replication seems to take place. SEVERE: java.lang.NullPointerException at org.apache.solr.handler.ReplicationHandler$4.postCommit(ReplicationHandler.java:922) at org.apache.solr.update.UpdateHandler.callPostCommitCallbacks(UpdateHandler.java:78) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:411) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:169) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:741) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:213) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) This is the master config: requestHandler name=/replication class=solr.ReplicationHandler lst name=master !--Replicate on 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string-- str name=replicateAftercommit/str !--Create a backup after 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string. Note that this is just for backup, replication does not require this. 
-- !-- str name=backupAfteroptimize/str -- !--If configuration files need to be replicated give the names here, separated by comma -- str name=confFilessolrconfig_slave.xml:solrconfig.xml,schema.xml,synonyms.txt,stopwords.txt,elevate.xml/str !--The default value of reservation is 10 secs.See the documentation below. Normally , you should not need to specify this -- str name=commitReserveDuration00:00:10/str /lst /requestHandler and... the slave config: requestHandler name=/replication class=solr.ReplicationHandler lst name=slave !--fully qualified url for the replication handler of master . It is possible to pass on this as a request param for the fetchindex command-- str name=masterUrlhttp://hostname.obscured.com:8080/solr/calendar_core/replication/str !--Interval in which the slave should poll master .Format is HH:mm:ss . If this is absent slave does not poll automatically. But a fetchindex can be triggered from the admin or the http API -- str name=pollInterval00:00:20/str !-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED-- !--to use compression while transferring the index files. The possible values are internal|external
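A trimmed-down master section without the wiki comments would look roughly like this (confFiles is optional; the file names are illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
  </lst>
</requestHandler>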
Re: Data Full Import Error
You need more memory to run dataimport. On Tue, Jan 12, 2010 at 4:46 PM, Lee Smith l...@weblee.co.uk wrote: Hi All I am trying to do a data import but I am getting the following error. INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=405 2010-01-12 03:08:08.576::WARN: Error for /solr/dataimport java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:05 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.lang.OutOfMemoryError: Java heap space Exception in thread btpool0-2 java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:14 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Jan 12, 2010 3:08:21 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback Jan 12, 2010 3:08:23 AM org.apache.solr.update.SolrIndexWriter finalize SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! This is OK. don't bother Any ideas what this can be ?? Hope you can help. Lee -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Data Full Import Error
it is the way you start your solr server( -Xmx option) On Tue, Jan 12, 2010 at 6:00 PM, Lee Smith l...@weblee.co.uk wrote: Thank you for your response. Will I just need to adjust the allowed memory in a config file or is this a server issue. ? Sorry I know nothing about Java. Hope you can advise ! On 12 Jan 2010, at 12:26, Noble Paul നോബിള് नोब्ळ् wrote: You need more memory to run dataimport. On Tue, Jan 12, 2010 at 4:46 PM, Lee Smith l...@weblee.co.uk wrote: Hi All I am trying to do a data import but I am getting the following error. INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=405 2010-01-12 03:08:08.576::WARN: Error for /solr/dataimport java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:05 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.lang.OutOfMemoryError: Java heap space Exception in thread btpool0-2 java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:14 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Jan 12, 2010 3:08:21 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback Jan 12, 2010 3:08:23 AM org.apache.solr.update.SolrIndexWriter finalize SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! This is OK. don't bother Any ideas what this can be ?? Hope you can help. Lee -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
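For the example Jetty setup, that means raising the heap on the java command line, for instance (the value is illustrative; pick one that fits your data set):

java -Xmx1024m -jar start.jar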
Re: DataImportHandler - synchronous execution
It can be added. On Tue, Jan 12, 2010 at 10:18 PM, Alexey Serba ase...@gmail.com wrote: Hi, I found that there's no explicit option to run DataImportHandler in a synchronous mode. I need that option to run DIH from SolrJ (EmbeddedSolrServer) in the same thread. Currently I pass a dummy stream to DIH as a workaround for this, but I think it makes sense to add a specific option for that. Any objections? Alex -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Synonyms from Database
On Sun, Jan 10, 2010 at 1:04 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Ravi, I think if your synonyms were in a DB, it would be trivial to periodically dump them into a text file Solr expects. You wouldn't want to hit the DB to look up synonyms at query time... Why at query time? Can it not be done at startup time? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Ravi Gidwani ravi.gidw...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 9, 2010 10:20:18 PM Subject: Synonyms from Database Hi: Is there any work done on providing synonyms from a database instead of the synonyms.txt file? The idea is to have a dictionary in the DB that can be enhanced on the fly in the application. This can then be used at query time to check for synonyms. I know I am not giving thought to the performance implications of this approach, but I would love to hear others' thoughts. ~Ravi. -- - Noble Paul | Systems Architect| AOL | http://aol.com
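A minimal sketch of such a periodic dump (plain JDBC; the connection details, table, and columns are made up for illustration) that writes the comma-separated lines synonyms.txt expects:

import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DumpSynonyms {
    public static void main(String[] args) throws Exception {
        // Hypothetical table synonym_group(term, synonyms) where synonyms is already comma separated.
        Class.forName("com.mysql.jdbc.Driver");
        Connection con = DriverManager.getConnection("jdbc:mysql://localhost/dict", "user", "pass");
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery("select term, synonyms from synonym_group");
        PrintWriter out = new PrintWriter(new FileWriter("synonyms.txt"));
        while (rs.next()) {
            // One synonym group per line, e.g. "couch, sofa, divan"
            out.println(rs.getString("term") + ", " + rs.getString("synonyms"));
        }
        out.close();
        rs.close();
        st.close();
        con.close();
        // Copy the file into the core's conf dir and reload the core so SynonymFilterFactory re-reads it.
    }
}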
Re: replication -- missing field data file
actually it does not. BTW, FYI, backup is just to take periodics backups not necessary for the Replicationhandler to work On Thu, Jan 7, 2010 at 2:37 AM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: How can you tell when the backup is done? -Original Message- From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul ??? ?? Sent: Wednesday, January 06, 2010 12:23 PM To: solr-user Subject: Re: replication -- missing field data file the index dir is in the name index others will be stored as indexdate-as-number On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: How can you differentiate between the backup and the normal index files? -Original Message- From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul ??? ?? Sent: Wednesday, January 06, 2010 11:52 AM To: solr-user Subject: Re: replication -- missing field data file On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: I set up replication between 2 cores on one master and 2 cores on one slave. Before doing this the master was working without issues, and I stopped all indexing on the master. Now that replication has synced the index files, an .FDT field is suddenly missing on both the master and the slave. Pretty much every operation (core reload, commit, add document) fails with an error like the one posted below. How could this happen? How can one recover from such an error? Is there any way to regenerate the FDT file without re-indexing everything? This brings me to a question about backups. If I run the replication?command=backup command, where is this backup stored? I've tried this a few times and get an OK response from the machine, but I don't see the backup generated anywhere. The backup is done asynchronously. So it always gives an OK response immedietly. The backup is created in the data dir itself Thanks, Gio. 
org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.lt;initgt;(SolrCore.java:579) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486) at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409) ... 18 more Caused by: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.lt;initgt;(Unknown Source) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.lt;initgt;(SimpleFSDirectory.java:78) at
Re: readOnly=true IndexReader
On Wed, Jan 6, 2010 at 4:26 PM, Patrick Sauts patrick.via...@gmail.com wrote: In the Wiki page http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, I've found: Open the IndexReader with readOnly=true. This makes a big difference when multiple threads are sharing the same reader, as it removes certain sources of thread contention. How do I open the IndexReader with readOnly=true? I can't find anything related to this parameter. Do the JVM parameters -Dslave=disabled or -Dmaster=disabled have any effect on Solr with a standard solrconfig.xml? These are not variables used by Solr. They are just substituted in solrconfig.xml and probably consumed by ReplicationHandler (this is not a standard). Thank you for your answers. Patrick. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: replication -- missing field data file
On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: I set up replication between 2 cores on one master and 2 cores on one slave. Before doing this the master was working without issues, and I stopped all indexing on the master. Now that replication has synced the index files, an .FDT field is suddenly missing on both the master and the slave. Pretty much every operation (core reload, commit, add document) fails with an error like the one posted below. How could this happen? How can one recover from such an error? Is there any way to regenerate the FDT file without re-indexing everything? This brings me to a question about backups. If I run the replication?command=backup command, where is this backup stored? I've tried this a few times and get an OK response from the machine, but I don't see the backup generated anywhere. The backup is done asynchronously. So it always gives an OK response immedietly. The backup is created in the data dir itself Thanks, Gio. org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.lt;initgt;(SolrCore.java:579) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486) at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409) ... 
18 more Caused by: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.lt;initgt;(Unknown Source) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.lt;initgt;(SimpleFSDirectory.java:78) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.lt;initgt;(SimpleFSDirectory.java:108) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65) at org.apache.lucene.index.FieldsReader.lt;initgt;(FieldsReader.java:104) at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599) at org.apache.lucene.index.DirectoryReader.lt;initgt;(DirectoryReader.java:103) at org.apache.lucene.index.ReadOnlyDirectoryReader.lt;initgt;(ReadOnlyDirectoryReader.java:27) at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:68) at org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at
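Regarding the backup command discussed above, it is a plain HTTP call, and the result shows up under the core's data directory (typically as a timestamped snapshot directory next to index/). The host and path here are illustrative:

curl 'http://master-host:8983/solr/replication?command=backup'
# then check the data dir, e.g.
ls $SOLR_HOME/data/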
Re: replication -- missing field data file
the index dir is in the name index others will be stored as indexdate-as-number On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: How can you differentiate between the backup and the normal index files? -Original Message- From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul ??? ?? Sent: Wednesday, January 06, 2010 11:52 AM To: solr-user Subject: Re: replication -- missing field data file On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: I set up replication between 2 cores on one master and 2 cores on one slave. Before doing this the master was working without issues, and I stopped all indexing on the master. Now that replication has synced the index files, an .FDT field is suddenly missing on both the master and the slave. Pretty much every operation (core reload, commit, add document) fails with an error like the one posted below. How could this happen? How can one recover from such an error? Is there any way to regenerate the FDT file without re-indexing everything? This brings me to a question about backups. If I run the replication?command=backup command, where is this backup stored? I've tried this a few times and get an OK response from the machine, but I don't see the backup generated anywhere. The backup is done asynchronously. So it always gives an OK response immedietly. The backup is created in the data dir itself Thanks, Gio. org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.lt;initgt;(SolrCore.java:579) at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:425) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486) at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409) ... 18 more Caused by: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.lt;initgt;(Unknown Source) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.lt;initgt;(SimpleFSDirectory.java:78) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.lt;initgt;(SimpleFSDirectory.java:108) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65) at org.apache.lucene.index.FieldsReader.lt;initgt;(FieldsReader.java:104) at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599) at
Re: replicating extension JARs
Jars are not replicated. It is by design. But that is not to say that we can't do it; open an issue. On Wed, Jan 6, 2010 at 6:20 AM, Ryan Kennedy rcken...@gmail.com wrote: Will the built-in Solr replication replicate extension JAR files in the lib directory? The documentation appears to indicate that only the index and any specified configuration files will be replicated; however, if your solrconfig.xml references a class in a JAR file added to the lib directory then you'll need that replicated as well (otherwise the slave will encounter ClassDefNotFound exceptions). I'm wondering if I'm missing something and Solr replication will do that, or if it's a deficiency in Solr's replication. Ryan -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr Replication Questions
On Wed, Jan 6, 2010 at 2:51 AM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: http://wiki.apache.org/solr/SolrReplication I've been looking over this replication wiki and I'm still unclear on a two points about Solr Replication: 1. If there have been small changes to the index on the master, does the slave copy the entire contents of the index files that were affected? only the delta is copied. a. Let's say I add one document to the master. Presumably that causes changes to the position file, amidst a few others. Does the slave download the entire position file? Or just the portion that was changed? Lucene never modifies a file which was written by previous commits. So if you add a new document and commit , it is written to new files. Solr replication will only replicate those new files 2. If you have a multi-core slave, is it possible to share one configuration file (i.e. one instance directory) amidst the multiple cores, and yet each core poll a different master? a. Can you set the masterUrl for each core separately in the server.xml? Thanks for your help, Gio. -- - Noble Paul | Systems Architect| AOL | http://aol.com
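One way to share a single solrconfig.xml across slave cores while pointing each at its own master is property substitution: declare a property per core in solr.xml and reference it in the shared config. A sketch (core names and URLs are illustrative; each core still needs its own data directory):

<!-- solr.xml -->
<cores adminPath="/admin/cores">
  <core name="core1" instanceDir="shared/">
    <property name="masterUrl" value="http://master1:8983/solr/core1/replication"/>
  </core>
  <core name="core2" instanceDir="shared/">
    <property name="masterUrl" value="http://master2:8983/solr/core2/replication"/>
  </core>
</cores>

<!-- shared solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">${masterUrl}</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>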
Re: serialize SolrInputDocument to java.io.File and back again?
What serialization would you wish to use? You can use Java serialization, or SolrJ can serialize it in XML or javabin format (org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec). On Thu, Dec 31, 2009 at 6:55 AM, Phillip Rhodes rhodebumpl...@gmail.com wrote: I want to store a SolrInputDocument to the filesystem until it can be sent to the Solr server via the SolrJ client. I will be using a Quartz job to periodically query a table that contains a listing of SolrInputDocuments stored as java.io.File that need to be processed. Thanks for your time. -- - Noble Paul | Systems Architect| AOL | http://aol.com
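For the XML route, a small sketch using SolrJ's ClientUtils (the file name and field values are illustrative); the file ends up holding the <doc> fragment of the XML update format, which can later be wrapped in <add> and posted to /update:

import java.io.FileWriter;

import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrInputDocument;

public class DocToXmlFile {
    public static void main(String[] args) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        doc.addField("name", "example");

        // ClientUtils.toXML renders the document as a <doc>...</doc> fragment.
        FileWriter w = new FileWriter("/tmp/doc-42.xml");
        w.write(ClientUtils.toXML(doc));
        w.close();
    }
}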
Re: fl parameter and dynamic fields
If you wish to search on fields using a wildcard, you have to use a copyField to copy all the values of Bool_* to another field and search on that field. On Tue, Dec 29, 2009 at 4:14 AM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] timothy.j.har...@nasa.gov wrote: I use dynamic fields heavily in my Solr config. I would like to be able to specify which fields should be returned from a query based on a pattern for the field name. For instance, given: dynamicField name=Bool_* type=boolean indexed=true stored=true / I might be able to construct a query like: http://localhost:8080/solr/select?q=Bool_*:true&rows=10 Is there something like this in Solr? Thanks, Tim Harsch -- - Noble Paul | Systems Architect| AOL | http://aol.com
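In schema.xml, that copyField setup looks roughly like this (the destination field name is illustrative):

<dynamicField name="Bool_*" type="boolean" indexed="true" stored="true"/>

<field name="all_bools" type="boolean" indexed="true" stored="false" multiValued="true"/>

<copyField source="Bool_*" dest="all_bools"/>

Then you can query q=all_bools:true without knowing the individual Bool_* field names.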
Re: Problem with simple use of DIH
did you run it w/o the debug? On Sun, Dec 27, 2009 at 6:31 PM, AHMET ARSLAN iori...@yahoo.com wrote: I'm trying to use DataImportHandler to load my index and having some strange results. I have two tables in my database. DPRODUC contains products and FSKUMAS contains the skus related to each product. This is the data-config I'm using. dataConfig dataSource type=JdbcDataSource driver=com.ibm.as400.access.AS400JDBCDriver url=jdbc:as400:IWAVE;prompt=false;naming=system user=IPGUI password=IPGUI/ document entity name=dproduc query=select dprprd, dprdes from dproduc where dprprd like 'F%' field column=dprprd name=id / field column=dprdes name=name / entity name=fskumas query=select fsksku, fcoclr, fszsiz, fskret from fskumas where dprprd='${dproduc.DPRPRD}' field column=fsksku name=sku / field column=fcoclr name=color / field column=fszsiz name=size / field column=fskret name=price / /entity /entity /document /dataConfig What is the primary key of dproduc table? If it is dprprd can you try adding pk=dprprd to entity name=dproduc? entity name=dproduc pk=dprprd query=select dprprd, dprdes from dproduc where dprprd like 'F%' -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Problem with simple use of DIH
The field names are case sensitive. But if the field tags are missing they are mapped to corresponding solr fields in a case insensistive way.apparently all the fields come out of you ALL CAPS you should put the 'column' values in ALL CAPS too On Sun, Dec 27, 2009 at 9:03 PM, Jay Fisher jay.l.fis...@gmail.com wrote: I did run it without debug and the result was that 0 documents were processed. The problem seems to be with the field tags that I was using to map from the table column names to the schema.xml field names. I switched to using an AS clause in the SQL statement instead and it worked. I think the column names may be case-sensitive, although I haven't proven that to be the case. I did discover that references to column names in the velocity template are case sensitive; ${dproduc.DPRPRD} works and ${dproduc.dprprd} does not. Thanks, Jay 2009/12/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com did you run it w/o the debug? On Sun, Dec 27, 2009 at 6:31 PM, AHMET ARSLAN iori...@yahoo.com wrote: I'm trying to use DataImportHandler to load my index and having some strange results. I have two tables in my database. DPRODUC contains products and FSKUMAS contains the skus related to each product. This is the data-config I'm using. dataConfig dataSource type=JdbcDataSource driver=com.ibm.as400.access.AS400JDBCDriver url=jdbc:as400:IWAVE;prompt=false;naming=system user=IPGUI password=IPGUI/ document entity name=dproduc query=select dprprd, dprdes from dproduc where dprprd like 'F%' field column=dprprd name=id / field column=dprdes name=name / entity name=fskumas query=select fsksku, fcoclr, fszsiz, fskret from fskumas where dprprd='${dproduc.DPRPRD}' field column=fsksku name=sku / field column=fcoclr name=color / field column=fszsiz name=size / field column=fskret name=price / /entity /entity /document /dataConfig What is the primary key of dproduc table? If it is dprprd can you try adding pk=dprprd to entity name=dproduc? entity name=dproduc pk=dprprd query=select dprprd, dprdes from dproduc where dprprd like 'F%' -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: suggestions for DIH batchSize
A bigger batchSize results in increased memory usage. I guess performance should be slightly better with bigger values, but I have not verified that. On Wed, Dec 23, 2009 at 2:51 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, from looking at the code it looks like the default is 500. Is that the recommended setting? Has anyone noticed any significant performance/memory tradeoffs from making this much bigger? thanks Joel -- - Noble Paul | Systems Architect| AOL | http://aol.com
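batchSize is set on the JDBC data source in data-config.xml, for example (connection details and value are illustrative):

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/db"
            user="user" password="pass"
            batchSize="1000"/>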
Re: Documents are indexed but not searchable
just search for *:* and see if the docs are indeed there in the index. --Noble On Mon, Dec 21, 2009 at 9:26 AM, krosan kro...@gmail.com wrote: Hi, I'm trying to test solr for a proof of concept project, but I'm having some problems. I indexed my document, but when I search for a word which is 100% certain in the document, I don't get any hits. These are my files: First: my data-config.xml dataConfig dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://host.com:3306/crossfire3 user=user password=pass batchSize=1/ document entity name=users query=select username, password, email from users field column=username name=username / field column=password name=password / field column=email name=email / /entity /document /dataConfig Now, I have used this in the debugger, and with commit on, and verbose on, I get this reply: http://pastebin.com/m7a460711 This clearly states that those 2 rows have been processed and are now in the index. However, when I try to do a search with the http parameters, I get this response: For the hyperlink http://localhost:8080/solr/select?q=username:krosandebugQuery=on this is the response: http://pastebin.com/m7bb1dcaa I'm clueless on what the problem could be! These are my two config files: schema.xml: http://pastebin.com/m1fd1da58 solrconfig.xml: http://pastebin.com/m44b73d83 (look for krosan in the documents to see what I've added to the standard docs) Any help will be greatly appreciated! Thanks in advance, Andreas Evers -- View this message in context: http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26868925.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
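For example:

http://localhost:8080/solr/select?q=*:*&rows=10

If numFound is 0, the documents never made it into the index (or were not committed); if it is non-zero, the problem is more likely in how the username field is analyzed or queried.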
Re: Is DataImportHandler ThreadSafe???
On Sat, Dec 19, 2009 at 2:16 PM, gurudev suyalprav...@yahoo.com wrote: Hi, Just wanted to know, is the DataImportHandler available in Solr 1.3 thread-safe? I would like to use multiple instances of the data import handler running concurrently, posting my various sets of data from the DB to the index. Can I do this by registering the DIH multiple times with various names in solrconfig.xml and then invoking all of them concurrently to achieve maximum throughput? Would I need to define different data-config.xml's and dataimport.properties for each DIH? Yes, this should work. It is thread-safe. Would it be possible to specify the query in data-config.xml to restrict one DIH from overlapping the data-set fetched by another DIH through some SQL clauses? -- View this message in context: http://old.nabble.com/Is-DataImportHandler-ThreadSafetp26853521p26853521.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
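Registering the handler multiple times, each with its own config file, looks roughly like this in solrconfig.xml (handler and file names are illustrative):

<requestHandler name="/dataimport-products" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">products-data-config.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport-users" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">users-data-config.xml</str>
  </lst>
</requestHandler>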
Re: shards parameter
Yes. Put it under the defaults section in your standard request handler. On Thu, Dec 17, 2009 at 5:22 PM, pcurila p...@eea.sk wrote: Hello, is there any way to configure the shards parameter in solrconfig.xml, so I do not need to provide it in the URL? Thanks Peter -- View this message in context: http://old.nabble.com/shards-parameter-tp26826908p26826908.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
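For example (host names are illustrative):

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="shards">host1:8983/solr,host2:8983/solr</str>
  </lst>
</requestHandler>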
Re: solr core size on disk
Look at the index dir and see the size of the files. It is typically in $SOLR_HOME/data/index. On Thu, Dec 17, 2009 at 2:56 AM, Matthieu Labour matth...@kikin.com wrote: Hi, I am new to Solr. Here is my question: how do I find out the size of a Solr core on disk? Thank you matt -- - Noble Paul | Systems Architect| AOL | http://aol.com
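On *nix, for example:

du -sh $SOLR_HOME/data/index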
Re: question regarding dynamic fields
Use a copyField to copy those fields to another field and search on that field. On Mon, Dec 14, 2009 at 1:00 PM, Phanindra Reva reva.phanin...@gmail.com wrote: Hello, I have observed that text or keywords indexed using the dynamicField concept are searchable only when we also mention the field name while querying. Am I wrong with my observation, or is that the default and cannot be changed? I am just wondering if there is any route to search the text indexed using dynamicFields without having to mention the field name in the query. Thanks. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Request Assistance with DIH
On Sat, Dec 12, 2009 at 6:15 AM, Robbin rob...@drivesajeep.com wrote: I've been trying to use the DIH with Oracle and would love it if someone could give me some pointers. I put the ojdbc14.jar in both the Tomcat lib and solr home/lib. I created a dataimport.xml and enabled it in the solrconfig.xml. I go to the http://solr server/solr/admin/dataimport.jsp. This all seems to be fine, but I get the default page response and it doesn't look like the connection to the Oracle server is even attempted. Did you trigger an import? What is the message on the web page, and what do the logs say? I'm using the Solr 1.4 release on Nov 10. Do I need an Oracle client on the server? I thought having the ojdbc jar should be sufficient. Any help or configuration examples for setting this up would be much appreciated. You need all the jars you would normally use to connect to Oracle. Thanks Robbin -- - Noble Paul | Systems Architect| AOL | http://aol.com
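A typical Oracle dataSource entry in the DIH config file looks like this (connection details are illustrative); the import itself is then triggered with command=full-import on the handler, not just by opening the admin page:

<dataSource type="JdbcDataSource"
            driver="oracle.jdbc.driver.OracleDriver"
            url="jdbc:oracle:thin:@dbhost:1521:ORCL"
            user="scott" password="tiger"/>

http://localhost:8983/solr/dataimport?command=full-import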
Re: apache-solr-common.jar
There is no solr-common jar anymore. You may use the solrj jar, which contains all the classes that were in the common jar. On Mon, Dec 14, 2009 at 9:22 PM, gudumba l gudumba.sm...@gmail.com wrote: Hello All, I have been using apache-solr-common-1.3.0.jar in my module. I am planning to shift to the latest version because it has more flexibility, but it is really strange that I don't find any corresponding jar in the latest version. I have searched the whole Apache Solr 1.4 folder (downloaded from the site) but have not found any. I am sorry, it is really silly to request a jar, but I have no option. Thanks. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Custom Field sample?
how exactly do you wish to query these documents? On Fri, Dec 11, 2009 at 4:35 PM, Antonio Zippo reven...@yahoo.it wrote: I need to add theese features to each document Document1 --- Argument1, positive Argument2, positive Argument3, neutral Argument4, positive Argument5, negative Argument6, negative Document2 --- Argument1, negative Argument2, positive Argument3, negative Argument6, negative Argument7, neutral where the argument name is dynamic using a relational database I could use a master detail structure, but in solr? I thought about a Map or Pair field Da: Grant Ingersoll gsing...@apache.org A: solr-user@lucene.apache.org Inviato: Gio 10 dicembre 2009, 19:47:55 Oggetto: Re: Custom Field sample? Can you perhaps give a little more info on what problem you are trying to solve? FWIW, there are a lot of examples of custom FieldTypes in the Solr code. On Dec 10, 2009, at 11:46 AM, Antonio Zippo wrote: Hi all, could you help me to create a custom field? I need to create a field structured like a Map is it possible? how to define if the search string is on key or value (or both)? A way could be to create a char separated multivalued string field... but it isn't the best way. and with facets is the worst way could you give me a custom field sample? Thanks in advance, Revenge -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: indexing XML with solr example webapp - out of java heap space
The post.jar does not stream. Use curl if you are using *nix. --Noble On Wed, Dec 9, 2009 at 12:28 AM, Feroze Daud fero...@zillow.com wrote: Hi! I downloaded Solr and am trying to index an XML file. This XML file is huge (500M). When I try to index it using the post.jar tool in example\exampledocs, I get an out of Java heap space error in the SimplePostTool application. Any ideas how to fix this? Passing in -Xms1024M does not fix it. Feroze. -- - Noble Paul | Systems Architect| AOL | http://aol.com
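For example, posting the file with curl avoids the client-side Java heap entirely (host and file name are illustrative):

curl 'http://localhost:8983/solr/update' \
     -H 'Content-type:text/xml; charset=utf-8' \
     --data-binary @huge.xml

curl 'http://localhost:8983/solr/update' -H 'Content-type:text/xml; charset=utf-8' --data-binary '<commit/>'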