Re: How to use Solr in my project
On 29 December 2013 11:10, Fatima Issawi issa...@qu.edu.qa wrote:
[...] We will have the full text stored, but we want to highlight the text in the original image. I expect to process the image after retrieval. We do plan on storing the (x, y) coordinates of the words in a database - I suspected that it would be too expensive to store them in Solr. I guess I'm still confused about how to use Solr to index the document, but then retrieve the (x, y) coordinates of the search term from the database. Is this possible? If so, can you give an example of how this can be done?

Storing and retrieving the coordinates from Solr will likely be faster than from the database. However, I still think that you should think more carefully about your use case of highlighting the images. It can be done, but it is a significant amount of work, and will need storage and computational resources.

1. For highlighting in the image, you will need to store two sets of coordinates (e.g., top-right and bottom-left corners), as you do not know the length of the word in the image. Thus, with, say, 15 words per line, 50 lines per page, and 100 pages per document, you will need to store: 4 x 15 x 50 x 100 = 300,000 coordinates/document.
2. Also, how are you going to get the coordinates in the first place?

Regards,
Gora
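Gora's back-of-the-envelope figure can be checked directly. The per-line, per-page, and per-document counts are the ones stated in the message; the 4-bytes-per-coordinate figure used for the rough storage total is an added assumption:

```java
// Back-of-the-envelope check of the coordinate count estimated above.
public class CoordinateEstimate {
    public static void main(String[] args) {
        int coordsPerWord = 4;      // two corner points, each an (x, y) pair
        int wordsPerLine = 15;
        int linesPerPage = 50;
        int pagesPerDocument = 100;

        long coordsPerDocument =
            (long) coordsPerWord * wordsPerLine * linesPerPage * pagesPerDocument;
        System.out.println(coordsPerDocument); // 300000

        // Assuming 4-byte integer coordinates: ~1.2 MB of raw coordinate
        // data per document, before any index or storage overhead.
        System.out.println(coordsPerDocument * 4); // 1200000
    }
}
```

At 300,000 values per document, the per-document cost is modest either way; the real question is the total corpus size and how the coordinates are fetched at query time.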
Re: Maybe a bug for solr 4.6 when create a new core
Hi Mark Miller,

It's great that you have fixed the bug. By the way, there is another point I want to remind you of, which I posted before. I am posting it again; please check whether the autoCreated check here is a good fit.

After I successfully create a new core with an explicit shard and coreNodeName (once I fixed the bug), I cannot create a replica for the above new core, also with an explicit coreNodeName and the same shard and collection. The request URL is as follows:

http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test1&shard=Test&collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol&coreNodeName=Test1

It responds with an error:

<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">29</int>
  </lst>
  <lst name="error">
    <str name="msg">Error CREATEing SolrCore 'Test1': Test1 is removed</str>
    <int name="code">400</int>
  </lst>
</response>

I found the source in org.apache.solr.cloud.ZkController, lines 1369~1384 [1]. As the code shows, when I indicate a coreNodeName and collection explicitly, it goes on to check an 'autoCreated' property of the collection, which I have already created. My question: why does it need to check the 'autoCreated' property? Is there any JIRA about this 'autoCreated' property? How can I get through the check?
[1]----------------------------------------------------------------------
    try {
      if (cd.getCloudDescriptor().getCollectionName() != null
          && cd.getCloudDescriptor().getCoreNodeName() != null) {
        // we were already registered
        if (zkStateReader.getClusterState().hasCollection(cd.getCloudDescriptor().getCollectionName())) {
          DocCollection coll = zkStateReader.getClusterState()
              .getCollection(cd.getCloudDescriptor().getCollectionName());
          if (!"true".equals(coll.getStr("autoCreated"))) {
            Slice slice = coll.getSlice(cd.getCloudDescriptor().getShardId());
            if (slice != null) {
              if (slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) == null) {
                log.info("core_removed This core is removed from ZK");
                throw new SolrException(ErrorCode.NOT_FOUND, coreNodeName + " is removed");
              }
            }
          }
        }
      }
    }
----------------------------------------------------------------------

2013/12/27 YouPeng Yang yypvsxf19870...@gmail.com:
Hi Mark, I have filed a JIRA about the NPE: https://issues.apache.org/jira/browse/SOLR-5580

2013/12/27 YouPeng Yang yypvsxf19870...@gmail.com:
Hi Mark. Thanks for your reply. I will file a JIRA issue about the NPE. By the way, would you look through Question 2? [...]
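For readers skimming the quoted guard, here is a distilled restatement of its decision as a standalone boolean. The method and parameter names are mine, not Solr's; this only restates the control flow of the snippet above:

```java
// Distilled sketch of the ZkController guard quoted in [1]: a core CREATE with an
// explicit coreNodeName is rejected ("is removed", HTTP 400) when the collection
// exists, is not marked autoCreated, the target slice exists, and the slice does
// not already list that coreNodeName as a replica.
public class AutoCreatedCheck {
    static boolean isRemoved(boolean collectionExists, boolean autoCreated,
                             boolean sliceExists, boolean replicaListed) {
        if (!collectionExists) return false;
        if (autoCreated) return false;   // the check is skipped for auto-created collections
        if (!sliceExists) return false;
        return !replicaListed;           // -> SolrException NOT_FOUND, "... is removed"
    }

    public static void main(String[] args) {
        // YouPeng's case: collection exists, not autoCreated, slice "Test" exists,
        // but coreNodeName "Test1" is not yet a replica of that slice.
        System.out.println(isRemoved(true, false, true, false)); // true -> rejected
    }
}
```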
Re: lots of tlog.
Hi Mark Miller,

How can a log replay fail? I cannot figure out the reason for the exception; there seems to be no BigDecimal-typed field in my schema. Please give some suggestions. The exception:

133462 [recoveryExecutor-48-thread-1] WARN org.apache.solr.update.UpdateLog – Starting log replay hdfs tlog{file=hdfs://lklcluster/solr/repCore/repCore/core_node20/data/tlog/tlog.693 refcount=2} active=false starting pos=0
133576 [recoveryExecutor-48-thread-1] WARN org.apache.solr.update.UpdateLog – REYPLAY_ERR: IOException reading log
org.apache.solr.common.SolrException: Invalid Number: java.math.BigDecimal:238124404
    at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396)
    at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:98)
    ...
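For what it's worth, the "Invalid Number" message shows the field value arriving as the string "java.math.BigDecimal:238124404". Purely as an illustration (this is not Solr's actual code path), any plain numeric parse of such a type-tagged string fails the same way:

```java
// Hypothetical illustration of the failure mode: a trie numeric field needs a
// plain numeric string, but the replayed value carries a Java type tag.
public class ReplayParseSketch {
    public static void main(String[] args) {
        String raw = "java.math.BigDecimal:238124404"; // value as shown in the log
        try {
            Long.parseLong(raw);
        } catch (NumberFormatException e) {
            System.out.println("Invalid Number: " + raw);
        }
    }
}
```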
Re: Questions about integrating SolrCloud with HDFS
Hi Mark,

Is there a roadmap or plan for your further work on Solr with HDFS? I am looking forward to your great work.

Regards

2013/12/27 Mark Miller markrmil...@gmail.com:
1. The exception and change in experience on the move to 4.6 seems like it could be a bug we want to investigate.
2. Solr storing data on HDFS in other ways seems like a different issue / improvement.
3. You shouldn't try and force more than one core to use the same index on HDFS. This would be bad.
4. You really want to use the solr.hdfs.home setting described in the documentation, IMO.
- Mark

On Dec 26, 2013, at 1:56 PM, Greg Walters greg.walt...@answers.com wrote:
Mark, I'd be happy to, but some clarification first; should this issue be about creating cores with overlapping names and the stack trace that YouPeng initially described, Solr's behavior when storing data on HDFS, or YouPeng's other thread (Maybe a bug for solr 4.6 when create a new core) that looks like it might be a near duplicate of this one? Thanks, Greg

On Dec 26, 2013, at 12:40 PM, Mark Miller markrmil...@gmail.com wrote:
Can you file a JIRA issue? - Mark

On Dec 24, 2013, at 2:57 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote:
Hi users,

Solr supports writing and reading its index and transaction log files to/from the HDFS distributed filesystem.

I am curious whether there are any other further improvements planned for the integration with HDFS. Solr's native replication makes multiple copies of the master node's index, but given HDFS's own replication there is no need for that - would it be enough for multiple cores in SolrCloud to share the same index directory on HDFS?

The above supposition is what I want to achieve while integrating SolrCloud with HDFS (Solr 4.6). To keep our application highly available, we still have to use Solr replication, with some tricks.
Firstly, note that Solr's index directory is made up of:

collectionName/coreNodeName/data/index
collectionName/coreNodeName/data/tlog

So to achieve this, we want to create multiple cores that use the same HDFS index directory. I have tested this with Solr 4.4 by explicitly indicating the same coreNodeName. For example:

Step 1: a core was created with name=core1, shard=core_shard1, collection=collection1, and coreNodeName=core1.
Step 2: create another core with name=core2, shard=core_shard1, collection=collection1, and coreNodeName=core1.

The two cores share the same shard, collection, and coreNodeName. As a result, the two cores get the same index data, which is stored in the HDFS directories:

hdfs://myhdfs/collection1/core1/data/index
hdfs://myhdfs/collection1/core1/data/tlog

Unfortunately, when Solr 4.6 was released and we upgraded, the above goal failed: we could not create a core with both an explicit shard and coreNodeName. The exceptions are as in [1].

Can someone give some help?

Regards

[1]----------------------------------------------------------------------
64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController ?.publishing core=hdfstest3 state=down
64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController ?.numShards not found on descriptor - reading it from system property
64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController ?.look for our core node name
64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore ?.[reportCore_201208] webapp=/solr path=/replication params={slave=false&command=details&wt=javabin&qt=/replication&version=2} status=0 QTime=107
65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController ?.waiting to find shard id in clusterstate for hdfstest3
65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore 'hdfstest3': Could not get shard id for core: hdfstest3
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
Redis as Solr Cache
While researching Solr caching options and interesting cases, I bumped into https://github.com/dfdeshom/solr-redis-cache. Does anyone have any experience with this setup - using Redis as a Solr cache?

I see a lot of advantage in having a distributed cache for Solr: one Solr node benefiting from the cache generated on another would be beautiful. I see problems too. Performance-wise, I don't know if it would be viable for Solr to write its cache over the network to the Redis master node. And what about Solr nodes with different index versions looking at the same cache? IMO, if Redis isn't used as a distributed cache, I don't think it's possible to get better performance by using it.

--
Alexander Ramos Jardim
Re: Redis as Solr Cache
On Sun, Dec 29, 2013, at 02:35 PM, Alexander Ramos Jardim wrote:
While researching Solr caching options and interesting cases, I bumped into https://github.com/dfdeshom/solr-redis-cache. Does anyone have any experience with this setup - using Redis as a Solr cache? [...]

This idea makes assumptions about how a Solr/Lucene index operates. In a SolrCloud setup, each node is responsible for its own committing, and its caches exist for the timespan between commits. Thus, the cache one node needs will not necessarily be the same as the one needed by another node, whose commit interval may be slightly out of sync with the first's. So, whilst this may be possible, and may give some benefits, I'd reckon that it would be a rather substantial engineering exercise, rather than the quick win you seem to be assuming it might be.

Upayavira
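Upayavira's commit-interval point can be made concrete with a small sketch. This is a hypothetical model only - a plain HashMap standing in for a remote Redis instance, with invented version numbers - not the linked project's actual design:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: cache entries are only valid for one view of the index, so a shared
// cache would have to key entries by the searcher's index version. Nodes whose
// commits are out of sync then miss each other's entries anyway.
public class VersionedCacheSketch {
    // Stand-in for a shared remote cache (no Redis client is used here).
    static final Map<String, Object> sharedCache = new HashMap<>();

    static String key(String query, long indexVersion) {
        return indexVersion + "|" + query;
    }

    public static void main(String[] args) {
        long nodeAVersion = 101; // node A just committed
        long nodeBVersion = 100; // node B's commit is slightly out of sync

        sharedCache.put(key("q=foo", nodeAVersion), "cached results from node A");

        // Identical query, different index view: node B cannot reuse the entry.
        System.out.println(sharedCache.get(key("q=foo", nodeBVersion))); // null - a miss
    }
}
```

The sketch shows why cross-node reuse mostly pays off only for nodes that happen to share the exact same commit point.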
Re: lots of tlog.
It can fail because it may contain a partial record - that is why it is logged at warn level rather than error. A failure does not necessarily indicate a problem. - Mark

On 12/29/2013 09:04 AM, YouPeng Yang wrote:
Hi Mark Miller, how can a log replay fail? I cannot figure out the reason for the exception; there seems to be no BigDecimal-typed field in my schema. [...]
Re: Questions about integrating SolrCloud with HDFS
We have a few ideas we may end up working on. I'm going to start the implementation of the first idea pretty soon - I'll create a JIRA issue, which will be the best way to keep tabs on the progress. I don't expect an initial implementation to take long. That's for something akin to the HBase region server model.

A shared index among leader and replicas is a longer-term item we might explore. There are a lot of trade-offs depending on what we try to do, so nothing is set in stone yet, but I'll start on the phase 1 implementation any day now. - Mark

On 12/29/2013 09:08 AM, YouPeng Yang wrote:
Hi Mark, is there a roadmap or plan for your further work on Solr with HDFS? I am looking forward to your great work. [...]
Re: Boosting results on value of field different from query
Hi Puneet,

http://localhost:8983/solr/my/select?q=type:sedan^100 type:compact^10 (*:*)^1&wt=json&indent=true&fl=*,score&debug=results&bf=recip(rord(publish_date),1,2,3)^1.5&sort=score desc

Not really. The query I mentioned does not restrict the other types; instead, it pushes the other types to the bottom of the list, as they have a lower score. With the query I mentioned, cars of type sedan are listed first, compact second, and the rest at the bottom of the list. You would have to control at the application level which type you want on top. In the above query, we boost type sedan to the top, compact below the sedans, and the rest to the bottom. Change those boost values to get different results.

--
View this message in context: http://lucene.472066.n3.nabble.com/Boosting-results-on-value-of-field-different-from-query-tp4108180p4108636.html
Sent from the Solr - User mailing list archive at Nabble.com.
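As an aside, a request like the one above is easiest to get right when assembled programmatically, since the boost query contains spaces and characters that need URL encoding. A sketch using only the standard library (host, core name, and parameters taken from the post):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: assembling the per-type boost query with proper URL encoding.
public class BoostQuery {
    public static void main(String[] args) {
        String q = "type:sedan^100 type:compact^10 (*:*)^1";
        String bf = "recip(rord(publish_date),1,2,3)^1.5";
        String url = "http://localhost:8983/solr/my/select"
                + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fl=*,score&wt=json&indent=true&debug=results"
                + "&bf=" + URLEncoder.encode(bf, StandardCharsets.UTF_8)
                + "&sort=" + URLEncoder.encode("score desc", StandardCharsets.UTF_8);
        System.out.println(url);
    }
}
```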
Re: Maybe a bug for solr 4.6 when create a new core
Thanks - that is more bad code that needs to be removed. I'll reopen that issue and add to it. - Mark

On Sun, Dec 29, 2013 at 8:56 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote:
Hi Mark Miller, it's great that you have fixed the bug. By the way, there is another point I want to remind you of, which I posted before: please check whether the autoCreated check is a good fit. [...]
RE: How to use Solr in my project
Hi again,

We have another program that will be extracting the text, and it will also extract the top-right and bottom-left corners of the words. You are right, I do expect to have a lot of data. When would Solr start experiencing performance issues? Is it better to:

INDEX:
- document metadata
- words

STORE:
- document metadata
- words
- coordinates

in Solr rather than in the database? How would I set up the schema in order to store the coordinates? If storing the coordinates in Solr is not recommended, what would be the best process for getting the coordinates after indexing the words and metadata? Do I search in Solr and then use the documentID to search the database for the words and coordinates?

Thanks for your patience. I don't have much choice in the use case.

-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Sunday, December 29, 2013 2:48 PM
To: solr-user@lucene.apache.org
Subject: Re: How to use Solr in my project

On 29 December 2013 11:10, Fatima Issawi issa...@qu.edu.qa wrote:
[...] We will have the full text stored, but we want to highlight the text in the original image. [...]

Storing and retrieving the coordinates from Solr will likely be faster than from the database. However, I still think that you should think more carefully about your use case of highlighting the images. [...]
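On the schema question raised above: a minimal sketch of what storing per-word coordinates in Solr could look like. The field names and layout here are hypothetical illustrations, not a recommendation from the thread; the idea is one Solr document per word occurrence, with the word indexed for search and the bounding-box corners stored but not indexed:

```xml
<!-- Hypothetical schema.xml fragment: one Solr document per word occurrence. -->
<field name="id"     type="string"       indexed="true"  stored="true" required="true"/>
<field name="doc_id" type="string"       indexed="true"  stored="true"/> <!-- source document -->
<field name="page"   type="int"          indexed="true"  stored="true"/>
<field name="word"   type="text_general" indexed="true"  stored="true"/>
<!-- Two corners of the word's bounding box; stored only, never searched. -->
<field name="x1" type="int" indexed="false" stored="true"/>
<field name="y1" type="int" indexed="false" stored="true"/>
<field name="x2" type="int" indexed="false" stored="true"/>
<field name="y2" type="int" indexed="false" stored="true"/>
```

With a layout like this, a search on `word` returns the stored corner coordinates in the same response, avoiding a second lookup in the database.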