Re: How to use Solr in my project

2013-12-29 Thread Gora Mohanty
On 29 December 2013 11:10, Fatima Issawi issa...@qu.edu.qa wrote:
[...]
 We will have the full text stored, but we want to highlight the text in the 
 original image. I expect to process the image after retrieval. We do plan on 
 storing the (x, y) coordinates of the words in a database - I suspected that 
 it would be too expensive to store them in Solr. I guess I'm still confused 
 about how to use Solr to index the document, but then retrieve the (x, y) 
 coordinates of the search term from the database. Is this possible? If so,
 can you give an example of how this can be done?

Storing and retrieving the coordinates from Solr will likely be
faster than from the database. However, I still think that you
should consider your use case of highlighting the images more
carefully. It can be done, but it is a significant amount of work,
and it will need storage and computational resources.
1. For highlighting in the image, you will need to store two sets
of coordinates (e.g., top right and bottom left corners), as you
do not know the length of the word in the image. Thus, say with
15 words per line, 50 lines per page, and 100 pages per document,
you will need to store:
  4 x 15 x 50 x 100 = 300,000 coordinates/document
2. Also, how are you going to get the coordinates in the first
place?
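
For what it's worth, storing the boxes in Solr could be as simple as a
stored-only field; a rough schema sketch with hypothetical field names,
not the only way to do it:

  <!-- schema.xml: searchable page text plus stored-only word boxes -->
  <field name="text" type="text_general" indexed="true" stored="true"/>
  <!-- one entry per word, encoded as word:page:x1,y1:x2,y2 -->
  <field name="word_box" type="string" indexed="false" stored="true"
         multiValued="true"/>

At query time you would highlight matches in the text field as usual, then
scan the stored word_box entries (e.g., "solr:12:120,340:185,362") for the
matched terms to get the boxes to draw on the image.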

Regards,
Gora


Re: Maybe a bug for solr 4.6 when create a new core

2013-12-29 Thread YouPeng Yang
Hi Mark Miller

   It's great that you have fixed the bug. By the way, there is another
point I want to remind you of, which I posted before; I am posting it
again. Please check whether the autoCreated check here is appropriate.

   After I fixed the bug and created a new core with an explicit shard and
coreNodeName successfully, I could not create a replica for that new core,
also with an explicit coreNodeName and the same shard and collection. The
request URL is as follows:

http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test1&shard=Test&collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol&coreNodeName=Test1

 It responds with an error:

<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">29</int>
  </lst>
  <lst name="error">
    <str name="msg">Error CREATEing SolrCore 'Test1': Test1 is removed</str>
    <int name="code">400</int>
  </lst>
</response>

   I found the relevant source in org.apache.solr.cloud.ZkController,
lines 1369~1384, quoted as [1] below. As the code shows, when I indicate
a coreNodeName and collection explicitly, it checks the 'autoCreated'
property of the collection, which I have already created.

   My question: why does it need to check the 'autoCreated' property? Is
there any JIRA about this 'autoCreated' property? How can I get through
the check?

   [1]------------------------------------------------------------------
try {
  if (cd.getCloudDescriptor().getCollectionName() != null
      && cd.getCloudDescriptor().getCoreNodeName() != null) {
    // we were already registered
    if (zkStateReader.getClusterState().hasCollection(
        cd.getCloudDescriptor().getCollectionName())) {
      DocCollection coll = zkStateReader.getClusterState().getCollection(
          cd.getCloudDescriptor().getCollectionName());
      // collections not flagged as autoCreated must already list this
      // coreNodeName as a replica of the target slice, or the core is rejected
      if (!"true".equals(coll.getStr("autoCreated"))) {
        Slice slice = coll.getSlice(cd.getCloudDescriptor().getShardId());
        if (slice != null) {
          if (slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) == null) {
            log.info("core_removed This core is removed from ZK");
            throw new SolrException(ErrorCode.NOT_FOUND,
                coreNodeName + " is removed");
          }
        }
      }
    }
  }
------------------------------------------------------------------


2013/12/27 YouPeng Yang yypvsxf19870...@gmail.com

  Hi Mark

   I have filed a jira about the NPE:
   https://issues.apache.org/jira/browse/SOLR-5580


 2013/12/27 YouPeng Yang yypvsxf19870...@gmail.com

 Hi Mark.

Thanks for your reply.

 I will file a JIRA issue about the NPE.

 By the way, would you look through Question 2? After I create a new
 core with an explicit shard and coreNodeName successfully, I cannot
 create a replica for that new core, also with an explicit coreNodeName
 and the same shard and collection.
   The request URL is as follows:

 http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test1&shard=Test&collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol&coreNodeName=Test1

  It responds with an error:

 <response>
   <lst name="responseHeader">
     <int name="status">400</int>
     <int name="QTime">29</int>
   </lst>
   <lst name="error">
     <str name="msg">Error CREATEing SolrCore 'Test1': Test1 is removed</str>
     <int name="code">400</int>
   </lst>
 </response>

    I found the relevant source in org.apache.solr.cloud.ZkController,
 lines 1369~1384 [1]. As the code shows, when I indicate a coreNodeName
 and collection explicitly, it checks the 'autoCreated' property of the
 collection, which I have already created.

    My question: why does it need to check the 'autoCreated' property? Is
 there any JIRA about this 'autoCreated' property? How can I get through
 the check?


 [1]-----------------------------------------------------------------
 try {
   if (cd.getCloudDescriptor().getCollectionName() != null
       && cd.getCloudDescriptor().getCoreNodeName() != null) {
     // we were already registered
     if (zkStateReader.getClusterState().hasCollection(
         cd.getCloudDescriptor().getCollectionName())) {
       DocCollection coll = zkStateReader.getClusterState().getCollection(
           cd.getCloudDescriptor().getCollectionName());
       if (!"true".equals(coll.getStr("autoCreated"))) {
         Slice slice = coll.getSlice(cd.getCloudDescriptor().getShardId());
         if (slice != null) {
           if (slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) == null) {
             log.info("core_removed This core is removed from ZK");
             throw new SolrException(ErrorCode.NOT_FOUND,
                 coreNodeName + " is removed");
           }
         }
       }
     }
   }
 -----------------------------------------------------------------

Re: lots of tlog.

2013-12-29 Thread YouPeng Yang
Hi Mark Miller

   How can a log replay fail? I cannot figure out the reason for the
exception. There seems to be no BigDecimal-typed field in my schema.

  Please give some suggestions.


The exception:
133462 [recoveryExecutor-48-thread-1] WARN  org.apache.solr.update.
UpdateLog  – Starting log replay hdfs
tlog{file=hdfs://lklcluster/solr/repCore/repCore/core_node2
0/data/tlog/tlog.693 refcount=2} active=false starting pos=0
133576 [recoveryExecutor-48-thread-1] WARN
org.apache.solr.update.UpdateLog  – REYPLAY_ERR: IOException reading log
org.apache.solr.common.SolrException: Invalid Number:
java.math.BigDecimal:238124404
at
org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396)
at
org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:98)
.


Re: Questions about integrating SolrCloud with HDFS

2013-12-29 Thread YouPeng Yang
Hi Mark

   Is there a roadmap or plan for your further work on Solr with HDFS?
I look forward to your great work.


Regards


2013/12/27 Mark Miller markrmil...@gmail.com

 1. The exception and change in experience on the move to 4.6 seems like it
 could be a bug we want to investigate.

 2. Solr storing data on hdfs in other ways seems like a different issue /
 improvement.

 3. You shouldn't try and force more than one core to use the same index on
 hdfs. This would be bad.

 4. You really want to use the solr.hdfs.home setting described in the
 documentation IMO.
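
 (For reference, one way to set it is via the directoryFactory in
 solrconfig.xml; a minimal sketch, with a placeholder path:)

   <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
     <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
   </directoryFactory>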

 - Mark

  On Dec 26, 2013, at 1:56 PM, Greg Walters greg.walt...@answers.com
 wrote:
 
  Mark,
 
  I'd be happy to, but some clarification first: should this issue be about
 creating cores with overlapping names and the stack trace that YouPeng
 initially described, Solr's behavior when storing data on HDFS, or YouPeng's
 other thread (Maybe a bug for solr 4.6 when create a new core) that looks
 like it might be a near duplicate of this one?
 
  Thanks,
  Greg
 
  On Dec 26, 2013, at 12:40 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Can you file a JIRA issue?
 
  - Mark
 
  On Dec 24, 2013, at 2:57 AM, YouPeng Yang yypvsxf19870...@gmail.com
 wrote:
 
  Hi users
 
  Solr supports writing and reading its index and transaction log files
 to the HDFS distributed filesystem.
  I am curious whether there are any other further improvements planned
 for the integration with HDFS.
  For example, Solr's native replication makes multiple copies of the
 master node's index. Given HDFS's own replication, there is no need for
 that; would it suffice for multiple cores in SolrCloud to share the same
 index directory in HDFS?
 
 
  The above supposition is what I want to achieve when integrating
 SolrCloud with HDFS (Solr 4.6).
  To keep our application highly available, we still have to make Solr
 replication work, with some tricks.

  First, note that Solr's index directory is made up of
   collectionName/coreNodeName/data/index
   collectionName/coreNodeName/data/tlog
 So to achieve this, we want to create multiple cores that use the same
 HDFS index directory.

  I have tested this with Solr 4.4 by explicitly indicating the same
 coreNodeName.
 
  For example:
  Step 1: a core is created with name=core1, shard=core_shard1,
 collection=clollection1 and coreNodeName=core1.
  Step 2: another core is created with name=core2, shard=core_shard1,
 collection=clollection1 and coreNodeName=core1.
  The two cores share the same shard, collection and coreNodeName. As a
 result, the two cores get the same index data, which is stored in the
 HDFS directories:
   hdfs://myhdfs/clollection1/core1/data/index
   hdfs://myhdfs/clollection1/core1/data/tlog
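
  (For concreteness, the two CREATE requests would look roughly like the
 following; the host is a placeholder:)
   http://localhost:8080/solr/admin/cores?action=CREATE&name=core1&shard=core_shard1&collection=clollection1&coreNodeName=core1
   http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&shard=core_shard1&collection=clollection1&coreNodeName=core1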
 
  Unfortunately, when Solr 4.6 was released and we upgraded, the above
 approach failed: we could not create a core with both an explicit shard
 and coreNodeName.
  The exceptions are as in [1].
  Can someone give some help?
 
 
  Regards
 
 [1]--
  64893635 [http-bio-8080-exec-1] INFO
  org.apache.solr.cloud.ZkController
  ?.publishing core=hdfstest3 state=down
  64893635 [http-bio-8080-exec-1] INFO
  org.apache.solr.cloud.ZkController
  ?.numShards not found on descriptor - reading it from system property
  64893698 [http-bio-8080-exec-1] INFO
  org.apache.solr.cloud.ZkController
  ?.look for our core node name
 
 
 
  64951227 [http-bio-8080-exec-17] INFO  org.apache.solr.core.SolrCore
  ?.[reportCore_201208] webapp=/solr path=/replication
 
 params={slave=false&command=details&wt=javabin&qt=/replication&version=2}
  status=0 QTime=107
 
 
  65213770 [http-bio-8080-exec-1] INFO
  org.apache.solr.cloud.ZkController
  ?.waiting to find shard id in clusterstate for hdfstest3
  65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore
  ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore
  'hdfstest3': Could not get shard id for core: hdfstest3
   at
 
 org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535)
   at
 
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
   at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at
 
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
   at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
   at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
   at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
   at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
 

Redis as Solr Cache

2013-12-29 Thread Alexander Ramos Jardim
While researching Solr caching options and interesting cases, I bumped
into this: https://github.com/dfdeshom/solr-redis-cache. Does anyone have
any experience with this setup, using Redis as a Solr cache?

I see a lot of advantage in having a distributed cache for Solr. One Solr
node benefiting from the cache generated on another one would be beautiful.

I see problems too. Performance-wise, I don't know whether it would be
viable for Solr to write its cache over the network to the Redis master node.

And what if I have Solr nodes with different index versions looking at
the same cache?

IMO, if Redis isn't used to provide a distributed cache, I don't think it
is possible to get better performance out of it.

-- 
Alexander Ramos Jardim


Re: Redis as Solr Cache

2013-12-29 Thread Upayavira
On Sun, Dec 29, 2013, at 02:35 PM, Alexander Ramos Jardim wrote:
 While researching Solr caching options and interesting cases, I bumped
 into this: https://github.com/dfdeshom/solr-redis-cache. Does anyone have
 any experience with this setup, using Redis as a Solr cache?
 
 I see a lot of advantage in having a distributed cache for Solr. One Solr
 node benefiting from the cache generated on another one would be
 beautiful.
 
 I see problems too. Performance-wise, I don't know whether it would be
 viable for Solr to write its cache over the network to the Redis master
 node.
 
 And what if I have Solr nodes with different index versions looking at
 the same cache?
 
 IMO, if Redis isn't used to provide a distributed cache, I don't think it
 is possible to get better performance out of it.

This idea makes assumptions about how a Solr/Lucene index operates.
Certainly, in a SolrCloud setup, each node is responsible for its own
committing, and its caches exist for the timespan between commits. Thus,
the cache one node will need will not necessarily be the same as the one
that is needed by another node, which might have a commit interval
slightly out of sync with the first.

So, whilst this may be possible, and may give some benefits, I'd reckon
that it would be a rather substantial engineering exercise, rather than
the quick win you seem to be assuming it might be.

Upayavira


Re: lots of tlog.

2013-12-29 Thread Mark Miller
It can fail because it may contain a partial record - that is why that
is warn level rather than error. A fail does not necessarily indicate a
problem.
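
As for the specific exception, one guess (hypothetical field names, not a
confirmed diagnosis): if the uniqueKey is a Trie numeric field and the
client hands SolrJ a java.math.BigDecimal, the value's string form can
fail readableToIndexed when the log is replayed. Converting to a primitive
before indexing sidesteps this:

  import java.math.BigDecimal;
  import org.apache.solr.common.SolrInputDocument;

  public class BigDecimalIdExample {
    public static void main(String[] args) {
      // value as it might arrive from, e.g., a JDBC NUMERIC column
      BigDecimal rawId = new BigDecimal("238124404");
      SolrInputDocument doc = new SolrInputDocument();
      // pass a long, not a BigDecimal, so the Trie id field parses cleanly
      doc.addField("id", rawId.longValueExact());
      System.out.println(doc);
    }
  }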

- Mark

On 12/29/2013 09:04 AM, YouPeng Yang wrote:
 Hi Mark Miller
  
    How can a log replay fail? I cannot figure out the reason for the
  exception. There seems to be no BigDecimal-typed field in my schema.
 
    Please give some suggestions.
 
 
 The exception:
 133462 [recoveryExecutor-48-thread-1] WARN  org.apache.solr.update.
 UpdateLog  – Starting log replay hdfs
 tlog{file=hdfs://lklcluster/solr/repCore/repCore/core_node2
 0/data/tlog/tlog.693 refcount=2} active=false starting pos=0
 133576 [recoveryExecutor-48-thread-1] WARN 
 org.apache.solr.update.UpdateLog  – REYPLAY_ERR: IOException reading log
 org.apache.solr.common.SolrException: Invalid Number:
 java.math.BigDecimal:238124404
 at
 org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396)
 at
 org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:98)
 .
 
 
 



Re: Questions about integrating SolrCloud with HDFS

2013-12-29 Thread Mark Miller
We have a few ideas we may end up working on. I'm going to start the
implementation of the first idea pretty soon - I'll create a JIRA issue
which will be the best way to keep tabs on the progress. I don't expect
an initial implementation to take long.

That's for something akin to the HBase region server model. A shared
index among leader and replicas is a longer-term item we might explore.
There are a lot of trade-offs depending on what we try to do, so nothing
is set in stone yet, but I'll start on the phase 1 implementation any day now.

- Mark

On 12/29/2013 09:08 AM, YouPeng Yang wrote:
 Hi Mark
 
    Is there a roadmap or plan for your further work on Solr with HDFS?
  I look forward to your great work.
 
 
 Regards
 
 
 2013/12/27 Mark Miller markrmil...@gmail.com
 
 1. The exception and change in experience on the move to 4.6 seems like it
 could be a bug we want to investigate.

 2. Solr storing data on hdfs in other ways seems like a different issue /
 improvement.

 3. You shouldn't try and force more than one core to use the same index on
 hdfs. This would be bad.

 4. You really want to use the solr.hdfs.home setting described in the
 documentation IMO.

 - Mark

 On Dec 26, 2013, at 1:56 PM, Greg Walters greg.walt...@answers.com
 wrote:

 Mark,

  I'd be happy to, but some clarification first: should this issue be about
 creating cores with overlapping names and the stack trace that YouPeng
 initially described, Solr's behavior when storing data on HDFS, or YouPeng's
 other thread (Maybe a bug for solr 4.6 when create a new core) that looks
 like it might be a near duplicate of this one?

 Thanks,
 Greg

 On Dec 26, 2013, at 12:40 PM, Mark Miller markrmil...@gmail.com
 wrote:

 Can you file a JIRA issue?

 - Mark

 On Dec 24, 2013, at 2:57 AM, YouPeng Yang yypvsxf19870...@gmail.com
 wrote:

 Hi users

 Solr supports for writing and reading its index and transaction log
 files
 to the HDFS distributed filesystem.
 **I am curious about that there are any other futher improvement about
 the integration with HDFS.*
 **For the solr  native replication  will make multiple copies  of the
 master node's index. Because of the native replication of HDFS,there
 is no
 need to do that.It just to need that multiple cores in solrcloud share
 the
 same index directory in HDFS?*


  The above supposition is what I want to achieve when integrating
 SolrCloud with HDFS (Solr 4.6).
  To keep our application highly available, we still have to make Solr
 replication work, with some tricks.

  First, note that Solr's index directory is made up of
   collectionName/coreNodeName/data/index
   collectionName/coreNodeName/data/tlog
 So to achieve this, we want to create multiple cores that use the same
 HDFS index directory.

  I have tested this with Solr 4.4 by explicitly indicating the same
 coreNodeName.

  For example:
  Step 1: a core is created with name=core1, shard=core_shard1,
 collection=clollection1 and coreNodeName=core1.
  Step 2: another core is created with name=core2, shard=core_shard1,
 collection=clollection1 and coreNodeName=core1.
  The two cores share the same shard, collection and coreNodeName. As a
 result, the two cores get the same index data, which is stored in the
 HDFS directories:
   hdfs://myhdfs/clollection1/core1/data/index
   hdfs://myhdfs/clollection1/core1/data/tlog

  Unfortunately, when Solr 4.6 was released and we upgraded, the above
 approach failed: we could not create a core with both an explicit shard
 and coreNodeName.
  The exceptions are as in [1].
  Can someone give some help?


 Regards

 [1]--
 64893635 [http-bio-8080-exec-1] INFO
  org.apache.solr.cloud.ZkController
 ?.publishing core=hdfstest3 state=down
 64893635 [http-bio-8080-exec-1] INFO
  org.apache.solr.cloud.ZkController
 ?.numShards not found on descriptor - reading it from system property
 64893698 [http-bio-8080-exec-1] INFO
  org.apache.solr.cloud.ZkController
 ?.look for our core node name



 64951227 [http-bio-8080-exec-17] INFO  org.apache.solr.core.SolrCore
 ?.[reportCore_201208] webapp=/solr path=/replication

 params={slave=false&command=details&wt=javabin&qt=/replication&version=2}
 status=0 QTime=107


 65213770 [http-bio-8080-exec-1] INFO
  org.apache.solr.cloud.ZkController
 ?.waiting to find shard id in clusterstate for hdfstest3
 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore
 ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore
 'hdfstest3': Could not get shard id for core: hdfstest3
  at

 org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535)
  at

 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
  at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at

 

Re: Boosting results on value of field different from query

2013-12-29 Thread manju16832003
Hi Puneet,

 http://localhost:8983/solr/my/select?q=type:sedan^100 type:compact^10 (*:*)^1&wt=json&indent=true&fl=*,score&debug=results&bf=recip(rord(publish_date),1,2,3)^1.5&sort=score desc

Not really. The query I mentioned does not restrict the other types;
instead, it pushes them to the bottom of the list because they receive a
lower score. With that query, cars of type sedan are listed first, compact
second, and the rest end up at the bottom of the list.

You would have to control at the application level which type you want on
top. In the above query, we boost type sedan to the top and compact below
the sedans, and the rest land at the bottom.
Change those boosting values to get different results.
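
For illustration, the same request via SolrJ; a minimal sketch assuming a
Solr 4.x core at http://localhost:8983/solr/my (note that the bf function
boost takes effect with the dismax/edismax query parsers):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class BoostQueryExample {
    public static void main(String[] args) throws Exception {
      HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/my");
      SolrQuery q = new SolrQuery("type:sedan^100 type:compact^10 (*:*)^1");
      q.setFields("*", "score");                          // all fields plus score
      q.set("defType", "edismax");                        // so bf is honored
      q.set("bf", "recip(rord(publish_date),1,2,3)^1.5"); // freshness boost
      q.addSort("score", SolrQuery.ORDER.desc);
      QueryResponse rsp = server.query(q);
      System.out.println(rsp.getResults());
    }
  }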



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boosting-results-on-value-of-field-different-from-query-tp4108180p4108636.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Maybe a bug for solr 4.6 when create a new core

2013-12-29 Thread Mark Miller
Thanks - that is more bad code that needs to be removed. I'll reopen that
issue and add to it.

- Mark


On Sun, Dec 29, 2013 at 8:56 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote:

 Hi Mark Miller

    It's great that you have fixed the bug. By the way, there is another
 point I want to remind you of, which I posted before; I am posting it
 again. Please check whether the autoCreated check here is appropriate.

    After I fixed the bug and created a new core with an explicit shard and
 coreNodeName successfully, I could not create a replica for that new core,
 also with an explicit coreNodeName and the same shard and collection. The
 request URL is as follows:


 http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test1&shard=Test&collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol&coreNodeName=Test1

  It responds with an error:

 <response>
   <lst name="responseHeader">
     <int name="status">400</int>
     <int name="QTime">29</int>
   </lst>
   <lst name="error">
     <str name="msg">Error CREATEing SolrCore 'Test1': Test1 is removed</str>
     <int name="code">400</int>
   </lst>
 </response>

    I found the relevant source in org.apache.solr.cloud.ZkController,
 lines 1369~1384 [1]. As the code shows, when I indicate a coreNodeName
 and collection explicitly, it checks the 'autoCreated' property of the
 collection, which I have already created.

    My question: why does it need to check the 'autoCreated' property? Is
 there any JIRA about this 'autoCreated' property? How can I get through
 the check?

 [1]-----------------------------------------------------------------
 try {
   if (cd.getCloudDescriptor().getCollectionName() != null
       && cd.getCloudDescriptor().getCoreNodeName() != null) {
     // we were already registered
     if (zkStateReader.getClusterState().hasCollection(
         cd.getCloudDescriptor().getCollectionName())) {
       DocCollection coll = zkStateReader.getClusterState().getCollection(
           cd.getCloudDescriptor().getCollectionName());
       if (!"true".equals(coll.getStr("autoCreated"))) {
         Slice slice = coll.getSlice(cd.getCloudDescriptor().getShardId());
         if (slice != null) {
           if (slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) == null) {
             log.info("core_removed This core is removed from ZK");
             throw new SolrException(ErrorCode.NOT_FOUND,
                 coreNodeName + " is removed");
           }
         }
       }
     }
   }
 -----------------------------------------------------------------


 2013/12/27 YouPeng Yang yypvsxf19870...@gmail.com

   Hi Mark
 
I have filed a jira about the NPE:
https://issues.apache.org/jira/browse/SOLR-5580
 
 
  2013/12/27 YouPeng Yang yypvsxf19870...@gmail.com
 
  Hi Mark.
 
 Thanks for your reply.
 
  I will file a JIRA issue about the NPE.
 
  By the way, would you look through Question 2? After I create a new
  core with an explicit shard and coreNodeName successfully, I cannot
  create a replica for that new core, also with an explicit coreNodeName
  and the same shard and collection.
    The request URL is as follows:
 
 
  http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test1&shard=Test&collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol&coreNodeName=Test1
 
   It responds with an error:
 
  <response>
    <lst name="responseHeader">
      <int name="status">400</int>
      <int name="QTime">29</int>
    </lst>
    <lst name="error">
      <str name="msg">Error CREATEing SolrCore 'Test1': Test1 is removed</str>
      <int name="code">400</int>
    </lst>
  </response>
 
   I found the relevant source in org.apache.solr.cloud.ZkController,
  lines 1369~1384 [1]. As the code shows, when I indicate a coreNodeName
  and collection explicitly, it checks the 'autoCreated' property of the
  collection, which I have already created.
 
    My question: why does it need to check the 'autoCreated' property? Is
  there any JIRA about this 'autoCreated' property? How can I get through
  the check?
 
 
 
  [1]----------------------------------------------------------------
  try {
    if (cd.getCloudDescriptor().getCollectionName() != null
        && cd.getCloudDescriptor().getCoreNodeName() != null) {
      // we were already registered
      if (zkStateReader.getClusterState().hasCollection(
          cd.getCloudDescriptor().getCollectionName())) {
        DocCollection coll = zkStateReader.getClusterState().getCollection(
            cd.getCloudDescriptor().getCollectionName());
        if (!"true".equals(coll.getStr("autoCreated"))) {
          Slice slice = coll.getSlice(cd.getCloudDescriptor().getShardId());
          if (slice != null) {
            if (slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) == null) {

RE: How to use Solr in my project

2013-12-29 Thread Fatima Issawi
Hi again,

We have another program that will extract the text, along with the
top-right and bottom-left corners of the words. You are right, I do expect
to have a lot of data.

When would Solr start experiencing performance issues? Is it better to:

INDEX: 
- document metadata 
- words  

STORE: 
- document metadata
- words 
- coordinates 

in Solr rather than in the database? How would I set up the schema in order to 
store the coordinates?

If storing the coordinates in Solr is not recommended, what would be the 
best process for getting the coordinates after indexing the words and 
metadata? Do I search in Solr and then use the document ID to search the 
database for the words and coordinates? Something like the rough sketch below?
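
(For illustration, a minimal sketch of that two-step flow; the core name,
field names, JDBC URL, and table are all hypothetical:)

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocument;

  public class TwoStepLookup {
    public static void main(String[] args) throws Exception {
      // Step 1: find matching documents in Solr
      HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/docs");
      for (SolrDocument d : solr.query(new SolrQuery("text:keyword")).getResults()) {
        String docId = (String) d.getFieldValue("id");
        // Step 2: fetch that document's word coordinates from the database
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mysql://dbhost/ocr", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT word, x1, y1, x2, y2 FROM word_coords WHERE doc_id = ?")) {
          ps.setString(1, docId);
          ResultSet rs = ps.executeQuery();
          while (rs.next()) {
            System.out.printf("%s: (%d,%d)-(%d,%d)%n", rs.getString(1),
                rs.getInt(2), rs.getInt(3), rs.getInt(4), rs.getInt(5));
          }
        }
      }
    }
  }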

Thanks for your patience. I don't have much choice in the use case. 


 -----Original Message-----
 From: Gora Mohanty [mailto:g...@mimirtech.com]
 Sent: Sunday, December 29, 2013 2:48 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to use Solr in my project
 
 On 29 December 2013 11:10, Fatima Issawi issa...@qu.edu.qa wrote:
 [...]
  We will have the full text stored, but we want to highlight the text in the
 original image. I expect to process the image after retrieval. We do plan on
 storing the (x, y) coordinates of the words in a database - I suspected that 
 it
 would be too expensive to store them in Solr. I guess I'm still confused about
 how to use Solr to index the document, but then retrieve the (x, y)
  coordinates of the search term from the database. Is this possible? If so,
  can you give an example of how this can be done?
 
  Storing and retrieving the coordinates from Solr will likely be faster than
  from the database. However, I still think that you should consider your use
  case of highlighting the images more carefully. It can be done, but it is a
  significant amount of work, and it will need storage and computational
  resources.
  1. For highlighting in the image, you will need to store two sets
  of coordinates (e.g., top right and bottom left corners), as you
  do not know the length of the word in the image. Thus, say with
  15 words per line, 50 lines per page, and 100 pages per document,
  you will need to store:
    4 x 15 x 50 x 100 = 300,000 coordinates/document
  2. Also, how are you going to get the coordinates in the first place?
 
 Regards,
 Gora