Re: Hash range to shard assignment
That is in the pipeline; within the next 3-4 months for sure. On Mon, Sep 23, 2013 at 11:07 PM, lochri loc...@web.de wrote: Yes, actually that would be a very comfortable solution. Is that planned? And if so, when will it be released? -- View this message in context: http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204p4091591.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
Re: Hash range to shard assignment
Custom routers are an idea that has been floated around and would be easy to implement. It is just that we are reluctant to add yet another extension point. The point is that we are planning other features which would obviate the need for a custom router, such as splitting a shard by a query. Would that be a good enough solution for you? On Mon, Sep 23, 2013 at 2:52 PM, lochri loc...@web.de wrote: Thanks for the clarification. Still I would think it is sub-optimal to split shards when we don't actually know which mailboxes we split. It may create splits of small users which leads to unnecessary distribution of the smaller users. We thought about doing the routing ourselves. As far as I understood we can do distributed searches across multiple collections. What do you think about this option? For the ideal solution: when will custom routers be supported? Regards, Lochri -- View this message in context: http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204p4091503.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
Re: Hash range to shard assignment
This would need you to plug in your own router, which is not yet possible. But you can split that shard repeatedly and keep the number of users in that shard limited. On Fri, Sep 20, 2013 at 3:52 PM, lochri loc...@web.de wrote: Hello folks, we would like to have control of where certain hash values or ranges are located. The reason is that we want to shard per user, but we know ahead of time that one or more specific users could grow much faster than others. Therefore we would like to locate them on separate shards (which may be on the same server initially and can be moved out later). So my question: can we control the hash ranges and the hash-range-to-shard assignment in SolrCloud? Regards, Lochri -- View this message in context: http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
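For readers in the same situation, a rough sketch of the repeated-splitting approach, assuming Solr 4.3 or later where the Collections API supports it (collection and shard names are illustrative):

  http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mailboxes&shard=shard1

Each SPLITSHARD call divides the parent shard's hash range into two sub-shards, so a hot shard can be split again and one half moved to its own hardware, keeping the number of heavy users per shard bounded.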
Re: dataconfig to index ZIP Files
IIRC Zip files are not supported On Mon, Jul 1, 2013 at 10:30 PM, ericrs22 ericr...@yahoo.com wrote: To answer the previous Post: I was not sure what datasource=binaryFile I took it from a PDF sample thinking that would help. after setting datasource=null I'm still gett the same errors... dataConfig dataSource type=BinFileDataSource user=svcSolr password=SomePassword / document entity name=Archive processor=FileListEntityProcessor baseDir=E:\ArchiveRoot fileName=.zip$ recursive=true rootEntity=false dataSource=null onError=skip field column=fileSize name=size/ field column=file name=filename/ /entity /document /dataConfig the logs report this: INFO - 2013-07-01 16:45:57.317; org.apache.solr.handler.dataimport.DataImporter; Starting Full Import WARN - 2013-07-01 16:45:57.333; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Unable to read: dataimport.properties -- View this message in context: http://lucene.472066.n3.nabble.com/dataconfig-to-index-ZIP-Files-tp4073965p4074399.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
it is possible to create two separate root entities . one for full-import and another for delta. for the delta-import you can skip Cache that way On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber constantin.wol...@medicalcolumbus.de wrote: Hi, i searched for a solution for quite some time but did not manage to find some real hints on how to fix it. I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in a tomcat 6 container. My data import setup is basically the following: Data-config.xml: entity name=article dataSource=ds1 query=SELECT * FROM article deltaQuery=SELECT myownid FROM articleHistory WHERE modified_date gt; '${dih.last_index_time} deltaImportQuery=SELECT * FROM article WHERE myownid=${dih.delta.myownid} pk=myownid field column=myownid name=id/ entity name=supplier dataSource=ds2 query=SELECT * FROM supplier WHERE status=1 processor=CachedSqlEntityProcessor cacheKey=SUPPLIER_ID cacheLookup=article.ARTICLE_SUPPLIER_ID /entity entity name=attributes dataSource=ds1 query=SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes cacheKey=ARTICLE_ID cacheLookup=article.myownid processor=CachedSqlEntityProcessor /entity /entity Ok now for the problem: At first I tried everything without the Cache. But the full-import took a very long time. Because the attributes query is pretty slow compared to the rest. As a result I got a processing speed of around 150 Documents/s. When switching everything to the CachedSqlEntityProcessor the full import processed at the speed of 4000 Documents/s So full import is running quite fine. Now I wanted to use the delta import. When running the delta import I was expecting the ramp up time to be about the same as in full import since I need to load the whole table supplier and attributes to the cache in the first step. But when looking into the log file the weird thing is solr seems to refresh the Cache for every single document that is processed. So currently my delta-import is a lot slower than the full-import. I even tried to add the deltaImportQuery parameter to the entity but it doesn't change the behavior at all (of course I know it is not supposed to change anything in the setup I run). The following solutions would be possible in my opinion: 1. Is there any way to tell the config to ignore the Cache when running a delta import? That would help already because we are talking about the maximum of 500 documents changed in 15 minutes compared to over 5 million documents in total. 2. Get solr to not refresh the cash for every document. Best Regards Constantin Wolber -- - Noble Paul
Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
yes. that's right On Thu, Jun 20, 2013 at 8:16 PM, Constantin Wolber constantin.wol...@medicalcolumbus.de wrote: Hi, i may have been a little to fast with my response. After reading a bit more I imagine you meant running the full-import with the entity param for the root entity for full import. And running the delta import with the entity param for the delta entity. Is that correct? Regards Constantin -Ursprüngliche Nachricht- Von: Constantin Wolber [mailto:constantin.wol...@medicalcolumbus.de] Gesendet: Donnerstag, 20. Juni 2013 16:42 An: solr-user@lucene.apache.org Betreff: AW: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor Hi, and thanks for the answer. But I'm a little bit confused about what you are suggesting. I did not really use the rootEntity attribute before. But from what I read in the documentation as far as I can tell that would result in two documents (maybe with the same id which would probably result in only one document being stored) because one for each root entity. It would be great if you could just sketch the setup with the entities I provided. Because currently I have no idea on how to do it. Regards Constantin -Ursprüngliche Nachricht- Von: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com] Gesendet: Donnerstag, 20. Juni 2013 15:42 An: solr-user@lucene.apache.org Betreff: Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor it is possible to create two separate root entities . one for full-import and another for delta. for the delta-import you can skip Cache that way On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber constantin.wol...@medicalcolumbus.de wrote: Hi, i searched for a solution for quite some time but did not manage to find some real hints on how to fix it. I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in a tomcat 6 container. My data import setup is basically the following: Data-config.xml: entity name=article dataSource=ds1 query=SELECT * FROM article deltaQuery=SELECT myownid FROM articleHistory WHERE modified_date gt; '${dih.last_index_time} deltaImportQuery=SELECT * FROM article WHERE myownid=${dih.delta.myownid} pk=myownid field column=myownid name=id/ entity name=supplier dataSource=ds2 query=SELECT * FROM supplier WHERE status=1 processor=CachedSqlEntityProcessor cacheKey=SUPPLIER_ID cacheLookup=article.ARTICLE_SUPPLIER_ID /entity entity name=attributes dataSource=ds1 query=SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes cacheKey=ARTICLE_ID cacheLookup=article.myownid processor=CachedSqlEntityProcessor /entity /entity Ok now for the problem: At first I tried everything without the Cache. But the full-import took a very long time. Because the attributes query is pretty slow compared to the rest. As a result I got a processing speed of around 150 Documents/s. When switching everything to the CachedSqlEntityProcessor the full import processed at the speed of 4000 Documents/s So full import is running quite fine. Now I wanted to use the delta import. When running the delta import I was expecting the ramp up time to be about the same as in full import since I need to load the whole table supplier and attributes to the cache in the first step. But when looking into the log file the weird thing is solr seems to refresh the Cache for every single document that is processed. So currently my delta-import is a lot slower than the full-import. 
I even tried to add the deltaImportQuery parameter to the entity but it doesn't change the behavior at all (of course I know it is not supposed to change anything in the setup I run). The following solutions would be possible in my opinion: 1. Is there any way to tell the config to ignore the Cache when running a delta import? That would help already because we are talking about the maximum of 500 documents changed in 15 minutes compared to over 5 million documents in total. 2. Get solr to not refresh the cash for every document. Best Regards Constantin Wolber -- - Noble Paul -- - Noble Paul
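A minimal sketch of the two-root-entity layout being discussed, following the thread's tables and columns (the entity names articleFull/articleDelta and the trimmed-down queries are illustrative); the root entity to run is chosen with the entity request parameter:

  <document>
    <!-- full-import: sub-entity cached for speed -->
    <entity name="articleFull" dataSource="ds1" pk="myownid" query="SELECT * FROM article">
      <field column="myownid" name="id"/>
      <entity name="supplier" dataSource="ds2" processor="CachedSqlEntityProcessor"
              query="SELECT * FROM supplier WHERE status=1"
              cacheKey="SUPPLIER_ID" cacheLookup="articleFull.ARTICLE_SUPPLIER_ID"/>
    </entity>
    <!-- delta-import: plain per-row lookups, no cache -->
    <entity name="articleDelta" dataSource="ds1" pk="myownid"
            query="SELECT * FROM article"
            deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
            deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}">
      <field column="myownid" name="id"/>
      <entity name="supplier" dataSource="ds2"
              query="SELECT * FROM supplier WHERE status=1 AND SUPPLIER_ID=${articleDelta.ARTICLE_SUPPLIER_ID}"/>
    </entity>
  </document>

The full import would then be triggered with .../dataimport?command=full-import&entity=articleFull and the delta with .../dataimport?command=delta-import&entity=articleDelta.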
Re: Replication not working
can you check with the indexversion command on both mater and slave? pollInterval is set to 2 minutes. It is usually long . So you may need to wait for 2 mins for the replication to kick in On Tue, Jun 11, 2013 at 3:21 PM, thomas.poroc...@der.net wrote: Hi all, we have a setup with multiple cores, loaded via DataImportHandlers. Works fine so far. Now we are trying to get the replication working (for one core so far). But the automated replication is never happening. Manually triggered replication works! Environment: Solr 4.1 (also tried with 4.3) App-Server JBoss 4.3. Java 1.6 There are two JBoss instances running on different ports on the same box with their own solr.home directories. Configuration is done like described in the documentation: requestHandler name=/replication class=solr.ReplicationHandler lst name=master str name=enable${de.der.pu.solr.master.enable:false}/str str name=replicateAfterstartup/str str name=replicateAftercommit/str str name=replicateAfteroptimize/str str name=confFilesstopwords.txt, solrconfig.xml/str /lst lst name=slave str name=enable${de.der.pu.solr.slave.enable:false}/str str name=masterUrlhttp://localhost:30006/solr/${solr.core.name}/str str name=pollInterfall00:02:00/str /lst /requestHandler Basically it looks all fine from the admin-pages. The polling from the slave is going on but nothing happens. We have tried to delete slave index completely and restart both servers. Reimportet the master data several times and so on.. On the masters replication page I see: - replication enable: true - replicateAfter: commit, startup - confFiles: stopwords.txt, solrconfig.xml On slave side I see: -masters version 1370612995391 53 2.56 MB -master url: http://localhost:30006/solr/contacts -poling enable: true And master settings like on master side... When I enter http://localhost:30006/solr/contacts/replication?command=detailswt=json indent=true in the browser the response seems ok: { responseHeader:{ status:0, QTime:0}, details:{ indexSize:2.56 MB, indexPath:D:\\usr\\local\\phx-unlimited\\jboss\\solr_cache\\test_pto_ node1_solr\\contacts\\data\\index/, commits:[[ indexVersion,1370612995391, generation,53, filelist,[_1r.fdt, _1r.fdx, _1r.fnm, _1r.nvd, _1r.nvm, _1r.si, _1r_Lucene41_0.doc, _1r_Lucene41_0.pos, _1r_Lucene41_0.tim, _1r_Lucene41_0.tip, segments_1h]]], isMaster:true, isSlave:false, indexVersion:1370612995391, generation:53, master:{ confFiles:stopwords.txt, solrconfig.xml, replicateAfter:[commit, startup], replicationEnabled:true, replicableVersion:1370612995391, replicableGeneration:53}}, WARNING:This response format is experimental. It is likely to change in the future.} Any idea how we could go on? Regards Thomas -- - Noble Paul
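For reference, a slave section written with the element name the ReplicationHandler documents (pollInterval); the configuration quoted above spells it pollInterfall, which is worth double-checking, since a misspelled name would not be picked up. The values below follow the quoted setup:

  <lst name="slave">
    <str name="enable">${de.der.pu.solr.slave.enable:false}</str>
    <str name="masterUrl">http://localhost:30006/solr/${solr.core.name}</str>
    <str name="pollInterval">00:02:00</str>
  </lst>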
Re: Replication not working
You said polling is happening and nothing is replicated What do the logs say on slave (Set level to INFO) ? On Tue, Jun 11, 2013 at 4:54 PM, thomas.poroc...@der.net wrote: Calling indexversion on master gives: response lst name=responseHeader int name=status0/intint name=QTime0/int /lst long name=indexversion1370612995391/long long name=generation53/long /response On Slave: response lst name=responseHeaderint name=status0/int int name=QTime0/int/lst long name=indexversion0/long long name=generation1/long /response pollInterval is set to 2 minutes. It is usually long I know ;-) -Ursprüngliche Nachricht- Von: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com] Gesendet: Dienstag, 11. Juni 2013 13:16 An: solr-user@lucene.apache.org Betreff: Re: Replication not working can you check with the indexversion command on both mater and slave? pollInterval is set to 2 minutes. It is usually long . So you may need to wait for 2 mins for the replication to kick in On Tue, Jun 11, 2013 at 3:21 PM, thomas.poroc...@der.net wrote: Hi all, we have a setup with multiple cores, loaded via DataImportHandlers. Works fine so far. Now we are trying to get the replication working (for one core so far). But the automated replication is never happening. Manually triggered replication works! Environment: Solr 4.1 (also tried with 4.3) App-Server JBoss 4.3. Java 1.6 There are two JBoss instances running on different ports on the same box with their own solr.home directories. Configuration is done like described in the documentation: requestHandler name=/replication class=solr.ReplicationHandler lst name=master str name=enable${de.der.pu.solr.master.enable:false}/str str name=replicateAfterstartup/str str name=replicateAftercommit/str str name=replicateAfteroptimize/str str name=confFilesstopwords.txt, solrconfig.xml/str /lst lst name=slave str name=enable${de.der.pu.solr.slave.enable:false}/str str name=masterUrlhttp://localhost:30006/solr/${solr.core.name}/str str name=pollInterfall00:02:00/str /lst /requestHandler Basically it looks all fine from the admin-pages. The polling from the slave is going on but nothing happens. We have tried to delete slave index completely and restart both servers. Reimportet the master data several times and so on.. On the masters replication page I see: - replication enable: true - replicateAfter: commit, startup - confFiles: stopwords.txt, solrconfig.xml On slave side I see: -masters version 1370612995391 53 2.56 MB -master url: http://localhost:30006/solr/contacts -poling enable: true And master settings like on master side... When I enter http://localhost:30006/solr/contacts/replication?command=detailswt=json indent=true in the browser the response seems ok: { responseHeader:{ status:0, QTime:0}, details:{ indexSize:2.56 MB, indexPath:D:\\usr\\local\\phx-unlimited\\jboss\\solr_cache\\test_pto_ node1_solr\\contacts\\data\\index/, commits:[[ indexVersion,1370612995391, generation,53, filelist,[_1r.fdt, _1r.fdx, _1r.fnm, _1r.nvd, _1r.nvm, _1r.si, _1r_Lucene41_0.doc, _1r_Lucene41_0.pos, _1r_Lucene41_0.tim, _1r_Lucene41_0.tip, segments_1h]]], isMaster:true, isSlave:false, indexVersion:1370612995391, generation:53, master:{ confFiles:stopwords.txt, solrconfig.xml, replicateAfter:[commit, startup], replicationEnabled:true, replicableVersion:1370612995391, replicableGeneration:53}}, WARNING:This response format is experimental. It is likely to change in the future.} Any idea how we could go on? Regards Thomas -- - Noble Paul -- - Noble Paul
Re: Replication not working
I mean , the log when polling happens when from slave. Not when you issue a command. On Tue, Jun 11, 2013 at 5:28 PM, thomas.poroc...@der.net wrote: Log on slave: 2013-06-11 13:19:08,477 8385607 INFO [org.apache.solr.core.SolrCore] (http-0.0.0.0-31006-1:) [contacts] webapp=/solr path=/replication params={indent=truecommand=indexversionswt=json+} status=0 QTime=0 2013-06-11 13:19:08,477 8385607 DEBUG [org.apache.solr.servlet.SolrDispatchFilter] (http-0.0.0.0-31006-1:) Closing out SolrRequest: {indent=truecommand=indexversionswt=json+} 2013-06-11 13:22:27,017 8584147 INFO [org.apache.solr.core.SolrCore] (http-0.0.0.0-31006-1:) [contacts] webapp=/solr path=/replication params={command=indexversion} status=0 QTime=0 2013-06-11 13:22:27,017 8584147 DEBUG [org.apache.solr.servlet.SolrDispatchFilter] (http-0.0.0.0-31006-1:) Closing out SolrRequest: {command=indexversion} -Ursprüngliche Nachricht- Von: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com] Gesendet: Dienstag, 11. Juni 2013 13:41 An: solr-user@lucene.apache.org Betreff: Re: Replication not working You said polling is happening and nothing is replicated What do the logs say on slave (Set level to INFO) ? On Tue, Jun 11, 2013 at 4:54 PM, thomas.poroc...@der.net wrote: Calling indexversion on master gives: response lst name=responseHeader int name=status0/intint name=QTime0/int /lst long name=indexversion1370612995391/long long name=generation53/long /response On Slave: response lst name=responseHeaderint name=status0/int int name=QTime0/int/lst long name=indexversion0/long long name=generation1/long /response pollInterval is set to 2 minutes. It is usually long I know ;-) -Ursprüngliche Nachricht- Von: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com] Gesendet: Dienstag, 11. Juni 2013 13:16 An: solr-user@lucene.apache.org Betreff: Re: Replication not working can you check with the indexversion command on both mater and slave? pollInterval is set to 2 minutes. It is usually long . So you may need to wait for 2 mins for the replication to kick in On Tue, Jun 11, 2013 at 3:21 PM, thomas.poroc...@der.net wrote: Hi all, we have a setup with multiple cores, loaded via DataImportHandlers. Works fine so far. Now we are trying to get the replication working (for one core so far). But the automated replication is never happening. Manually triggered replication works! Environment: Solr 4.1 (also tried with 4.3) App-Server JBoss 4.3. Java 1.6 There are two JBoss instances running on different ports on the same box with their own solr.home directories. Configuration is done like described in the documentation: requestHandler name=/replication class=solr.ReplicationHandler lst name=master str name=enable${de.der.pu.solr.master.enable:false}/str str name=replicateAfterstartup/str str name=replicateAftercommit/str str name=replicateAfteroptimize/str str name=confFilesstopwords.txt, solrconfig.xml/str /lst lst name=slave str name=enable${de.der.pu.solr.slave.enable:false}/str str name=masterUrlhttp://localhost:30006/solr/${solr.core.name}/str str name=pollInterfall00:02:00/str /lst /requestHandler Basically it looks all fine from the admin-pages. The polling from the slave is going on but nothing happens. We have tried to delete slave index completely and restart both servers. Reimportet the master data several times and so on.. 
On the masters replication page I see: - replication enable: true - replicateAfter: commit, startup - confFiles: stopwords.txt, solrconfig.xml On slave side I see: -masters version 1370612995391 53 2.56 MB -master url: http://localhost:30006/solr/contacts -poling enable: true And master settings like on master side... When I enter http://localhost:30006/solr/contacts/replication?command=detailswt=json indent=true in the browser the response seems ok: { responseHeader:{ status:0, QTime:0}, details:{ indexSize:2.56 MB, indexPath:D:\\usr\\local\\phx-unlimited\\jboss\\solr_cache\\test_pto_ node1_solr\\contacts\\data\\index/, commits:[[ indexVersion,1370612995391, generation,53, filelist,[_1r.fdt, _1r.fdx, _1r.fnm, _1r.nvd, _1r.nvm, _1r.si, _1r_Lucene41_0.doc, _1r_Lucene41_0.pos, _1r_Lucene41_0.tim, _1r_Lucene41_0.tip
Re: LotsOfCores feature
Aleksey, it was a less than ideal situation, because we did not have a choice. We had external systems/scripts to manage this. A new custom implementation is being built on SolrCloud which would have taken care of most of those issues. SolrReplication is hidden once you move to cloud, but it will continue to work in the same way if you have a stand-alone deployment. On Mon, Jun 10, 2013 at 1:20 AM, Aleksey bitterc...@gmail.com wrote: Thanks Paul. Just a little clarification: You mention that you migrate data using built-in replication, but if you map and route users yourself, doesn't that mean that you also need to manage replication yourself? Your routing logic needs to be aware of how to map both replicas for each user, and if one host goes down, then it needs to distribute traffic that it was receiving over other hosts. Same thing for adding more hosts. I did a couple of quick searches and found mostly older wikis that say solr replication will change in the future. Would you be able to point me to the right one? - On Fri, Jun 7, 2013 at 8:34 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: We set it up like this + individual solr instances are setup + external mapping/routing to allocate users to instances. This information can be stored in an external data store + all cores are created as transient and loadonstart as false + cores come online on demand + as and when users data get bigger (or hosts are hot) they are migrated between less hit hosts using in built replication Keep in mind we had the schema for all users. Currently there is no way to upload a new schema to solr. On Jun 8, 2013 1:15 AM, Aleksey bitterc...@gmail.com wrote: Aleksey: What would you say is the average core size for your use case - thousands or millions of rows? And how sharded would each of your collections be, if at all? Average core/collection size wouldn't even be thousands, hundreds more like. And the largest would be half a million or so but that's a pathological case. I don't need sharding and queries that fan out to different machines. In fact I'd like to avoid that so I don't have to collate the results. The Wiki page was built not for Cloud Solr. We have done such a deployment where less than a tenth of cores were active at any given point in time. though there were tens of million indices they were split among a large no:of hosts. If you don't insist of Cloud deployment it is possible. I'm not sure if it is possible with cloud By Cloud you mean specifically SolrCloud? I don't have to have it if I can do without it. Bottom line is I want a bunch of small cores to be distributed over a fleet, each core completely fitting on one server. Would you be willing to provide a little more details on your setup? In particular, how are you managing the cores? How do you route requests to the proper server? If you scale the fleet up and down, does reshuffling of the cores happen automatically or is it an involved manual process? Thanks, Aleksey -- - Noble Paul
Re: LotsOfCores feature
The Wiki page was built not for Cloud Solr. We have done such a deployment where less than a tenth of cores were active at any given point in time. though there were tens of million indices they were split among a large no:of hosts. If you don't insist of Cloud deployment it is possible. I'm not sure if it is possible with cloud On Fri, Jun 7, 2013 at 12:38 AM, Aleksey bitterc...@gmail.com wrote: I was looking at this wiki and linked issues: http://wiki.apache.org/solr/LotsOfCores they talk about a limit being 100K cores. Is that per server or per entire fleet because zookeeper needs to manage that? I was considering a use case where I have tens of millions of indices but less that a million needs to be active at any time, so they need to be loaded on demand and evicted when not used for a while. Also since number one requirement is efficient loading of course I assume I will store a prebuilt index somewhere so Solr will just download it and strap it in, right? The root issue is marked as won;t fix but some other important subissues are marked as resolved. What's the overall status of the effort? Thank you in advance, Aleksey -- - Noble Paul
Re: SOLR CSV output in custom order
Have you tried explicitly giving the field names via the fl parameter? http://wiki.apache.org/solr/CommonQueryParameters#fl On Thu, Jun 6, 2013 at 12:41 PM, anurag.jain anurag.k...@gmail.com wrote: I want the output of the csv file in a proper order. When I use wt=csv it gives output in random order. Is there any way to get output in the proper format? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-CSV-output-in-custom-order-tp4068527.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
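A quick illustration (field names are made up): listing the fields explicitly fixes the column order, since the CSV writer emits columns in the order given in fl:

  http://localhost:8983/solr/select?q=*:*&wt=csv&fl=id,name,price,last_modified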
Re: LotsOfCores feature
We set it up like this + individual solr instances are setup + external mapping/routing to allocate users to instances. This information can be stored in an external data store + all cores are created as transient and loadonstart as false + cores come online on demand + as and when users data get bigger (or hosts are hot)they are migrated between less hit hosts using in built replication Keep in mind we had the schema for all users. Currently there is no way to upload a new schema to solr. On Jun 8, 2013 1:15 AM, Aleksey bitterc...@gmail.com wrote: Aleksey: What would you say is the average core size for your use case - thousands or millions of rows? And how sharded would each of your collections be, if at all? Average core/collection size wouldn't even be thousands, hundreds more like. And the largest would be half a million or so but that's a pathological case. I don't need sharding and queries than fan out to different machines. If fact I'd like to avoid that so I don't have to collate the results. The Wiki page was built not for Cloud Solr. We have done such a deployment where less than a tenth of cores were active at any given point in time. though there were tens of million indices they were split among a large no:of hosts. If you don't insist of Cloud deployment it is possible. I'm not sure if it is possible with cloud By Cloud you mean specifically SolrCloud? I don't have to have it if I can do without it. Bottom line is I want a bunch of small cores to be distributed over a fleet, each core completely fitting on one server. Would you be willing to provide a little more details on your setup? In particular, how are you managing the cores? How do you route requests to proper server? If you scale the fleet up and down, does reshuffling of the cores happen automatically or is it an involved manual process? Thanks, Aleksey
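A sketch of the per-core settings described above, using the pre-SolrCloud solr.xml core definitions of Solr 4.x (core names and the cache size are illustrative):

  <cores adminPath="/admin/cores" transientCacheSize="128">
    <core name="user00001" instanceDir="users/user00001" transient="true" loadOnStartup="false"/>
    <core name="user00002" instanceDir="users/user00002" transient="true" loadOnStartup="false"/>
    <!-- one entry per user; only up to transientCacheSize transient cores stay loaded at once -->
  </cores>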
Re: Upgrading from SOLR 3.5 to 4.2.1 Results.
Actually, it's pretty high end for most users. Rishi, you can post the real h/w details and our typical deployment: number of CPUs per node, number of disks per host, VMs per host, GC params, number of cores per instance. Noble Paul Sent from phone On 21 May 2013 01:47, Rishi Easwaran rishi.easwa...@aol.com wrote: No, we just upgraded to 4.2.1. With the size of our complex and the effort required to apply our patches and roll out, our upgrades are not that often. -Original Message- From: Noureddine Bouhlel nouredd...@ecotour.com To: solr-user solr-user@lucene.apache.org Sent: Mon, May 20, 2013 3:36 pm Subject: Re: Upgrading from SOLR 3.5 to 4.2.1 Results. Hi Rishi, Have you done any tests with Solr 4.3 ? Regards, Cordialement, BOUHLEL Noureddine On 17 May 2013 21:29, Rishi Easwaran rishi.easwa...@aol.com wrote: Hi All, Its Friday 3:00pm, warm sunny outside and it was a good week. Figured I'd share some good news. I work for AOL mail team and we use SOLR for our mail search backend. We have been using it since pre-SOLR 1.4 and strong supporters of SOLR community. We deal with millions of indexes and billions of requests a day across our complex. We finished full rollout of SOLR 4.2.1 into our production last week. Some key highlights: - ~75% Reduction in Search response times - ~50% Reduction in SOLR Disk busy , which in turn helped with ~90% Reduction in errors - Garbage collection total stop reduction by over 50% moving application throughput into the 99.8% - 99.9% range - ~15% reduction in CPU usage We did not tune our application moving from 3.5 to 4.2.1 nor update java. For the most part it was a binary upgrade, with patches for our special use case. Now going forward we are looking at prototyping SOLR Cloud for our search system, upgrade java and tomcat, tune our application further. Lots of fun stuff :) Have a great weekend everyone. Thanks, Rishi.
Re: javabin binary format specification
There is no spec documented anywhere. It is all in this single file: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java On Wed, Jul 25, 2012 at 6:47 PM, Ahmet Arslan iori...@yahoo.com wrote: Sorry, but I could not find any spec on the binary format SolrJ is using. Can you point me to a URL if any? Maybe this? https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/BinaryResponseWriter.java -- - Noble Paul
Re: @field for child object
no On Mon, Jul 4, 2011 at 3:34 PM, Kiwi de coder kiwio...@gmail.com wrote: hi, i wondering solrj @Field annotation support embedded child object ? e.g. class A { @field string somefield; @emebedded B b; } regards, kiwi -- - Noble Paul
Re: Re; DIH Scheduling
On Thu, Jun 23, 2011 at 9:13 PM, simon mtnes...@gmail.com wrote: The Wiki page describes a design for a scheduler, which has not been committed to Solr yet (I checked). I did see a patch the other day (see https://issues.apache.org/jira/browse/SOLR-2305) but it didn't look well tested. I think that you're basically stuck with something like cron at this time. If your application is written in java, take a look at the Quartz scheduler - http://www.quartz-scheduler.org/ It was considered and decided against. -Simon -- - Noble Paul
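A minimal example of the cron-based approach mentioned above (URL, core name and schedule are illustrative): a crontab entry that kicks off a delta-import every 15 minutes:

  */15 * * * * curl -s "http://localhost:8983/solr/db/dataimport?command=delta-import&clean=false" > /dev/null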
Re: Where is LogTransformer log file path??
it will be in the solr logs On Tue, Jun 21, 2011 at 2:18 PM, Alucard alucard...@gmail.com wrote: Hi all. I follow the steps of creating a LogTransformer in DataImportHandler wiki: entity name=office_address dataSource=jdbc pk=office_add_Key transformer=LogTransformer logLevel=debug logTemplate=office_add_Key: ${office_address.office_add_Key}, last_index_time: ${dataimporter.last_index_time} ... /entity The java statement that start Solr: java -Dremarks=solr:8983 -Djava.util.logging.config.file=logging.properties -jar start.jar logging.properties file content # Default global logging level: .level = DEBUG # Write to a file: handlers = java.util.logging.FileHandler # Write log messages in human readable format: java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter # Log to the logs subdirectory, with log files named solrxxx.log java.util.logging.FileHandler.pattern = logs/solr_log-%g.log java.util.logging.FileHandler.append = true java.util.logging.FileHandler.count = 10 java.util.logging.FileHandler.limit = 500 #Roughly 5MB So the log file (solr_log0.log) is there, startup message are properly logged. However, when I do a delta import, the message defined in logTemplate attribute is not logged. I have done some research but cannot find anything related to: LogTransformer file path/log path or so on... So, can anyone please tell me where are those messgae logged? Thank you in advance for any help. Ellery -- - Noble Paul
Re: Need help with DIH dataconfig.xml
Use TemplateTransformer dataConfig dataSource name = wld type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/wld user=root password=pass/ document name=variants entity name=III_1_1 query=SELECT * FROM `wld`.`III_1_1` transformer=TemplateTransformer field column=id template='${III_1_1.id}III_1_1}'/ field column=lemmatitel name=lemma / field column=vraagtekst name=vraagtekst / field column=lexical_variant name=variant / /entity entity name=III_1_2 query=SELECT * FROM `wld`.`III_1_2` field column=id name='${III_1_2_ + id}'/ field column=lemmatitel name=lemma / field column=vraagtekst name=vraagtekst / field column=lexical_variant name=variant / /entity /document /dataConfig On Wed, Jun 15, 2011 at 4:41 PM, MartinS martin.snijd...@gmail.com wrote: Hello, I want to perform a data import from a relational database. That all works well. However, i want to dynamically create a unique id for my solr documents while importing by using my data config file. I cant get it to work, maybe its not possible this way, but i thought i would ask you ll. (I set up schema.xml to use the field id as the unique id for solr documents) My solr config looks like this : dataConfig dataSource name = wld type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/wld user=root password=pass/ document name=variants entity name=III_1_1 query=SELECT * FROM `wld`.`III_1_1` field column=id name='${variants.name + id}'/ field column=lemmatitel name=lemma / field column=vraagtekst name=vraagtekst / field column=lexical_variant name=variant / /entity entity name=III_1_2 query=SELECT * FROM `wld`.`III_1_2` field column=id name='${III_1_2_ + id}'/ field column=lemmatitel name=lemma / field column=vraagtekst name=vraagtekst / field column=lexical_variant name=variant / /entity /document /dataConfig For a unique id I would like the concatenate the primary key of the table (Column id) with the table name. How can I do this ? Both ways as shown in the example data config don't work while importing. Any help is appreciated. Martin -- View this message in context: http://lucene.472066.n3.nabble.com/Need-help-with-DIH-dataconfig-xml-tp3066855p3066855.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
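For clarity, a sketch of the first entity with the template cleaned up (the stray closing brace removed) so that the document id becomes the table name plus the primary key; the same pattern would be repeated for III_1_2:

  <entity name="III_1_1" query="SELECT * FROM `wld`.`III_1_1`" transformer="TemplateTransformer">
    <field column="id" template="III_1_1_${III_1_1.id}"/>
    <field column="lemmatitel" name="lemma"/>
    <field column="vraagtekst" name="vraagtekst"/>
    <field column="lexical_variant" name="variant"/>
  </entity>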
Re: DIH entity threads
Sub entities can slow down indexing remarkably.What is that datasource? DB? then try using CachedSqlEntityProcessor On Tue, Jun 14, 2011 at 8:31 PM, Mark static.void@gmail.com wrote: Hello all, We are using DIH to index our data (~6M documents) and its taking an extremely long time (~24 hours). I am trying to find ways that we can speed this up. I've been reading through older posts and it's my understanding this should not take that long. One probably bottleneck is that we have a sub entity pulling in item descriptions from a separate datasource which we then strip html from. Before stripping the html we run it through JTidy. Our data-config looks something like this: http://pastie.org/2067011 I've heard about entity threads and I was wondering if this would help in my case? I haven't been able to find any good documentation on this. Another possible bottleneck is the the number of sub entities we have... 5 (only 1 of which is CachedSqlEntityProcessor). Any ideas? Thanks for the help -- - Noble Paul
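As a sketch of what a cached description sub-entity might look like (table and column names are hypothetical); note that CachedSqlEntityProcessor loads the whole child table into memory once instead of issuing one query per parent row:

  <entity name="item" query="SELECT * FROM items">
    <entity name="description" processor="CachedSqlEntityProcessor"
            query="SELECT ITEM_ID, DESCRIPTION FROM item_descriptions"
            cacheKey="ITEM_ID" cacheLookup="item.ID">
      <field column="DESCRIPTION" name="description"/>
    </entity>
  </entity>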
Re: Throttling replication
There is currently no way to throttle replication; it consumes all the available bandwidth. It would be a nice-to-have feature. On Thu, Sep 2, 2010 at 8:11 PM, Mark static.void@gmail.com wrote: Is there any way or forthcoming patch that would allow configuration of how much network bandwidth (and ultimately disk I/O) a slave is allowed during replication? We currently have the problem that while replicating, our disk I/O goes through the roof. I would much rather have the replication take 2x as long with half the disk I/O. Any thoughts? Thanks -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Dynamic dataConfig files in DIH
On Fri, Jun 11, 2010 at 11:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Is there a way to dynamically point which dataConfig file to use to import : using DIH without using the defaults hardcoded in solrconfig.xml? what do you mean by dynamically ? ... it's a query param, so you can specify the file name in the url when you issue the command. No, it is not. It is not reloaded for every request; we should enhance DIH to do so. But the whole data-config file can be sent as a request param and it works (this is used by the DIH debug mode). -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
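A rough illustration of sending the whole config along with the request, as the DIH debug mode does (core name and file are illustrative; the config has to be URL-encoded, which curl's --data-urlencode handles):

  curl "http://localhost:8983/solr/db/dataimport?command=full-import&debug=on" \
       --data-urlencode "dataConfig@./data-config.xml"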
Re: Solr DataConfig / DIH Question
this looks like a common problem. I guess DIH should handle this more gracefully. Instead of firing a query and failing it should not fire a query if any of the values are missing . This can b made configurable if needed On Sun, Jun 13, 2010 at 9:14 AM, Lance Norskog goks...@gmail.com wrote: This is a slow way to do this; databases are capable of doing this join and feeding the results very efficiently. The 'skipDoc' feature allows you to break out of the processing chain after the first query. It is used in the wikipedia example. http://wiki.apache.org/solr/DataImportHandler On Sat, Jun 12, 2010 at 6:37 PM, Holmes, Charles V. chol...@mitre.org wrote: I'm putting together an entity. A simplified version of the database schema is below. There is a 1-[0,1] relationship between Person and Address with address_id being the nullable foreign key. If it makes any difference, I'm using SQL Server 2005 on the backend. Person [id (pk), name, address_id (fk)] Address [id (pk), zipcode] My data config looks like the one below. This naturally fails when the address_id is null since the query ends up being select * from user.address where id = . entity name=person Query=select * from user.person entity name=address Query=select * from user.address where id = ${person.address_id} /entity /entity I've worked around it by using a config like this one. However, this makes the queries quite complex for some of my larger joins. entity name=person Query=select * from user.person entity name=address Query=select * from user.address where id = (select address_id from user.person where id = ${person.id}) /entity /entity Is there a cleaner / better way of handling these type of relationships? I've also tried to specify a default in the Solr schema, but that seems to only work after all the data is indexed which makes sense but surprised me initially. BTW, thanks for the great DIH tutorial on the wiki! Thanks! Charles -- Lance Norskog goks...@gmail.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: TikaEntityProcessor on Solr 1.4?
just copy the dih-extras jar file from the nightly should be fine On Sat, May 22, 2010 at 3:12 AM, Sixten Otto six...@sfko.com wrote: On Fri, May 21, 2010 at 5:30 PM, Chris Harris rygu...@gmail.com wrote: Actually, rather than cherry-pick just the changes from SOLR-1358 and SOLR-1583 what I did was to merge in all DataImportHandler-related changes from between the 1.4 release up through Solr trunk r890679 (inclusive). I'm not sure if that's what would work best for you, but it's one option. I'd rather, of course, not to have to build my own. But if I'm going to dabble in the source at all, it's just a slippery slope from the former to the latter. :-) (My main hesitation in doing so would be that I'm new enough to the code that I have no idea what core changes the trunk's DIH might also depend on. And my Java's pretty rusty.) How did you arrive at your patch? Just grafting the entire trunk/solr/contrib/dataimporthandler onto 1.4's code? Or did you go through Jira/SVN looking for applicable changesets? I'll be very interested to hear how your testing goes! Sixten -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: TikaEntityProcessor on Solr 1.4?
I guess it should work because Tika Entityprocessor does not use any new 1.4 APIs On Wed, May 19, 2010 at 1:17 AM, Sixten Otto six...@sfko.com wrote: Sorry to repeat this question, but I realized that it probably belonged in its own thread: The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, along with some other changes (like the binary DataSources) to support it. Obviously, there hasn't been an official release of Solr since then. Has anyone tried back-porting those changes to Solr 1.4? (I do see that the question was asked last month, without any response: http://www.lucidimagination.com/search/document/5d2d25bc57c370e9) The patches for these issues don't seem all that complex or pervasive, but it's hard for me (as a Solr n00b) to tell whether this is really all that's involved: https://issues.apache.org/jira/browse/SOLR-1583 https://issues.apache.org/jira/browse/SOLR-1358 Sixten -- - Noble Paul | Systems Architect| AOL | http://aol.com
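For anyone attempting the back-port, a minimal sketch of the TikaEntityProcessor configuration those patches add (the file path and field name are illustrative; whether it runs on a patched 1.4 build is exactly what would need testing):

  <dataConfig>
    <dataSource type="BinFileDataSource" name="bin"/>
    <document>
      <entity name="doc" processor="TikaEntityProcessor" dataSource="bin"
              url="/data/docs/sample.pdf" format="text">
        <field column="text" name="content"/>
      </entity>
    </document>
  </dataConfig>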
Re: Issue with delta import (not finding data in a column)
Are u reusing the context object? It may help if u can paste the relevant part of ur code On 10 May 2010 19:03, ahammad ahmed.ham...@gmail.com wrote: I have a Solr core that retrieves data from an Oracle DB. The DB table has a few columns, one of which is a Blob that represents a PDF document. In order to retrieve the actual content of the PDF file, I wrote a Blob transformer that converts the Blob into the PDF file, and subsequently reads it using PDFBox. The blob is contained in a DB column called DOCUMENT, and the data goes into a Solr field called fileContent, which is required. This works fine when doing full imports, but it fails for delta imports. I debugged my transformer, and it appears that when it attempts to fetch the blob stored in the column, it gets nothing back (i.e. null). Because the data is essentially null, it cannot retrieve anything, and cannot store anything into Solr. As a result, the document does not get imported. I am not sure what the problem is, because this only occurs with delta imports. Here is my data-config file: dataConfig dataSource driver=oracle.jdbc.driver.OracleDriver url=address user=user password=pass/ document name=table1 entity name=TABLE1 pk=ID query=select * from TABLE1 deltaImportQuery=select * from TABLE1 where ID ='${dataimporter.delta.ID}' deltaQuery=select ID from TABLE1 where (LASTMODIFIED to_date('${dataimporter.last_index_time}', '-mm-dd HH24:MI:SS')) transformer=BlobTransformer field column=ID name=id / field column=TITLE name=title / field column=FILENAME name=filename / field column=DOCUMENT name=fileContent blob=true/ field column=LASTMODIFIED name=lastModified / /entity /document /dataConfig Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-delta-import-not-finding-data-in-a-column-tp788993p788993.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom DIH EventListeners
nope. register any event listener and check for the context.currentProcess() to figure out what is the event On Thu, May 6, 2010 at 8:21 AM, Blargy zman...@hotmail.com wrote: I know one can create custom event listeners for update or query events, but is it possible to create one for any DIH event (Full-Import, Delta-Import)? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-EventListeners-tp780517p780517.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
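A sketch of how such a listener is registered (the class name is hypothetical): the same class can be attached to both import events in data-config.xml and then inspect Context.currentProcess() — FULL_DUMP versus DELTA_DUMP — to tell which command is running:

  <document onImportStart="com.example.dih.ImportAuditListener"
            onImportEnd="com.example.dih.ImportAuditListener">
    ...
  </document>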
Re: Custom DIH variables
you can use the core from this API and use EmbeddedSolrServer (part of solrj) . So the calls will be in-vm On Thu, May 6, 2010 at 6:08 AM, Blargy zman...@hotmail.com wrote: Thanks Noble this is exactly what I was looking for. What is the preferred way to query solr within these sorts of classes? Should I grab the core from the context that is being passed in? Should I be using SolrJ? Can you provide an example and/or provide some tutorials/documentation. Once again, thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p780332.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Custom DIH variables
OK, you can't write a variable, but you can write a function (an Evaluator). It will look something like ${dataimporter.functions.foo()} http://wiki.apache.org/solr/DataImportHandler#Custom_formatting_in_query_and_url_using_Functions On Wed, May 5, 2010 at 9:12 PM, Blargy zman...@hotmail.com wrote: Thanks Paul, that will certainly work. I was just hoping there was a way I could write my own class that would inject this value as needed instead of precomputing this value and then passing it along in the params. My specific use case is instead of using dataimporter.last_index_time I want to use something like dataimporter.updated_time_of_last_document. Our DIH is set up to use a bunch of slave databases and there have been problems with some documents getting lost due to replication lag. I would prefer to compute this value using a custom variable at runtime instead of passing it along via the params. Is that even possible? If not I'll have to go with your previous suggestion. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p779278.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
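A sketch of how a custom Evaluator is wired in, per the wiki page linked above (class and function names are hypothetical): register the class with a function element in data-config.xml, then call it wherever a variable would normally go:

  <dataConfig>
    <function name="lastDocTime" class="com.example.dih.LastDocTimeEvaluator"/>
    ...
    <entity name="item"
            deltaQuery="SELECT id FROM items WHERE updated_at &gt; '${dataimporter.functions.lastDocTime()}'">
      ...
    </entity>
  </dataConfig>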
Re: Custom DIH variables
You can use custom parameters from the request, like ${dataimporter.request.foo}; pass the value of foo as a request param, say foo=bar. On Wed, May 5, 2010 at 6:05 AM, Blargy zman...@hotmail.com wrote: Can someone please point me in the right direction (classes) on how to create my own custom DIH variable that can be used in my data-config.xml. So instead of ${dataimporter.last_index_time} I want to be able to create ${dataimporter.foo} Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p777696.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
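A quick illustration (the parameter name is arbitrary): reference the parameter in data-config.xml and supply it on the import URL:

  <entity name="item" query="SELECT * FROM items WHERE category='${dataimporter.request.category}'">

  http://localhost:8983/solr/dataimport?command=full-import&category=books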
Re: DIH: inner select fails when outter entity is null/empty
do an onError=skip on the inner entity On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, Here is a newbie DataImportHandler question: Currently, I have entities with entities. There are some situations where a column value from the outer entity is null, and when I try to use it in the inner entity, the null just gets replaced with an empty string. That in turn causes the SQL query in the inner entity to fail. This seems like a common problem, but I couldn't find any solutions or mention in the FAQ ( http://wiki.apache.org/solr/DataImportHandlerFaq ) What is the best practice to avoid or convert null values to something safer? Would this be done via a Transformer or is there a better mechanism for this? I think the problem I'm describing is similar to what was described here: http://search-lucene.com/m/cjlhtFkG6m ... except I don't have the luxury of rewriting the SQL selects. Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ -- - Noble Paul | Systems Architect| AOL | http://aol.com
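A sketch of that workaround applied to the schema from the thread: the inner entity is marked so a failing lookup for a null address_id does not abort the whole import (onError also accepts continue and abort):

  <entity name="person" query="select * from user.person">
    <entity name="address" onError="skip"
            query="select * from user.address where id = ${person.address_id}"/>
  </entity>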
Re: DIH best pratices question
On Sat, Mar 27, 2010 at 3:25 AM, Blargy zman...@hotmail.com wrote: I have a items table on db1 and and item_descriptions table on db2. The items table is very small in the sense that it has small columns while the item_descriptions table has a very large text field column. Both tables are around 7 million rows What is the best way to import these into one document? document entity name=item ... entity name=item_descriptions ... /entity /entity /document this is the right way Or document entity name=item_descriptions rootEntity=false entity name=item ... /entity /entity /document Or is there an alternative way? Maybe using the second way with a CachedSqlEntityProcessor for the item entity? I don't think CachedSqlEntityProcessor helps here. Any thoughts are greatly appreciated. Thanks! -- View this message in context: http://n3.nabble.com/DIH-best-pratices-question-tp677568p677568.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: expungeDeletes on commit in Dataimport
On Thu, Mar 25, 2010 at 10:14 PM, Ruben Chadien ruben.chad...@aspiro.com wrote: Hi, I know this has been discussed before, but is there any way to do expungeDeletes=true when the DataImportHandler does the commit? expungeDeletes=true is not used, but that does not mean that the doc does not get deleted. deleteDocByQuery does not do a commit; if you wish to commit you should do it explicitly. I am using the deleteDocByQuery in a Transformer when doing a delta-import and as discussed before the documents are not deleted until restart. Also, how do I know in a Transformer if it is running a Delta or Full Import? I tried looking at Context.currentProcess() but that gives me FULL_DUMP when doing a delta import...? The variable ${dataimporter.request.command} tells you which command is being run. Thanks! Ruben Chadien -- - Noble Paul
Re: ReplicationHandler reports incorrect replication failures
please create a bug On Fri, Mar 26, 2010 at 7:29 PM, Shawn Smith ssmit...@gmail.com wrote: We're using Solr 1.4 Java replication, which seems to be working nicely. While writing production monitors to check that replication is healthy, I think we've run into a bug in the status reporting of the ../solr/replication?command=details command. (I know it's experimental...) Our monitor parses the replication?command=details XML and checks that replication lag is reasonable by diffing the indexVersion of the master and slave indices to make sure it's within a reasonable time range. Our monitor also compares the first elements of indexReplicatedAtList and replicationFailedAtList lists to see if the last replication attempt failed. This is where we're having a problem with the monitor throwing false errors. It looks like there's a bug that causes successful replications to be considered failures. The bug is triggered immediately after a slave restarts when the slave is already in sync with the master. Each no-op replication attempt after restart is considered a failure until something on the master changes and replication has to actually do work. From the code, it looks like SnapPuller.successfulInstall starts out false on restart. If the slave starts out in sync with the master, then each no-op replication poll leaves successfulInstall set to false which makes SnapPuller.logReplicationTimeAndConfFiles log the poll as a failure. SnapPuller.successfulInstall stays false until the first time replication actually has to do something, at which point it gets set to true, and then everything is OK. Thanks, Shawn -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: XPath Processing Applied to Clob
keep in mind that the xpath is case-sensitive. paste a sample xml what is dataField=d.text it does not seem to refer to anything. where is the enclosing entity? did you mean dataField=doc.text. xpath=//BODY is a supported syntax as long as you are using Solr1.4 or higher On Thu, Mar 18, 2010 at 3:15 AM, Neil Chaudhuri nchaudh...@potomacfusion.com wrote: Incidentally, I tried adding this: datasource name=f type=FieldReaderDataSource / document entity dataSource=f processor=XPathEntityProcessor dataField=d.text forEach=/MESSAGE field column=body xpath=//BODY/ /entity /document But this didn't seem to change anything. Any insight is appreciated. Thanks. From: Neil Chaudhuri Sent: Wednesday, March 17, 2010 3:24 PM To: solr-user@lucene.apache.org Subject: XPath Processing Applied to Clob I am using the DataImportHandler to index 3 fields in a table: an id, a date, and the text of a document. This is an Oracle database, and the document is an XML document stored as Oracle's xmltype data type. Since this is nothing more than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. However, I don't want to index/store all the XML but instead just the XML within a set of tags. The XPath itself is trivial, but it seems like the XPathEntityProcessor only works for XML file content rather than the output of a Transformer. Here is what I currently have that fails: document entity name=doc query=SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d transformer=ClobTransformer field column=EFFECTIVE_DT name=effectiveDate / field column=ARCHIVE_ID name=id / field column=TEXT name=text clob=true entity name=text processor=XPathEntityProcessor forEach=/MESSAGE url=${doc.text} field column=body xpath=//BODY/ /entity /entity /document Is there an easy way to do this without writing my own custom transformer? Thanks. -- - Noble Paul | Systems Architect| AOL | http://aol.com
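Putting those points together, a sketch of how the pieces are usually wired (column and tag names follow the thread; the Oracle URL and credentials are omitted): the FieldReaderDataSource is set on the inner entity, dataField names the outer entity and its column alias, and both the column and the XPath are case-sensitive:

  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver" url="..." user="..." password="..."/>
  <dataSource name="f" type="FieldReaderDataSource"/>
  <document>
    <entity name="doc" dataSource="db" transformer="ClobTransformer"
            query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d">
      <field column="EFFECTIVE_DT" name="effectiveDate"/>
      <field column="ARCHIVE_ID" name="id"/>
      <field column="TEXT" clob="true"/>
      <entity name="body" dataSource="f" processor="XPathEntityProcessor"
              dataField="doc.TEXT" forEach="/MESSAGE">
        <field column="body" xpath="//BODY"/>
      </entity>
    </entity>
  </document>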
Re: Is it possible to use ODBC with DIH?
if you have a jdbc-odbc bridge driver , it should be fine On Sun, Mar 7, 2010 at 4:52 AM, JavaGuy84 bbar...@gmail.com wrote: Hi, I have a ODBC driver with me for MetaMatrix DB(Redhat). I am trying to figure out a way to use DIH using the DSN which has been created in my machine with that ODBC driver? Is it possible to spcify a DSN in DIH and index the DB? if its possible, can you please let me know the ODBC URL that I need to enter for Datasource in DIH data-config.xml? Thanks, Barani -- View this message in context: http://old.nabble.com/Is-it-possible-to-use-ODBC-with-DIH--tp27808016p27808016.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
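A sketch of what that dataSource entry could look like with the JDK's built-in bridge driver (the DSN name and credentials are whatever was configured on the machine):

  <dataSource type="JdbcDataSource" name="mm"
              driver="sun.jdbc.odbc.JdbcOdbcDriver"
              url="jdbc:odbc:MetaMatrixDSN"
              user="someUser" password="somePassword"/>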
Re: If you could have one feature in Solr...
On Fri, Mar 5, 2010 at 4:34 AM, Mark Miller markrmil...@gmail.com wrote: On 03/04/2010 05:56 PM, Chris Hostetter wrote: : The ability to read solr configuration files from the classpath instead of : solr.solr.home directory. Solr has always supported this. When SolrResourceLoader.openResourceLoader is asked to open a resource it first checks if it's an absolute path -- if it's not then it checks relative the conf dir (under whatever the instanceDir is, ie: Solr Home in a single core setup), then it checks relative the current working dir and if it still can't find it it checks via the current ClassLoader. that said: it's not something that a lot of people have ever taken advantage of, so it wouldn't suprise me if some features in Solr are buggy because they try to open files directly w/o utilizing openResourceLoader -- in particular a quick test of the trunk example using... java -Djetty.class.path=./solr/conf -Dsolr.solr.home=/tmp/new-solr-home -jar start.jar ...seems to suggest that QueryElevationComponent isn't using openResource to look for elevate.xml (i set solr.solr.home in that line so solr would *NOT* attempt to look at ./solr ... it does need some sort of Solr Home, but in this case it was a completley empty directory) -Hoss I've been trying to think of ways to tackle this. I hate getConfigDir - it lets anyone just get around the ResourceLoader basically. It would be awesome to get rid of it somehow - it would make ZooKeeperSolrResourceLoader so much easier to get working correctly across the board. Why not just get rid of it? Components depending on filesystems is a big headache. The main thing I'm hung up on is how to update a file - some code I've seen uses getConfigDir to update files eg you get the content of solrconfig, then you want to update it and reload the core. Most other things, I think are doable without getConfigDir. QueryElevationComponent is actually sort of simple to get around - we just need to add an exists method that return true/false if the resource exists. QEC just uses getConfigDir to a do an exists on the elevate.xml - if its not there, it looks in the data dir. -- - Mark http://www.lucidimagination.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: replication issue
The data/index.20100226063400 dir is a temporary dir and is created in the same dir where the index dir is located. I'm wondering if the symlink is causing the problem. Why don't you set the data dir as /raid/data instead of /solr/data? On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Hi, I am still having issues with the replication and wonder if things are working properly. So I have 1 master and 1 slave. On the slave, I deleted the data/index directory and data/replication.properties file and restarted solr. When the slave is pulling data from the master, I can see that the size of the data directory is growing r...@slr8:/raid/data# du -sh 3.7M . r...@slr8:/raid/data# du -sh 4.7M . and I can see that the data/replication.properties file got created and also a directory data/index.20100226063400 soon after index.20100226063400 disappears and the size of data/index is back to 12K r...@slr8:/raid/data/index# du -sh 12K . And when I look for the number of documents via the admin interface, I still see 0 documents so I feel something is wrong. One more thing, I have a symlink for /solr/data --- /raid/data Thank you for your help ! matt -- - Noble Paul | Systems Architect| AOL | http://aol.com
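For reference, the data directory can be pointed at the real path directly in solrconfig.xml, which removes the symlink from the picture (path taken from the thread):

  <dataDir>/raid/data</dataDir>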
Re: Using XSLT with DIH for a URLDataSource
this is the only one place this should be a problem.'xsl' is not a very commonly used attribute On Fri, Feb 26, 2010 at 10:46 AM, Lance Norskog goks...@gmail.com wrote: There could be a common 'open an url' utility method. This would help make the DIH components consistent. 2010/2/24 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: you are right. The StreamSource class is not throwing the proper exception Do we really have to handle this.? On Thu, Feb 25, 2010 at 9:06 AM, Lance Norskog goks...@gmail.com wrote: [Taken off the list] The problem is that the XSLT code swallows the real exception, and does not return it as the deeper exception. To show the right error, the code would open a file name or an URL directly. The problem is, the code has to throw an exception on a file or an URL and try the other, then decide what to do. try { URL u = new URL(xslt); iStream = u.openStream(); } catch (MalformedURLException e) { iStream = new FileInputStream(new File(xslt)); } TransformerFactory transFact = TransformerFactory.newInstance(); xslTransformer = transFact.newTransformer(new StreamSource(iStream)); On Mon, Feb 22, 2010 at 6:24 AM, Roland Villemoes r...@alpha-solutions.dk wrote: You're right! I was as simple (stupid!) as that, Thanks a lot (for your time .. very appreciated) Roland -Oprindelig meddelelse- Fra: noble.p...@gmail.com [mailto:noble.p...@gmail.com] På vegne af Noble Paul ??? ?? Sendt: 22. februar 2010 14:01 Til: solr-user@lucene.apache.org Emne: Re: Using XSLT with DIH for a URLDataSource The xslt file looks fine . is the location of the file correct ? On Mon, Feb 22, 2010 at 2:57 PM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi (thanks a lot) Yes, The full stacktrace is this: 22-02-2010 08:37:00 org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Error initializing XSL Processing Document # 1 at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:103) at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) Caused by: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:825) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:614) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:98) ... 24 more 22-02-2010 08:37:00
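For completeness, here is the open-URL-then-fall-back-to-file pattern Lance sketches above as a self-contained Java class with the imports filled in; xsltLocation is a placeholder for whatever the xsl attribute points at.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.net.MalformedURLException;
    import java.net.URL;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamSource;

    public class XsltLoader {
        public static Transformer load(String xsltLocation) throws Exception {
            InputStream in;
            try {
                // Try the location as a URL first
                in = new URL(xsltLocation).openStream();
            } catch (MalformedURLException e) {
                // Not a URL -- fall back to treating it as a plain file path
                in = new FileInputStream(new File(xsltLocation));
            }
            TransformerFactory factory = TransformerFactory.newInstance();
            return factory.newTransformer(new StreamSource(in));
        }
    }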
Re: If you could have one feature in Solr...
On Wed, Feb 24, 2010 at 7:18 PM, Patrick Sauts patrick.via...@gmail.com wrote: Synchronisation between the slaves to switch the new index at the same time after replication. I shall open an issue for this, and let us figure out how best it should be done: https://issues.apache.org/jira/browse/SOLR-1800
Re: Using XSLT with DIH for a URLDataSource
you are right. The StreamSource class is not throwing the proper exception Do we really have to handle this.? On Thu, Feb 25, 2010 at 9:06 AM, Lance Norskog goks...@gmail.com wrote: [Taken off the list] The problem is that the XSLT code swallows the real exception, and does not return it as the deeper exception. To show the right error, the code would open a file name or an URL directly. The problem is, the code has to throw an exception on a file or an URL and try the other, then decide what to do. try { URL u = new URL(xslt); iStream = u.openStream(); } catch (MalformedURLException e) { iStream = new FileInputStream(new File(xslt)); } TransformerFactory transFact = TransformerFactory.newInstance(); xslTransformer = transFact.newTransformer(new StreamSource(iStream)); On Mon, Feb 22, 2010 at 6:24 AM, Roland Villemoes r...@alpha-solutions.dk wrote: You're right! I was as simple (stupid!) as that, Thanks a lot (for your time .. very appreciated) Roland -Oprindelig meddelelse- Fra: noble.p...@gmail.com [mailto:noble.p...@gmail.com] På vegne af Noble Paul ??? ?? Sendt: 22. februar 2010 14:01 Til: solr-user@lucene.apache.org Emne: Re: Using XSLT with DIH for a URLDataSource The xslt file looks fine . is the location of the file correct ? On Mon, Feb 22, 2010 at 2:57 PM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi (thanks a lot) Yes, The full stacktrace is this: 22-02-2010 08:37:00 org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Error initializing XSL Processing Document # 1 at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:103) at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) Caused by: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:825) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:614) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:98) ... 24 more 22-02-2010 08:37:00 org.apache.solr.update.DirectUpdateHandler2 rollback My import feed (for testing is this): ?xml version='1.0' encoding='utf-8'? products product id='738' rank='10' brand id='48'![CDATA[World's Best]]/brandname![CDATA[Kontakt Cream-Special 4 x 10]]/name categories primarycategory='17' category id='7' name![CDATA[Jeans Bukser]]/name category id='17'
Re: error while using the DIH handler
can you paste the DIH part in your solrconfig.xml ? On Tue, Feb 23, 2010 at 7:01 PM, Na_D nabam...@zaloni.com wrote: yes i did check the location of the data-config.xml its in the folder example-DIH/solr/db/conf -- View this message in context: http://old.nabble.com/error-while-using-the-DIH-handler-tp27702772p2770.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Using XSLT with DIH for a URLDataSource
The xslt file looks fine . is the location of the file correct ? On Mon, Feb 22, 2010 at 2:57 PM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi (thanks a lot) Yes, The full stacktrace is this: 22-02-2010 08:37:00 org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Error initializing XSL Processing Document # 1 at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:103) at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) Caused by: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:825) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:614) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:98) ... 24 more 22-02-2010 08:37:00 org.apache.solr.update.DirectUpdateHandler2 rollback My import feed (for testing is this): ?xml version='1.0' encoding='utf-8'? products product id='738' rank='10' brand id='48'![CDATA[World's Best]]/brandname![CDATA[Kontakt Cream-Special 4 x 10]]/name categories primarycategory='17' category id='7' name![CDATA[Jeans Bukser]]/name category id='17' name![CDATA[Jeans]]/name /category /category category id='8' name![CDATA[Nyheder]]/name /category /categories description![CDATA[4 pakker med 10 stk. 
glatte kondomer, med reservoir og creme.]]/descriptionprice currency='SEK'310.70/pricesalesprice currency='SEK'233.03/salespricecolor id='227'![CDATA[4 x 10 kondomer]]/colorsize id='6'![CDATA[Large]]/sizeproductUrl![CDATA[http://www.website.se/butik/visvare.asp?id=738]]/productUrlimageUrl![CDATA[http://www.website.se/varebilleder/738_intro.jpg]]/imageUrllastmodified11-11-2008 15:10:31/lastmodified/product product id='320' rank='10' categories primarycategory='17' category id='7' name![CDATA[Jeans Bukser]]/name category id='17' name![CDATA[Jeans]]/name /category /category category id='8' name![CDATA[Nyheder]]/name /category /categories brand id='1'![CDATA[JBS]]/brandname![CDATA[JBS trusser]]/namecategory id='39'![CDATA[Trusser]]/categorydescription![CDATA[Gråmeleret JBS trusser model Classic med gylp.]]/descriptionprice currency='SEK'154.96/pricesalesprice currency='SEK'154.96/salespricecolor id='28'![CDATA[Gråmeleret]]/colorsize
Re: replications issue
What is the problem? Is the replication not happening after you do a commit on the master? Frequent polling is not a problem; frequent commits can slow down the system On Fri, Feb 19, 2010 at 2:41 PM, giskard gisk...@autistici.org wrote: Ciao, Uhm after some time a new index in data/index on the slave has been written with the ~size of the master index. the configuration on both master and slave is the same as the one on the SolrReplication wiki page enable/disable master/slave in a node requestHandler name=/replication class=solr.ReplicationHandler lst name=master str name=enable${enable.master:false}/str str name=replicateAftercommit/str str name=confFilesschema.xml,stopwords.txt/str /lst lst name=slave str name=enable${enable.slave:false}/str str name=masterUrlhttp://localhost:8983/solr/replication/str str name=pollInterval00:00:60/str /lst /requestHandler When the master is started, pass in -Denable.master=true and in the slave pass in -Denable.slave=true. Alternately, these values can be stored in a solrcore.properties file as follows #solrcore.properties in master enable.master=true enable.slave=false On 19 Feb 2010, at 03:43, Otis Gospodnetic wrote: giskard, Is this on the master or on the slave(s)? Maybe you can paste your replication handler config for the master and your replication handler config for the slave. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ From: giskard gisk...@autistici.org To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 12:16:37 PM Subject: replications issue Hi all, I've set up solr replication as described in the wiki. when i start the replication a directory called index.$numbers is created after a while it disappears and a new index.$othernumbers is created index/ remains untouched with an empty index. any clue? thank you in advance, Riccardo -- ciao, giskard -- ciao, giskard -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: @Field annotation support
On Fri, Feb 19, 2010 at 11:41 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Ok then, is this the correct class to support the @Field annotation? Because I have it on the path but it's not working. Yes, it is the right class. But what is not working? org\apache\solr\solr-solrj\1.4.0\solr-solrj-1.4.0.jar/org\apache\solr\client\solrj\beans\Field.class 2010/2/18 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: solrj jar On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support the annotation. What jar should we use to bring in this custom Solr annotation? -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
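A small hedged example of what a bean using the @Field annotation from solr-solrj-1.4.0.jar typically looks like and how it is sent to Solr; the Solr URL and field names are placeholders and assume matching fields exist in schema.xml.

    import org.apache.solr.client.solrj.beans.Field;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class AnnotatedBeanExample {
        public static class Item {
            @Field("id")    // maps this member to the Solr field named "id"
            String id;

            @Field          // with no argument, the member name is used as the field name
            String name;
        }

        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            Item item = new Item();
            item.id = "1";
            item.name = "example";
            server.addBean(item);   // SolrJ reads the @Field annotations via reflection
            server.commit();
        }
    }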
Re: @Field annotation support
solrj jar On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support the annotation. What jar should we use to bring in this custom Solr annotation? -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Preventing mass index delete via DataImportHandler full-import
On Wed, Feb 17, 2010 at 8:03 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I have a small worry though. When I call the full-import functions, can : I configure Solr (via the XML files) to make sure there are rows to : index before wiping everything? What worries me is if, for some unknown : reason, we have an empty database, then the full-import will just wipe : the live index and the search will be broken. I believe if you set clear=false when doing the full-import, DIH won't delete the entire index before it starts. It is clean=false, or use command=import instead of command=full-import. it probably makes the full-import slower (most of the adds wind up being deletes followed by adds) but it should prevent you from having an empty index if something goes wrong with your DB. the big catch is you now have to be responsible for managing deletes (using the XmlUpdateRequestHandler) yourself ... this bug looks like its goal is to make this easier to deal with (but it's not really clear to me what deletedPkQuery is ... it doesn't seem to be documented). https://issues.apache.org/jira/browse/SOLR-1168 -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
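As a worked example of the clean=false safeguard discussed above, here is a hedged SolrJ sketch that triggers a full-import without first wiping the index. It assumes the DataImportHandler is registered at /dataimport in solrconfig.xml, and the Solr URL is a placeholder; the same effect can be had by requesting the handler directly with command=full-import and clean=false.

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class FullImportNoClean {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("command", "full-import");
            params.set("clean", "false");    // keep existing docs if the DB query returns nothing
            params.set("commit", "true");
            QueryRequest request = new QueryRequest(params);
            request.setPath("/dataimport");  // assumes DIH is registered under this path
            request.process(server);
        }
    }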
Re: Solr 1.4: Full import FileNotFoundException
can we confirm that the user does not have multiple DIH configured? any request for an import, while an import is going on, is rejected On Sat, Feb 13, 2010 at 11:40 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : concurrent imports are not allowed in DIH, unless u setup multiple DIH instances Right, but that's not the issue -- the question is wether attemping to do so might be causing index corruption (either because of a bug or because of some possibly really odd config we currently know nothing about) : : I have noticed that when I run concurrent full-imports using DIH in Solr : : 1.4, the index ends up getting corrupted. I see the following in the log : : I'm fairly confident that concurrent imports won't work -- but it : shouldn't corrupt your index -- even if the DIH didn't actively check for : this type of situation, the underlying Lucene LockFactory should ensure : that one of the inports wins ... you'll need to tell us what kind of : Filesystem you are using, and show us the relevent settings from your : solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, : etc...) : : At worst you should get a lock time out exception. : : : But I looked at: : : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html : : : : and was under the impression that this issue was fixed in Solr 1.4. : : ...right, attempting to run two concurrent imports with DIH should cause : the second one to abort immediatley. : : : : : -Hoss : : : : : : -- : - : Noble Paul | Systems Architect| AOL | http://aol.com : -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr 1.4: Full import FileNotFoundException
concurrent imports are not allowed in DIH, unless u setup multiple DIH instances On Sat, Feb 13, 2010 at 7:05 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I have noticed that when I run concurrent full-imports using DIH in Solr : 1.4, the index ends up getting corrupted. I see the following in the log I'm fairly confident that concurrent imports won't work -- but it shouldn't corrupt your index -- even if the DIH didn't actively check for this type of situation, the underlying Lucene LockFactory should ensure that one of the inports wins ... you'll need to tell us what kind of Filesystem you are using, and show us the relevent settings from your solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, etc...) At worst you should get a lock time out exception. : But I looked at: : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html : : and was under the impression that this issue was fixed in Solr 1.4. ...right, attempting to run two concurrent imports with DIH should cause the second one to abort immediatley. -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DIH: delta-import not working
try this deltaImportQuery=select id, bytes from attachment where application = 'MYAPP' and id = '${dataimporter.delta.id}' be aware that the names are case sensitive . if the id comes as 'ID' this will not work On Tue, Feb 9, 2010 at 3:15 PM, Jorg Heymans jorg.heym...@gmail.com wrote: Hi, I am having problems getting the delta-import to work for my schema. Following what i have found in the list, jira and the wiki below configuration should just work but it doesn't. dataConfig dataSource name=ora driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@. user= password=/ dataSource name=orablob type=FieldStreamDataSource / document name=mydocuments entity dataSource=ora name=attachment pk=id query=select id, bytes from attachment where application = 'MYAPP' deltaImportQuery=select id, bytes from attachment where application = 'MYAPP' and id = '${dataimporter.attachment.id}' deltaQuery=select id from attachment where application = 'MYAPP' and modified_on gt; to_date('${dataimporter.attachment.last_index_time}', '-mm-dd hh24:mi:ss') field column=id name=attachmentId / entity dataSource=orablob processor=TikaEntityProcessor url=bytes dataField=attachment.bytes field column=text name=attachmentContents/ /entity /entity /document /dataConfig The sql generated in the deltaquery is correct, the timestamp is passed correctly. When i execute that query manually in the DB it returns the pk of the rows that were added. However no documents are added to the index. What am i missing here ?? I'm using a build snapshot from 03/02. Thanks Jorg -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Call URL, simply parse the results using SolrJ
you can also try URL urlo = new URL(url);// ensure that the url has wt=javabin in that NamedListObject namedList = new JavaBinCodec().unmarshal(urlo.openConnection().getInputStream()); QueryResponse response = new QueryResponse(namedList, null); On Mon, Feb 8, 2010 at 11:49 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Here's what I did to resolve this: XMLResponseParser parser = new XMLResponseParser(); URL urlo = new URL(url); InputStreamReader isr = new InputStreamReader(urlo.openConnection().getInputStream()); NamedListObject namedList = parser.processResponse(isr); QueryResponse response = new QueryResponse(namedList, null); On Mon, Feb 8, 2010 at 10:03 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: So here's what happens if I pass in a URL with parameters, SolrJ chokes: Exception in thread main java.lang.RuntimeException: Invalid base url for solrj. The base URL must not contain parameters: http://locahost:8080/solr/main/select?q=videoqt=dismax at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.init(CommonsHttpSolrServer.java:205) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.init(CommonsHttpSolrServer.java:180) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.init(CommonsHttpSolrServer.java:152) at org.apache.solr.util.QueryTime.main(QueryTime.java:20) On Mon, Feb 8, 2010 at 9:32 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Sorry for the poorly worded title... For SOLR-1761 I want to pass in a URL and parse the query response... However it's non-obvious to me how to do this using the SolrJ API, hence asking the experts here. :) -- - Noble Paul | Systems Architect| AOL | http://aol.com
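Put together as a compilable sketch, the javabin variant Noble suggests above looks roughly like this; the URL is a placeholder, and as noted the request must carry wt=javabin so the response body is actually in the javabin format.

    import java.io.InputStream;
    import java.net.URL;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.util.JavaBinCodec;
    import org.apache.solr.common.util.NamedList;

    public class RawUrlQuery {
        @SuppressWarnings("unchecked")
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://localhost:8080/solr/main/select?q=video&qt=dismax&wt=javabin");
            InputStream in = url.openConnection().getInputStream();
            try {
                // unmarshal() returns Object, so the NamedList cast is on the caller
                NamedList<Object> namedList = (NamedList<Object>) new JavaBinCodec().unmarshal(in);
                QueryResponse response = new QueryResponse(namedList, null);
                System.out.println("numFound=" + response.getResults().getNumFound());
            } finally {
                in.close();
            }
        }
    }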
Re: How to configure multiple data import types
are you referring to nested entities? http://wiki.apache.org/solr/DIHQuickStart#Index_data_from_multiple_tables_into_Solr On Mon, Feb 8, 2010 at 5:42 PM, stefan.ma...@bt.com wrote: I have got a dataimport request handler configured to index data by selecting data from a DB view I now need to index additional data sets from other views so that I can support other search queries I defined additional entity .. definitions within the document .. section of my data-config.xml But I only seem to pull in data for the 1st entity .. and not both Is there an xsd (or dtd) for data-config.xml schema.xml slrconfig.xml As these might help with understanding how to construct usable conf files Regards Stefan Maric BT Innovate Design | Collaboration Platform - Customer Innovation Solutions -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandlerException for custom DIH Transformer
On Mon, Feb 8, 2010 at 9:13 AM, Tommy Chheng tommy.chh...@gmail.com wrote: I'm having trouble making a custom DIH transformer in solr 1.4. I compiled the General TrimTransformer into a jar. (just copy/paste sample code from http://wiki.apache.org/solr/DIHCustomTransformer) I placed the jar along with the dataimporthandler jar in solr/lib (same directory as the jetty jar) do not keep in solr/lib it wont work. keep it in {solr.home}/lib Then I added to my DIH data-config.xml file: transformer=DateFormatTransformer, RegexTransformer, com.chheng.dih.transformers.TrimTransformer Now I get this exception when I try running the import. org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodException: com.chheng.dih.transformers.TrimTransformer.transformRow(java.util.Map) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.loadTransformers(EntityProcessorWrapper.java:120) I noticed the exception lists TrimTransformer.transformRow(java.util.Map) but the abstract Transformer class defines a two parameter method: transformRow(MapString, Object row, Context context)? -- Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http://tommy.chheng.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
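One way to sidestep the signature mismatch in that stack trace is to extend the DIH Transformer base class, which declares the two-argument transformRow method. A minimal trim transformer along those lines might look like the following sketch; compile it under whatever package you use and, as noted above, drop the jar in {solr.home}/lib rather than solr/lib.

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class TrimTransformer extends Transformer {
        @Override
        public Object transformRow(Map<String, Object> row, Context context) {
            // Trim every String column in the row before it is indexed
            for (Map.Entry<String, Object> entry : row.entrySet()) {
                if (entry.getValue() instanceof String) {
                    entry.setValue(((String) entry.getValue()).trim());
                }
            }
            return row;
        }
    }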
Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
unfortunately, no On Fri, Feb 5, 2010 at 2:23 PM, Jorg Heymans jorg.heym...@gmail.com wrote: dow, thanks for that Paul :-| I suppose schema validation for data-config.xml is already in Jira somewhere ? Jorg 2010/2/5 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com wrong datasource name=orablob type=FieldStreamDataSource / right dataSource name=orablob type=FieldStreamDataSource / On Thu, Feb 4, 2010 at 9:27 PM, Jorg Heymans jorg.heym...@gmail.com wrote: Hi, I'm having some troubles getting this to work on a snapshot from 3rd feb My config looks as follows dataSource name=ora driver=oracle.jdbc.OracleDriver url= / datasource name=orablob type=FieldStreamDataSource / document name=mydoc entity dataSource=ora name=meta query=select id, filename, bytes from documents field column=ID name=id / field column=FILENAME name=filename / entity dataSource=orablob processor=TikaEntityProcessor url=bytes dataField=meta.BYTES field column=text name=mainDocument/ /entity /entity /document and i get this stacktrace org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: bytes Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98) It seems that whatever is in the url attribute it is trying to execute as a query. So i thought i put url=select bytes from documents where id = ${meta.ID} but then i get a classcastexception. Caused by: java.lang.ClassCastException: org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1 at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233) Any ideas what is wrong with the config ? Thanks Jorg 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com There is no corresponding DataSurce which can be used with TikaEntityProcessor which reads from BLOB I have opened an issue.https://issues.apache.org/jira/browse/SOLR-1737 On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal ns...@columnit.com wrote: Hi, I am fairly new to Solr and would like to use the DIH to pull rich text files (pdfs, etc) from BLOB fields in my database. There was a suggestion made to use the FieldReaderDataSource with the recently commited TikaEntityProcessor. Has anyone accomplished this? This is my configuration, and the resulting error - I'm not sure if I'm using the FieldReaderDataSource correctly. If anyone could shed light on whether I am going the right direction or not, it would be appreciated. 
---Data-config.xml: dataConfig datasource name=f1 type=FieldReaderDataSource / dataSource name=orcle driver=oracle.jdbc.driver.OracleDriver url=jdbc:oracle:thin:un/p...@host:1521:sid / document entity dataSource=orcle name=attach query=select id as name, attachment from testtable2 entity dataSource=f1 processor=TikaEntityProcessor dataField=attach.attachment format=text field column=text name=NAME / /entity /entity /document /dataConfig -Debug error: response lst name=responseHeader int name=status0/int int name=QTime203/int /lst lst name=initArgs lst name=defaults str name=configtestdb-data-config.xml/str /lst /lst str name=commandfull-import/str str name=modedebug/str null name=documents/ lst name=verbose-output lst name=entity:attach lst name=document#1 str name=queryselect id as name, attachment from testtable2/str str name=time-taken0:0:0.32/str str--- row #1-/str str name=NAMEjava.math.BigDecimal:2/str str name=ATTACHMENToracle.sql.BLOB:oracle.sql.b...@1c8e807/str str-/str lst name=entity:253433571801723 str name=EXCEPTION org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :f1 available for entity :253433571801723 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(Da
Re: DataImportHandler - convertType attribute
On Wed, Feb 3, 2010 at 3:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: One thing I find awkward about convertType is that it is JdbcDataSource specific, rather than field-specific. Isn't the current implementation far too broad? it is feature of JdbcdataSource and no other dataSource offers it. we offer it because JDBC drivers have mechanism to do type conversion What do you mean by it is too broad? Erik On Feb 3, 2010, at 1:16 AM, Noble Paul നോബിള് नोब्ळ् wrote: implicit conversion can cause problem when Transformers are applied. It is hard for user to guess the type of the field by looking at the schema.xml. In Solr, String is the most commonly used type. if you wish to do numeric operations on a field convertType will cause problems. If it is explicitly set, user knows why the type got changed. On Tue, Feb 2, 2010 at 6:38 PM, Alexey Serba ase...@gmail.com wrote: Hello, I encountered blob indexing problem and found convertType solution in FAQhttp://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 I was wondering why it is not enabled by default and found the following comment http://www.lucidimagination.com/search/document/169e6cc87dad5e67/dataimporthandler_and_blobs#169e6cc87dad5e67in mailing list: We used to attempt type conversion from the SQL type to the field's given type. We found that it was error prone and switched to using the ResultSet#getObject for all columns (making the old behavior a configurable option – convertType in JdbcDataSource). Why it is error prone? Is it safe enough to enable convertType for all jdbc data sources by default? What are the side effects? Thanks in advance, Alex -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler - convertType attribute
On Wed, Feb 3, 2010 at 4:16 PM, Erik Hatcher erik.hatc...@gmail.com wrote: On Feb 3, 2010, at 5:36 AM, Noble Paul നോബിള് नोब्ळ् wrote: On Wed, Feb 3, 2010 at 3:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: One thing I find awkward about convertType is that it is JdbcDataSource specific, rather than field-specific. Isn't the current implementation far too broad? it is feature of JdbcdataSource and no other dataSource offers it. we offer it because JDBC drivers have mechanism to do type conversion What do you mean by it is too broad? I mean the convertType flag is not field-specific (or at least field overridable). Conversions occur on a per-field basis, but the setting is for the entire data source and thus all fields. Yes. it is true. First of all this is not very widely used, so fine tuning did not make sense Erik -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: java.lang.NullPointerException with MySQL DataImportHandler
On Thu, Feb 4, 2010 at 10:50 AM, Lance Norskog goks...@gmail.com wrote: I just tested this with a DIH that does not use database input. If the DataImportHandler JDBC code does not support a schema that has optional fields, that is a major weakness. Noble/Shalin, is this true? The problem is obviously not with DIH. DIH blindly passes on all the fields it could obtain from the DB. if some field is missing DIH does not do anything On Tue, Feb 2, 2010 at 8:50 AM, Sascha Szott sz...@zib.de wrote: Hi, since some of the fields used in your DIH configuration aren't mandatory (e.g., keywords and tags are defined as nullable in your db table schema), add a default value to all optional fields in your schema configuration (e.g., default = ). Note, that Solr does not understand the db-related concept of null values. Solr's log output SolrInputDocument[{keywords=keywords(1.0)={Dolce}, name=name(1.0)={Dolce amp; Gabbana Damp;G Neckties designer Tie for men 543}, productID=productID(1.0)={220213}}] indicates that there aren't any tags or descriptions stored for the item with productId 220213. Since no default value is specified, Solr raises an error when creating the index document. -Sascha Jean-Michel Philippon-Nadeau wrote: Hi, Thanks for the reply. On Tue, 2010-02-02 at 16:57 +0100, Sascha Szott wrote: * the output of MySQL's describe command for all tables/views referenced in your DIH configuration mysql describe products; ++--+--+-+-++ | Field | Type | Null | Key | Default | Extra | ++--+--+-+-++ | productID | int(10) unsigned | NO | PRI | NULL | auto_increment | | skuCode | varchar(320) | YES | MUL | NULL | | | upcCode | varchar(320) | YES | MUL | NULL | | | name | varchar(320) | NO | | NULL | | | description | text | NO | | NULL | | | keywords | text | YES | | NULL | | | disqusThreadID | varchar(50) | NO | | NULL | | | tags | text | YES | | NULL | | | createdOn | int(10) unsigned | NO | | NULL | | | lastUpdated | int(10) unsigned | NO | | NULL | | | imageURL | varchar(320) | YES | | NULL | | | inStock | tinyint(1) | YES | MUL | 1 | | | active | tinyint(1) | YES | | 1 | | ++--+--+-+-++ 13 rows in set (0.00 sec) mysql describe product_soldby_vendor; +-+--+--+-+-+---+ | Field | Type | Null | Key | Default | Extra | +-+--+--+-+-+---+ | productID | int(10) unsigned | NO | MUL | NULL | | | productVendorID | int(10) unsigned | NO | MUL | NULL | | | price | double | NO | | NULL | | | currency | varchar(5) | NO | | NULL | | | buyURL | varchar(320) | NO | | NULL | | +-+--+--+-+-+---+ 5 rows in set (0.00 sec) mysql describe products_vendors_subcategories; ++--+--+-+-++ | Field | Type | Null | Key | Default | Extra | ++--+--+-+-++ | productVendorSubcategoryID | int(10) unsigned | NO | PRI | NULL | auto_increment | | productVendorCategoryID | int(10) unsigned | NO | | NULL | | | labelEnglish | varchar(320) | NO | | NULL | | | labelFrench | varchar(320) | NO | | NULL | | ++--+--+-+-++ 4 rows in set (0.00 sec) mysql describe products_vendors_categories; +-+--+--+-+-++ | Field | Type | Null | Key | Default | Extra | +-+--+--+-+-++ | productVendorCategoryID | int(10) unsigned | NO | PRI | NULL | auto_increment | | labelEnglish | varchar(320) | NO | | NULL | | | labelFrench | varchar(320) | NO | | NULL | | +-+--+--+-+-++ 3 rows in set (0.00 sec) mysql describe product_vendor_in_subcategory; +---+--+--+-+-+---+ | Field | Type | Null | Key | Default |
Re: DataImportHandler delta-import confusion
try deltaImportQuery=select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.moment_id}' The key has to be same and in the same case On Tue, Feb 2, 2010 at 1:45 AM, Jon Drukman jdruk...@gmail.com wrote: First, let me just say that DataImportHandler is fantastic. It got my old mysql-php-xml index rebuild process down from 30 hours to 6 minutes. I'm trying to use the delta-import functionality now but failing miserably. Here's my entity tag: (some SELECT statements reduced to increase readability) entity name=moment query=select ... deltaQuery=select moment_id from moments where date_modified '${dataimporter.last_index_time}' deltaImportQuery=select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.MOMENTID}' pk=MOMENTID transformer=TemplateTransformer When I look at the MySQL query log I see the date modified query running fine and returning 3 rows. The deltaImportQuery, however, does not have the proper primary key in the where clause. It's just blank. I also tried changing it to ${moment.MOMENTID}. I don't really get the relation between the pk field and the ${dataimport.delta.whatever} stuff. Help please! -jsd- -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler delta-import confusion
Please do not hijack a thread. http://people.apache.org/~hossman/#threadhijack On Tue, Feb 2, 2010 at 11:32 PM, Leann Pereira le...@1sourcestaffing.com wrote: Hi Paul, Can you take me off this distribution list? Thanks, Leann From: noble.p...@gmail.com [noble.p...@gmail.com] On Behalf Of Noble Paul നോബിള് नोब्ळ् [noble.p...@corp.aol.com] Sent: Tuesday, February 02, 2010 2:12 AM To: solr-user@lucene.apache.org Subject: Re: DataImportHandler delta-import confusion try deltaImportQuery=select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.moment_id}' The key has to be same and in the same case On Tue, Feb 2, 2010 at 1:45 AM, Jon Drukman jdruk...@gmail.com wrote: First, let me just say that DataImportHandler is fantastic. It got my old mysql-php-xml index rebuild process down from 30 hours to 6 minutes. I'm trying to use the delta-import functionality now but failing miserably. Here's my entity tag: (some SELECT statements reduced to increase readability) entity name=moment query=select ... deltaQuery=select moment_id from moments where date_modified '${dataimporter.last_index_time}' deltaImportQuery=select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.MOMENTID}' pk=MOMENTID transformer=TemplateTransformer When I look at the MySQL query log I see the date modified query running fine and returning 3 rows. The deltaImportQuery, however, does not have the proper primary key in the where clause. It's just blank. I also tried changing it to ${moment.MOMENTID}. I don't really get the relation between the pk field and the ${dataimport.delta.whatever} stuff. Help please! -jsd- -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler - convertType attribute
implicit conversion can cause problem when Transformers are applied. It is hard for user to guess the type of the field by looking at the schema.xml. In Solr, String is the most commonly used type. if you wish to do numeric operations on a field convertType will cause problems. If it is explicitly set, user knows why the type got changed. On Tue, Feb 2, 2010 at 6:38 PM, Alexey Serba ase...@gmail.com wrote: Hello, I encountered blob indexing problem and found convertType solution in FAQhttp://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 I was wondering why it is not enabled by default and found the following comment http://www.lucidimagination.com/search/document/169e6cc87dad5e67/dataimporthandler_and_blobs#169e6cc87dad5e67in mailing list: We used to attempt type conversion from the SQL type to the field's given type. We found that it was error prone and switched to using the ResultSet#getObject for all columns (making the old behavior a configurable option – convertType in JdbcDataSource). Why it is error prone? Is it safe enough to enable convertType for all jdbc data sources by default? What are the side effects? Thanks in advance, Alex -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler problem - reading XML from a file
It clear that the xpaths provided won't fetch anything. because there is no data in those paths. what do you really wish to be indexed ? On Sun, Jan 31, 2010 at 10:30 AM, Lance Norskog goks...@gmail.com wrote: This DataImportHandler script does not find any documents in this HTML file. The DIH definitely opens the file, but the either the xpathprocessor gets no data or it does not recognize the xpaths described. Any hints? (I'm using Solr 1.5-dev, sometime recent.) Thanks! Lance xhtml-data-config.xml: dataConfig dataSource type=FileDataSource encoding=UTF-8 / document entity name=xhtml forEach=/html/head | /html/body processor=XPathEntityProcessor pk=id transformer=TemplateTransformer url=/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html field column=head_s xpath=/html/head/ field column=body_s xpath=/html/body/ /entity /document /dataConfig Sample data file: cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html ?xml version=1.0 encoding=UTF-8 ? html head meta content=en-US name=DC.language / /head body div id=header a href=ch05-tokenizers-filters-Solr1.4.htmlFirst/a span class=nolinkPrevious/span a href=ch05-tokenizers-filters-Solr1.41.htmlNext/a a href=ch05-tokenizers-filters-Solr1.460.htmlLast/a /div div dir=ltr id=content style=background-color:transparent h1 id=toc0 span class=SectionNumber1/span a id=RefHeading36402771/a a id=bkmRefHeading36402771/a Understanding Analyzers, Tokenizers, and Filters /h1 /div /body /html -- Lance Norskog goks...@gmail.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: replication setup
it is always recommended to paste your actual configuration and startup commands, instead of saying as described in wiki . On Tue, Jan 26, 2010 at 9:52 PM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Hi I have set up replication following the wiki I downloaded the latest apache-solr-1.4 release and exploded it in 2 different directories I modified both solrconfig.xml for the master the slave as described on the wiki page In both sirectory, I started solr from the example directory example on the master: java -Dsolr.solr.home=multicore -Djetty.host=0.0.0.0 -Djetty.port=8983 -DSTOP.PORT=8078 -DSTOP.KEY=stop.now -jar start.jar and on the slave java -Dsolr.solr.home=multicore -Djetty.host=0.0.0.0 -Djetty.port=8982 -DSTOP.PORT=8077 -DSTOP.KEY=stop.now -jar start.jar I can see core0 and core 1 when I open the solr url However, I don't see a replication link and the following url solr url / replication returns a 404 error I must be doing something wrong. I would appreciate any help ! thanks a lot matt -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: loading an updateProcessorChain with multicore in trunk
I guess . default=true should not be necessary if there is only one updateRequestProcessorChain specified . Open an issue On Fri, Jan 29, 2010 at 6:06 PM, Marc Sturlese marc.sturl...@gmail.com wrote: I am testing trunk and have seen a different behaviour when loading updateProcessors wich I don't know if it's normal (at least with multicore) Before I use to use an updateProcessorChain this way: requestHandler name=/update class=solr.XmlUpdateRequestHandler lst name=defaults str name=update.processormyChain/str /lst /requestHandler updateRequestProcessorChain name=myChain processor class=org.apache.solr.update.processor.CustomUpdateProcessorFactory / processor class=org.apache.solr.update.processor.LogUpdateProcessorFactory / processor class=org.apache.solr.update.processor.RunUpdateProcessorFactory / /updateRequestProcessorChain It does not work in current trunk. I have debuged the code and I have seen now UpdateProcessorChain is loaded via: public T T initPlugins(ListPluginInfo pluginInfos, MapString, T registry, ClassT type, String defClassName) { T def = null; for (PluginInfo info : pluginInfos) { T o = createInitInstance(info,type, type.getSimpleName(), defClassName); registry.put(info.name, o); if(info.isDefault()){ def = o; } } return def; } As I don't have default=true in the configuration, my custom processorChain is not used. Setting default=true makes it work: requestHandler name=/update class=solr.XmlUpdateRequestHandler lst name=defaults str name=update.processormyChain/str /lst /requestHandler updateRequestProcessorChain name=myChain default=true processor class=org.apache.solr.update.processor.CustomUpdateProcessorFactory / processor class=org.apache.solr.update.processor.LogUpdateProcessorFactory / processor class=org.apache.solr.update.processor.RunUpdateProcessorFactory / /updateRequestProcessorChain As far as I understand, if you specify the chain you want to use in here: requestHandler name=/update class=solr.XmlUpdateRequestHandler lst name=defaults str name=update.processormyChain/str /lst /requestHandler Shouldn't be necesary to set it as default. Is it going to be kept this way? Thanks in advance -- View this message in context: http://old.nabble.com/loading-an-updateProcessorChain-with-multicore-in-trunk-tp27371375p27371375.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Help using CachedSqlEntityProcessor
Thanks for pointing this out. The wiki had a problem fro a while and we could not update the documentation. It is updated here http://wiki.apache.org/solr/DataImportHandler#cached On Thu, Jan 28, 2010 at 6:31 PM, KirstyS kirst...@gmail.com wrote: Thanks, I saw that mistake and I have it working now!!! thank you for all your help. Out of interest, is the cacheKey and cacheLookup documented anywhere? Rolf Johansson-2 wrote: It's always a good thing if you can check the debug log (fx catalina.out) or run with debug/verbose to check how Solr runs trough the dataconfig. You've also made a typo in the pk and query, LinkedCatAricleId is missing a t. /Rolf Den 2010-01-28 11.20, skrev KirstyS kirst...@gmail.com: Okay, I changed my entity to look like this (have included my main entity as well): document name=ArticleDocument entity name=article pk=CmsArticleId query=Select * from vArticleSummaryDetail_SolrSearch (nolock) WHERE ArticleStatusId = 1 entity name=LinkedCategory pk=LinkedCatAricleId query=SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) processor=CachedSqlEntityProcessor cacheKey=LinkedCatArticleId cacheLookup=article.CmsArticleId /entity /entity /document BUT now the index is taking SO much longer Have I missed any other configurationg changes? Do I need to add anything into the solfconfig.xml file? Do I have my syntax completely wrong? Any help is greatly appreciated!!! -- View this message in context: http://old.nabble.com/Help-using-CachedSqlEntityProcessor-tp27337635p27355501.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr 1.4 Replication index directories
the index.20100127044500/ is a temp directory should have got cleaned up if there was no problem in replication (see the logs if there was a problem) . if there is a problem the temp directory will be used as the new index directory and the old one will no more be used.at any given point only one directory is used for the index. check the replication dashboard to check which one it is. Everything else can be deleted. On Fri, Jan 29, 2010 at 6:03 AM, mark angelillo li...@snooth.com wrote: Thanks, Otis. Responses inline. Hi, We're using the new replication and it's working pretty well. There's one detail I'd like to get some more information about. As the replication works, it creates versions of the index in the data directory. Originally we had index/, but now there are dated versions such as index.20100127044500/, which are the replicated versions. Each copy is sized in the vicinity of 65G. With our current hard drive it's fine to have two around, but 3 gets a little dicey. Sometimes we're finding that the replication doesn't always clean up after itself. I would like to understand this better, or to not have this happen. It could be a configuration issue. Some more specific questions: - Is it safe to remove the index/ directory (that doesn't have the date on it)? I think I tried this once and the whole thing broke, however maybe something else was wrong at the time. No, that's the real, live index, you don't want to remove that one. Yeah... I tried it once and remember things breaking. However nothing in this directory has been modified for over a week (since the last replication initialization). And I'm still sitting on 130GB of data for what is only 65GB on the master - Is there a way to know which one is the current one? (I'm looking at the file index.properties, and it seems to be correct, but sometimes there's a newer version in the directory, which later is removed) I think the index one is always current, no? If not, I imagine the admin replication page will tell you, or even the Statistics page. e.g. reader : SolrIndexReader{this=46a55e,r=readonlysegmentrea...@46a55e,segments=1} readerDir : org.apache.lucene.store.NIOFSDirectory@/mnt/solrhome/cores/foo/data/index reader : SolrIndexReader{this=5c3aef1,r=readonlydirectoryrea...@5c3aef1,refCnt=1,segments=9} readerDir : org.apache.lucene.store.NIOFSDirectory@/home/solr/solr_1.4/solr/data/index.20100127044500 - Could it be that the index does not finish replicating in the poll interval I give it? What happens if, say there's a poll interval X and replicating the index happens to take longer than X sometimes. (Our current poll interval is 45 minutes, and every time I'm watching it it completes in time.) you can keep a very small pollInterval and it is OK. if a replication is going on no new replication will be initiated till the old one completes I think only 1 replication will/should be happening at a time. Whew, that's comforting. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Fastest way to use solrj
how many fields are there in each doc? the binary format just reduces overhead. it does not touch/compress the payload 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I have 3 millon documents, each having 5000 chars. The xml file is about 15GB. The binary file is also about 15GB. I was a bit surprised about this. It doesn't bother me much though. At least it performs better. /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: if you write only a few docs you may not observe much difference in size. if you write large no:of docs you may observe a big difference. 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I got the binary format to work perfectly now. Performance is better than with xml. Thanks! Although, it doesn't look like a binary file is smaller in size than an xml file? /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/21 Tim Terlegård tim.terleg...@gmail.com: Yes, it worked! Thank you very much. But do I need to use curl or can I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't use BinaryWriter then I don't know how to do this. if your data is serialized using JavaBinUpdateRequestCodec, you may POST it using curl. If you are writing directly , use CommonsHttpSolrServer /Tim 2010/1/20 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/20 Tim Terlegård tim.terleg...@gmail.com: BinaryRequestWriter does not read from a file and post it Is there any other way or is this use case not supported? I tried this: $ curl host/solr/update/javabin -F stream.file=/tmp/data.bin $ curl host/solr/update -F stream.body=' commit /' Solr did read the file, because solr complained when the file wasn't in the format the JavaBinUpdateRequestCodec expected. But no data is added to the index for some reason. how did you create the file /tmp/data.bin ? what is the format? I wrote this in the first email. It's in the javabin format (I think). I did like this (groovy code): fieldId = new NamedList() fieldId.add(name, id) fieldId.add(val, 9-0) fieldId.add(boost, null) fieldText = new NamedList() fieldText.add(name, text) fieldText.add(val, Some text) fieldText.add(boost, null) fieldNull = new NamedList() fieldNull.add(boost, null) doc = [fieldNull, fieldId, fieldText] docs = [doc] root = new NamedList() root.add(docs, docs) fos = new FileOutputStream(data.bin) new JavaBinCodec().marshal(root, fos) /Tim JavaBin is a format. use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest updateRequest, OutputStream os) The output of this can be posted to solr and it should work -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Fastest way to use solrj
The binary format just reduces overhead. in your case , all the data is in the big text field which is not compressed. But overall, the parsing is a lot faster for the binary format. So you see a perf boost 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I have 6 fields. The text field is the biggest, it contains almost all of the 5000 chars. /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: how many fields are there in each doc? the binary format just reduces overhead. it does not touch/compress the payload 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I have 3 millon documents, each having 5000 chars. The xml file is about 15GB. The binary file is also about 15GB. I was a bit surprised about this. It doesn't bother me much though. At least it performs better. /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: if you write only a few docs you may not observe much difference in size. if you write large no:of docs you may observe a big difference. 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I got the binary format to work perfectly now. Performance is better than with xml. Thanks! Although, it doesn't look like a binary file is smaller in size than an xml file? /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/21 Tim Terlegård tim.terleg...@gmail.com: Yes, it worked! Thank you very much. But do I need to use curl or can I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't use BinaryWriter then I don't know how to do this. if your data is serialized using JavaBinUpdateRequestCodec, you may POST it using curl. If you are writing directly , use CommonsHttpSolrServer /Tim 2010/1/20 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/20 Tim Terlegård tim.terleg...@gmail.com: BinaryRequestWriter does not read from a file and post it Is there any other way or is this use case not supported? I tried this: $ curl host/solr/update/javabin -F stream.file=/tmp/data.bin $ curl host/solr/update -F stream.body=' commit /' Solr did read the file, because solr complained when the file wasn't in the format the JavaBinUpdateRequestCodec expected. But no data is added to the index for some reason. how did you create the file /tmp/data.bin ? what is the format? I wrote this in the first email. It's in the javabin format (I think). I did like this (groovy code): fieldId = new NamedList() fieldId.add(name, id) fieldId.add(val, 9-0) fieldId.add(boost, null) fieldText = new NamedList() fieldText.add(name, text) fieldText.add(val, Some text) fieldText.add(boost, null) fieldNull = new NamedList() fieldNull.add(boost, null) doc = [fieldNull, fieldId, fieldText] docs = [doc] root = new NamedList() root.add(docs, docs) fos = new FileOutputStream(data.bin) new JavaBinCodec().marshal(root, fos) /Tim JavaBin is a format. use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest updateRequest, OutputStream os) The output of this can be posted to solr and it should work -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
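For indexing through SolrJ (rather than hand-building javabin files), the usual way to get the binary format's parsing benefit is to switch the request writer. A brief hedged sketch, with the Solr URL and field names as placeholders:

    import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BinaryIndexing {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            // Send updates in the javabin format instead of XML
            server.setRequestWriter(new BinaryRequestWriter());

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "9-0");
            doc.addField("text", "Some text");
            server.add(doc);
            server.commit();
        }
    }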
Re: Help using CachedSqlEntityProcessor
cacheKey and cacheLookup are required attributes . On Thu, Jan 28, 2010 at 12:51 AM, KirstyS kirst...@gmail.com wrote: Thanks. I am on 1.4..so maybe that is the problem. Will try when I get back to work tomorrow. Thanks Rolf Johansson-2 wrote: I recently had issues with CachedSqlEntityProcessor too, figuring out how to use the syntax. After a while, I managed to get it working with cacheKey and cacheLookup. I think this is 1.4 specific though. It seems you have double WHERE clauses, one in the query and one in the where attribute. Try using cacheKey and cacheLookup instead in something like this: entity name=LinkedCategory pk=LinkedCatArticleId query=SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) processor=CachedSqlEntityProcessor cacheKey=LINKEDCATARTICLEID cacheLookup=article.CMSARTICLEID deltaQuery=SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), LastUpdateDate) '${dataimporter.article.last_index_time}' OR convert(varchar(50), PublishDate) '${dataimporter.article.last_index_time}' parentDeltaQuery=SELECT * from vArticleSummaryDetail_SolrSearch (nolock) field column=LinkedCategoryBC name=LinkedCategoryBreadCrumb/ /entity /Rolf Den 2010-01-27 12.36, skrev KirstyS kirst...@gmail.com: Hi, I have looked on the wiki. Using the CachedSqlEntityProcessor looks like it was simple. But I am getting no speed benefit and am not sure if I have even got the syntax correct. I have a main root entity called 'article'. And then I have a number of sub entities. One such entity is as such : entity name=LinkedCategory pk=LinkedCatAricleId query=SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}') processor=CachedSqlEntityProcessor WHERE=LinkedCatArticleId = article.CmsArticleId deltaQuery=SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}') AND (convert(varchar(50), LastUpdateDate) '${dataimporter.article.last_index_time}' OR convert(varchar(50), PublishDate) '${dataimporter.article.last_index_time}') parentDeltaQuery=SELECT * from vArticleSummaryDetail_SolrSearch (nolock) WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}') field column=LinkedCategoryBC name=LinkedCategoryBreadCrumb/ /entity As you can see I have added (for the main query - not worrying about the delta queries yet!!) the processor and the 'where' but not sure if it's correct. Can anyone point me in the right direction??? Thanks Kirsty -- View this message in context: http://old.nabble.com/Help-using-CachedSqlEntityProcessor-tp27337635p27345412.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
There is no corresponding DataSurce which can be used with TikaEntityProcessor which reads from BLOB I have opened an issue.https://issues.apache.org/jira/browse/SOLR-1737 On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal ns...@columnit.com wrote: Hi, I am fairly new to Solr and would like to use the DIH to pull rich text files (pdfs, etc) from BLOB fields in my database. There was a suggestion made to use the FieldReaderDataSource with the recently commited TikaEntityProcessor. Has anyone accomplished this? This is my configuration, and the resulting error - I'm not sure if I'm using the FieldReaderDataSource correctly. If anyone could shed light on whether I am going the right direction or not, it would be appreciated. ---Data-config.xml: dataConfig datasource name=f1 type=FieldReaderDataSource / dataSource name=orcle driver=oracle.jdbc.driver.OracleDriver url=jdbc:oracle:thin:un/p...@host:1521:sid / document entity dataSource=orcle name=attach query=select id as name, attachment from testtable2 entity dataSource=f1 processor=TikaEntityProcessor dataField=attach.attachment format=text field column=text name=NAME / /entity /entity /document /dataConfig -Debug error: response lst name=responseHeader int name=status0/int int name=QTime203/int /lst lst name=initArgs lst name=defaults str name=configtestdb-data-config.xml/str /lst /lst str name=commandfull-import/str str name=modedebug/str null name=documents/ lst name=verbose-output lst name=entity:attach lst name=document#1 str name=queryselect id as name, attachment from testtable2/str str name=time-taken0:0:0.32/str str--- row #1-/str str name=NAMEjava.math.BigDecimal:2/str str name=ATTACHMENToracle.sql.BLOB:oracle.sql.b...@1c8e807/str str-/str lst name=entity:253433571801723 str name=EXCEPTION org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :f1 available for entity :253433571801723 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(Da taImporter.java:279) at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl .java:93) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit yProcessor.java:97) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity ProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j ava:357) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j ava:383) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java :242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18 0) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte r.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java :389) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(D ataImportHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB ase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja va:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j ava:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan dler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2 16) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandler Collection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.jav a:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConne ction.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
Re: Fastest way to use solrj
if you write only a few docs you may not observe much difference in size. if you write large no:of docs you may observe a big difference. 2010/1/27 Tim Terlegård tim.terleg...@gmail.com: I got the binary format to work perfectly now. Performance is better than with xml. Thanks! Although, it doesn't look like a binary file is smaller in size than an xml file? /Tim 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/21 Tim Terlegård tim.terleg...@gmail.com: Yes, it worked! Thank you very much. But do I need to use curl or can I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't use BinaryWriter then I don't know how to do this. if your data is serialized using JavaBinUpdateRequestCodec, you may POST it using curl. If you are writing directly , use CommonsHttpSolrServer /Tim 2010/1/20 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: 2010/1/20 Tim Terlegård tim.terleg...@gmail.com: BinaryRequestWriter does not read from a file and post it Is there any other way or is this use case not supported? I tried this: $ curl host/solr/update/javabin -F stream.file=/tmp/data.bin $ curl host/solr/update -F stream.body=' commit /' Solr did read the file, because solr complained when the file wasn't in the format the JavaBinUpdateRequestCodec expected. But no data is added to the index for some reason. how did you create the file /tmp/data.bin ? what is the format? I wrote this in the first email. It's in the javabin format (I think). I did like this (groovy code): fieldId = new NamedList() fieldId.add(name, id) fieldId.add(val, 9-0) fieldId.add(boost, null) fieldText = new NamedList() fieldText.add(name, text) fieldText.add(val, Some text) fieldText.add(boost, null) fieldNull = new NamedList() fieldNull.add(boost, null) doc = [fieldNull, fieldId, fieldText] docs = [doc] root = new NamedList() root.add(docs, docs) fos = new FileOutputStream(data.bin) new JavaBinCodec().marshal(root, fos) /Tim JavaBin is a format. use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest updateRequest, OutputStream os) The output of this can be posted to solr and it should work -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Replication Handler Severe Error: Unable to move index file
On Fri, Jan 22, 2010 at 4:24 AM, Trey solrt...@gmail.com wrote: Unfortunately, when I went back to look at the logs this morning, the log file had been blown away... that puts a major damper on my debugging capabilities - so sorry about that. As a double whammy, we optimize nightly, so the old index files have completely changed at this point. I do not remember seeing an exception / stack trace in the logs associated with the SEVERE *Unable to move file* entry, but we were grepping the logs, so if it was outputted onto another line it could have possibly been there. I wouldn't really expect to see anything based upon the code in SnapPuller.java: /** * Copy a file by the File#renameTo() method. If it fails, it is considered a failure * p/ * Todo may be we should try a simple copy if it fails */ private boolean copyAFile(File tmpIdxDir, File indexDir, String fname, ListString copiedfiles) { File indexFileInTmpDir = new File(tmpIdxDir, fname); File indexFileInIndex = new File(indexDir, fname); boolean success = indexFileInTmpDir.renameTo(indexFileInIndex); if (!success) { LOG.error(Unable to move index file from: + indexFileInTmpDir + to: + indexFileInIndex); for (String f : copiedfiles) { File indexFile = new File(indexDir, f); if (indexFile.exists()) indexFile.delete(); } delTree(tmpIdxDir); return false; } return true; } In terms of whether this is an off case: this is the first occurrence of this I have seen in the logs. We tried to replicate the conditions under which the exception occurred, but were unable. I'll send along some more useful info if this happens again. In terms of the behavior we saw: It appears that a replication occurred and the Unable to move file error occurred. As a result, it looks like the ENTIRE index was subsequently replicated again into a temporary directory (several times, over and over). The end result was that we had multiple full copies of the index in temporary index folders on the slave, and the original still couldn't be updated (the move to ./index wouldn't work). Does Solr ever hold files open in a manner that would prevent a file in the index directory from being overridden? There is a TODO which says manual it try to copy if move (renameTo) fails. We never did it because we never observed renameTo failing. 2010/1/21 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com is it a one off case? do you observerve this frequently? On Thu, Jan 21, 2010 at 11:26 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: It's hard to tell without poking around, but one of the first things I'd do would be to look for /home/solr/cores/core8/index.20100119103919/_6qv.fnm - does this file/dir really exist? Or, rather, did it exist when the error happened. I'm not looking at the source code now, but is that really the only error you got? No exception stack trace? 
Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Trey solrt...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, January 20, 2010 11:54:43 PM Subject: Replication Handler Severe Error: Unable to move index file Does anyone know what would cause the following error?: 10:45:10 AM org.apache.solr.handler.SnapPuller copyAFile SEVERE: *Unable to move index file* from: /home/solr/cores/core8/index.20100119103919/_6qv.fnm to: /home/solr/cores/core8/index/_6qv.fnm This occurred a few days back and we noticed that several full copies of the index were subsequently pulled from the master to the slave, effectively evicting our live index from RAM (the linux os cache), and killing our query performance due to disk io contention. Has anyone experienced this behavior recently? I found an old thread about this error from early 2009, but it looks like it was patched almost a year ago: http://old.nabble.com/%22Unable-to-move-index-file%22-error-during-replication-td21157722.html Additional Relevant information: -We are using the Solr 1.4 official release + a field collapsing patch from mid December (which I believe should only affect query side, not indexing / replication). -Our Replication PollInterval for slaves checking the master is very small (15 seconds) -We have a multi-box distributed search with each box possessing multiple cores -We issue a manual (rolling) optimize across the cores on the master once a day (occurred ~ 1-2 hours before the above timeline) -maxWarmingSearchers is set to 1. -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Replication Handler Severe Error: Unable to move index file
is it a one off case? do you observerve this frequently? On Thu, Jan 21, 2010 at 11:26 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: It's hard to tell without poking around, but one of the first things I'd do would be to look for /home/solr/cores/core8/index.20100119103919/_6qv.fnm - does this file/dir really exist? Or, rather, did it exist when the error happened. I'm not looking at the source code now, but is that really the only error you got? No exception stack trace? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Trey solrt...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, January 20, 2010 11:54:43 PM Subject: Replication Handler Severe Error: Unable to move index file Does anyone know what would cause the following error?: 10:45:10 AM org.apache.solr.handler.SnapPuller copyAFile SEVERE: *Unable to move index file* from: /home/solr/cores/core8/index.20100119103919/_6qv.fnm to: /home/solr/cores/core8/index/_6qv.fnm This occurred a few days back and we noticed that several full copies of the index were subsequently pulled from the master to the slave, effectively evicting our live index from RAM (the linux os cache), and killing our query performance due to disk io contention. Has anyone experienced this behavior recently? I found an old thread about this error from early 2009, but it looks like it was patched almost a year ago: http://old.nabble.com/%22Unable-to-move-index-file%22-error-during-replication-td21157722.html Additional Relevant information: -We are using the Solr 1.4 official release + a field collapsing patch from mid December (which I believe should only affect query side, not indexing / replication). -Our Replication PollInterval for slaves checking the master is very small (15 seconds) -We have a multi-box distributed search with each box possessing multiple cores -We issue a manual (rolling) optimize across the cores on the master once a day (occurred ~ 1-2 hours before the above timeline) -maxWarmingSearchers is set to 1. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Fastest way to use solrj
2010/1/20 Tim Terlegård tim.terleg...@gmail.com: BinaryRequestWriter does not read from a file and post it Is there any other way or is this use case not supported? I tried this: $ curl host/solr/update/javabin -F stream.file=/tmp/data.bin $ curl host/solr/update -F stream.body=' commit /' Solr did read the file, because solr complained when the file wasn't in the format the JavaBinUpdateRequestCodec expected. But no data is added to the index for some reason. how did you create the file /tmp/data.bin ? what is the format? I wrote this in the first email. It's in the javabin format (I think). I did like this (groovy code): fieldId = new NamedList() fieldId.add(name, id) fieldId.add(val, 9-0) fieldId.add(boost, null) fieldText = new NamedList() fieldText.add(name, text) fieldText.add(val, Some text) fieldText.add(boost, null) fieldNull = new NamedList() fieldNull.add(boost, null) doc = [fieldNull, fieldId, fieldText] docs = [doc] root = new NamedList() root.add(docs, docs) fos = new FileOutputStream(data.bin) new JavaBinCodec().marshal(root, fos) /Tim JavaBin is a format. use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest updateRequest, OutputStream os) The output of this can be posted to solr and it should work -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Fastest way to use solrj
2010/1/19 Tim Terlegård tim.terleg...@gmail.com: There are a few ways to use solrj. I just learned that I can use the javabin format to get some performance gain. But when I try the binary format nothing is added to the index. This is how I try to use this: server = new CommonsHttpSolrServer(http://localhost:8983/solr;) server.setRequestWriter(new BinaryRequestWriter()) request = new UpdateRequest() request.setAction(UpdateRequest.ACTION.COMMIT, true, true); request.setParam(stream.file, /tmp/data.bin) request.process(server) Should this work? Could there be something wrong with the file? I haven't found a good reference for how to create a javabin file, but by reading the source code I came up with this (groovy code): BinaryRequestWriter does not read from a file and post it fieldId = new NamedList() fieldId.add(name, id) fieldId.add(val, 9-0) fieldId.add(boost, null) fieldText = new NamedList() fieldText.add(name, text) fieldText.add(val, Some text) fieldText.add(boost, null) fieldNull = new NamedList() fieldNull.add(boost, null) doc = [fieldNull, fieldId, fieldText] docs = [doc] root = new NamedList() root.add(docs, docs) fos = new FileOutputStream(data.bin) new JavaBinCodec().marshal(root, fos) I haven't found any examples of using stream.file like this with a binary file. Is it supported? Is it better/faster to use StreamingUpdateSolrServer and send everything over HTTP instead? Would code for that look something like this? while (moreDocs) { xmlDoc = readDocFromFileUsingSaxParser() doc = new SolrInputDocument() doc.addField(id, 9-0) doc.addField(text, Some text) server.add(doc) } To me it instinctively looks as if stream.file would be faster because it doesn't have to use HTTP and it doesn't have to create a bunch of SolrInputDocument objects. /Tim -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DIH delta import - last modified date
While invoking the delta-import you may pass the value as a request parameter. That value can be used in the query as ${dih.request.xyz}, where xyz is the request parameter name. On Wed, Jan 20, 2010 at 1:15 AM, Yao Ge yao...@gmail.com wrote: I am struggling with the concept of delta import in DIH. According to the documentation, the delta import will automatically record the last index time stamp and make it available to use for the delta query. However, in many cases the last_modified date time stamp in the database lags behind the current time, so the last index time stamp is not good for the delta query. Can I pick a different mechanism to generate last_index_time by using a time stamp computed from the database (such as from a column of the database)? -- View this message in context: http://old.nabble.com/DIH-delta-import---last-modified-date-tp27231449p27231449.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
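A rough sketch of that in data-config.xml (the entity, column, and parameter names are only illustrative):

<entity name="item" pk="id"
        query="select id, title, last_modified from item"
        deltaQuery="select id from item
                    where last_modified &gt; '${dih.request.lastMod}'">
  ...
</entity>

The value is then supplied on the delta-import request:

http://localhost:8983/solr/dataimport?command=delta-import&lastMod=2010-01-18T00:00:00Z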
Re: NullPointerException in ReplicationHandler.postCommit + question about compression
When you copy paste config from wiki, just copy what you need. excluding documentation and comments On Wed, Jan 13, 2010 at 12:51 AM, Stephen Weiss swe...@stylesight.com wrote: Hi Solr List, We're trying to set up java-based replication with Solr 1.4 (dist tarball). We are running this to start with on a pair of test servers just to see how things go. There's one major problem we can't seem to get past. When we replicate manually (via the admin page) things seem to go well. However, when replication is triggered by a commit event on the master, the master gets a NullPointerException and no replication seems to take place. SEVERE: java.lang.NullPointerException at org.apache.solr.handler.ReplicationHandler$4.postCommit(ReplicationHandler.java:922) at org.apache.solr.update.UpdateHandler.callPostCommitCallbacks(UpdateHandler.java:78) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:411) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:169) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:741) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:213) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) This is the master config: requestHandler name=/replication class=solr.ReplicationHandler lst name=master !--Replicate on 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string-- str name=replicateAftercommit/str !--Create a backup after 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string. Note that this is just for backup, replication does not require this. 
-- !-- str name=backupAfteroptimize/str -- !--If configuration files need to be replicated give the names here, separated by comma -- str name=confFilessolrconfig_slave.xml:solrconfig.xml,schema.xml,synonyms.txt,stopwords.txt,elevate.xml/str !--The default value of reservation is 10 secs.See the documentation below. Normally , you should not need to specify this -- str name=commitReserveDuration00:00:10/str /lst /requestHandler and... the slave config: requestHandler name=/replication class=solr.ReplicationHandler lst name=slave !--fully qualified url for the replication handler of master . It is possible to pass on this as a request param for the fetchindex command-- str name=masterUrlhttp://hostname.obscured.com:8080/solr/calendar_core/replication/str !--Interval in which the slave should poll master .Format is HH:mm:ss . If this is absent slave does not poll automatically. But a fetchindex can be triggered from the admin or the http API -- str name=pollInterval00:00:20/str !-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED-- !--to use compression while transferring the index files. The possible values are internal|external
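A trimmed-down master section without the wiki comments would look roughly like this (confFiles is optional; the file names are illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
  </lst>
</requestHandler>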
Re: Data Full Import Error
You need more memory to run dataimport. On Tue, Jan 12, 2010 at 4:46 PM, Lee Smith l...@weblee.co.uk wrote: Hi All I am trying to do a data import but I am getting the following error. INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=405 2010-01-12 03:08:08.576::WARN: Error for /solr/dataimport java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:05 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.lang.OutOfMemoryError: Java heap space Exception in thread btpool0-2 java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:14 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Jan 12, 2010 3:08:21 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback Jan 12, 2010 3:08:23 AM org.apache.solr.update.SolrIndexWriter finalize SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! This is OK. don't bother Any ideas what this can be ?? Hope you can help. Lee -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Data Full Import Error
it is the way you start your solr server( -Xmx option) On Tue, Jan 12, 2010 at 6:00 PM, Lee Smith l...@weblee.co.uk wrote: Thank you for your response. Will I just need to adjust the allowed memory in a config file or is this a server issue. ? Sorry I know nothing about Java. Hope you can advise ! On 12 Jan 2010, at 12:26, Noble Paul നോബിള് नोब्ळ् wrote: You need more memory to run dataimport. On Tue, Jan 12, 2010 at 4:46 PM, Lee Smith l...@weblee.co.uk wrote: Hi All I am trying to do a data import but I am getting the following error. INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=405 2010-01-12 03:08:08.576::WARN: Error for /solr/dataimport java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:05 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.lang.OutOfMemoryError: Java heap space Exception in thread btpool0-2 java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:14 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Jan 12, 2010 3:08:21 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback Jan 12, 2010 3:08:23 AM org.apache.solr.update.SolrIndexWriter finalize SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! This is OK. don't bother Any ideas what this can be ?? Hope you can help. Lee -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
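For the example Jetty setup, that means raising the heap on the java command line, for instance (the value is illustrative; pick one that fits your data set):

java -Xmx1024m -jar start.jar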
Re: DataImportHandler - synchronous execution
It can be added. On Tue, Jan 12, 2010 at 10:18 PM, Alexey Serba ase...@gmail.com wrote: Hi, I found that there's no explicit option to run DataImportHandler in a synchronous mode. I need that option to run DIH from SolrJ (EmbeddedSolrServer) in the same thread. Currently I pass a dummy stream to DIH as a workaround for this, but I think it makes sense to add a specific option for that. Any objections? Alex -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Synonyms from Database
On Sun, Jan 10, 2010 at 1:04 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Ravi, I think if your synonyms were in a DB, it would be trivial to periodically dump them into a text file Solr expects. You wouldn't want to hit the DB to look up synonyms at query time... Why at query time? Can it not be done at startup time? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Ravi Gidwani ravi.gidw...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 9, 2010 10:20:18 PM Subject: Synonyms from Database Hi: Is there any work done on providing synonyms from a database instead of the synonyms.txt file? The idea is to have a dictionary in the DB that can be enhanced on the fly in the application. This can then be used at query time to check for synonyms. I know I am not giving thought to the performance implications of this approach, but I would love to hear others' thoughts. ~Ravi. -- - Noble Paul | Systems Architect| AOL | http://aol.com
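A minimal sketch of such a periodic dump (plain JDBC; the connection details, table, and columns are made up for illustration) that writes the comma-separated lines synonyms.txt expects:

import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DumpSynonyms {
    public static void main(String[] args) throws Exception {
        // Hypothetical table synonym_group(term, synonyms) where synonyms is already comma separated.
        Class.forName("com.mysql.jdbc.Driver");
        Connection con = DriverManager.getConnection("jdbc:mysql://localhost/dict", "user", "pass");
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery("select term, synonyms from synonym_group");
        PrintWriter out = new PrintWriter(new FileWriter("synonyms.txt"));
        while (rs.next()) {
            // One synonym group per line, e.g. "couch, sofa, divan"
            out.println(rs.getString("term") + ", " + rs.getString("synonyms"));
        }
        out.close();
        rs.close();
        st.close();
        con.close();
        // Copy the file into the core's conf dir and reload the core so SynonymFilterFactory re-reads it.
    }
}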
Re: replication -- missing field data file
actually it does not. BTW, FYI, backup is just to take periodics backups not necessary for the Replicationhandler to work On Thu, Jan 7, 2010 at 2:37 AM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: How can you tell when the backup is done? -Original Message- From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul ??? ?? Sent: Wednesday, January 06, 2010 12:23 PM To: solr-user Subject: Re: replication -- missing field data file the index dir is in the name index others will be stored as indexdate-as-number On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: How can you differentiate between the backup and the normal index files? -Original Message- From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul ??? ?? Sent: Wednesday, January 06, 2010 11:52 AM To: solr-user Subject: Re: replication -- missing field data file On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: I set up replication between 2 cores on one master and 2 cores on one slave. Before doing this the master was working without issues, and I stopped all indexing on the master. Now that replication has synced the index files, an .FDT field is suddenly missing on both the master and the slave. Pretty much every operation (core reload, commit, add document) fails with an error like the one posted below. How could this happen? How can one recover from such an error? Is there any way to regenerate the FDT file without re-indexing everything? This brings me to a question about backups. If I run the replication?command=backup command, where is this backup stored? I've tried this a few times and get an OK response from the machine, but I don't see the backup generated anywhere. The backup is done asynchronously. So it always gives an OK response immedietly. The backup is created in the data dir itself Thanks, Gio. 
org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.lt;initgt;(SolrCore.java:579) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486) at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409) ... 18 more Caused by: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.lt;initgt;(Unknown Source) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.lt;initgt;(SimpleFSDirectory.java:78) at
Re: readOnly=true IndexReader
On Wed, Jan 6, 2010 at 4:26 PM, Patrick Sauts patrick.via...@gmail.com wrote: In the Wiki page http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, I've found: Open the IndexReader with readOnly=true. This makes a big difference when multiple threads are sharing the same reader, as it removes certain sources of thread contention. How do I open the IndexReader with readOnly=true? I can't find anything related to this parameter. Do the JVM parameters -Dslave=disabled or -Dmaster=disabled have any effect on Solr with a standard solrconfig.xml? These are not variables used by Solr. They are just substituted in solrconfig.xml and probably consumed by ReplicationHandler (this is not a standard). Thank you for your answers. Patrick. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: replication -- missing field data file
On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: I set up replication between 2 cores on one master and 2 cores on one slave. Before doing this the master was working without issues, and I stopped all indexing on the master. Now that replication has synced the index files, an .FDT field is suddenly missing on both the master and the slave. Pretty much every operation (core reload, commit, add document) fails with an error like the one posted below. How could this happen? How can one recover from such an error? Is there any way to regenerate the FDT file without re-indexing everything? This brings me to a question about backups. If I run the replication?command=backup command, where is this backup stored? I've tried this a few times and get an OK response from the machine, but I don't see the backup generated anywhere. The backup is done asynchronously. So it always gives an OK response immedietly. The backup is created in the data dir itself Thanks, Gio. org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.lt;initgt;(SolrCore.java:579) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486) at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409) ... 
18 more Caused by: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.lt;initgt;(Unknown Source) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.lt;initgt;(SimpleFSDirectory.java:78) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.lt;initgt;(SimpleFSDirectory.java:108) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65) at org.apache.lucene.index.FieldsReader.lt;initgt;(FieldsReader.java:104) at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599) at org.apache.lucene.index.DirectoryReader.lt;initgt;(DirectoryReader.java:103) at org.apache.lucene.index.ReadOnlyDirectoryReader.lt;initgt;(ReadOnlyDirectoryReader.java:27) at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:68) at org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at
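Regarding the backup command discussed above, it is a plain HTTP call, and the result shows up under the core's data directory (typically as a timestamped snapshot directory next to index/). The host and path here are illustrative:

curl 'http://master-host:8983/solr/replication?command=backup'
# then check the data dir, e.g.
ls $SOLR_HOME/data/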
Re: replication -- missing field data file
the index dir is in the name index others will be stored as indexdate-as-number On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: How can you differentiate between the backup and the normal index files? -Original Message- From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul ??? ?? Sent: Wednesday, January 06, 2010 11:52 AM To: solr-user Subject: Re: replication -- missing field data file On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: I set up replication between 2 cores on one master and 2 cores on one slave. Before doing this the master was working without issues, and I stopped all indexing on the master. Now that replication has synced the index files, an .FDT field is suddenly missing on both the master and the slave. Pretty much every operation (core reload, commit, add document) fails with an error like the one posted below. How could this happen? How can one recover from such an error? Is there any way to regenerate the FDT file without re-indexing everything? This brings me to a question about backups. If I run the replication?command=backup command, where is this backup stored? I've tried this a few times and get an OK response from the machine, but I don't see the backup generated anywhere. The backup is done asynchronously. So it always gives an OK response immedietly. The backup is created in the data dir itself Thanks, Gio. org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.lt;initgt;(SolrCore.java:579) at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:425) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486) at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409) ... 18 more Caused by: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.lt;initgt;(Unknown Source) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.lt;initgt;(SimpleFSDirectory.java:78) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.lt;initgt;(SimpleFSDirectory.java:108) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65) at org.apache.lucene.index.FieldsReader.lt;initgt;(FieldsReader.java:104) at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599) at
Re: replicating extension JARs
Jars are not replicated. It is by design. But that is not to say that we can't do it; open an issue. On Wed, Jan 6, 2010 at 6:20 AM, Ryan Kennedy rcken...@gmail.com wrote: Will the built-in Solr replication replicate extension JAR files in the lib directory? The documentation appears to indicate that only the index and any specified configuration files will be replicated; however, if your solrconfig.xml references a class in a JAR file added to the lib directory then you'll need that replicated as well (otherwise the slave will encounter ClassDefNotFound exceptions). I'm wondering if I'm missing something and Solr replication will do that, or if it's a deficiency in Solr's replication. Ryan -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr Replication Questions
On Wed, Jan 6, 2010 at 2:51 AM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: http://wiki.apache.org/solr/SolrReplication I've been looking over this replication wiki and I'm still unclear on a two points about Solr Replication: 1. If there have been small changes to the index on the master, does the slave copy the entire contents of the index files that were affected? only the delta is copied. a. Let's say I add one document to the master. Presumably that causes changes to the position file, amidst a few others. Does the slave download the entire position file? Or just the portion that was changed? Lucene never modifies a file which was written by previous commits. So if you add a new document and commit , it is written to new files. Solr replication will only replicate those new files 2. If you have a multi-core slave, is it possible to share one configuration file (i.e. one instance directory) amidst the multiple cores, and yet each core poll a different master? a. Can you set the masterUrl for each core separately in the server.xml? Thanks for your help, Gio. -- - Noble Paul | Systems Architect| AOL | http://aol.com
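One way to share a single solrconfig.xml across slave cores while pointing each at its own master is property substitution: declare a property per core in solr.xml and reference it in the shared config. A sketch (core names and URLs are illustrative; each core still needs its own data directory):

<!-- solr.xml -->
<cores adminPath="/admin/cores">
  <core name="core1" instanceDir="shared/">
    <property name="masterUrl" value="http://master1:8983/solr/core1/replication"/>
  </core>
  <core name="core2" instanceDir="shared/">
    <property name="masterUrl" value="http://master2:8983/solr/core2/replication"/>
  </core>
</cores>

<!-- shared solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">${masterUrl}</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>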
Re: serialize SolrInputDocument to java.io.File and back again?
What serialization would you wish to use? You can use Java serialization, or SolrJ can serialize it in XML or javabin format (org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec). On Thu, Dec 31, 2009 at 6:55 AM, Phillip Rhodes rhodebumpl...@gmail.com wrote: I want to store a SolrInputDocument to the filesystem until it can be sent to the Solr server via the SolrJ client. I will be using a Quartz job to periodically query a table that contains a listing of SolrInputDocuments stored as java.io.File that need to be processed. Thanks for your time. -- - Noble Paul | Systems Architect| AOL | http://aol.com
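For the XML route, a small sketch using SolrJ's ClientUtils (the file name and field values are illustrative); the file ends up holding the <doc> fragment of the XML update format, which can later be wrapped in <add> and posted to /update:

import java.io.FileWriter;

import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrInputDocument;

public class DocToXmlFile {
    public static void main(String[] args) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        doc.addField("name", "example");

        // ClientUtils.toXML renders the document as a <doc>...</doc> fragment.
        FileWriter w = new FileWriter("/tmp/doc-42.xml");
        w.write(ClientUtils.toXML(doc));
        w.close();
    }
}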
Re: fl parameter and dynamic fields
If you wish to search on fields using a wildcard, you have to use a copyField to copy all the values of Bool_* to another field and search on that field. On Tue, Dec 29, 2009 at 4:14 AM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] timothy.j.har...@nasa.gov wrote: I use dynamic fields heavily in my Solr config. I would like to be able to specify which fields should be returned from a query based on a pattern for the field name. For instance, given: dynamicField name=Bool_* type=boolean indexed=true stored=true / I might be able to construct a query like: http://localhost:8080/solr/select?q=Bool_*:true&rows=10 Is there something like this in Solr? Thanks, Tim Harsch -- - Noble Paul | Systems Architect| AOL | http://aol.com
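In schema.xml, that copyField setup looks roughly like this (the destination field name is illustrative):

<dynamicField name="Bool_*" type="boolean" indexed="true" stored="true"/>

<field name="all_bools" type="boolean" indexed="true" stored="false" multiValued="true"/>

<copyField source="Bool_*" dest="all_bools"/>

Then you can query q=all_bools:true without knowing the individual Bool_* field names.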
Re: Problem with simple use of DIH
did you run it w/o the debug? On Sun, Dec 27, 2009 at 6:31 PM, AHMET ARSLAN iori...@yahoo.com wrote: I'm trying to use DataImportHandler to load my index and having some strange results. I have two tables in my database. DPRODUC contains products and FSKUMAS contains the skus related to each product. This is the data-config I'm using. dataConfig dataSource type=JdbcDataSource driver=com.ibm.as400.access.AS400JDBCDriver url=jdbc:as400:IWAVE;prompt=false;naming=system user=IPGUI password=IPGUI/ document entity name=dproduc query=select dprprd, dprdes from dproduc where dprprd like 'F%' field column=dprprd name=id / field column=dprdes name=name / entity name=fskumas query=select fsksku, fcoclr, fszsiz, fskret from fskumas where dprprd='${dproduc.DPRPRD}' field column=fsksku name=sku / field column=fcoclr name=color / field column=fszsiz name=size / field column=fskret name=price / /entity /entity /document /dataConfig What is the primary key of dproduc table? If it is dprprd can you try adding pk=dprprd to entity name=dproduc? entity name=dproduc pk=dprprd query=select dprprd, dprdes from dproduc where dprprd like 'F%' -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Problem with simple use of DIH
The field names are case sensitive. But if the field tags are missing they are mapped to corresponding solr fields in a case insensistive way.apparently all the fields come out of you ALL CAPS you should put the 'column' values in ALL CAPS too On Sun, Dec 27, 2009 at 9:03 PM, Jay Fisher jay.l.fis...@gmail.com wrote: I did run it without debug and the result was that 0 documents were processed. The problem seems to be with the field tags that I was using to map from the table column names to the schema.xml field names. I switched to using an AS clause in the SQL statement instead and it worked. I think the column names may be case-sensitive, although I haven't proven that to be the case. I did discover that references to column names in the velocity template are case sensitive; ${dproduc.DPRPRD} works and ${dproduc.dprprd} does not. Thanks, Jay 2009/12/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com did you run it w/o the debug? On Sun, Dec 27, 2009 at 6:31 PM, AHMET ARSLAN iori...@yahoo.com wrote: I'm trying to use DataImportHandler to load my index and having some strange results. I have two tables in my database. DPRODUC contains products and FSKUMAS contains the skus related to each product. This is the data-config I'm using. dataConfig dataSource type=JdbcDataSource driver=com.ibm.as400.access.AS400JDBCDriver url=jdbc:as400:IWAVE;prompt=false;naming=system user=IPGUI password=IPGUI/ document entity name=dproduc query=select dprprd, dprdes from dproduc where dprprd like 'F%' field column=dprprd name=id / field column=dprdes name=name / entity name=fskumas query=select fsksku, fcoclr, fszsiz, fskret from fskumas where dprprd='${dproduc.DPRPRD}' field column=fsksku name=sku / field column=fcoclr name=color / field column=fszsiz name=size / field column=fskret name=price / /entity /entity /document /dataConfig What is the primary key of dproduc table? If it is dprprd can you try adding pk=dprprd to entity name=dproduc? entity name=dproduc pk=dprprd query=select dprprd, dprdes from dproduc where dprprd like 'F%' -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: suggestions for DIH batchSize
A bigger batchSize results in increased memory usage. I guess performance should be slightly better with bigger values, but I have not verified that. On Wed, Dec 23, 2009 at 2:51 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, from looking at the code it looks like the default is 500. Is that the recommended setting? Has anyone noticed any significant performance/memory tradeoffs from making this much bigger? thanks Joel -- - Noble Paul | Systems Architect| AOL | http://aol.com
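batchSize is set on the JDBC data source in data-config.xml, for example (connection details and value are illustrative):

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/db"
            user="user" password="pass"
            batchSize="1000"/>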
Re: Documents are indexed but not searchable
just search for *:* and see if the docs are indeed there in the index. --Noble On Mon, Dec 21, 2009 at 9:26 AM, krosan kro...@gmail.com wrote: Hi, I'm trying to test solr for a proof of concept project, but I'm having some problems. I indexed my document, but when I search for a word which is 100% certain in the document, I don't get any hits. These are my files: First: my data-config.xml dataConfig dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://host.com:3306/crossfire3 user=user password=pass batchSize=1/ document entity name=users query=select username, password, email from users field column=username name=username / field column=password name=password / field column=email name=email / /entity /document /dataConfig Now, I have used this in the debugger, and with commit on, and verbose on, I get this reply: http://pastebin.com/m7a460711 This clearly states that those 2 rows have been processed and are now in the index. However, when I try to do a search with the http parameters, I get this response: For the hyperlink http://localhost:8080/solr/select?q=username:krosandebugQuery=on this is the response: http://pastebin.com/m7bb1dcaa I'm clueless on what the problem could be! These are my two config files: schema.xml: http://pastebin.com/m1fd1da58 solrconfig.xml: http://pastebin.com/m44b73d83 (look for krosan in the documents to see what I've added to the standard docs) Any help will be greatly appreciated! Thanks in advance, Andreas Evers -- View this message in context: http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26868925.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
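For example:

http://localhost:8080/solr/select?q=*:*&rows=10

If numFound is 0, the documents never made it into the index (or were not committed); if it is non-zero, the problem is more likely in how the username field is analyzed or queried.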
Re: Is DataImportHandler ThreadSafe???
On Sat, Dec 19, 2009 at 2:16 PM, gurudev suyalprav...@yahoo.com wrote: Hi, Just wanted to know, is the DataImportHandler available in Solr 1.3 thread-safe? I would like to use multiple instances of the data import handler running concurrently, posting my various sets of data from the DB to the index. Can I do this by registering the DIH multiple times with various names in solrconfig.xml and then invoking all of them concurrently to achieve maximum throughput? Would I need to define different data-config.xml's and dataimport.properties for each DIH? Yes, this should work. It is thread-safe. Would it be possible to specify the query in data-config.xml to restrict one DIH from overlapping the data-set fetched by another DIH through some SQL clauses? -- View this message in context: http://old.nabble.com/Is-DataImportHandler-ThreadSafetp26853521p26853521.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
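Registering the handler multiple times, each with its own config file, looks roughly like this in solrconfig.xml (handler and file names are illustrative):

<requestHandler name="/dataimport-products" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">products-data-config.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport-users" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">users-data-config.xml</str>
  </lst>
</requestHandler>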
Re: shards parameter
Yes. Put it under the defaults section in your standard request handler. On Thu, Dec 17, 2009 at 5:22 PM, pcurila p...@eea.sk wrote: Hello, is there any way to configure the shards parameter in solrconfig.xml, so I do not need to provide it in the URL? Thanks Peter -- View this message in context: http://old.nabble.com/shards-parameter-tp26826908p26826908.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
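For example (host names are illustrative):

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="shards">host1:8983/solr,host2:8983/solr</str>
  </lst>
</requestHandler>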
Re: solr core size on disk
Look at the index dir and see the size of the files. It is typically in $SOLR_HOME/data/index. On Thu, Dec 17, 2009 at 2:56 AM, Matthieu Labour matth...@kikin.com wrote: Hi, I am new to Solr. Here is my question: how do I find out the size of a Solr core on disk? Thank you matt -- - Noble Paul | Systems Architect| AOL | http://aol.com
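On *nix, for example:

du -sh $SOLR_HOME/data/index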
Re: question regarding dynamic fields
Use a copyField to copy those fields to another field and search on that field. On Mon, Dec 14, 2009 at 1:00 PM, Phanindra Reva reva.phanin...@gmail.com wrote: Hello, I have observed that text or keywords indexed using the dynamicField concept are searchable only when we also mention the field name while querying. Am I wrong with my observation, or is that the default and cannot be changed? I am just wondering if there is any route to search the text indexed using dynamicFields without having to mention the field name in the query. Thanks. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Request Assistance with DIH
On Sat, Dec 12, 2009 at 6:15 AM, Robbin rob...@drivesajeep.com wrote: I've been trying to use the DIH with Oracle and would love it if someone could give me some pointers. I put the ojdbc14.jar in both the Tomcat lib and solr home/lib. I created a dataimport.xml and enabled it in the solrconfig.xml. I go to the http://solr server/solr/admin/dataimport.jsp. This all seems to be fine, but I get the default page response and it doesn't look like the connection to the Oracle server is even attempted. Did you trigger an import? What is the message on the web page, and what do the logs say? I'm using the Solr 1.4 release on Nov 10. Do I need an Oracle client on the server? I thought having the ojdbc jar should be sufficient. Any help or configuration examples for setting this up would be much appreciated. You need all the jars you would normally use to connect to Oracle. Thanks Robbin -- - Noble Paul | Systems Architect| AOL | http://aol.com
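A typical Oracle dataSource entry in the DIH config file looks like this (connection details are illustrative); the import itself is then triggered with command=full-import on the handler, not just by opening the admin page:

<dataSource type="JdbcDataSource"
            driver="oracle.jdbc.driver.OracleDriver"
            url="jdbc:oracle:thin:@dbhost:1521:ORCL"
            user="scott" password="tiger"/>

http://localhost:8983/solr/dataimport?command=full-import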
Re: apache-solr-common.jar
There is no solr-common jar anymore. You may use the solrj jar, which contains all the classes that were in the common jar. On Mon, Dec 14, 2009 at 9:22 PM, gudumba l gudumba.sm...@gmail.com wrote: Hello All, I have been using apache-solr-common-1.3.0.jar in my module. I am planning to shift to the latest version because it has more flexibility, but it is really strange that I don't find any corresponding jar in the latest version. I have searched the whole Apache Solr 1.4 folder (downloaded from the site) but have not found any. I am sorry, it is really silly to request a jar, but I have no option. Thanks. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Custom Field sample?
how exactly do you wish to query these documents? On Fri, Dec 11, 2009 at 4:35 PM, Antonio Zippo reven...@yahoo.it wrote: I need to add theese features to each document Document1 --- Argument1, positive Argument2, positive Argument3, neutral Argument4, positive Argument5, negative Argument6, negative Document2 --- Argument1, negative Argument2, positive Argument3, negative Argument6, negative Argument7, neutral where the argument name is dynamic using a relational database I could use a master detail structure, but in solr? I thought about a Map or Pair field Da: Grant Ingersoll gsing...@apache.org A: solr-user@lucene.apache.org Inviato: Gio 10 dicembre 2009, 19:47:55 Oggetto: Re: Custom Field sample? Can you perhaps give a little more info on what problem you are trying to solve? FWIW, there are a lot of examples of custom FieldTypes in the Solr code. On Dec 10, 2009, at 11:46 AM, Antonio Zippo wrote: Hi all, could you help me to create a custom field? I need to create a field structured like a Map is it possible? how to define if the search string is on key or value (or both)? A way could be to create a char separated multivalued string field... but it isn't the best way. and with facets is the worst way could you give me a custom field sample? Thanks in advance, Revenge -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: indexing XML with solr example webapp - out of java heap space
The post.jar does not stream. Use curl if you are using *nix. --Noble On Wed, Dec 9, 2009 at 12:28 AM, Feroze Daud fero...@zillow.com wrote: Hi! I downloaded Solr and am trying to index an XML file. This XML file is huge (500M). When I try to index it using the post.jar tool in example\exampledocs, I get an out of Java heap space error in the SimplePostTool application. Any ideas how to fix this? Passing in -Xms1024M does not fix it. Feroze. -- - Noble Paul | Systems Architect| AOL | http://aol.com
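For example, posting the file with curl avoids the client-side Java heap entirely (host and file name are illustrative):

curl 'http://localhost:8983/solr/update' \
     -H 'Content-type:text/xml; charset=utf-8' \
     --data-binary @huge.xml

curl 'http://localhost:8983/solr/update' -H 'Content-type:text/xml; charset=utf-8' --data-binary '<commit/>'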