Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-17 Thread Naveen Gupta
 of a document that was used. You could copy/paste that to try it
 out.


  4. JVM tuning and performance result based on Multithreaded environment.

 5. Machine Details (RAM, CPU, and settings from SOLR perspective).


 Default Solr settings with the shipped Jetty container. The startup script
 used is available when you download Solr 3.3 with RankingAlgorithm. It has
 -Xmx set to 2 GB and uses the default collector with parallel collection
 enabled for the young generation. The system is an x86_64 Linux (2.6
 kernel), 2-core (2.5 GHz) machine that uses internal disks for indexing.
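
 For illustration only, a startup along these lines would match that
 description; the exact flags in the RankingAlgorithm start script are not
 shown in this thread, so the collector flag here is an assumption:

   # assumed flags: 2 GB heap, parallel collection of the young generation
   java -Xmx2048m -XX:+UseParNewGC -jar start.jar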

 My suggestion would be to download a version of Solr 3.3 with
 RankingAlgorithm and give it a try to see if any changes are needed from
 your existing setup.


 Regards,

 - Nagendra Nagarajayya
 http://solr-ra.tgels.org
 http://rankingalgorithm.tgels.org


  Hoping that you are getting my point. We want to benchmark the
 performance.
 If you can involve me in your group, that would be great.

 Thanks
 Naveen



 2011/8/15 Nagendra Nagarajayya nnagaraja...@transaxtions.com

  Bill:

 I did look at Mark's performance tests. Looks very interesting.

 Here is the Apache Solr 3.3 with RankingAlgorithm NRT performance:
 http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x



 Regards

 - Nagendra Nagarajayya
 http://solr-ra.tgels.org
 http://rankingalgorithm.tgels.org




 On 8/14/2011 7:47 PM, Bill Bell wrote:

  I understand.

 Have you looked at Mark's patch? From his performance tests, it looks
 pretty good.

 When would RA work better?

 Bill


 On 8/14/11 8:40 PM, Nagendra Nagarajayya nnagaraja...@transaxtions.com
 wrote:

  Bill:

 The technical details of the NRT implementation in Apache Solr with
 RankingAlgorithm (SOLR-RA) are available here:

 http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf


 (Some changes for Solr 3.x, but for the most part it is as above)

 Regarding support for 4.0 trunk, should happen sometime soon.

 Regards

 - Nagendra Nagarajayya
 http://solr-ra.tgels.org
 http://rankingalgorithm.tgels.org






 On 8/14/2011 7:11 PM, Bill Bell wrote:

  OK,

 I'll ask the elephant in the room…

 What is the difference between the new UpdateHandler from Mark and the
 SOLR-RA?

 The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk?

 Pros/Cons?


 On 8/14/11 8:10 PM, Nagendra Nagarajayya nnagaraja...@transaxtions.com
 wrote:

  Naveen:

 NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for
 a
 document to become searchable. Any document that you add through
 update
 becomes  immediately searchable. So no need to commit from within
 your
 update client code.  Since there is no commit, the cache does not
 have
 to be cleared or the old searchers closed or  new searchers opened,
 and
 warmed (error that you are facing).

 Regards

 - Nagendra Nagarajayya
 http://solr-ra.tgels.org
 http://rankingalgorithm.tgels.org




 On 8/14/2011 10:37 AM, Naveen Gupta wrote:

  Hi Mark/Erick/Nagendra,

 I was not very confident about NRT at that point in time, when we started
 the project almost 1 year ago; I will definitely try NRT and see the
 performance.

 The current requirement was working fine while we were using commitWithin 10
 millisecs in the XML document which we were posting to SOLR.

 But because of that, we were getting very poor performance (almost 3 mins
 for 15,000 docs) per user. There are many parallel users committing to our
 SOLR.

 So we removed the commitWithin, and hence performance was much, much
 better.

 But then we started getting this maxWarmingSearcher error, because we are
 committing separately via a curl request once the entire doc has been
 submitted for indexing.

 The question here is: what is the difference between commitWithin and
 commit (apart from the fact that commit takes memory, processing, and
 additional hardware usage)?

 We want it to be visible as soon as possible because we apply many
 business rules on top of the results (older indexes as well as new ones)
 and apply different filters.

 Up to 5 mins is fine for us, but beyond that we need to think about other
 optimizations.

 We will definitely try NRT

Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-16 Thread Naveen Gupta
Nagendra

You wrote,

Naveen:

*NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a
document to become searchable*. Any document that you add through update
becomes  immediately searchable. So no need to commit from within your
update client code.  Since there is no commit, the cache does not have to be
cleared or the old searchers closed or  new searchers opened, and warmed
(error that you are facing).


Looking at the link you mentioned, it is clearly what we wanted. But the
catch is that you say RA does need a commit for a document to become
searchable (please take a look at the bold sentence).

In future, for higher loads, can it work with master/slave replication,
etc., to scale and perform better? If yes, we would like to go for NRT, and
the performance described in the article is acceptable. We were expecting
the same real-time performance for a single user.

What about multiple users? Should we wait 1-2 secs before calling the curl
request to make SOLR perform better, or will it internally handle multiple
requests (multithreading, etc.)?

What batch size (10,000 docs?) would allow the JVM to perform better? Have
you done any kind of benchmarking in terms of multithreaded, multi-user NRT,
and also JVM tuning for SOLR server performance? Any kind of performance
analysis would help us decide quickly whether to switch over to NRT.

Questions regarding switching over to NRT:


1. Should we upgrade to SOLR 4.x?

2. Any benchmarking (10,000 docs/sec)? The question here is more
specifically about the details of an individual doc (fields, number of
fields, field sizes, parameters affecting performance with or without
faceting).

3. What about multiple users?

A user in real time might have a large doc count of 0.1 million. How to
break that up and analyze which approach is better is our task, but any kind
of breakdown would still help us. Imagine a user's inbox.

4. JVM tuning and performance results in a multithreaded environment.

5. Machine details (RAM, CPU, and settings from the SOLR perspective).

Hoping that you are getting my point. We want to benchmark the performance.
If you can involve me in your group, that would be great.

Thanks
Naveen



2011/8/15 Nagendra Nagarajayya nnagaraja...@transaxtions.com

 Bill:

 I did look at Mark's performance tests. Looks very interesting.

 Here is the Apache Solr 3.3 with RankingAlgorithm NRT performance:
 http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x


 Regards

 - Nagendra Nagarajayya
 http://solr-ra.tgels.org
 http://rankingalgorithm.tgels.org



 On 8/14/2011 7:47 PM, Bill Bell wrote:

 I understand.

 Have you looked at Mark's patch? From his performance tests, it looks
 pretty good.

 When would RA work better?

 Bill


 On 8/14/11 8:40 PM, Nagendra Nagarajayya nnagaraja...@transaxtions.com
 wrote:

  Bill:

 The technical details of the NRT implementation in Apache Solr with
 RankingAlgorithm (SOLR-RA) are available here:

 http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf

 (Some changes for Solr 3.x, but for the most part it is as above)

 Regarding support for 4.0 trunk, should happen sometime soon.

 Regards

 - Nagendra Nagarajayya
 http://solr-ra.tgels.org
 http://rankingalgorithm.tgels.org





 On 8/14/2011 7:11 PM, Bill Bell wrote:

 OK,

 I'll ask the elephant in the room…

 What is the difference between the new UpdateHandler from Mark and the
 SOLR-RA?

 The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk?

 Pros/Cons?


 On 8/14/11 8:10 PM, Nagendra Nagarajayya nnagaraja...@transaxtions.com
 wrote:

  Naveen:

 NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a
 document to become searchable. Any document that you add through update
 becomes  immediately searchable. So no need to commit from within your
 update client code.  Since there is no commit, the cache does not have
 to be cleared or the old searchers closed or  new searchers opened, and
 warmed (error that you are facing).

 Regards

 - Nagendra Nagarajayya
 http://solr-ra.tgels.org
 http://rankingalgorithm.tgels.org



 On 8/14/2011 10:37 AM, Naveen Gupta wrote:

 Hi Mark/Erick/Nagendra,

 I was not very confident about NRT at that point in time, when we started
 the project almost 1 year ago; I will definitely try NRT and see the
 performance.

 The current requirement was working fine while we were using commitWithin
 10 millisecs in the XML document which we were posting to SOLR.

 But because of that, we were getting very poor performance (almost 3 mins
 for 15,000 docs) per user. There are many parallel users committing to our
 SOLR.

 So we removed the commitWithin, and hence

Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-14 Thread Naveen Gupta
Hi Mark/Erick/Nagendra,

I was not very confident about NRT at that point in time, when we started
the project almost 1 year ago; I will definitely try NRT and see the
performance.

The current requirement was working fine while we were using commitWithin 10
millisecs in the XML document which we were posting to SOLR.

But because of that, we were getting very poor performance (almost 3 mins for
15,000 docs) per user. There are many parallel users committing to our SOLR.

So we removed the commitWithin, and hence performance was much, much better.

But then we started getting this maxWarmingSearcher error, because we are
committing separately via a curl request once the entire doc has been
submitted for indexing.

The question here is: what is the difference between commitWithin and commit
(apart from the fact that commit takes memory, processing, and additional
hardware usage)?

We want it to be visible as soon as possible because we apply many business
rules on top of the results (older indexes as well as new ones) and apply
different filters.

Up to 5 mins is fine for us, but beyond that we need to think about other
optimizations.

We will definitely try NRT, but please tell me what other options we can
apply in order to optimize.

Thanks
Naveen
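
For reference, a minimal sketch of the two approaches being compared, in the
same curl style used elsewhere in these threads; the URL, id, and the
10-second window are placeholders, not recommendations:

  # commitWithin: ask Solr to make the added docs searchable within N milliseconds
  curl "http://localhost:8983/solr/update" -H "Content-Type: text/xml" \
    --data-binary '<add commitWithin="10000"><doc><field name="id">doc1</field></doc></add>'

  # explicit commit issued separately, once all docs for a user have been posted
  curl "http://localhost:8983/solr/update" -H "Content-Type: text/xml" \
    --data-binary '<commit/>'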


On Sun, Aug 14, 2011 at 9:42 PM, Erick Erickson erickerick...@gmail.com wrote:

 Ah, thanks, Mark... I must have been looking at the wrong JIRAs.

 Erick

 On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller markrmil...@gmail.com
 wrote:
 
  On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:
 
  You either have to go to near real time (NRT), which is under
  development, but not committed to trunk yet
 
  NRT support is committed to trunk.
 
  - Mark Miller
  lucidimagination.com
 
 
 
 
 
 
 
 
 



exceeded limit of maxWarmingSearchers ERROR

2011-08-13 Thread Naveen Gupta
Hi,

Most of the settings are default.

We have a single node (memory 1 GB, index size 4 GB).

We have a requirement where we are doing very fast commits. It is a kind of
real-time requirement where we are polling many threads from a third party
and indexing them into our system.

We want these results to be available soon.

We are committing for each user (a user may have 10k threads, and inside
that, 1 thread may have 10 messages). So overall, documents per user will be
around 0.1 million (100,000).

Earlier we were using commitWithin as 10 milliseconds inside the document;
that was slowing the indexing, but we were not getting any error.

Once we removed the commitWithin, indexing became very fast. But after that
we started seeing the error below in the system.

As I read in many forums, everybody said this happens because of a very fast
commit rate, but what is the solution to our problem?

We are using CURL to post the data and commit.

Also, till now we are using the default solrconfig.

Aug 14, 2011 12:12:04 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
exceeded limit of maxWarmingSearchers=2, try again later.
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1052)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:424)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
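
One commonly suggested direction for this error is to let Solr schedule the
commits itself instead of issuing a commit per client; a sketch of the
relevant solrconfig.xml pieces for Solr 3.x follows, with purely illustrative
values:

  <!-- solrconfig.xml: let Solr commit on its own schedule -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>   <!-- commit after this many added docs -->
      <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
    </autoCommit>
  </updateHandler>

  <!-- inside the <query> section: the limit the error message refers to;
       raising it usually only hides a commit rate that is too high -->
  <maxWarmingSearchers>2</maxWarmingSearchers>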


Re: LockObtainFailedException

2011-08-12 Thread Naveen Gupta
Hi Peter,

I found the issue.

Actually, we were getting this exception because of JVM heap space. I
allocated 512 MB -Xms and 1024 MB -Xmx, and finally increased the time limit
for the write lock to 20 secs. Things are working fine, but that alone did
not help.

On closer analysis of the doc we were indexing, we found we were using
commitWithin as 10 secs, which was the root cause of the indexing taking so
long, because of the many segments being committed.

Issuing a separate commit command using curl solved the issue.

The performance improved from 3 mins to 1.5 secs :)

Thanks a lot
Naveen
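
For reference, a sketch of where the write-lock timeout described above lives
in a Solr 3.x solrconfig.xml; 20000 ms mirrors the 20 secs mentioned and is
illustrative only:

  <!-- solrconfig.xml, inside <indexDefaults> -->
  <writeLockTimeout>20000</writeLockTimeout>  <!-- ms to wait for the index write.lock -->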

On Thu, Aug 11, 2011 at 6:27 PM, Peter Sturge peter.stu...@gmail.com wrote:

 Optimizing indexing time is a very different question.
 I'm guessing the 3 mins+ time you refer to is the commit time.

 There are a whole host of things to take into account regarding
 indexing, like: number of segments, schema, how many fields, storing
 fields, omitting norms, caching, autowarming, search activity etc. -
 the list goes on...
 The trouble is, you can look at 100 different Solr installations with
 slow indexing, and find 200 different reasons why each is slow.

 The best place to start is to get a full understanding of precisely
 how your data is being stored in the index, starting with adding docs,
 going through your schema, Lucene segments, solrconfig.xml etc,
 looking at caches, commit triggers etc. - really getting to know how
 each step is affecting performance.
 Once you really have a handle on all the indexing steps, you'll be
 able to spot the bottlenecks that relate to your particular
 environment.

 An index of 4.5GB isn't that big (but the number of documents tends to
 have more of an effect than the physical size), so the bottleneck(s)
 should be findable once you trace through the indexing operations.



 On Thu, Aug 11, 2011 at 1:02 PM, Naveen Gupta nkgiit...@gmail.com wrote:
  Yes this was happening because of JVM heap size
 
  But the real issue is that if our index size is growing (very high)
 
  then indexing time is taking very long (using streaming)
 
  earlier for indexing 15,000 docs at a time (commit after 15000 docs) , it
  was taking 3 mins 20 secs time,
 
  after deleting the index data, it is taking 9 secs
 
   What would be the approach to get better indexing performance while also
   managing the index size at the same time?
 
  The index size was around 4.5 GB
 
  Thanks
  Naveen
 
  On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge peter.stu...@gmail.com
 wrote:
 
  Hi,
 
   When you get this exception with no other error or explanation in
  the logs, this is almost always because the JVM has run out of memory.
  Have you checked/profiled your mem usage/GC during the stream operation?
 
 
 
  On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta nkgiit...@gmail.com
 wrote:
   Hi,
  
   We are doing streaming update to solr for multiple user,
  
   We are getting
  
  
   Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
  
   SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
  timed
   out: NativeFSLock@/var/lib/solr/data/index/write.lock
  at org.apache.lucene.store.Lock.obtain(Lock.java:84)
  at
  org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
  at
   org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
  at
  
 
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
  at
  
 
 org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
  at
  
 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
  at
  
 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
  at
   org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
  at
  
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
  at
  
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at
  
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at
  
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at
  
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at
  
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at
  
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127

Re: LockObtainFailedException

2011-08-11 Thread Naveen Gupta
Yes, this was happening because of the JVM heap size.

But the real issue is that as our index size grows (very large), indexing
time becomes very long (using streaming).

Earlier, indexing 15,000 docs at a time (commit after 15,000 docs) was taking
3 mins 20 secs; after deleting the index data, it takes 9 secs.

What would be the approach to get better indexing performance while also
managing the index size at the same time?

The index size was around 4.5 GB

Thanks
Naveen

On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge peter.stu...@gmail.com wrote:

 Hi,

  When you get this exception with no other error or explanation in
 the logs, this is almost always because the JVM has run out of memory.
 Have you checked/profiled your mem usage/GC during the stream operation?



 On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta nkgiit...@gmail.com wrote:
  Hi,
 
  We are doing streaming updates to Solr for multiple users,
 
  We are getting
 
 
  Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
 
  SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
 timed
  out: NativeFSLock@/var/lib/solr/data/index/write.lock
 at org.apache.lucene.store.Lock.obtain(Lock.java:84)
 at
 org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
 at
  org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
 at
 
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
 at
 
 org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
 at
 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
 at
 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
 at
  org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
 at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
 at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at
 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
 at
 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
 at org.apache.tomcat.util.net.JIoEndpoint
 
  Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
  SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
 timed
  out: NativeFSLock@/var/lib/solr/data/index/write.lock
 at org.apache.lucene.store.Lock.obtain(Lock.java:84)
 at
 org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
 at
  org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
 at
 
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
 at
 
 org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
 at
 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
 at
 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
 at
  org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
 at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
 at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252

LockObtainFailedException

2011-08-10 Thread Naveen Gupta
Hi,

We are doing streaming updates to Solr for multiple users,

We are getting


Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out: NativeFSLock@/var/lib/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
at
org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint

Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out: NativeFSLock@/var/lib/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
at
org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)


Re: indexing taking very long time

2011-08-05 Thread Naveen Gupta
Hi Erick,

We have a requirement where we have almost 100,000 documents to be indexed
(at least 20 fields per document). None of these fields is longer than 10 KB.

Also, we are running parallel searches against the same index.

We found that it is taking almost 3 mins to index the entire set of documents.

The strategy we are following is:

We make a commit after 15,000 docs (a single large XML doc), using update
streaming via curl in PHP.

We have a merge factor of 10 as of now.

I am wondering if increasing the merge factor to 25 or 50 would increase the
performance.

Also, what about the RAM buffer size (the default is 32 MB)?

Which other factors do we need to consider?

When should we consider running optimize?

Would any other deviation from the defaults help us achieve the target?

We are allocating a JVM max heap size of 512 MB, and concurrent mark sweep is
set for garbage collection.

One more thing: we see CPU utilization of 20-25% across all 4 cores (using
htop).
Thanks
Naveen

On Thu, Aug 4, 2011 at 7:05 AM, Erick Erickson erickerick...@gmail.com wrote:

 What version of Solr are you using? If it's a recent version, then
 optimizing is not that  essential, you can do it during off hours, perhaps
 nightly or weekly.

 As far as indexing speed, have you profiled your application to see whether
 it's Solr or your indexing process that's the bottleneck? A quick check
 would be to monitor the CPU utilization on the server and see if it's high.

 As far as multithreading, one option is to simply have multiple clients
 indexing simultaneously. But you haven't indicated how the indexing is
 being
 done. Are you using DIH? SolrJ? Streaming documents to Solr? You have to
 provide those kinds of details to get meaningful help.

 Best
 Erick
 On Aug 2, 2011 8:06 AM, Naveen Gupta nkgiit...@gmail.com wrote:
  Hi
 
  We have a requirement where we are indexing all the messages of a a
 thread,
  a thread may have attachment too . We are adding to the solr for indexing
  and searching for applying few business rule.
 
  For a user, we have almost many threads (100k) in number and each thread
 may
  be having 10-20 messages.
 
  Now what we are finding is that it is taking 30 mins to index the entire
  threads.
 
  When we run optimize then it is taking faster time.
 
  The question here is that how frequently this optimize should be called
 and
  when ?
 
  Please note that we are following commit strategy (that is every after
 10k
  threads, commit is called). we are not calling commit after every doc.
 
  Secondly how can we use multi threading from solr perspective in order to
  improve jvm and other utilization ?
 
 
  Thanks
  Naveen



Re: indexing taking very long time

2011-08-05 Thread Naveen Gupta
Hi Erick,

SOLR version: 3.0.

We are indexing the data using a CURL call from a C interface to the SOLR
server over REST.

We are merging 15,000 docs into a single XML doc, using CURL directly to
index the data, and then calling commit (update).

For each client we create a new connection (a PHP script uses the exec()
command to start a new C process for every user) and hit the SOLR server.

We are using the default solrconfig, except for a few field changes in
schema.xml.

Max JVM heap allocation is 512 MB (the Linux box has 512 MB RAM as well).

Initially I increased the merge factor to 50 and the RAM buffer size to 50
MB, but I needed to reduce them since we were getting
java.lang.OutOfMemoryError: Java heap space

It is taking 3 mins to index 15,000 docs (a client can have 100,000 docs and
we have many clients). We also run parallel search queries from other clients
against this index.

That is the time between when curl was called and when the response came
back.

When we commit, CPU usage goes up to 25% (not on all the cores, but on a few
of them). The total number of cores is 4.

Can you please advise where to start from a tuning perspective?

A blog I was going through clearly said that it should take about 40 secs to
index 100,000 docs (if you have 10-12 fields defined); I forgot the link.

They talked about increasing the merge factor.

Thanks
Naveen

On Thu, Aug 4, 2011 at 7:05 AM, Erick Erickson erickerick...@gmail.com wrote:

 What version of Solr are you using? If it's a recent version, then
 optimizing is not that  essential, you can do it during off hours, perhaps
 nightly or weekly.

 As far as indexing speed, have you profiled your application to see whether
 it's Solr or your indexing process that's the bottleneck? A quick check
 would be to monitor the CPU utilization on the server and see if it's high.

 As far as multithreading, one option is to simply have multiple clients
 indexing simultaneously. But you haven't indicated how the indexing is
 being
 done. Are you using DIH? SolrJ? Streaming documents to Solr? You have to
 provide those kinds of details to get meaningful help.

 Best
 Erick
 On Aug 2, 2011 8:06 AM, Naveen Gupta nkgiit...@gmail.com wrote:
  Hi
 
  We have a requirement where we are indexing all the messages of a a
 thread,
  a thread may have attachment too . We are adding to the solr for indexing
  and searching for applying few business rule.
 
  For a user, we have almost many threads (100k) in number and each thread
 may
  be having 10-20 messages.
 
  Now what we are finding is that it is taking 30 mins to index the entire
  threads.
 
  When we run optimize then it is taking faster time.
 
  The question here is that how frequently this optimize should be called
 and
  when ?
 
  Please note that we are following commit strategy (that is every after
 10k
  threads, commit is called). we are not calling commit after every doc.
 
  Secondly how can we use multi threading from solr perspective in order to
  improve jvm and other utilization ?
 
 
  Thanks
  Naveen



merge factor performance

2011-08-04 Thread Naveen Gupta
Hi,

We have a requirement where we have almost 100,000 documents to be indexed
(at least 20 fields per document). None of these fields is longer than 10 KB.

Also, we are running parallel searches against the same index.

We found that it is taking almost 3 mins to index the entire set of documents.

The strategy we are following is:

We make a commit after 15,000 docs (a single large XML doc).

We have a merge factor of 10 as of now.

I am wondering if increasing the merge factor to 25 or 50 would increase the
performance.

Also, what about the RAM buffer size (the default is 32 MB)?

Which other factors do we need to consider?

When should we consider running optimize?

Would any other deviation from the defaults help us achieve the target?

We are allocating a JVM max heap size of 512 MB, and concurrent mark sweep is
set for garbage collection.


Thanks
Naveen
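
A sketch of the two knobs discussed above as they appear in a Solr 3.x
solrconfig.xml; the numbers are the ones being considered in this thread, not
recommendations:

  <!-- solrconfig.xml, inside <indexDefaults> -->
  <mergeFactor>25</mergeFactor>          <!-- default 10; higher means fewer merges during indexing -->
  <ramBufferSizeMB>64</ramBufferSizeMB>  <!-- default 32; RAM buffered before flushing a segment -->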


Re: merge factor performance

2011-08-04 Thread Naveen Gupta
Sorry, I meant that for 15k docs it is taking 3 mins.

On Thu, Aug 4, 2011 at 10:07 PM, Naveen Gupta nkgiit...@gmail.com wrote:

 Hi,

 We are having a requirement where we are having almost 100,000 documents to
 be indexed (atleast 20 fields). These fields are not having length greater
 than 10 KB.

 Also we are running parallel search for the same index.

 We found that it is taking almost 3 min to index the entire documents.

 Strategy what we are doing is that

 We are making a commit after  15000 docs (single large xml doc)

 We are having merge factor of 10 as if now

 I am wondering if increasing the merge factor to 25 or 50 would increase
 the performance.

 also what about RAM Size (default is 32 MB) ?

 Which other factors we need to consider ?

 When should we consider optimize ?

 Any other deviation from default would help us in achieving the target.

 We are allocating JVM max heap size allocation 512 MB, default concurrent
 mark sweep is set for garbage collection.


 Thanks
 Naveen






indexing taking very long time

2011-08-02 Thread Naveen Gupta
Hi

We have a requirement where we are indexing all the messages of a thread; a
thread may have attachments too. We are adding them to Solr for indexing and
searching in order to apply a few business rules.

For a user, we may have a great many threads (100k or so), and each thread
may have 10-20 messages.

What we are finding is that it is taking 30 mins to index all the threads.

When we run optimize, indexing gets faster.

The question here is: how frequently should this optimize be called, and
when?

Please note that we are following a commit strategy (commit is called after
every 10k threads); we are not calling commit after every doc.

Secondly, how can we use multithreading from the Solr perspective in order to
improve JVM and other resource utilization?


Thanks
Naveen


Re: IMP: indexing taking very long time

2011-08-02 Thread Naveen Gupta
Can somebody answer this?

What would be the best strategy for optimize (when we are indexing millions
of messages for a new registered user)?

Thanks
Naveen
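
For reference, an explicit optimize can be issued the same way as the commits
described above; a minimal curl sketch with a placeholder host and port:

  curl "http://localhost:8983/solr/update" -H "Content-Type: text/xml" --data-binary '<optimize/>'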

On Tue, Aug 2, 2011 at 5:36 PM, Naveen Gupta nkgiit...@gmail.com wrote:

 Hi

 We have a requirement where we are indexing all the messages of a a thread,
 a thread may have attachment too . We are adding to the solr for indexing
 and searching for applying few business rule.

 For a user, we have almost many threads (100k) in number and each thread
 may be having 10-20 messages.

 Now what we are finding is that it is taking 30 mins to index the entire
 threads.

 When we run optimize then it is taking faster time.

 The question here is that how frequently this optimize should be called and
 when ?

 Please note that we are following commit strategy (that is every after 10k
 threads, commit is called). we are not calling commit after every doc.

 Secondly how can we use multi threading from solr perspective in order to
 improve jvm and other utilization ?


 Thanks
 Naveen



relevant result for query with boost factor on parameters

2011-06-18 Thread Naveen Gupta
Hi,
I am trying to achieve this use case with the following expectations:

three fields

1. field1
2. field2
3. field3

field1 should have the max relevance

field2 should have the next

field3 is the last

The term will be entered by the end user (say *rock roll*).

I want to show the results which contain both *rock and roll* in field1
(first).

Then I want to show the results which contain both *rock and roll* in
field2 (first).

This should only be done for a given field3 (x...@gmail.com).

But suppose field1 does not contain both the terms *rock and roll*
(special attention): then the field2 results should take priority (show the
results which have both terms first, and then show the results according to
boost factor or relevance).

If neither field contains both terms together, show the results as normal,
with field1 having more relevance than field2.

How do I join the results for field3?

That means for a given field3, the above results should be filtered.

I am trying this query, which gives satisfactory results, but not the best:

field1:(rock roll)^20 field2:(rock roll)^4 field3:x...@gmail.com

I was thinking of giving

field1 field2 field3

but it is not working.

Can you help in this regard?

What other config should I consider in this context?


Thanks
Naveen
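
One way to express this kind of per-field weighting in Solr 3.x is the dismax
query parser; a sketch follows, assuming the field names above, with
user@example.com standing in for the real field3 value. qf carries the field
boosts and fq restricts results to the given field3:

  curl "http://localhost:8983/solr/select?defType=dismax&q=rock+roll&qf=field1%5E20+field2%5E4&fq=field3:%22user@example.com%22&fl=*,score"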


Re: tika integration exception and other related queries

2011-06-09 Thread Naveen Gupta
Hi Gary,

We are doing a similar thing, but we are not creating an XML doc; rather, we
are letting TIKA extract the content and relying on dynamic fields. We are
not storing the text either, though I am not sure whether that will remain
the case in future.

What about Microsoft Office 2007 and later attachments? Is that working for
you? We are always getting a NumberFormatException. I posted to the
community as well, but so far no response has come.

Thanks
Naveen

On Thu, Jun 9, 2011 at 6:43 PM, Gary Taylor g...@inovem.com wrote:

 Naveen,

 Not sure our requirement matches yours, but one of the things we index is a
 comment item that can have one or more files attached to it.  To index the
 whole thing as a single Solr document we create a zipfile containing a file
 with the comment details in it and any additional attached files.  This is
 submitted to Solr as a TEXT field in an XML doc, along with other meta-data
 fields from the comment.  In our schema the TEXT field is indexed but not
 stored, so when we search and get a match back it doesn't contain all of the
 contents from the attached files etc., only the stored fields in our schema.
   Admittedly, the user can therefore get back a comment match with no
 indication as to WHERE the match occurred (ie. was it in the meta-data or
 the contents of the attached files), but at the moment we're only interested
 in getting appropriate matches, not explaining where the match is.

 Hope that helps.

 Kind regards,
 Gary.




 On 09/06/2011 03:00, Naveen Gupta wrote:

 Hi Gary

 It started working .. though i did not test for Zip files, but for rar
 files, it is working fine ..

 only thing what i wanted to do is to index the metadata (text mapped to
 content) not store the data  Also in search result, i want to filter
 the
 stuffs ... and it started working fine .. i don't want to show the content
 stuffs to the end user, since the way it extracts the information is not
 very helpful to the user .. although we can apply few of the analyzers and
 filters to remove the unnecessary tags ..still the information would not
 be
 of much help .. looking for your opinion ... what you did in order to
 filter
 out the content or are you showing the content extracted to the end user?

 Even in case, we are showing the text part to the end user, how can i
 limit
 the number of characters while querying the search results ... is there
 any
 feature where we can achieve this ... the concept of snippet kind of thing
 ...

 Thanks
 Naveen

 On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor g...@inovem.com wrote:

  Naveen,

 For indexing Zip files with Tika, take a look at the following thread :



 http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html

 I got it to work with the 3.1 source and a couple of patches.

 Hope this helps.

 Regards,
 Gary.



 On 08/06/2011 04:12, Naveen Gupta wrote:

  Hi Can somebody answer this ...

 3. can somebody tell me an idea how to do indexing for a zip file ?

 1. while sending docx, we are getting following error.





ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen Gupta
Hi

This is my document

in php

$xmldoc = '<add><doc><field name="id">F_146</field><field
name="userid">74</field><field name="groupuseid">gmail.com</field><field
name="attachment_size">121</field><field
name="attachment_name">sample.pptx</field></doc></add>';

  $ch = curl_init("http://localhost:8080/solr/update");
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt ($ch, CURLOPT_POST, 1);
  curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: text/xml"));
  curl_setopt($ch, CURLOPT_POSTFIELDS, $xmldoc);

   $result = curl_exec($ch);
   if (!curl_errno($ch))
   {
       $info = curl_getinfo($ch);
       $header = substr($response, 0, $info['header_size']);
       echo 'Took ' . $info['total_time'] . ' seconds to send a request to ' . $info['url'];
 } else {
     print_r('no idea');
}
println('result of query' . '  ' . ' - ' . $result);

It is throwing error

HTTP Status 400 - Unexpected character ''' (code 39) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]

type: Status report
message: Unexpected character ''' (code 39) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
description: The request sent by the client was syntactically incorrect
(Unexpected character ''' (code 39) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]).

Apache Tomcat/6.0.18


Thanks
Naveen


Re: ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen Gupta
Hi,


curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type:
text/xml" --data-binary '<add><doc><field
name="id">testdoc</field></doc></add>'

Regards
Naveen
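
For completeness, a minimal PHP sketch of the same request as the curl
command above, using the standard php-curl extension; the URL and field value
are placeholders. The body is sent as a raw XML string with a text/xml
Content-Type header:

 <?php
 // Sketch only: post a raw XML <add> document and commit in the same request.
 $xml = '<add><doc><field name="id">testdoc</field></doc></add>';

 $ch = curl_init('http://localhost:8983/solr/update?commit=true');
 curl_setopt($ch, CURLOPT_POST, 1);
 curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
 curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);   // raw string body, not an array
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

 $response = curl_exec($ch);
 if (curl_errno($ch)) {
     echo 'curl error: ' . curl_error($ch);
 } else {
     echo $response;
 }
 curl_close($ch);
 ?>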

On Fri, Jun 10, 2011 at 10:18 AM, Naveen Gupta nkgiit...@gmail.com wrote:

 Hi

 This is my document

 in php

 $xmldoc = '<add><doc><field name="id">F_146</field><field
 name="userid">74</field><field name="groupuseid">gmail.com</field><field
 name="attachment_size">121</field><field
 name="attachment_name">sample.pptx</field></doc></add>';

   $ch = curl_init("http://localhost:8080/solr/update");
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
   curl_setopt ($ch, CURLOPT_POST, 1);
   curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: text/xml"));
   curl_setopt($ch, CURLOPT_POSTFIELDS, $xmldoc);

    $result = curl_exec($ch);
    if (!curl_errno($ch))
    {
        $info = curl_getinfo($ch);
        $header = substr($response, 0, $info['header_size']);
        echo 'Took ' . $info['total_time'] . ' seconds to send a request to ' . $info['url'];
  } else {
      print_r('no idea');
 }
 println('result of query' . '  ' . ' - ' . $result);

 It is throwing error

  HTTP Status 400 - Unexpected character ''' (code 39) in prolog; expected '<'
  at [row,col {unknown-source}]: [1,1]

 type: Status report
 message: Unexpected character ''' (code 39) in prolog; expected '<'
  at [row,col {unknown-source}]: [1,1]
 description: The request sent by the client was syntactically incorrect
 (Unexpected character ''' (code 39) in prolog; expected '<'
  at [row,col {unknown-source}]: [1,1]).

 Apache Tomcat/6.0.18


 Thanks
 Naveen





Re: tika integration exception and other related queries

2011-06-08 Thread Naveen Gupta
Hi Gary

It started working. Though I did not test Zip files, it is working fine for
rar files.

The only thing I wanted to do is index the metadata (text mapped to content),
not store the data. Also, in the search results I want to filter the content
out, and that started working fine. I don't want to show the extracted
content to the end user, since the way it extracts the information is not
very helpful to the user; although we can apply a few analyzers and filters
to remove the unnecessary tags, the information would still not be of much
help. Looking for your opinion: what did you do in order to filter out the
content, or are you showing the extracted content to the end user?

Even in the case where we do show the text to the end user, how can I limit
the number of characters when querying the search results? Is there any
feature where we can achieve this, something like the concept of a snippet?

Thanks
Naveen
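
On the snippet question, Solr's standard highlighting parameters are one
option; a sketch follows, assuming the extracted text is indexed and stored
in a field named content (highlighting needs the field to be stored), with
hl.fragsize controlling snippet length in characters:

  curl "http://localhost:8983/solr/select?q=content:searchterm&fl=id&hl=true&hl.fl=content&hl.snippets=1&hl.fragsize=100"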

On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor g...@inovem.com wrote:

 Naveen,

 For indexing Zip files with Tika, take a look at the following thread :


 http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html

 I got it to work with the 3.1 source and a couple of patches.

 Hope this helps.

 Regards,
 Gary.



 On 08/06/2011 04:12, Naveen Gupta wrote:

 Hi Can somebody answer this ...

 3. can somebody tell me an idea how to do indexing for a zip file ?

 1. while sending docx, we are getting following error.





getting numberformat exception while using tika

2011-06-07 Thread Naveen Gupta
Hi

We are using the ExtractingRequestHandler and we are getting the following
error. We are submitting a Microsoft docx file for indexing.

I think this is something to do with the date field definition, but I am not
very sure. What field type should we use?

2. We are trying to index a jpg (when we search on the name of the jpg, it
does not come up, though I am passing one in the id).

3. What about zip files or rar files? Does Tika with Solr handle these?

java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:412)
at java.lang.Long.parseLong(Long.java:461)
at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
at
org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

Thanks
Naveen
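
A sketch of an extract request that keeps Tika's metadata (including date
strings like the one in the error above) out of numeric fields: uprefix
routes any metadata field not in the schema to an ignored_* dynamic field,
and fmap.content maps the extracted body to a text field. The id, field
names, and file name are placeholders, and the ignored_* dynamic field is
assumed to exist in the schema (it does in the stock example schema):

  curl "http://localhost:8080/solr/update/extract?literal.id=doc1&uprefix=ignored_&fmap.content=text&commit=true" \
    -F "myfile=@sample.docx"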


tika integration exception and other related queries

2011-06-07 Thread Naveen Gupta
Hi, can somebody answer this?

3. Can somebody give me an idea of how to index a zip file?

1. While sending a docx, we are getting the following error.

 java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
 at
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
 at java.lang.Long.parseLong(Long.java:412)
 at java.lang.Long.parseLong(Long.java:461)
 at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
 at
 org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
 at
 org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
 at
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
 at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:619)



Thanks
Naveen



On Tue, Jun 7, 2011 at 3:33 PM, Naveen Gupta nkgiit...@gmail.com wrote:

 Hi

 We are using requestextractinghandler and we are getting following error.
 we are giving microsoft docx file for indexing.

 I think that this is something to do with field date definition .. but now
 very sure ...what field type should we use?

 2. we are trying to index jpg (when we search over the name of the jpg, it
 is not coming .. though in id i am passing one)

 3. what about zip files or rar files.. does tika with solr handle this one
 ?






  java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
 at
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
 at java.lang.Long.parseLong(Long.java:412)
 at java.lang.Long.parseLong(Long.java:461)
 at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
 at
 org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
 at
 org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
 at
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
 at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360
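 For reference, the trace above shows the extracted date string being fed into a
 Trie (numeric) field, so the usual options are to define the receiving field as
 a date type in schema.xml or to remap the offending metadata on the extract
 request. A rough curl/PHP sketch of the second option (the metadata name
 Creation-Date and the ignored_* target are assumptions here; the trace does not
 show which field actually received the date):

 <?php
 // Sketch only: push the date-valued Tika metadata into an ignored_* field
 // instead of the numeric field that fails to parse it.
 $url = 'http://localhost:8010/solr/update/extract'
      . '?literal.id=doc2'
      . '&fmap.Creation-Date=ignored_date'  // assumed metadata name; adjust to yours
      . '&uprefix=ignored_'                 // any other unknown metadata -> ignored_*
      . '&commit=true';
 $ch = curl_init($url);
 curl_setopt($ch, CURLOPT_POST, 1);
 curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => '@report.docx'));
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
 curl_exec($ch);
 curl_close($ch);
 ?>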

Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Naveen Gupta
Hi Tomas,

1. Regarding SolrInputDocument:

We are not using the Java client; we are using PHP with Solr, so I am not sure
how to wrap content in a SolrInputDocument from a PHP client. In that case we
would also need the Tika-related jars to get at metadata such as the content,
and we certainly don't want to handle all of that in the PHP client.

2. Secondly, what I was asking about the commit strategy:

Suppose you have 100 docs. We iterate over the first 99 docs and fire curl
without commit in the URL, and only for the 100th doc do we use commit. Doing
so, will it also update the indexes for the first 99 docs? Roughly:

while (i <= 99) {
    // fire curl against the update URL without commit
}
// for i == 100, the URL includes commit=true

I wanted to achieve something similar to an optimize.

Why aren't these kinds of general-purpose use cases included in the examples
(especially for languages other than Java; Java folks can easily do this with
the API)?

I am basically a Java guy, so I can feel the problem.

Thanks
Naveen
2011/6/6 Tomás Fernández Löbbe tomasflo...@gmail.com

 1. About the commit strategy: all that the ExtractingRequestHandler (the
 request handler that uses Tika to extract content from the input file) will do
 is extract the content of your file and add it to a SolrInputDocument. The
 commit strategy should not change because of this, compared to other documents
 you might be indexing. It is usually not recommended to commit on every new or
 updated document (a sketch follows after point 3).

 2. I am not sure I understand the question. You can add all the static fields
 you want to the document by adding the "literal." prefix to the field names
 when using the ExtractingRequestHandler (as you are doing with "literal.id").
 You can also leave fields empty if they are not marked as required in the
 schema.xml file. See:
 http://wiki.apache.org/solr/ExtractingRequestHandler#Literals

 3. Solr cores can work almost as completely different Solr instances. You
 could tell one core to replicate from another core. I don't think this
 would
 be of any help here. If you want to separate the indexing operations from
 the query operations, you could probably use different machines, that's
 usually a better option. Configure the indexing box as master and the query
 box as slave. Here you have some more information about it:
 http://wiki.apache.org/solr/SolrReplication
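 As an illustration of points 1 and 2, here is a minimal sketch in the same
 curl/PHP style as the original message (the endpoint, the file list and the
 literal.* field names are placeholders, not taken from this thread): each file
 is posted without a commit, and a single commit is issued once at the end.

 <?php
 // Rough sketch only: post every attachment without committing, then commit once.
 $base  = 'http://localhost:8010/solr/update/extract';
 $files = array('paper1.pdf', 'paper2.pdf', 'paper3.docx');

 foreach ($files as $file) {
     $url = $base
          . '?literal.id=' . urlencode($file)          // unique id per document
          . '&literal.author=' . urlencode('naveen');  // any other static fields
     $ch = curl_init($url);
     curl_setopt($ch, CURLOPT_POST, 1);
     curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => '@' . $file));
     curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
     curl_exec($ch);
     curl_close($ch);
 }

 // One commit at the end makes all the documents above searchable.
 $ch = curl_init('http://localhost:8010/solr/update?commit=true');
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
 curl_exec($ch);
 curl_close($ch);
 ?>

 Whether that final commit is issued by the client or left to autoCommit in
 solrconfig.xml is a separate choice; the part to avoid is committing on every
 document.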

 Were these the answers you were looking for, or did I misunderstand your
 questions?

 Tomás

 On Mon, Jun 6, 2011 at 2:54 AM, Naveen Gupta nkgiit...@gmail.com wrote:

  Hi
 
  Since it is PHP, we are using SolPHP for making the curl-based calls.

  My concern here is that for each user we might have 20-40 attachments that
  need to be indexed each day, and there are many users; daily we are targeting
  around 500-1000 users.

  Right now, this is what we do:
 
  <?php
  $ch = curl_init('http://localhost:8010/solr/update/extract?literal.id=doc2&commit=true');
  curl_setopt($ch, CURLOPT_POST, 1);
  curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => '@paper.pdf'));
  $result = curl_exec($ch);
  ?>
 
  We are also planning to use other fields, which are to be indexed and stored.
 
 
  There are a couple of questions here:

  1. What would be the best strategy for commits? If we take all the documents
  in an array, iterate over them one by one and fire the curl for each, and we
  commit only for the last doc, will that work, or do we need to commit for
  each doc?

  2. We have several fields already defined in the schema, and a few of them
  are marked required for the earlier use case but should not be required for
  this one. How can we have both requirements together in the same schema?

  3. Since commits are frequent, how can we use Solr multicore to separate the
  write and read operations?
 
  Thanks
  Naveen
 



different indexes for multitenant approach

2011-06-03 Thread Naveen Gupta
Hi

I want to implement a different indexing strategy where we keep an index per
tenant and maintain those indexes separately:

first level of category -- company name

second level of category -- company name + fields to be indexed

then further categories -- groups of different company names based on some
heuristic (hashing), if it grows further

I want to do this within the same Solr instance. Is that possible?
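For illustration, a minimal sketch of the first level, assuming a multicore
setup with one core per tenant (the host, port and core names below are
placeholders, not taken from this thread):

<?php
// Rough sketch only: one Solr core per tenant inside a single Solr instance.
$tenant = 'acme_corp';
$core   = 'tenant_' . $tenant;   // e.g. tenant_acme_corp

// Updates go to that tenant's core (same curl style as in the Tika threads).
$updateUrl = 'http://localhost:8010/solr/' . $core
           . '/update/extract?literal.id=doc1&commit=true';
$ch = curl_init($updateUrl);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => '@paper.pdf'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
curl_close($ch);

// Queries are scoped to the same core, so tenants never see each other's data.
$queryUrl = 'http://localhost:8010/solr/' . $core . '/select?q=*:*&wt=json';
?>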

Thanks
Naveen


Re: How to display search results of solr in to other application.

2011-06-03 Thread Naveen Gupta
Hi Romi

In my view, you first need to understand how Ajax with jQuery works, then look
at JSON and then JSONP (if you are fetching from a different domain).

queryString here is the dynamic query you will be sending to Solr (it could be
simple text or a more advanced query string):

http://wiki.apache.org/solr/CommonQueryParameters

The callback is the method name you define; after the response comes back, this
method is called (the callback mechanism).

Using the response from Solr (in JSON format), you then display it or analyze
it as your business needs require.

Thanks
Naveen


On Fri, Jun 3, 2011 at 12:00 PM, Romi romijain3...@gmail.com wrote:

 $.getJSON(
   "http://[server]:[port]/solr/select/?jsoncallback=?",
   {"q": queryString,
   "version": "2.2",
   "start": "0",
   "rows": "10",
   "indent": "on",
   "json.wrf": "callbackFunctionToDoSomethingWithOurData",
   "wt": "json",
   "fl": "field1"}
   );

  Would you please explain what queryString and
  json.wrf: callbackFunctionToDoSomethingWithOurData are? And what if I want to
  change my query string each time?

 -
 Thanks & Regards
 Romi
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-display-search-results-of-solr-in-to-other-application-tp3014101p3018740.html
 Sent from the Solr - User mailing list archive at Nabble.com.



php library for extractrequest handler

2011-06-03 Thread Naveen Gupta
Hi

We want to post some files (rtf, doc, etc.) to the Solr server using PHP; one
way is to post using curl.

Is there any PHP client like the Java client (Solr Cell)?

URLs would also help.

Thanks
Naveen


Re: Strategy -- Frequent updates in our application

2011-06-03 Thread Naveen Gupta
Hi Pravesh

We don't have that setup right now, but we are thinking of moving to it:

for writes we are going to have one instance, and for reads we are going to
have another.

Do you have another design in mind? Kindly share.

Thanks
Naveen

On Fri, Jun 3, 2011 at 2:50 PM, pravesh suyalprav...@yahoo.com wrote:

  You can use the DataImportHandler for your full/incremental indexing. How
  close to NRT the indexing needs to be varies with the business requirements
  (I mean the delay could be 5, 10, 15 or 30 minutes). It also depends on how
  much volume will be indexed incrementally.
  BTW, are you running a Master+Slave SOLR setup?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019040.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: php library for extractrequest handler

2011-06-03 Thread Naveen Gupta
Yes,

that is the one I used and it is working fine. Thanks to Nabble.

Thanks
Naveen

On Fri, Jun 3, 2011 at 4:02 PM, Gora Mohanty g...@mimirtech.com wrote:

 On Fri, Jun 3, 2011 at 3:55 PM, Naveen Gupta nkgiit...@gmail.com wrote:
  Hi
 
  We want to post to solr server with some of the files (rtf,doc,etc) using
  php .. one way is to post using curl

 I do not normally use PHP, and have not tried it myself.
 However, there is a PHP extension for Solr:
  http://wiki.apache.org/solr/SolPHP
  http://php.net/manual/en/book.solr.php
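 For reference, a minimal sketch with that extension (assuming it is installed
 and a core answers on localhost:8983; the field names are placeholders). Note
 that it covers ordinary field-based indexing; binary files would still go
 through /update/extract, e.g. via curl as in the other threads.

 <?php
 // Rough sketch using the PECL Solr extension; host, port, path and field
 // names are assumptions for illustration only.
 $client = new SolrClient(array(
     'hostname' => 'localhost',
     'port'     => 8983,
     'path'     => '/solr',
 ));

 $doc = new SolrInputDocument();
 $doc->addField('id', 'doc1');
 $doc->addField('title', 'Example document');

 $client->addDocument($doc);  // sent to Solr, not yet searchable
 $client->commit();           // make it searchable
 ?>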

 Regards,
 Gora



tika and solr 3.1 integration

2011-06-02 Thread Naveen Gupta
Hi

I am trying to integrate Solr 3.1 and Tika (the version that ships by default),
and while indexing a few documents with a curl command I am getting the error
below: the attr_meta field is unknown. I checked the solrconfig and it looks
fine to me.

Can you please tell me what I am missing?

I copied all the jars from contrib/extraction/lib to the solr/lib folder that
sits in the same place as conf.

I am using the request handler that comes with the default configuration:

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <!-- All the main content goes into "text"... if you need to return
         the extracted text or do highlighting, use a stored field. -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>

    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>





* curl "http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&attr_fmap.content=attr_content&commit=true"
-F myfile=@/root/apache-solr-3.1.0/docs/who.pdf*


HTTP Status 400 - ERROR: unknown field 'attr_meta'

type: Status report
message: ERROR: unknown field 'attr_meta'
description: The request sent by the client was syntactically incorrect
(ERROR: unknown field 'attr_meta').

Apache Tomcat/6.0.18


Please note:

I integrated Apache Tika 0.9 with apache-solr-1.4 locally on a Windows machine
using Solr Cell, and calling the program there works fine without any changes
in configuration.

Thanks
Naveen


Re: tika and solr 3.1 integration

2011-06-02 Thread Naveen Gupta
Hi

This is fixed. Yes, schema.xml was the culprit, and I fixed it by looking at
the sample schema provided with the example.

But on Windows I am getting an slf4j illegal-access exception, which looks like
a jar problem. The fix suggested in their FAQ is to use version 1.5.5, which is
already in the lib folder.

I have ended up deploying a lot of jars, and I am afraid that may be what is
causing the problem.

Has somebody experienced the same?

Thanks
Naveen


On Fri, Jun 3, 2011 at 2:41 AM, Juan Grande juan.gra...@gmail.com wrote:

 Hi Naveen,

 Check if there is a dynamic field named attr_* in the schema. The
 uprefix=attr_ parameter means that if Solr can't find an extracted field
 in the schema, it'll add the prefix attr_ and try again.

 *Juan*



 On Thu, Jun 2, 2011 at 4:21 AM, Naveen Gupta nkgiit...@gmail.com wrote:

  Hi
 
  I am trying to integrate Solr 3.1 and Tika (the version that ships by
  default), and while indexing a few documents with a curl command I am
  getting the error below: the attr_meta field is unknown. I checked the
  solrconfig and it looks fine to me.

  Can you please tell me what I am missing?

  I copied all the jars from contrib/extraction/lib to the solr/lib folder
  that sits in the same place as conf.

  I am using the request handler that comes with the default configuration:
 
  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <!-- All the main content goes into "text"... if you need to return
           the extracted text or do highlighting, use a stored field. -->
      <str name="fmap.content">text</str>
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>

      <!-- capture link hrefs but ignore div attributes -->
      <str name="captureAttr">true</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>
    </lst>
  </requestHandler>
 
 
 
 
 
  * curl "http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&attr_fmap.content=attr_content&commit=true"
  -F myfile=@/root/apache-solr-3.1.0/docs/who.pdf*
 
 
  HTTP Status 400 - ERROR: unknown field 'attr_meta'

  type: Status report
  message: ERROR: unknown field 'attr_meta'
  description: The request sent by the client was syntactically incorrect
  (ERROR: unknown field 'attr_meta').

  Apache Tomcat/6.0.18
 
 
  Please note:

  I integrated Apache Tika 0.9 with apache-solr-1.4 locally on a Windows
  machine using Solr Cell, and calling the program there works fine without
  any changes in configuration.
 
  Thanks
  Naveen
 



Strategy -- Frequent updates in our application

2011-06-02 Thread Naveen Gupta
Hi

We have an application where every 10 minutes we index each user's document
repository, and whenever a new message is added to a particular discussion we
need to index that thread again (note that we are not blindly re-indexing every
time; we have rules that work out which threads are new or changed and are
therefore candidates for indexing).

So we are doing updates for each user's document repository, and so far the
performance is not looking very good. In the future we expect hits in volume
(1,000 to 10,000 hits per minute), so we are looking for a strategy to tune
Solr so that it can index the data in real time.

And what about NRT: is it a good fit for this scenario? I have read that Solr
NRT performance is not very good, but I am not inclined to believe that, since
Solr is one of the best open-source engines and this problem will likely be
sorted out in the near future. If any benchmark exists, kindly share it with
me; we would like to analyze it against our requirements.

Is there any way to do incremental indexing of the kind we generally find in
other search engines such as Endeca? I don't know Solr in much detail yet,
since I am a newbie, so can you please tell me whether there are settings that
keep track of incremental indexing?


Thanks
Naveen