Thought exercise: features for Solr client

2013-11-14 Thread Alexandre Rafalovitch
Hello,

I am trying to imagine what a new, fresh Solr client library would look
like. A number of features have been added to Solr recently, so some of
the older libraries do not necessarily support them well (e.g.
multi-collections, soft commits, multiple handler end-points, schema
auto-discovery, etc.).
 If one were to write a new client, what would a useful version 1 look
like for modern Solr? At the moment, I am not talking about a specific
implementation language. Still, if you have any thoughts on that, they are
welcome too.

My own thoughts center around two directions that a library would need to
support:
1) Indexing on the backend
2) Middle-layers between the website and Solr doing some sort of query
security, enhancement, normalization, etc

Any thoughts?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: exceeded limit of maxWarmingSearchers ERROR

2013-11-14 Thread Loka
Hi Naveen,
I am also getting a similar problem and I do not know how to use the
commitWithin tag. Can you help me with how to use the commitWithin tag? Can you
give me an example?





Configure maxConnectionsPerHost

2013-11-14 Thread yriveiro
Hi,

Where can I configure the maxConnectionsPerHost on Solr?

I'm using Solr 4.5.1 with the old style of solr.xml (I have a lot of
collections and switching to the new style of solr.xml is too much work)



-
Best regards


Optimizing cores in SolrCloud

2013-11-14 Thread michael.boom
A few weeks ago optimization in SolrCloud was discussed in this thread:
http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-td4097499.html#a4098020

The thread was covering the distributed optimization inside a collection.
My use case requires manually running optimizations every week or so,
because I do delete by query often, and deletedDocs number gets to huge
amounts, and the only way to regain that space is by optimizing.

Since I have a pretty steady high load, I can't do it overnight and I was
thinking to do it one core at a time - meaning optimizing shard1_replica1
and then shard1_replica2 and so on, using 
curl
'http://localhost:8983/solr/collection1_shard1_replica1/update?optimize=true&distrib=false'

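(For reference, roughly the same per-core, non-distributed optimize can be issued from SolrJ. This is only a sketch and assumes Solr 4.x SolrJ; the core URL matches the curl example above, and setParam("distrib", "false") mirrors the distrib=false query parameter.)

// Sketch: per-core optimize via SolrJ, kept local with distrib=false.
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class OptimizeOneCore {
  public static void main(String[] args) throws SolrServerException, IOException {
    HttpSolrServer core =
        new HttpSolrServer("http://localhost:8983/solr/collection1_shard1_replica1");
    UpdateRequest req = new UpdateRequest();
    req.setAction(ACTION.OPTIMIZE, true, true, 1); // waitFlush, waitSearcher, maxSegments=1
    req.setParam("distrib", "false");              // do not forward to other replicas/shards
    req.process(core);
    core.shutdown();
  }
}
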
My question is how would this reflect on the performance of the system? All
queries that would be routed to that shard replica would be very slow I
assume. 

Would there be any problems if a replica is optimized and another is not?
Anybody tried something like this? Any tips or stories ?
Thank you!



-
Thanks,
Michael


Re: Thought exercise: features for Solr client

2013-11-14 Thread Alvaro Cabrerizo
Here goes my wishlist:

   - Transaction management
   - Access control at document level

Regards.


On Thu, Nov 14, 2013 at 10:35 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 Hello,

 I am trying to imagine what a new, fresh Solr client library would look
 like. A number of features have been added to Solr recently, so some
 of the older libraries do not necessarily support them well (e.g.
 multi-collections, soft commits, multiple handler end-points, schema
 auto-discovery, etc.).
  If one were to write a new client, what would a useful version 1 look
 like for modern Solr? At the moment, I am not talking about a specific
 implementation language. Still, if you have any thoughts on that, they are
 welcome too.

 My own thoughts center around two directions that a library would need to
 support:
 1) Indexing on the backend
 2) Middle-layers between the website and Solr doing some sort of query
 security, enhancement, normalization, etc

 Any thoughts?

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



Re: Thought exercise: features for Solr client

2013-11-14 Thread Michael Sokolov
I think there is a place for a client-side query hierarchy.  It would be 
nice if you could build a Lucene Query and the Solr client would 
serialize it for you.  If there were a general-purpose query 
serialization library, then you could support a similar programming model 
for both Lucene-only and Solr use. It would be useful for all kinds of 
things, since you wouldn't be tied to the query parser zoo.  The XML QP 
is a possible starting place for a serialization format, but I think 
that ultimately, to do this, Query would have to add support for some kind of 
generic representation (e.g. a map of children which could be primitives 
or queries).


-Mike

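To make the gap concrete, here is a rough sketch (Lucene/SolrJ 4.x class names, illustrative only) of the closest thing available today: build a Lucene Query on the client and pass its toString() output to Solr, which only works for simple term/boolean queries and is exactly the kind of lossy round-trip a real serialization format would replace.

// Sketch: client-side Lucene query, "serialized" only via its toString().
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.client.solrj.SolrQuery;

public class ClientSideLuceneQuery {
  public static void main(String[] args) {
    // Build the query programmatically on the client (Lucene 4.x mutable BooleanQuery).
    BooleanQuery bq = new BooleanQuery();
    bq.add(new TermQuery(new Term("title", "solr")), Occur.MUST);
    bq.add(new TermQuery(new Term("body", "client")), Occur.SHOULD);

    // Lossy round-trip: fine for terms and booleans, breaks down for richer query types.
    SolrQuery q = new SolrQuery(bq.toString());
    System.out.println(q.getQuery());   // +title:solr body:client
  }
}
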
On 11/14/13 4:35 AM, Alexandre Rafalovitch wrote:

Hello,

I am trying to imagine what a new, fresh Solr client library would look
like. A number of features have been added to Solr recently, so some
of the older libraries do not necessarily support them well (e.g.
multi-collections, soft commits, multiple handler end-points, schema
auto-discovery, etc.).
  If one were to write a new client, what would a useful version 1 look
like for modern Solr? At the moment, I am not talking about a specific
implementation language. Still, if you have any thoughts on that, they are
welcome too.

My own thoughts center around two directions that a library would need to
support:
1) Indexing on the backend
2) Middle-layers between the website and Solr doing some sort of query
security, enhancement, normalization, etc

Any thoughts?

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)





RE: distributed search is significantly slower than direct search

2013-11-14 Thread Elran Dvir
Hi,

We tried returning just the id field and got exactly the same performance.
Our system is distributed but all shards are in a single machine so network 
issues are not a factor.
The code we found where Solr is spending its time is on the shard and not on 
the routing core, again all shards are local.
We investigated the getFirstMatch() method and noticed that the 
MultiTermEnum.reset (inside MultiTerm.iterator) and MultiTerm.seekExact take 
99% of the time. 
Inside these methods, the call to 
BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock  takes most 
of the time.
Out of the 7-second run, these methods take ~5 seconds and BinaryResponseWriter.write 
takes the rest (~2 seconds).

We tried increasing cache sizes and got hits, but it only improved the query 
time by a second (~6), so no major effect.
We are not indexing during our tests. The performance is similar.
(How do we measure doc size? Is it important, given that the 
performance is the same when returning only the id field?)

We still don't completely understand why the query takes this much longer 
although the cores are on the same machine.

Is there a way to improve the performance (code, configuration, query)?

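For anyone trying to reproduce the comparison, a minimal SolrJ sketch of the two queries involved (direct core vs. distributed via the shards parameter, returning only the id field); the URLs and core names are placeholders, not the actual setup described above.

// Sketch: compare a direct query against a distributed one, fetching only "id".
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DirectVsDistributed {
  public static void main(String[] args) throws SolrServerException {
    // Direct query against one core, fetching only the id field.
    SolrQuery direct = new SolrQuery("*:*");
    direct.setFields("id");
    direct.setRows(5000);
    HttpSolrServer core = new HttpSolrServer("http://localhost:8983/solr/core1");
    System.out.println("direct QTime=" + core.query(direct).getQTime());

    // Same query, but distributed over the (local) shards via the shards param.
    SolrQuery distributed = new SolrQuery("*:*");
    distributed.setFields("id");
    distributed.setRows(5000);
    distributed.set("shards",
        "localhost:8983/solr/core1,localhost:8983/solr/core2" /* , ...more cores */);
    System.out.println("distributed QTime=" + core.query(distributed).getQTime());
    core.shutdown();
  }
}
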
-Original Message-
From: idokis...@gmail.com [mailto:idokis...@gmail.com] On Behalf Of Manuel Le 
Normand
Sent: Thursday, November 14, 2013 1:30 AM
To: solr-user@lucene.apache.org
Subject: Re: distributed search is significantly slower than direct search

It's surprising such a query takes a long time; I would assume that after 
trying consistently q=*:* you should be getting cache hits and times should be 
faster. Try to see in the admin UI how your query/doc caches perform.
Moreover, the query in itself is just asking for the first 5000 docs that were 
indexed (returning the first [docid]), so it seems all this time is wasted on 
transfer. Out of these 7 secs, how much is spent in the above method? What do 
you return by default? How big is every doc you display in your results?
It might be that both collections work on the same resources. Try 
elaborating on your use-case.

Anyway, it seems like you just made a test to see what the performance hit 
would be in a distributed environment, so I'll try to explain some things we 
encountered in our benchmarks, with a case that is at least similar in the 
number of docs fetched.

We reclaim 2000 docs every query, running over 40 shards. This means every 
shard is actually transferring to our frontend 2000 docs on every document-match 
request (the first one you were referring to). Even if lazily loaded, reading 2000 
ids (on 40 servers) and lazy loading the fields is a tough job. Waiting for 
the slowest shard to respond, then sorting the docs and reloading (lazy or not) 
the top 2000 docs might take a long time.

Our times are 4-8 secs, but it's not really possible to compare cases. We've done 
a few steps that improved it along the way, steps that led to others.
These were our starters:

   1. Profile these queries from different servers and Solr instances, and try
   to put your finger on which collection is working hard and why. Check if
   you're stuck on components that don't have an added value for you but are
   used by default.
   2. Consider eliminating the doc cache. It loads lots of (partly) lazy
   documents whose probability of secondary usage is low. There's no such
   thing as popular docs when requesting so many docs. You may be able to use
   your memory in a better way.
   3. Bottleneck check - inner server metrics such as cpu user / iowait, packets
   transferred over the network, page faults etc. are excellent in order to
   understand if the disk/network/cpu is slowing you down. Then upgrade
   hardware in one of the shards to check if it helps, by comparing the
   upgraded shard's qTime to the others.
   4. Warm up the index after committing - try to benchmark how queries
   perform before and after some warm-up, say a few hundred queries
   (from your previous system), in order to warm up the OS cache
   (assuming you're using NRTDirectoryFactory); see the sketch below.


Good luck,
Manu

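A minimal sketch of the warm-up step from point 4 above, assuming SolrJ 4.x and that the queries to replay sit in a plain text file, one query string per line (the file name and core URL are placeholders):

// Sketch: replay a few hundred old queries after a commit to warm the OS / Solr caches.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class WarmUp {
  public static void main(String[] args) throws IOException, SolrServerException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    List<String> queries =
        Files.readAllLines(Paths.get("warmup-queries.txt"), StandardCharsets.UTF_8);
    for (String q : queries) {
      solr.query(new SolrQuery(q));   // only the side effect of warming caches matters here
    }
    solr.shutdown();
  }
}
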

On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson erickerick...@gmail.com wrote:

 One thing you can try, and this is more diagnostic than a cure, is 
 return just the id field (and insure that lazy field loading is true). 
 That'll tell you whether the issue is actually fetching the document 
 off disk and decompressing, although frankly that's unlikely since you 
 can get your 5,000 rows from a single machine quickly.

 The code you found where Solr is spending its time, is that on the 
 routing core or on the shards? I actually have a hard time 
 understanding how that code could take a long time, doesn't seem 
 right.

 You are transferring 5,000 docs across the network, so it's possible 
 that your network is just slow, that's certainly a difference between 
 the local and remote case, but that's a stab in the dark.

 Not much help I know,
 Erick



 On Wed, Nov 13, 2013 at 2:52 AM, Elran 

Re: Solr Synonym issue

2013-11-14 Thread Rafał Kuć
Hello!

Could you please describe the issue you are having?

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



Hi Team,

I have implemented Solr with my Magento Enterprise Edition. I am trying to 
implement synonyms in Solr but it's not working. Please find attached the 
synonyms.txt, schema.xml and solrconf.xml files.

I have been debugging the issue for 2 days but have not found any solution.
Please help me as soon as possible.

Hope I will get your reply as soon as possible.

Thanks & Regards
Jyoti Kadam



Re: solrcloud - forward update to a shard failed

2013-11-14 Thread Aileen
Thanks Michael.  Followed your advice - no commits from indexing clients; let 
auto commit take care of things.  It worked, so far no errors.   The config 
params need some more tweaking to get the right balance, specifically maxTime, 
maxDocs and the soft commit interval, but otherwise Solr is a lot 
healthier...

Thanks for your help.


 
 
 I did something like that also, and I was getting some nasty problems when 
 one of my clients would try to commit before a commit issued by another one 
 had finished. Might be the same problem for you too.
 
 Try not doing explicit commits from the indexing client and instead set the 
 autocommit to 1000 docs or whichever value fits you best.
 
 
 
 
 -
 Thanks,
 Michael



Re: Updating Document Score With Payload of Multivalued Field?

2013-11-14 Thread Furkan KAMACI
Any ideas?


2013/11/13 Furkan KAMACI furkankam...@gmail.com

 PS: I use Solr 4.5.1


 2013/11/13 Furkan KAMACI furkankam...@gmail.com

 Here is my case;

 I have a field in my schema named *elmo_field*. I want *elmo_field* to
 have multiple values and multiple payloads, i.e.

 dorothy|0.46
 sesame|0.37
 big bird|0.19
 bird|0.22

 When a user searches for a keyword, e.g. *dorothy*, I want to add 0.46 to the
 score. If the user searches for *big bird*, 0.19, and if the user searches for
 *bird*, 0.22.

 I mean I will make a search on my index on the other fields of my Solr
 schema, and I will make another search (this one an exact match search)
 on *elmo_field* at the same time; if it matches something I will increase
 the score with the payloads.

 How can I do that - add something to the score from a multivalued payload
 (with a nested query or not) - and do you have any other ideas to achieve it?







Solr Release Management Process

2013-11-14 Thread Furkan KAMACI
Hi;

I've asked the same question on the dev list but could not get an answer.
This question is related to Solr contributors too, so I wanted to ask it
here, on the solr-user list. My question was:


I've resolved 2 issues last week. One of them was created by me and one of
them was an existing issue. There is also a 3rd issue that is a
duplicate of the second one.

When I create an issue I have the right to edit Fix Version/s. I've written
4.6 as the fix version of the first issue. The second issue was not created by
me, so I cannot edit its Fix Version/s.

I just wonder and want to learn the commit process of the Solr project. What
do committers do before a new release process starts? If they filter the
resolved issues that have a Fix Version/s of the new release, they will not be
able to see all resolved issues. If they filter the issues resolved since the
last release, then they are not using the benefits of the Fix Version/s section.
People have the right to edit the Fix Version/s section when they create an
issue, but do not have the right to edit existing ones (ones created by other
people).

There are many issues in the Solr project and frequent commits every day.
Should I point at the user in comments (with an @ tag) for such kinds of
situations (I follow who is responsible for the next release from the dev
list), or do you handle it yourselves (as you have handled it until now)?

I just wanted to learn the internal process of release management.

Thanks;
Furkan KAMACI


Solr xml img parsing exception

2013-11-14 Thread Marcello Lorenzi

Hi,
I have installed a Solr 4.3 instance and we have configured manifoldcf 
to pass web content to the shard collection, but during the crawling we 
have noticed a lot of this exception:


ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: 
org.apache.tika.exception.TikaException: XML parse error
at 
com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:150)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:107)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:76)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:934)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:90)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:515)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1012)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:642)
at 
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597)
at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.tika.exception.TikaException: XML parse error
at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at 
com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:147)

... 24 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 
105; The element type "img" must be terminated by the matching end-tag 
"</img>".
at 
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
at 
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at 
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
at 
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
at 
com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1753)
at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2951)
at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at 
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at 
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
at 
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
at 
com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at 

Re: My setup - init script and other info

2013-11-14 Thread Erick Erickson
Shawn:

Would you be willing to put this on the Wiki? I think it'd be really useful
to have it there...

I'm pretty sure you have edit rights to the wiki, but they're free for the
asking if not...

Erick


On Wed, Nov 13, 2013 at 1:07 PM, Shawn Heisey s...@elyograg.org wrote:

 In the hopes that it will help someone get Solr running in a very clean
 way, here's an informational email.

 For my Solr install on CentOS 6, I use /opt/solr4 as my installation path,
 and /index/solr4 as my solr home.  The /index directory is a dedicated
 filesystem, /opt is part of the root filesystem.

 From the example directory, I copied cloud-scripts, contexts, etc, lib,
 webapps, and start.jar over to /opt/solr4.  My stuff was created before
 4.3.0, so the resources directory didn't exist.  I was already using log4j
 with a custom Solr build, and I put my log4j.properties file in etc
 instead.  I created a logs directory and a run directory in /opt/solr4.

 My data structure in /index/solr4 is complex.  All a new user really needs
 to know is that solr.xml goes here and dictates the rest of the structure.
  There is a symlink at /index/solr4/lib, pointing to /opt/solr4/solrlib -
 so that jars placed in ${solr.solr.home}/lib are actually located in the
 program directory, not the data directory.  That makes for a much cleaner
 version control scenario - both directories are git repositories cloned
 from our internal git server.

 Unlike the example configs, my solrconfig.xml files do not have lib
 directives for loading jars.  That gets automatically handled by the jars
 living in that symlinked lib directory.  See SOLR-4852 for caveats
 regarding central lib directories.

 https://issues.apache.org/jira/browse/SOLR-4852

 If you want to run SolrCloud, you would need to install zookeeper
 separately and put your zkHost parameter in solr.xml.  Due to a bug,
 putting zkHost in solr.xml doesn't work properly until 4.4.0.

 Here's the current state of my init script.  It's redhat-specific.  I used
 /bin/bash (instead of /bin/sh) in the shebang because I am pretty sure that
 there are bash-isms in it, and bash is always available on the systems that
 I use:

 http://apaste.info/9fVA

 Notable features:
 * Runs Solr as an unprivileged user.
 * Has three methods for stopping Solr, tries graceful methods first.
  1) The jetty STOPPORT/STOPKEY mechanism.
  2) PID saved by the 'start' action.
  3) Any program using the Solr listening port.
 * Before killing by PID, tries to make sure that the process actually is
 Solr.
 * Sets up remote JMX, by default without authentication or SSL.
 * Highly tuned CMS garbage collection.
 * Sets up GC logging.
 * Virtually everything is overridable via /etc/sysconfig/solr4.
 * Points at an overridable log4j config file, by default in /opt/solr4/etc.
 * Removes the existing PID file if the server is just booting up -- which
 it knows by noting that server uptime is less than three minutes.

 It shouldn't be too hard to convert this so it works on debian-derived
 systems.  That would involve rewriting portions that use redhat init
 routines, and probably start-stop-daemon. What I'd really like is one
 script that will work on any system, but that will require a fair amount of
 work.

 It's a work in progress.  It should load log4j.properties from resources
 instead of etc. I'd like to include it in the Solr download, but without a
 fair amount of documentation and possibly an installation script, which
 still must be written, that won't be possible.

 Feel free to ask questions about anything that doesn't seem clear. I
 welcome ideas for improvement on both my own setup and the solr example.

 Thanks,
 Shawn




Re: Atomic Update at Solrj For a Newly Added Schema Field

2013-11-14 Thread Erick Erickson
I don't think this is a problem - what are you seeing? Have you
tried it and gotten an error?

The only reason you need to have fields stored is so _existing_
documents with _existing_ data get into the new doc. Since
you've just added a field, you should be fine. It's just that
documents already in your index won't have any value in the new
field unless you specifically add it in the new version. So yes, to
get values into all of your _existing_ records you need to at least
add all the docs again, in which case you might as well re-index.

But if you can live with some of the docs not having the value,
you shouldn't need to.

If you're seeing other behavior, tell us what you're seeing.

Best,
Erick

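For reference, the usual SolrJ 4.x idiom for such an atomic update is sketched below; the id, field name and value are made up, and it still assumes the other fields in the schema are stored, as discussed above.

// Sketch: SolrJ atomic update - a Map value keyed by "set" tells Solr to set/replace
// just that field on the existing document.
import java.io.IOException;
import java.util.Collections;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdate {
  public static void main(String[] args) throws SolrServerException, IOException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-42");
    doc.addField("newly_added_field", Collections.singletonMap("set", "some value"));
    solr.add(doc);
    solr.commit();
    solr.shutdown();
  }
}
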

On Wed, Nov 13, 2013 at 1:10 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 I use Solr 4.5.1. I have indexed some documents and decided to add a new
 field to my schema some time later. I want to use Atomic Updates for
 that newly added field. I use SolrJ for indexing. However, since existing
 documents do not have the newly added field, Solr does not make an atomic
 update for them. I do not want to reindex my whole data. Any ideas for
 it?



Re: Using data-config.xml from DIH in SolrJ

2013-11-14 Thread Erick Erickson
There's nothing that I know of that takes a DIH configuration and
uses it through SolrJ. You can use Tika directly in SolrJ if you
need to parse structured documents though, see:
http://searchhub.org/2012/02/14/indexing-with-solrj/

Yep, you're going to be kind of reinventing the wheel a bit I'm
afraid.

Best,
Erick

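A minimal sketch of that "Tika directly in SolrJ" approach (Tika's AutoDetectParser plus a plain HttpSolrServer; the file path, field names, Solr URL and the metadata key used for the title are all placeholders, only illustrative):

// Sketch: parse a local file with Tika, build a SolrInputDocument, send it to remote Solr.
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class TikaToSolrJ {
  public static void main(String[] args) throws Exception {
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler text = new BodyContentHandler(-1);   // -1 = no write limit
    Metadata metadata = new Metadata();
    try (InputStream in = Files.newInputStream(Paths.get("/data/source/file.pdf"))) {
      parser.parse(in, text, metadata, new ParseContext());
    }

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "file.pdf");
    doc.addField("title", metadata.get("title"));   // metadata key is illustrative
    doc.addField("content", text.toString());

    HttpSolrServer solr = new HttpSolrServer("http://remote-host:8983/solr/collection1");
    solr.add(doc);
    solr.commit();
    solr.shutdown();
  }
}
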

On Wed, Nov 13, 2013 at 1:55 PM, P Williams
williams.tricia.l...@gmail.com wrote:

 Hi All,

 I'm building a utility (Java jar) to create SolrInputDocuments and send
 them to a HttpSolrServer using the SolrJ API.  The intention is to find an
 efficient way to create documents from a large directory of files (where
 multiple files make one Solr document) and be sent to a remote Solr
 instance for update and commit.

 I've already solved the problem using the DataImportHandler (DIH) so I have
 a data-config.xml that describes the templated fields and cross-walking of
 the source(s) to the schema.  The original data won't always be able to be
 co-located with the Solr server which is why I'm looking for another
 option.

 I've also already solved the problem using ant and xslt to create a
 temporary (and unfortunately a potentially large) document which the
 UpdateHandler will accept.  I couldn't think of a solution that took
 advantage of the XSLT support in the UpdateHandler because each document is
 created from multiple files.  Our current dated Java based solution
 significantly outperforms this solution in terms of disk and time.  I've
 rejected it based on that and gone back to the drawing board.

 Does anyone have any suggestions on how I might be able to reuse my DIH
 configuration in the SolrJ context without re-inventing the wheel (or DIH
 in this case)?  If I'm doing something ridiculous I hope you'll point that
 out too.

 Thanks,
 Tricia



Re: field collapsing performance in sharded environment

2013-11-14 Thread Erick Erickson
bq:   Of the 10k docs,
most have a unique near duplicate hash value, so there are about 10k unique
values for the field that I'm grouping on.

I suspect (but don't know the grouping code well) that this is the issue.
You're getting the top N groups, right? But in the general case, you can't
ensure that the topN from shard1 has any relation to the topN from shard2.
So I _suspect_ that the code returns all of the groups. Say that shard1 has
3 docs for group 5, but shard2 has 3,000 docs. To get the true top N, you
need to collate all the values from all the groups; you can't just return
the top 10 groups from each shard and get correct counts.

Since your group cardinality is about 10K/shard, you're pushing 10 packets
each
containing 10K entries back to the originating shard, which has to
combine/sort
them all to get the true top N. At least that's my theory.

Your situation is special in that you say that your groups don't appear on
more than
one shard, so you'd probably have to write something that aborted this
behavior and
returned only the top N, if I'm right.

But that begs the question of why you're doing this. What purpose is served
by grouping when most groups probably have only 1 member?

Best,
Erick

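For context, a sketch of the kind of grouped, sharded request being discussed; the shard list and the grouping field name are placeholders (SolrJ 4.x):

// Sketch: a distributed query with the three grouping parameters from the original post.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupedShardedQuery {
  public static void main(String[] args) throws SolrServerException {
    SolrQuery q = new SolrQuery("some query");
    q.set("shards", "host1:8983/solr/s1,host1:8983/solr/s2" /* , ... s10 */);
    q.set("group", true);
    q.set("group.field", "near_dupe_hash");   // placeholder for the near-duplicate hash field
    q.set("group.main", true);                // flatten groups back into a normal doc list

    HttpSolrServer solr = new HttpSolrServer("http://host1:8983/solr/s1");
    QueryResponse rsp = solr.query(q);
    System.out.println("QTime with grouping: " + rsp.getQTime() + " ms");
    solr.shutdown();
  }
}
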

On Wed, Nov 13, 2013 at 2:46 PM, David Anthony Troiano 
dtroi...@basistech.com wrote:

 Hello,

 I'm hitting a performance issue when using field collapsing in a
 distributed Solr setup and I'm wondering if others have seen it and if
 anyone has an idea to work around it.

 I'm using field collapsing to deduplicate documents that have the same near
 duplicate hash value, and deduplicating at query time (as opposed to
 filtering at index time) is a requirement.  I have a sharded setup with 10
 cores (not SolrCloud), each having ~1000 documents.  Of the 10k docs,
 most have a unique near duplicate hash value, so there are about 10k unique
 values for the field that I'm grouping on.  The grouping parameters that
 I'm using are:

 group=true
 group.field=near dupe hash field
 group.main=true

 I'm attempting distributed queries (shards=s1,s2,...,s10) where the only
 difference is the absence or presence of these three grouping parameters
 and I'm consistently seeing a marked difference in performance (as a
 representative data point, 200ms latency without grouping and 1600ms with
 grouping).  Interestingly, if I put all 10k docs on the same core and query
 that core independently with and without grouping, I don't see much of a
 latency difference, so the performance degradation seems to exist only in
 the sharded setup.

 Is there a known performance issue when field collapsing in a sharded setup
 (perhaps only manifests when the grouping field has many unique values), or
 have other people observed this?  Any ideas for a workaround?  Note that
 docs in my sharded setup can only have the same signature if they're in the
 same shard, so perhaps that can be used to boost perf, though I don't see
 an exposed way to do so.

 A follow-on question is whether we're likely to see the same issue if /
 when we move to SolrCloud.

 Thanks,
 Dave



Re: queries including time zone

2013-11-14 Thread Erick Erickson
IMO you will save yourself endless grief just biting the bullet and working
with UTC
at all times. The instant you have users in even adjacent but different time
zones,
you'll have to deal with this anyway.

FWIW,
Erick


On Thu, Nov 14, 2013 at 12:26 AM, Jack Krupansky j...@basetechnology.com wrote:

 I believe it is the TZ column from this table:
 http://en.wikipedia.org/wiki/List_of_tz_database_time_zones

 Yeah, it's on my TODO list for my book.

 I suspect that tz will not affect NOW, which is probably UTC. I
 suspect that tz only affects literal dates in date math.

 -- Jack Krupansky

 -Original Message- From: Eric Katherman
 Sent: Wednesday, November 13, 2013 11:38 PM
 To: solr-user@lucene.apache.org
 Subject: queries including time zone


 Can anybody provide any insight about using the tz param? It doesn't seem to
 be affecting date math and /day rounding.  What format does the tz
 variable need to be in?  Not finding any documentation on this.

 Sample query we're using:

 path=/select params={tz=America/Chicago&sort=id+desc&start=0&q=
 application_id:51b30ed9bc571bd96773f09c+AND+object_key:object_26+AND+
 values_field_215_date:[*+TO+NOW/DAY%2B1DAY]&wt=json&rows=25}

 Thanks!
 Eric=

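For what it's worth, a hedged SolrJ sketch of passing a timezone with a date-math range like the one above; note that Solr's timezone request parameter is spelled TZ (upper case), and the field name, rows and URL here are only taken from the sample query as placeholders:

// Sketch: date-math range query with an explicit timezone for /DAY rounding.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class TimeZoneQuery {
  public static void main(String[] args) throws SolrServerException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("values_field_215_date:[* TO NOW/DAY+1DAY]");
    q.set("TZ", "America/Chicago");   // timezone used for date-math rounding like /DAY
    q.setRows(25);
    System.out.println(solr.query(q).getResults().getNumFound());
    solr.shutdown();
  }
}
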


Re: exceeded limit of maxWarmingSearchers ERROR

2013-11-14 Thread Erick Erickson
CommitWithin is either configured in solrconfig.xml for the
<autoCommit> or <autoSoftCommit> tags as the <maxTime> tag. I
recommend you do use this.

The other way you can do it is if you're using SolrJ, one of the
forms of the server.add() method takes a number of milliseconds
to force a commit.

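For the SolrJ route, a minimal sketch (Solr 4.x); the 10-second commitWithin here is only an example value, and per the advice below it should be as long as you can live with:

// Sketch: add a document with commitWithin instead of an explicit commit.
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
  public static void main(String[] args) throws SolrServerException, IOException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("title", "commitWithin example");
    // Ask Solr to make this visible within 10 seconds, instead of committing explicitly.
    solr.add(doc, 10000);
    solr.shutdown();
  }
}
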
You really, really do NOT want to use ridiculously short times for this
like a few milliseconds. That will cause new searchers to be
warmed, and when too many of them are warming at once you
get this error.

Seriously, make your commitWithin or autocommit parameters
as long as you can, for many reasons.

Here's a bunch of background:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


On Thu, Nov 14, 2013 at 5:13 AM, Loka lokanadham.ga...@zensar.in wrote:

 Hi Naveen,
 I am also getting a similar problem and I do not know how to use the
 commitWithin tag. Can you help me with how to use the commitWithin tag? Can
 you give me an example?






Re: Optimizing cores in SolrCloud

2013-11-14 Thread Erick Erickson
I'm going to answer with something completely different <G>

First, though, optimization happens in the background, so it
shouldn't have too big an impact on query performance outside of
I/O contention. There also shouldn't be any problem with one
shard being optimized and one not.

Second, have you considered tweaking some of the TieredMergePolicy
knobs? In particular,
reclaimDeletesWeight
which defaults to 2.0. You can set this in your solrconfig.xml. Through
a clever bit of reflection, you can actually set most (all?) of the
member vars in TieredMergePolicy.java.

Bumping up the weight might cause the segment merges to merge-away
the deleted docs frequently enough to satisfy you.

Best,
Erick


On Thu, Nov 14, 2013 at 5:39 AM, michael.boom my_sky...@yahoo.com wrote:

 A few weeks ago optimization in SolrCloud was discussed in this thread:

 http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-td4097499.html#a4098020

 The thread was covering the distributed optimization inside a collection.
 My use case requires manually running optimizations every week or so,
 because I do delete by query often, and deletedDocs number gets to huge
 amounts, and the only way to regain that space is by optimizing.

 Since I have a pretty steady high load, I can't do it overnight and I was
 thinking to do it one core at a time - meaning optimizing shard1_replica1
 and then shard1_replica2 and so on, using
 curl
 '
 http://localhost:8983/solr/collection1_shard1_replica1/update?optimize=true&distrib=false
 '

 My question is how would this reflect on the performance of the system? All
 queries that would be routed to that shard replica would be very slow I
 assume.

 Would there be any problems if a replica is optimized and another is not?
 Anybody tried something like this? Any tips or stories ?
 Thank you!



 -
 Thanks,
 Michael



Re: solrcloud - forward update to a shard failed

2013-11-14 Thread Erick Erickson
Here's a writeup on the interactions between a number of the parameters
for soft/hard commits, NRT, and transaction logs. FWIW.

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


On Thu, Nov 14, 2013 at 8:22 AM, Aileen ail...@kriel.org wrote:

 Thanks Michael.  Followed your advice - no commits from indexing clients;
 let auto commit take care of things.  It worked, so far no errors.   The
 config params need some more tweaking to get the right balance,
 specifically maxTime, maxDocs and the soft commit interval, but otherwise
 Solr is a lot healthier...

 Thanks for your help.


 
 
  I did something like that also, and I was getting some nasty problems
 when one of my clients would try to commit before a commit issued by
 another one had finished. Might be the same problem for you too.
 
  Try not doing explicit commits from the indexing client and instead set
 the autocommit to 1000 docs or whichever value fits you best.
 
 
 
 
  -
  Thanks,
  Michael




Re: Solr xml img parsing exception

2013-11-14 Thread Erick Erickson
It looks like bad data. The XML you're sending to Solr looks malformed, so I
suspect this is completely outside of Solr's purview.

Best,
Erick


On Thu, Nov 14, 2013 at 9:26 AM, Marcello Lorenzi mlore...@sorint.it wrote:

 Hi,
 I have installed a Solr 4.3 instance and we have configured manifoldcf to
 pass web content to the shard collection, but during the crawling we have
 noticed a lot of this exception:

 ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException;
 org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException:
 XML parse error
 ...
 Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
 105; The element type "img" must be terminated by the matching end-tag
 "</img>".
 ...

Re: Solr xml img parsing exception

2013-11-14 Thread Erik Hatcher
Also there's a custom loader here that is the culprit:  
com.lsegroup.solr.handler.CwsExtractingDocumentLoader

On Nov 14, 2013, at 10:20, Erick Erickson erickerick...@gmail.com wrote:

 It looks like bad data. The XML you're sending to Solr looks mal-formed, so
 I
 suspect this is completely outside of Solr's purview.
 
 Best,
 Erick
 
 
 On Thu, Nov 14, 2013 at 9:26 AM, Marcello Lorenzi mlore...@sorint.it wrote:
 
 Hi,
 I have installed a Solr 4.3 instance and we have configured manifoldcf to
 pass web content to the shard collection, but during the crawling we have
 noticed a lot of this exception:
 
  ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException;
  org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException:
  XML parse error
  ...
  Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
  105; The element type "img" must be terminated by the matching end-tag
  "</img>".
  ...

Re: Optimizing cores in SolrCloud

2013-11-14 Thread michael.boom
Thanks Erick!

That's a really interesting idea, I'll try it!
Another question would be: when does the merging actually happen? Is it
triggered or conditioned by something?

Currently I have a core with ~13M maxDocs and ~3M deleted docs, and although
I see a lot of merges in SPM, deleted documents aren't really going
anywhere.
For merging I have the example settings, haven't changed them.




-
Thanks,
Michael


Re: Query on multi valued field

2013-11-14 Thread giridhar
Hi,

I want to search in a multivalued field.

For example, my field FormIds contains (1,2,3) as comma-separated values.

If I search for 1 or (1,2) or (1,3) or (2,3) or (1,2,3), any combination like
this should work.

How do I define this multivalued integer field type?

Thank you.





Document routing question.

2013-11-14 Thread yriveiro
Hi,

I read this post http://searchhub.org/2013/06/13/solr-cloud-document-routing
and I have some questions.

When a tenant is too large to fit on one shard, we can specify the number of
bits from the shard key that we want to use.

If we set a doc's key as tenant1/4!docXXX we are saying to spread the docs
over 1/4th of the collection. If the collection has 4 shards, does this mean
that all docs with the same shard key will go to the same shard, or will we
spread 25% to each shard?

The other question is: at query time, should we configure the shard keys param
as shard.keys=tenant1! or as shard.keys=tenant1/4!?

/Yago

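For reference, a sketch of where the composite-key syntax being asked about actually goes (SolrJ 4.x; the ids, field values and URL are placeholders - this only illustrates the syntax, it does not answer the distribution question above):

// Sketch: composite-id routing - the routing prefix lives in the uniqueKey itself.
import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CompositeRouting {
  public static void main(String[] args) throws SolrServerException, IOException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "tenant1/4!docXXX");   // "tenant1/4!" prefix + the document id
    doc.addField("tenant", "tenant1");
    solr.add(doc);
    solr.commit();

    SolrQuery q = new SolrQuery("*:*");
    q.set("shard.keys", "tenant1!");   // whether the /4 suffix belongs here is the question above
    solr.query(q);
    solr.shutdown();
  }
}
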


-
Best regards


Re: Query on multi valued field

2013-11-14 Thread Upayavira
On Thu, Nov 14, 2013, at 03:45 PM, giridhar wrote:
 Hi,
 
 I want to search in a multivalued field.
 
  For example, my field FormIds contains (1,2,3) as comma-separated values.
 
  If I search for 1 or (1,2) or (1,3) or (2,3) or (1,2,3), any combination
  like
  this should work.
 
  How do I define this multivalued integer field type?

Surely this is how multivalued fields work. If you had an integer field
type that is defined as multiValued=true, then you could have three
values in that field: 1, 2 and 3.

Then, a query for FormIds:(1 AND 2) will return all documents that
have both 1 and 2 in that field.

Am I missing something?

Upayavira

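A small SolrJ sketch of that, assuming a multiValued integer field named FormIds in the schema (the URL and ids are placeholders):

// Sketch: indexing and querying a multiValued int field.
import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MultiValuedFormIds {
  public static void main(String[] args) throws SolrServerException, IOException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // With a multiValued="true" int field, each value is simply added separately.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "form-doc-1");
    doc.addField("FormIds", 1);
    doc.addField("FormIds", 2);
    doc.addField("FormIds", 3);
    solr.add(doc);
    solr.commit();

    // Matches documents that contain both 1 and 2 in FormIds.
    long hits = solr.query(new SolrQuery("FormIds:(1 AND 2)")).getResults().getNumFound();
    System.out.println("hits: " + hits);
    solr.shutdown();
  }
}
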

Re: Solr xml img parsing exception

2013-11-14 Thread Marcello Lorenzi

Hi Erik,
but in this case, does the custom loader receive an HTTP Error 500 from Solr?

Thanks,
Marcello
On 11/14/2013 04:29 PM, Erik Hatcher wrote:

Also there's a custom loader here that is the culprit:  
com.lsegroup.solr.handler.CwsExtractingDocumentLoader

On Nov 14, 2013, at 10:20, Erick Erickson erickerick...@gmail.com wrote:


It looks like bad data. The XML you're sending to Solr looks mal-formed, so
I
suspect this is completely outside of Solr's purview.

Best,
Erick


On Thu, Nov 14, 2013 at 9:26 AM, Marcello Lorenzi mlore...@sorint.it wrote:


Hi,
I have installed a Solr 4.3 instance and we have configured manifoldcf to
pass web content to the shard collection, but during the crawling we have
noticed a lot of this exception:

ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException:
XML parse error
...
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
105; The element type "img" must be terminated by the matching end-tag
"</img>".
...

Re: Solr xml img parsing exception

2013-11-14 Thread Jack Krupansky

The actual error appears to be:

Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
105; The element type "img" must be terminated by the matching end-tag
"</img>".

So, check the input document at line 91, column 105. There should be an 
<img> tag there, but SAX is complaining that there is no matching </img>.


-- Jack Krupansky

-Original Message- 
From: Marcello Lorenzi

Sent: Thursday, November 14, 2013 9:26 AM
To: solr-user@lucene.apache.org
Subject: Solr xml img parsing exception

Hi,
I have installed a Solr 4.3 instance and we have configured manifoldcf
to pass web content to the shard collection, but during the crawling we
have noticed a lot of this exception:

ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: XML parse error
at
com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:150)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:107)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:76)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:934)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:90)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:515)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1012)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:642)
at
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.tika.exception.TikaException: XML parse error
at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at
com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:147)
... 24 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
105; The element type "img" must be terminated by the matching end-tag
"</img>".
at
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
at
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
at
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
at
com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1753)
at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2951)
at
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
at

Re: Optimizing cores in SolrCloud

2013-11-14 Thread Walter Underwood
Earlier, you said that optimize is the only way that deleted documents are 
expunged. That is false. They are expunged when the segment they are in is 
merged. A forced merge (optimize) merges all segments, so will expunge all 
deleted documents. But those documents will be expunged by merges eventually.

When you have deleted docs in the largest segment, you have to wait for a merge 
of that segment.
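
(For what it's worth, if deletes in a particular segment ever did become a
practical problem, a commit with expungeDeletes set -- e.g. something like

  curl 'http://localhost:8983/solr/collection1/update?commit=true&expungeDeletes=true'

-- asks Lucene to merge away segments containing deletions without forcing a
full optimize. That URL is only a sketch against the stock example core.)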

My best advice is to stop looking at the deleted documents count and worry 
about something that makes a difference to your users.

For about 10 years, I worked on Ultraseek Server, a search engine with the same 
design for merging and document deletion. With over 10K installations, we never 
had a customer who had a problem caused by deleted documents.

wunder

On Nov 14, 2013, at 7:41 AM, michael.boom my_sky...@yahoo.com wrote:

 Thanks Erick!
 
 That's a really interesting idea, i'll try it!
 Another question would be, when does the merging actually happens? Is it
 triggered or conditioned by something?
 
 Currently I have a core with ~13M maxDocs and ~3M deleted docs, and although
 I see a lot of merges in SPM, deleted documents aren't really going
 anywhere.
 For merging I have the example settings, haven't changed it.
 
 
 
 
 -
 Thanks,
 Michael
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Optimizing-cores-in-SolrCloud-tp4100871p4100936.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: Query on multi valued field

2013-11-14 Thread Jack Krupansky
I suppose you could define the field as tokenized text with the work 
delimiter filter and with autogeneratePhraseQueries=false and the default 
query operator set to OR, and then queries might just work close enough to 
what you want.


Otherwise...

You could do a custom update processor that parses the string and expands it
into separate integer values for a multivalued field, and then you would
need to do either a custom query parser or a query preprocessor that
expands that special syntax into normal Solr query syntax using AND or OR
as desired.
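
A very rough Java sketch of the update-processor half (class and field names
here are made up; it assumes the incoming value arrives as a single
comma-separated string in a field called FormIds_raw):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ExpandFormIdsProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object raw = doc.getFieldValue("FormIds_raw");
        if (raw != null) {
          // split "1,2,3" into separate integer values in the multivalued field
          for (String part : raw.toString().split(",")) {
            doc.addField("FormIds", Integer.valueOf(part.trim()));
          }
          doc.removeField("FormIds_raw");
        }
        super.processAdd(cmd);
      }
    };
  }
}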


You could implement the update processor as a JavaScript script. The 
simplest approach to the query side would be to expand the special query 
syntax in your application layer.


-- Jack Krupansky

-Original Message- 
From: giridhar

Sent: Thursday, November 14, 2013 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Query on multi valued field

Hi,

I want to search in a multivalued field.

For example, my field FormIds contains (1,2,3) as comma separated.

If i search for 1 or (1,2) or (1,3) or (2,3) or (1,2,3) any combination like
this should work.

How to define this multivalued integer field type.

Thankyou.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-on-multi-valued-field-tp3209343p4100937.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Query on multi valued field

2013-11-14 Thread Jack Krupansky

s/work/word/

word delimiter filter

-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Thursday, November 14, 2013 11:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Query on multi valued field

I suppose you could define the field as tokenized text with the work
delimiter filter and with autogeneratePhraseQueries=false and the default
query operator set to OR, and then queries might just work close enough to
what you want.

Otherwise...

You could do a custom update processor that parses the string and expands it
into separate integer values for a multivalued field, and then you would
need to do either a custom query parser or a query preprocessor that
expands that special syntax into normal Solr query syntax using AND or OR
as desired.

You could implement the update processor as a JavaScript script. The
simplest approach to the query side would be to expand the special query
syntax in your application layer.

-- Jack Krupansky

-Original Message- 
From: giridhar

Sent: Thursday, November 14, 2013 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Query on multi valued field

Hi,

I want to search in a multivalued field.

For example, my field FormIds contains (1,2,3) as comma separated.

If i search for 1 or (1,2) or (1,3) or (2,3) or (1,2,3) any combination like
this should work.

How to define this multivalued integer field type.

Thankyou.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Query-on-multi-valued-field-tp3209343p4100937.html
Sent from the Solr - User mailing list archive at Nabble.com. 



facet method=enum and uninvertedfield limitations

2013-11-14 Thread Lemke, Michael SZ/HZA-ZSW
I am running into performance problems with faceted queries.
If I do a 

q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0

I am getting an exception:
org.apache.solr.common.SolrException: Too many values for UnInvertedField 
faceting on field CONTENT
at 
org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
at 
org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
at 
org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
...

I understand it's got something to do with a 24bit limit somewhere
in the code but I don't understand enough of it to be able to construct
a specialized index that can be queried with facet.method=enum.

A stripped-down index still doesn't work.  It has exactly one
field CONTENT with 178,000 terms and ~1 million documents.  The top
ranking terms according to Luke are

1 413950CONTENT word1
2 321223CONTENT word2
3 299036CONTENT word3
4 276757CONTENT word4
...

How would we have to strip the index?

Thanks,
Michael



Re: queries including time zone

2013-11-14 Thread Chris Hostetter

: Can anybody provide any insight about using the tz param? The behavior 
: of this isn't affecting date math and /day rounding.  What format does 
: the tz variables need to be in?  Not finding any documentation on this.

it's not "tz", it's "TZ".

The input/output format is always in UTC, but TZ will affect all of the 
date math...

https://wiki.apache.org/solr/CoreQueryParameters#TZ
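
For example (untested, with a hypothetical date field called timestamp):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.range=timestamp&facet.range.start=NOW/DAY-7DAYS&facet.range.end=NOW/DAY%2B1DAY&facet.range.gap=%2B1DAY&TZ=America/New_York

computes the NOW/DAY rounding and the +1DAY gap boundaries relative to US
Eastern time, even though the dates in the response are still rendered in UTC.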


-Hoss


Re: Using data-config.xml from DIH in SolrJ

2013-11-14 Thread P Williams
Hi,

I just discovered UpdateProcessorFactory
(http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/package-summary.html)
in a big way.  How did this completely slip by me?

Working on two ideas.
1. I have used the DIH in a local EmbeddedSolrServer previously.  I could
write a ForwardingUpdateProcessorFactory to take that local update and send
it to a HttpSolrServer.
2. I have code which walks the file-system to compose rough documents but
haven't yet written the part that handles the templated fields and
cross-walking of the source(s) to the schema.  I could configure the update
handler on the Solr server side to do this with the RegexReplace
(http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html)
and DefaultValue
(http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/DefaultValueUpdateProcessorFactory.html)
UpdateProcessorFactor(ies); a rough sketch of such a chain is below.
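
Something like (field names, pattern, and default value are placeholders):

<updateRequestProcessorChain name="crosswalk">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">title</str>
    <str name="pattern">\s+</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">source</str>
    <str name="value">filesystem</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>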

Any thoughts on the advantages/disadvantages of these approaches?

Thanks,
Tricia



On Thu, Nov 14, 2013 at 7:49 AM, Erick Erickson erickerick...@gmail.comwrote:

 There's nothing that I know of that takes a DIH configuration and
 uses it through SolrJ. You can use Tika directly in SolrJ if you
 need to parse structured documents though, see:
 http://searchhub.org/2012/02/14/indexing-with-solrj/

 Yep, you're going to be kind of reinventing the wheel a bit I'm
 afraid.

 Best,
 Erick


 On Wed, Nov 13, 2013 at 1:55 PM, P Williams
 williams.tricia.l...@gmail.comwrote:

  Hi All,
 
  I'm building a utility (Java jar) to create SolrInputDocuments and send
  them to a HttpSolrServer using the SolrJ API.  The intention is to find
 an
  efficient way to create documents from a large directory of files (where
  multiple files make one Solr document) and be sent to a remote Solr
  instance for update and commit.
 
  I've already solved the problem using the DataImportHandler (DIH) so I
 have
  a data-config.xml that describes the templated fields and cross-walking
 of
  the source(s) to the schema.  The original data won't always be able to
 be
  co-located with the Solr server which is why I'm looking for another
  option.
 
  I've also already solved the problem using ant and xslt to create a
  temporary (and unfortunately a potentially large) document which the
  UpdateHandler will accept.  I couldn't think of a solution that took
  advantage of the XSLT support in the UpdateHandler because each document
 is
  created from multiple files.  Our current dated Java based solution
  significantly outperforms this solution in terms of disk and time.  I've
  rejected it based on that and gone back to the drawing board.
 
  Does anyone have any suggestions on how I might be able to reuse my DIH
  configuration in the SolrJ context without re-inventing the wheel (or DIH
  in this case)?  If I'm doing something ridiculous I hope you'll point
 that
  out too.
 
  Thanks,
  Tricia
 



Re: facet method=enum and uninvertedfield limitations

2013-11-14 Thread Yonik Seeley
On Thu, Nov 14, 2013 at 12:03 PM, Lemke, Michael  SZ/HZA-ZSW
lemke...@schaeffler.com wrote:
 I am running into performance problems with faceted queries.
 If I do a

 q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0

 I am getting an exception:
 org.apache.solr.common.SolrException: Too many values for UnInvertedField 
 faceting on field CONTENT
 at 
 org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
 at 
 org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
 at 
 org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
 ...

 I understand it's got something to do with a 24bit limit somewhere
 in the code but I don't understand enough of it to be able to construct
 a specialized index that can be queried with facet.method=enum.

You shouldn't need to do anything differently to try facet.method=enum
(just replace facet.method=fc with facet.method=enum)

You may also want to add the parameter
facet.enum.cache.minDf=100000
to lower memory usage by only using the filter cache for terms that
match more than 100K docs.
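
i.e. the original request would become something like:

q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=enum&facet.enum.cache.minDf=100000&facet.prefix=a&rows=0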

-Yonik
http://heliosearch.com -- making solr shine


SOLR DIH not indexing NFS share

2013-11-14 Thread tegryan
I have Solr with DIH using Tika running fine on a local directory. It imports
the data fine. I need it to work on an NFS-mounted directory, however, and it
fails when I change it to use that. The tomcat6 user has access to the NFS
mount (ls returns all files, anyway). The mount is NFS v3, if that matters.
I've changed the tomcat user's uid to match the tomcat user on the NFS server.
Can anyone point me in the right direction for why this isn't fetching any
files?

I get this while indexing: Requests: 0, Fetched: 1, Skipped: 0, Processed: 0

Here are the SOLR logs:

824741 [commitScheduler-6-thread-1] INFO 
org.apache.solr.update.UpdateHandler  – No uncommitted changes. Skipping
IW.commit.
824745 [commitScheduler-6-thread-1] INFO 
org.apache.solr.update.UpdateHandler  – end_commit_flush
853486 [http-8080-1] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [collection1]
webapp=/solr path=/dataimport
params={optimize=falseindent=trueclean=truecommit=trueverbose=trueentity=fcommand=full-importdebug=truewt=json}
{deleteByQuery=*:* (-1451628963612852224)} 0 44428
853488 [http-8080-1] ERROR org.apache.solr.handler.dataimport.DataImporter 
– Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.NoClassDefFoundError cannot be cast
to java.lang.Exception
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.NoClassDefFoundError cannot be cast
to java.lang.Exception
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:410)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
... 20 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.NoClassDefFoundError cannot be cast
to java.lang.Exception
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:539)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
... 22 more
Caused by: java.lang.ClassCastException: java.lang.NoClassDefFoundError
cannot be cast to java.lang.Exception
at
org.apache.solr.handler.dataimport.DebugLogger.log(DebugLogger.java:140)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:537)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:495)
... 23 more

853489 [http-8080-1] INFO  org.apache.solr.update.UpdateHandler  – start
rollback{}
853498 [http-8080-1] INFO  org.apache.solr.update.DefaultSolrCoreState  –
Creating new IndexWriter...
853498 [http-8080-1] INFO  org.apache.solr.update.DefaultSolrCoreState  –
Waiting until IndexWriter is unused... core=collection1
853498 [http-8080-1] INFO  org.apache.solr.update.DefaultSolrCoreState  –
Rollback old IndexWriter... core=collection1
853509 [http-8080-1] INFO  org.apache.solr.core.SolrCore  –
SolrDeletionPolicy.onInit: commits: num=1
   

Re: Boosting documents by categorical preferences

2013-11-14 Thread Chris Hostetter

: I have a question around boosting. I wanted to use the boost= to write a
: nested query that will boost a document based on categorical preferences.

You have no idea how stoked I am to see you working on this in a real 
world application.

: Currently I have the weights set to the z-score equivalent of a user's
: preference for that category which is simply how many standard deviations
: above the global average is this user's preference for that movie category.
: 
: My question though is basically whether or not semantically the equation
: query(category:Drama)*some weight + query(category:Comedy)*some weight
: + query(category:Action)*some weight makes sense?

My gut says that your approach makes sense -- but if I'm
understanding you correctly, I think that you need to add 1 to
all your weights: the boost is a multiplier, so if someone's rating for
every category is 0 std devs above the average rating (ie: the most
average person imaginable), you don't want to give every movie in every
category a score of 0.
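
To make that concrete -- a rough, untested sketch only, reusing your category
field and z-scores (each with 1 added) and assuming an edismax-style
multiplicative boost:

defType=edismax
q=...the user's query...
boost=sum(product(query($cat1),2.482),
          product(query($cat2),1.1199),
          product(query($cat3),2.448))
cat1=category:Drama
cat2=category:Comedy
cat3=category:Action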

Are you picking the top 3 categories the user prefers as a cutoff, or
are you arbitrarily using N category boosts for however many N categories
the user is above the global average in their preference for that category?

Are your preferences coming from explicit user feedback on the categories
(ie: rate how much you like comedies on a scale of 1-5) or are you
inferring them from user ratings of the movies themselves? (ie: rate this
movie, which happens to be a scifi,action,comedy, on a scale of 1-5) ...
because if it's the latter you probably want to be careful to also
normalize based on how many categories the movie is in.

The other thing to consider is whether you want to include negative
preferences (ie: weights less than 1) based on how many std devs the user's
average is *below* the global average for a category .. in this case I
*think* you'd want to divide -1 by the raw value to get a useful
multiplier.

Alternatively: you could experiment with using the weights as exponents
instead of multipliers...

b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))

...that would simplify the math you'd have to worry about both for the
totally boring average user (x**0 = 1) and for the categories users hate
(x**-5 = some positive fraction that will act as a penalty) ... but you'd
definitely need to run some tests to see if it over-boosts as the std
dev variations get really high (might want to take a root first before
using them as the exponent).



-Hoss


Re: My setup - init script and other info

2013-11-14 Thread Shawn Heisey

On 11/14/2013 7:43 AM, Erick Erickson wrote:

Shawn:

Would you be willing to put this on the Wiki? I think it'd be really useful
to have it there...

I'm pretty sure you have edit rights to the wiki, but they're free for the
asking if not...


Done.  To make it more obvious that it's not an officially sanctioned 
script at this time, I've put it on my personal wiki page.


https://wiki.apache.org/solr/ShawnHeisey#Init_script

Thanks,
Shawn



Re: queries including time zone

2013-11-14 Thread Chris Hostetter

I've beefed up the ref guide page on dates to include more info about all 
of this...

https://cwiki.apache.org/confluence/display/solr/Working+with+Dates


-Hoss


RE: My setup - init script and other info

2013-11-14 Thread Boogie Shafer
It's worth pointing out that there are init scripts for Jetty which can be pulled
from its regular distribution site and added to a Solr installation with only
minor modifications.

I do this with my rpm build process (I just pushed the updates for the 4.5.1
release):

https://github.com/boogieshafer/jetty-solr-rpm

You then put the JVM settings and Solr-specific variables in /etc/default/jetty
(the regular Jetty init script looks for this file).

The init script, modular JMX, and request log configurations are all things I
borrow from the mainline Jetty which are stripped out by the existing Solr
packaging of the embedded Jetty, and IMO are worth adding back in for a
production deployment.



From: Palmer, Eric epal...@richmond.edu
Sent: Wednesday, November 13, 2013 10:09
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: My setup - init script and other info

Thank you. This will help me a lot.

Sent from my iPhone

On Nov 13, 2013, at 10:08 AM, Shawn Heisey s...@elyograg.org wrote:

 In the hopes that it will help someone get Solr running in a very clean way, 
 here's an informational email.

 For my Solr install on CentOS 6, I use /opt/solr4 as my installation path, 
 and /index/solr4 as my solr home.  The /index directory is a dedicated 
 filesystem, /opt is part of the root filesystem.

 From the example directory, I copied cloud-scripts, contexts, etc, lib, 
 webapps, and start.jar over to /opt/solr4.  My stuff was created before 
 4.3.0, so the resources directory didn't exist.  I was already using log4j 
 with a custom Solr build, and I put my log4j.properties file in etc instead.  
 I created a logs directory and a run directory in /opt/solr4.

 My data structure in /index/solr4 is complex.  All a new user really needs to 
 know is that solr.xml goes here and dictates the rest of the structure.  
 There is a symlink at /index/solr4/lib, pointing to /opt/solr4/solrlib - so 
 that jars placed in ${solr.solr.home}/lib are actually located in the program 
 directory, not the data directory.  That makes for a much cleaner version 
 control scenario - both directories are git repositories cloned from our 
 internal git server.

 Unlike the example configs, my solrconfig.xml files do not have lib 
 directives for loading jars.  That gets automatically handled by the jars 
 living in that symlinked lib directory.  See SOLR-4852 for caveats regarding 
 central lib directories.

 https://issues.apache.org/jira/browse/SOLR-4852

 If you want to run SolrCloud, you would need to install zookeeper separately 
 and put your zkHost parameter in solr.xml.  Due to a bug, putting zkHost in 
 solr.xml doesn't work properly until 4.4.0.

 Here's the current state of my init script.  It's redhat-specific.  I used 
 /bin/bash (instead of /bin/sh) in the shebang because I am pretty sure that 
 there are bash-isms in it, and bash is always available on the systems that I 
 use:

 http://apaste.info/9fVA

 Notable features:
 * Runs Solr as an unprivileged user.
 * Has three methods for stopping Solr, tries graceful methods first.
 1) The jetty STOPPORT/STOPKEY mechanism.
 2) PID saved by the 'start' action.
 3) Any program using the Solr listening port.
 * Before killing by PID, tries to make sure that the process actually is Solr.
 * Sets up remote JMX, by default without authentication or SSL.
 * Highly tuned CMS garbage collection.
 * Sets up GC logging.
 * Virtually everything is overridable via /etc/sysconfig/solr4.
 * Points at an overridable log4j config file, by default in /opt/solr4/etc.
 * Removes the existing PID file if the server is just booting up -- which it 
 knows by noting that server uptime is less than three minutes.

 It shouldn't be too hard to convert this so it works on debian-derived 
 systems.  That would involve rewriting portions that use redhat init 
 routines, and probably start-stop-daemon. What I'd really like is one script 
 that will work on any system, but that will require a fair amount of work.

 It's a work in progress.  It should load log4j.properties from resources 
 instead of etc. I'd like to include it in the Solr download, but without a 
 fair amount of documentation and possibly an installation script, which still 
 must be written, that won't be possible.

 Feel free to ask questions about anything that doesn't seem clear. I welcome 
 ideas for improvement on both my own setup and the solr example.

 Thanks,
 Shawn





Group and Field Collapsing in SOLR More like this

2013-11-14 Thread balaji
Hi

I have two types of profile, Shadow and DO, and I am trying to use MLT to
bring back related recommendations for a userID.

In the results I get both types, but I want to restrict the returned
documents by a field (type) that I pass in.

Currently grouping and field collapsing do not seem to work. Is there any
other way to achieve this?


Thanks
Balaji



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Group-and-Field-Collapsing-in-SOLR-More-like-this-tp4101032.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Date range faceting with various gap sizes?

2013-11-14 Thread Chris Hostetter


: I'm experimenting with date range faceting, and would like to use 
: different gaps depending on how old the date is. But I am not sure on 
: how to do that.

What you are trying to do is possible, but the SolrJ helper methods you
are using predate this ability and don't currently work the way they
should...

: solrQuery.addDateRangeFacet(scheduledate_start_tdate, date1, date2, 
+1YEAR);
: solrQuery.addDateRangeFacet(scheduledate_start_tdate, date3, date4, 
+1MONTH);

the addDateRangeFacet method you are calling is just syntactic sugar for 
the add(String,String) method called on the various params: facet.range, 
facet.range.start, etc

You can see that in the resulting URL you got the params are duplicated -- 
the problem is that when expressed this way, Solr doesn't know when the 
different values of the start/end/gap params should be applied -- it just 
loops over each of the facet.range fields (in your case: the same field 
twice) and then looks for a corresponding start/end/gap value and finds
the first one since there are duplicates.

what you want to do can be accomplished (as of Solr 4.3 - see SOLR-1351) 
by using local params in the facet.range (or facet.date) params...

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.range={!facet.range.start=NOW/MONTH%20facet.range.end=NOW/MONTH%2B1MONTH%20facet.range.gap=%2B1DAY}manufacturedate_dt&facet.range={!facet.range.start=NOW/MONTH%20facet.range.end=NOW/MONTH%2B1MONTH%20facet.range.gap=%2B5DAY}manufacturedate_dt

I've opened a new issue to track fixing these sugar methods -- patches
to improve this would certainly be welcome, but note that regardless of
the SolrJ behavior you'll need to upgrade to at least Solr 4.3 for the
server-side piece to work, and you can work around the client-side
behavior by calling add(String,String) directly.
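
For example, something like this (untested sketch, reusing your field name):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery solrQuery = new SolrQuery("*:*");
solrQuery.setRows(0);
solrQuery.setFacet(true);
// one facet.range per gap size; each carries its own start/end/gap as local params
solrQuery.add("facet.range",
    "{!facet.range.start=NOW/YEAR-5YEARS facet.range.end=NOW facet.range.gap=+1YEAR}scheduledate_start_tdate");
solrQuery.add("facet.range",
    "{!facet.range.start=NOW/MONTH facet.range.end=NOW/MONTH+1MONTH facet.range.gap=+1DAY}scheduledate_start_tdate");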

https://issues.apache.org/jira/browse/SOLR-5443

-Hoss


Document Security Model Question

2013-11-14 Thread kchellappa
I had earlier posted a similar discussion in LinkedIn and David Smiley
rightly advised me that solr-user is a better place for technical
discussions

--

Our hosted product supports searching on educational resources. Our
customers can choose to make specific resources unavailable for their users,
and availability also depends on licensing. Our current solution uses the
full-text search support in the database and handles availability as part of
the SQL.

My task is to move the search from the database full-text search into Solr.
I searched through posts and found some that were kind of related, and I am
thinking along the following lines:

  a) Use the authorization model.  I can add fields like allow and/or deny
in the index which contain the list of customers.  At query time, I can add
a constraint based on the customer id (a rough sketch of such a filter
follows this list).  I am concerned about the performance if there are a lot
of values for these fields, and it also requires constant reindexing if a
value in these fields changes.
 b) Use a query-time join.
 Have the resource-to-availability mapping for each customer in separate
inner documents.
 We are planning to deploy on SolrCloud.  I have read about some challenges
with query-time joins and SolrCloud, so this may not work for us.

c) Other ideas?
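
(For option (a), the query-time constraint might be as simple as a filter
query -- field names hypothetical -- such as:

  fq=+allow:cust42 -deny:cust42

where cust42 is the current customer's id.)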
 
Excerpts from David Smiley's response

You're right that there may be some re-indexing as security rules change. If
many Lucene/Solr documents share identical access control with other
documents, then it may make more sense to externally determine which unique
set of access-control sets the user has access to, then finally search by id
-- which will hopefully not be a huge number. I've seen this done both
externally and with a Solr core to join on.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Document-Security-Model-Question-tp4101078.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR DIH not indexing NFS share

2013-11-14 Thread Erick Erickson
At a quick glance at the very first error:

java.lang.ClassCastException: java.lang.NoClassDefFoundError cannot be cast
to java.lang.Exception

Looks like you have some weird jars in your classpath and/or are using a
strange version of Java. But that's just a guess.

Erick


On Thu, Nov 14, 2013 at 1:57 PM, tegryan t...@jostle.me wrote:

 I have SOLR with DIH using TIKA running fine on a local directory. It
 imports
 the data fine. I need it to work on an NFS mounted directory however, and
 it
 fails when I change it to use that. The tomcat6 user has access to the NFS
 mount (ls returns all files any way). The mount is NFS v3, if that matters.
 I've changed the tomcat's uid to match the tomcat user on the NFS server.
 Can anyone point me in the right direction for why this isn't fetching any
 files?

 I get this while indexing: Requests: 0, Fetched: 1, Skipped: 0, Processed:
 0

 Here are the SOLR logs:

 824741 [commitScheduler-6-thread-1] INFO
 org.apache.solr.update.UpdateHandler  – No uncommitted changes. Skipping
 IW.commit.
 824745 [commitScheduler-6-thread-1] INFO
 org.apache.solr.update.UpdateHandler  – end_commit_flush
 853486 [http-8080-1] INFO
 org.apache.solr.update.processor.LogUpdateProcessor  – [collection1]
 webapp=/solr path=/dataimport

 params={optimize=falseindent=trueclean=truecommit=trueverbose=trueentity=fcommand=full-importdebug=truewt=json}
 {deleteByQuery=*:* (-1451628963612852224)} 0 44428
 853488 [http-8080-1] ERROR org.apache.solr.handler.dataimport.DataImporter
 – Full Import failed:java.lang.RuntimeException:
 java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException:
 java.lang.ClassCastException: java.lang.NoClassDefFoundError cannot be cast
 to java.lang.Exception
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
 at

 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
 at

 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
 at

 org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
 at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
 at

 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at

 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at

 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at

 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at

 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at

 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at

 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
 at

 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
 at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:679)
 Caused by: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException:
 java.lang.ClassCastException: java.lang.NoClassDefFoundError cannot be cast
 to java.lang.Exception
 at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:410)
 at

 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
 ... 20 more
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
 java.lang.ClassCastException: java.lang.NoClassDefFoundError cannot be cast
 to java.lang.Exception
 at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:539)
 at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
 ... 22 more
 Caused by: java.lang.ClassCastException: java.lang.NoClassDefFoundError
 cannot be cast to java.lang.Exception
 at
 org.apache.solr.handler.dataimport.DebugLogger.log(DebugLogger.java:140)
 at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:537)
 at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:495)
 ... 23 more

 853489 [http-8080-1] INFO  org.apache.solr.update.UpdateHandler  – start
 rollback{}
 853498 [http-8080-1] 

Re: SOLR DIH not indexing NFS share

2013-11-14 Thread tegryan
Hi Erick,

I appreciate the answer.  I just found out that it's failing on a .mov file
with that error.  I also noticed that I load the log4j jars twice, so I'm
wondering if the wrong class loader is loading the logging and that's why
it's giving me an unhelpful message.  I've excluded .mov files for now since
they can't be indexed anyway, and will look at why the logging is not
working.  Thanks again.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-DIH-not-indexing-NFS-share-tp4100998p4101096.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: queries including time zone

2013-11-14 Thread Eric Katherman
We're still not seeing the proper result. I've included a gist of the query
and its debug result.  This was run on a clean index running 4.4.0 with just
one document.  That document has a date of 11/15/2013; in the included TZ the
date is still the 14th, yet I still get that document returned.  Hoping
someone can help.

https://gist.github.com/anonymous/7478773


On Nov 14, 2013, at 3:06 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 
 I've beefed up the ref guide page on dates to include more info about all 
 of this...
 
 https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
 
 
 -Hoss



Re: Document Security Model Question

2013-11-14 Thread Rajani Maski
Hi,

For the case "it requires constant reindexing if a value in this field
changes":
If the ACLs for documents keep changing, a Solr PostFilter is one of the
options. We use it in our system; we have close to a billion documents
and approximately 5,000 users.


But it is important to check whether the ACL changes are frequent and
to decide on a solution based on that. The first option in your list works
efficiently without affecting search performance. If the value changes
are infrequent, then re-indexing only those documents should not be a
concern. If changes are frequent, a PostFilter can be used, though it will
add some amount of delay.


Thanks












On Fri, Nov 15, 2013 at 4:32 AM, kchellappa kannan.chella...@gmail.comwrote:

 I had earlier posted a similar discussion in LinkedIn and David Smiley
 rightly advised me that solr-user is a better place for technical
 discussions

 --

 Our product which is hosted supports searching on educational resources.
 Our
 customers can choose to make specific resources unavailable for their users
 and also it depends on licensing. Our current solution uses full text
 search
 support in the database and handles availability as part of sql .

 My task is to move the search from the database full text search into Solr.
 I searched through posts and found some that were kind of related and I am
 thinking along the following lines

   a)  Use the authorization model.   I can add fields like allow and/or
 deny
 in the index which contain the list of customers.  At query time, I can add
 the constraint based on the customer Id.  I am concerned about the
 performance if there are lot of values for these fields and also it
 requires
 constant reindexing if a value in this field changes
  b) Use Query-time Join.
  Have the resource to availability for customer in separate inner
 documents.
  We are planning to deploy in SolrCloud.  I have read some challenges
 about Query-time join and SolrCloud. So this may not work for us.

 c) Other ideas?

 Excerpts from David Smiley's response

 You're right that there may be some re-indexing as security rules change.
 If
 many Lucene/Solr documents share identical access control with other
 documents, then it may make more sense to externally determine which unique
 set of access-control sets the user has access to, then finally search by
 id
 -- which will hopefully not be a huge number. I've seen this done both
 externally and with a Solr core to join on.






 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Document-Security-Model-Question-tp4101078.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Solr spatial search within the polygon

2013-11-14 Thread Dhanesh Radhakrishnan
Hi,
I'm experimenting with Solr spatial search, plotting points on the map
(latitude and longitude), and based on those values I need to get results.

As the first step I've defined the field type as:

<fieldType name="location_rpt"
    class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    distErrPct="0.025" maxDistErr="0.09" units="degrees" />

And then added the field *location* as type *location_rpt*:

<field name="location" type="location_rpt" indexed="true" stored="true"
    multiValued="true" />

Indexed the location field as $latitude,$longitude, so that the data in
*location* will look like:

location:[9.445890,76.540970]

Next I draw a polygon in google map and collected the lat and lng
coordinates of the polygon.
It will be like
9.472992 76.540817, 9.441328 76.523651 , 9.433708 76.555065 , 9.458092
76.572403, 9.472992 76.540817

Based on this coordinates I performed a query in solr like this
localhost:8983/solr/ha_poc/select?fl=id,name,district,locality&wt=json
&json.nl=map&q=*:*&fq=location:IsWithin(POLYGON((9.472992 76.540817,
9.441328 76.523651 , 9.433708 76.555065 , 9.458092 76.572403, 9.472992
76.540817))) distErrPct=0


But I didn't get the result from Solr that I expected.

{
  "responseHeader": {
    "status": 0,
    "QTime": 2
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": [ ]
  }
}


Is there anything that I missed?
Can anybody help me solve this issue with Solr spatial search?
I'm using Solr 4.4.0.
I added the JTS jar as an additional dependency for polygon support:
/lib/ext/jts-1.13.jar

-- 
*dhanesh s.r*


Re: exceeded limit of maxWarmingSearchers ERROR

2013-11-14 Thread Loka
Hi Erickson,

Thanks for your reply. Basically, I used the commitWithin tag as below in the
solrconfig.xml file:


 <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
   <lst name="defaults">
     <str name="update.processor">dedupe</str>
   </lst>
   <add commitWithin="1"/>
 </requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>


But this fix did not solve my problem; I got the same error again.
Please find attached schema.xml, solrconfig.xml, solr-spring.xml, and
messaging-spring.xml. Can you suggest where I am going wrong?

Regards,
Lokanadham Ganta










- Original Message -
From: Erick Erickson [via Lucene] ml-node+s472066n4100924...@n3.nabble.com
To: Loka lokanadham.ga...@zensar.in
Sent: Thursday, November 14, 2013 8:38:17 PM
Subject: Re: exceeded limit of maxWarmingSearchers ERROR

CommitWithin is either configured in solrconfig.xml for the 
autoCommit or autoSoftCommit tags as the maxTime tag. I 
recommend you do use this. 

The other way you can do it is if you're using SolrJ, one of the 
forms of the server.add() method takes a number of milliseconds 
to force a commit. 

You really, really do NOT want to use ridiculously short times for this 
like a few milliseconds. That will cause new searchers to be 
warmed, and when too many of them are warming at once you 
get this error. 

Seriously, make your commitWithin or autocommit parameters 
as long as you can, for many reasons. 

Here's a bunch of background: 
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
 

Best, 
Erick 


On Thu, Nov 14, 2013 at 5:13 AM, Loka  [hidden email]  wrote: 


 Hi Naveen, 
 Iam also getting the similar problem where I do not know how to use the 
 commitWithin Tag, can you help me how to use commitWithin Tag. can you give 
 me the example 
 
 
 
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4100864.html
  
 Sent from the Solr - User mailing list archive at Nabble.com. 
 






solr-spring.xml (2K) 
http://lucene.472066.n3.nabble.com/attachment/4101152/0/solr-spring.xml
messaging-spring.xml (2K) 
http://lucene.472066.n3.nabble.com/attachment/4101152/1/messaging-spring.xml
schema.xml (6K) 
http://lucene.472066.n3.nabble.com/attachment/4101152/2/schema.xml
solrconfig.xml (61K) 
http://lucene.472066.n3.nabble.com/attachment/4101152/3/solrconfig.xml




--
View this message in context: 
http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4101152.html
Sent from the Solr - User mailing list archive at Nabble.com.

An UpdateHandler to run following a MySql DataImport

2013-11-14 Thread Dileepa Jayakody
Hi All,

I have written a custom update request processor to do some custom processing
of documents and configured the /update handler to use my custom processor in
the default update.chain (roughly as sketched below).
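
For reference, that configuration looks roughly like this (the chain name is
made up):

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">myCustomChain</str>
  </lst>
</requestHandler>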

The same processing should be applied for the data import handler
when it loads documents into the Solr index.
Is there a way to configure the dataimport handler to use my custom
update processor in an update.chain?

If not, how can I perform the required custom processing of the documents
while importing data from a MySQL database?

Thanks,
Dileepa