Re: Adding DocExpirationUpdateProcessorFactory causes "Overlapping onDeckSearchers" warnings

2016-12-09 Thread Brent
Chris Hostetter-3 wrote
> If you are seeing an increase in "Overlapping onDeckSearchers" when using 
> DocExpirationUpdateProcessorFactory, it's because you actually have docs 
> expiring quite frequently relative to the autoDeletePeriodSeconds and 
> the amount of time needed to warm each of the new searchers.
> 
> If you don't want the searchers to be re-opened so frequently, just 
> increase the autoDeletePeriodSeconds. 

But if I increase the period, then there will be even more docs that have
expired, and shouldn't that make the amount of time needed to warm the new
searcher even longer?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-DocExpirationUpdateProcessorFactory-causes-Overlapping-onDeckSearchers-warnings-tp4309155p4309175.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding DocExpirationUpdateProcessorFactory causes "Overlapping onDeckSearchers" warnings

2016-12-09 Thread Brent
Okay, I created a JIRA ticket
(https://issues.apache.org/jira/servicedesk/agent/SOLR/issue/SOLR-9841) and
will work on a patch.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-DocExpirationUpdateProcessorFactory-causes-Overlapping-onDeckSearchers-warnings-tp4309155p4309173.html
Sent from the Solr - User mailing list archive at Nabble.com.


Adding DocExpirationUpdateProcessorFactory causes "Overlapping onDeckSearchers" warnings

2016-12-09 Thread Brent
I'm using Solr Cloud 6.1.0, and my client application is using SolrJ 6.1.0.

Using this Solr config, I get none of the dreaded "PERFORMANCE WARNING:
Overlapping onDeckSearchers=2" log messages:
https://dl.dropboxusercontent.com/u/49733981/solrconfig-no_warnings.xml

However, I start getting them frequently after I add an expiration update
processor to the update request processor chain, as seen in this config (at
the bottom):
https://dl.dropboxusercontent.com/u/49733981/solrconfig-warnings.xml

Do I have something configured wrong in the way I've tried to add document
expiration? My client application sets the "expire_at"
field with the date to remove the document being added, so I don't need
anything on the Solr Cloud side to calculate the expiration date using a
TTL. I've confirmed that the documents are getting removed as expected after
the TTL duration.

Is it possible that the expiration processor is triggering additional
commits? It seems like the warning is usually the result of commits happening
too frequently. If the commit spacing is fine without the expiration
processor, but not okay when I add it, it seems like maybe each update is
now triggering a (soft?) commit. Although that'd actually be crazy, and I'm
sure I'd see a lot more errors if that were the case... Is it triggering a
commit every 30 seconds, because that's what I have
autoDeletePeriodSeconds set to? Maybe I should try to offset that a bit from
the 10 second auto soft commit I'm using? It seems like it'd be better (if
that is the case) if the processor simply didn't have to do a commit when it
expires documents, and instead let the auto commit settings handle that.
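
For reference (paraphrasing the linked config from memory, so the chain name
and the other processors are omitted), the two timing knobs I'm talking about
are roughly these, with the 10 seconds expressed in ms:

  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>

  <updateRequestProcessorChain ...>
    <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
      <int name="autoDeletePeriodSeconds">30</int>
      <str name="expirationFieldName">expire_at</str>
    </processor>
    ...
  </updateRequestProcessorChain>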

Do I still need the line:

when I have the 

element?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-DocExpirationUpdateProcessorFactory-causes-Overlapping-onDeckSearchers-warnings-tp4309155.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: "on deck" searcher vs warming searcher

2016-12-09 Thread Brent
Hmmm, conflicting answers. Given the infamous "PERFORMANCE WARNING:
Overlapping onDeckSearchers" log message, it seems like the "they're the
same" answer is probably correct, because shouldn't there only be one active
searcher at a time?

Although it makes me curious, if there's a warning about having multiple
(overlapping) warming searchers, why is there a setting
(maxWarmingSearchers) that even lets you have more than one, or at least,
why ever set it to anything other than 1?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/on-deck-searcher-vs-warming-searcher-tp4309021p4309080.html
Sent from the Solr - User mailing list archive at Nabble.com.


"on deck" searcher vs warming searcher

2016-12-08 Thread Brent
Is there a difference between an "on deck" searcher and a warming searcher?
From what I've read, they sound like the same thing.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/on-deck-searcher-vs-warming-searcher-tp4309021.html
Sent from the Solr - User mailing list archive at Nabble.com.


Does _version_ field in schema need to be indexed and/or stored?

2016-10-25 Thread Brent
I know that in the sample config sets, the _version_ field is indexed and not
stored, like so:

<field name="_version_" type="long" indexed="true" stored="false"/>

Is there any reason it needs to be indexed? I'm able to create collections
and use them with it not indexed, but I wonder if it negatively impacts
performance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-version-field-in-schema-need-to-be-indexed-and-or-stored-tp4303036.html
Sent from the Solr - User mailing list archive at Nabble.com.


TTL: expungeDeletes=false when removing expired documents

2016-10-21 Thread Brent
I've got a DocExpirationUpdateProcessorFactory configured to periodically
remove expired documents from the Solr index, which is working in that the
documents no longer show up in queries once they've reached expiration date.
But the index size isn't reduced when they expire, and I'm wondering if it's
because of the expungeDeletes setting in this log line I'm seeing:

o.a.s.u.p.DocExpirationUpdateProcessorFactory Begining periodic deletion of
expired docs
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

How can I set expungeDeletes=true, and what are the drawbacks to doing so?
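
The only way I can think of is to issue an explicit commit with the flag set,
something like this (host and collection names made up):

curl 'http://localhost:8983/solr/my_collection/update?commit=true&expungeDeletes=true'

I assume the main drawback is the extra segment merging that forces, since it
has to rewrite the segments that contain deletes.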



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TTL-expungeDeletes-false-when-removing-expired-documents-tp4302596.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: For TTL, does expirationFieldName need to be indexed?

2016-10-20 Thread Brent
Thanks for the reply.

Follow up:
Do I need to have the field stored? While I don't need to ever look at the
field's original contents, I'm guessing that the
DocExpirationUpdateProcessorFactory does, so that would mean I need to have
stored=true as well, correct? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/For-TTL-does-expirationFieldName-need-to-be-indexed-tp4301522p4302386.html
Sent from the Solr - User mailing list archive at Nabble.com.


For TTL, does expirationFieldName need to be indexed?

2016-10-17 Thread Brent
In my solrconfig.xml, I have:

  <updateRequestProcessorChain>
    <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
      <int name="autoDeletePeriodSeconds">30</int>
      <str name="expirationFieldName">expire_at</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

and in my schema, I have:

  <field name="expire_at" type="date" indexed="true" />

If I change it to indexed="false", will it still work? If so, is there any
benefit to having the field indexed if I'm not using it in any way except to
allow the expiration processor to remove expired documents?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/For-TTL-does-expirationFieldName-need-to-be-indexed-tp4301522.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: leader/replica update error + client "expected mime type/stale state" error

2016-09-21 Thread Brent
Okay, I'll try to figure out how to make using the latest version feasible... my
situation is that I need to be able to talk to both DSE and Solr from the
same application, and 5.4.1 is the latest version that works with DSE. 

But this issue appears to be stemming from the Solr side. The client logs
its messages after the messages are logged in the Solr server logs.

Just pretending that I'm using SolrJ 6.1.0, would there be any explanation
for my question?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/leader-replica-update-error-client-expected-mime-type-stale-state-error-tp4297053p4297205.html
Sent from the Solr - User mailing list archive at Nabble.com.


leader/replica update error + client "expected mime type/stale state" error

2016-09-20 Thread Brent P
I'm running Solr Cloud 6.1.0, with a Java client using SolrJ 5.4.1.

Every once in awhile, during a query, I get a pair of messages logged in
the client from CloudSolrClient -- an error about a request failing, then a
warning saying that it's retrying after a stale state error.

For this test, the collection (test_collection) has one shard, with RF=2.
There are two machines, 10.112.7.2 (replica) and 10.112.7.4 (leader). The
client is on 10.112.7.4. Note that the system time on 10.112.7.4 is about 1
minute, 5-6 seconds ahead of the other machine.

---
Leader (10.112.7.4) Solr log:
---
19:27:16.583 ERROR [c:test_collection s:shard1 r:core_node2
x:test_collection_shard1_replica2] o.a.s.u.StreamingSolrClients error
org.apache.http.NoHttpResponseException: 10.112.7.2:8983 failed to respond
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

19:27:16.587 WARN  [c:test_collection s:shard1 r:core_node2
x:test_collection_shard1_replica2] o.a.s.u.p.DistributedUpdateProcessor
Error sending update to http://10.112.7.2:8983/solr
org.apache.http.NoHttpResponseException: 10.112.7.2:8983 failed to respond
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
at

Create collection PeerSync "no frame of reference" warnings

2016-09-20 Thread Brent
Whenever I create a collection that's replicated, I get these warnings in the
follower Solr log:

WARN  [c:test_collection s:shard1 r:core_node1
x:test_collection_shard1_replica1] o.a.s.u.PeerSync no frame of reference to
tell if we've missed updates

Are these harmless?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Create-collection-PeerSync-no-frame-of-reference-warnings-tp4297032.html
Sent from the Solr - User mailing list archive at Nabble.com.


IOException errors in SolrJ client

2016-09-20 Thread Brent
I'm getting periodic errors when adding documents from a Java client app.
It's always a sequence of an error message logged in CloudSolrClient, then
an exception thrown from 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143):

Error message:
[ERROR][impl.CloudSolrClient][pool-6-thread-27][CloudSolrClient.java@904] -
Request to collection test_collection failed due to (0)
org.apache.http.NoHttpResponseException: 10.112.7.3:8983 failed to respond,
retry? 0

Exception with stack:
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://10.112.7.3:8983/solr/test_collection
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:589)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:372)
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:325)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1100)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:871)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:807)
at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150)
...
Caused by: org.apache.http.NoHttpResponseException: 10.112.7.3:8983 failed
to respond
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at
org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
at
org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
at
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480)
... 22 more

I'm guessing it's due to a timeout, though maybe not my client's timeout
setting but instead a timeout between two Solr Cloud servers, perhaps when
the document is being sent from one to the other. This app is running on
machine 10.112.7.4, and test_collection has a single shard, replicated on
both 10.112.7.4 and 10.112.7.3, with .3 being the leader. Is this saying
that .4 got the add request from my app, and tried to tell .3 to add the
doc, but .3 didn't respond? If so, is it a timeout, and if so, can I
increase the timeout value?
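
If it is the server-to-server hop, I'm guessing the relevant knobs would be
the distributed update timeouts in solr.xml rather than anything on my
client, something like this (values made up, and I haven't tried it yet):

<solr>
  <solrcloud>
    <!-- timeouts (in ms) used when one node forwards updates to another -->
    <int name="distribUpdateConnTimeout">60000</int>
    <int name="distribUpdateSoTimeout">600000</int>
    ...
  </solrcloud>
</solr>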



--
View this message in context: 
http://lucene.472066.n3.nabble.com/IOException-errors-in-SolrJ-client-tp4297015.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud: Higher search latency with two nodes vs one node

2016-09-13 Thread Brent
Thanks for the reply.

The overhead you describe is what I suspected; I was just surprised that if
DSE is able to keep that overhead small enough that the overall result is
faster with the extra hardware, Solr doesn't also benefit.

I did try with RF=2 and shards=1, and yep, it's way fast. Really nice
performance. I'm hoping I can figure out a configuration that will allow me
to get this type of performance boost over DSE with a much higher load.

Follow up question:
I'm getting periodic errors when adding documents in the Java client app:
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://:8983/solr/

I believe it's always preceded by this log message:
Request to collection  failed due to (0)
org.apache.http.NoHttpResponseException: :8983 failed to
respond, retry? 0

I'm guessing it's due to a timeout, though maybe not my client's timeout
setting but instead a timeout between the two Solr Cloud servers, perhaps
when a query is being sent from one to the other.


Probably unrelated to those errors because these are at different times, but
in the Solr logs, I'm getting occasional warning+error messages like this:
always on server 2:
WARN  [c: s:shard1 r:core_node2 x:_shard1_replica1] o.a.s.c.RecoveryStrategy Stopping recovery for
core=[_shard1_replica1] coreNodeName=[core_node2]
followed by, on server 1:
ERROR [c: s:shard1 r:core_node1 x:_shard1_replica2] o.a.s.u.StreamingSolrClients error
org.apache.http.NoHttpResponseException: :8983 failed to
respond

WARN  [c: s:shard1 r:core_node1 x:_shard1_replica2] o.a.s.c.LeaderInitiatedRecoveryThread Leader is
publishing core=_shard1_replica1 coreNodeName =core_node2
state=down on behalf of un-reachable replica http://:8983/solr/_shard1_replica1/

Any idea what the cause is for these and how to avoid them?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Higher-search-latency-with-two-nodes-vs-one-node-tp4295894p4296124.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Cloud: Higher search latency with two nodes vs one node

2016-09-12 Thread Brent
1
  0

 false
  
  
  

  ${solr.ulog.dir:}
  ${solr.ulog.numVersionBuckets:65536}


  ${solr.autoSoftCommit.maxTime:3}

  
  
1024




true
   20
   30

  


  

false
2
  
  


  
  
 
   edismax
   explicit
   3
   750
 
  
  
  

  text

  



The Solr queries have the following params:
rows=30=search=true===:()=

There are no explicit commits happening.

Any help is greatly appreciated, and let me know if there's any additional
info that would be helpful.

-Brent



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Higher-search-latency-with-two-nodes-vs-one-node-tp4295894.html
Sent from the Solr - User mailing list archive at Nabble.com.


Detecting down node with SolrJ

2016-09-09 Thread Brent
Is there a way to tell whether or not a node at a specific address is up
using a SolrJ API?
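
The closest thing I've found so far is checking ZooKeeper's live_nodes
through the cluster state, roughly like this (just a sketch; the ZooKeeper
addresses are made up, and I'm assuming node names look like "host:port_solr"):

CloudSolrClient client = new CloudSolrClient.Builder()
    .withZkHost("zk1:2181,zk2:2181,zk3:2181")   // made-up ZooKeeper ensemble
    .build();
client.connect();                                // make sure cluster state is loaded
Set<String> liveNodes = client.getZkStateReader().getClusterState().getLiveNodes();
boolean isUp = liveNodes.contains("10.112.7.2:8983_solr");  // node name = host:port + "_" + context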



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Detecting-down-node-with-SolrJ-tp4295402.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: High load, frequent updates, low latency requirement use case

2016-09-08 Thread Brent
Emir Arnautovic wrote
> There should be no problems with ingestion on 24 machines. Assuming 1 
> replication, that is roughly 40 doc/sec/server. Make sure you bulk docs 
> when ingesting.

What is bulking docs, and how do I do it? I'm guessing this is some sort of
batch loading of documents?
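Something like this, maybe (just a sketch; the field and collection names are
made up, and "client" is the CloudSolrClient used for indexing), where
documents get buffered and sent in chunks instead of one add per document?

List<SolrInputDocument> batch = new ArrayList<>();
for (Record r : records) {                         // "Record" is a placeholder for my own type
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", r.getId());
    doc.addField("body", r.getBody());
    batch.add(doc);
    if (batch.size() >= 1000) {                    // send in chunks of ~1000 docs
        client.add("my_collection", batch);
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    client.add("my_collection", batch);            // flush the remainder
}
// then rely on autoCommit/autoSoftCommit rather than committing per batch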

Thanks for the reply.
-Brent



--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-load-frequent-updates-low-latency-requirement-use-case-tp4293383p4295225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Mailing list subscriptions

2016-08-30 Thread Brent
ZOMG thanks! This makes the mailing list almost tolerable, in an
only-15-years-behind kind of way instead of a 25-years-behind kind of way.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mailing-list-subscriptions-tp4294026p4294098.html
Sent from the Solr - User mailing list archive at Nabble.com.


Adding request parameters when using CloudSolrClient.add()

2016-08-30 Thread Brent P
Using SolrJ, I'm trying to figure out how to include request parameters
when adding a document with CloudSolrClient.add().

Here is what I was doing when using HttpSolrClient instead of
CloudSolrClient:

HttpSolrClient client = new HttpSolrClient.Builder("http://hostname.com:8983/solr/corename")
    .withHttpClient(HttpClientBuilder.create().build())
    .build();
client.getInvariantParams().set("param1_name", "param1_value");
client.getInvariantParams().set("param2_name", "param2_value");

SolrInputDocument sDoc = new SolrInputDocument();
// Populate sDoc fields.
UpdateResponse response = client.add(sDoc);


CloudSolrClient doesn't appear to have any methods for adding parameters,
so I'm not sure how I can do it. When doing reads, I add parameters to the
SolrQuery object, but when adding a SolrInputDocument object, I don't have
that option.

The only thing I can think of is that maybe adding them as fields in the
SolrInputDocument object would do the trick, but would that actually work
the same?
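
Or maybe the way to do it is to build the UpdateRequest myself instead of
calling add() directly, since the request object takes arbitrary parameters?
Something like this (untested sketch; parameter names are made up and
"cloudClient" is the CloudSolrClient instance):

UpdateRequest req = new UpdateRequest();
req.add(sDoc);                                   // the SolrInputDocument from above
req.setParam("param1_name", "param1_value");     // extra request parameters
req.setParam("param2_name", "param2_value");
UpdateResponse response = req.process(cloudClient, "my_collection");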


Mailing list subscriptions

2016-08-30 Thread Brent P
Is there a way to subscribe to just responses to a question I ask on the
mailing list, without getting emails for all activity on the mailing list?
You'd think it'd be designed in a way that when someone submits a question,
they automatically get emails for any responses.


SolrJ Solr Cloud timeout

2016-08-30 Thread Brent P
Is there any way to set a timeout with a CloudSolrClient?
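
The closest I've found is passing in my own HttpClient with the timeouts
already set, something like this (sketch, not verified against 6.1; the
ZooKeeper addresses and timeout values are made up):

ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 5000);   // connect timeout in ms
params.set(HttpClientUtil.PROP_SO_TIMEOUT, 30000);          // socket read timeout in ms
HttpClient httpClient = HttpClientUtil.createClient(params);

CloudSolrClient client = new CloudSolrClient.Builder()
    .withZkHost("zk1:2181,zk2:2181,zk3:2181")               // made-up ZooKeeper ensemble
    .withHttpClient(httpClient)
    .build();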


High load, frequent updates, low latency requirement use case

2016-08-25 Thread Brent P
I'm trying to set up a Solr Cloud cluster to support a system with the
following characteristics:

It will be writing documents at a rate of approximately 500 docs/second,
and running search queries at about the same rate.
The documents are fairly small, with about 10 fields, most of which range
in size from a simple int to a string that holds a UUID. There's a date
field, and then three text fields that typically hold in the range of 350
to 500 chars.
Documents should be available for searching within 30 seconds of being
added.
We need an average search latency of 50 ms or faster.

We've been using DataStax Enterprise with decent results, but trying to
determine if we can get more out of the latest version of Solr Cloud, as we
originally chose DSE ~4 years ago *I believe* because its Cassandra-backed
Solr provided redundancy/high availability features that weren't currently
available with straight Solr (not even sure if Solr Cloud was available
then).

We have 24 fairly beefy servers (96 CPU cores, 256 GB RAM, SSDs) for the
task, and I'm trying to figure out the best way to distribute the documents
into collections, cores, and shards.

If I can categorize a document into one of 8 "types", should I create 8
collections? Is that going to provide better performance than putting them
all into one collection and then using a filter query with the type field
when doing a search?

What are the options/things to consider when deciding on the number of
shards for each collection? As far as I know, I don't choose the number of
Solr cores, that is just determined based on the replication factor (and
shard count?).
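
As far as I can tell, both numbers are just picked at collection-creation
time and the total core count falls out of numShards x replicationFactor,
e.g. (names made up):

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=docs&numShards=4&replicationFactor=2&collection.configName=my_config'

which would spread 8 cores across the nodes.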

Some of the settings I'm using in my solrconfig that seem important:

<lockType>${solr.lock.type:native}</lockType>
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:3}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
</autoSoftCommit>
true
8

I've got the updateLog/transaction log enabled, as I think I read it's
required for Solr Cloud.

Are there any settings I should look at that affect performance
significantly, especially outside of the solrconfig.xml for each collection
(like jetty configs, logging properties, etc)?

How much impact do the  directives in the solrconfig have on
performance? Do they only take effect if I have something configured that
requires them, and therefore if I'm missing one that I need, I'd get an
error if it's not defined?

Any help will be greatly appreciated. Thanks!
-Brent


Re: SOLRJ replace document

2013-10-19 Thread Brent Ryan
So I found out the issue here...  It was related to what you guys said
regarding the Map object in my document.  The problem is that I had data
being serialized from DB - .NET - JSON, and some of the fields in .NET were
== System.DBNull.Value instead of null.  This caused the JSON serializer to
write out an object (i.e. a Map), so when these fields got deserialized into
the SolrInputDocument it had the Map objects as you indicated.

Thanks for the help! Much appreciated!
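
In SolrJ terms the difference ended up being roughly this (field names made
up): a plain value is just a normal add, while a Map value is what gets
interpreted as an atomic-update instruction:

// normal add/replace: plain field values
SolrInputDocument doc = new SolrInputDocument();
doc.addField("solr_id", "123");
doc.addField("notes", "some text");

// interpreted as an atomic update (and so requires updateLog): a Map value
SolrInputDocument partial = new SolrInputDocument();
partial.addField("solr_id", "123");
partial.addField("notes", Collections.singletonMap("set", "some text"));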


On Sat, Oct 19, 2013 at 12:58 AM, Jack Krupansky j...@basetechnology.com wrote:

 By all means please do file a support request with DataStax, either as an
 official support ticket or as a question on StackOverflow.

 But, I do think the previous answer of avoiding the use of a Map object in
 your document is likely to be the solution.


 -- Jack Krupansky

 -Original Message- From: Brent Ryan
 Sent: Friday, October 18, 2013 10:21 PM
 To: solr-user@lucene.apache.org
 Subject: Re: SOLRJ replace document


 So I think the issue might be related to the tech stack we're using which
 is SOLR within DataStax enterprise which doesn't support atomic updates.
 But I think it must have some sort of bug around this because it doesn't
 appear to work correctly for this use case when using solrj ...  Anyways,
 I've contacted support so lets see what they say.


 On Fri, Oct 18, 2013 at 5:51 PM, Shawn Heisey s...@elyograg.org wrote:

  On 10/18/2013 3:36 PM, Brent Ryan wrote:

  My schema is pretty simple and has a string field called solr_id as my
 unique key.  Once I get back to my computer I'll send some more details.


 If you are trying to use a Map object as the value of a field, that is
 probably why it is interpreting your add request as an atomic update.  If
 this is the case, and you're doing it because you have a multivalued
 field,
 you can use a List object rather than a Map.

 If this doesn't sound like what's going on, can you share your code, or a
 simplification of the SolrJ parts of it?

 Thanks,
 Shawn






SOLRJ replace document

2013-10-18 Thread Brent Ryan
How do I replace a document in solr using solrj library?  I keep getting
this error back:

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Atomic document updates are not supported unless updateLog/ is configured

I don't want to do partial updates, I just want to replace it...


Thanks,
Brent


Re: SOLRJ replace document

2013-10-18 Thread Brent Ryan
I wish that was the case but calling addDoc() is what's triggering that
exception.

On Friday, October 18, 2013, Jack Krupansky wrote:

 To replace a Solr document, simply add it again using the same
 technique used to insert the original document. The set option for atomic
 update is only used when you wish to selectively update only some of the
 fields for a document, and that does require that the update log be enabled
 using updateLog.

 -- Jack Krupansky

 -Original Message- From: Brent Ryan
 Sent: Friday, October 18, 2013 4:59 PM
 To: solr-user@lucene.apache.org
 Subject: SOLRJ replace document

 How do I replace a document in solr using solrj library?  I keep getting
 this error back:

 org.apache.solr.client.solrj.**impl.HttpSolrServer$**RemoteSolrException:
 Atomic document updates are not supported unless updateLog/ is configured

 I don't want to do partial updates, I just want to replace it...


 Thanks,
 Brent



Re: SOLRJ replace document

2013-10-18 Thread Brent Ryan
My schema is pretty simple and has a string field called solr_id as my
unique key.  Once I get back to my computer I'll send some more details.

Brent

On Friday, October 18, 2013, Shawn Heisey wrote:

 On 10/18/2013 2:59 PM, Brent Ryan wrote:

 How do I replace a document in solr using solrj library?  I keep getting
 this error back:

 org.apache.solr.client.solrj.**impl.HttpSolrServer$**RemoteSolrException:
 Atomic document updates are not supported unless updateLog/ is
 configured

 I don't want to do partial updates, I just want to replace it...


 Replacing a document is done by simply adding the document, in the same
 way as if you were adding a new one.  If you have properly configured Solr,
 the old one will be deleted before the new one is inserted.  Properly
 configuring Solr means that you have a uniqueKey field in your schema, and
 that it is a simple type like string, int, long, etc, and is not
 multivalued. A TextField type that is tokenized cannot be used as the
 uniqueKey field.

 Thanks,
 Shawn




Re: SOLRJ replace document

2013-10-18 Thread Brent Ryan
So I think the issue might be related to the tech stack we're using which
is SOLR within DataStax enterprise which doesn't support atomic updates.
 But I think it must have some sort of bug around this because it doesn't
appear to work correctly for this use case when using solrj ...  Anyways,
I've contacted support so lets see what they say.


On Fri, Oct 18, 2013 at 5:51 PM, Shawn Heisey s...@elyograg.org wrote:

 On 10/18/2013 3:36 PM, Brent Ryan wrote:

 My schema is pretty simple and has a string field called solr_id as my
 unique key.  Once I get back to my computer I'll send some more details.


 If you are trying to use a Map object as the value of a field, that is
 probably why it is interpreting your add request as an atomic update.  If
 this is the case, and you're doing it because you have a multivalued field,
 you can use a List object rather than a Map.

 If this doesn't sound like what's going on, can you share your code, or a
 simplification of the SolrJ parts of it?

 Thanks,
 Shawn




Re: SOLR grouped query sorting on numFound

2013-09-25 Thread Brent Ryan
ya, that's the problem... you can't sort by numFound and it's not
feasible to do the sort on the client because the grouped result set is too
large.

Brent


On Wed, Sep 25, 2013 at 6:09 AM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, just specifying sort= is _almost_ what you want,
 except it sorts by the value of fields in the doc not numFound.

 this shouldn't be hard to do on the client though, but you'd
 have to return all the groups...

 FWIW,
 Erick

 On Tue, Sep 24, 2013 at 1:11 PM, Brent Ryan brent.r...@gmail.com wrote:
  We ran into 1 snag during development with SOLR and I thought I'd run it
 by
  anyone to see if they had any slick ways to solve this issue.
 
  Basically, we're performing a SOLR query with grouping and want to be
 able
  to sort by the number of documents found within each group.
 
  Our query response from SOLR looks something like this:
 
  {
 
responseHeader:{
 
  status:0,
 
  QTime:17,
 
  params:{
 
indent:true,
 
q:*:*,
 
group.limit:0,
 
group.field:rfp_stub,
 
group:true,
 
wt:json,
 
rows:1000}},
 
grouped:{
 
  rfp_stub:{
 
matches:18470,
 
groups:[{
 
 
  groupValue:java.util.UUID:a1871c9e-cd7f-4e87-971d-d8a44effc33e,
 
doclist:{*numFound*:3,start:0,docs:[]
 
}},
 
  {
 
 
  groupValue:java.util.UUID:0c2f1045-a32d-4a4d-9143-e09db45a20ce,
 
doclist:{*numFound*:5,start:0,docs:[]
 
}},
 
  {
 
 
  groupValue:java.util.UUID:a3e1d56b-4172-4594-87c2-8895c5e5f131,
 
doclist:{*numFound*:6,start:0,docs:[]
 
}},
 
  …
 
 
  The *numFound* shows the number of documents within that group.  Is there
  anyway to perform a sort on *numFound* in SOLR ?  I don't believe this is
  supported, but wondered if anyone their has come across this and if there
  was any suggested workarounds given that the dataset is really too large
 to
  hold in memory on our app servers?



SOLR grouped query sorting on numFound

2013-09-24 Thread Brent Ryan
We ran into 1 snag during development with SOLR and I thought I'd run it by
anyone to see if they had any slick ways to solve this issue.

Basically, we're performing a SOLR query with grouping and want to be able
to sort by the number of documents found within each group.

Our query response from SOLR looks something like this:

{
  "responseHeader": {
    "status": 0,
    "QTime": 17,
    "params": {
      "indent": "true",
      "q": "*:*",
      "group.limit": "0",
      "group.field": "rfp_stub",
      "group": "true",
      "wt": "json",
      "rows": "1000"}},
  "grouped": {
    "rfp_stub": {
      "matches": 18470,
      "groups": [{
          "groupValue": "java.util.UUID:a1871c9e-cd7f-4e87-971d-d8a44effc33e",
          "doclist": {"numFound": 3, "start": 0, "docs": []}
        },
        {
          "groupValue": "java.util.UUID:0c2f1045-a32d-4a4d-9143-e09db45a20ce",
          "doclist": {"numFound": 5, "start": 0, "docs": []}
        },
        {
          "groupValue": "java.util.UUID:a3e1d56b-4172-4594-87c2-8895c5e5f131",
          "doclist": {"numFound": 6, "start": 0, "docs": []}
        },
        …


The numFound value shows the number of documents within that group.  Is there
any way to perform a sort on numFound in SOLR?  I don't believe this is
supported, but wondered if anyone here has come across this and if there
were any suggested workarounds, given that the dataset is really too large to
hold in memory on our app servers?
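
The only partial workaround I've come up with is to skip grouping and facet
on the same field instead, since facet counts can be sorted by count,
something like:

/solr/mycore/select?q=*:*&rows=0&facet=true&facet.field=rfp_stub&facet.sort=count&facet.limit=1000

but that only gives the per-group counts, not the grouped documents themselves.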


apache bench solr keep alive not working?

2013-09-10 Thread Brent Ryan
Does anyone know why solr is not respecting keep-alive requests when using
apache bench?

ab -v 4 -H "Connection: Keep-Alive" -H "Keep-Alive: 3000" -k -c 10 -n 100 \
  "http://host1:8983/solr/test.solr/select?q=*%3A*&wt=xml&indent=true"


The response contains this, and if you look at the debug output you see an HTTP
header of "Connection: Close":

Keep-Alive requests:0


Any ideas?  I'm seeing the same behavior when using http client.


Thanks,
Brent


Re: apache bench solr keep alive not working?

2013-09-10 Thread Brent Ryan
thanks guys.  I saw this other post with curl and verified it working.

I've also used apache bench for a bunch of stuff and keep-alive works fine
with things like Java netty.io servers ...  strange that tomcat isn't
respecting the http protocol or headers

There must be a bug in this version of tomcat being used.
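
For reference, the curl check was roughly this (URL made up): hit the same
URL twice on one command line with -v and look for curl reporting that it
re-used the connection for the second request:

curl -v 'http://host1:8983/solr/test.solr/select?q=*:*' \
        'http://host1:8983/solr/test.solr/select?q=*:*'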


On Tue, Sep 10, 2013 at 6:23 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : Does anyone know why solr is not respecting keep-alive requests when
 using
 : apache bench?

 I've seen this before from people trying to test with ab, but never
 fully understood it.

 There is some combination of using ab (which uses HTTP/1.0 and the non-RFC
 compliant HTTP/1.0 version of optional Keep-Alive) and jetty (which also
 attempts to support the non-RFC compliant HTTP/1.0 version of optional
 Keep-Alive) and chunked responses that come from jetty when you request
 dynamic resources (like solr query URLs) as opposed to static resources
 with a known file size.

 I suspect that jetty's best attempt at doing the hooky HTTP/1.0 Keep-Alive
 thing doesn't work with chunked responses -- or maybe ab doesn't actually
 keep the connection alive unless it gets a content-length.

 either way, see this previous thread for how you can demonstrate that
 keep-alive works properly using curl...


 https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201209.mbox/%3c5048e856.9080...@ea.com%3E


 -Hoss



CRLF Invalid Exception ?

2013-09-06 Thread Brent Ryan
Has anyone ever hit this when adding documents to SOLR?  What does it mean?


ERROR [http-8983-6] 2013-09-06 10:09:32,700 SolrException.java (line 108)
org.apache.solr.common.SolrException: Invalid CRLF

at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:175)

at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)

at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)

at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:663)

at
com.datastax.bdp.cassandra.index.solr.CassandraDispatchFilter.execute(CassandraDispatchFilter.java:176)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)

at
com.datastax.bdp.cassandra.index.solr.CassandraDispatchFilter.doFilter(CassandraDispatchFilter.java:139)

at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

at
com.datastax.bdp.cassandra.audit.SolrHttpAuditLogFilter.doFilter(SolrHttpAuditLogFilter.java:194)

at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

at
com.datastax.bdp.cassandra.index.solr.auth.CassandraAuthorizationFilter.doFilter(CassandraAuthorizationFilter.java:95)

at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

at
com.datastax.bdp.cassandra.index.solr.auth.DseAuthenticationFilter.doFilter(DseAuthenticationFilter.java:102)

at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)

at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)

at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)

at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)

at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)

at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)

at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:722)

Caused by: com.ctc.wstx.exc.WstxIOException: Invalid CRLF

at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)

at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)

at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:387)

at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)

at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)

... 30 more

Caused by: java.io.IOException: Invalid CRLF

at
org.apache.coyote.http11.filters.ChunkedInputFilter.parseCRLF(ChunkedInputFilter.java:352)

at
org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:151)

at
org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:710)

at org.apache.coyote.Request.doRead(Request.java:428)

at
org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:304)

at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:403)

at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:327)

at
org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:162)

at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)

at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)

at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)

at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)

at
com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)

at
com.ctc.wstx.sr.StreamScanner.loadMoreFromCurrent(StreamScanner.java:1046)

at com.ctc.wstx.sr.StreamScanner.parseLocalName2(StreamScanner.java:1796)

at com.ctc.wstx.sr.StreamScanner.parseLocalName(StreamScanner.java:1756)

at
com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2914)

at
com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)

at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)

... 33 more


Re: CRLF Invalid Exception ?

2013-09-06 Thread Brent Ryan
Thanks.  I realized there's an error in the ChunkedInputFilter...

I'm not sure if this means there's a bug in the client library I'm using
(solrj 4.3) or a bug in the server SOLR 4.3?  Or is there something in
my data that's causing the issue?


On Fri, Sep 6, 2013 at 1:02 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


 : Has anyone ever hit this when adding documents to SOLR?  What does it
 mean?

 Always check for the root cause...

 : Caused by: java.io.IOException: Invalid CRLF
 :
 : at
 :
 org.apache.coyote.http11.filters.ChunkedInputFilter.parseCRLF(ChunkedInputFilter.java:352)

 ...so while Solr is trying to read XML off the InputStream from the
 client, an error is encountered by the ChunkedInputFilter.

 I suspect the client library you are using for the HTTP connection is
 claiming it's using chunking but isn't, or is doing something wrong with
 the chunking, or there is a bug in the ChunkedInputFilter.


 -Hoss



Re: CRLF Invalid Exception ?

2013-09-06 Thread Brent Ryan
For what it's worth... I just updated to solrj 4.4 (even though my server
is solr 4.3) and it seems to have fixed the issue.

Thanks for the help!


On Fri, Sep 6, 2013 at 1:41 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


 : I'm not sure if this means there's a bug in the client library I'm using
 : (solrj 4.3) or is a bug in the server SOLR 4.3?  Or is there something in
 : my data that's causing the issue?

 It's unlikely that an error in the data you pass to SolrJ methods would be
 causing this problem -- i'm pretty sure it's not even a problem with the
 raw xml data being streamed, it appears to be a problem with how that data
 is getting chunked across the wire.

 My best guess is that the most likely causes are either...
  * a bug in the HttpClient version you are using on the client side
  * a bug in the ChunkedInputFilter you are using on the server side
  * a misconfiguration on the HttpClient object you are using with SolrJ
(ie: claiming it's sending chunked when it's not?)


 -Hoss



JSON update request handler commitWithin

2013-09-05 Thread Ryan, Brent
I'm prototyping a search product for us and I was trying to use the 
commitWithin parameter for posting updated JSON documents like so:

curl -v 
'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1' 
--data-binary @rfp.json -H 'Content-type:application/json'

However, the commit never seems to happen; as you can see below, there are still 
2 docsPending (even 1 hour later).  Is there a trick to getting this to work 
with submitting to the json update request handler?
(screenshot from the core admin stats showing docsPending: 2)



Re: JSON update request handler commitWithin

2013-09-05 Thread Ryan, Brent
Ya, looks like this is a bug in Datastax Enterprise 3.1.2.  I'm using
their enterprise cluster search product which is built on SOLR 4.

:(



On 9/5/13 11:24 AM, Jack Krupansky j...@basetechnology.com wrote:

I just tried commitWithin with the standard Solr example in Solr 4.4 and
it works fine.

Can you reproduce your problem using the standard Solr example in Solr
4.4?

-- Jack Krupansky

From: Ryan, Brent 
Sent: Thursday, September 05, 2013 10:39 AM
To: solr-user@lucene.apache.org
Subject: JSON update request handler & commitWithin

I'm prototyping a search product for us and I was trying to use the
commitWithin parameter for posting updated JSON documents like so:

curl -v 
'http://localhost:8983/solr/proposal.solr/update/json?commitWithin=1'
--data-binary @rfp.json -H 'Content-type:application/json'

However, the commit never seems to happen as you can see below there are
still 2 docsPending (even 1 hour later).  Is there a trick to getting
this to work with submitting to the json update request handler?




Large import making solr unresponsive

2012-12-16 Thread Brent Mills
This is an issue we've only been running into lately so I'm not sure what to 
make of it.  We have 2 cores on a solr machine right now, one of them is about 
10k documents, the other is about 1.5mil.  None of the documents are very 
large, only about 30 short attributes.  We also have about 10 requests/sec 
hitting the smaller core and less on the larger one.  Whenever we try to do a 
full import on the smaller one everything is fine, the response times stay the 
same during the whole 30 seconds it takes to run the indexer.  The cpu also 
stays fairly low.

When we run a full import on the larger one the response times on all cores 
tank from about 10ms to over 8 seconds.  We have a 4 core machine (VM) and I've 
noticed 1 core stays pegged the entire time which is understandable since the 
DIH as I understand it is single threaded.  Also, from what I can tell there is 
no disk, network, or memory pressure (8gb) either and the other procs do 
virtually nothing.  Also the responses from solr still come back with a 10ms 
qtime.  My best guess at this point is that tomcat is having issues when the single 
proc gets pegged, but I'm at a loss on how to narrow this down to a tomcat 
issue or something weird that solr is doing.

Has anyone run into this before or have ideas about what might be happening?


RE: problem adding new fields in DIH

2012-07-11 Thread Brent Mills
Thanks for the explanation and bug report Robert!

-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Monday, July 09, 2012 3:18 PM
To: solr-user@lucene.apache.org
Subject: Re: problem adding new fields in DIH

Thanks again for reporting this Brent. I opened a JIRA issue:
https://issues.apache.org/jira/browse/SOLR-3610

On Mon, Jul 9, 2012 at 3:36 PM, Brent Mills bmi...@uship.com wrote:
 We're having an issue when we add or change a field in the db-data-config.xml 
 and schema.xml files in solr.  Basically whenever I add something new to 
 index I add it to the database, then the data config, then add the field to 
 the schema to index, reload the core, and do a full import.  This has worked 
 fine until we upgraded to an iteration of 4.0 (we are currently on 4.0 
 alpha).  Now sometimes when we go through this process solr throws errors 
 about the field not being found.  The only way to fix this is to restart 
 tomcat and everything immediately starts working fine again.

 The interesting thing is that this is only a problem if the database is 
 returning a value for that field and only in the documents that have a value. 
  The field shows up in the schema browser in solr, it just has no data in it. 
  If I completely remove it from the database but leave it in the schema and 
 dataconfig files there is no issue.  Also of note, this is happening on 2 
 different machines.

 Here's the trace

 SEVERE: Exception while solr commit.
 java.lang.IllegalArgumentException: no such field test
 at 
 org.apache.solr.core.DefaultCodecFactory$1.getPostingsFormatForField(DefaultCodecFactory.java:49)
 at 
 org.apache.lucene.codecs.lucene40.Lucene40Codec$1.getPostingsFormatForField(Lucene40Codec.java:52)
 at 
 org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:94)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:480)
 at 
 org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
 at 
 org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:554)
 at 
 org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2547)
 at 
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2683)
 at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2663)
 at 
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:414)
 at 
 org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
 at 
 org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
 at 
 org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
 at 
 org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:107)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:304)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:256)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:399)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:380)




-- 
lucidimagination.com


problem adding new fields in DIH

2012-07-09 Thread Brent Mills
We're having an issue when we add or change a field in the db-data-config.xml 
and schema.xml files in solr.  Basically whenever I add something new to index 
I add it to the database, then the data config, then add the field to the 
schema to index, reload the core, and do a full import.  This has worked fine 
until we upgraded to an iteration of 4.0 (we are currently on 4.0 alpha).  Now 
sometimes when we go through this process solr throws errors about the field 
not being found.  The only way to fix this is to restart tomcat and everything 
immediately starts working fine again.

The interesting thing is that this is only a problem if the database is 
returning a value for that field and only in the documents that have a value.  
The field shows up in the schema browser in solr, it just has no data in it.  
If I completely remove it from the database but leave it in the schema and 
dataconfig files there is no issue.  Also of note, this is happening on 2 
different machines.

Here's the trace

SEVERE: Exception while solr commit.
java.lang.IllegalArgumentException: no such field test
at 
org.apache.solr.core.DefaultCodecFactory$1.getPostingsFormatForField(DefaultCodecFactory.java:49)
at 
org.apache.lucene.codecs.lucene40.Lucene40Codec$1.getPostingsFormatForField(Lucene40Codec.java:52)
at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:94)
at 
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:480)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:554)
at 
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2547)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2683)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2663)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:414)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
at 
org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:107)
at 
org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:304)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:256)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:399)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:380)



RE: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?

2012-05-10 Thread Brent Mills
Hi James,

I just pulled down the newest nightly build of 4.0 and it solves an issue I had 
been having with solr ignoring the caching of the child entities.  It was 
basically opening a new connection for each iteration even though everything 
was specified correctly.  This was present in my previous build of 4.0 so it 
looks like you fixed it with one of those patches.  Thanks for all your work on 
the DIH, the caching improvements are a big help with some of the things we 
will be rolling out in production soon.

-Brent

-Original Message-
From: Dyer, James [mailto:james.d...@ingrambook.com] 
Sent: Monday, May 07, 2012 1:47 PM
To: solr-user@lucene.apache.org
Cc: Brent Mills; dye.kel...@gmail.com; keithn...@dswinc.com
Subject: RE: Nested CachedSqlEntityProcessor running for each entity row with 
Solr 3.6?

Dear Kellen, Brent  Keith,

There now are fixes available for 2 cache-related bugs that unfortunately made 
their way into the 3.6.0 release.  These were addressed on these 2 JIRA issues, 
which have been committed to the 3.6 branch (as of today):
- https://issues.apache.org/jira/browse/SOLR-3430
- https://issues.apache.org/jira/browse/SOLR-3360
These problems were also affecting Trunk/4.x, with both fixes being committed to 
Trunk under SOLR-3430.

Should Solr 3.6.1 be released, these fixes will become generally available at 
that time.  They also will be part of the 4.0 release, which the Development 
Community hopes will be later this year.

In the mean time, I am hoping each of you can test these fixes with your 
installation.  The best way to do this is to get a fresh SVN checkout of the 
3.6.1 branch 
(http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/), switch 
to the solr directory, then run ant dist.  I believe you need Ant 1.8 to 
build.

If you are unable to build yourself, I put an *unofficial* shapshot of the DIH 
jar here:
 
http://people.apache.org/~jdyer/unofficial/apache-solr-dataimporthandler-3.6.1-SNAPSHOT-r1335176.jar

Please let me know if this solves your problems with DIH Caching, giving you 
the functionality you had with 3.5 and prior.  Your feedback is greatly 
appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: not interesting [mailto:dye.kel...@gmail.com]
Sent: Monday, May 07, 2012 9:43 AM
To: solr-user@lucene.apache.org
Subject: Nested CachedSqlEntityProcessor running for each entity row with Solr 
3.6?

I just upgraded from Solr 3.4 to Solr 3.6; I'm using the same data-import.xml 
for both versions. The import functioned properly with 3.4.

I'm using a nested entity to fetch authors associated with each document, and 
I'm using CachedSqlEntityProcessor to avoid hitting the DB an unreasonable 
number of times. However, when indexing, Solr indexes very slowly and appears 
to be fetching all authors in the DB for each document. The index should be 
~500 megs; I aborted the indexing when it reached ~6gigs. If I comment out the 
nested author entity below, Solr will index normally.

Am I missing something obvious or is this a bug?

<document name="documents">
  <entity name="document" dataSource="production"
          transformer="HTMLStripTransformer,TemplateTransformer,RegexTransformer"
          query="select id, ..., from document">
    <field column="id" name="id"/>
    <field column="uid" name="uid" template="DOC${document.id}"/>
    <!-- more fields .. -->
    <entity name="author" dataSource="production"
            query="select
                   cast(da.document_id as text) as document_id,
                   a.id, a.name, a.signature from document_author da
                   left outer join author a on a.id = da.author_id"
            cacheKey="document_id"
            cacheLookup="document.id"
            processor="CachedSqlEntityProcessor">
      <field name="author_id" column="id" />
      <field name="author" column="name" />
      <field name="author_signature" column="signature" />
    </entity>
  </entity>
</document>

Also posted at SO if you prefer to answer there:
http://stackoverflow.com/questions/10482484/nested-cachedsqlentityprocessor-running-for-each-entity-row-with-solr-3-6

Kellen


Upgrading to 3.6 broke cachedsqlentityprocessor

2012-04-30 Thread Brent Mills
I've read some things in JIRA on the new functionality that was put into 
caching in the DIH, but I wouldn't think it should break the old behavior.  It 
doesn't look as though any errors are being thrown, it's just ignoring the 
caching part and opening a ton of connections.  Also I cannot find any 
documentation on the new functionality that was added so I'm not sure what 
syntax is valid and what's not.  Here is my entity that worked in 3.1 but no 
longer works in 3.6:

<entity name="Emails"
        query="SELECT * FROM Account.SolrUserSearchEmails WHERE
               '${dataimporter.request.clean}' != 'false' OR DateUpdated >= dateadd(ss, -30,
               '${dataimporter.last_index_time}')"
        processor="CachedSqlEntityProcessor"
        where="UserID=Users.UserID">
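
(For reference: the refactored DIH caching from SOLR-2382 expresses the same
join with cacheKey/cacheLookup attributes instead of where=.  A rough, untested
sketch of the equivalent entity -- not a required change, just the newer syntax:)

<entity name="Emails"
        query="SELECT * FROM Account.SolrUserSearchEmails WHERE
               '${dataimporter.request.clean}' != 'false' OR DateUpdated >= dateadd(ss, -30,
               '${dataimporter.last_index_time}')"
        processor="CachedSqlEntityProcessor"
        cacheKey="UserID"
        cacheLookup="Users.UserID">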


RE: Sharing dih dictionaries

2011-12-06 Thread Brent Mills
You're totally correct.  There's actually a link on the DIH page now which 
wasn't there when I had read it a long time ago.  I'm really looking forward to 
4.0, it's got a ton of great new features.  Thanks for the links!!

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Monday, December 05, 2011 10:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Sharing dih dictionaries

It looks like https://issues.apache.org/jira/browse/SOLR-2382 or even 
https://issues.apache.org/jira/browse/SOLR-2613.
I guess by using SOLR-2382 you can specify your own SortedMapBackedCache 
subclass which is able to share your Dictionary.
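
A rough, untested sketch of how an entity might be wired to such a shared cache
once SOLR-2382 is in place (SharedDictionaryCache is a hypothetical
SortedMapBackedCache subclass that hands every entity the same in-memory map;
cacheImpl/cacheKey/cacheLookup are the attributes SOLR-2382 adds):

<entity name="shareddictionary1"
        query="select code, description from partcodes"
        processor="SqlEntityProcessor"
        cacheImpl="com.example.SharedDictionaryCache"
        cacheKey="code"
        cacheLookup="parts.code1">
  <field column="description" name="code1desc" />
</entity>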

Regards

On Tue, Dec 6, 2011 at 12:26 AM, Brent Mills bmi...@uship.com wrote:

 I'm not really sure how to title this but here's what I'm trying to do.

 I have a query that creates a rather large dictionary of codes that 
 are shared across multiple fields of a base entity.  I'm using the 
 cachedsqlentityprocessor but I was curious if there was a way to join 
 this multiple times to the base entity so I can avoid having to reload 
 it for each column join.

 Ex:
 <entity name="parts" query="select name, code1, code2, code3 from parts">
   <field column="name" name="name" />
   <entity name="shareddictionary1" query="select code, description from partcodes"
           where="code=parts.code1">
     <field column="description" name="code1desc" /></entity>
   <entity name="shareddictionary2" query="select code, description from partcodes"
           where="code=parts.code2">
     <field column="description" name="code1desc" /></entity>
   <entity name="shareddictionary3" query="select code, description from partcodes"
           where="code=parts.code3">
     <field column="description" name="code1desc" /></entity>
 </entity>

 Kind of a simplified example but in this case the dictionary query has 
 to be run 3 times to join 3 different columns.  It would be nice if I 
 could load the data set once as an entity and specify how to join it 
 in code without requiring a separate sql query.  Any ideas?




--
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
http://www.griddynamics.com
 mkhlud...@griddynamics.com


Sharing dih dictionaries

2011-12-05 Thread Brent Mills
I'm not really sure how to title this but here's what I'm trying to do.

I have a query that creates a rather large dictionary of codes that are shared 
across multiple fields of a base entity.  I'm using the 
cachedsqlentityprocessor but I was curious if there was a way to join this 
multiple times to the base entity so I can avoid having to reload it for each 
column join.

Ex:
<entity name="parts" query="select name, code1, code2, code3 from parts">
  <field column="name" name="name" />
  <entity name="shareddictionary1" query="select code, description from partcodes"
          where="code=parts.code1">
    <field column="description" name="code1desc" /></entity>
  <entity name="shareddictionary2" query="select code, description from partcodes"
          where="code=parts.code2">
    <field column="description" name="code1desc" /></entity>
  <entity name="shareddictionary3" query="select code, description from partcodes"
          where="code=parts.code3">
    <field column="description" name="code1desc" /></entity>
</entity>

Kind of a simplified example but in this case the dictionary query has to be 
run 3 times to join 3 different columns.  It would be nice if I could load the 
data set once as an entity and specify how to join it in code without requiring 
a separate sql query.  Any ideas?


multiple local indexes

2010-09-28 Thread Brent Palmer
In our application, we need to be able to search across multiple local 
indexes.  We need this not so much for performance reasons, but because 
of the particular needs of our project.  But the indexes, while sharing 
the same schema, can be very different in terms of size and distribution 
of documents.  By that I mean that some indexes may have a lot more 
documents about some topic while others will have more documents about 
other topics.  We want to be able to add documents to the individual 
indexes as well.  I can provide more detail about our project if 
necessary.  Thus, the Distributed Search feature with shards in 
different cores seems to be an obvious solution except for the 
limitation of distributed idf.


First, I want to make sure my understanding about the distributed idf 
limitation is correct:  If your documents are spread across your shards 
evenly, then the distribution of terms across the individual shards can 
be assumed to be even enough not to matter.  If, as in our case, the 
shards are not very uniform, then this limitation is magnified.  Even 
though simplistic, do I have the basic idea?
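
(To make that concrete -- numbers purely illustrative: with Lucene's default
Similarity each shard computes idf(t) = 1 + ln(numDocs / (docFreq + 1)) from
its own statistics.  If shard A holds 10,000 docs and a term appears in 2,000
of them, its idf there is about 1 + ln(10000/2001) = 2.6; on shard B with
10,000 docs and only 20 matches it is about 1 + ln(10000/21) = 7.2.  So the
same term scores far higher on the shard where its topic is rare, which is
exactly the skew that non-uniform shards produce.)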


We have hacked together something that allows us to read from multiple 
indexes, but it isn't really a long-term solution.  It's just sort of 
shoe-horned in there.   Here are some notes from the programmer who 
worked on this:
  Two custom files: EgranaryIndexReaderFactory.java and 
EgranaryIndexReader.java

  EgranaryIndexReader.java
  No real work is done here. This class extends 
lucene.index.MultiReader and overrides the directory() and getVersion() 
methods inherited from IndexReader.
  These methods don't  make sense for a MultiReader as they only return 
a single value. However, Solr expects Readers to have these methods. 
directory() was
  overridden to return a call to directory() on the first reader in the 
subreader list. The same was done for getVersion(). This hack makes any 
use of these methods

  by Solr somewhat pointless.

  EgranaryIndexReaderFactory.java
  Overrides the newReader(Directory indexDir, boolean readOnly) method
  The expected behavior of this method is to construct a Reader from 
the index at indexDir.
  However, this method ignores indexDir and reads a list of indexDirs 
from the solrconfig.xml file.
  These indices are used to create a list of lucene.index.IndexReader 
classes. This list is then used to create the EgranaryIndexReader.
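
A rough sketch of the solrconfig.xml wiring those notes describe (the package,
class name, and index paths here are illustrative, not the actual Egranary
code -- the custom factory ignores the core's own index directory and opens a
MultiReader over a configured list of directories instead):

<indexReaderFactory name="IndexReaderFactory"
                    class="org.widernet.egranary.EgranaryIndexReaderFactory">
  <!-- hypothetical init params read by the custom factory -->
  <str name="indexDir">/data/indexes/topicA/index</str>
  <str name="indexDir">/data/indexes/topicB/index</str>
</indexReaderFactory>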


So the second question is: Does anybody have other ideas about how we 
might solve this problem?  Is distributed search still our best bet?


Thanks for your thoughts!
Brent


Re: multiple local indexes

2010-09-28 Thread Brent Palmer
 Thanks for your comments, Jonathan.  Here is some information that 
gives a brief overview of the eGranary Platform in order to quickly 
outline the need for a solution for bringing multiple indexes into one 
searchable collection.


http://www.widernet.org/egranary/info/multipleIndexes

Thanks,
Brent


On 9/28/2010 5:40 PM, Jonathan Rochkind wrote:

Honestly, I think just putting everything in the same index is your best bet.  Are you sure the 
particular needs of your project can't be served by one combined index?  You can certainly still 
query on just a portion of the index when needed using fq -- you can even create a request handler (or 
multiple request handlers) with "invariants" or "appends" sections to force all queries 
through that request handler to have a fixed fq.
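
A minimal sketch of what Jonathan describes, assuming a hypothetical "collection"
field that tags each document with the sub-collection it came from:

<requestHandler name="/topicA" class="solr.SearchHandler">
  <lst name="invariants">
    <!-- every query sent to this handler is restricted to one sub-collection -->
    <str name="fq">collection:topicA</str>
  </lst>
</requestHandler>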





Re: using multisearcher

2009-04-16 Thread Brent Palmer
Thanks Hoss.  I haven't had time to try it yet, but that is exactly the 
kind of help I was looking for. 


Brent

Chris Hostetter wrote:

: As for the second part, I was thinking of trying to replace the standard
: SolrIndexSearcher with one that employs a MultiSearcher.  But I'm not very
: familiar with the workings of Solr, especially with respect to the caching
: that goes on.  I thought that maybe people who are more familiar with it might
: have some tips on how to go about it.  Or perhaps there are reasons that make
: this a bad idea. 

If your indexes are all local, then using a MultiReader would be simpler than 
trying to shoehorn MultiSearcher type logic into SolrIndexSearcher.


https://issues.apache.org/jira/browse/SOLR-243


-Hoss

  


--
Brent Palmer
Widernet.org
University of Iowa
319-335-2200



Re: using multisearcher

2009-04-08 Thread Brent Palmer

Hi Hoss,

The indexes are local.  Also they are very different sizes and I think 
that the merged results could be really wonky when using 
DistributedSearch because of it (although I might be totally wrong about 
that).


As for the second part, I was thinking of trying to replace the standard 
SolrIndexSearcher with one that employs a MultiSearcher.  But I'm not 
very familiar with the workings of Solr, especially with respect to the 
caching that goes on.  I thought that maybe people who are more familiar 
with it might have some tips on how to go about it.  Or perhaps there 
are reasons that make this a bad idea. 


Brent

Chris Hostetter wrote:
If you've been using a MultiSearcher to query multiple *remote* searchers, 
then Distributed searching in solr should be appropriate.


if you're used to using MultiSearcher as a way of aggregating from 
multiple *local* indexes, distributed searching is probably going to seem 
slow compared to what you're used to (because of the overhead)


: pretty sure this isn't what we want.  Also does anyone have any comments about
: trying to replace the SolrIndexSearcher with a SolrMultiSearcher;  reasons why

...i'm not really sure what you mean by that.  




-Hoss

  


using multisearcher

2009-03-27 Thread Brent Palmer

Hi everybody,
I'm interested in using Solr to search multiple indexes at once.  We 
currently use our own search application which uses lucene's 
multisearcher.   Has anyone attempted to or successfully replaced 
SolrIndexSearcher with some kind of multisearcher?  I have looked at the 
DistributedSearch on the wiki and I'm pretty sure this isn't what we 
want.  Also does anyone have any comments about trying to replace the 
SolrIndexSearcher with a SolrMultiSearcher;  reasons why we shouldn't do 
this, pitfalls; suggestions about how to go about it etc.  Also, it 
should be noted that we would only be adding documents to one of the 
indexes.  I can give more info about the context of this application if 
necessary.


Thank you for any suggestions!

--
Brent Palmer
Widernet.org
University of Iowa
319-335-2200



Re: including time as a factor in relevance?

2006-07-15 Thread Brent Verner
[2006-07-15 14:06] WHIRLYCOTT said:
| I need to have my search result relevance influenced by time.  Older  
| things in my index are less relevant than newer things.  I don't want  
| to do a strict sort by date.  Is this supported somehow by using a  
| dismax request handler?  Or if you have other hints, I'm eager to  
| know what they are.

  I've recently used solr's FunctionQuery to do just this with great 
success.  Take a look at FunctionQuery and ReciprocalFloatFunction.
If I have some time later today, I'll put together a simple example.
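
A rough sketch of the sort of thing FunctionQuery enables, untested and with an
illustrative "creationDate" field -- in a dismax handler you can add a boost
function over the document date in solrconfig.xml:

<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <!-- reciprocal boost: the newer the document, the larger the boost -->
    <str name="bf">recip(rord(creationDate),1,1000,1000)</str>
  </lst>
</requestHandler>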

cheers!
  Brent