Re: Solr 5.0.0 integration with Nutch 1.9

2015-04-06 Thread Anchit Jain
I followed the given steps and created a core named foo with
sample_techproducts_configs,
but when I run the indexing command for Nutch
"bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb
crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize"
it gives this error:

Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)


Here is the complete Hadoop log for the process. I have marked the error
part in it with asterisks.

2015-04-07 09:38:06,613 INFO  indexer.IndexingJob - Indexer: starting at
2015-04-07 09:38:06
2015-04-07 09:38:06,684 INFO  indexer.IndexingJob - Indexer: deleting gone
documents: false
2015-04-07 09:38:06,685 INFO  indexer.IndexingJob - Indexer: URL filtering:
true
2015-04-07 09:38:06,685 INFO  indexer.IndexingJob - Indexer: URL
normalizing: true
2015-04-07 09:38:06,893 INFO  indexer.IndexWriters - Adding
org.apache.nutch.indexwriter.solr.SolrIndexWriter
2015-04-07 09:38:06,893 INFO  indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default
solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication


2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduce:
crawldb: crawl/crawldb
2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduce:
linkdb: crawl/linkdb
2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduces:
adding segment: crawl/segments/20150406231502
2015-04-07 09:38:07,036 WARN  util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2015-04-07 09:38:07,540 INFO  anchor.AnchorIndexingFilter - Anchor
deduplication is: off
2015-04-07 09:38:07,565 INFO  regex.RegexURLNormalizer - can't find rules
for scope 'indexer', using default
2015-04-07 09:38:09,552 INFO  regex.RegexURLNormalizer - can't find rules
for scope 'indexer', using default
2015-04-07 09:38:10,642 INFO  regex.RegexURLNormalizer - can't find rules
for scope 'indexer', using default
2015-04-07 09:38:10,734 INFO  regex.RegexURLNormalizer - can't find rules
for scope 'indexer', using default
2015-04-07 09:38:10,895 INFO  regex.RegexURLNormalizer - can't find rules
for scope 'indexer', using default
2015-04-07 09:38:11,088 INFO  regex.RegexURLNormalizer - can't find rules
for scope 'indexer', using default
2015-04-07 09:38:11,219 INFO  indexer.IndexWriters - Adding
org.apache.nutch.indexwriter.solr.SolrIndexWriter
2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: content
dest: content
2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: title dest:
title
2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: host dest:
host
2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: segment
dest: segment
2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: boost dest:
boost
2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: digest dest:
digest
2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: tstamp dest:
tstamp
2015-04-07 09:38:11,526 INFO  solr.SolrIndexWriter - Indexing 250 documents
2015-04-07 09:38:11,526 INFO  solr.SolrIndexWriter - Deleting 0 documents
2015-04-07 09:38:11,644 INFO  solr.SolrIndexWriter - Indexing 250 documents
*2015-04-07 09:38:11,699 WARN  mapred.LocalJobRunner -
job_local1245074757_0001*
*org.apache.solr.common.SolrException: Not Found*

*Not Found*

*request: http://localhost:8983/solr/update?wt=javabin&version=2
*
* at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)*
* at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)*
* at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)*
* at
org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:135)*
* at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:88)*
* at
org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)*
* at
org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)*
* at
org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:458)*
* at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:500)*
* at
org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)*
*
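
The "Not Found" above is the key line: the update request went to
http://localhost:8983/solr/update, with no core in the path, and in Solr 5
each core has its own URL. A likely fix, assuming the core created earlier
is named foo, is to point Nutch at the core URL:

bin/nutch solrindex http://localhost:8983/solr/foo crawl/crawldb/ -linkdb
crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize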

Re: Setting up SolrCloud 5.0.0 and ZooKeeper 3.4.6

2015-04-06 Thread Zheng Lin Edwin Yeo
Hi Erick,

I think I'll just set up the ZooKeeper server in standalone mode first,
before I get more confused, as I'm quite new to both Solr and ZooKeeper.
Better not to jump the gun.

However, I face this error when I try to start it in standalone mode.

2015-04-07 11:59:51,789 [myid:] - ERROR [main:ZooKeeperServerMain@54] -
Invalid arguments, exiting abnormally
java.lang.NumberFormatException: For input string:
"C:\Users\edwin\zookeeper-3.4.6\bin\..\conf\zoo.cfg"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:60)
at
org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:83)
at
org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2015-04-07 11:59:51,796 [myid:] - INFO  [main:ZooKeeperServerMain@55] -
Usage: ZooKeeperServerMain configfile | port datadir [ticktime] [maxcnxns]


I have the following information in my zoo.cfg:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\singleserver
clientPort=8983


I got the same error even if I set the clientPort=2888.


Regards,
Edwin
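
For what it's worth, the usage line in the log (configfile | port datadir
[ticktime] [maxcnxns]) means ZooKeeperServerMain received more than one
argument and tried to parse the zoo.cfg path as a port number. One common
cause on Windows, assuming the stock 3.4.6 scripts, is passing an extra
argument such as "start": zkServer.cmd already appends conf\zoo.cfg to the
Java command line, so anything extra shifts the path into the port position.
A minimal sketch of starting standalone mode:

cd C:\Users\edwin\zookeeper-3.4.6
bin\zkServer.cmd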



On 7 April 2015 at 11:26, Erick Erickson  wrote:

> Believe me, I'm no Zookeeper expert, but it looks to me like you're
> mixing Solr ports and Zookeeper ports. AFAIK, the two ports in
> the zoo.cfg file are exclusively for the Zookeeper instances to talk
> to each other. Zookeeper isn't aware that the listening nodes are
> Solr nodes, so putting Solr ports in there is confusing Zookeeper,
> I'd guess.
>
> Assuming you're starting your three ZK instances on ports 2888, 2889 and
> 2890,
> I'd expect the proper ports are
> 2888:3888
> 2889:3889
> 2890:3890
>
> But as I said I'm not a Zookeeper expert so beware..
>
>
> Best,
> Erick
>
> On Mon, Apr 6, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
>  wrote:
> > Hi,
> >
> > I'm using Solr 5.0.0 and ZooKeeper 3.4.6. I'm trying to set up a
> ZooKeeper
> > with simulation of 3 servers, but they are all located on the same
> machine
> > for testing purposes.
> >
> > In my zoo.cfg file, I have listed the 3 servers as follows:
> > server.1=localhost:8983:3888
> > server.2=localhost:8984:3889
> > server.3=localhost:8985:3890
> >
> > Then I try to start Solr using the following command:
> > bin/solr start -e cloud -z localhost:8983-noprompt
> >
> > However, I'm unable to establish a connection from my Solr to the
> > ZooKeeper. Is this configuration possible, or is there anything which I
> > missed out?
> >
> > Thank you in advance for your help.
> >
> > Regards,
> > Edwin
>


Re: Measuring QPS

2015-04-06 Thread Otis Gospodnetic
Hi Daniel,

See SPM, which will give you QPS and a bunch of
other Solr, JVM, and OS metrics, along with alerting, anomaly detection,
and not-yet-announced transaction tracing.
It has the percentiles Wunder mentions.  I see others mentioned JMeter.  We
use SPM with JMeter pretty regularly when helping clients with Solr
performance issues.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Apr 3, 2015 at 11:37 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> I wanted to gather QPS for our production Solr instances, but I was
> surprised that the Admin UI did not contain this information.   We are
> running a mix of versions, but mostly 4.10 at this point.   We are not
> using SolrCloud at present; that's part of why I'm checking - I want to
> validate the size of our existing setup and what sort of SolrCloud setup
> would be needed to centralize several of them.
>
> What is the best way to gather QPS information?
>
> What is the best way to add information like this to the Admin UI, if I
> decide to take that step?
>
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>
>


Re: Problem with new solr.xml format and core swaps

2015-04-06 Thread Shawn Heisey
On 4/6/2015 6:40 PM, Erick Erickson wrote:
> What version are you migrating _from_? 4.9.0? There were some
> persistence issues at one point, but AFAIK they were fixed by 4.9, I
> can check if you're on an earlier version...

Effectively there is no previous version.  Whenever I upgrade, I delete
all the data directories and completely reindex.  When I converted from
the old solr.xml to core discovery, the server was already on 4.9.1.

Thanks,
Shawn



Re: Setting up SolrCloud 5.0.0 and ZooKeeper 3.4.6

2015-04-06 Thread Erick Erickson
Believe me, I'm no Zookeeper expert, but it looks to me like you're
mixing Solr ports and Zookeeper ports. AFAIK, the two ports in
the zoo.cfg file are exclusively for the Zookeeper instances to talk
to each other. Zookeeper isn't aware that the listening nodes are
Solr nodes, so putting Solr ports in there is confusing Zookeeper,
I'd guess.

Assuming you're starting your three ZK instances on ports 2888, 2889 and 2890,
I'd expect the proper ports are
2888:3888
2889:3889
2890:3890

But as I said I'm not a Zookeeper expert so beware..


Best,
Erick

On Mon, Apr 6, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
 wrote:
> Hi,
>
> I'm using Solr 5.0.0 and ZooKeeper 3.4.6. I'm trying to set up a ZooKeeper
> with simulation of 3 servers, but they are all located on the same machine
> for testing purposes.
>
> In my zoo.cfg file, I have listed the 3 servers as follows:
> server.1=localhost:8983:3888
> server.2=localhost:8984:3889
> server.3=localhost:8985:3890
>
> Then I try to start Solr using the following command:
> bin/solr start -e cloud -z localhost:8983-noprompt
>
> However, I'm unable to establish a connection from my Solr to the
> ZooKeeper. Is this configuration possible, or is there anything which I
> missed out?
>
> Thank you in advance for your help.
>
> Regards,
> Edwin


Re: Config join parse in solrconfig.xml

2015-04-06 Thread Erick Erickson
df does not allow multiple fields, it stands for "default field", not
"default fields". To get what you're looking for, you need to use
edismax or explicitly create the multiple clauses.

I'm not quite sure what the join parser is doing with the df
parameter. So my first question is "what happens if you just use a
single field for df?".

Best,
Erick
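
One hedged way to combine the two, assuming Solr's parameter dereferencing
(the qq parameter name here is arbitrary), is to hand the inner clause to
edismax explicitly so a multi-field qf applies instead of df:

http://dev-solr:8080/solr/collection1/select?q={!join from=litigation_id_ls to=lit_id_lms v=$qq}&qq={!edismax qf='all_text number party name all_code ent_name'}apple&fq=type:PartyLawyerLawfirm&rows=0

With this form the join parser sees a single opaque subquery, and edismax
(not df) decides which fields "apple" is matched against.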

On Mon, Apr 6, 2015 at 11:51 AM, Frank li  wrote:
> The error message was from the query with "debug=query".
>
> On Mon, Apr 6, 2015 at 11:49 AM, Frank li  wrote:
>
>> Hi Erick,
>>
>>
>> Thanks for your response.
>>
>> Here is the query I am sending:
>>
>> http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:apple&fq=type:PartyLawyerLawfirm&facet=true&facet.field=lawyer_id_lms&facet.mincount=1&rows=0
>>
>> You can see it has "all_text:apple". I added field name "all_text",
>> because it gives an error without it.
>>
>> Errors:
>>
>> undefined field all_text number party
>> name all_code ent_name (response code 400)
>>
>>
>> These fields are defined as the default search fields in our
>> solrconfig.xml file:
>>
>> <str name="df">all_text number party name all_code ent_name</str>
>>
>>
>> Thanks,
>>
>> Fudong
>>
>> On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson 
>> wrote:
>>
>>> You have to show us several more things:
>>>
>>> 1> what exactly does the query look like?
>>> 2> what do you expect?
>>> 3> output when you specify &debug=query
>>> 4> anything else that would help. You might review:
>>>
>>> http://wiki.apache.org/solr/UsingMailingLists
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Apr 3, 2015 at 10:58 AM, Frank li  wrote:
>>> > Hi,
>>> >
>>> > I am starting using join parser with our solr. We have some default
>>> fields.
>>> > They are defined in solrconfig.xml:
>>> >
>>> >   <lst name="defaults">
>>> >     <str name="defType">edismax</str>
>>> >     <str name="echoParams">explicit</str>
>>> >     <int name="rows">10</int>
>>> >     <str name="df">all_text number party name all_code ent_name</str>
>>> >     <str name="qf">all_text number^3 name^5 party^3 all_code^2
>>> >       ent_name^7</str>
>>> >     <str name="fl">id description market_sector_type parent ult_parent
>>> >       ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as
>>> >       *_s *_ss *_ds *_sms *_ss *_bs</str>
>>> >     <str name="q.op">AND</str>
>>> >   </lst>
>>> >
>>> >
>>> > I found out that once I use the join parser, it does not recognize the default
>>> > fields any more. How do I modify the configuration for this?
>>> >
>>> > Thanks,
>>> >
>>> > Fred
>>>
>>
>>


Re: Facet

2015-04-06 Thread Erick Erickson
facet.method=enum will create an entry in the filter cache for each and
every value. But since the filterCache is bounded, each result will
pretty much be thrown away immediately. At least that's what I
remember.

Which neatly accounts for your issue I think; you're spending a huge
amount of time/cycles calculating filterCache entries to just throw
them away. If you increased your filterCache size to (shudder) 300K+ I
think your performance would be fine after the first one, but I
really, really, really doubt you can do that.

You say "Now we are getting an error". What's the error? I'm guessing OOM...

Faceting really wasn't built for very high cardinality fields. If this
is a reporting kind of thing, and you have the option of using 5.1
(coming Real Soon Now), you might get some usage out of "streaming
aggregation", which is way cool. But it's not going to give you
sub-second responses though.

Best,
Erick
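
For reference, the filterCache sizing mentioned above lives in
solrconfig.xml; a sketch of the (shudder) 300K-entry version would look
like this:

<filterCache class="solr.FastLRUCache"
             size="300000"
             initialSize="512"
             autowarmCount="0"/>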

On Sun, Apr 5, 2015 at 2:59 PM, Toke Eskildsen  wrote:
> Bill Bell  wrote:
>> The limit is set to -1. But the average result is 300.
>
> Okay, better. Well, somewhat better. But unless your values are very well 
> distributed, I would guess that your worst case is very high. Have you 
> checked if your performance problems are for specific queries?
>
> One way is to look through your solr.log for high QTimes and see if that 
> correlates with large result sets. My guess (still assuming distributed 
> search) is that lines containing __terms (indicating the fine count phase of 
> distributed faceting) will have higher QTimes than the other queries.
>
>> Would creating 900 fields be better ?
>> Then I could just put the prefix in the field name.
>
> With fc, there is a constant overhead for each field that you facet on. 900
> fields would take up much more memory than a single field with all the 
> values. I don't think that enum leaves structures in memory, but I doubt that 
> it would be better than using a single field and facet.prefix.
>
>> So far I have heard SolrCloud and docValues as viable solutions. Stay away from enum.
>
> SolrCloud is not a solution to faceting as such. There is a performance 
> penalty when switching from single-shard to SolrCloud, especially for the 
> fairly large facet result sets that you have. I just guessed that you were 
> using SolrCloud already.
>
> A quick test: Try setting facet.limit=10 and run some tests. If performance 
> is fine for that and you're using multiple shards, then your performance (at 
> least for faceting) would probably be a lot higher with just a single shard.
>
> - Toke Eskildsen
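
Toke's quick test maps to a request like the following (host, collection,
and field names are placeholders):

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=myfield&facet.limit=10

If that is fast while the facet.limit=-1 run is slow, the distributed
fine-count phase he describes is the likely bottleneck.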


Setting up SolrCloud 5.0.0 and ZooKeeper 3.4.6

2015-04-06 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 5.0.0 and ZooKeeper 3.4.6. I'm trying to set up a ZooKeeper
with simulation of 3 servers, but they are all located on the same machine
for testing purposes.

In my zoo.cfg file, I have listed the 3 servers as follows:
server.1=localhost:8983:3888
server.2=localhost:8984:3889
server.3=localhost:8985:3890

Then I try to start Solr using the following command:
bin/solr start -e cloud -z localhost:8983-noprompt

However, I'm unable to establish a connection from my Solr to the
ZooKeeper. Is this configuration possible, or is there anything which I
missed out?

Thank you in advance for your help.

Regards,
Edwin
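
Following Erick's port advice in his reply above, a minimal sketch of a
three-instance ensemble on one machine: each instance gets its own copy of
zoo.cfg with its own dataDir and clientPort (2181-2183 assumed here), plus a
myid file in each dataDir containing the instance number.

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/path/to/zk1/data
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Solr then connects to the client ports, not the ensemble ports:

bin/solr start -e cloud -z localhost:2181,localhost:2182,localhost:2183 -noprompt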


Re: Problem with new solr.xml format and core swaps

2015-04-06 Thread Erick Erickson
Shawn:

What version are you migrating _from_? 4.9.0? There were some
persistence issues at one point, but AFAIK they were fixed by 4.9, I
can check if you're on an earlier version...

Erick

On Sun, Apr 5, 2015 at 2:05 PM, Shawn Heisey  wrote:
> I'm having two problems with Solr 4.9.1.  I can't upgrade yet, because
> we are using a third-party plugin component that is not yet explicitly
> qualified for anything newer than 4.9.0.  The point release upgrade
> seemed like a safe bet, because I know that we don't do API changes in
> point releases.  These are transient problems, and do not seem to be
> affecting the index at this time.
>
> Some background info:
>
> Ubuntu 14, Java 8u40 from the webupd8 PPA, Solr 4.9.1.  It is *NOT*
> SolrCloud.
>
> Full rebuilds on my index involve building a new index in cores that I
> have designated "build" cores, then swapping those cores with "live"
> cores.  This always worked flawlessly before I updated to Solr 4.9.1 and
> migrated the config to use core discovery.
>
> root@idxb4:~# cat /index/solr4/cores/sparkinc_0/core.properties
> name=sparkinclive
> dataDir=../../data/sparkinc_0
>
> root@idxb4:~# cat /index/solr4/cores/sparkinc_1/core.properties
> name=sparkincbuild
> dataDir=../../data/sparkinc_1
>
> The first problem:  Sometimes, in a completely unpredictable manner, the
> new solr.xml format seems to behave like using the old format with
> persistent=false.
>
> When I restarted Solr yesterday, that action swapped the live cores with
> the build cores and I lost half my index because it swapped back to the
> previous build cores.  Just now when I tried a restart, everything
> worked flawlessly and the cores did not swap.
>
> The second problem:  Sometimes old index segments do not get deleted,
> even though they are not part of the index.
>
> Another part of the full rebuild process involves clearing the build
> cores before beginning the full import.  The code does a deleteByQuery
> with *:* and then optimizes the core.  Sometimes this action fails to
> delete the old segment files, but when I checked the core Overview in
> the admin UI, numDocs only reflected the newly indexed docs and
> deletedDocs was 0.
>
> It was actually while trying to fix/debug this second problem that I
> discovered the first problem.  Once the rebuild finished, I wanted to
> see what would happen if I restarted Solr while one of my cores had 32GB
> of segment files that were not part of the index ... but that's when the
> indexes swapped.  At that point, I deleted all the dataDirs on both
> machines (it's a distributed index), restarted Solr again, and began a
> full rebuild.  Everything seems to be fine now.
>
> Are either of these problems anything that anyone has seen?  I don't
> recall seeing anything come across the list before.  Are there existing
> issues in Jira?  Is there any information that I can provide which would
> help in narrowing down the problem?
>
> Thanks,
> Shawn
>
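
For reference, the swap step described above maps to a CoreAdmin call like
this (host and port assumed):

curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=sparkincbuild&other=sparkinclive"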


Re: Solr 5.0.0 integration with Nutch 1.9

2015-04-06 Thread Shawn Heisey
On 4/6/2015 2:14 PM, Anchit Jain wrote:
> I want to index nutch results using *Solr 5.0* but as mentioned in
> https://wiki.apache.org/nutch/NutchTutorial there is no directory
> ${APACHE_SOLR_HOME}/example/solr/collection1/conf/
> in Solr 5.0. So where do I have to copy *schema.xml*?
> Also there is no *start.jar* present in the example directory.

The first thing to ask is whether you are running in cloud mode or
standard mode.  If you're in cloud mode, then what I'm saying below will
require modification.

After you start Solr with "bin/solr start" you can then do this command:

bin/solr create -c foo -d sample_techproducts_configs

Once that's done, you will have a core named foo, and then you can put
the schema and any other Solr config files you get from nutch in the
server/solr/foo/conf directory.

The create command will choose the example for a data-driven schema by
default.  The sample_techproducts_configs example will meet your needs
better.

Thanks,
Shawn
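
A sketch of the copy step Shawn describes, assuming Nutch 1.9's bundled
conf/schema.xml and the paths above, followed by a core reload so Solr picks
up the new schema:

cp $NUTCH_HOME/conf/schema.xml $SOLR_HOME/server/solr/foo/conf/
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=foo"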



Re: Measuring QPS

2015-04-06 Thread Walter Underwood
That sounds neat. Our QA people are moving to Gatling, so we probably won’t 
change our JMeter approach now.

We use the JMeter Plugins CMDRunner, telling it to generate only CSV.

http://jmeter-plugins.org/wiki/JMeterPluginsCMD/

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Apr 6, 2015, at 3:02 PM, Siegfried Goeschl  wrote:

> Hi Walter,
> 
> sort of shameless plug - I ran into similar issues and wrote a JMeter SLA 
> Reporting Backend - https://github.com/sgoeschl/jmeter-sla-report 
> 
> 
> * It reads the CSV/XML JMeter report file and sorts the response times in 
> logarithmic buckets 
> * the XML processor uses a StAX parser to handle huge JTL files (exceeding 1
> GB)
> * it also caters for merging JTL files when running multiple JMeter instances
> 
> Cheers,
> 
> Siegfried Goeschl
> 
> 
> 
>> On 06 Apr 2015, at 22:57, Walter Underwood  wrote:
>> 
>> The load testing is the easiest part.
>> 
>> We use JMeter to replay the prod logs. We start about a hundred threads and 
>> use ConstantThroughputTimer to control the traffic level. JMeter tends to 
>> fall over with too much data graphing, so we run it headless. Then we post
>> process with JMeter Plugins to get percentiles.
>> 
>> The complicated part of the servlet filter was getting it configured in 
>> Tomcat. The code itself is not too bad.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> On Apr 6, 2015, at 1:49 PM, Siegfried Goeschl  wrote:
>> 
>>> The good-sounding thing - you can do that easily with JMeter running the 
>>> GUI or the command-line
>>> 
>>> Cheers,
>>> 
>>> Siegfried Goeschl
>>> 
 On 06 Apr 2015, at 21:35, Davis, Daniel (NIH/NLM) [C] 
  wrote:
 
 This sounds really good:
 
 "For load testing, we replay production logs to test that we meet the SLA 
 at a given traffic level."
 
 The rest sounds complicated.   Ah well, that's the job.
 
 -Original Message-
 From: Walter Underwood [mailto:wun...@wunderwood.org] 
 Sent: Monday, April 06, 2015 2:48 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Measuring QPS
 
 We built a servlet request filter that is configured in front of the Solr 
 servlets. It reports response times to metricsd, using the Codahale 
 library.
 
 That gives us counts, rates, and response time metrics. We mostly look at 
 percentiles, because averages are thrown off by outliers. Average is just 
 the wrong metric for a one-sided distribution like response times.
 
 We use Graphite to display the 95th percentile response time for each 
 request handler. We use Tattle for alerting on those metrics.
 
 We also use New Relic for a different look at the performance. It is good 
 at tracking from the front end through to Solr.
 
 For load testing, we replay production logs to test that we meet the SLA 
 at a given traffic level.
 
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
 On Apr 6, 2015, at 11:31 AM, Davis, Daniel (NIH/NLM) [C] 
  wrote:
 
> OK,
> 
> I have a lot of chutzpah posting that here ;) The other guys answering
> the questions can probably explain it better.
> I love showing off, however, so please forgive me.
> 
> -Original Message-
> From: Davis, Daniel (NIH/NLM) [C]
> Sent: Monday, April 06, 2015 2:25 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Measuring QPS
> 
> It's very common to do autocomplete based on popular queries/titles over
> some sliding time window.   Some enterprise search systems even apply age 
> weighting so that they don't need to re-index but continuously add to the 
> index.   This way, they can do autocomplete based on what's popular these 
> days.
> 
> We use relevance/field boosts/phrase matching etc. to get the best guess 
> about what results they want to see.   This is similar - we use 
> relevance, field boosting to guess what users want to search for.   
> Zipf's law applies to searches as well as results.
> 
> -Original Message-
> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
> Sent: Monday, April 06, 2015 2:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring QPS
> 
> Hi Daniel,
> 
> interesting - I never thought of autocompletion but for keeping track 
> of user behaviour :-)
> 
> * the numbers are helpful for the online advertisement team to sell 
> campaigns
> * it is used for sanity checks - sensible queries returning no results 
> or returning too many results
> 
> Cheers,
> 
> Siegfried Goeschl
> 
>> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C] 
>>  wrote:
>> 
>> Siegfried,
>

RE: Are there known issues with Java 8 in older versions of Solr?

2015-04-06 Thread Ryan, Michael F. (LNG-DAY)
I can at least say that Solr 3.x works fine with Java 7.

-Michael

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Monday, April 06, 2015 5:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Are there known issues with Java 8 in older versions of Solr?

On 4/6/2015 3:10 PM, chillra wrote:
> We are currently running Solr 3.6.1.
>
> The release notes for Solr 4.8 state that it is verified to be 
> compatible with Java 8. Does this mean that older releases of Solr 
> were not compatible, or just that they have not been tested?
>
> None of the bug fixes associated with the 4.8 release seem to be 
> related to Java 8 compatibility.
>
> We are currently planning to upgrade to Java 8 before upgrading to 
> Solr 5, but if there are known incompatibilities we will need to change those
> plans.
>
> Thanks, would appreciate any insights.

Solr 3.x still works with Java 5. While I don't know of anything specific that 
would prevent it from working with Java 8, I have only personally ever used 3.x 
with Java 6. There has been no official testing.  Java 8 did not exist when we
were still making minor Solr 3.x releases, and Java 7 had been out for less 
than a year when Solr 3.6.0 was released.

The best advice I can offer is to try it.  Any problems are likely to NOT be 
subtle ... it'll most likely either work flawlessly or not at all.  Because 
Solr 3.x works with Java 6, I think it is likely to work just fine with Java 8. 
 Hopefully there will be someone who has actually tried it who can let you know 
for sure.

If you can put Java 6 on the machine as well, and use that to run Solr, that 
would be safer.

Thanks,
Shawn



Re: Measuring QPS

2015-04-06 Thread Siegfried Goeschl
Hi Walter,

sort of shameless plug - I ran into similar issues and wrote a JMeter SLA 
Reporting Backend - https://github.com/sgoeschl/jmeter-sla-report 


* It reads the CSV/XML JMeter report file and sorts the response times in 
logarithmic buckets 
* the XML processor uses a StAX parser to handle huge JTL files (exceeding 1 GB)
* it also caters for merging JTL files when running multiple JMeter instances

Cheers,

Siegfried Goeschl



> On 06 Apr 2015, at 22:57, Walter Underwood  wrote:
> 
> The load testing is the easiest part.
> 
> We use JMeter to replay the prod logs. We start about a hundred threads and 
> use ConstantThroughputTimer to control the traffic level. JMeter tends to 
> fall over with too much data graphing, so we run it headless. Then we post
> process with JMeter Plugins to get percentiles.
> 
> The complicated part of the servlet filter was getting it configured in 
> Tomcat. The code itself is not too bad.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> On Apr 6, 2015, at 1:49 PM, Siegfried Goeschl  wrote:
> 
>> The good-sounding thing - you can do that easily with JMeter running the GUI 
>> or the command-line
>> 
>> Cheers,
>> 
>> Siegfried Goeschl
>> 
>>> On 06 Apr 2015, at 21:35, Davis, Daniel (NIH/NLM) [C] 
>>>  wrote:
>>> 
>>> This sounds really good:
>>> 
>>> "For load testing, we replay production logs to test that we meet the SLA 
>>> at a given traffic level."
>>> 
>>> The rest sounds complicated.   Ah well, that's the job.
>>> 
>>> -Original Message-
>>> From: Walter Underwood [mailto:wun...@wunderwood.org] 
>>> Sent: Monday, April 06, 2015 2:48 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Measuring QPS
>>> 
>>> We built a servlet request filter that is configured in front of the Solr 
>>> servlets. It reports response times to metricsd, using the Codahale library.
>>> 
>>> That gives us counts, rates, and response time metrics. We mostly look at 
>>> percentiles, because averages are thrown off by outliers. Average is just 
>>> the wrong metric for a one-sided distribution like response times.
>>> 
>>> We use Graphite to display the 95th percentile response time for each 
>>> request handler. We use Tattle for alerting on those metrics.
>>> 
>>> We also use New Relic for a different look at the performance. It is good 
>>> at tracking from the front end through to Solr.
>>> 
>>> For load testing, we replay production logs to test that we meet the SLA at 
>>> a given traffic level.
>>> 
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> On Apr 6, 2015, at 11:31 AM, Davis, Daniel (NIH/NLM) [C] 
>>>  wrote:
>>> 
 OK,
 
 I have a lot of chutzpah posting that here ;) The other guys answering
 the questions can probably explain it better.
 I love showing off, however, so please forgive me.
 
 -Original Message-
 From: Davis, Daniel (NIH/NLM) [C]
 Sent: Monday, April 06, 2015 2:25 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Measuring QPS
 
 It's very common to do autocomplete based on popular queries/titles over
 some sliding time window.   Some enterprise search systems even apply age 
 weighting so that they don't need to re-index but continuously add to the 
 index.   This way, they can do autocomplete based on what's popular these 
 days.
 
 We use relevance/field boosts/phrase matching etc. to get the best guess 
 about what results they want to see.   This is similar - we use relevance, 
 field boosting to guess what users want to search for.   Zipf's law 
 applies to searches as well as results.
 
 -Original Message-
 From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
 Sent: Monday, April 06, 2015 2:17 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Measuring QPS
 
 Hi Daniel,
 
 interesting - I never thought of autocompletion but for keeping track 
 of user behaviour :-)
 
 * the numbers are helpful for the online advertisement team to sell 
 campaigns
 * it is used for sanity checks - sensible queries returning no results 
 or returning too many results
 
 Cheers,
 
 Siegfried Goeschl
 
> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C] 
>  wrote:
> 
> Siegfried,
> 
> It is early days as yet.   I don't think we need a code drop.   AFAIK, 
> none of our current Solr applications autocomplete the search box based 
> on popular query/title keywords.   We have other applications that do 
> that, but they don't use Solr.
> 
> Thanks again,
> 
> Dan
> 
> -Original Message-
> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
> Sent: Monday, April 06, 2015 1:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring Q

Re: Are there known issues with Java 8 in older versions of Solr?

2015-04-06 Thread Shawn Heisey

On 4/6/2015 3:10 PM, chillra wrote:

We are currently running Solr 3.6.1.

The release notes for Solr 4.8 state that it is verified to be compatible
with Java 8. Does this mean that older releases of Solr were not compatible,
or just that they have not been tested?

None of the bug fixes associated with the 4.8 release seem to be related to
Java 8 compatibility.

We are currently planning to upgrade to Java 8 before upgrading to Solr 5,
but if there are known incompatibilities we will need to change those plans.

Thanks, would appreciate any insights.


Solr 3.x still works with Java 5. While I don't know of anything 
specific that would prevent it from working with Java 8, I have only 
personally ever used 3.x with Java 6. There has been no official
testing.  Java 8 did not exist when we were still making minor Solr 3.x 
releases, and Java 7 had been out for less than a year when Solr 3.6.0 
was released.


The best advice I can offer is to try it.  Any problems are likely to 
NOT be subtle ... it'll most likely either work flawlessly or not at 
all.  Because Solr 3.x works with Java 6, I think it is likely to work 
just fine with Java 8.  Hopefully there will be someone who has actually 
tried it who can let you know for sure.


If you can put Java 6 on the machine as well, and use that to run Solr, 
that would be safer.


Thanks,
Shawn



Are there known issues with Java 8 in older versions of Solr?

2015-04-06 Thread chillra
We are currently running Solr 3.6.1.

The release notes for Solr 4.8 state that it is verified to be compatible
with Java 8. Does this mean that older releases of Solr were not compatible,
or just that they have not been tested?

None of the bug fixes associated with the 4.8 release seem to be related to
Java 8 compatibility.

We are currently planning to upgrade to Java 8 before upgrading to Solr 5,
but if there are known incompatibilities we will need to change those plans.

Thanks, would appreciate any insights.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Are-there-known-issues-with-Java-8-in-older-versions-of-Solr-tp4197935.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Measuring QPS

2015-04-06 Thread Walter Underwood
The load testing is the easiest part.

We use JMeter to replay the prod logs. We start about a hundred threads and use 
ConstantThroughputTimer to control the traffic level. JMeter tends to fall over 
with too much data graphing, so we run it headless. Then we post process with
JMeter Plugins to get percentiles.

The complicated part of the servlet filter was getting it configured in Tomcat. 
The code itself is not too bad.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
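
Running JMeter headless, as described above, is a stock CLI invocation (test
plan and output file names assumed):

jmeter -n -t solr-replay.jmx -l results.jtl

The -n flag suppresses the GUI, and the resulting .jtl file is what the
JMeter Plugins post-processing reads to compute percentiles.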

On Apr 6, 2015, at 1:49 PM, Siegfried Goeschl  wrote:

> The good-sounding thing - you can do that easily with JMeter running the GUI 
> or the command-line
> 
> Cheers,
> 
> Siegfried Goeschl
> 
>> On 06 Apr 2015, at 21:35, Davis, Daniel (NIH/NLM) [C]  
>> wrote:
>> 
>> This sounds really good:
>> 
>> "For load testing, we replay production logs to test that we meet the SLA at 
>> a given traffic level."
>> 
>> The rest sounds complicated.   Ah well, that's the job.
>> 
>> -Original Message-
>> From: Walter Underwood [mailto:wun...@wunderwood.org] 
>> Sent: Monday, April 06, 2015 2:48 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Measuring QPS
>> 
>> We built a servlet request filter that is configured in front of the Solr 
>> servlets. It reports response times to metricsd, using the Codahale library.
>> 
>> That gives us counts, rates, and response time metrics. We mostly look at 
>> percentiles, because averages are thrown off by outliers. Average is just 
>> the wrong metric for a one-sided distribution like response times.
>> 
>> We use Graphite to display the 95th percentile response time for each 
>> request handler. We use Tattle for alerting on those metrics.
>> 
>> We also use New Relic for a different look at the performance. It is good at 
>> tracking from the front end through to Solr.
>> 
>> For load testing, we replay production logs to test that we meet the SLA at 
>> a given traffic level.
>> 
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> On Apr 6, 2015, at 11:31 AM, Davis, Daniel (NIH/NLM) [C] 
>>  wrote:
>> 
>>> OK,
>>> 
>>> I have a lot of chutzpah posting that here ;) The other guys answering
>>> the questions can probably explain it better.
>>> I love showing off, however, so please forgive me.
>>> 
>>> -Original Message-
>>> From: Davis, Daniel (NIH/NLM) [C]
>>> Sent: Monday, April 06, 2015 2:25 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Measuring QPS
>>> 
>>> It's very common to do autocomplete based on popular queries/titles over
>>> some sliding time window.   Some enterprise search systems even apply age 
>>> weighting so that they don't need to re-index but continuously add to the 
>>> index.   This way, they can do autocomplete based on what's popular these 
>>> days.
>>> 
>>> We use relevance/field boosts/phrase matching etc. to get the best guess 
>>> about what results they want to see.   This is similar - we use relevance, 
>>> field boosting to guess what users want to search for.   Zipf's law applies 
>>> to searches as well as results.
>>> 
>>> -Original Message-
>>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>>> Sent: Monday, April 06, 2015 2:17 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Measuring QPS
>>> 
>>> Hi Daniel,
>>> 
>>> interesting - I never thought of autocompletion but for keeping track 
>>> of user behaviour :-)
>>> 
>>> * the numbers are helpful for the online advertisement team to sell 
>>> campaigns
>>> * it is used for sanity checks - sensible queries returning no results 
>>> or returning too many results
>>> 
>>> Cheers,
>>> 
>>> Siegfried Goeschl
>>> 
 On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C] 
  wrote:
 
 Siegfried,
 
 It is early days as yet.   I don't think we need a code drop.   AFAIK, 
 none of our current Solr applications autocomplete the search box based on 
 popular query/title keywords.   We have other applications that do that, 
 but they don't use Solr.
 
 Thanks again,
 
 Dan
 
 -Original Message-
 From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
 Sent: Monday, April 06, 2015 1:42 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Measuring QPS
 
 Hi Dan,
 
 at willhaben.at (customer of mine) two SOLR components were written 
 for SOLR 3 and ported to SOLR 4
 
 1) SlowQueryLog which dumps long-running search requests into a log 
 file
 
 2) Most Frequent Search Terms allowing to query & filter the most 
 frequent user search terms over the browser
 
 Some notes along the line
 
 
 * For both components I have the "GO" to open source them but I never 
 had enough time to do that (shame on me) - see
 https://issues.apache.org/jira/browse/SOLR-4056
 
 * The Most Frequent Search Term component actually mimics a SOLR 
 server you feed the

Re: Measuring QPS

2015-04-06 Thread Siegfried Goeschl
The good-sounding thing - you can do that easily with JMeter running the GUI or 
the command-line

Cheers,

Siegfried Goeschl

> On 06 Apr 2015, at 21:35, Davis, Daniel (NIH/NLM) [C]  
> wrote:
> 
> This sounds really good:
> 
> "For load testing, we replay production logs to test that we meet the SLA at 
> a given traffic level."
> 
> The rest sounds complicated.   Ah well, that's the job.
> 
> -Original Message-
> From: Walter Underwood [mailto:wun...@wunderwood.org] 
> Sent: Monday, April 06, 2015 2:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring QPS
> 
> We built a servlet request filter that is configured in front of the Solr 
> servlets. It reports response times to metricsd, using the Codahale library.
> 
> That gives us counts, rates, and response time metrics. We mostly look at 
> percentiles, because averages are thrown off by outliers. Average is just the 
> wrong metric for a one-sided distribution like response times.
> 
> We use Graphite to display the 95th percentile response time for each request 
> handler. We use Tattle for alerting on those metrics.
> 
> We also use New Relic for a different look at the performance. It is good at 
> tracking from the front end through to Solr.
> 
> For load testing, we replay production logs to test that we meet the SLA at a 
> given traffic level.
> 
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> On Apr 6, 2015, at 11:31 AM, Davis, Daniel (NIH/NLM) [C] 
>  wrote:
> 
>> OK,
>> 
>> I have a lot of chutzpah posting that here ;) The other guys answering
>> the questions can probably explain it better.
>> I love showing off, however, so please forgive me.
>> 
>> -Original Message-
>> From: Davis, Daniel (NIH/NLM) [C]
>> Sent: Monday, April 06, 2015 2:25 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Measuring QPS
>> 
>> It's very common to do autocomplete based on popular queries/titles over some
>> sliding time window.   Some enterprise search systems even apply age 
>> weighting so that they don't need to re-index but continuously add to the 
>> index.   This way, they can do autocomplete based on what's popular these 
>> days.
>> 
>> We use relevance/field boosts/phrase matching etc. to get the best guess 
>> about what results they want to see.   This is similar - we use relevance, 
>> field boosting to guess what users want to search for.   Zipf's law applies 
>> to searches as well as results.
>> 
>> -Original Message-
>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>> Sent: Monday, April 06, 2015 2:17 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Measuring QPS
>> 
>> Hi Daniel,
>> 
>> interesting - I never thought of autocompletion but for keeping track 
>> of user behaviour :-)
>> 
>> * the numbers are helpful for the online advertisement team to sell 
>> campaigns
>> * it is used for sanity checks - sensible queries returning no results 
>> or returning too many results
>> 
>> Cheers,
>> 
>> Siegfried Goeschl
>> 
>>> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C] 
>>>  wrote:
>>> 
>>> Siegfried,
>>> 
>>> It is early days as yet.   I don't think we need a code drop.   AFAIK, none 
>>> of our current Solr applications autocomplete the search box based on 
>>> popular query/title keywords.   We have other applications that do that, 
>>> but they don't use Solr.
>>> 
>>> Thanks again,
>>> 
>>> Dan
>>> 
>>> -Original Message-
>>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>>> Sent: Monday, April 06, 2015 1:42 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Measuring QPS
>>> 
>>> Hi Dan,
>>> 
>>> at willhaben.at (customer of mine) two SOLR components were written 
>>> for SOLR 3 and ported to SOLR 4
>>> 
>>> 1) SlowQueryLog which dumps long-running search requests into a log 
>>> file
>>> 
>>> 2) Most Frequent Search Terms allowing to query & filter the most 
>>> frequent user search terms over the browser
>>> 
>>> Some notes along the line
>>> 
>>> 
>>> * For both components I have the "GO" to open source them but I never 
>>> had enough time to do that (shame on me) - see
>>> https://issues.apache.org/jira/browse/SOLR-4056
>>> 
>>> * The Most Frequent Search Term component actually mimics a SOLR 
>>> server you feed the user search terms so this might be a better 
>>> solution in the long run. But this requires to have a separate SOLR 
>>> core & ingest  plus GUI (check out SILK or ELK) - in other words more 
>>> moving parts in production :-)
>>> 
>>> * If there is sufficient interest I can make a code drop on GitHub
>>> 
>>> Cheers,
>>> 
>>> Siegfried Goeschl
>>> 
>>> 
>>> 
 On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C] 
  wrote:
 
 Siegfried,
 
 This is a wonderful find.   The second presentation is a nice write-up of 
 a large number of free tools.   The first presentation prompts a question 
 - did you add custom request handlers/code to automate determination of 
 bes

Solr 5.0.0 integration with Nutch 1.9

2015-04-06 Thread Anchit Jain
I want to index nutch results using *Solr 5.0* but as mentioned in
https://wiki.apache.org/nutch/NutchTutorial there is no directory
${APACHE_SOLR_HOME}/example/solr/collection1/conf/
in Solr 5.0. So where do I have to copy *schema.xml*?
Also there is no *start.jar* present in the example directory.


RE: Measuring QPS

2015-04-06 Thread Davis, Daniel (NIH/NLM) [C]
This sounds really good:

"For load testing, we replay production logs to test that we meet the SLA at a 
given traffic level."

The rest sounds complicated.   Ah well, that's the job.

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Monday, April 06, 2015 2:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Measuring QPS

We built a servlet request filter that is configured in front of the Solr 
servlets. It reports response times to metricsd, using the Codahale library.

That gives us counts, rates, and response time metrics. We mostly look at 
percentiles, because averages are thrown off by outliers. Average is just the 
wrong metric for a one-sided distribution like response times.

We use Graphite to display the 95th percentile response time for each request 
handler. We use Tattle for alerting on those metrics.

We also use New Relic for a different look at the performance. It is good at 
tracking from the front end through to Solr.

For load testing, we replay production logs to test that we meet the SLA at a 
given traffic level.

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Apr 6, 2015, at 11:31 AM, Davis, Daniel (NIH/NLM) [C]  
wrote:

> OK,
> 
> I have a lot of chutzpah posting that here ;) The other guys answering the
> questions can probably explain it better.
> I love showing off, however, so please forgive me.
> 
> -Original Message-
> From: Davis, Daniel (NIH/NLM) [C]
> Sent: Monday, April 06, 2015 2:25 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Measuring QPS
> 
> It's very common to do autocomplete based on popular queries/titles over some
> sliding time window.   Some enterprise search systems even apply age 
> weighting so that they don't need to re-index but continuously add to the 
> index.   This way, they can do autocomplete based on what's popular these 
> days.
> 
> We use relevance/field boosts/phrase matching etc. to get the best guess 
> about what results they want to see.   This is similar - we use relevance, 
> field boosting to guess what users want to search for.   Zipf's law applies 
> to searches as well as results.
> 
> -Original Message-
> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
> Sent: Monday, April 06, 2015 2:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring QPS
> 
> Hi Daniel,
> 
> interesting - I never thought of autocompletion but for keeping track 
> of user behaviour :-)
> 
> * the numbers are helpful for the online advertisement team to sell 
> campaigns
> * it is used for sanity checks - sensible queries returning no results 
> or returning too many results
> 
> Cheers,
> 
> Siegfried Goeschl
> 
>> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C]  
>> wrote:
>> 
>> Siegfried,
>> 
>> It is early days as yet.   I don't think we need a code drop.   AFAIK, none 
>> of our current Solr applications autocomplete the search box based on 
>> popular query/title keywords.   We have other applications that do that, but 
>> they don't use Solr.
>> 
>> Thanks again,
>> 
>> Dan
>> 
>> -Original Message-
>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>> Sent: Monday, April 06, 2015 1:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Measuring QPS
>> 
>> Hi Dan,
>> 
>> at willhaben.at (customer of mine) two SOLR components were written 
>> for SOLR 3 and ported to SOLR 4
>> 
>> 1) SlowQueryLog which dumps long-running search requests into a log 
>> file
>> 
>> 2) Most Frequent Search Terms allowing to query & filter the most 
>> frequent user search terms over the browser
>> 
>> Some notes along the line
>> 
>> 
>> * For both components I have the "GO" to open source them but I never 
>> had enough time to do that (shame on me) - see
>> https://issues.apache.org/jira/browse/SOLR-4056
>> 
>> * The Most Frequent Search Term component actually mimics a SOLR 
>> server you feed the user search terms so this might be a better 
>> solution in the long run. But this requires to have a separate SOLR 
>> core & ingest  plus GUI (check out SILK or ELK) - in other words more 
>> moving parts in production :-)
>> 
>> * If there is sufficient interest I can make a code drop on GitHub
>> 
>> Cheers,
>> 
>> Siegfried Goeschl
>> 
>> 
>> 
>>> On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C] 
>>>  wrote:
>>> 
>>> Siegfried,
>>> 
>>> This is a wonderful find.   The second presentation is a nice write-up of a 
>>> large number of free tools.   The first presentation prompts a question - 
>>> did you add custom request handlers/code to automate determination of best 
>>> user search terms?   Did any of your custom work end-up in Solr?
>>> 
>>> Thank you so much,
>>> 
>>> Dan
>>> 
>>> P.S. - your first presentation takes me back to seeing "Angriff der
>>> Klonkrieger" in Berlin after a conference - Hayden Christensen was less 
>>> annoying in German, because my wife and I don't speak German ;)   I haven't 
>>> thought of that in a while

Trouble GetSpans lucene 4

2015-04-06 Thread Test Test
Hi,

I'm working on the Taming Text book. I'm trying to upgrade the code from Solr
3.6 to Solr 4.10.2. At the moment, I have a problem with the method
"getSpans": "spans.next()" always returns "false". Can anyone help?

SpanNearQuery sQuery = (SpanNearQuery) origQuery;
SolrIndexSearcher searcher = rb.req.getSearcher();
IndexReader reader = searcher.getIndexReader();
AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
Spans spans = sQuery.getSpans(wrapper.getContext(),
    new Bits.MatchAllBits(reader.numDocs()), termContexts);
while (spans.next() == true) {
  // ...
}

Thanks. Regards.



Re: Config join parse in solrconfig.xml

2015-04-06 Thread Frank li
The error message was from the query with "debug=query".

On Mon, Apr 6, 2015 at 11:49 AM, Frank li  wrote:

> Hi Erick,
>
>
> Thanks for your response.
>
> Here is the query I am sending:
>
> http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:apple&fq=type:PartyLawyerLawfirm&facet=true&facet.field=lawyer_id_lms&facet.mincount=1&rows=0
>
> You can see it has "all_text:apple". I added field name "all_text",
> because it gives an error without it.
>
> Errors:
>
> undefined field all_text number party
> name all_code ent_name (response code 400)
>
>
> These fields are defined as the default search fields in our
> solrconfig.xml file:
>
> <str name="df">all_text number party name all_code ent_name</str>
>
>
> Thanks,
>
> Fudong
>
> On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson 
> wrote:
>
>> You have to show us several more things:
>>
>> 1> what exactly does the query look like?
>> 2> what do you expect?
>> 3> output when you specify &debug=query
>> 4> anything else that would help. You might review:
>>
>> http://wiki.apache.org/solr/UsingMailingLists
>>
>> Best,
>> Erick
>>
>> On Fri, Apr 3, 2015 at 10:58 AM, Frank li  wrote:
>> > Hi,
>> >
>> > I am starting using join parser with our solr. We have some default
>> fields.
>> > They are defined in solrconfig.xml:
>> >
>> >   <lst name="defaults">
>> >     <str name="defType">edismax</str>
>> >     <str name="echoParams">explicit</str>
>> >     <int name="rows">10</int>
>> >     <str name="df">all_text number party name all_code ent_name</str>
>> >     <str name="qf">all_text number^3 name^5 party^3 all_code^2
>> >       ent_name^7</str>
>> >     <str name="fl">id description market_sector_type parent ult_parent
>> >       ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as
>> >       *_s *_ss *_ds *_sms *_ss *_bs</str>
>> >     <str name="q.op">AND</str>
>> >   </lst>
>> >
>> >
>> > I found out that once I use the join parser, it does not recognize the default
>> > fields any more. How do I modify the configuration for this?
>> >
>> > Thanks,
>> >
>> > Fred
>>
>
>


Re: Backup within SolrCloud

2015-04-06 Thread Timothy Potter
I wrote a simple backup utility for a Collection that uses the
replication handler, see:
https://github.com/LucidWorks/solr-scale-tk/blob/master/src/main/java/com/lucidworks/SolrCloudTools.java#L614
feel free to borrow / steal if useful.
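
The replication-handler route mentioned below boils down to an HTTP call
like this (core name, backup location, and snapshot name are placeholders):

curl "http://localhost:8983/solr/collection1/replication?command=backup&location=/backups&name=nightly"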

On Mon, Apr 6, 2015 at 12:42 PM, Davis, Daniel (NIH/NLM) [C]
 wrote:
> I withdraw this question - it is covered in the Solr 5 reference manual.   
> The suggestion is to use the replication handler, which suggests that this 
> scheme still works.   That's how I will go.
>
> From: Davis, Daniel (NIH/NLM) [C]
> Sent: Monday, April 06, 2015 2:29 PM
> To: solr-user@lucene.apache.org
> Subject: Backup within SolrCloud
>
> So, we have replication, but what if something bad is indexed into the 
> cluster, or someone accidentally deletes *:* on some collection?
> How do people manage backup in SolrCloud?
>
> I'm primarily interested in smaller indexes where backup is at all feasible.  
>  I imagine a system such as Facebook really has to hope it stays up and maybe 
> do something similar to log shipping, e.g. keep intermediate results that 
> were fed into Solr.   My applications are likely to be a bit smaller, and 
> classic system backups apply.
>
> Thanks,
>
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>


Re: Config join parse in solrconfig.xml

2015-04-06 Thread Frank li
Hi Erick,


Thanks for your response.

Here is the query I am sending:
http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:apple&fq=type:PartyLawyerLawfirm&facet=true&facet.field=lawyer_id_lms&facet.mincount=1&rows=0

You can see it has "all_text:apple". I added field name "all_text", because
it gives an error without it.

Errors:

undefined field all_text number party
name all_code ent_name (response code 400)


These fields are defined as the default search fields in our
solrconfig.xml file:

<str name="df">all_text number party name all_code ent_name</str>


Thanks,

Fudong

On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson 
wrote:

> You have to show us several more things:
>
> 1> what exactly does the query look like?
> 2> what do you expect?
> 3> output when you specify &debug=query
> 4> anything else that would help. You might review:
>
> http://wiki.apache.org/solr/UsingMailingLists
>
> Best,
> Erick
>
> On Fri, Apr 3, 2015 at 10:58 AM, Frank li  wrote:
> > Hi,
> >
> > I am starting using join parser with our solr. We have some default
> fields.
> > They are defined in solrconfig.xml:
> >
> >   <lst name="defaults">
> >     <str name="defType">edismax</str>
> >     <str name="echoParams">explicit</str>
> >     <int name="rows">10</int>
> >     <str name="df">all_text number party name all_code ent_name</str>
> >     <str name="qf">all_text number^3 name^5 party^3 all_code^2
> >       ent_name^7</str>
> >     <str name="fl">id description market_sector_type parent ult_parent
> >       ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as
> >       *_s *_ss *_ds *_sms *_ss *_bs</str>
> >     <str name="q.op">AND</str>
> >   </lst>
> >
> >
> > I found out that once I use the join parser, it does not recognize the default
> > fields any more. How do I modify the configuration for this?
> >
> > Thanks,
> >
> > Fred
>


Re: Measuring QPS

2015-04-06 Thread Walter Underwood
We built a servlet request filter that is configured in front of the Solr 
servlets. It reports response times to metricsd, using the Codahale library.

That gives us counts, rates, and response time metrics. We mostly look at 
percentiles, because averages are thrown off by outliers. Average is just the 
wrong metric for a one-sided distribution like response times.

We use Graphite to display the 95th percentile response time for each request 
handler. We use Tattle for alerting on those metrics.

We also use New Relic for a different look at the performance. It is good at 
tracking from the front end through to Solr.

For load testing, we replay production logs to test that we meet the SLA at a 
given traffic level.

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
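
A minimal sketch of the kind of filter described above, using the Codahale
(now Dropwizard) Metrics Timer; this is not the actual code, and the metric
name, registry wiring, and reporter are assumptions:

import java.io.IOException;
import javax.servlet.*;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class QueryTimingFilter implements Filter {
    // One registry per webapp; attaching a GraphiteReporter to it would
    // ship the counts, rates, and percentiles on to metricsd/Graphite.
    private static final MetricRegistry registry = new MetricRegistry();
    private Timer requests;

    public void init(FilterConfig config) {
        requests = registry.timer("solr.requests");
    }

    public void doFilter(ServletRequest req, ServletResponse res,
                         FilterChain chain) throws IOException, ServletException {
        final Timer.Context ctx = requests.time(); // start the clock
        try {
            chain.doFilter(req, res);              // hand off to the Solr servlet
        } finally {
            ctx.stop();                            // record the elapsed time
        }
    }

    public void destroy() {}
}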

On Apr 6, 2015, at 11:31 AM, Davis, Daniel (NIH/NLM) [C]  
wrote:

> OK,
> 
> I have a lot of chutzpah posting that here ;) The other guys answering the
> questions can probably explain it better.
> I love showing off, however, so please forgive me.
> 
> -Original Message-
> From: Davis, Daniel (NIH/NLM) [C] 
> Sent: Monday, April 06, 2015 2:25 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Measuring QPS
> 
> It's very common to do autocomplete based on popular queries/titles over some
> sliding time window.   Some enterprise search systems even apply age 
> weighting so that they don't need to re-index but continuously add to the 
> index.   This way, they can do autocomplete based on what's popular these 
> days.
> 
> We use relevance/field boosts/phrase matching etc. to get the best guess 
> about what results they want to see.   This is similar - we use relevance, 
> field boosting to guess what users want to search for.   Zipf's law applies 
> to searches as well as results.
> 
> -Original Message-
> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
> Sent: Monday, April 06, 2015 2:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring QPS
> 
> Hi Daniel,
> 
> interesting - I never thought of autocompletion, only of keeping track of user 
> behaviour :-)
> 
> * the numbers are helpful for the online advertisement team to sell campaigns
> * it is used for sanity checks - sensible queries returning no results or 
> returning too many results
> 
> Cheers,
> 
> Siegfried Goeschl
> 
>> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C]  
>> wrote:
>> 
>> Siegfried,
>> 
>> It is early days as yet.   I don't think we need a code drop.   AFAIK, none 
>> of our current Solr applications autocomplete the search box based on 
>> popular query/title keywords.   We have other applications that do that, but 
>> they don't use Solr.
>> 
>> Thanks again,
>> 
>> Dan
>> 
>> -Original Message-
>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>> Sent: Monday, April 06, 2015 1:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Measuring QPS
>> 
>> Hi Dan,
>> 
>> at willhaben.at (customer of mine) two SOLR components were written 
>> for SOLR 3 and ported to SOLR 4
>> 
>> 1) SlowQueryLog which dumps long-running search requests into a log 
>> file
>> 
>> 2) Most Frequent Search Terms allowing to query & filter the most 
>> frequent user search terms over the browser
>> 
>> Some notes along the line
>> 
>> 
>> * For both components I have the “GO" to open source them but I never 
>> had enough time to do that (shame on me) - see
>> https://issues.apache.org/jira/browse/SOLR-4056
>> 
>> * The Most Frequent Search Term component actually mimics a SOLR 
>> server that you feed the user search terms, so this might be a better 
>> solution in the long run. But this requires a separate SOLR core & 
>> ingest plus GUI (check out SILK or ELK) - in other words more moving 
>> parts in production :-)
>> 
>> * If there is sufficient interest I can make a code drop on GitHub
>> 
>> Cheers,
>> 
>> Siegfried Goeschl
>> 
>> 
>> 
>>> On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C] 
>>>  wrote:
>>> 
>>> Siegfried,
>>> 
>>> This is a wonderful find.   The second presentation is a nice write-up of a 
>>> large number of free tools.   The first presentation prompts a question - 
>>> did you add custom request handlers/code to automate determination of best 
>>> user search terms?   Did any of your custom work end up in Solr?
>>> 
>>> Thank you so much,
>>> 
>>> Dan
>>> 
>>> P.S. - your first presentation takes me back to seeing "Angriff der 
>>> Klonkrieger" in Berlin after a conference - Hayden Christensen was less 
>>> annoying in German, because my wife and I don't speak German ;)   I haven't 
>>> thought of that in a while.
>>> 
>>> -Original Message-
>>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>>> Sent: Saturday, April 04, 2015 4:54 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Measuring QPS
>>> 
>>> Hi Dan,
>>> 
>>> I’m using JavaMelody for my SOLR production servers - gives you the 
>>> relevant HTTP stats (what’s happening now & historical data) plus J

RE: Backup within SolrCloud

2015-04-06 Thread Davis, Daniel (NIH/NLM) [C]
I withdraw this question - it is covered in the Solr 5 reference manual.   The 
recommendation is to use the replication handler, which suggests that this 
scheme still works.   That's the way I will go.
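
For reference, a replication-handler backup is triggered with an HTTP call
along these lines (host, core name, and location illustrative):

http://localhost:8983/solr/collection1/replication?command=backup&location=/backups/solr&numberToKeep=2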

From: Davis, Daniel (NIH/NLM) [C]
Sent: Monday, April 06, 2015 2:29 PM
To: solr-user@lucene.apache.org
Subject: Backup within SolrCloud

So, we have replication, but what if something bad is indexed into the cluster, 
or someone accidentally deletes *:* on some collection?
How do people manage backup in SolrCloud?

I'm primarily interested in smaller indexes where backup is at all feasible.   
I imagine a system such as Facebook really has to hope it stays up and maybe do 
something similar to log shipping, e.g. keep intermediate results that were fed 
into Solr.   My applications are likely to be a bit smaller, and classic system 
backups apply.

Thanks,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH



Re: Measuring QPS

2015-04-06 Thread Siegfried Goeschl
Appreciated :-)

Siegfried Goeschl

> On 06 Apr 2015, at 20:31, Davis, Daniel (NIH/NLM) [C]  
> wrote:
> 
> OK,
> 
> I have a lot of chutzpah posting that here ;)   The other guys answering the 
> questions can probably explain it better.
> I love showing off, however, so please forgive me.
> 
> -Original Message-
> From: Davis, Daniel (NIH/NLM) [C] 
> Sent: Monday, April 06, 2015 2:25 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Measuring QPS
> 
> It's very common to do autocomplete based on popular queries/titles over some 
> sliding time window.   Some enterprise search systems even apply age 
> weighting so that they don't need to re-index but continuously add to the 
> index.   This way, they can do autocomplete based on what's popular these 
> days.
> 
> We use relevance/field boosts/phrase matching etc. to get the best guess 
> about what results they want to see.   This is similar - we use relevance, 
> field boosting to guess what users want to search for.   Zipf's law applies 
> to searches as well as results.
> 
> -Original Message-
> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
> Sent: Monday, April 06, 2015 2:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring QPS
> 
> Hi Daniel,
> 
> interesting - I never thought of autocompletion, only of keeping track of user 
> behaviour :-)
> 
> * the numbers are helpful for the online advertisement team to sell campaigns
> * it is used for sanity checks - sensible queries returning no results or 
> returning too many results
> 
> Cheers,
> 
> Siegfried Goeschl
> 
>> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C]  
>> wrote:
>> 
>> Siegfried,
>> 
>> It is early days as yet.   I don't think we need a code drop.   AFAIK, none 
>> of our current Solr applications autocomplete the search box based on 
>> popular query/title keywords.   We have other applications that do that, but 
>> they don't use Solr.
>> 
>> Thanks again,
>> 
>> Dan
>> 
>> -Original Message-
>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>> Sent: Monday, April 06, 2015 1:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Measuring QPS
>> 
>> Hi Dan,
>> 
>> at willhaben.at (customer of mine) two SOLR components were written 
>> for SOLR 3 and ported to SOLR 4
>> 
>> 1) SlowQueryLog which dumps long-running search requests into a log 
>> file
>> 
>> 2) Most Frequent Search Terms allowing to query & filter the most 
>> frequent user search terms over the browser
>> 
>> Some notes along the line
>> 
>> 
>> * For both components I have the “GO" to open source them but I never 
>> had enough time to do that (shame on me) - see
>> https://issues.apache.org/jira/browse/SOLR-4056
>> 
>> * The Most Frequent Search Term component actually mimics a SOLR 
>> server that you feed the user search terms, so this might be a better 
>> solution in the long run. But this requires a separate SOLR core & 
>> ingest plus GUI (check out SILK or ELK) - in other words more moving 
>> parts in production :-)
>> 
>> * If there is sufficient interest I can make a code drop on GitHub
>> 
>> Cheers,
>> 
>> Siegfried Goeschl
>> 
>> 
>> 
>>> On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C] 
>>>  wrote:
>>> 
>>> Siegfried,
>>> 
>>> This is a wonderful find.   The second presentation is a nice write-up of a 
>>> large number of free tools.   The first presentation prompts a question - 
>>> did you add custom request handlers/code to automate determination of best 
>>> user search terms?   Did any of your custom work end up in Solr?
>>> 
>>> Thank you so much,
>>> 
>>> Dan
>>> 
>>> P.S. - your first presentation takes me back to seeing "Angriff der 
>>> Klonkrieger" in Berlin after a conference - Hayden Christensen was less 
>>> annoying in German, because my wife and I don't speak German ;)   I haven't 
>>> thought of that in a while.
>>> 
>>> -Original Message-
>>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>>> Sent: Saturday, April 04, 2015 4:54 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Measuring QPS
>>> 
>>> Hi Dan,
>>> 
>>> I’m using JavaMelody for my SOLR production servers - gives you the 
>>> relevant HTTP stats (what’s happening now & historical data) plus JVM 
>>> monitoring as additional benefit. The servers are deployed on Tomcat 
>>> so I’m of little help regarding Jetty - having said that
>>> 
>>> * you need two JARs (javamelody & jrobin)
>>> * tinker with web.xml
>>> 
>>> Here are two of my presentations mentioning JavaMelody (plus some 
>>> other stuff)
>>> 
>>> http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf
>>> http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf
>>> 
>>> Cheers,
>>> 
>>>

RE: Measuring QPS

2015-04-06 Thread Davis, Daniel (NIH/NLM) [C]
OK,

I have a lot of chutzpah posting that here ;)   The other guys answering the 
questions can probably explain it better.
I love showing off, however, so please forgive me.

-Original Message-
From: Davis, Daniel (NIH/NLM) [C] 
Sent: Monday, April 06, 2015 2:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Measuring QPS

It's very common to do autocomplete based on popular queries/titles over some 
sliding time window.   Some enterprise search systems even apply age weighting 
so that they don't need to re-index but continuously add to the index.   This 
way, they can do autocomplete based on what's popular these days.

We use relevance/field boosts/phrase matching etc. to get the best guess about 
what results they want to see.   This is similar - we use relevance, field 
boosting to guess what users want to search for.   Zipf's law applies to 
searches as well as results.

-Original Message-
From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
Sent: Monday, April 06, 2015 2:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Measuring QPS

Hi Daniel,

interesting - I never thought of autocompletion, only of keeping track of user 
behaviour :-)

* the numbers are helpful for the online advertisement team to sell campaigns
* it is used for sanity checks - sensible queries returning no results or 
returning too many results

Cheers,

Siegfried Goeschl

> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C]  
> wrote:
> 
> Siegfried,
> 
> It is early days as yet.   I don't think we need a code drop.   AFAIK, none 
> of our current Solr applications autocomplete the search box based on popular 
> query/title keywords.   We have other applications that do that, but they 
> don't use Solr.
> 
> Thanks again,
> 
> Dan
> 
> -Original Message-
> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
> Sent: Monday, April 06, 2015 1:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring QPS
> 
> Hi Dan,
> 
> at willhaben.at (customer of mine) two SOLR components were written 
> for SOLR 3 and ported to SOLR 4
> 
> 1) SlowQueryLog which dumps long-running search requests into a log 
> file
> 
> 2) Most Frequent Search Terms allowing to query & filter the most 
> frequent user search terms over the browser
> 
> Some notes along the line
> 
> 
> * For both components I have the “GO" to open source them but I never 
> had enough time to do that (shame on me) - see
> https://issues.apache.org/jira/browse/SOLR-4056
> 
> * The Most Frequent Search Term component actually mimics a SOLR 
> server that you feed the user search terms, so this might be a better 
> solution in the long run. But this requires a separate SOLR core & 
> ingest plus GUI (check out SILK or ELK) - in other words more moving 
> parts in production :-)
> 
> * If there is sufficient interest I can make a code drop on GitHub
> 
> Cheers,
> 
> Siegfried Goeschl
> 
> 
> 
>> On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C]  
>> wrote:
>> 
>> Siegfried,
>> 
>> This is a wonderful find.   The second presentation is a nice write-up of a 
>> large number of free tools.   The first presentation prompts a question - 
>> did you add custom request handlers/code to automate determination of best 
>> user search terms?   Did any of your custom work end up in Solr?
>> 
>> Thank you so much,
>> 
>> Dan
>> 
>> P.S. - your first presentation takes me back to seeing "Angriff der 
>> Klonkrieger" in Berlin after a conference - Hayden Christensen was less 
>> annoying in German, because my wife and I don't speak German ;)   I haven't 
>> thought of that in a while.
>> 
>> -Original Message-
>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>> Sent: Saturday, April 04, 2015 4:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Measuring QPS
>> 
>> Hi Dan,
>> 
>> I’m using JavaMelody for my SOLR production servers - gives you the 
>> relevant HTTP stats (what’s happening now & historical data) plus JVM 
>> monitoring as additional benefit. The servers are deployed on Tomcat 
>> so I’m of little help regarding Jetty - having said that
>> 
>> * you need two JARs (javamelody & jrobin)
>> * tinker with web.xml
>> 
>> Here are two of my presentations mentioning JavaMelody (plus some 
>> other stuff)
>> 
>> http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf
>> http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf
>> 
>> Cheers,
>> 
>> Siegfried Goeschl
>> 
>>> On 03 Apr 2015, at 17:53, Shawn Heisey  wrote:
>>> 
>>> On 4/3/2015 9:37 AM, Davis, Daniel (NIH/NLM) [C] wrote:
 I wanted to gather QPS for our production Solr instances, but I was 
 surprised that the Admin UI did not contain this information.   We are 
 ru

Backup within SolrCloud

2015-04-06 Thread Davis, Daniel (NIH/NLM) [C]
So, we have replication, but what if something bad is indexed into the cluster, 
or someone accidentally deletes *:* on some collection?
How do people manage backup in SolrCloud?

I'm primarily interested in smaller indexes where backup is at all feasible.   
I imagine a system such as Facebook really has to hope it stays up and maybe do 
something similar to log shipping, e.g. keep intermediate results that were fed 
into Solr.   My applications are likely to be a bit smaller, and classic system 
backups apply.

Thanks,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH



RE: Measuring QPS

2015-04-06 Thread Davis, Daniel (NIH/NLM) [C]
It's very common to do autocomplete based on popular queries/titles over some 
sliding time window.   Some enterprise search systems even apply age weighting 
so that they don't need to re-index but continuously add to the index.   This 
way, they can do autocomplete based on what's popular these days.

We use relevance/field boosts/phrase matching etc. to get the best guess about 
what results they want to see.   This is similar - we use relevance, field 
boosting to guess what users want to search for.   Zipf's law applies to 
searches as well as results.
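
As one hedged illustration of that sliding-window idea, the day's popular
queries can be posted as plain documents to a separate core (core name and
fields hypothetical):

curl 'http://localhost:8983/solr/searchterms/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"apple-20150406","term":"apple","count":42,"day":"2015-04-06T00:00:00Z"}]'

Autocomplete then becomes an ordinary search against that core, sorted or
boosted by count and recency.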

-Original Message-
From: Siegfried Goeschl [mailto:sgoes...@gmx.at] 
Sent: Monday, April 06, 2015 2:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Measuring QPS

Hi Daniel,

interesting - I never thought of autocompletion, only of keeping track of user 
behaviour :-)

* the numbers are helpful for the online advertisement team to sell campaigns
* it is used for sanity checks - sensible queries returning no results or 
returning too many results

Cheers,

Siegfried Goeschl

> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C]  
> wrote:
> 
> Siegfried,
> 
> It is early days as yet.   I don't think we need a code drop.   AFAIK, none 
> of our current Solr applications autocomplete the search box based on popular 
> query/title keywords.   We have other applications that do that, but they 
> don't use Solr.
> 
> Thanks again,
> 
> Dan
> 
> -Original Message-
> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
> Sent: Monday, April 06, 2015 1:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring QPS
> 
> Hi Dan,
> 
> at willhaben.at (customer of mine) two SOLR components were written 
> for SOLR 3 and ported to SOLR 4
> 
> 1) SlowQueryLog which dumps long-running search requests into a log 
> file
> 
> 2) Most Frequent Search Terms allowing to query & filter the most 
> frequent user search terms over the browser
> 
> Some notes along the line
> 
> 
> * For both components I have the “GO" to open source them but I never 
> had enough time to do that (shame on me) - see 
> https://issues.apache.org/jira/browse/SOLR-4056
> 
> * The Most Frequent Search Term component actually mimics a SOLR 
> server that you feed the user search terms, so this might be a better 
> solution in the long run. But this requires a separate SOLR core & 
> ingest plus GUI (check out SILK or ELK) - in other words more moving 
> parts in production :-)
> 
> * If there is sufficient interest I can make a code drop on GitHub
> 
> Cheers,
> 
> Siegfried Goeschl
> 
> 
> 
>> On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C]  
>> wrote:
>> 
>> Siegfried,
>> 
>> This is a wonderful find.   The second presentation is a nice write-up of a 
>> large number of free tools.   The first presentation prompts a question - 
>> did you add custom request handlers/code to automate determination of best 
>> user search terms?   Did any of your custom work end up in Solr?
>> 
>> Thank you so much,
>> 
>> Dan
>> 
>> P.S. - your first presentation takes me back to seeing "Angriff der 
>> Klonkrieger" in Berlin after a conference - Hayden Christensen was less 
>> annoying in German, because my wife and I don't speak German ;)   I haven't 
>> thought of that in a while.
>> 
>> -Original Message-
>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>> Sent: Saturday, April 04, 2015 4:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Measuring QPS
>> 
>> Hi Dan,
>> 
>> I’m using JavaMelody for my SOLR production servers - gives you the 
>> relevant HTTP stats (what’s happening now & historical data) plus JVM 
>> monitoring as additional benefit. The servers are deployed on Tomcat 
>> so I’m of little help regarding Jetty - having said that
>> 
>> * you need two JARs (javamelody & jrobin)
>> * tinker with web.xml
>> 
>> Here are two of my presentations mentioning JavaMelody (plus some 
>> other stuff)
>> 
>> http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf
>> http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf
>> 
>> Cheers,
>> 
>> Siegfried Goeschl
>> 
>>> On 03 Apr 2015, at 17:53, Shawn Heisey  wrote:
>>> 
>>> On 4/3/2015 9:37 AM, Davis, Daniel (NIH/NLM) [C] wrote:
 I wanted to gather QPS for our production Solr instances, but I was 
 surprised that the Admin UI did not contain this information.   We are 
 running a mix of versions, but mostly 4.10 at this point.   We are not 
 using SolrCloud at present; that's part of why I'm checking - I want to 
 validate the size of our existing setup and what sort of SolrCloud setup 
 would be needed to centralize several of them.
 
 What is the best way to gather QPS informatio

Re: Measuring QPS

2015-04-06 Thread Siegfried Goeschl
Hi Daniel,

interesting - I never thought of autocompletion, only of keeping track of user 
behaviour :-)

* the numbers are helpful for the online advertisement team to sell campaigns
* it is used for sanity checks - sensible queries returning no results or 
returning too many results

Cheers,

Siegfried Goeschl

> On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C]  
> wrote:
> 
> Siegfried,
> 
> It is early days as yet.   I don't think we need a code drop.   AFAIK, none 
> of our current Solr applications autocomplete the search box based on popular 
> query/title keywords.   We have other applications that do that, but they 
> don't use Solr.
> 
> Thanks again,
> 
> Dan
> 
> -Original Message-
> From: Siegfried Goeschl [mailto:sgoes...@gmx.at] 
> Sent: Monday, April 06, 2015 1:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring QPS
> 
> Hi Dan,
> 
> at willhaben.at (customer of mine) two SOLR components were written for SOLR 
> 3 and ported to SOLR 4
> 
> 1) SlowQueryLog which dumps long-running search requests into a log file
> 
> 2) Most Frequent Search Terms allowing to query & filter the most frequent 
> user search terms over the browser
> 
> Some notes along the line
> 
> 
> * For both components I have the “GO" to open source them but I never had 
> enough time to do that (shame on me) - see 
> https://issues.apache.org/jira/browse/SOLR-4056
> 
> * The Most Frequent Search Term component actually mimics a SOLR server that 
> you feed the user search terms, so this might be a better solution in the 
> long run. But this requires a separate SOLR core & ingest plus GUI (check 
> out SILK or ELK) - in other words more moving parts in production :-)
> 
> * If there is sufficient interest I can make a code drop on GitHub 
> 
> Cheers,
> 
> Siegfried Goeschl
> 
> 
> 
>> On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C]  
>> wrote:
>> 
>> Siegfried,
>> 
>> This is a wonderful find.   The second presentation is a nice write-up of a 
>> large number of free tools.   The first presentation prompts a question - 
>> did you add custom request handlers/code to automate determination of best 
>> user search terms?   Did any of your custom work end up in Solr?
>> 
>> Thank you so much,
>> 
>> Dan
>> 
>> P.S. - your first presentation takes me back to seeing "Angriff der 
>> Klonkrieger" in Berlin after a conference - Hayden Christensen was less 
>> annoying in German, because my wife and I don't speak German ;)   I haven't 
>> thought of that in a while.
>> 
>> -Original Message-
>> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
>> Sent: Saturday, April 04, 2015 4:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Measuring QPS
>> 
>> Hi Dan,
>> 
>> I’m using JavaMelody for my SOLR production servers - gives you the 
>> relevant HTTP stats (what’s happening now & historical data) plus JVM 
>> monitoring as additional benefit. The servers are deployed on Tomcat 
>> so I’m of little help regarding Jetty - having said that
>> 
>> * you need two JARs (javamelody & jrobin)
>> * tinker with web.xml
>> 
>> Here are two of my presentations mentioning JavaMelody (plus some 
>> other stuff)
>> 
>> http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf
>> http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf
>> 
>> Cheers,
>> 
>> Siegfried Goeschl
>> 
>>> On 03 Apr 2015, at 17:53, Shawn Heisey  wrote:
>>> 
>>> On 4/3/2015 9:37 AM, Davis, Daniel (NIH/NLM) [C] wrote:
 I wanted to gather QPS for our production Solr instances, but I was 
 surprised that the Admin UI did not contain this information.   We are 
 running a mix of versions, but mostly 4.10 at this point.   We are not 
 using SolrCloud at present; that's part of why I'm checking - I want to 
 validate the size of our existing setup and what sort of SolrCloud setup 
 would be needed to centralize several of them.
 
 What is the best way to gather QPS information?
 
 What is the best way to add information like this to the Admin UI, if I 
 decide to take that step?
>>> 
>>> As of Solr 4.1 (three years ago), request rate information is 
>>> available in the admin UI and via JMX.  In the admin UI, choose a 
>>> core from the dropdown, click on Plugins/Stats, then QUERYHANDLER, 
>>> and open the handler you wish to examine.  You have 
>>> avgRequestsPerSecond, which is calculated for the entire runtime of 
>>> the SolrCore, as well as 5minRateReqsPerSecond and 
>>> 15minRateReqsPerSecond, which are far more useful pieces of information.
>>> 
>>> https://issues.apache.org/jira/browse/SOLR-1972
>>> 
>>> Thanks,
>>> Shawn
>>> 
>> 
> 



RE: Measuring QPS

2015-04-06 Thread Davis, Daniel (NIH/NLM) [C]
Siegfried,

It is early days as yet.   I don't think we need a code drop.   AFAIK, none of 
our current Solr applications autocomplete the search box based on popular 
query/title keywords.   We have other applications that do that, but they don't 
use Solr.

Thanks again,

Dan

-Original Message-
From: Siegfried Goeschl [mailto:sgoes...@gmx.at] 
Sent: Monday, April 06, 2015 1:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Measuring QPS

Hi Dan,

at willhaben.at (customer of mine) two SOLR components were written for SOLR 3 
and ported to SOLR 4

1) SlowQueryLog which dumps long-running search requests into a log file

2) Most Frequent Search Terms allowing to query & filter the most frequent user 
search terms over the browser

Some notes along the line


* For both components I have the “GO" to open source them but I never had 
enough time to do that (shame on me) - see 
https://issues.apache.org/jira/browse/SOLR-4056

* The Most Frequent Search Term component actually mimics a SOLR server that 
you feed the user search terms, so this might be a better solution in the long 
run. But this requires a separate SOLR core & ingest plus GUI (check out 
SILK or ELK) - in other words more moving parts in production :-)

* If there is sufficient interest I can make a code drop on GitHub 

Cheers,

Siegfried Goeschl



> On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C]  
> wrote:
> 
> Siegfried,
> 
> This is a wonderful find.   The second presentation is a nice write-up of a 
> large number of free tools.   The first presentation prompts a question - did 
> you add custom request handlers/code to automate determination of best user 
> search terms?   Did any of your custom work end up in Solr?
> 
> Thank you so much,
> 
> Dan
> 
> P.S. - your first presentation takes me back to seeing "Angriff der 
> Klonkrieger" in Berlin after a conference - Hayden Christensen was less 
> annoying in German, because my wife and I don't speak German ;)   I haven't 
> thought of that in a while.
> 
> -Original Message-
> From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
> Sent: Saturday, April 04, 2015 4:54 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring QPS
> 
> Hi Dan,
> 
> I’m using JavaMelody for my SOLR production servers - gives you the 
> relevant HTTP stats (what’s happening now & historical data) plus JVM 
> monitoring as additional benefit. The servers are deployed on Tomcat 
> so I’m of little help regarding Jetty - having said that
> 
> * you need two JARs (javamelody & jrobin)
> * tinker with web.xml
> 
> Here are two of my presentations mentioning JavaMelody (plus some 
> other stuff)
> 
> http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf
> http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf
> 
> Cheers,
> 
> Siegfried Goeschl
> 
>> On 03 Apr 2015, at 17:53, Shawn Heisey  wrote:
>> 
>> On 4/3/2015 9:37 AM, Davis, Daniel (NIH/NLM) [C] wrote:
>>> I wanted to gather QPS for our production Solr instances, but I was 
>>> surprised that the Admin UI did not contain this information.   We are 
>>> running a mix of versions, but mostly 4.10 at this point.   We are not 
>>> using SolrCloud at present; that's part of why I'm checking - I want to 
>>> validate the size of our existing setup and what sort of SolrCloud setup 
>>> would be needed to centralize several of them.
>>> 
>>> What is the best way to gather QPS information?
>>> 
>>> What is the best way to add information like this to the Admin UI, if I 
>>> decide to take that step?
>> 
>> As of Solr 4.1 (three years ago), request rate information is 
>> available in the admin UI and via JMX.  In the admin UI, choose a 
>> core from the dropdown, click on Plugins/Stats, then QUERYHANDLER, 
>> and open the handler you wish to examine.  You have 
>> avgRequestsPerSecond, which is calculated for the entire runtime of 
>> the SolrCore, as well as 5minRateReqsPerSecond and 
>> 15minRateReqsPerSecond, which are far more useful pieces of information.
>> 
>> https://issues.apache.org/jira/browse/SOLR-1972
>> 
>> Thanks,
>> Shawn
>> 
> 



Re: HDFS Locking

2015-04-06 Thread Joseph Obernberger
Looks like after 900 seconds, it times out and starts up.  I think the 
issue is that I'm using the bin/solr start/stop script, and it waits 
only 5 seconds before sending a kill -9.  In my experience with solr 
4.10.x and HDFS, that is not enough time to wait for a large shard to 
stop when using HDFS.  I've seen it take well over a minute to stop.
I'm not sure if the index is going to be missing data, or if it will be 
corrupt at this point.


-Joe

On 4/6/2015 1:35 PM, Joseph Obernberger wrote:
Having a couple issues with restarts of a 27 shard cluster using 
SolrCloud 5.0.0 and HDFS.  I'm getting errors that a lock file exists 
and the shard will not start.  When I delete the file, that shard 
starts OK.


On another shard, I'm getting the following message:
538220 [coreLoadExecutor-5-thread-1] INFO 
org.apache.solr.util.FSHDFSUtils - recoverLease=false, attempt=466 on 
file=hdfs://nameservice1:8020/solr5/MAINCOLL/core_node8/data/tlog/tlog.0002971 
after 526067ms


It has been doing this for 526 seconds, and doesn't seem to be coming 
up.  I've tried restarting it several times, but it seems to be in an 
infinite loop retrying.  Help!

Thank you.

-Joe
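
For anyone hitting the same lock error: the stale lock file can be removed by
hand before restarting. A sketch only - the path below is modeled on the tlog
path in the log above and is therefore illustrative, not taken from Joe's
cluster:

hdfs dfs -rm hdfs://nameservice1:8020/solr5/MAINCOLL/core_node8/data/index/write.lock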





Re: Measuring QPS

2015-04-06 Thread Siegfried Goeschl
Hi Dan,

at willhaben.at (customer of mine) two SOLR components were written for SOLR 3 
and ported to SOLR 4

1) SlowQueryLog which dumps long-running search requests into a log file

2) Most Frequent Search Terms allowing to query & filter the most frequent user 
search terms over the browser

Some notes along the line


* For both components I have the “GO" to open source them but I never had 
enough time to do that (shame on me) - see 
https://issues.apache.org/jira/browse/SOLR-4056

* The Most Frequent Search Term component actually mimics a SOLR server that 
you feed the user search terms, so this might be a better solution in the long 
run. But this requires a separate SOLR core & ingest plus GUI (check out 
SILK or ELK) - in other words more moving parts in production :-)

* If there is sufficient interest I can make a code drop on GitHub 

Cheers,

Siegfried Goeschl



> On 06 Apr 2015, at 16:25, Davis, Daniel (NIH/NLM) [C]  
> wrote:
> 
> Siegfried,
> 
> This is a wonderful find.   The second presentation is a nice write-up of a 
> large number of free tools.   The first presentation prompts a question - did 
> you add custom request handlers/code to automate determination of best user 
> search terms?   Did any of your custom work end up in Solr?
> 
> Thank you so much,
> 
> Dan
> 
> P.S. - your first presentation takes me back to seeing "Angriff der 
> Klonkrieger" in Berlin after a conference - Hayden Christensen was less 
> annoying in German, because my wife and I don't speak German ;)   I haven't 
> thought of that in a while.
> 
> -Original Message-
> From: Siegfried Goeschl [mailto:sgoes...@gmx.at] 
> Sent: Saturday, April 04, 2015 4:54 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Measuring QPS
> 
> Hi Dan,
> 
> I’m using JavaMelody for my SOLR production servers - gives you the relevant 
> HTTP stats (what’s happening now & historical data) plus JVM monitoring as 
> additional benefit. The servers are deployed on Tomcat so I’m of little help 
> regarding Jetty - having said that
> 
> * you need two JARs (javamelody & jrobin)
> * tinker with web.xml
> 
> Here are two of my presentations mentioning JavaMelody (plus some other stuff)
> 
> http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf
> http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf
> 
> Cheers,
> 
> Siegfried Goeschl
> 
>> On 03 Apr 2015, at 17:53, Shawn Heisey  wrote:
>> 
>> On 4/3/2015 9:37 AM, Davis, Daniel (NIH/NLM) [C] wrote:
>>> I wanted to gather QPS for our production Solr instances, but I was 
>>> surprised that the Admin UI did not contain this information.   We are 
>>> running a mix of versions, but mostly 4.10 at this point.   We are not 
>>> using SolrCloud at present; that's part of why I'm checking - I want to 
>>> validate the size of our existing setup and what sort of SolrCloud setup 
>>> would be needed to centralize several of them.
>>> 
>>> What is the best way to gather QPS information?
>>> 
>>> What is the best way to add information like this to the Admin UI, if I 
>>> decide to take that step?
>> 
>> As of Solr 4.1 (three years ago), request rate information is 
>> available in the admin UI and via JMX.  In the admin UI, choose a core 
>> from the dropdown, click on Plugins/Stats, then QUERYHANDLER, and open 
>> the handler you wish to examine.  You have avgRequestsPerSecond, which 
>> is calculated for the entire runtime of the SolrCore, as well as 
>> 5minRateReqsPerSecond and 15minRateReqsPerSecond, which are far more 
>> useful pieces of information.
>> 
>> https://issues.apache.org/jira/browse/SOLR-1972
>> 
>> Thanks,
>> Shawn
>> 
> 



HDFS Locking

2015-04-06 Thread Joseph Obernberger
Having a couple issues with restarts of a 27 shard cluster using 
SolrCloud 5.0.0 and HDFS.  I'm getting errors that a lock file exists 
and the shard will not start.  When I delete the file, that shard starts OK.


On another shard, I'm getting the following message:
538220 [coreLoadExecutor-5-thread-1] INFO 
org.apache.solr.util.FSHDFSUtils - recoverLease=false, attempt=466 on 
file=hdfs://nameservice1:8020/solr5/MAINCOLL/core_node8/data/tlog/tlog.0002971 
after 526067ms


It has been doing this for 526 seconds, and doesn't seem to be coming 
up.  I've tried restarting it several times, but it seems to be in an 
infinite loop retrying.  Help!

Thank you.

-Joe


RE: Measuring QPS

2015-04-06 Thread Davis, Daniel (NIH/NLM) [C]
Siegfried,

This is a wonderful find.   The second presentation is a nice write-up of a 
large number of free tools.   The first presentation prompts a question - did 
you add custom request handlers/code to automate determination of best user 
search terms?   Did any of your custom work end up in Solr?
 
Thank you so much,

Dan

P.S. - your first presentation takes me back to seeing "Angriff der Klonkrieger" 
in Berlin after a conference - Hayden Christensen was less annoying in German, 
because my wife and I don't speak German ;)   I haven't thought of that in a 
while.

-Original Message-
From: Siegfried Goeschl [mailto:sgoes...@gmx.at] 
Sent: Saturday, April 04, 2015 4:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Measuring QPS

Hi Dan,

I’m using JavaMelody for my SOLR production servers - gives you the relevant 
HTTP stats (what’s happening now & historical data) plus JVM monitoring as 
additional benefit. The servers are deployed on Tomcat so I’m of little help 
regarding Jetty - having said that

* you need two JARs (javamelody & jrobin)
* tinker with web.xml

Here are two of my presentations mentioning JavaMelody (plus some other stuff)

http://people.apache.org/~sgoeschl/presentations/solr-from-development-to-production-20121210.pdf
http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf

Cheers,

Siegfried Goeschl

> On 03 Apr 2015, at 17:53, Shawn Heisey  wrote:
> 
> On 4/3/2015 9:37 AM, Davis, Daniel (NIH/NLM) [C] wrote:
>> I wanted to gather QPS for our production Solr instances, but I was 
>> surprised that the Admin UI did not contain this information.   We are 
>> running a mix of versions, but mostly 4.10 at this point.   We are not using 
>> SolrCloud at present; that's part of why I'm checking - I want to validate 
>> the size of our existing setup and what sort of SolrCloud setup would be 
>> needed to centralize several of them.
>> 
>> What is the best way to gather QPS information?
>> 
>> What is the best way to add information like this to the Admin UI, if I 
>> decide to take that step?
> 
> As of Solr 4.1 (three years ago), request rate information is 
> available in the admin UI and via JMX.  In the admin UI, choose a core 
> from the dropdown, click on Plugins/Stats, then QUERYHANDLER, and open 
> the handler you wish to examine.  You have avgRequestsPerSecond, which 
> is calculated for the entire runtime of the SolrCore, as well as 
> 5minRateReqsPerSecond and 15minRateReqsPerSecond, which are far more 
> useful pieces of information.
> 
> https://issues.apache.org/jira/browse/SOLR-1972
> 
> Thanks,
> Shawn
> 
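
The rates Shawn describes can also be pulled over HTTP for scripting, via the
mbeans handler (core name illustrative):

curl 'http://localhost:8983/solr/collection1/admin/mbeans?cat=QUERYHANDLER&stats=true&wt=json'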



Re: Unable to update config file using zkcli or RELOAD

2015-04-06 Thread Noble Paul
The behavior has changed from Solr 5.0 onwards.


Please refer to the "How does it work" section here:
https://cwiki.apache.org/confluence/display/solr/Config+API

TL;DR:

* Every node watches the conf set directory it is using
* Updating individual files WILL NOT trigger a config reload. But if you
modify the config set dir, it will trigger a reload. This is how the Config
API works.
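
A minimal example of going through the Config API instead of editing files by
hand (collection name and property illustrative):

curl http://localhost:8983/solr/collection1/config \
  -H 'Content-Type: application/json' \
  -d '{"set-property": {"updateHandler.autoCommit.maxTime": 15000}}'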



On Mon, Apr 6, 2015 at 2:34 AM, Shawn Heisey  wrote:

> On 4/5/2015 12:32 AM, Shai Erera wrote:
> > So, the questions that I have are:
> >
> >    1. It does look like Solr re-loads cores on configuration changes, is
> >    that true?
> >    2. If (1) is YES, do I still need to manually invoke a collection
> >    RELOAD explicitly after updating the configuration?
> >    3. Can someone explain the errors I see in the log, even though the
> >    test passes and, as I indicated, I'm able to index more documents and
> >    search them?
>
> My experience with SolrCloud is somewhat limited and even more limited
> in recent versions, but it was my understanding that config/schema
> changes do not result in new behavior without an explicit reload, unless
> you are using the managed schema or managed resources API to make the
> changes.  I don't know much about how the managed APIs actually work.
>
> Trying to interpret the stacktraces makes my brain hurt.  You probably
> know a lot more about the code for those areas than I do.
>
> Thanks,
> Shawn
>
>


-- 
-
Noble Paul


filtering indexed documents with multiple filters

2015-04-06 Thread Ali Nazemian
Dear all,
Hi,
I am looking for a way to filter a Lucene index with multiple conditions.
For this purpose I tried two different methods of filtered search; neither
of them works for me:

Using BooleanQuery:

BooleanQuery query = new BooleanQuery();
String lower = "*";
String upper = "*";
for (String fieldName : keywordSourceFields) {
  TermRangeQuery rangeQuery = TermRangeQuery.newStringRange(fieldName,
  lower, upper, true, true);
  query.add(rangeQuery, Occur.MUST);
}
TermRangeQuery rangeQuery = TermRangeQuery.newStringRange(keywordField,
lower, upper, true, true);
query.add(rangeQuery, Occur.MUST_NOT);
try {
  TopDocs results = searcher.search(query, null,
  maxNumDocs);


Using BooleanFilter:

BooleanFilter filter = new BooleanFilter();
String lower = "*";
String upper = "*";
for (String fieldName : keywordSourceFields) {
  TermRangeFilter rangeFilter =
TermRangeFilter.newStringRange(fieldName,
  lower, upper, true, true);
  filter.add(rangeFilter, Occur.MUST_NOT);
}
TermRangeFilter rangeFilter =
TermRangeFilter.newStringRange(keywordField,
lower, upper, true, true);
filter.add(rangeFilter, Occur.MUST);
try {
  TopDocs results = searcher.search(new MatchAllDocsQuery(), filter,
  maxNumDocs);

I was wondering which part of these queries is wrong. I am looking for
documents in which each of the keywordSourceFields has some value AND the
keyword field has no value. Please guide me in correcting the
corresponding query.

Best regards.

-- 
A.Nazemian
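
Two things look suspect in the snippets above, assuming stock Lucene 4.x
semantics: TermRangeQuery treats the string "*" as a literal term (an
open-ended "has any value" range takes null bounds), and the MUST/MUST_NOT
flags in the BooleanFilter variant are inverted relative to the stated goal.
A hedged sketch of the intended query, reusing the variables from the
question:

// match documents where every keyword-source field has some value ...
BooleanQuery query = new BooleanQuery();
for (String fieldName : keywordSourceFields) {
  // null lower/upper bounds mean "any term in this field"
  query.add(TermRangeQuery.newStringRange(fieldName, null, null, true, true),
      BooleanClause.Occur.MUST);
}
// ... and the keyword field has none
query.add(TermRangeQuery.newStringRange(keywordField, null, null, true, true),
    BooleanClause.Occur.MUST_NOT);
TopDocs results = searcher.search(query, maxNumDocs);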


Solr 4.2.0 index corruption issue

2015-04-06 Thread Puneet Jain
Hi Guys,

I have been using 4.2.0 for more than a year, and since October 2014 I have
been facing an index corruption issue. It now happens every day, and we have
to build a fresh index as a temporary fix. Please find the logs below, where
I can see an error while replicating data from master to slave; the index
corruption shows up on the slave nodes:

2015-04-05 00:00:37,671 ERROR snapPuller-15-thread-1 [handler.SnapPuller] -
Error closing the file stream: _1re_Lucene41_0.tim
java.io.IOException: Input/output error
at java.io.RandomAccessFile.close0(Native Method)
at java.io.RandomAccessFile.close(RandomAccessFile.java:543)
at
org.apache.lucene.store.FSDirectory$FSIndexOutput.close(FSDirectory.java:494)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1223)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1117)
at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:744)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:398)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:281)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)

Not finding an exact solution, I was thinking of upgrading to SOLR 4.7.0,
since it uses newer versions of httpcomponents and I thought the older
version might have some issues. Can someone please recommend what can be
done to avoid the index corruption issue in SOLR 4.2.0?

Thanks in advance..!

Thanks & Regards,
Puneet
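
One hedged diagnostic step in the meantime: Lucene ships a CheckIndex tool
that reports which segments are damaged (classpath and index path
illustrative; run it against a copy of the index, since the -fix option
drops unreadable segments):

java -cp lucene-core-4.2.0.jar org.apache.lucene.index.CheckIndex /var/solr/data/index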


Spellchecker for Hindi (Indian Language) Content

2015-04-06 Thread anshumandash88
Hi,

I have been trying to make the Solr spellchecker work for Indian local-language
content (Hindi specifically), but it doesn't seem to work no matter what I
try. It could be that I am missing something small, or that Solr isn't equipped
to handle spellcheck for Hindi content.

Please let me know your thoughts. Please find the details of my approach and
the output I am getting at the following link,

http://stackoverflow.com/questions/29407771/solr-spellchecker-component-unexpected-behavior-for-non-english-language
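
Solr's spellcheckers are language-agnostic at the term level, so Hindi should
work in principle; a frequent culprit is aggressive analysis (normalization or
stemming) on the spellcheck field rewriting terms before they reach the
dictionary. A hedged sketch of a component built on a lightly analyzed copy
field - the field and type names are illustrative, not taken from the Stack
Overflow post:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_hi_spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell_hi</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>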

  





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecker-for-Hindi-Indian-Language-Content-tp4197789.html
Sent from the Solr - User mailing list archive at Nabble.com.