Re: SOLR 4.4 - Slave always replicates full index

2014-06-25 Thread Dominik Siebel
Hey Suresh,

could you get a little more specific on what solved your problem here?
I am currently facing the same problem and am trying to find a proper
solution.
Thanks!

~ Dom


2014-02-28 7:46 GMT+01:00 sureshrk19 sureshr...@gmail.com:

 Thanks Shawn and Erick.

 I followed SOLR configuration document and modified index strategy.

 Looks good now. I haven't seen any problems in last 1 week.

 Thanks for your suggestions.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-4-4-Slave-always-replicates-full-index-tp4113089p4120337.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH on Solr

2014-06-25 Thread atp
Thanks Ahmet and Wolfgang. I have installed hbase-indexer on one of the servers,
but here too I am unable to start the hbase-indexer server.

Error: Could not find or load main class com.ngdata.hbaseindexer.Main

I have properly set the JAVA_HOME and INDEXER_HOME environment variables.


Please guide.

Thanks.
ATP 
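
(A hedged sketch of the setup described above; the paths and the "server" subcommand are from memory and may differ by hbase-indexer version. A "Could not find or load main class" error usually means the indexer jars are not on the classpath, so checking that they exist under the install is a reasonable first step:)

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk   # assumed path
export INDEXER_HOME=/opt/hbase-indexer         # assumed path
ls $INDEXER_HOME/lib/hbase-indexer-*.jar       # the Main class should live in these jars
$INDEXER_HOME/bin/hbase-indexer server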





--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-on-Solr-tp4143669p4143955.html
Sent from the Solr - User mailing list archive at Nabble.com.


default query operator ignored by edismax query parser

2014-06-25 Thread Johannes Siegert

Hi,

I have defined the following request handler using the edismax query parser:

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="mm">100%</str>
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="q.op">AND</str>
    <str name="qf">field1^2.0 field2</str>
    <str name="rows">10</str>
    <str name="fl">*</str>
  </lst>
</requestHandler>


My search query looks like:

q=(word1 word2) OR (word3 word4)

Since I specified AND as default query operator, the query should match 
documents by ((word1 AND word2) OR (word3 AND word4)) but the query 
matches documents by ((word1 OR word2) OR (word3 OR word4)).


Could anyone explain the behaviour?

Thanks!

Johannes

P.S. The query q=(word1 word2) matches documents by (word1 AND word2)


Re: No results for a wildcard query for text_general field in solr 4.1

2014-06-25 Thread Sven Schönfeldt
Thanks for the answers.

I will try to solve my problem by extracting the affected text and indexing that 
part into another string field, where the wildcard query works as expected.
The Solr queries will be extended by an „OR“ on that new field; that should 
work for my case.


Yours truly, Sven 
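
(A minimal sketch of the approach Sven describes above, with assumed field names: the extracted text goes into a string field, which keeps it un-tokenized so a wildcard such as test\-or* can match, and the application then ORs the two fields together:)

<field name="searchField_t" type="text_general" indexed="true" stored="true"/>
<field name="searchField_raw" type="string" indexed="true" stored="false"/>

q=searchField_t:(some query) OR searchField_raw:test\-or*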
 
Am 24.06.2014 um 17:48 schrieb Jack Krupansky j...@basetechnology.com:

 I think I am officially tired of having to explain why Solr doesn't do what 
 users expect for this query. I mean, I can accept that low-level Lucene 
 should work strictly on the decomposed terms of test test-or*, but it is very 
 reasonable for users (even EXPERT users) to expect that the Solr query parser 
 will generate what the complex phrase query parser generates.
 
 See:
 https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
 
 Having to use a separate query parser for this obvious, common case is... 
 absurd.
 
 (What does Elasticsearch do for this case??)
 
 -- Jack Krupansky
 
 -Original Message- From: Erick Erickson
 Sent: Tuesday, June 24, 2014 11:38 AM
 To: solr-user@lucene.apache.org ; Ahmet Arslan
 Subject: Re: No results for a wildcard query for text_general field in solr 
 4.1
 
 Wildcards are a tough thing to get your head around. I
 think my first post on the users list was titled
 "I just don't get wildcards at all" or something like that...
 
 Right, wildcards aren't tokenized. So by getting your term
 through the query parsing as a single token, including the
 hyphen, when the analyzer sees that it's a wildcard
 it doesn't break on the hyphen. So it's looking for a single
 token. And since there is no single term like
 "test-or123" you get no matches.
 
 I'm afraid this is just how it works. You can do something like
 replace the hyphen at the app layer. But I don't think there's
 a way to do what you want OOB.
 
 Best,
 Erick
 
 On Tue, Jun 24, 2014 at 1:55 AM, Ahmet Arslan iori...@yahoo.com.invalid 
 wrote:
 Hi Sven,
 
 StandardTokenizerFactory splits it into two pieces. You can confirm this at 
 the analysis page.
 If this is something you don't want, let us know.
 We can help you to create an analysis chain that suits your needs.
 
 Ahmet
 
 
 On Tuesday, June 24, 2014 10:39 AM, Sven Schönfeldt 
 schoenfe...@subshell.com wrote:
 Hi Erick,
 
 That is what I did: tried that input on the analysis page.
 
 The index field splits the value into two words: „test“ and „or123“.
 Now checking the query at the analysis page, the word is also split 
 into „test“ and „or123“.
 
 When I run the query and look at the debug result, I see that there is no 
 splitting of words. That's what I expect…
 
 <str name="rawquerystring">searchField_t:test\-or123*</str>
 <str name="querystring">searchField_t:test\-or123*</str>
 <str name="parsedquery">searchField_t:test-or123*</str>
 <str name="parsedquery_toString">searchField_t:test-or123*</str>
 
 Without the wildcard, the word is also split into two parts:
 
 <str name="rawquerystring">searchField_t:test\-or123</str>
 <str name="querystring">searchField_t:test\-or123</str>
 <str name="parsedquery">searchField_t:test searchField_t:or123</str>
 <str name="parsedquery_toString">searchField_t:test searchField_t:or123</str>
 
 Any idea which configuration is responsible for that behavior?
 
 Thanks!
 
 
 
 
 
 Am 23.06.2014 um 22:55 schrieb Erick Erickson erickerick...@gmail.com:
 
 Well, you can do more than guess by looking at the admin/analysis page
 and trying your input on the field in question. That'll show you what
 actual transformations are performed.
 
 You're probably right though. Try adding debug=query to your URL to
 see what the actual parsed query looks like and compare with the
 admin/analysis page
 
 But yeah, it's a matter of getting all the parts (query parser and
 analysis chains) to do the right thing.
 
 Best,
 Erick
 
 On Mon, Jun 23, 2014 at 7:30 AM, Sven Schönfeldt
 schoenfe...@subshell.com wrote:
 Hi Solr-Users,
 
 I am trying to do a wildcard query on a dynamic text field (_t), but don’t 
 get the right result.
 The configuration for the field type is „text_general“, the default 
 configuration:
 
 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 Input for the text field is „test-or123“ and my query looks like 
 „test\-or*“.
 
 It seems that the input is already split into two words: „test“ and 
 

Re: OOM during indexing nested docs

2014-06-25 Thread adfel70
I made two tests, one with MaxRamBuffer=128 and the second with 
MaxRamBuffer=256.
In both I got OOM.

I also made two tests on autocommit:
one with commit every 5 min, and the second with commit every 100,000 docs
(soft commit disabled).
In both I got OOM.

Merge policy: Tiered (max segment size of 5000, merge at once = 2,
merge factor = 12).

any idea for more tests?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/OOM-during-indexing-nested-docs-tp4143722p4143966.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: default query operator ignored by edismax query parser

2014-06-25 Thread Shawn Heisey
On 6/25/2014 1:05 AM, Johannes Siegert wrote:
 I have defined the following request handler using the edismax query parser:
 
 <requestHandler name="/search" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="mm">100%</str>
     <str name="defType">edismax</str>
     <float name="tie">0.01</float>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
     <str name="q.op">AND</str>
     <str name="qf">field1^2.0 field2</str>
     <str name="rows">10</str>
     <str name="fl">*</str>
   </lst>
 </requestHandler>
 
 
 My search query looks like:
 
 q=(word1 word2) OR (word3 word4)
 
 Since I specified AND as default query operator, the query should match
 documents by ((word1 AND word2) OR (word3 AND word4)) but the query
 matches documents by ((word1 OR word2) OR (word3 OR word4)).
 
 Could anyone explain the behaviour?

I believe that you are running into this bug:

https://issues.apache.org/jira/browse/SOLR-2649

It's a very old bug, coming up on three years.  The workaround is to not
use boolean operators at all, or to use operators EVERYWHERE so that
your intent is explicitly described.  It is not much of a workaround,
but it does work.

Thanks,
Shawn
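
(Applied to the query above, Shawn's workaround means spelling out every operator explicitly, e.g.:)

q=(word1 AND word2) OR (word3 AND word4)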



Re: TokenFilter not working at index time

2014-06-25 Thread Erlend Garåsen

On 24.06.14 17:33, Erick Erickson wrote:

Hmmm. It would help if you posted a couple of other
pieces of information BTW, if this is new code are you
considering donating it back? If so please open a JIRA so
we can track it, see: http://wiki.apache.org/solr/HowToContribute


All my other language improvements for the existing Norwegian stemmers 
have been donated back to Solr, so yes, if possible. I want to 
experiment a little bit before I open a ticket.



But to your question:
First couple of things I'd do:
1 see what the admin/analysis page tells you happens.


Shows correct results for index and query. The lemmatizer is able to 
find the correct stem.



2 attach debug=query to your test case, see what the parsed
 query looks like.


Seems to be OK. Remember that the problem is related to indexing, not 
querying. I have double-checked by indexing all the documents with another 
stemmer and configuring my lemmatizer only for queries. Then everything 
works as it should. Here's the query. As you can see, "studentene" is 
stemmed to "student" for two fields (content_no and title_no), which is 
correct:


BoostedQuery(boost(+(title_en:studentene^10.0 | host:studentene^30.0 | 
content_en:studentene^0.1 | content_no:student^0.1 | 
title_no:student^10.0 | anchortext_partial:studentene^70.0 | 
subjectcode:studentene^100.0 | canonicalurl:studentene^5.0)~0.2 () () () 
() () (product(int(url_toplevel),const(5)))^20.0 
(2.0/(1.0*float(int(url_levels))+1.0))^250.0 
(product(float(docrank),const(1)))^4.0 
(1.0/(3.16E-11*float(ms(const(1403686863701),date(last_modified)))+1.0))^50.0 
(product(int(url_landingpage),const(3)))^40.0,product(float(urlboost),map(query(language:no,def=0.0),0.0,0.0,1.0



3 use the admin/schema browser link for the field in question
to see what actually makes it into the index. (Or use Luke or
even the TermsComponent).


I haven't played much around with this, but it says 27 for docs if I 
select the field content_no. Does this mean that there are only 27 
documents in my index with data in this field? Then there is something 
really bad going on, because if I change to content_en, this number 
grows to 10526 (because another English stemmer is used for that field 
instead).


If I change to NorwegianMinimalStemFilter and reindex everything, the 
number grows to 28270.


By writing out debugging info from my stemmer, I just figured out that 
only the document's titles are being stemmed at index time, not the 
content itself. So I have found the root of the problem, but I'm not 
sure why the field is omitted.


Erlend


Not able to save SolrInputDocument object in Solr database

2014-06-25 Thread Srikanth GONE
Hi,

I am getting an exception while saving a SolrInputDocument object from Java on
the client server,

but on my local machine it works fine.

org.apache.solr.common.SolrException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:768)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:193)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
at 
com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
at 
com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
at
com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
... 22 more

My Java code is:

try {
    SolrInputDocument doc = new SolrInputDocument();
    String docID = doctor.getId();
    doc.addField("_id", docID, 1.0f);
    doc.addField("providerType", "Doctor");
    doc.addField("experience", doctor.getExperience());
    doc.addField("specialities", doctor.getSpecialities());
    doc.addField("specialty", doctor.getSpecialty());
    doc.addField("firstName", doctor.getFirstName());
    doc.addField("lastName", doctor.getLastName());
    doc.addField("medicine", doctor.getMedicine());
    doc.addField("registrationNumber", doctor.getRegistrationNumber());
    doc.addField("registrationYear", doctor.getRegistrationYear());
    doc.addField("description", doctor.getDescription());
    if (doctor.getAddressInfo() != null) {
        doc.addField("primaryClinic", doctor.getAddressInfo().getPrimaryClinic());
        doc.addField("address", doctor.getAddressInfo().getAddress());
        doc.addField("state", doctor.getAddressInfo().getState());
        doc.addField("country", doctor.getAddressInfo().getCountry());
        doc.addField("pincode", doctor.getAddressInfo().getPincode());
        doc.addField("mobile", doctor.getAddressInfo().getMobile());
        doc.addField("phone1", doctor.getAddressInfo().getPhone1());
        doc.addField("email", 

[ANNOUNCE] Apache Solr 4.9.0 released

2014-06-25 Thread Robert Muir
25 June 2014, Apache Solr™ 4.9.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.9.0

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.9.0 is available for immediate download at:
  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Solr 4.9.0 Release Highlights:

* Numerous optimizations for doc values search-time performance

* Allow a client application to request the minimum achieved replication
  factor for an update request (single or batch) by sending an optional
  parameter min_rf.

* Query re-ranking support with the new ReRankingQParserPlugin.

* A new [child ...] DocTransformer for optionally including Block-Join
  descendant documents inline in the results of a search.

* A new (default) Lucene49NormsFormat to better compress certain cases
  such as very short fields.


Solr 4.9.0 also includes many other new features as well as numerous
optimizations and bugfixes of the corresponding Apache Lucene release.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please
try another mirror.  This also goes for Maven access.

On behalf of the Lucene PMC,
Happy Searching


Re: Solr 4.8 result page display changes and highlighting

2014-06-25 Thread Erik Hatcher
Vicky - were you able to get the results page formatted how you’d like? You 
may want to tweak results_list.vm or a sub- (or maybe parent?) template from 
there to achieve what you want.

Erik



On Jun 18, 2014, at 10:02 AM, vicky vi...@raytheon.com wrote:

 Hi Everyone,
 
 I just installed solr 4.8 release and playing with DIH and Velocity
 configuration. 
 
 I am trying to change the result page columns to display more fields and a
 tabular format, since I have 1 rows to display on one page, if I can with
 the out-of-the-box configuration.
 
 I also tried the highlighting feature in 4.8, and out of the box it is not working.
 
 Has anyone run into this issue? Please advise.
 
 All help is appreciated in advance.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-4-8-result-page-desplay-changes-and-highlighting-tp4142504.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: default query operator ignored by edismax query parser

2014-06-25 Thread Johannes Siegert

Thanks Shawn!

In this case I will use operators everywhere.

Johannes


Am 25.06.2014 15:09, schrieb Shawn Heisey:

On 6/25/2014 1:05 AM, Johannes Siegert wrote:

I have defined the following request handler using the edismax query parser:

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="mm">100%</str>
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="q.op">AND</str>
    <str name="qf">field1^2.0 field2</str>
    <str name="rows">10</str>
    <str name="fl">*</str>
  </lst>
</requestHandler>


My search query looks like:

q=(word1 word2) OR (word3 word4)

Since I specified AND as default query operator, the query should match
documents by ((word1 AND word2) OR (word3 AND word4)) but the query
matches documents by ((word1 OR word2) OR (word3 OR word4)).

Could anyone explain the behaviour?

I believe that you are running into this bug:

https://issues.apache.org/jira/browse/SOLR-2649

It's a very old bug, coming up on three years.  The workaround is to not
use boolean operators at all, or to use operators EVERYWHERE so that
your intent is explicitly described.  It is not much of a workaround,
but it does work.

Thanks,
Shawn



Re: OOM during indexing nested docs

2014-06-25 Thread Tang, Rebecca
How big is your request size from client to server?

I ran into OOM problems too. For me the reason was that I was sending big
requests (1+ docs) at too fast a pace.

So I put a throttle on the client to control the throughput of the request
it sends to the server, and that got rid of the OOM error.
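
(A minimal SolrJ sketch of that kind of client-side batching and throttling; the batch size, sleep, URL, and document source are assumptions, not Rebecca's actual code:)

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ThrottledIndexer {
    public static void index(Iterable<SolrInputDocument> docs) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (SolrInputDocument doc : docs) {
            batch.add(doc);
            if (batch.size() >= 500) {  // send small batches instead of one huge request
                server.add(batch);
                batch.clear();
                Thread.sleep(200);      // crude throttle between requests
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit();
        server.shutdown();
    }
}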


Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library legacy.library.ucsf.edu/
E: rebecca.t...@ucsf.edu




On 6/25/14 1:45 AM, adfel70 adfe...@gmail.com wrote:

I made two tests, one with MaxRamBuffer=128 and the second with
MaxRamBuffer=256.
In both i got OOM.

I also made two tests on autocommit:
one with commit every 5 min, and the second with commit every 100,000
docs.
(disabled softcommit)
In both i got OOM.

merge policy - Tiered (max segment size of 5000, and merged at once = 2,
merge factor = 12).

any idea for more tests?



--
View this message in context:
http://lucene.472066.n3.nabble.com/OOM-during-indexing-nested-docs-tp41437
22p4143966.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Solr on S3FileSystem, Kosmos, GlusterFS, etc….

2014-06-25 Thread jay vyas
Hi Paul.

I'm not using it on S3 -- but yes, I don't think S3 would be ideal for Solr
at all.   There are several other Hadoop Compatible File Systems, however,
some of which might be ideal for certain types of SolrCloud workloads.

Anyways... would love to see a Solr wiki page on FileSystem compatibility,
possibly an entry linking here: https://wiki.apache.org/hadoop/HCFS.

In the meantime, I will update this thread if I find anything interesting
when we increase load size.


On Wed, Jun 25, 2014 at 1:34 AM, Paul Libbrecht p...@hoplahup.net wrote:

 I've always been under the impression that file-system access speed is
 crucial for Lucene-based storage and have always advocated not to use NFS
 for that (for which we saw a slowdown of a factor of roughly 5). Has
 there been any performance measurement made for such a setting? Is FS caching
 suddenly getting so much better that it is not a problem?

 Also, as far as I know S3 bills by the amount of (giga-)bytes exchanged….
 this gives plenty of room, but if each start needs to exchange a big part
 of the index from the storage to the solr server because of cache filling,
 it looks like it won't be that cheap.

 thanks for experience report.

 paul


 On 25 juin 2014, at 07:16, Jay Vyas jayunit100.apa...@gmail.com wrote:

  Hi Solr !
 
  I got this working .  Here's how :
 
 With the example jetty runner, you can extract the tarball and go to
 the examples/ directory, where you can launch an embedded core. Then, find
 the solrconfig.xml file. Edit it to contain the following XML:

 <directoryFactory name="DirectoryFactory"
                   class="org.apache.solr.core.HdfsDirectoryFactory">
   <str name="solr.hdfs.home">myhcfs:///solr</str>
   <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
 </directoryFactory>
 
  the confdir is important: That is where you will have something like a
 core-site.xml that defines all the parameters for your filesystem
 (fs.defaultFS, fs.mycfs.impl…. and so on).
 
 
  This tells solr, when launched, to use myhcfs as the underlying file
 store.
 
 You also should make sure that the classes from the jar for your plugin (in
 our case glusterfs; hadoop will reference it by looking up the dynamically
 generated parameters that come from the base uri myhcfs…) are on
 the class path, and that the hadoop-common jar is also there (some HCFS shims
 will need FilterFileSystem to run correctly, which is only in
 hadoop-common.jar).
 
 So - how to modify the running Solr core's class path?
 
  To do so – you can update the solrconfig.xml jar directives. There are a
 bunch of regular expression templates you can modify in the
 examples/.../solrconfig.xml file. You can also copy the jars in at runtime,
 to be really safe.
 
  Once your example core with gluster configuration is setup, launch it
 with the following properties:
 
  java -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs
 -Dsolr.data.dir=glusterfs:///solr -Dsolr.updatelog=glusterfs:///solr
 -Dlog4j.configuration=file:/opt/solr-4.4.0-cdh5.0.2/example/etc/logging.properties
 -jar start.jar
 
  This starts a basic SOLR server on port 8983.
 
  If you are running from the simple jetty based examples which I've used
 to describe this above, then you should see the collection1 core up and
 running, and you should see its index sitting inside the /solr directory of
 your file system.
 
  Hope this helps those interested in expanding the use of SolrCloud
 outside of a single FS.
 
 
  On Jun 23, 2014, at 6:16 PM, Jay Vyas jayunit100.apa...@gmail.com
 wrote:
 
  Hi folks.  Does anyone deploy solr indices on other HCFS
 implementations (S3FileSystem, for example) regularly ? If so I'm wondering
 
  1) Where are the docs for doing this - or examples?  Seems like
 everything, including parameter names for dfs setup, are based around
 hdfs.   Maybe I should file a JIRA similar to
 https://issues.apache.org/jira/browse/FLUME-2410 (to make the generic
 deployment of SOLR on any file system explicit / obvious).
 
  2) if there are any interesting requirements (i.e. createNonRecursive,
 Atomic mkdirs, sharing, blocking expectations etc etc) which need to be
 implemented
 




-- 
jay vyas


How much free disk space will I need to optimize my index

2014-06-25 Thread johnmunir
Hi,


I need to de-fragment my index.  My question is, how much free disk space do I 
need before I can do so?  My understanding is that I need free disk space equal 
to 1x my current un-optimized index size before I can optimize it.  Is this true?


That is, let's say my index is 20 GB (un-optimized); then I must have 20 GB of 
free disk space to make sure the optimization is successful.  The reason for 
this is that during optimization the index is re-written (is this the case?), 
and even if it is already optimized, the re-write will create a new 20 GB index 
before it deletes the old one (is this true?), which is why there must be at 
least 20 GB of free disk space.


Can someone help me with this or point me to a wiki on this topic?


Thanks!!!


- MJ


RE: How much free disk space will I need to optimize my index

2014-06-25 Thread Markus Jelsma


 
 
-Original message-
 From:johnmu...@aol.com johnmu...@aol.com
 Sent: Wednesday 25th June 2014 20:13
 To: solr-user@lucene.apache.org
 Subject: How much free disk space will I need to optimize my index
 
 Hi,
 
 
 I need to de-fragment my index.  My question is, how much free disk space do I 
 need before I can do so?  My understanding is that I need free disk space equal 
 to 1x my current un-optimized index size before I can optimize it.  Is this 
 true?

Yes, 20 GB of FREE space to force merge an existing 20 GB index.
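
(For reference, a sketch of forcing the merge once the space is available, assuming a core named collection1 on the default port:)

curl "http://localhost:8983/solr/collection1/update?optimize=true&maxSegments=1"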

 
 
 That is, let's say my index is 20 GB (un-optimized); then I must have 20 GB of 
 free disk space to make sure the optimization is successful.  The reason for 
 this is that during optimization the index is re-written (is this the 
 case?), and even if it is already optimized, the re-write will create a new 20 GB 
 index before it deletes the old one (is this true?), which is why there must be 
 at least 20 GB of free disk space.
 
 
 Can someone help me with this or point me to a wiki on this topic?
 
 
 Thanks!!!
 
 
 - MJ
 


Re: Double cast exception with grouping and sort function

2014-06-25 Thread Nate Dire
 Can you provide some sample data to demonstrates the problem? (ideally
 using the 4.x example configs - but if you can't reproduce with that
 then providing your own configs would be helpful)

I repro'd using the example config (with sharding).  I was missing one
necessary condition: the schema needs a * dynamic field.
It looks like serializeSearchGroup matches the sort expression as the
* field, thus marshalling the double as TextField.

Should I enter a ticket with the full repro?

Thanks,

Nate

On Tue, Jun 24, 2014 Chris Hostetter hossman_luc...@fucit.org wrote:

 : I recently tried upgrading our setup from 4.5.1 to 4.7+, and I'm
 : seeing an exception when I use (1) a function to sort and (2) result
 : grouping.  The same query works fine with either (1) or (2) alone.
 : Example below.

 Did you modify your schema in any way when upgrading?

 Can you provide some sample data to demonstrates the problem? (ideally
 using the 4.x example configs - but if you can't reproduce with that
 then providing your own configs would be helpful)

 I was unable to reproduce it doing a quick sanity check using the example
 with a shards param to force a distrib query...

 http://localhost:8983/solr/select?q=*:*&shards=localhost:8983/solr&sort=sum%281,1%29%20desc&group=true&group.field=inStock

 It's possible that the distributed grouping code has a bug in it related
 to the marshalling of sort values and I'm just not tickling that bug
 with my quick check ... but if I remember correctly work was done to fix
 grouped sorting to correctly deal with this when
 FieldType.marshalSortValue was introduced.


 : Example (v4.8.1):
 : {
 :   "responseHeader": {
 :     "status": 500,
 :     "QTime": 14,
 :     "params": {
 :       "sort": "sum(1,1) desc",
 :       "indent": "true",
 :       "q": "title:solr",
 :       "_": "1403586036335",
 :       "group.field": "type",
 :       "group": "true",
 :       "wt": "json"
 :     }
 :   },
 :   "error": {
 :     "msg": "java.lang.Double cannot be cast to 
 : org.apache.lucene.util.BytesRef",
 :     "trace": 
 : "org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 : java.lang.Double cannot be cast to org.apache.lucene.util.BytesRef",
 :     "code": 500
 :   }
 : }
 :
 : From the log:
 :
 : org.apache.solr.common.SolrException;
 : null:java.lang.ClassCastException: java.lang.Double cannot be cast to
 : org.apache.lucene.util.BytesRef
 : at 
 org.apache.solr.schema.FieldType.marshalStringSortValue(FieldType.java:981)
 : at 
 org.apache.solr.schema.TextField.marshalSortValue(TextField.java:176)
 : at 
 org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.serializeSearchGroup(SearchGroupsResultTransformer.java:125)
 : at 
 org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:65)
 : at 
 org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:43)
 : at 
 org.apache.solr.search.grouping.CommandHandler.processResult(CommandHandler.java:193)


 -Hoss
 http://www.lucidworks.com/


Re: Double cast exception with grouping and sort function

2014-06-25 Thread Chris Hostetter

: I repro'd using the example config (with sharding).  I was missing one
: necessary condition: the schema needs a * dynamic field.
: It looks like serializeSearchGroup matches the sort expression as the
: * field, thus marshalling the double as TextField.
: 
: Should I enter a ticket with the full repro?

yes please -- I remember a similar problem coming up in the past, and I 
know we account for it in the distributed sorting tests (I remember adding 
it), but I guess we missed an edge case here with distributed grouping.



-Hoss
http://www.lucidworks.com/


suggest not working 4.8.1

2014-06-25 Thread Hardik P
My configs below are not returning anything in suggest!  Any pointers
please?

solrconf
<searchComponent class="solr.SuggestComponent" name="mysuggestion">
  <lst name="suggester">
    <str name="name">mysuggestion</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">mysuggestion</str>  <!-- the indexed field to derive suggestions from -->
    <float name="threshold">0.0</float>
    <str name="buildOnCommit">true</str>
    <!--
    <str name="sourceLocation">american-english</str>
    -->
  </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">mysuggestion</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>mysuggestion</str>
  </arr>
</requestHandler>


 schema 

<fieldType name="textspell" class="solr.TextField"
           positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>


 response EMPTY! 

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
  </lst>
</response>
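
(One thing worth checking, offered as a hedged guess rather than a confirmed diagnosis: in-memory lookups such as the TST return nothing until they have been built, so a first request with spellcheck.build=true, or a commit that triggers buildOnCommit, is a common missing step. A sketch, assuming a core named collection1:)

curl "http://localhost:8983/solr/collection1/suggest?q=mys&spellcheck=true&spellcheck.build=true"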


Re: Adding router.field property to an existing collection.

2014-06-25 Thread Damien Dykman


Hi Modassar,

I ran into the same issue (Solr 4.8.1) with an existing collection set
to implicit routing but with no router.field defined. I managed to
set the router.field by modifying /clusterstate.json and pushing it
back to Zookeeper. For instance, I use field shard_name for routing.
Now, in my /clusterstate.json, I have:

router:{
  name:implicit,
  field:shard_name
}
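
(A sketch of pushing the edited file back, assuming Solr's bundled zkcli script and ZooKeeper on localhost:2181; the script location varies by install:)

./example/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd putfile /clusterstate.json /tmp/clusterstate.json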

Warning: you'll probably need to reload your collection (see Collection
API) for the change to be taken into account. Or a more brutal way,
restart your Solr nodes. Then you should see the update in
http://localhost:8983/solr/admin/collections?action=clusterstatus.

I'd be curious to know if there's a cleaner method though, rather than
modifying /clusterstate.json.

Otherwise, if you want to create a collection from scratch with implict
routing and a router.field (see Collection API), use:

http://localhost:8983/solr/admin/collections?action=CREATEname=my_collectionrouter.name=implicitrouter.field=shard_name

Good luck,
Damien

On 05/06/2014 05:59 AM, Modassar Ather wrote:
 Hi,

 I have a setup of two shards with embedded zookeeper and one collection on
 two tomcat instances. I cannot use uniqueKey-based routing, i.e. the compositeId
 routing, for document routing, as per my understanding it will change the
 uniqueKey. Another way mentioned on the Solr wiki is using router.field. I
 could not find a way of setting it in solr.xml or another configuration file.

 Kindly share your suggestions on:
  How can I use router.field in an existing collection?
  How can I create a collection with router.field and implicit routing enabled?

 Thanks,
 Modassar





Re: POST Vs GET

2014-06-25 Thread Sameer Maggon
Ravi,

The POST should work. Here's an example that works within tomcat.

curl -X POST --data "q=*:*&rows=1" \
  http://localhost:8080/solr/collection1/select

Sameer.
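
(If Tomcat itself is rejecting the large body, the usual knob is the maxPostSize attribute on the relevant connector (HTTP or AJP) in server.xml; the snippet below is illustrative, the value is in bytes, and -1 disables the limit in Tomcat 7:)

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxPostSize="10485760"
           redirectPort="8443"/>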


On Mon, Jun 23, 2014 at 10:37 AM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

 Hi, I am executing a Solr query that runs 10 to 12 lines with all the boosting
 and conditions. I changed the HTTP method from GET to POST, as POST
 doesn't have the same restriction on size. But I am getting an error. I am
 using Tomcat 7. Is there any place we need to specify in Tomcat to accept
 large POSTs?

 FYI, from my Jetty Solr setup everything works fine.

 Thanks

 Ravi




-- 
*Sameer Maggon*
http://measuredsearch.com


Facet for calculated Column

2014-06-25 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi,

Is it possible to get the facet count for calculated values, along with regular 
columns?

e.g. I have Price and MSRP; I'd like to get how many are on sale (Price < MSRP):

Onsale (10) 
Jeans (20)
Shirts (50)


Above, Jeans and Shirts are there in schema.xml, and I can add them to the facet 
fields. How can I get the Onsale count in the same hit?

Thanks

Ravi
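
(One possible approach, offered as a hedged sketch rather than a confirmed answer: facet.query accepts an arbitrary query, including a function range over the two fields, so an "on sale" bucket can be requested alongside the regular field facets; the field names here are assumptions:)

q=*:*&facet=true&facet.field=category&facet.query={!frange u=0 incu=false}sub(price,msrp)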


Re: How to extend the behavior of a common text field (such as text_general) to recognize regex

2014-06-25 Thread Vinay B,
Thanks, I tried your suggestion today

1. Define a text_num fieldType:

<fieldType name="text_num" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="\s*[0-9][0-9-\s]*[0-9]?\s*" group="0"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

2. Define a new text field to capture numerical data and link it to the
text field via a copyField:

<field name="text_il" type="text_num" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="text" dest="text_il" maxChars="3"/>

3. Restart the server and reindex my test data

As you can see from a simple analysis test on text copied from my test
document (see screenshot), the field and the regex work as expected
http://i.imgur.com/o4y2Q9u.png  However, when I try and use the same query
(for the text_il field, not even trying to combine queries across fields)
using the edismax parser, I don't get any hits. Also. when I searched the
forums and JIRA, I came across these two

https://issues.apache.org/jira/browse/SOLR-6009
http://lucene.472066.n3.nabble.com/Regex-with-local-params-is-not-working-tt4138257.html


So my questions are:
1. Does the dismax / edismax parser even support regex syntax?
2. Am I doing something wrong?

Results

Regex using the default parser works:

{ "responseHeader": { "status": 0, "QTime": 4, "params": { "indent": "true",
"q": "text_il:/.*[7-8].*/", "_": "1403729219835", "wt": "json" } },
"response": { "numFound": 1, "start": 0, "docs": [ { "id": "1", "content_type":
"parentDocument", "_version_": 1471911225402065000 } ] } }

Whereas using the edismax parser, it doesn't return any hits. I used this
link as a guide to forming my queries:
http://lucidworks.lucidimagination.com/display/solr/The+Extended+DisMax+Query+Parser

{ "responseHeader": { "status": 0, "QTime": 3, "params": {
"lowercaseOperators": "true", "pf": "text_il", "indent": "true", "q":
"/.*[7-8].*/", "qf": "text_il", "_": "1403729594057", "stopwords": "true",
"wt": "json", "defType": "edismax" } }, "response": { "numFound": 0, "start":
0, "docs": [] } }

Debug-enabled query at
https://gist.github.com/anonymous/625e7669918deba4a071

Thanks





On Tue, Jun 24, 2014 at 7:35 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 What about copyField'ing the content into the second field where you
 apply the alternative processing. Than eDismax searching both. Don't
 have to store the other field, just index.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Wed, Jun 25, 2014 at 5:55 AM, Vinay B, vybe3...@gmail.com wrote:
  Sorry, previous post got sent prematurely.
 
  Here is the complete post:
 
  This is easy if I only define a custom field to identify the desired
  patterns (numbers, in my case).
 
  For example, I could define a field thus:
  <!-- A text field that identifies numerical entities -->
  <fieldType name="text_num" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.PatternTokenizerFactory"
                 pattern="\s*[0-9][0-9-]*[0-9]?\s*" group="0"/>
    </analyzer>
  </fieldType>
 
  Input:
  hello, world bye 123-45 abcd  sdfssdf --- aaa
 
  Output:
  123-45 , 
 
  However, I also want to retain the behavior of the default text_general
  field, that is, recognize the usual text tokens (hello, world, bye, etc.).
  What is the best way to achieve this?
  I've looked at PatternCaptureGroupFilterFactory (
 
 http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternCaptureGroupFilterFactory.html
  ) but I suspect that it too is subject to the behavior of the prior
  tokenizer (which for text_general is StandardTokenizerFactory ).
 
  Thanks
 
 
 



Re: Calculating filterCache size

2014-06-25 Thread Benjamin Wiens
Thank you for your help!

I wrote an article, "Performance Testing Solr filterCache: Shedding Light
on Apache Solr filterCache for VuFind", that I am hoping to get published.

https://docs.google.com/document/d/1vl-nmlprSULvNZKQNrqp65eLnLhG9s_ydXQtg9iML10

Anyone can comment and I would highly appreciate this! My biggest fear is
to have something inaccurate about filterCache or Solr in general in there.
Any and all suggestions welcome!

Thanks again,
Ben


On Thu, Jun 19, 2014 at 3:42 PM, Erick Erickson erickerick...@gmail.com
wrote:

 That's specific to using facet.method=enum, but I admit it's easy
 to miss that.

 I added a note about that though...

 Thanks for pointing that out!


 On Thu, Jun 19, 2014 at 9:38 AM, Benjamin Wiens
 benjamin.wi...@gmail.com wrote:
  Thanks to both of you. Yes the mentioned config is illustrative, we
 decided
  for 512 after thorough testing. However, when you google Solr
 filterCache
  the first link is the community wiki which has a config even higher than
  the illustration which is quite different from the official reference
  guide. It might be a good idea to change this unless there's a very small
  index.
 
  http://wiki.apache.org/solr/SolrCaching#filterCache
 
  <filterCache class="solr.LRUCache" size="16384"
               initialSize="4096" autowarmCount="4096"/>
 
 
 
 
 
 
  On Thu, Jun 19, 2014 at 9:48 AM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  Ben:
 
  As Shawn says, you're on the right track...
 
  Do note, though, that a 10K size here is probably excessive, YMMV of
  course.
 
  And an autowarm count of 5,000 is almost _certainly_ far more than you
  want. All these fq clauses get re-executed whenever a new searcher is
  opened (soft commit or hard commit with openSearcher=true). I realize
  this may just be illustrative. Is this your actual setup? And if so,
  what is your motivation for 5,000 autowarm count?
 
  Best,
  Erick
 
  On Wed, Jun 18, 2014 at 11:42 AM, Shawn Heisey s...@elyograg.org
 wrote:
   On 6/18/2014 10:57 AM, Benjamin Wiens wrote:
   Thanks Erick!
   So let's say I have a config of
  
    <filterCache
      class="solr.FastLRUCache"
      size="1"
      initialSize="1"
      autowarmCount="5000"/>
  
   MaxDocuments = 1,000,000
  
   So according to your formula, filterCache should roughly have the
  potential
   to consume this much RAM:
    ((1,000,000 / 8) + 128) * 10,000 = 1,251,280,000 bytes / 1,000 =
    1,251,280 KB / 1,000 = 1,251.28 MB / 1,000 = 1.25 GB
  
   Yes, this is essentially correct.  If you want to arrive at a number
   that's more accurate for the way that OS tools will report memory,
   you'll divide by 1024 instead of 1000 for each of the larger units.
   That results in a size of 1.16GB instead of 1.25.  Computers think in
   powers of 2, dividing by 1000 assumes a bias to how people think, in
   powers of 10.  It's the same thing that causes your computer to report
   931GB for a 1TB hard drive.
  
   Thanks,
   Shawn
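
(The estimate above as a tiny Java sketch, using the numbers from this thread and dividing by 1024 per Shawn's note:)

public class FilterCacheEstimate {
    public static void main(String[] args) {
        long maxDocs = 1000000L; // maxDoc of the index
        long entries = 10000L;   // filterCache size (number of cached filters)
        // each cached filter is roughly a bitset of maxDoc bits plus ~128 bytes of overhead
        long bytes = ((maxDocs / 8) + 128) * entries;
        System.out.printf("%.2f GB%n", bytes / 1024.0 / 1024.0 / 1024.0);
        // prints about 1.17 GB -- the same figure Shawn rounds to 1.16GB
    }
}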
  
 



Am I being dense? Or are real-time gets not exposed in SolrJ?

2014-06-25 Thread Michael Della Bitta
The subject line kind of says it all... this is the latest thing we have
noticed that doesn't seem to have made it in. Am I missing something?

Other awkwardness was doing a deleteByQuery against a collection other than
the defaultCollection, and trying to share a CloudSolrServer among
different objects that were writing and reading against multiple
collections.

We managed to hack around the former by doing it with an UpdateRequest. I'm
wondering if a valid solution to the latter is actually to create one
CloudSolrServer, rip the zkStateReader out of it, and stuff it in
subsequent ones. Is that a bad idea? It seems like there might be some
overhead to having several going in the same process that could be avoided,
but maybe I'm overcomplicating things.

Thanks,

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


Re: SolrCloud multiple data center support

2014-06-25 Thread Arcadius Ahouansou
I have just created https://issues.apache.org/jira/browse/SOLR-6205
I hope the description makes sens.

Thanks.

Arcadius.



On 23 June 2014 18:49, Mark Miller markrmil...@gmail.com wrote:

 We have been waiting for that issue to be finished before thinking too
 hard about how it can improve things. There have been a couple ideas (I’ve
 mostly wanted it for improving the internal zk mode situation), but no
 JIRAs yet that I know of.
 --
 Mark Miller
 about.me/markrmiller

 On June 23, 2014 at 10:37:27 AM, Arcadius Ahouansou (arcad...@menelic.com)
 wrote:

 On 3 February 2014 22:16, Daniel Collins danwcoll...@gmail.com wrote:

 
  One other option is in ZK trunk (but not yet in a release) is the ability
  to dynamically reconfigure ZK ensembles (
  https://issues.apache.org/jira/browse/ZOOKEEPER-107). That would give
 the
  ability to create new ZK instances in the event of a DC failure, and
  reconfigure the Solr Cloud without having to reload everything. That
 would
  help to some extent.
 


 ZOOKEEPER-107 has now been implemented.
 I checked the Solr Jira and it seems there is nothing for multi-data-center
 support.

 Do we need to create a ticket or is there already one?

 Thanks.

 Arcadius.




-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---


RC for 4.9 Solr Ref-Guide imminent, please help look for formatting mistakes

2014-06-25 Thread Chris Hostetter


FYI: The current plan is to call a vote for the 4.9 Solr Ref Guide
sometime tomorrow (2014-06-26) morning (~11AM UTC-0500 maybe?)

The main thing we are currently waiting on is that sarowe is working on a
simple page to document using Solr with SSL -- but now would be a great
time for folks to help review the existing documentation for typos or
formatting glitches.

If you have some time, please download the PDF from the following and 
review it as much as you can -- if you notice any problems, please 
feel free to reply to this message (with page # please), or post a comment 
on the affected cwiki page (if you know what it is)...


https://people.apache.org/~hossman/tmp/solr_4.9_shrunk__250614-2148-22971.pdf

Thanks!




-Hoss
http://www.lucidworks.com/


Re: Am I being dense? Or are real-time gets not exposed in SolrJ?

2014-06-25 Thread Shawn Heisey
On 6/25/2014 3:27 PM, Michael Della Bitta wrote:
 The subject line kind of says it all... this is the latest thing we have
 noticed that doesn't seem to have made it in. Am I missing something?

This code:

SolrServer server;

server = new HttpSolrServer("http://server:port/solr/corename");
((HttpSolrServer) server).setMaxRetries(1);
((HttpSolrServer) server).setConnectionTimeout(5000);

SolrQuery q = new SolrQuery();
q.setRequestHandler("/get");
q.set("id", "ai_spa509997");
System.out.println(q);
QueryResponse r = server.query(q);
System.out.println(r);

Produced this output:

qt=%2Fget&id=ai_spa509997
{doc=SolrDocument{location=PARIS,FRANCE,
photographer_id=22213,[lots_redacted]}}

 Other awkwardness was doing a deleteByQuery against a collection other than
 the defaultCollection, and trying to share a CloudSolrServer among
 different objects that were writing and reading against multiple
 collections.

If you set the collection parameter on a request to the name of the
collection you want to query/update, that should do what you're after. 
You'll need to do all changes with an UpdateRequest object -- the
syntactic sugar methods (add, deleteByQuery, etc) don't handle cases
where you need to set parameters on the request.

 We managed to hack around the former by doing it with an UpdateRequest. I'm
 wondering if a valid solution to the latter is actually to create one
 CloudSolrServer, rip the zkStateReader out of it, and stuff it in
 subsequent ones. Is that a bad idea? It seems like there might be some
 overhead to having several going in the same process that could be avoided,
 but maybe I'm overcomplicating things.

Another possibility is to create multiple CloudSolrServer objects and
use 'setDefaultCollection' on each of them, but that seems like complete
overkill unless you've got a small number of collections.  If you are
absolutely sure that you won't have multiple threads using the
CloudSolrServer object, you could call setDefaultCollection before each
use ... but IMHO that's sloppy coding.

Thanks,
Shawn



Sorting date fields

2014-06-25 Thread Paweł Gdula
Hey

I am trying to sort my documents by creation date:

<field name="created" type="date" indexed="true" stored="false"
       multiValued="false"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0"
           positionIncrementGap="0"/>

I see that the result is affected by the sort order (ASC/DESC changes the
order), but the result is not precise. For example, for the query

params={mm=2&pf=tags^10+title^5&sort=created+asc&q=query&qf=tags^10+title^5&wt=javabin&version=2&defType=edismax&rows=10}

result is like:

2007-05-14
2007-08-13
2007-05-13
2008-03-26
2008-03-19
2007-07-02
...

The general direction is ascending, but between single documents it fluctuates. Is
there any way to get strict ordering? Can anybody point me to / explain the
current behavior?

Best

Pawel


Re: Sorting date fields

2014-06-25 Thread Chris Hostetter

: I see that the result is affected by the sort order (ASC/DESC changes the order), but
: the result is not precise. For example, for the query
: 
: 
params={mm=2&pf=tags^10+title^5&sort=created+asc&q=query&qf=tags^10+title^5&wt=javabin&version=2&defType=edismax&rows=10}

those results don't really make sense -- can you please show us the full and 
complete output you see in your browser from this query...

mm=2&pf=tags^10+title^5&sort=created+asc&q=query&qf=tags^10+title^5&defType=edismax&rows=10&fl=id,created&wt=json&indent=true&echoParams=all


-Hoss
http://www.lucidworks.com/


Crawl-Delay in robots.txt and fetcher.threads.per.queue property in Nutch

2014-06-25 Thread S.L
Hello All

If I set the fetcher.threads.per.queue property to more than 1, I believe the 
behavior would be to have that many threads per host from Nutch. In that case, 
would Nutch still respect the Crawl-Delay directive in robots.txt and not crawl 
at a faster pace than what is specified in robots.txt?

In short, what I am trying to ask is whether setting fetcher.threads.per.queue to 
1 is required for being as polite as the Crawl-Delay in robots.txt expects.

Thx



Re: SOLR 4.4 - Slave always replicates full index

2014-06-25 Thread Erick Erickson
Dominik:

If you optimize your index, then the entire thing will be replicated
from the master to the slave every time. In general, optimizing isn't
necessary even though it sounds like something that's A Good Thing.

I suspect that's the nub of the issue.

Erick
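
(For reference, a sketch of a master-side replication config that replicates after ordinary commits; this is illustrative, not Dominik's actual setup. If the master index is optimized, the slave still has to pull the entire rewritten index, whatever replicateAfter says:)

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>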

On Tue, Jun 24, 2014 at 11:14 PM, Dominik Siebel m...@dsiebel.de wrote:
 Hey Suresh,

 could you get a little more specific on what solved your problem here?
 I am currently facing the same problem and am trying to find a proper
 solution.
 Thanks!

 ~ Dom


 2014-02-28 7:46 GMT+01:00 sureshrk19 sureshr...@gmail.com:

 Thanks Shawn and Erick.

 I followed SOLR configuration document and modified index strategy.

 Looks good now. I haven't seen any problems in last 1 week.

 Thanks for your suggestions.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-4-4-Slave-always-replicates-full-index-tp4113089p4120337.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR 4.4 - Slave always replicates full index

2014-06-25 Thread Shalin Shekhar Mangar
Note that this problem can also happen if the RealTimeGet handler is
missing from your solrconfig.xml because PeerSync will always fail and a
full replication will be triggered. I added warn-level logging to complain
when this happens but it is possible that you are using an older version of
Solr which does not have that logging.


On Thu, Jun 26, 2014 at 5:27 AM, Erick Erickson erickerick...@gmail.com
wrote:

 Dominik:

 If you optimize your index, then the entire thing will be replicated
 from the master to the slave every time. In general, optimizing isn't
 necessary even though it sounds like something that's A Good Thing.

 I suspect that's the nub of the issue.

 Erick

 On Tue, Jun 24, 2014 at 11:14 PM, Dominik Siebel m...@dsiebel.de wrote:
  Hey Suresh,
 
  could you get a little more specific on what solved your problem here?
  I am currently facing the same problem and am trying to find a proper
  solution.
  Thanks!
 
  ~ Dom
 
 
  2014-02-28 7:46 GMT+01:00 sureshrk19 sureshr...@gmail.com:
 
  Thanks Shawn and Erick.
 
  I followed SOLR configuration document and modified index strategy.
 
  Looks good now. I haven't seen any problems in last 1 week.
 
  Thanks for your suggestions.
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/SOLR-4-4-Slave-always-replicates-full-index-tp4113089p4120337.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 




-- 
Regards,
Shalin Shekhar Mangar.


Spellchecker causing 500 (ISE)

2014-06-25 Thread Aman Tandon
Hi,

We are getting the results for the query but the spellchecker component is
returning 500. Please help us out.

*query*: http://localhostt:8111/solr/srch/select?q=malerkotla&qt=search

*Error:*

 trace: java.lang.StringIndexOutOfBoundsException: String index out of range: -5
	at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
	at java.lang.StringBuilder.replace(StringBuilder.java:266)
	at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
	at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
	at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)


The suggestions when I query with the separate words (maler kotla):
http://localhostt:8111/solr/srch/select?q=maler%20kotla&qt=search

  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "city": [
        "maler kotla", 2,
        "ludhiana", 1],
      "datatype": [
        "company", 2,
        "product", 1]},
    "facet_dates": {},
    "facet_ranges": {}},
  "spellcheck": {
    "suggestions": [
      "maler", {
        "numFound": 7,
        "startOffset": 0,
        "endOffset": 5,
        "origFreq": 9,
        "suggestion": [{
            "word": "maker",
            "freq": 19751},
          {
            "word": "mailer",
            "freq": 1439},
          {
            "word": "mayer",
            "freq": 271},
          {
            "word": "mater",
            "freq": 214},
          {
            "word": "malar",
            "freq": 183},
          {
            "word": "maier",
            "freq": 123},
          {
            "word": "male",
            "freq": 32169}]},
      "kotla", {
        "numFound": 3,
        "startOffset": 6,
        "endOffset": 11,
        "origFreq": 30,
        "suggestion": [{
            "word": "koala",
            "freq": 282},
          {
            "word": "kota",
            "freq": 5355},
          {
            "word": "kola",
            "freq": 861}]},
      "correctlySpelled", true,
      "collation", "maker koala"]}}


Full response for the errored URL:
http://localhostt:8111/solr/srch/select?q=malerkotla&qt=search

 {
   "responseHeader": {
     "status": 500,
     "QTime": 49},
   "grouped": {
     "glusrid": {
       "matches": 2802,
       "ngroups": 314,
       "groups": []}},
   "facet_counts": {
     "facet_queries": {},
     "facet_fields": {
       "city": [
         "maler kotla", 311,
         "bengaluru", 1,
         "ludhiana", 1,
         "mohali", 1],
       "datatype": [
         "company", 162,
         "product", 146,
         "offer", 6]},
     "facet_dates": {},
     "facet_ranges": {}},
   "error": {
     "msg": "String index out of range: -5",
     "trace": "java.lang.StringIndexOutOfBoundsException: String index out of 
 range: -5\n\tat 
 java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)\n\tat 
 java.lang.StringBuilder.replace(StringBuilder.java:266)\n\tat 
 org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)\n\tat
  
 org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)\n\tat