Re: How to find the ordinal for a numeric doc value
Hello, Going by the code at https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/schema/TrieField.java#L727, it creates a NumericDocValuesField only. Try defining the field as multiValued; going by the same code, it then creates a SortedSetDocValuesField. On Wed, Aug 19, 2015 at 11:13 PM, tedsolr tsm...@sciquest.com wrote: One error (others perhaps?) in my statement ... the code searcher.getLeafReader().getSortedDocValues(field) just returns null for numeric and date fields. That is why they appear to be ignored, not that the ordinals are all absent or equivalent. But my question is still valid, I think! -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-find-the-ordinal-for-a-numeric-doc-value-tp4224018p4224037.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
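For reference, Mikhail's suggestion corresponds to a schema.xml declaration along these lines (the field and type names here are illustrative, not from the thread); with multiValued="true" and docValues enabled, TrieField indexes a SortedSetDocValuesField, which carries ordinals:

```xml
<!-- hypothetical schema.xml fragment: a multiValued trie int field
     indexed with doc values, so ordinals become available -->
<field name="amount" type="tint" indexed="true" stored="true"
       multiValued="true" docValues="true"/>
```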
Re: How to find the ordinal for a numeric doc value
I see. The UninvertingReader even throws an IllegalStateException if you try to read a numeric field as sorted doc values. I may have to index extra fields to support my document collapsing scheme. Thanks for responding. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-find-the-ordinal-for-a-numeric-doc-value-tp4224018p4224255.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to add second Zookeeper to same machine?
You might want to look at the following documentation. These documents explain how to set up a ZooKeeper ensemble and cover ZooKeeper administration. https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html Regards, Modassar On Thu, Aug 20, 2015 at 1:19 PM, Merlin Morgenstern merlin.morgenst...@gmail.com wrote: I am running 2 dedicated servers on which I plan to install SolrCloud with 2 Solr nodes and 3 ZK. From Stackoverflow I learned that the best method for autostarting ZooKeeper on Ubuntu 14.04 is to install it via apt-get install zookeeperd. I have that running now. How could I add a second ZooKeeper to one machine? The config only allows one. Or, if this is not possible, what would be the recommended way to get 3 ZK running on 2 dedicated servers? I have followed a tutorial where I have that setup available via a bash script, but it seems that the Ubuntu ZooKeeper setup is more robust, as it avoids zombie processes and offers a startup script as well. Thank you for any help on this.
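For reference, each node of a three-server ensemble carries a zoo.cfg along these lines (hostnames and paths are placeholders, not from the thread); if two of the nodes share one machine, they need distinct dataDir, clientPort, and peer/election ports:

```
# hypothetical zoo.cfg for node 1 of a 3-node ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper/1    # each node needs its own dataDir + myid file
clientPort=2181                 # each node on the same host needs its own client port
server.1=host1:2888:3888
server.2=host1:2889:3889        # second node on host1: different peer/election ports
server.3=host2:2888:3888
```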
How to configure solr to not bind at 8983
I changed the Solr listen port in the solr.in.sh file in my Solr home directory by setting the variable: SOLR_PORT=. But Solr is still trying to also listen on 8983 because it gets started with the -DSTOP.PORT=8983 variable. What is this -DSTOP.PORT variable for, and where should I configure it? I ran the install_solr_service.sh script to set up Solr and changed the SOLR_PORT afterwards. Best regards, Samy
How to add second Zookeeper to same machine?
I am running 2 dedicated servers on which I plan to install SolrCloud with 2 Solr nodes and 3 ZK. From Stackoverflow I learned that the best method for autostarting ZooKeeper on Ubuntu 14.04 is to install it via apt-get install zookeeperd. I have that running now. How could I add a second ZooKeeper to one machine? The config only allows one. Or, if this is not possible, what would be the recommended way to get 3 ZK running on 2 dedicated servers? I have followed a tutorial where I have that setup available via a bash script, but it seems that the Ubuntu ZooKeeper setup is more robust, as it avoids zombie processes and offers a startup script as well. Thank you for any help on this.
Re: How to configure solr to not bind at 8983
I think you need to add the port number in solr.xml too, under the hostPort attribute. STOP.PORT is SOLR.PORT-1000 and is set in the SOLR_HOME/bin/solr file. As far as I understand this cannot be changed, but I am not sure. Regards, Modassar On Thu, Aug 20, 2015 at 11:39 AM, Samy Ateia samyat...@hotmail.de wrote: I changed the solr listen port in the solr.in.sh file in my solr home directory by setting the variable: SOLR_PORT=. But Solr is still trying to also listen on 8983 because it gets started with the -DSTOP.PORT=8983 variable. What is this -DSTOP.PORT variable for and where should I configure it? I ran the install_solr_service.sh script to set up Solr and changed the SOLR_PORT afterwards. best regards. Samy
RE: Performance issue with FILTER QUERY
Thanks Erick. Even a 1 second commit interval is fine for us. But in that case the filter cache will also be flushed every second. The end user will still feel slowness due to this, as the query will take around 1 sec if we use a filter query. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 20 August 2015 00:44 To: solr-user@lucene.apache.org Subject: Re: Performance issue with FILTER QUERY If you're committing that rapidly then you're correct, filter caching may not be a good fit. The entire _point_ of filter caching is to increase performance of subsequent executions of the exact same fq clause. But if you're throwing them away every second there's little/no benefit. You really have two choices here: (1) lengthen out the commit interval. Frankly, 1 second commit intervals are rarely necessary despite what your product manager says. Really, check this requirement out. (2) disable caches. Autowarming is potentially useful here, but if your filter queries are taking on the order of a second and you're committing every second then autowarming takes too long to help. Best, Erick On Wed, Aug 19, 2015 at 12:26 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Maulin, Did you check performance with segmented filters which I advised recently? On Wed, Aug 19, 2015 at 10:24 AM, Maulin Rathod mrat...@asite.com wrote: As per my understanding caches are flushed every time we add a new document to the collection (we do a soft commit every 1 sec to make newly added documents available for search). Because of this the cache is not used effectively, and hence it is slow every time in our case.
-Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: 19 August 2015 12:16 To: solr-user@lucene.apache.org Subject: Re: Performance issue with FILTER QUERY On Wed, 2015-08-19 at 05:55 +, Maulin Rathod wrote: SLOW WITH FILTER QUERY (takes more than 1 second) q=+recipient_id:(4042) AND project_id:(332) AND resource_id:(13332247 13332245 13332243 13332241 13332239) AND entity_type:(2) AND -action_id:(20 32) == This returns 5 records fq=+action_status:(0) AND is_active:(true) == This Filter Query returns 9432252 records The fq is evaluated independently of the q: For the fq a bitset is allocated, filled and stored in cache. Then the q is evaluated and the two bitsets are merged. Next time you use the same fq, it should be cached (if you have caching enabled) and be a lot faster. Also, if you ran your two tests right after each other, the second one benefits from disk caching. If you had executed them in reverse order, the q+fq might have been the fastest one. - Toke Eskildsen, State and University Library, Denmark -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
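As an aside, if a filter genuinely cannot be reused between commits, Solr's cache=false local parameter lets you keep an individual filter out of the filterCache entirely (a sketch using the fq from the thread):

```
fq={!cache=false}+action_status:(0) AND is_active:(true)
```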
Solr: How to index range-pair fields?
My scenario is something like this: I have a students database. I want to query all the students who were either `absent` or `present` during a particular `date-range`. For example: Student X was `absent` between dates: Jan 1, 2015 and Jan 15, 2015; Feb 13, 2015 and Feb 16, 2015; March 19, 2015 and March 25, 2015. Also X was `present` between dates: Jan 25, 2015 and Jan 30, 2015; Feb 1, 2015 and Feb 12, 2015. (Other days were either school holidays, or the teacher was lazy or forgot to take the attendance ;) If the date range were only a single-valued field then this approach would work: http://stackoverflow.com/questions/25246204/solr-query-for-documents-whose-from-to-date-range-contains-the-user-input. I have multiple date ranges for each student, so this would not work for my use-case. Solr 5.0 has support for `DateRangeField` (http://lucene.apache.org/solr/5_0_0/solr-core/index.html?org/apache/solr/schema/DateRangeField.html ) which is perfect for my use-case, but I cannot upgrade to 5.0 yet! I am on Lucene 4.1.0. David Smiley had mentioned that it would be ported to 4.x but I guess it never happened (https://issues.apache.org/jira/browse/SOLR-6103; I can try porting this patch myself, but I would like to know what it takes, and I would welcome opinions). So basically, I need to maintain the relationship between the start and end dates for each of the `state`s (absence or presence).
So I thought I would need to index the fields as pairs as mentioned here: http://grokbase.com/t/lucene/solr-user/128r96vwz6/how-do-i-represent-a-group-of-customer-key-value-pairs I guess my schema would look like:

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
<field name="state" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="presenceStartTime_*" type="tdate" indexed="true" stored="true"/>
<dynamicField name="presenceEndTime_*" type="tdate" indexed="true" stored="true"/>
<dynamicField name="absenceStartTime_*" type="tdate" indexed="true" stored="true"/>
<dynamicField name="absenceEndTime_*" type="tdate" indexed="true" stored="true"/>

**Question #1:** Does this look correct?

**Question #2:** What are the ramifications if I use `tlong` instead of `tdate`? My `tlong` type looks like this:

<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

**Question #3:** So in this case, for the query "get all the students who were absent between a date range", would the query look something similar to this?

(state: absent) AND (absenceStartTime1: givenLowerBoundDate) AND (absenceStartTime2: givenLowerBoundDate) AND (absenceStartTime3: givenLowerBoundDate) AND (absenceEndTime1: givenUpperBoundDate) AND (absenceEndTime2: givenUpperBoundDate) AND (absenceEndTime3: givenUpperBoundDate)

This would work only if I knew beforehand that there were 3 dates on which the student was absent, and there's no way to query all dynamic fields with wild-cards according to http://stackoverflow.com/questions/6213184/solr-search-query-for-dynamic-fields-indexed

**Question #4:** The workaround mentioned in one of the answers in that question did not look terrible but seemed a bit complicated. Is there a better alternative for solving this problem in Solr? Of course, I would be highly interested in any other better approaches.
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-How-to-index-range-pair-fields-tp4224369.html Sent from the Solr - User mailing list archive at Nabble.com.
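Whatever indexing scheme is chosen, the per-state test the poster needs is interval overlap: a stored [start, end] pair matches a queried range iff start <= queryEnd and end >= queryStart. A minimal plain-Java sketch of that check (the data here is hypothetical, and this is not a Solr query):

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

public class AbsenceCheck {
    static final class Range {
        final LocalDate start, end;
        Range(LocalDate start, LocalDate end) { this.start = start; this.end = end; }
    }

    // true if any stored [start, end] pair overlaps the queried range
    static boolean wasAbsentDuring(List<Range> absences, LocalDate from, LocalDate to) {
        for (Range r : absences) {
            // overlap: start <= to AND end >= from
            if (!r.start.isAfter(to) && !r.end.isBefore(from)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Range> absences = Arrays.asList(
            new Range(LocalDate.of(2015, 1, 1), LocalDate.of(2015, 1, 15)),
            new Range(LocalDate.of(2015, 2, 13), LocalDate.of(2015, 2, 16)));
        System.out.println(wasAbsentDuring(absences,
            LocalDate.of(2015, 1, 10), LocalDate.of(2015, 1, 20))); // prints true
    }
}
```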
Remove duplicate suggestions in Solr
Hi, I would like to check, is there any way to remove duplicate suggestions in Solr? I have several documents that look very similar, and when I do a suggestion query, it comes back with all the same results. I'm using Solr 5.2.1. This is my suggestion pipeline:

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Browse specific stuff -->
    <str name="echoParams">all</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <!-- Everything below should be identical to ac handler above -->
    <str name="defType">edismax</str>
    <str name="rows">10</str>
    <str name="fl">id, score</str>
    <!--<str name="qf">textsuggest^30 extrasearch^30.0 textng^50.0 phonetic^10</str>-->
    <!--<str name="qf">content^50 title^50 extrasearch^30.0 textng^1.0 textng2^200.0</str>-->
    <str name="qf">content^50 title^50 extrasearch^30.0</str>
    <str name="pf">textnge^50.0</str>
    <!--<str name="bf">product(log(sum(popularity,1)),100)^20</str>-->
    <!-- Define relative importance between types. May be overridden per request by e.g. personboost=120 -->
    <str name="boost">product(map(query($type1query),0,0,1,$type1boost),map(query($type2query),0,0,1,$type2boost),map(query($type3query),0,0,1,$type3boost),map(query($type4query),0,0,1,$type4boost),$typeboost)</str>
    <double name="typeboost">1.0</double>
    <str name="type1query">content_type:application/pdf</str>
    <double name="type1boost">0.9</double>
    <str name="type2query">content_type:application/msword</str>
    <double name="type2boost">0.5</double>
    <str name="type3query">content_type:NA</str>
    <double name="type3boost">0.0</double>
    <str name="type4query">content_type:NA</str>
    <double name="type4boost">0.0</double>
    <str name="hl">on</str>
    <str name="hl.fl">id, textng, textng2, language_s</str>
    <str name="hl.highlightMultiTerm">true</str>
    <str name="hl.preserveMulti">true</str>
    <str name="hl.encoder">html</str>
    <!--<str name="f.content.hl.fragsize">80</str>-->
    <str name="hl.fragsize">50</str>
    <str name="debugQuery">false</str>
  </lst>
</requestHandler>

This is my query:

http://localhost:8983/edm/chinese2/suggest?q=do our best&defType=edismax&qf=content^5 textng^5&pf=textnge^50&pf2=content^20 textnge^50&pf3=content^40%20textnge^50&ps2=2&ps3=2&stats.calcdistinct=true

This is the suggestion result:

"highlighting": {
  "responsibility001": {
    "id": ["responsibility001"],
    "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
  "responsibility002": {
    "id": ["responsibility002"],
    "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
  "responsibility003": {
    "id": ["responsibility003"],
    "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
  "responsibility004": {
    "id": ["responsibility004"],
    "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
  "responsibility005": {
    "id": ["responsibility005"],
    "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
  "responsibility006": {
    "id": ["responsibility006"],
    "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
  "responsibility007": {
    "id": ["responsibility007"],
    "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
  "responsibility008": {
    "id": ["responsibility008"],
    "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
  "responsibility009": {
    "id": ["responsibility009"],
    "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
  "responsibility010": {
    "id": ["responsibility010"],
    "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},

Regards, Edwin
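Until the duplicates are handled at index time (for example with Solr's SignatureUpdateProcessorFactory, which adds a signature field for near-duplicate documents), one client-side workaround is to collapse suggestions whose snippet text is identical, keeping the first document id seen per snippet. A minimal sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DedupSuggestions {
    // collapse identical snippets: keep only the first document id per snippet
    static Map<String, String> dedupe(Map<String, String> idToSnippet) {
        Map<String, String> snippetToId = new LinkedHashMap<>();
        idToSnippet.forEach((id, snippet) -> snippetToId.putIfAbsent(snippet, id));
        return snippetToId;
    }

    public static void main(String[] args) {
        Map<String, String> hits = new LinkedHashMap<>();
        hits.put("responsibility001", "We will strive to do our best.");
        hits.put("responsibility002", "We will strive to do our best.");
        System.out.println(dedupe(hits).size()); // prints 1
    }
}
```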
Bug in query elevation transformers SOLR-7953
Hey guys, I just logged this bug and I wanted to raise awareness. If you use the QueryElevationComponent and ask for fl=[elevated], you'll only get false if Solr is using LazyDocuments. This looks even stranger when you request exclusive=true: you only get back elevated documents, and they all say false. I'm not sure how often LazyDocuments are used, but it's probably not an uncommon issue. Ryan
Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
On 7/8/2015 6:13 PM, Yonik Seeley wrote: On Wed, Jul 8, 2015 at 6:50 PM, Shawn Heisey apa...@elyograg.org wrote: After the fix (with luceneMatchVersion at 4.9), both aaa and bbb end up at position 2. Yikes, that's definitely wrong. I have filed LUCENE-6889 for this problem. I'd like to write a unit test that demonstrates the problem, but Lucene internals are a mystery to me. I have a concise and repeatable manual test (using Solr) outlined in this comment: https://issues.apache.org/jira/browse/LUCENE-6689?focusedCommentId=14705543page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14705543 Is there an existing Lucene test class that I could use as a basis for a test? I will look into tests for analysis components and try to build it on my own, but any help is appreciated. Thanks, Shawn
SOLR to SOLR communication with custom authentication
Hi All, We have a cluster environment on JBOSS; all of our deployed applications are protected by OpenAM, including SOLR. On slave nodes we enabled SOLR to communicate with master nodes to get data. Since the SOLR on the master is protected with OpenAM, the slave can't talk to it. In solr.xml there is a way to configure replication requests to use basic HTTP authentication, but not to use custom authentication. I have tried to override the ReplicationHandler and SnapPuller classes to provide custom authentication, but I couldn't. I have tried to follow the instructions at https://wiki.apache.org/solr/SolrSecurity but I could not find the classes org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.SubRequestFactory and org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.SubRequestFactory. Has anyone of you used custom authentication before for replication? Any help would be greatly appreciated. Environment SOLR version: 4.10.2 (We can't upgrade at the moment as we use Java 7) JBOSS 6.2 EAP Thanks, Prasad
Re: How to configure solr to not bind at 8983
Hi Samy, Any particular reason not to use the -p parameter to start it on another port? ./solr start -p 9983 With Regards Aman Tandon On Thu, Aug 20, 2015 at 2:02 PM, Modassar Ather modather1...@gmail.com wrote: I think you need to add the port number in solr.xml too under hostPort attribute. STOP.PORT is SOLR.PORT-1000 and set under SOLR_HOME/bin/solr file. As far as I understand this can not be changed but I am not sure. Regards, Modassar On Thu, Aug 20, 2015 at 11:39 AM, Samy Ateia samyat...@hotmail.de wrote: I changed the solr listen port in the solr.in.sh file in my solr home directory by setting the variable: SOLR_PORT=. But Solr is still trying to also listen on 8983 because it gets started with the -DSTOP.PORT=8983 variable. What is this -DSTOP.PORT variable for and where should I configure it? I ran the install_solr_service.sh script to setup solr and changed the SOLR_PORT afterwards. best regards. Samy
How to close log when use the solrj api
When I use the SolrJ API to add category data to Solr, there is a lot of DEBUG info. How do I turn this off, or how do I configure the log level? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-close-log-when-use-the-solrj-api-tp4224142.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH delta-import pk
On 8/20/2015 4:27 PM, CrazyDiamond wrote: I have a DIH delta-import query based on last_index_time. It works perfectly. But sometimes I add documents to Solr manually and I want DIH not to add them again. I have a UUID unique field and also an id from the database which is marked as pk in the DIH schema. My question is: will DIH update the existing document or add a new one? P.S. The id field is not marked as unique in the config. The pk (primary key) in DIH is only relevant in the context of DIH, and is only used by DIH for validating and coordinating database queries. It has absolutely no impact on the Solr index. If you want a newly indexed document to replace an existing document, the value in the uniqueKey field (defined in schema.xml) must be the same as that field's value in the existing document that you wish to replace. If you have a matching value, Solr will automatically replace the document for you -- the old version will be deleted before the new one is indexed. Thanks, Shawn
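For completeness, the uniqueKey Shawn refers to is declared in schema.xml along these lines (the field name "id" is only the common convention, not taken from the thread):

```xml
<!-- the field whose value controls replace-on-reindex -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```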
DIH delta-import pk
I have a DIH delta-import query based on last_index_time. It works perfectly. But sometimes I add documents to Solr manually and I want DIH not to add them again. I have a UUID unique field and also an id from the database which is marked as pk in the DIH schema. My question is: will DIH update the existing document or add a new one? P.S. The id field is not marked as unique in the config. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-delta-import-pk-tp4224342.html Sent from the Solr - User mailing list archive at Nabble.com.
Number of requests to each shard is different with and without using of grouping
I want to understand why the number of requests in SOLR CLOUD is different with and without the grouping feature. 1. Suppose we have several shards in SOLR CLOUD (let's say 3 shards). 2. One of them gets a query with rows = n. 3. This shard distributes the request among the others; suppose that every shard has a lot of results, much more than n. 4. Then it receives item IDs from each shard, so the number of results in total is 3n. 5. Then it sorts the results and chooses the best n results, where in my case each shard has representatives in the total results. 6. Then it sends a second request to each shard, with the appropriate item IDs, to get the stored fields. So in this case each shard will be queried twice: first to get item IDs, and second to get stored fields. That is what I see in my logs. (I see 6 log entries, 2 for each shard.) *The question is: why, when I am using the grouping feature, is the number of requests to each shard 3 instead of 2?* (I see 8 or 9 log entries.) -- View this message in context: http://lucene.472066.n3.nabble.com/Number-of-requests-to-each-shard-is-different-with-and-without-using-of-grouping-tp4224293.html Sent from the Solr - User mailing list archive at Nabble.com.
exclude folder in dataimport handler.
I am importing files from my file system and want to exclude importing files from a folder called templatedata. How do I configure that in the entity? excludes="templatedata" doesn't seem to work.

<entity name="files" dataSource="null" rootEntity="false" processor="FileListEntityProcessor" baseDir="E:\Malathy\" fileName=".*\.*" excludes="templatedata" pk="id" onError="skip" recursive="true"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/exclude-folder-in-dataimport-handler-tp4224267.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: exclude folder in dataimport handler.
I took a quick look at FileListEntityProcessor#init, and it looks like it applies the excludes regex to the filename element of the path only, and not to the directories. If your filenames do not have a naming convention that would let you use it this way, you might be able to write a transformer to get what you want. James Dyer Ingram Content Group -Original Message- From: coolmals [mailto:coolm...@gmail.com] Sent: Thursday, August 20, 2015 12:57 PM To: solr-user@lucene.apache.org Subject: exclude folder in dataimport handler. I am importing files from my file system and want to exclude importing files from a folder called templatedata. How do I configure that in the entity? excludes="templatedata" doesn't seem to work. <entity name="files" dataSource="null" rootEntity="false" processor="FileListEntityProcessor" baseDir="E:\Malathy\" fileName=".*\.*" excludes="templatedata" pk="id" onError="skip" recursive="true"/> -- View this message in context: http://lucene.472066.n3.nabble.com/exclude-folder-in-dataimport-handler-tp4224267.html Sent from the Solr - User mailing list archive at Nabble.com.
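James's reading can be illustrated in plain Java: a regex tested against only the file-name component of a path never sees the folder, so "templatedata" cannot match; testing the full path is what the poster actually wants. This sketch mimics the described behavior, it is not the DIH code itself (only the path comes from the thread's config):

```java
import java.util.regex.Pattern;

public class ExcludeFolders {
    // mimic the described FileListEntityProcessor behavior: the excludes
    // regex is tested against the file name only, never the directories
    static boolean excludedByName(String path, Pattern excludes) {
        String name = path.substring(path.lastIndexOf('\\') + 1);
        return excludes.matcher(name).find();
    }

    // what the poster wants: test the whole path instead
    static boolean excludedByPath(String path, Pattern excludes) {
        return excludes.matcher(path).find();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("templatedata");
        String f = "E:\\Malathy\\templatedata\\a.xml";
        System.out.println(excludedByName(f, p)); // prints false: folder never seen
        System.out.println(excludedByPath(f, p)); // prints true
    }
}
```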
Re: How to close log when use the solrj api
You may want to check the logging level using the Dashboard URL http://localhost:8983/solr/#/~logging/level (you can even set it for the session), but otherwise you can look into server/resources/log4j.properties. Refer to https://cwiki.apache.org/confluence/display/solr/Configuring+Logging On Thu, Aug 20, 2015 at 4:30 AM, fent wutian_...@hotmail.com wrote: When I use the SolrJ API to add category data to Solr, there is a lot of DEBUG info. How do I turn this off, or how do I configure the log level? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-close-log-when-use-the-solrj-api-tp4224142.html Sent from the Solr - User mailing list archive at Nabble.com.
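On the log4j.properties route, raising the root logger's threshold is usually enough to silence DEBUG output. A sketch (the appender names follow the stock Solr 5.x file; verify against your own copy):

```
# log only INFO and above to the stock appenders; DEBUG lines disappear
log4j.rootLogger=INFO, file, CONSOLE
```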
RE: How to configure solr to not bind at 8983
Ahh, thank you, that explains it. I changed the port to 9983, not knowing that the stop port would then resolve to the old port (9983 - 1000 = 8983). So I guess I just need to change it to something else. Subject: Re: How to configure solr to not bind at 8983 To: solr-user@lucene.apache.org From: apa...@elyograg.org Date: Thu, 20 Aug 2015 07:14:25 -0600 On 8/20/2015 2:34 AM, Samy Ateia wrote: I changed the solr listen port in the solr.in.sh file in my solr home directory by setting the variable: SOLR_PORT=. But Solr is still trying to also listen on 8983 because it gets started with the -DSTOP.PORT=8983 variable. What is this -DSTOP.PORT variable for and where should I configure it? I ran the install_solr_service.sh script to setup solr and changed the SOLR_PORT afterwards. The stop port is used by Jetty ... as a mechanism to stop jetty. It will be different than 8983 on a standard install. It defaults to 1000 less than the Solr port -- 7983 if you don't change the solr port. This is in the solr shell script: STOP_PORT=`expr $SOLR_PORT - 1000` In the same way, the embedded zookeeper port for SolrCloud examples is 1000 *more* than the Solr port: zk_port=$[$SOLR_PORT+1000] The RMI (JMX) port defaults to the Solr port plus 1, although the script doesn't set this very intelligently, I think I should probably fix this: RMI_PORT=1$SOLR_PORT Thanks, Shawn
Re: How to add second Zookeeper to same machine?
On 8/20/2015 1:49 AM, Merlin Morgenstern wrote: I am running 2 dedicated servers on which I plan to install SolrCloud with 2 Solr nodes and 3 ZK. From Stackoverflow I learned that the best method for autostarting ZooKeeper on Ubuntu 14.04 is to install it via apt-get install zookeeperd. I have that running now. How could I add a second ZooKeeper to one machine? The config only allows one. Or, if this is not possible, what would be the recommended way to get 3 ZK running on 2 dedicated servers? I have followed a tutorial where I have that setup available via a bash script, but it seems that the Ubuntu ZooKeeper setup is more robust, as it avoids zombie processes and offers a startup script as well. It is possible to have multiple zookeeper installs on one machine, but if you do this, your system will not be fault tolerant. A simple fact of life is that hardware can fail, and it can fail completely. If the motherboard in a server develops a fault, the entire server is probably going to fail. If the machine with two zookeepers on it dies, zookeeper quorum will be lost and SolrCloud will go read-only. It will not be possible to write to it, even though there is still a surviving machine. Redundant zookeeper requires three completely separate machines, so that if you lose any one of those machines, the cluster still has a majority present and stays completely operational. This means that SolrCloud requires three machines minimum. The third server can be a much less capable machine that runs zookeeper only, but it must be there in order to achieve true fault tolerance. Thanks, Shawn
Re: How to configure solr to not bind at 8983
On 8/20/2015 2:34 AM, Samy Ateia wrote: I changed the solr listen port in the solr.in.sh file in my solr home directory by setting the variable: SOLR_PORT=. But Solr is still trying to also listen on 8983 because it gets started with the -DSTOP.PORT=8983 variable. What is this -DSTOP.PORT variable for and where should I configure it? I ran the install_solr_service.sh script to setup solr and changed the SOLR_PORT afterwards. The stop port is used by Jetty ... as a mechanism to stop jetty. It will be different than 8983 on a standard install. It defaults to 1000 less than the Solr port -- 7983 if you don't change the solr port. This is in the solr shell script: STOP_PORT=`expr $SOLR_PORT - 1000` In the same way, the embedded zookeeper port for SolrCloud examples is 1000 *more* than the Solr port: zk_port=$[$SOLR_PORT+1000] The RMI (JMX) port defaults to the Solr port plus 1, although the script doesn't set this very intelligently, I think I should probably fix this: RMI_PORT=1$SOLR_PORT Thanks, Shawn
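The port derivations Shawn quotes from the solr shell script can be sketched in plain Java; note that RMI_PORT=1$SOLR_PORT in the script is shell string concatenation (a literal "1" prefix), not addition:

```java
public class SolrPorts {
    // STOP_PORT=`expr $SOLR_PORT - 1000`
    static int stopPort(int solrPort) { return solrPort - 1000; }

    // zk_port=$[$SOLR_PORT+1000]
    static int zkPort(int solrPort) { return solrPort + 1000; }

    // RMI_PORT=1$SOLR_PORT -- textual concatenation, not arithmetic
    static int rmiPort(int solrPort) { return Integer.parseInt("1" + solrPort); }

    public static void main(String[] args) {
        System.out.println(stopPort(8983)); // prints 7983
        System.out.println(zkPort(8983));   // prints 9983
        System.out.println(rmiPort(8983));  // prints 18983
    }
}
```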
How to use DocumentAnalysisRequestHandler in java
Hi, I'm trying to obtain the indexed tokens for a document id, in order to see what has been indexed exactly. It seems that DocumentAnalysisRequestHandler does that, but I couldn't figure out how to use it in Java. The doc says I must provide a content stream, but the available init() method only takes a NamedList as a parameter. https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/handler/DocumentAnalysisRequestHandler.html Could somebody provide me with a short example of how to get index information from a document id? Thanks, Jean-Pierre.
Re: replication and HDFS
Yes. Maybe. It Depends (tm). Details matter (tm). If you're firing just a few QPS at the system, then improved throughput by adding replicas is unlikely. OTOH, if you're firing lots of simultaneous queries at Solr and are pegging the processors, then adding replication will increase aggregate QPS. If your soft commit interval is very short and you're not doing proper warming, it won't help at all in all probability. Replication in Solr is about increasing the number of instances available to serve queries. The two types of replication (HDFS or Solr) are really orthogonal, the first is about data integrity and the second is about increasing the number of Solr nodes available to service queries. Best, Erick On Thu, Aug 20, 2015 at 9:23 AM, Joseph Obernberger j...@lovehorsepower.com wrote: Hi - we currently have a multi-shard setup running solr cloud without replication running on top of HDFS. Does it make sense to use replication when using HDFS? Will we expect to see a performance increase in searches? Thank you! -Joe
Re: How to use DocumentAnalysisRequestHandler in java
If this is for a quick test, have you tried just faceting on that field with document ID set through query? Facet returns the indexed/tokenized items. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 20 August 2015 at 11:34, Jean-Pierre Lauris jplau...@gmail.com wrote: Hi, I'm trying to obtain indexed tokens from a document id, in order to see what has been indexed exactly. It seems that DocumentAnalysisRequestHandler does that, but I couldn't figure out how to use it in java. The doc says I must provide a contentstream but the available init() method only takes a NamedList as a parameter. https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/handler/DocumentAnalysisRequestHandler.html Could somebody provide me with a short example of how to get index information from a document id? Thanks, Jean-Pierre.
caches with faceting
I have used the JSON Facet API and noticed that it relies heavily on the filter cache. The index is optimized and all my fields have docValues="true". There are 2.6 million documents, and I am always faceting on almost all of them with an fq. The sizes of documentCache and queryResultCache are very minimal (10?); is that OK? I understand that documentCache stores the documents that are fetched from disk (segments merged), and its size is set to 2000. fieldCache is always zero; is that because of docValues? Version 5.2.1.
Re: How to use DocumentAnalysisRequestHandler in java
On Thu, Aug 20, 2015, at 04:34 PM, Jean-Pierre Lauris wrote: Hi, I'm trying to obtain indexed tokens from a document id, in order to see what has been indexed exactly. It seems that DocumentAnalysisRequestHandler does that, but I couldn't figure out how to use it in java. The doc says I must provide a contentstream but the available init() method only takes a NamedList as a parameter. https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/handler/DocumentAnalysisRequestHandler.html Could somebody provide me with a short example of how to get index information from a document id? If you are talking about what I think you are, then that is used by the Admin UI to implement the analysis tab. You pass in a document, and it returns it analysed. As Alexandre says, faceting may well get you there if you want to query a document already in your index. Upayavira
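Put concretely, the facet route both replies suggest amounts to a single request of this shape (the core, document id, and field names are placeholders): with the query restricted to one document, every facet bucket with a count of 1 is an indexed token of that document.

```
http://localhost:8983/solr/mycore/select?q=id:mydoc42&rows=0&facet=true&facet.field=myfield&facet.limit=-1&facet.mincount=1
```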
replication and HDFS
Hi - we currently have a multi-shard setup running solr cloud without replication running on top of HDFS. Does it make sense to use replication when using HDFS? Will we expect to see a performance increase in searches? Thank you! -Joe