Re: How to find the ordinal for a numeric doc value

2015-08-20 Thread Mikhail Khludnev
Hello,
Given the code
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/schema/TrieField.java#L727
it creates NumericDocValuesField only.
Try defining the field as multiValued; given that same code, it then creates
SortedSetDocValuesField.
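
For illustration, a hedged Java sketch (Lucene/Solr 5.x API; the field name is
hypothetical) of reading ordinals once the field is multiValued with docValues:

    import java.io.IOException;
    import org.apache.lucene.index.LeafReader;
    import org.apache.lucene.index.SortedSetDocValues;
    import org.apache.lucene.util.BytesRef;
    import org.apache.solr.search.SolrIndexSearcher;

    // A multiValued docValues field is written as SORTED_SET,
    // which (unlike NUMERIC) exposes ordinals.
    void dumpOrdinals(SolrIndexSearcher searcher, int docId) throws IOException {
      LeafReader reader = searcher.getLeafReader();
      SortedSetDocValues dv = reader.getSortedSetDocValues("myNumericField");
      if (dv == null) return; // single-valued numeric fields stay NUMERIC: no ordinals
      dv.setDocument(docId);
      for (long ord = dv.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS;
           ord = dv.nextOrd()) {
        BytesRef encoded = dv.lookupOrd(ord); // numerically-encoded term for this ord
      }
    }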

On Wed, Aug 19, 2015 at 11:13 PM, tedsolr tsm...@sciquest.com wrote:

 One error (others perhaps?) in my statement ... the code

 searcher.getLeafReader().getSortedDocValues(field)

 just returns null for numeric and date fields. That is why they appear to
 be ignored, not that the ordinals are all absent or equivalent. But my
 question is still valid, I think!



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-find-the-ordinal-for-a-numeric-doc-value-tp4224018p4224037.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: How to find the ordinal for a numeric doc value

2015-08-20 Thread tedsolr
I see. The UninvertingReader even throws an IllegalStateException if you try
to read a numeric field as sorted doc values. I may have to index extra
fields to support my document collapsing scheme. Thanks for responding.
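
For reference, a hedged schema.xml sketch of such an extra field (field and
type names are hypothetical): a string copy of the numeric field gets SORTED
doc values, and therefore ordinals:

    <!-- hypothetical string shadow of a numeric field -->
    <field name="price" type="tint" indexed="true" stored="true"/>
    <field name="price_str" type="string" indexed="true" stored="false"
           docValues="true"/>
    <copyField source="price" dest="price_str"/>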



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-find-the-ordinal-for-a-numeric-doc-value-tp4224018p4224255.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to add second Zookeeper to same machine?

2015-08-20 Thread Modassar Ather
You might want to look into the following documentation. These documents
explain how to set up a ZooKeeper ensemble and cover ZooKeeper
administration.

https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html
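
For reference, a minimal zoo.cfg sketch for a three-node ensemble (hostnames
and paths are hypothetical). Note that two instances on the same host would
need distinct clientPorts, peer ports, and dataDirs:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888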

Regards,
Modassar

On Thu, Aug 20, 2015 at 1:19 PM, Merlin Morgenstern 
merlin.morgenst...@gmail.com wrote:

 I am running 2 dedicated servers on which I plan to install SolrCloud with
 2 Solr nodes and 3 ZK.

 From Stackoverflow I learned that the best method for autostarting
 zookeeper on Ubuntu 14.04 is to install it via apt-get install
 zookeeperd. I have that running now.

 How can I add a second zookeeper to one machine? The config only allows
 one. Or, if this is not possible, what would be the recommended way to get 3
 ZK running on 2 dedicated servers?

 I have followed a tutorial where I had that setup available via a bash
 script, but it seems that the Ubuntu zookeeper package is more robust, as it
 guards against zombie processes and provides a startup script as well.

 Thank you for any help on this.



How to configure solr to not bind at 8983

2015-08-20 Thread Samy Ateia
I changed the Solr listen port in the solr.in.sh file in my Solr home directory
by setting the variable SOLR_PORT=9983.
But Solr is still also listening on 8983, because it gets started with the
-DSTOP.PORT=8983 system property.

What is this -DSTOP.PORT variable for, and where should I configure it?

I ran the install_solr_service.sh script to set up Solr and changed the
SOLR_PORT afterwards.

best regards. 

Samy
  

How to add second Zookeeper to same machine?

2015-08-20 Thread Merlin Morgenstern
I am running 2 dedicated servers on which I plan to install SolrCloud with
2 Solr nodes and 3 ZK.

From Stackoverflow I learned that the best method for autostarting
zookeeper on Ubuntu 14.04 is to install it via apt-get install
zookeeperd. I have that running now.

How can I add a second zookeeper to one machine? The config only allows
one. Or, if this is not possible, what would be the recommended way to get 3
ZK running on 2 dedicated servers?

I have followed a tutorial where I had that setup available via a bash
script, but it seems that the Ubuntu zookeeper package is more robust, as it
guards against zombie processes and provides a startup script as well.

Thank you for any help on this.


Re: How to configure solr to not bind at 8983

2015-08-20 Thread Modassar Ather
I think you need to add the port number in solr.xml too, under the hostPort
attribute.

STOP.PORT is SOLR_PORT minus 1000 and is set in the SOLR_HOME/bin/solr script.
As far as I understand it cannot be changed, but I am not sure.

Regards,
Modassar

On Thu, Aug 20, 2015 at 11:39 AM, Samy Ateia samyat...@hotmail.de wrote:

 I changed the Solr listen port in the solr.in.sh file in my Solr home
 directory by setting the variable SOLR_PORT=9983.
 But Solr is still also listening on 8983, because it gets started with the
 -DSTOP.PORT=8983 system property.

 What is this -DSTOP.PORT variable for, and where should I configure it?

 I ran the install_solr_service.sh script to set up Solr and changed the
 SOLR_PORT afterwards.

 best regards.

 Samy




RE: Performance issue with FILTER QUERY

2015-08-20 Thread Maulin Rathod
Thanks Erick. Even a 1-second commit interval is fine for us. But in that case
the filter cache will still be flushed every second, and the end user will
still feel the slowness, since the query takes around 1 second when we use a
filter query.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 20 August 2015 00:44
To: solr-user@lucene.apache.org
Subject: Re: Performance issue with FILTER QUERY

If you're committing that rapidly then you're correct, filter caching may not 
be a good fit. The entire _point_ of filter caching is to increase performance 
of subsequent executions of the exact same fq clause. But if you're throwing 
them away every second there's little/no benefit.

You really have two choices here:
1> lengthen out the commit interval. Frankly, 1-second commit intervals are
rarely necessary despite what your product manager says. Really, check this
requirement out.
2> disable caches.

Autowarming is potentially useful here, but if your filter queries are taking 
on the order of a second and you're committing every second then autowarming 
takes too long to help.
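
As a side note, caching can also be switched off for an individual filter with
the cache local parameter, so the fq still restricts the result set but is
never stored in the filterCache. A hedged sketch using the filter from this
thread:

    fq={!cache=false}action_status:(0) AND is_active:(true)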

Best,
Erick

On Wed, Aug 19, 2015 at 12:26 AM, Mikhail Khludnev mkhlud...@griddynamics.com 
wrote:
 Maulin,
 Did you check performance with segmented filters which I advised recently?

 On Wed, Aug 19, 2015 at 10:24 AM, Maulin Rathod mrat...@asite.com wrote:

 As per my understanding, caches are flushed every time a new document is
 added to the collection (we do a soft commit every 1 sec to make newly
 added documents available for search). Because of this the caches are not
 used effectively, and hence it is slow every time in our case.

 -Original Message-
 From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
 Sent: 19 August 2015 12:16
 To: solr-user@lucene.apache.org
 Subject: Re: Performance issue with FILTER QUERY

 On Wed, 2015-08-19 at 05:55 +, Maulin Rathod wrote:
  SLOW WITH FILTER QUERY (takes more than 1 second) 
  
 
  q=+recipient_id:(4042) AND project_id:(332) AND
  resource_id:(13332247 13332245 13332243 13332241 13332239) AND
  entity_type:(2) AND -action_id:(20 32)  <== this returns 5 records
  fq=+action_status:(0) AND is_active:(true)  <== this filter query
  returns 9432252 records

 The fq is evaluated independently of the q: For the fq a bitset is 
 allocated, filled and stored in cache. Then the q is evaluated and 
 the two bitsets are merged.

 Next time you use the same fq, it should be cached (if you have 
 caching
 enabled) and be a lot faster.


 Also, if you ran your two tests right after each other, the second 
 one benefits from disk caching. If you had executed them in reverse 
 order, the
 q+fq might have been the fastest one.

 - Toke Eskildsen, State and University Library, Denmark





 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com


Solr: How to index range-pair fields?

2015-08-20 Thread vaedama
My scenario is something like this:

I have a students database. I want to query all the students who were either
`absent` or `present` during a particular `date-range`.

For example:

Student X was `absent` between dates:

 Jan 1, 2015 and Jan 15, 2015
 Feb 13, 2015 and Feb 16, 2015
 March 19, 2015 and March 25, 2015

Also X was `present` between dates:

 Jan 25, 2015 and Jan 30, 2015
 Feb 1, 2015 and Feb 12, 2015

(Other days were either school holidays, or the teacher was lazy / forgot to
take attendance ;)

If the date range were a single-valued field, then this approach would work:
http://stackoverflow.com/questions/25246204/solr-query-for-documents-whose-from-to-date-range-contains-the-user-input.
But I have multiple date ranges for each student, so it does not work for my
use-case.

Solr 5.0 has support for `DateRangeField`
(http://lucene.apache.org/solr/5_0_0/solr-core/index.html?org/apache/solr/schema/DateRangeField.html),
which is perfect for my use-case, but I cannot upgrade to 5.0 yet! I am on
Lucene 4.1.0. David Smiley had mentioned that it would be ported to 4.x, but
I guess it never happened (https://issues.apache.org/jira/browse/SOLR-6103).
I could try porting this patch myself, but I would like to know what it takes,
and hear opinions.

So basically, I need to maintain the relationship between the start and end
dates for each of the `state`s (absence or presence). So I thought I would
need to index the fields as pairs, as mentioned here:
http://grokbase.com/t/lucene/solr-user/128r96vwz6/how-do-i-represent-a-group-of-customer-key-value-pairs

I guess my schema would look like:

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
           precisionStep="6" positionIncrementGap="0"/>

<field name="state" type="string" indexed="true" stored="true"
       multiValued="true"/>
<dynamicField name="presenceStartTime_*" type="tdate" indexed="true"
              stored="true"/>
<dynamicField name="presenceEndTime_*" type="tdate" indexed="true"
              stored="true"/>
<dynamicField name="absenceStartTime_*" type="tdate" indexed="true"
              stored="true"/>
<dynamicField name="absenceEndTime_*" type="tdate" indexed="true"
              stored="true"/>

**Question #1:** Does this look correct ? 

**Question #2:** What are the ramifications if I use `tlong` instead of
`tdate` ? My `tlong` type looks like this:

<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>

**Question #3:** In this case, for the query "get all the students who were
absent within a date range", would the query look something like this?

(state: absent) AND
(absenceStartTime1: >= givenLowerBoundDate) AND
(absenceStartTime2: >= givenLowerBoundDate) AND
(absenceStartTime3: >= givenLowerBoundDate) AND
(absenceEndTime1: <= givenUpperBoundDate) AND
(absenceEndTime2: <= givenUpperBoundDate) AND
(absenceEndTime3: <= givenUpperBoundDate)


This would work only if I knew beforehand that there were 3 date ranges in
which the student was absent, and there is no way to query all dynamic fields
with wildcards, according to
http://stackoverflow.com/questions/6213184/solr-search-query-for-dynamic-fields-indexed

**Question #4:** The workaround mentioned in one of the answers in that
question did not look terrible but seemed a bit complicated. Is there a
better alternative for solving this problem in Solr ?

Of course, I would be highly interested in any other better approaches.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-How-to-index-range-pair-fields-tp4224369.html
Sent from the Solr - User mailing list archive at Nabble.com.


Remove duplicate suggestions in Solr

2015-08-20 Thread Zheng Lin Edwin Yeo
Hi,

I would like to check: is there any way to remove duplicate suggestions in
Solr?
I have several documents that look very similar, and when I run a suggestion
query, it comes back with all the same results. I'm using Solr 5.2.1.

This is my suggestion pipeline:

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Browse specific stuff -->
    <str name="echoParams">all</str>
    <str name="wt">json</str>
    <str name="indent">true</str>

    <!-- Everything below should be identical to ac handler above -->
    <str name="defType">edismax</str>
    <str name="rows">10</str>
    <str name="fl">id, score</str>
    <!-- <str name="qf">textsuggest^30 extrasearch^30.0 textng^50.0
         phonetic^10</str> -->
    <!-- <str name="qf">content^50 title^50 extrasearch^30.0 textng^1.0
         textng2^200.0</str> -->
    <str name="qf">content^50 title^50 extrasearch^30.0</str>
    <str name="pf">textnge^50.0</str>
    <!-- <str name="bf">product(log(sum(popularity,1)),100)^20</str> -->
    <!-- Define relative importance between types. May be overridden per
         request by e.g. personboost=120 -->
    <str name="boost">product(map(query($type1query),0,0,1,$type1boost),map(query($type2query),0,0,1,$type2boost),map(query($type3query),0,0,1,$type3boost),map(query($type4query),0,0,1,$type4boost),$typeboost)</str>
    <double name="typeboost">1.0</double>

    <str name="type1query">content_type:application/pdf</str>
    <double name="type1boost">0.9</double>
    <str name="type2query">content_type:application/msword</str>
    <double name="type2boost">0.5</double>
    <str name="type3query">content_type:NA</str>
    <double name="type3boost">0.0</double>
    <str name="type4query">content_type:NA</str>
    <double name="type4boost">0.0</double>
    <str name="hl">on</str>
    <str name="hl.fl">id, textng, textng2, language_s</str>
    <str name="hl.highlightMultiTerm">true</str>
    <str name="hl.preserveMulti">true</str>
    <str name="hl.encoder">html</str>
    <!-- <str name="f.content.hl.fragsize">80</str> -->
    <str name="hl.fragsize">50</str>
    <str name="debugQuery">false</str>
  </lst>
</requestHandler>

This is my query:
http://localhost:8983/edm/chinese2/suggest?q=do our
best&defType=edismax&qf=content^5 textng^5&pf=textnge^50&pf2=content^20
textnge^50&pf3=content^40%20textnge^50&ps2=2&ps3=2&stats.calcdistinct=true


This is the suggestion result:

 "highlighting": {
    "responsibility001": {
      "id": ["responsibility001"],
      "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
    "responsibility002": {
      "id": ["responsibility002"],
      "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
    "responsibility003": {
      "id": ["responsibility003"],
      "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
    "responsibility004": {
      "id": ["responsibility004"],
      "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
    "responsibility005": {
      "id": ["responsibility005"],
      "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
    "responsibility006": {
      "id": ["responsibility006"],
      "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
    "responsibility007": {
      "id": ["responsibility007"],
      "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
    "responsibility008": {
      "id": ["responsibility008"],
      "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
    "responsibility009": {
      "id": ["responsibility009"],
      "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},
    "responsibility010": {
      "id": ["responsibility010"],
      "textng": ["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "]},


Regards,
Edwin


Bug in query elevation transformers SOLR-7953

2015-08-20 Thread Ryan Josal
Hey guys, I just logged this bug and wanted to raise awareness. If you
use the QueryElevationComponent and ask for fl=[elevated], you'll get only
false values if Solr is using LazyDocuments. This looks even stranger when you
request exclusive=true: you get back only elevated documents, and they
all say false. I'm not sure how often LazyDocuments are used, but it's
probably not an uncommon issue.

Ryan


Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1

2015-08-20 Thread Shawn Heisey
On 7/8/2015 6:13 PM, Yonik Seeley wrote:
 On Wed, Jul 8, 2015 at 6:50 PM, Shawn Heisey apa...@elyograg.org wrote:
 After the fix (with luceneMatchVersion at 4.9), both aaa and bbb end
 up at position 2.
 Yikes, that's definitely wrong.

I have filed LUCENE-6689 for this problem.  I'd like to write a unit
test that demonstrates the problem, but Lucene internals are a mystery
to me.  I have a concise and repeatable manual test (using Solr)
outlined in this comment:

https://issues.apache.org/jira/browse/LUCENE-6689?focusedCommentId=14705543&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14705543

Is there an existing Lucene test class that I could use as a basis for a
test?  I will look into tests for analysis components and try to build
it on my own, but any help is appreciated.
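
For what it's worth, analysis components are commonly tested by extending
BaseTokenStreamTestCase from the lucene-test-framework module. A hedged sketch
of the pattern only (the analyzer wiring and expected values below are
illustrative placeholders, not the actual repro):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.BaseTokenStreamTestCase;
    import org.apache.lucene.analysis.MockTokenizer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;

    public class TestWdfPositions extends BaseTokenStreamTestCase {
      public void testPositions() throws Exception {
        Analyzer a = new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer tok = new MockTokenizer(MockTokenizer.WHITESPACE, false);
            TokenStream ts = new WordDelimiterFilter(
                tok, WordDelimiterFilter.GENERATE_WORD_PARTS, null);
            return new TokenStreamComponents(tok, ts);
          }
        };
        // expected tokens, then expected position increments:
        // "aaa" and "bbb" should land on consecutive positions, not be stacked
        assertAnalyzesTo(a, "aaa-bbb",
            new String[] { "aaa", "bbb" },
            new int[]    { 1, 1 });
      }
    }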

Thanks,
Shawn



SOLR to SOLR communication with custom authentication

2015-08-20 Thread Prasad Bodapati
Hi All,

We have a cluster environment on JBoss. All of our deployed applications,
including SOLR, are protected by OpenAM. On the slave nodes we enabled SOLR to
communicate with the master nodes to get data.
Since SOLR on the master is protected with OpenAM, the slave can't talk to it.
In solr.xml there is a way to configure replication requests to use basic HTTP
authentication, but not to use custom authentication.
I have tried to override the ReplicationHandler and SnapPuller classes to
provide custom authentication, but I couldn't.

I have tried to follow the instructions at
https://wiki.apache.org/solr/SolrSecurity but I could not find the classes
org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.SubRequestFactory
and
org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.SubRequestFactory.

Has anyone of you used custom authentication for replication before? Any help
would be greatly appreciated.

Environment
SOLR version: 4.10.2 (We can't upgrade at moment as we use Java 7)
JBOSS 6.2 EAP

Thanks,
Prasad





Re: How to configure solr to not bind at 8983

2015-08-20 Thread Aman Tandon
Hi Samy,

Any particular reason not to use the -p parameter to start it on
another port?
./solr start -p 9983

With Regards
Aman Tandon

On Thu, Aug 20, 2015 at 2:02 PM, Modassar Ather modather1...@gmail.com
wrote:

 I think you need to add the port number in solr.xml too, under the hostPort
 attribute.

 STOP.PORT is SOLR_PORT minus 1000 and is set in the SOLR_HOME/bin/solr script.
 As far as I understand it cannot be changed, but I am not sure.

 Regards,
 Modassar

 On Thu, Aug 20, 2015 at 11:39 AM, Samy Ateia samyat...@hotmail.de wrote:

  I changed the Solr listen port in the solr.in.sh file in my Solr home
  directory by setting the variable SOLR_PORT=9983.
  But Solr is still also listening on 8983, because it gets started
  with the -DSTOP.PORT=8983 system property.
 
  What is this -DSTOP.PORT variable for, and where should I configure it?
 
  I ran the install_solr_service.sh script to set up Solr and changed the
  SOLR_PORT afterwards.
 
  best regards.
 
  Samy
 



How to close log when use the solrj api

2015-08-20 Thread fent
When I use the SolrJ API to add category data to Solr,
there is a lot of DEBUG output in the log.
How do I turn this off, or how do I configure the logging?
Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-close-log-when-use-the-solrj-api-tp4224142.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH delta-import pk

2015-08-20 Thread Shawn Heisey
On 8/20/2015 4:27 PM, CrazyDiamond wrote:
 I have a DIH delta-import query based on last_index_time. It works perfectly.
 But sometimes I add documents to Solr manually, and I want DIH not to add
 them again. I have a UUID unique field, and also an id from the database
 which is marked as pk in the DIH schema. My question is: will DIH update the
 existing document or add a new one?
 P.S. The id field is not marked as unique in the config.

The pk (primary key) in DIH is only relevant in the context of DIH, and
is only used by DIH for validating and coordinating database queries. 
It has absolutely no impact on the Solr index.

If you want a newly indexed document to replace an existing document,
the value in the uniqueKey field (defined in schema.xml) must be the
same as that field's value in the existing document that you wish to
replace.  If you have a matching value, Solr will automatically replace
the document for you -- the old version will be deleted before the new
one is indexed.
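
For reference, a hedged schema.xml sketch (field names hypothetical) of the
declarations that control this; it is the uniqueKey, not the DIH pk, that
decides whether a new document replaces an old one:

    <field name="uuid" type="string" indexed="true" stored="true"
           required="true"/>
    <uniqueKey>uuid</uniqueKey>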

Thanks,
Shawn



DIH delta-import pk

2015-08-20 Thread CrazyDiamond
I have a DIH delta-import query based on last_index_time. It works perfectly.
But sometimes I add documents to Solr manually, and I want DIH not to add
them again. I have a UUID unique field, and also an id from the database
which is marked as pk in the DIH schema. My question is: will DIH update the
existing document or add a new one?
P.S. The id field is not marked as unique in the config.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-delta-import-pk-tp4224342.html
Sent from the Solr - User mailing list archive at Nabble.com.


Number of requests to each shard is different with and without using of grouping

2015-08-20 Thread SolrUser1543
I want to understand why the number of requests in SOLR CLOUD is different
with and without the grouping feature.


1. Suppose we have several shards in SOLR CLOUD (let's say 3 shards).
2. One of them gets a query with rows = n.
3. This shard distributes the request among the others, and suppose that every
shard has a lot of results, many more than n.
4. It then receives item IDs from each shard, so the number of results
in total is 3n.
5. It then sorts the results and chooses the best n results, where in my
case each shard has representatives in the total results.
6. It then sends a second request to each shard, with the appropriate item
IDs, to get the stored fields.

So in this case each shard will be queried twice: once to get
item IDs, and a second time to get the stored fields.

That is what I see in my logs. (I see 6 log entries, 2 for each shard.)

*The question is: why, when I am using the grouping feature, is the number of
requests to each shard 3 instead of 2?* (I see 8 or 9 log entries.)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Number-of-requests-to-each-shard-is-different-with-and-without-using-of-grouping-tp4224293.html
Sent from the Solr - User mailing list archive at Nabble.com.


exclude folder in dataimport handler.

2015-08-20 Thread coolmals
I am importing files from my file system and want to exclude files under a
folder called templatedata. How do I configure that in the entity?
excludes="templatedata" doesn't seem to work.

<entity name="files" dataSource="null" rootEntity="false"
        processor="FileListEntityProcessor"
        baseDir="E:\Malathy\" fileName=".*\.*" excludes="templatedata"
        pk="id"
        onError="skip"
        recursive="true"/>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/exclude-folder-in-dataimport-handler-tp4224267.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: exclude folder in dataimport handler.

2015-08-20 Thread Dyer, James
I took a quick look at FileListEntityProcessor#init, and it looks like it 
applies the excludes regex to the filename element of the path only, and not 
to the directories.

If your filenames do not have a naming convention that would let you use it 
this way, you might be able to write a transformer to get what you want.
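
A hedged sketch of such a transformer (package and class names are
hypothetical); returning null from transformRow makes DIH skip the row:

    package com.example.dih; // hypothetical

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    // Skips any file whose absolute path contains "templatedata".
    // FileListEntityProcessor exposes the path in the fileAbsolutePath column.
    public class ExcludeFolderTransformer extends Transformer {
      @Override
      public Object transformRow(Map<String, Object> row, Context context) {
        Object path = row.get("fileAbsolutePath");
        if (path != null && path.toString().contains("templatedata")) {
          return null; // a null row is dropped
        }
        return row;
      }
    }

You would then reference it on the entity with
transformer="com.example.dih.ExcludeFolderTransformer".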

James Dyer
Ingram Content Group


-Original Message-
From: coolmals [mailto:coolm...@gmail.com] 
Sent: Thursday, August 20, 2015 12:57 PM
To: solr-user@lucene.apache.org
Subject: exclude folder in dataimport handler.

I am importing files from my file system and want to exclude files under a
folder called templatedata. How do I configure that in the entity?
excludes="templatedata" doesn't seem to work.

<entity name="files" dataSource="null" rootEntity="false"
        processor="FileListEntityProcessor"
        baseDir="E:\Malathy\" fileName=".*\.*" excludes="templatedata"
        pk="id"
        onError="skip"
        recursive="true"/>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/exclude-folder-in-dataimport-handler-tp4224267.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to close log when use the solrj api

2015-08-20 Thread Susheel Kumar
You may want to see the logging level using the Dashboard URL
http://localhost:8983/solr/#/~logging/level  even can set for the session
but otherwise you can look into server/resources/log4j.properties. Refer
https://cwiki.apache.org/confluence/display/solr/Configuring+Logging
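
For reference, a hedged log4j.properties sketch that quiets DEBUG noise
(package names are assumptions; the CONSOLE appender comes from Solr's shipped
config, and SolrJ chatter often originates in the HTTP client packages):

    log4j.rootLogger=WARN, CONSOLE
    log4j.logger.org.apache.solr=INFO
    log4j.logger.org.apache.http=WARN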

On Thu, Aug 20, 2015 at 4:30 AM, fent wutian_...@hotmail.com wrote:

 when  i use solrj api to add category  data to solr ,
 their will have a lot of DEBUG info ,
 how to close this ,or how to set the log ?
 ths



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-close-log-when-use-the-solrj-api-tp4224142.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: How to configure solr to not bind at 8983

2015-08-20 Thread Samy Ateia
Ahh, thank you,
that explains it. I changed the port to 9983, not knowing that the stop port
would then be the old port.
So I guess I just need to change it to something else.

 Subject: Re: How to configure solr to not bind at 8983
 To: solr-user@lucene.apache.org
 From: apa...@elyograg.org
 Date: Thu, 20 Aug 2015 07:14:25 -0600
 
 On 8/20/2015 2:34 AM, Samy Ateia wrote:
  I changed the Solr listen port in the solr.in.sh file in my Solr home
  directory by setting the variable SOLR_PORT=9983.
  But Solr is still also listening on 8983, because it gets started
  with the -DSTOP.PORT=8983 system property.
  
  What is this -DSTOP.PORT variable for, and where should I configure it?
  
  I ran the install_solr_service.sh script to set up Solr and changed the
  SOLR_PORT afterwards.
 
 The stop port is used by Jetty ... as a mechanism to stop jetty.  It
 will be different than 8983 on a standard install.  It defaults to 1000
 less than the Solr port -- 7983 if you don't change the solr port.
 
 This is in the solr shell script:
 
   STOP_PORT=`expr $SOLR_PORT - 1000`
 
 In the same way, the embedded zookeeper port for SolrCloud examples is
 1000 *more* than the Solr port:
 
 zk_port=$[$SOLR_PORT+1000]
 
  The RMI (JMX) port defaults to the Solr port with a 1 prepended, although
  the script doesn't set this very intelligently; I think I should probably
  fix this:
 
 RMI_PORT=1$SOLR_PORT
 
 Thanks,
 Shawn
 
  

Re: How to add second Zookeeper to same machine?

2015-08-20 Thread Shawn Heisey
On 8/20/2015 1:49 AM, Merlin Morgenstern wrote:
 I am running 2 dedicated servers on which I plan to install SolrCloud with
 2 Solr nodes and 3 ZK.
 
 From Stackoverflow I learned that the best method for autostarting
 zookeeper on Ubuntu 14.04 is to install it via apt-get install
 zookeeperd. I have that running now.
 
 How can I add a second zookeeper to one machine? The config only allows
 one. Or, if this is not possible, what would be the recommended way to get 3
 ZK running on 2 dedicated servers?
 
 I have followed a tutorial where I had that setup available via a bash
 script, but it seems that the Ubuntu zookeeper package is more robust, as it
 guards against zombie processes and provides a startup script as well.

It is possible to have multiple zookeeper installs on one machine, but
if you do this, your system will not be fault tolerant.

A simple fact of life is that hardware can fail, and it can fail
completely.  If the motherboard in a server develops a fault, the entire
server is probably going to fail.  If the machine with two zookeepers on
it dies, zookeeper quorum will be lost and SolrCloud will go read-only.
 It will not be possible to write to it, even though there is still a
surviving machine.

Redundant zookeeper requires three completely separate machines, so that
if you lose any one of those machines, the cluster still has a majority
present and stays completely operational.  This means that SolrCloud
requires three machines minimum.  The third server can be a much less
capable machine that runs zookeeper only, but it must be there in order
to achieve true fault tolerance.

Thanks,
Shawn



Re: How to configure solr to not bind at 8983

2015-08-20 Thread Shawn Heisey
On 8/20/2015 2:34 AM, Samy Ateia wrote:
 I changed the Solr listen port in the solr.in.sh file in my Solr home
 directory by setting the variable SOLR_PORT=9983.
 But Solr is still also listening on 8983, because it gets started with the
 -DSTOP.PORT=8983 system property.
 
 What is this -DSTOP.PORT variable for, and where should I configure it?
 
 I ran the install_solr_service.sh script to set up Solr and changed the
 SOLR_PORT afterwards.

The stop port is used by Jetty ... as a mechanism to stop jetty.  It
will be different than 8983 on a standard install.  It defaults to 1000
less than the Solr port -- 7983 if you don't change the solr port.

This is in the solr shell script:

  STOP_PORT=`expr $SOLR_PORT - 1000`

In the same way, the embedded zookeeper port for SolrCloud examples is
1000 *more* than the Solr port:

zk_port=$[$SOLR_PORT+1000]

The RMI (JMX) port defaults to the Solr port with a 1 prepended, although
the script doesn't set this very intelligently; I think I should probably
fix this:

RMI_PORT=1$SOLR_PORT

Thanks,
Shawn



How to use DocumentAnalysisRequestHandler in java

2015-08-20 Thread Jean-Pierre Lauris
Hi,
I'm trying to obtain the indexed tokens for a document id, in order to see
exactly what has been indexed.
It seems that DocumentAnalysisRequestHandler does that, but I couldn't
figure out how to use it in Java.

The doc says I must provide a content stream, but the available init() method
only takes a NamedList as a parameter:
https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/handler/DocumentAnalysisRequestHandler.html

Could somebody provide me with a short example of how to get index
information from a document id?

Thanks,
Jean-Pierre.


Re: replication and HDFS

2015-08-20 Thread Erick Erickson
Yes. Maybe. It Depends (tm).

Details matter (tm).

If you're firing just a few QPS at the system, then improved
throughput by adding replicas is unlikely. OTOH, if you're firing lots
of simultaneous queries at Solr and are pegging the processors, then
adding replication will increase aggregate QPS.

If your soft commit interval is very short and you're not doing proper
warming, it won't help at all in all probability.

Replication in Solr is about increasing the number of instances
available to serve queries. The two types of replication (HDFS or
Solr) are really orthogonal, the first is about data integrity and the
second is about increasing the number of Solr nodes available to
service queries.

Best,
Erick

On Thu, Aug 20, 2015 at 9:23 AM, Joseph Obernberger
j...@lovehorsepower.com wrote:
 Hi - we currently have a multi-shard setup running solr cloud without
 replication running on top of HDFS.  Does it make sense to use replication
 when using HDFS?  Will we expect to see a performance increase in searches?
 Thank you!

 -Joe


Re: How to use DocumentAnalysisRequestHandler in java

2015-08-20 Thread Alexandre Rafalovitch
If this is for a quick test, have you tried just faceting on that
field, with the document ID set through the query? Faceting returns the
indexed/tokenized terms.
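
A hedged example of that request (core, document id, and field names are
hypothetical); rows=0 suppresses the document itself, facet.mincount=1 hides
zero-count terms from the rest of the index, and facet.limit=-1 returns
every term:

    http://localhost:8983/solr/collection1/select?q=id:mydoc&rows=0&facet=true&facet.field=myfield&facet.mincount=1&facet.limit=-1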

Regards,
Alex.



Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 20 August 2015 at 11:34, Jean-Pierre Lauris jplau...@gmail.com wrote:
 Hi,
 I'm trying to obtain the indexed tokens for a document id, in order to see
 exactly what has been indexed.
 It seems that DocumentAnalysisRequestHandler does that, but I couldn't
 figure out how to use it in Java.

 The doc says I must provide a content stream, but the available init() method
 only takes a NamedList as a parameter:
 https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/handler/DocumentAnalysisRequestHandler.html

 Could somebody provide me with a short example of how to get index
 information from a document id?

 Thanks,
 Jean-Pierre.


caches with faceting

2015-08-20 Thread Kiran Sai Veerubhotla
I have used the JSON facet API and noticed that it relies heavily on the
filter cache.

The index is optimized, all my fields have docValues='true', the number of
documents is 2.6 million, and I am always faceting on almost all the
documents with 'fq'.

The sizes of documentCache and queryResultCache are very minimal (< 10). Is
that ok? I understand that documentCache stores the documents that are
fetched from disk (merged segments), and its size is set to 2000.

fieldCache is always zero. Is that because of docValues?

ver 5.2.1


Re: How to use DocumentAnalysisRequestHandler in java

2015-08-20 Thread Upayavira


On Thu, Aug 20, 2015, at 04:34 PM, Jean-Pierre Lauris wrote:
 Hi,
 I'm trying to obtain the indexed tokens for a document id, in order to see
 exactly what has been indexed.
 It seems that DocumentAnalysisRequestHandler does that, but I couldn't
 figure out how to use it in Java.
 
 The doc says I must provide a content stream, but the available init()
 method only takes a NamedList as a parameter:
 https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/handler/DocumentAnalysisRequestHandler.html
 
 Could somebody provide me with a short example of how to get index
 information from a document id?

If you are talking about what I think you are, then that is used by the
Admin UI to implement the analysis tab. You pass in a document, and it
returns it analysed.
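
For a remote call from Java, SolrJ wraps this handler in
DocumentAnalysisRequest, so you never touch init() yourself. A hedged sketch
(SolrJ 5.x; URL, core, and field names are hypothetical):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.DocumentAnalysisRequest;
    import org.apache.solr.client.solrj.response.DocumentAnalysisResponse;
    import org.apache.solr.common.SolrInputDocument;

    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "42");
    doc.addField("title", "some text to analyze");

    DocumentAnalysisRequest req = new DocumentAnalysisRequest(); // posts to /analysis/document
    req.addDocument(doc);
    DocumentAnalysisResponse rsp = req.process(client);
    // rsp holds, per field, the tokens produced by each stage of the analysis chain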

As Alexandre says, faceting may well get you there if you want to query
a document already in your index.

Upayavira


replication and HDFS

2015-08-20 Thread Joseph Obernberger
Hi - we currently have a multi-shard setup running solr cloud without 
replication running on top of HDFS.  Does it make sense to use 
replication when using HDFS?  Will we expect to see a performance 
increase in searches?

Thank you!

-Joe