wildcard matches in EnumField - what do I need to change in code to enable wildcard matches?

2014-05-29 Thread Elran Dvir
Hi all,

In my index, I have an EnumField called severity. This is its configuration in 
enumsConfig.xml:

<enum name="severity">
  <value>Not Available</value>
  <value>Low</value>
  <value>Medium</value>
  <value>High</value>
  <value>Critical</value>
</enum>

My index contains documents with these values.
When I search for severity:High, I get results. But when I search for 
severity:H*, I get no results.
What do I need to change in Solr code to enable wildcard matches in EnumField 
(or any other field)?

Thanks.


Re: Percolator feature

2014-05-29 Thread Alan Woodward
Hi,

There's https://github.com/flaxsearch/luwak, which isn't integrated into Solr 
yet, but could be added as a SearchComponent with a bit of work.  It's running 
off a lucene fork at the moment, but I cut a 4.8 branch at Berlin Buzzwords 
which I will push to github later today.

Alan Woodward
www.flax.co.uk


On 28 May 2014, at 21:44, Jorge Luis Betancourt Gonzalez wrote:

 Is there some workaround in the Solr ecosystem to get something similar to the 
 percolator feature offered by Elasticsearch? 
 
 Greetings!



Solr GeoHash Field (Solr 4.5)

2014-05-29 Thread Chris Atkinson
Hi,

I've been reading up a lot on what David has written about GeoHash fields
and would like to use them.

I'm trying to create a nice way to display cluster counts of geo points on
a google map. It's naturally not going to be possible to send information for
40k markers over the wire to cluster... so I figured GeoHash would be
perfect.

I'm running Solr 4.5. I've seen this.. https://github.com/dsmiley/SOLR-2155
Would this be what I use? It looks like it's really old, and I noticed that
there is now a solr.GeoHash core field...

However, if I check the documentation at this page
https://wiki.apache.org/solr/SpatialSearchDev

Solr includes the field type solr.GeoHashField but it unfortunately
 doesn't realize any of the intrinsic properties of the geohash to its
 advantage. *You shouldn't use it.* Instead, check out
 http://wiki.apache.org/solr/SpatialSearch#SOLR-2155. The main feature is
 multi-valued field support.

 Does this mean that there isn't any way to use GeoHash with my version of
Solr?

Should I just implement a multi-valued field and add all of the multi-valued
fields myself?

(Also, can you confirm that for doing clustering, I'm on the right track
for using GeoHash. I don't need anything perfect. I just want to be able to
break up the markers into groups).

Thanks


search using Ngram.

2014-05-29 Thread Gurfan
Hi All,

We are using EdgeNGramFilterFactory for searching with minGramSize=3; per
business logic, autofill suggestions should appear on entering 3
characters in the search filter. While searching for a contact named "Bill
Moor", the value does not get listed when we type 'Bill M', but when
we type 'Bill Moo' or 'Bill' it suggests 'Bill Moor'.

Clearly, the tokens are not generated when there is a space in between. We
cannot set minGramSize=1 as that will generate many tokens and slow
the performance. Do we have a solution, without using NGram, to generate
tokens on entering 3 characters?


Please suggest.

Thanks,
--Gurfan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-using-Ngram-tp4138596.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud distributed indexing

2014-05-29 Thread Priti Solanki
Hi,

How do I achieve distributed indexing in SolrCloud? I have an external ZooKeeper
with two separate machines acting as leaders.

In researching further I found

As of now we are specifying the port in our update call, and if that leader is
down ZooKeeper does not forward the request to the other leader for indexing;
instead the call fails. As I understand it, this is because of the port I have
specified, but then how do I achieve this requirement?

I have tried the following

http://localhost:/solr/collection1/update?update.processor=distrib&self=localhost:/solr&shards=localhost:8983/solr,localhost:7574/solr,localhost:/solr


but this is not working. Can someone outline the steps or redirect me to
proper notes where I can go through the steps?


Re: SolrCloud distributed indexing

2014-05-29 Thread Shalin Shekhar Mangar
If you are using Java to index/query, then use CloudSolrServer which
accepts the ZooKeeper connection string as a constructor parameter and it
will take care of routing requests and failover.
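
For reference, a minimal SolrJ 4.x sketch along those lines (the ZooKeeper
hosts and collection name below are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexer {
    public static void main(String[] args) throws Exception {
        // ZooKeeper ensemble connection string (placeholder hosts/ports)
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);   // SolrJ routes the update to the correct shard leader
        server.shutdown();
    }
}

If a shard leader goes down, CloudSolrServer learns the new cluster state
from ZooKeeper and keeps routing updates without any client-side changes.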


On Thu, May 29, 2014 at 2:41 PM, Priti Solanki pritiatw...@gmail.com wrote:

 Hi,

 How do I achieve distributed indexing in SolrCloud? I have an external ZooKeeper
 with two separate machines acting as leaders.

 In researching further I found

 As of now we are specifying the port in our update call, and if that leader is
 down ZooKeeper does not forward the request to the other leader for indexing;
 instead the call fails. As I understand it, this is because of the port I have
 specified, but then how do I achieve this requirement?

 I have tried the following


 http://localhost:/solr/collection1/update?update.processor=distrib&self=localhost:/solr&shards=localhost:8983/solr,localhost:7574/solr,localhost:/solr


 but this is not working. Can someone outline the steps or redirect me to
 proper notes where I can go through the steps?




-- 
Regards,
Shalin Shekhar Mangar.


RE: Error enquiry- exceeded limit of maxWarmingSearchers=2

2014-05-29 Thread M, Arjun (NSN - IN/Bangalore)
Hi,

Thanks for your valuable inputs... Find below my code and config in 
solrconfig.xml. Index update is successful but I am not able to see any data 
from solr admin console. What could be the issue? Any help here is highly 
appreciated.

I can see the data in the Solr admin GUI after a Tomcat restart (Solr is 
running in Tomcat in my case).

private void addToSolr(List<SolrInputDocument> c) throws SolrServerException, 
IOException {
    if (!c.isEmpty()) {
        try {
            solr.add(c);
            logger.info("Commit size after Add=" + c.size());
        } finally {
            //renew lock
        }
    }
}

autoCommit config in solrconfig.xml
=

<autoCommit>
   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
   <maxDocs>1</maxDocs>
   <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

Few more questions.. 

2) If I use solrServer.add(docList, commitWithin), should I also do 
solrServer.commit()?



Thanks & Regards,
Arjun M


-Original Message-
From: ext Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, May 28, 2014 6:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Error enquiry- exceeded limit of maxWarmingSearchers=2

On 5/28/2014 3:45 AM, M, Arjun (NSN - IN/Bangalore) wrote:
   Also is there a way to check if autowarming completed, or how to make 
 the next commit wait till the previous commit finishes?

With Solr, probably not.  There might be a statistic available from an
admin handler that I don't know about, but as far as I know, your code
must be aware of approximately how long a commit is likely to take, and
not send another commit until you can be sure that the previous commit
is done.  This includes the commitWithin parameter on an update request.

Now that I've just said that, you *can* do an all documents query with
rows=0 and look for a change in numFound.  An update might actually
result in no change to numFound, so you would need to build in a
time-based exit to the loop that looks for numFound changes.
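
A rough SolrJ sketch of that polling idea (the one-minute timeout is
arbitrary):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;

public class CommitWatcher {
    // Returns once numFound differs from 'before', or after a time-based
    // exit, since an update may not change numFound at all.
    static void waitForNewSearcher(SolrServer server, long before) throws Exception {
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);   // rows=0: we only need numFound, not documents
        long deadline = System.currentTimeMillis() + 60000;
        while (System.currentTimeMillis() < deadline) {
            long now = server.query(q).getResults().getNumFound();
            if (now != before) {
                return;   // a searcher that sees the new documents is open
            }
            Thread.sleep(1000);
        }
    }
}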

In the case of commits done automatically by the configuration
(autoCommit and/or autoSoftCommit), there is definitely no way to detect
when a previous commit is done.

The general recommendation with Solr 4.x is to have autoCommit enabled
with openSearcher=false, with a relatively short maxTime -- from 5
minutes down to 15 seconds, depending on indexing rate.  These commits
will not open a new searcher, and they will not make new documents visible.

For commits that affect which documents are visible, you need to
determine how long you can possibly stand to go without seeing new data
that has been indexed.  Once you know that time interval, you can use it
to do a manual commit, or you can set up autoSoftCommit with that
interval.  It is not at all unusual to have an autoCommit time interval
that's shorter than autoSoftCommit.

This blog post mentions SolrCloud, but it is also applicable to Solr 4.x
when NOT running in cloud mode:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks,
Shawn



RE: Email Notification for Sucess/Failure of Import Process.

2014-05-29 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
I am not using DIH to index data; I use post.jar and XML files to load into 
Solr.

I am not sure whether I can still use DIH, importstart and importend..?

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@gmail.com] 
Sent: Wednesday, May 28, 2014 11:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Email Notification for Sucess/Failure of Import Process.

How about using DIH’s EventListeners? 
http://wiki.apache.org/solr/DataImportHandler#EventListeners  

-Stefan  
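
(For reference, a listener along those lines is a class implementing DIH's
EventListener interface, referenced from data-config.xml; the class name and
the mail-sending part below are hypothetical:)

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

// Referenced in data-config.xml, roughly as:
//   <document onImportStart="..." onImportEnd="com.example.ImportMailListener">
public class ImportMailListener implements EventListener {
    @Override
    public void onEvent(Context ctx) {
        // hypothetical: send the success/failure notification e-mail here,
        // e.g. via JavaMail
    }
}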


On Wednesday, May 28, 2014 at 5:31 PM, EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions) wrote:

 Hi, I am using XML files for indexing in Solr. I am planning to make this 
 process more automated: creating the XML file and loading it into Solr.
  
 I would like to get an email once the process is completed. Is there any way 
 in Solr this can be achieved? I am not seeing much documentation on 
 configuring notifications in Solr.
  
 Also I am trying DIH using MS SQL. Can someone help me by sharing a 
 data-config.xml if you are already using one for MSSQL, with a few basic steps?
  
 Thanks
  
 Ravi  



Re: search using Ngram.

2014-05-29 Thread Michael Della Bitta
Sounds like you are tokenizing your string when you don't really want to.

Either you want all queries to only search against prefixes of the whole
value without tokenization, or you need to produce several copyFields with
different analysis applied and use dismax to let Solr know which should
rank higher.

Or, you could use the Suggester component or one of the other bolt-on
autocomplete components instead.

Maybe you should post your current field definition and let us know
specifically what you're trying to achieve?
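
For the untokenized-prefix option, a sketch of what such a field type could
look like (type name and gram sizes are illustrative; the query analyzer
deliberately produces no grams, so "bill m" matches an indexed edge gram of
"bill moor" -- send the whole input as a single phrase so the query parser
doesn't split it on whitespace):

<fieldType name="prefix_whole" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>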


Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Thu, May 29, 2014 at 4:54 AM, Gurfan htc.ja...@gmail.com wrote:

 Hi All,

 We are using EdgeNGramFilterFactory for searching with minGramSize=3; per
 business logic, autofill suggestions should appear on entering 3
 characters in the search filter. While searching for a contact named "Bill
 Moor", the value does not get listed when we type 'Bill M', but when
 we type 'Bill Moo' or 'Bill' it suggests 'Bill Moor'.

 Clearly, the tokens are not generated when there is a space in between. We
 cannot set minGramSize=1 as that will generate many tokens and slow
 the performance. Do we have a solution, without using NGram, to generate
 tokens on entering 3 characters?


 Please suggest.

 Thanks,
 --Gurfan






Re: Percolator feature

2014-05-29 Thread Michael Della Bitta
We've definitely looked at Luwak before... nice to hear it might be being
brought closer into the Solr ecosystem!


Re: Error enquiry- exceeded limit of maxWarmingSearchers=2

2014-05-29 Thread Shawn Heisey
On 5/29/2014 4:18 AM, M, Arjun (NSN - IN/Bangalore) wrote:
   Thanks for your valuable inputs... Find below my code and config in 
 solrconfig.xml. Index update is successful but I am not able to see any data 
 from solr admin console. What could be the issue? Any help here is highly 
 appreciated.
 
   I can see the data in the solr admin gui after tomcat restart(solr is 
 running in tomcat in my case)
 
 private void addToSolr(List<SolrInputDocument> c) throws SolrServerException, 
 IOException {
     if (!c.isEmpty()) {
         try {
             solr.add(c);
             logger.info("Commit size after Add=" + c.size());
         } finally {
             //renew lock
         }
     }
 }
 
 autoCommit config in solrconfig.xml
 =
 
 <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <maxDocs>1</maxDocs>
    <openSearcher>false</openSearcher>
 </autoCommit>
 
 <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
 </autoSoftCommit>

The code snippet does not include a commit.

I am not really clear on what using a value of -1 would do on maxTime
here.  I suspect that it effectively disables autoSoftCommit.  If that's
the case, then there is nothing at all in your code or your config that
will open a new searcher -- that option is set to false in your autoCommit.

If you want Solr to automatically do commits to make documents visible,
I think you should configure a maxTime value for autoSoftCommit, and
make it as long as you can possibly stand to not have new documents
available.  Then you won't have to worry about commits in your code at all.

 Few more questions.. 
 
 2) If I use solrServer.add(docList, commitWithin), should I also do 
 solrServer.commit()?

No.  The commitWithin would do a soft commit for you once that much time
has elapsed since indexing started (or the last commit with
openSearcher=true), so you would not need to do a commit().

My opinion is that you should not combine manual commits with
autoSoftCommit.  Depending on exactly what your needs are, you might
want to use commitWithin, and have autoSoftCommit as a last guarantee
against errors in your indexing process.
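
A minimal sketch of the commitWithin variant (the 30-second window is
arbitrary; pick the longest delay you can tolerate):

import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class Indexer {
    // Ask Solr to make the docs visible within 30s; no explicit commit() call.
    static void addDocs(SolrServer solr, List<SolrInputDocument> docs)
            throws Exception {
        solr.add(docs, 30000);   // commitWithin in milliseconds
    }
}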

Thanks,
Shawn



Re: wildcard matches in EnumField - what do I need to change in code to enable wildcard matches?

2014-05-29 Thread Shawn Heisey
On 5/29/2014 12:50 AM, Elran Dvir wrote:
 In my index, I have an EnumField called severity. This is its configuration 
 in enumsConfig.xml:
 
 <enum name="severity">
   <value>Not Available</value>
   <value>Low</value>
   <value>Medium</value>
   <value>High</value>
   <value>Critical</value>
 </enum>
 
 My index contains documents with these values.
 When I search for severity:High, I get results. But when I search for 
 severity:H* , I get no results.
 What do  I need to change in Solr code to enable wildcard matches in 
 EnumField  (or any other field)?

I would suspect that enum fields are not actually stored as text.  They
are likely stored in the index as an integer, with the Solr schema being
the piece that knows what the strings are for each of the numbers.  I
don't think a wildcard match is possible.

Looking at the code for the EnumFieldValue class (added by SOLR-5084), I
do not see any way to match the string value based on a wildcard or
substring.

If you want to use wildcard matches, you'll need to switch the field to
StrField or TextField, and make sure that all of your code is strict
about the values that can end up in the field.
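
A minimal schema.xml sketch of that switch (assuming the stock "string" type
is defined as usual):

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="severity" type="string" indexed="true" stored="true"/>

With StrField the values are indexed verbatim, so severity:H* would match the
indexed term High.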

Thanks,
Shawn



RE: Error enquiry- exceeded limit of maxWarmingSearchers=2

2014-05-29 Thread M, Arjun (NSN - IN/Bangalore)
Thanks Shawn... Just one more question..

Can both autoCommit and autoSoftCommit be enabled? If both are enabled, which 
one takes precedence?

Thanks & Regards,
Arjun M


-Original Message-
From: ext Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Thursday, May 29, 2014 7:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Error enquiry- exceeded limit of maxWarmingSearchers=2

On 5/29/2014 4:18 AM, M, Arjun (NSN - IN/Bangalore) wrote:
   Thanks for your valuable inputs... Find below my code and config in 
 solrconfig.xml. Index update is successful but I am not able to see any data 
 from solr admin console. What could be the issue? Any help here is highly 
 appreciated.
 
   I can see the data in the solr admin gui after tomcat restart(solr is 
 running in tomcat in my case)
 
 private void addToSolr(List<SolrInputDocument> c) throws SolrServerException, 
 IOException {
     if (!c.isEmpty()) {
         try {
             solr.add(c);
             logger.info("Commit size after Add=" + c.size());
         } finally {
             //renew lock
         }
     }
 }
 
 autoCommit config in solrconfig.xml
 =
 
 <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <maxDocs>1</maxDocs>
    <openSearcher>false</openSearcher>
 </autoCommit>
 
 <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
 </autoSoftCommit>

The code snippet does not include a commit.

I am not really clear on what using a value of -1 would do on maxTime
here.  I suspect that it effectively disables autoSoftCommit.  If that's
the case, then there is nothing at all in your code or your config that
will open a new searcher -- that option is set to false in your autoCommit.

If you want Solr to automatically do commits to make documents visible,
I think you should configure a maxTime value for autoSoftCommit, and
make it as long as you can possibly stand to not have new documents
available.  Then you won't have to worry about commits in your code at all.

 Few more questions.. 
 
 2) If I use solrServer.add(docList, commitWithin), should I also do 
 solrServer.commit()?

No.  The commitWithin would do a soft commit for you once that much time
has elapsed since indexing started (or the last commit with
openSearcher=true), so you would not need to do a commit().

My opinion is that you should not combine manual commits with
autoSoftCommit.  Depending on exactly what your needs are, you might
want to use commitWithin, and have autoSoftCommit as a last guarantee
against errors in your indexing process.

Thanks,
Shawn



Re: wildcard matches in EnumField - what do I need to change in code to enable wildcard matches?

2014-05-29 Thread Jack Krupansky
At a minimum, the doc is too skimpy to say whether this should work or 
whether this is forbidden. That said, I wouldn't have expected wildcard to 
be supported for enum fields since they are really storing small integers. 
Ditto for regular expressions on enum fields.


See:
https://cwiki.apache.org/confluence/display/solr/Working+with+Enum+Fields

-- Jack Krupansky

-Original Message- 
From: Elran Dvir

Sent: Thursday, May 29, 2014 2:50 AM
To: solr-user@lucene.apache.org
Subject: wildcard matches in EnumField - what do I need to change in code to 
enable wildcard matches?


Hi all,

In my index, I have an EnumField called severity. This is its configuration 
in enumsConfig.xml:


<enum name="severity">
  <value>Not Available</value>
  <value>Low</value>
  <value>Medium</value>
  <value>High</value>
  <value>Critical</value>
</enum>

My index contains documents with these values.
When I search for severity:High, I get results. But when I search for 
severity:H* , I get no results.
What do  I need to change in Solr code to enable wildcard matches in 
EnumField  (or any other field)?


Thanks. 



Re: Offline Indexes Update to Shard

2014-05-29 Thread Otis Gospodnetic
Hi,

On Wed, May 28, 2014 at 4:25 AM, Vineet Mishra clearmido...@gmail.com wrote:

 Hi All,

 Has anyone tried building offline indexes with EmbeddedSolrServer and
 posting them to shards?


What do you mean by "posting it to shards"?  How is that different from
copying them manually to the right location in the FS?  Could you please
elaborate?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



 FYI, I am done building the indexes but am looking for a way to post these
 index files to the shards.
 Copying the indexes manually to each shard's replica is possible and is
 working fine, but I don't want to go with that approach.

 Thanks!



Re: Solr High GC issue

2014-05-29 Thread Otis Gospodnetic
Hi Bihan,

That's a lot of parameters and without trying one can't really give you
very specific and good advice.  If I had to suggest something quickly I'd
say:

* go back to the basics - remove most of those params and stick with the
basic ones.  Look at GC and tune slowly by changing/adding params one at a
time.
* consider using G1 GC with the most recent Java7.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, May 29, 2014 at 1:36 AM, bihan.chandu bihan.cha...@gmail.com wrote:

 Hi All

 I am currently using Solr 3.6.1 and my system handles a lot of requests. Now we
 are facing a high GC issue in the system. Please find the memory parameters of my
 Solr system below. Can someone help me identify whether there is any relationship
 between my memory parameters and the GC issue?

 MEM_ARGS=-Xms7936M -Xmx7936M -XX:NewSize=512M -XX:MaxNewSize=512M
 -Xss1024k
 -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
 -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:+CMSParallelRemarkEnabled -XX:+AggressiveOpts
 -XX:LargePageSizeInBytes=2m -XX:+UseLargePages -XX:MaxTenuringThreshold=15
 -XX:-UseAdaptiveSizePolicy -XX:PermSize=256M -XX:MaxPermSize=256M
 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+PrintGCDetails
 -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGC
 -Xloggc:${GCLOG} -XX:-OmitStackTraceInFastThrow -XX:+DisableExplicitGC
 -XX:-BindGCTaskThreadsToCPUs -verbose:gc -XX:StackShadowPages=20

 Thanks
 Bihan






Re: Transfer Existing Index to Core with Clean Index

2014-05-29 Thread ScottFree
I managed to figure it out! I did a full commit to the index by:

1. Creating an update.xml file with the commands:

<commit/>
<optimize/>

... with the pasted index in the data folder.

2. Running the command from the web browser:

<host>:<port>/solr/update?commit=true





RE: Solr High GC issue

2014-05-29 Thread Boogie Shafer
you will probably also want to get some better visibility into what is going on 
with your JVM and GC

easiest way is to enable some GC logging options. the following additional 
options will give you a good deal of information in gc logs

-Xloggc:$JETTY_LOGS/gc.log
-verbose:gc
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintClassHistogram
-XX:+PrintHeapAtGC
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintTLAB
-XX:PrintFLSStatistics=1
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=10m


you may find you have a particular portion of your heap which is undersized

using G1GC with the adaptive sizing is a very handy way to deal with the memory 
usage in solr which can be somewhat difficult to tune optimally using the 
traditional static ratios (what works well for data ingestion is probably not 
optimal for searching)

once you have a baseline of logs using your existing JVM sizings and the 
additional logging options above you might try switching from CMS to G1GC with 
adaptive sizing,  and removing all the static tunings for tenuring and ratios 
and compare to a very minimal G1GC config

-XX:+UseG1GC
-XX:+UseAdaptiveSizePolicy -XX:MaxGCPauseMillis=1000

--

configuring the JMX interface is another way to get real time views into what 
is going on using jconsole or jvisualvm tools




From: Otis Gospodnetic otis.gospodne...@gmail.com
Sent: Thursday, May 29, 2014 07:20
To: solr-user@lucene.apache.org
Subject: Re: Solr High GC issue

Hi Bihan,

That's a lot of parameters and without trying one can't really give you
very specific and good advice.  If I had to suggest something quickly I'd
say:

* go back to the basics - remove most of those params and stick with the
basic ones.  Look at GC and tune slowly by changing/adding params one at a
time.
* consider using G1 GC with the most recent Java7.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, May 29, 2014 at 1:36 AM, bihan.chandu bihan.cha...@gmail.com wrote:

 Hi All

 I am currently using Solr 3.6.1 and my system handles a lot of requests. Now we
 are facing a high GC issue in the system. Please find the memory parameters of my
 Solr system below. Can someone help me identify whether there is any relationship
 between my memory parameters and the GC issue?

 MEM_ARGS=-Xms7936M -Xmx7936M -XX:NewSize=512M -XX:MaxNewSize=512M
 -Xss1024k
 -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
 -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:+CMSParallelRemarkEnabled -XX:+AggressiveOpts
 -XX:LargePageSizeInBytes=2m -XX:+UseLargePages -XX:MaxTenuringThreshold=15
 -XX:-UseAdaptiveSizePolicy -XX:PermSize=256M -XX:MaxPermSize=256M
 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+PrintGCDetails
 -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGC
 -Xloggc:${GCLOG} -XX:-OmitStackTraceInFastThrow -XX:+DisableExplicitGC
 -XX:-BindGCTaskThreadsToCPUs -verbose:gc -XX:StackShadowPages=20

 Thanks
 Bihan





openSearcher, default commit settings

2014-05-29 Thread Boon Low
Hi,

1. openSearcher (autoCommit)
According to the Apache Solr reference, autoCommit/openSearcher is set to 
false by default.

https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig

But on Solr v4.8.1, if openSearcher is omitted from the autoCommit config, 
new searchers are opened and warmed after auto-commits. Is this behaviour 
intended, or is the wiki wrong?

2. openSearcher and other default commit settings
From previous posts, I know it's not possible to disable commits completely in 
Solr config (without coding). But is there a way to configure the default 
settings of hard/explicit commits for the update handler? If not, it makes sense 
to have a configuration mechanism. Currently, a simple commit call seems to be 
hard-wired with the following options:

.. 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

There's no server-side option, e.g. to set openSearcher=false as default or 
invariant (cf. searchHandler) to prevent new searchers from opening.

I have found that at times it is necessary to have better server- or 
infrastructure-side controls for updates/commits, especially in agile teams. 
Client/UI developers do not necessarily have complete Solr knowledge. 
Unintended commits from misbehaving client-side updates may be the norm (e.g. 10 
times per minute!).

Regards,

Boon


-
Boon Low
Search Engineer,
DCT Family History





RE: autowarming queries

2014-05-29 Thread Joshi, Shital
Thanks for looking into this. 

These are our static queries. We only see one of them getting executed. If it 
fails to execute the others, shouldn't it show an error in the log? 

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">field1:abc</str>
      <str name="fq">-field2:xyz</str>
      <str name="facet">true</str>
      <str name="facet.mincount">1</str>
      <str name="sort">busdate_i desc</str>
      <str name="facet.field">field1</str>
      <str name="facet.field">field2</str>
      <str name="facet.field">field3</str>
      <str name="facet.field">field4</str>
      <str name="facet.field">field5</str>
      <str name="facet.field">field6</str>
      <str name="facet.field">field7</str>
      <str name="facet.field">field8</str>
      <str name="facet.field">field9</str>
      <str name="facet.field">field10</str>
      <str name="facet.field">field11</str>
      <str name="facet.field">field12</str>
      <str name="facet.field">field13</str>
      <str name="facet.field">field14</str>
      <str name="facet.field">field15</str>
      <str name="facet.missing">true</str>
      <str name="facet.sort">index</str>
      <str name="facet.limit">50</str>
      <str name="facet.offset">0</str>
      <str name="stats">true</str>
      <str name="stats.field">field16</str>
      <str name="stats.field">field17</str>
      <str name="stats.field">field18</str>
    </lst>
    <lst>
      <str name="q">*:*</str>
      <str name="fq">field1:abc</str>
      <str name="fq">-field2:xyz</str>
      <str name="facet">true</str>
      <str name="facet.mincount">1</str>
      <str name="sort">busdate_i desc</str>
      <str name="facet.field">field1</str>
      <str name="facet.field">field2</str>
      <str name="facet.field">field3</str>
      <str name="facet.field">field4</str>
      <str name="facet.field">field5</str>
      <str name="facet.field">field6</str>
      <str name="facet.field">field7</str>
      <str name="facet.field">field8</str>
      <str name="facet.field">field9</str>
      <str name="facet.field">field10</str>
      <str name="facet.field">field11</str>
      <str name="facet.field">field12</str>
      <str name="facet.field">field13</str>
      <str name="facet.field">field14</str>
      <str name="facet.field">field15</str>
      <str name="facet.missing">true</str>
      <str name="facet.sort">count</str>
      <str name="facet.limit">50</str>
      <str name="facet.offset">0</str>
      <str name="stats">true</str>
      <str name="stats.field">field16</str>
      <str name="stats.field">field17</str>
      <str name="stats.field">field18</str>
    </lst>
    <lst>
      <str name="indent">on</str>
      <str name="echoHandler">true</str>
      <str name="shards.info">false</str>
      <str name="shards.tolerant">false</str>

      <str name="df">text</str>
      <str name="defType">lucene</str>
      <str name="q">*:*</str>
      <str name="fq">field1:abc</str>
      <str name="fq">-field2:xyz</str>
      <str name="q.op">AND</str>

      <str name="facet">true</str>
      <str name="facet.mincount">1</str>
      <str name="rows">75</str>
      <str name="start">00</str>
      <str name="facet.field">field1</str>
      <str name="field1.facet.missing">true</str>
      <str name="field1.facet.sort">index</str>
      <str name="field1.facet.limit">25</str>
      <str name="field1.facet.offset">0</str>
      <str name="facet.field">field2</str>
      <str name="field2.facet.missing">true</str>
      <str name="field2.facet.sort">count</str>
      <str name="field2.facet.limit">25</str>
      <str name="field2.facet.offset">0</str>
      <str name="facet.field">field3</str>
      <str

Safeguards for stray commands from deleting solr data

2014-05-29 Thread Joshi, Shital
Hi,

What are ways to prevent someone from executing random delete commands against Solr? 
Like:

curl http://solr.com:8983/solr/core/update?commit=true -H "Content-Type: 
text/xml" --data-binary '<delete><query>*:*</query></delete>'

I understand we can do IP based access (change /etc/jetty.xml). Is there 
anything Solr provides out of the box?

Thanks!





Re: Solr High GC issue

2014-05-29 Thread bihan.chandu
Hi All 

Thanks for the Suggestion. I will implement this changes and let us know the
update 

Regards
Bihan





Re: wildcard matches in EnumField - what do I need to change in code to enable wildcard matches?

2014-05-29 Thread Jack Krupansky
And I'm not even sure what the actual use case is here. I mean, the values 
of an enum field must be defined in advance, so if you think a value starts 
with "H", just eyeball that static list and see that the only predefined 
value starting with "H" is "High", so you can simply replace your "*" with 
"igh" - problem solved! Right? Or is there something or a lot more to your 
use case that you haven't disclosed?


That said, there might be some value to having Solr do the wildcard lookup 
in the predefined list of values and then search for that value. Although 
the wildcard or regex could match more than one predefined value, which 
might be nice to select a set of enum values on a query, an OR of enum 
values. But... we need to consider the real use case before knowing if this 
makes any sense. I can imagine interesting use cases, but my personal 
imagination is not at issue for this particular thread.


-- Jack Krupansky

-Original Message- 
From: Shawn Heisey

Sent: Thursday, May 29, 2014 9:46 AM
To: solr-user@lucene.apache.org
Subject: Re: wildcard matches in EnumField - what do I need to change in 
code to enable wildcard matches?


On 5/29/2014 12:50 AM, Elran Dvir wrote:
In my index, I have an EnumField called severity. This is its 
configuration in enumsConfig.xml:


<enum name="severity">
  <value>Not Available</value>
  <value>Low</value>
  <value>Medium</value>
  <value>High</value>
  <value>Critical</value>
</enum>

My index contains documents with these values.
When I search for severity:High, I get results. But when I search for 
severity:H* , I get no results.
What do  I need to change in Solr code to enable wildcard matches in 
EnumField  (or any other field)?


I would suspect that enum fields are not actually stored as text.  They
are likely stored in the index as an integer, with the Solr schema being
the piece that knows what the strings are for each of the numbers.  I
don't think a wildcard match is possible.

Looking at the code for the EnumFieldValue class (added by SOLR-5084), I
do not see any way to match the string value based on a wildcard or
substring.

If you want to use wildcard matches, you'll need to switch the field to
StrField or TextField, and make sure that all of your code is strict
about the values that can end up in the field.

Thanks,
Shawn 



Re: Solr High GC issue

2014-05-29 Thread Walter Underwood
Agreed, that is a LOT of options. 

First, check the defaults and remove any flags that are setting something to 
the default. You can see all the flags and the default values with this command:

java -XX:+PrintFlagsFinal -version

For example, the default for ParallelGCThreads is 8, so you do not need to set 
that.

We set a fairly large new generation, about 1/4 of heap. 512 Meg is way too 
small. Solr will allocate a lot of objects that are only used to handle one 
HTTP request. You want all of those to fit in the new space, even when there 
are simultaneous requests. If new is not big enough, they will be allocated in 
tenured space and will cause more frequent major GCs. For an 8G heap, we use a 
2G new size.
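
A minimal starting point along those lines (sizes follow the ratio above;
tune from your GC logs):

java -Xms8g -Xmx8g -Xmn2g -XX:+UseConcMarkSweepGC \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log \
     -jar start.jar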

wunder

On May 29, 2014, at 7:20 AM, Otis Gospodnetic otis.gospodne...@gmail.com 
wrote:

 Hi Bihan,
 
 That's a lot of parameters and without trying one can't really give you
 very specific and good advice.  If I had to suggest something quickly I'd
 say:
 
 * go back to the basics - remove most of those params and stick with the
 basic ones.  Look at GC and tune slowly by changing/adding params one at a
 time.
 * consider using G1 GC with the most recent Java7.
 
 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr & Elasticsearch Support * http://sematext.com/
 
 
 On Thu, May 29, 2014 at 1:36 AM, bihan.chandu bihan.cha...@gmail.com wrote:
 
 Hi All
 
 I am currently using Solr 3.6.1 and my system handles a lot of requests. Now we
 are facing a high GC issue in the system. Please find the memory parameters of my
 Solr system below. Can someone help me identify whether there is any relationship
 between my memory parameters and the GC issue?
 
 MEM_ARGS=-Xms7936M -Xmx7936M -XX:NewSize=512M -XX:MaxNewSize=512M
 -Xss1024k
 -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
 -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:+CMSParallelRemarkEnabled -XX:+AggressiveOpts
 -XX:LargePageSizeInBytes=2m -XX:+UseLargePages -XX:MaxTenuringThreshold=15
 -XX:-UseAdaptiveSizePolicy -XX:PermSize=256M -XX:MaxPermSize=256M
 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+PrintGCDetails
 -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGC
 -Xloggc:${GCLOG} -XX:-OmitStackTraceInFastThrow -XX:+DisableExplicitGC
 -XX:-BindGCTaskThreadsToCPUs -verbose:gc -XX:StackShadowPages=20
 
 Thanks
 Bihan
 





Re: Error enquiry- exceeded limit of maxWarmingSearchers=2

2014-05-29 Thread Shawn Heisey
On 5/29/2014 7:52 AM, M, Arjun (NSN - IN/Bangalore) wrote:
 Thanks Shawn... Just one more question..

 Can both autoCommit and autoSoftCommit be enabled? If both are enabled, which 
 one takes precedence?

Yes, and it's a very common configuration.  If you do enable both, you
want openSearcher to be false on autoCommit, so that your hard commits
are not making documents visible.  That is a job for autoSoftCommit.  If
you use openSearcher=false on autoCommit, then the question of which one
takes precedence actually has no meaning, because the two kinds of
commits will be doing different things.

Read this until you completely understand it:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks,
Shawn



Re: Solr GeoHash Field (Solr 4.5)

2014-05-29 Thread david.w.smi...@gmail.com
On IRC you said you found out the answers before I came along.  For
everyone else’s benefit:

* Solr’s “documentation” is essentially the “Solr Reference Guide”. Only
look at the wiki as a secondary source.

* See “location_rpt” in the example schema.xml which supports multi-valued
spatial data.  It’s the evolution of SOLR-2155.

* For clustering, see: http://wiki.apache.org/solr/SpatialClustering
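
For reference, the relevant field type in the 4.x example schema looks
roughly like this (attribute values can differ slightly between versions):

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
<field name="geo" type="location_rpt" indexed="true" stored="true" multiValued="true"/>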

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, May 29, 2014 at 4:18 AM, Chris Atkinson chrisa...@gmail.com wrote:

 Hi,

 I've been reading up a lot on what David has written about GeoHash fields
 and would like to use them.

 I'm trying to create a nice way to display cluster counts of geo points on
 a google map. It's naturally not going to be possible to send 40k marker
 information over the wire to cluster... so figured GeoHash would be
 perfect.

 I'm running Solr 4.5. I've seen this..
 https://github.com/dsmiley/SOLR-2155
 Would this be what I use? It looks like it's really old, and I noticed that
 there is now a solr.GeoHash core field...

 However, if I check the documentation at this page
 https://wiki.apache.org/solr/SpatialSearchDev

 Solr includes the field type solr.GeoHashField but it unfortunately
  doesn't realize any of the intrinsic properties of the geohash to its
  advantage. *You shouldn't use it.* Instead, check out
  http://wiki.apache.org/solr/SpatialSearch#SOLR-2155. The main feature is
  multi-valued field support.

  Does this mean that there isn't any way to use GeoHash with my version of
 Solr?

 Should I just implement a multi-valued field and add all of the multi-valued
 fields myself?

 (Also, can you confirm that for doing clustering, I'm on the right track
 for using GeoHash. I don't need anything perfect. I just want to be able to
 break up the markers into groups).

 Thanks



Re: openSearcher, default commit settings

2014-05-29 Thread Shawn Heisey
On 5/29/2014 9:21 AM, Boon Low wrote:
 1. openSearcher (autoCommit)
 According to the Apache Solr reference, autoCommit/openSearcher is set to 
 false by default.

 https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig

 But on Solr v4.8.1, if openSearcher is omitted from the autoCommit config, 
 new searchers are opened and warmed post auto-commits. Is this behaviour 
 intended or the wiki wrong?

I am reasonably certain that the default for openSearcher, if it is not
specified, will always be true. My understanding and your actual
experience say that the documentation is wrong.  Additional note: the
docs for autoSoftCommit are basically a footnote on autoCommit, which I
think is a mistake -- it should have its own section, and the docs
should mention that openSearcher does not apply.

I think the code confirms this.  From SolrConfig.java:

  protected UpdateHandlerInfo loadUpdatehandlerInfo() {
    return new UpdateHandlerInfo(get("updateHandler/@class", null),
            getInt("updateHandler/autoCommit/maxDocs", -1),
            getInt("updateHandler/autoCommit/maxTime", -1),
            getBool("updateHandler/autoCommit/openSearcher", true),
            getInt("updateHandler/commitIntervalLowerBound", -1),
            getInt("updateHandler/autoSoftCommit/maxDocs", -1),
            getInt("updateHandler/autoSoftCommit/maxTime", -1),
            getBool("updateHandler/commitWithin/softCommit", true));
  }

 2. openSearcher and other default commit settings
 From previous posts, I know it's not possible to disable commits completely 
 in Solr config (without coding). But is there a way to configure the default 
 settings of hard/explicit commits for the update handler? If not it makes 
 sense to have a configuration mechanism. Currently, a simple commit call 
 seems to be hard-wired with the following options:

 .. 
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

 There's no server-side option, e.g. to set openSearcher=false as default or 
 invariant (cf. searchHandler) to prevent new searchers from opening.

 I have found that at times it is necessary to have better server- or 
 infrastructure-side controls for updates/commits, especially in agile teams. 
 Client/UI developers do not necessarily have complete Solr knowledge. 
 Unintended commits from misbehaving client-side updates may be the norm (e.g. 10 
 times per minute!).

Since you want to handle commits automatically, you'll want to educate
your developers and tell them that they should never send commits -- let
Solr handle it.  If the code that talks to Solr is Java and uses SolrJ,
you might want to consider using forbidden-apis in your project so that
a build will fail if the commit method gets used.

https://code.google.com/p/forbidden-apis/
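
A sketch of a signatures file for that (the method signatures here are
assumptions; check them against your SolrJ version):

# forbid client-side commits; let autoCommit/autoSoftCommit handle visibility
org.apache.solr.client.solrj.SolrServer#commit() @ use server-side commit settings
org.apache.solr.client.solrj.SolrServer#optimize() @ do not optimize from client code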

Thanks,
Shawn



RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-05-29 Thread Ronald Matamoros
Hi all,

At the moment I am reviewing the code to determine whether this is a legitimate bug 
that should be filed as a JIRA ticket.
Any insight or recommendation is appreciated.

Including the replication steps as text:

-
Solr versions where the issue was replicated:
  * 4.5.1 (Linux)
  * 4.8.1 (Windows + Cygwin)

Replicating

  1. Created two-shard environment - no replication 
 
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

 a. Download Solr distribution from 
http://lucene.apache.org/solr/downloads.html 
 b. Unzipped solr-4.8.1.zip to a temporary location: SOLR_DIST_HOME 
 c. Ran once so the SolrCloud jars get unpacked: java -jar start.jar
 d. Create nodes
  i. cd SOLR_DIST_HOME
  ii. Via Windows Explorer copied example to node1
  iii. Via Windows Explorer copied example to node2

 e. Start Nodes 
  i. Start node 1

   cd node1
   java -DzkRun -DnumShards=2 
-Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar 
start.jar

  ii. Start node 2

   cd node2
   java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

 f. Fed sample documents
  i. Out of the box

           curl http://localhost:8983/solr/update?commit=true -H 
"Content-Type: text/xml" -d @mem.xml
           curl http://localhost:7574/solr/update?commit=true -H 
"Content-Type: text/xml" -d @monitor2.xml

  ii. Create a copy of mem.xml to mem2.xml; modified identifiers, 
names, prices and fed

           curl http://localhost:8983/solr/update?commit=true -H 
"Content-Type: text/xml" -d @mem2.xml

           <add>
             <doc>
               <field name="id">COMPANY1</field>
               <field name="name">COMPANY1 Device</field>
               <field name="manu">COMPANY1 Device Mfg</field>
               .
               <field name="price">190</field>
               .
             </doc>
             <doc>
               <field name="id">COMPANY2</field>
               <field name="name">COMPANY2 flatscreen</field>
               <field name="manu">COMPANY2 Device Mfg.</field>
               .
               <field name="price">200.00</field>
               .
             </doc>
             <doc>
               <field name="id">COMPANY3</field>
               <field name="name">COMPANY3 Laptop</field>
               <field name="manu">COMPANY3 Device Mfg.</field>
               .
               <field name="price">800.00</field>
               .
             </doc>

           </add>

  2. Query **without** f.price.facet.mincount=1, counts and buckets are OK

 
http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false
 
 Only six documents have prices
 
  <lst name="facet_ranges">
    <lst name="price">
      <lst name="counts">
        <int name="0.0">0</int>
        <int name="50.0">1</int>
        <int name="100.0">0</int>
        <int name="150.0">3</int>
        <int name="200.0">0</int>
        <int name="250.0">1</int>
        <int name="300.0">0</int>
        <int name="350.0">0</int>
        <int name="400.0">0</int>
        <int name="450.0">0</int>
        <int name="500.0">0</int>
        <int name="550.0">0</int>
        <int name="600.0">0</int>
        <int name="650.0">0</int>
        <int name="700.0">0</int>
        <int name="750.0">1</int>
        <int name="800.0">0</int>
        <int name="850.0">0</int>
        <int name="900.0">0</int>
        <int name="950.0">0</int>
      </lst>
      <float name="gap">50.0</float>
      <float name="start">0.0</float>
      <float name="end">1000.0</float>
      <int name="before">0</int>
      <int name="after">0</int>
      <int name="between">2</int>
    </lst>
  </lst>

  Note: the value in <int name="between"> changes with every other 
refresh of the query. 

  3. Use of f.price.facet.mincount=1; missing bucket <int name="250.0">1</int>

 
http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false&f.price.facet.mincount=1

  <lst name="facet_ranges">
    <lst name="price">
      <lst name="counts">
        <int name="50.0">1</int>
        <int name="150.0">3</int>
        <int name="750.0">1</int>
      </lst>
      <float name="gap">50.0</float>
      <float name="start">0.0</float>
      <float name="end">1000.0</float>
      <int 

RE: Solr High GC issue

2014-05-29 Thread Toke Eskildsen
bihan.chandu [bihan.cha...@gmail.com] wrote:
 I am currently using Solr 3.6.1 and my system handles a lot of requests. Now we
 are facing a high GC issue in the system.

Maybe it would help to get an idea of what is causing all the allocations?

- How many documents in your index?
- How many queries/sec?
- How long does a typical query take?
- How many cores does your machine have?
- How many documents are returned/query?
- How much faceting do you perform? How many unique terms/facet field?

- Toke Eskildsen


Re: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-05-29 Thread Shawn Heisey
On 5/29/2014 12:06 PM, Ronald Matamoros wrote:
 Hi all,

  At the moment I am reviewing the code to determine whether this is a legitimate 
  bug that should be filed as a JIRA ticket.
 Any insight or recommendation is appreciated.

snip

    Note: the value in <int name="between"> changes with every other 
  refresh of the query. 

Whenever distributed search results change from one query to the next,
it's almost always caused by having documents with the same uniqueKey in
more than one shard.  Solr is able to remove these duplicates from the
results, but there are other aspects of distributed searching that
cannot be dealt with when there are duplicate documents.  This leads to
problems like numFound changing from one request to the next.

To avoid these problems with SolrCloud, you'll likely want to create a
new collection and set its router to compositeId.  This ensures that
indexed documents are distributed to shards according to the hash of
their uniqueKey, not imported directly into the node where you made the
update request.
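
For reference, a Collections API CREATE call along those lines (collection
name, shard count and config name are placeholders):

http://localhost:8983/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=1&router.name=compositeId&collection.configName=myconf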

It's possible that my guess here is completely wrong, but this is
usually the problem.

Thanks,
Shawn



PDFStreamEngine returning a NULL pointer error

2014-05-29 Thread amoreira
I am wondering the best way to debug an error I am getting in Solr.  The
error is below, but as far as I can tell, pdfbox cannot read a font and
returns a null pointer which is passed to Tika and then to Solr.  Even
though it is only a warning, this appears to terminate the indexing and I
get an error that the indexing could not complete.

My question is: how do I determine the name and directory of this file,
and is there a way to configure either Solr or Tika to not terminate the
indexing on a null pointer?  Or is this a completely different problem?

Thanks for any help or advice!


5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.io.IOException: Error: Could not find font(COSName{Rx142}) in
map={Rx133=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@f1f3dd,
Rx136=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@c15066,
Rx138=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@1858b31,
Rx110=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@233dfd,
Rx02=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@186de83}
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.io.IOException: Error: Could not find font(COSName{Rx302}) in
map={Rx110=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@233dfd,
Rx02=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@186de83,
Rx266=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@845fc8}
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.io.IOException: Error: Could not find font(COSName{Rx302}) in
map={Rx110=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@233dfd,
Rx02=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@186de83,
Rx266=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@845fc8}
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:56:09 AM
WARN
PDFStreamEngine
java.lang.NullPointerException
5/23/2014 9:57:23 AM
WARN
COSDocument
Warning: You did not close a PDF Document






Re: aliasing for Stats component

2014-05-29 Thread Mohit Jain
Thanks Shalin. I will have a look at this. Currently we are using 4.3.1 so
it should not be much trouble to patch it.

Regards
Mohit


On Wed, May 28, 2014 at 6:57 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Support for keys, tagging and excluding filters in StatsComponent was added
 with SOLR-3177 in v4.8.0

 You can specify e.g. stats.field={!key=xyz}id and the output will use xyz
 instead of id.


 On Wed, May 28, 2014 at 1:55 PM, Mohit Jain mo...@bloomreach.com wrote:

  Hi,
 
  In a Solr request one can specify aliasing for returned fields using
  key:fl_name in the fl param. I was looking at the stats component and found
  that similar support is not available. I do not want to expose internal
  field names to the external world. The plan is to do it in fl fashion
  instead of post-processing the response at an external layer.
 
  I was wondering if exclusion of this feature is by choice or it's just
 that
  it was not added till now.
 
  Thanks
  Mohit
 



 --
 Regards,
 Shalin Shekhar Mangar.



Solr: IndexNotFoundException: no segments* file HdfsDirectoryFactory

2014-05-29 Thread praneethvarma
I'm trying to write some integration tests against SolrCloud, for which I'm
setting up a Solr instance backed by a ZooKeeper and pointing it to a
namenode (all in memory, using Hadoop testing utilities and JettySolrRunner).
I'm getting the following error when I'm trying to create a collection (btw,
the exact same configuration works just fine in dev with SolrCloud).

org.apache.lucene.index.IndexNotFoundException: no segments* file found
in NRTCachingDirectory(HdfsDirectory@2ea2a4e4
lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@4cf0e472;
maxCacheMB=192.0 maxMergeSizeMB=16.0): files: [HdfsDirectory@6bf4fc1c
lockFactory=org.apache.solr.store.hdfs.hdfslockfact...@51115f81-write.lock]

I'm getting this error when I'm trying to create a collection (precisely,
when Solr is actually trying to open a searcher on the new index). There
are no segment files in the index directory on HDFS, so this error is
expected on opening a searcher on the index, but I thought the segments
file was created the first time (when a collection is being created).

After some debugging I noticed that the IndexWriter is being initialized
explicitly with APPEND mode, overriding the default CREATE_OR_APPEND mode,
which means that the segment files won't be created if at least one doesn't
already exist. I'm not sure why this is the case, and I may be going down the
wrong path with the error. Again, this only happens in my in-memory SolrCloud
setup.

Can someone help me with this? Thanks






Re: overseer queue clogged

2014-05-29 Thread ryan.cooke
We were running Solr 4.2, and are in the process of upgrading. I believe that
the particular scenario that was clogging our queue was resolved in 4.7.1 -
https://issues.apache.org/jira/browse/SOLR-5811





RE: Error enquiry- exceeded limit of maxWarmingSearchers=2

2014-05-29 Thread M, Arjun (NSN - IN/Bangalore)
Hi Shawn,

        Thanks a lot for your nice explanation..  Now I understand the 
difference between autoCommit and autoSoftCommit. My config now looks like 
below.

<autoCommit>
   <maxDocs>1</maxDocs>
   <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
   <maxTime>15000</maxTime>
</autoSoftCommit>


With this, I am now getting another error:

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: version 
conflict for 140142167803912812800030383128128 expected=1469497192978841608 
actual=1469497212082847746

What could be the reason?

Thanks & Regards,
Arjun M


-Original Message-
From: ext Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Thursday, May 29, 2014 10:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Error enquiry- exceeded limit of maxWarmingSearchers=2

On 5/29/2014 7:52 AM, M, Arjun (NSN - IN/Bangalore) wrote:
 Thanks Shawn... Just one more question..

 Can both autoCommit and autoSoftCommit be enabled? If both are enabled, which 
 one takes precedence?

Yes, and it's a very common configuration.  If you do enable both, you
want openSearcher to be false on autoCommit, so that your hard commits
are not making documents visible.  That is a job for autoSoftCommit.  If
you use openSearcher=false on autoCommit, then the question of which one
takes precendence actually has no meaning, because the two kinds of
commits will be doing different things.

Read this until you completely understand it:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks,
Shawn



DataImportHandler while Replication

2014-05-29 Thread Robin Woods
Hi,

What happens to a DataImportHandler that is set up on the master while the
slave is in the process of replicating the index?

Is there any way to configure DataImportHandler to do nothing while
replication is in progress, and/or to disable replication before
DataImportHandler starts its process?

Please share your thoughts..

Best,
Robin


Re: Wordbreak spellchecker excessive breaking.

2014-05-29 Thread S.L
James,

Thanks for clearly stating this; I was not able to find it documented
anywhere. Yes, I am using it with another spellchecker (Direct) with
collation on. I will try maxChanges and let you know.

On a side note, whenever I change a spellchecker parameter, I need to
rebuild the index and delete the Solr data directory before that, as my
Tomcat instance would not even start. Can you let me know why?

Thanks.




On Tue, May 27, 2014 at 12:21 PM, Dyer, James james.d...@ingramcontent.com
wrote:

 You can do this if you set it up like in the main Solr example:

 <lst name="spellchecker">
   <str name="name">wordbreak</str>
   <str name="classname">solr.WordBreakSolrSpellChecker</str>
   <str name="field">name</str>
   <str name="combineWords">true</str>
   <str name="breakWords">true</str>
   <int name="maxChanges">10</int>
 </lst>

 The combineWords and breakWords flags let you tell it which kind of
 wordbreak correction you want.  maxChanges controls the maximum number of
 words it can break 1 word into, or the maximum number of words it can
 combine.  It is reasonable to set this to 1 or 2.

 The best way to use this is in conjunction with a regular spellchecker
 like DirectSolrSpellChecker.  When used together with the collation
 functionality, it should take a query like "mob ile" and, depending on what
 actually returns results from your data, suggest either "mobile" or perhaps
 "mob lie" or both.  The one thing it cannot do is fix a transposition or
 misspelling and combine or break words in one shot.  That is, it cannot
 detect that "mob lie" should become "mobile".
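
 (For reference, a request combining both checkers with collation might look
 like this, assuming the handler and dictionary names from the stock example
 configs:

 http://localhost:8983/solr/collection1/spell?q=name:(mob+ile)&spellcheck=true&spellcheck.dictionary=default&spellcheck.dictionary=wordbreak&spellcheck.collate=true&spellcheck.count=5 )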

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: S.L [mailto:simpleliving...@gmail.com]
 Sent: Saturday, May 24, 2014 4:21 PM
 To: solr-user@lucene.apache.org
 Subject: Wordbreak spellchecker excessive breaking.

 I am using the Solr wordbreak spellchecker and the issue is that when I search
 for a term like "mob ile", expecting that the wordbreak spellchecker would
 actually return a suggestion for "mobile", it breaks the search term into
 letters like "m o b".  I have two issues with this behavior.

  1. How can I make Solr combine "mob ile" into "mobile"?
  2. Notwithstanding the fact that my search term "mob ile" is being broken
 incorrectly into individual letters, I realize that the wordbreak is
 needed in certain cases. How do I control the wordbreak so that it does not
 break it into letters like "m o b", which seems like excessive breaking to
 me?

 Thanks.