Re: Relevancy : Keyword stuffing
Thank you, Markus and Chris, for the pointers. For SweetSpotSimilarity I am thinking that perhaps a set of closed ranges exposed via the similarity config is easier to maintain as data changes than making adjustments to fit a function. Another piece of info that would've been handy is the average position plus the position info for the first few occurrences of each term. This would perhaps allow higher boosting for term occurrences earlier in the doc. In my case the extra keywords are towards the end of the doc, but that info does not seem to be propagated into the scorer. Thanks again, Mihran On Mon, Mar 16, 2015 at 1:52 PM, Chris Hostetter hossman_luc...@fucit.org wrote: You should start by checking out the SweetSpotSimilarity .. it was heavily designed around the idea of dealing with things like excessively verbose titles, and keyword stuffing in summary text ... so you can configure your expectation for what a normal length doc is, and they will be penalized for being longer than that. Similarly you can say what a 'reasonable' tf is, and docs that exceed that wouldn't get added boost (which in conjunction with the lengthNorm penalty penalizes docs that stuff keywords) https://lucene.apache.org/solr/5_0_0/solr-core/org/apache/solr/search/similarities/SweetSpotSimilarityFactory.html https://lucene.apache.org/core/5_0_0/misc/org/apache/lucene/misc/doc-files/ss.computeLengthNorm.svg https://lucene.apache.org/core/5_0_0/misc/org/apache/lucene/misc/doc-files/ss.hyperbolicTf.svg -Hoss http://www.lucidworks.com/
Re: Nginx proxy for Solritas
Have a look at the requests being made to Solr while using /browse (without nginx) and that will show you what resources need to be accessible. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com http://www.lucidworks.com/ On Mar 16, 2015, at 4:42 PM, LongY zhangyulin8...@hotmail.com wrote: Thank you for the reply. I also thought the relevant resources (CSS, images, JavaScript) need to be accessible through Nginx. I copied the velocity folder to the solr-webapp/webapp folder. It didn't work. So how can I make the /browse resources accessible via the Nginx rule? -- View this message in context: http://lucene.472066.n3.nabble.com/Nginx-proxy-for-Solritas-tp4193347p4193352.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Nginx proxy for Solritas
Thanks to Erik and Shawn, I figured out the solution.

* place the main.css from the velocity folder into /usr/share/nginx/html/solr/collection1/admin/file/
* don't forget to change the permission of main.css with sudo chmod 755 main.css
* add main.css to the Nginx configuration file:

server {
    listen 80 default_server;
    listen [::]:80 default_server ipv6only=on;
    index main.css;
    server_name localhost;
    location ~* /solr/\w+/browse {
        proxy_pass http://localhost:8983;
        allow 127.0.0.1;
        deny all;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
    }
}

That will work. Also /var/log/nginx/error.log is good for debugging. -- View this message in context: http://lucene.472066.n3.nabble.com/Nginx-proxy-for-Solritas-tp4193347p4193415.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing db records via SolrJ
Hi, I had checked this post. I don't know whether this is possible, but my question is whether I can use the DIH configuration for indexing via SolrJ. Best Regards, Sreedevi S On Mon, Mar 16, 2015 at 4:17 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, Did you see the great post http://lucidworks.com/blog/indexing-with-solrj/ ? On Mon, Mar 16, 2015 at 1:30 PM, sreedevi s sreedevi.payik...@gmail.com wrote: Hi, I am a beginner in Solr. I have a scenario where I need to index data from my MySQL db and query it. I have figured out how to provide my db data import configs using DIH. I also know how to query my index via SolrJ. How can I do indexing via the SolrJ client for my db as well, other than reading the db records into documents one by one? In other words, is there any way I can make use of my configuration files and achieve the same? We need to use Java APIs, so all indexing and querying can be done only via SolrJ. Best Regards, Sreedevi S -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: indexing db records via SolrJ
Hello, Did you see the great post http://lucidworks.com/blog/indexing-with-solrj/ ? On Mon, Mar 16, 2015 at 1:30 PM, sreedevi s sreedevi.payik...@gmail.com wrote: Hi, I am a beginner in Solr. I have a scenario where I need to index data from my MySQL db and query it. I have figured out how to provide my db data import configs using DIH. I also know how to query my index via SolrJ. How can I do indexing via the SolrJ client for my db as well, other than reading the db records into documents one by one? In other words, is there any way I can make use of my configuration files and achieve the same? We need to use Java APIs, so all indexing and querying can be done only via SolrJ. Best Regards, Sreedevi S -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Solr returns incorrect results after sorting
I noticed you have an '&' immediately preceding the geodist() asc at the very end of the query/URL; that's supposed to be a comma, since group.sort is a comma-delimited list of sorts. ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley On Mon, Mar 16, 2015 at 7:51 AM, kumarraj rajitpro2...@gmail.com wrote: Hi, I am using group.sort to internally sort the values first based on store (using a function), then stock, and finally distance, and to sort the output results based on price, but Solr does not return the correct results after sorting. Below is the sample query: q=*:*&start=0&rows=200&sort=pricecommon_double desc&d=321&spatial=true&sfield=store_location&fl=geodist(),*&pt=37.1037311,-76.5104751&group.ngroups=true&group.limit=1&group.facet=true&group.field=code_string&group=true&group.sort=max(if(exists(query({!v='storeName_string:212'})),2,0),if(exists(query({!v='storeName_string:203'})),1,0)) desc,inStock_boolean desc&geodist() asc I am expecting all the docs to be sorted by price from high to low after grouping, but I see the records not matching that order. Do you see any issues with the query, or are functions in group.sort not supported in Solr? Regards, Raj -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing db records via SolrJ
On 3/16/2015 7:15 AM, sreedevi s wrote: I had checked this post. I don't know whether this is possible, but my question is whether I can use the DIH configuration for indexing via SolrJ You can use SolrJ for accessing DIH. I have code that does this, but only for full index rebuilds. It won't be particularly obvious how to do it. Writing code that can interpret DIH status and know when it finishes, succeeds, or fails is very tricky because DIH only uses human-readable status info, not machine-readable, and the info is not very consistent. I can't just share my code, because it's extremely convoluted ... but the general gist is to create a SolrQuery object, use setRequestHandler to set the handler to /dataimport or whatever your DIH handler is, and set the other parameters on the request like command to full-import and so on. Thanks, Shawn
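A minimal SolrJ sketch of the approach Shawn describes, not taken from the thread: the core URL and the /dataimport handler name are assumptions, and interpreting the status response is left to the caller, as he notes.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.util.NamedList;

public class DihTrigger {
    public static void main(String[] args) throws Exception {
        // URL of the core whose solrconfig.xml registers the DIH handler (placeholder)
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Build a request against the DIH handler instead of /select
        SolrQuery req = new SolrQuery();
        req.setRequestHandler("/dataimport");
        req.set("command", "full-import");
        req.set("clean", "true");
        req.set("commit", "true");
        server.query(req);

        // Poll the same handler for status; the response is human-readable, so
        // deciding when the import is busy, finished, or failed is up to the caller
        SolrQuery status = new SolrQuery();
        status.setRequestHandler("/dataimport");
        status.set("command", "status");
        NamedList<Object> answer = server.query(status).getResponse();
        System.out.println(answer.get("status"));
        server.shutdown();
    }
}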
Re: [Poll]: User need for Solr security
Hi John, The ManifoldCF in Action book is publicly available to anyone: https://manifoldcfinaction.googlecode.com/svn/trunk/pdfs/ For Solr integration please see: https://svn.apache.org/repos/asf/manifoldcf/integration/solr-5.x/trunk/README.txt Ahmet On Friday, March 13, 2015 2:50 AM, johnmu...@aol.com johnmu...@aol.com wrote: I would love to see record-level (or even field-level) restricted access in Solr / Lucene. This should be group-level, LDAP-like, or some rule-based scheme (which can be dynamic). If the solution means having a second core, so be it. The following is the closest I found: https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security but I cannot use Manifold CF (Connector Framework). Does anyone know how Manifold does it? - MJ -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Thursday, March 12, 2015 6:51 PM To: solr-user@lucene.apache.org Subject: RE: [Poll]: User need for Solr security Jan - we don't really need any security for our products, nor for most clients. However, one client does deal with very sensitive data, so we proposed to encrypt the transfer of data and the data on disk through a Lucene Directory. It won't fill all gaps, but it would adhere to such a client's guidelines. I think many approaches to security in Solr/Lucene would find advocates, be it index encryption or authentication/authorization or transport security, which is now possible. I understand the reluctance of the PMC, and I agree with it, but some users would definitely benefit and it would certainly make Solr/Lucene the search platform to use for some enterprises. Markus -Original message- From:Henrique O. Santos hensan...@gmail.com Sent: Thursday 12th March 2015 23:43 To: solr-user@lucene.apache.org Subject: Re: [Poll]: User need for Solr security Hi, I’m currently working with indexes that need document-level security. Based on the user logged in, query results would omit documents that this user doesn’t have access to, with LDAP integration and such. I think that would be nice to have in a future Solr release. Henrique. On Mar 12, 2015, at 7:32 AM, Jan Høydahl jan@cominvent.com wrote: Hi, Securing various Solr APIs has once again surfaced as a discussion in the developer list. See e.g. SOLR-7236. It would be useful to get some feedback from Solr users about needs in the field. Please reply to this email and let us know what security aspect(s) would be most important for your company to see supported in a future version of Solr. Examples: local user management, AD/LDAP integration, SSL, authenticated login to Admin UI, authorization for Admin APIs, e.g. admin user vs read-only user, etc. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
indexing db records via SolrJ
Hi, I am a beginner in Solr. I have a scenario where I need to index data from my MySQL db and query it. I have figured out how to provide my db data import configs using DIH. I also know how to query my index via SolrJ. How can I do indexing via the SolrJ client for my db as well, other than reading the db records into documents one by one? In other words, is there any way I can make use of my configuration files and achieve the same? We need to use Java APIs, so all indexing and querying can be done only via SolrJ. Best Regards, Sreedevi S
[ANNOUNCE] Luke 4.10.4 released
Hello, Luke 4.10.4 has been released. Download it here: https://github.com/DmitryKey/luke/releases/tag/luke-4.10.4 The release has been tested against a solr-4.10.4 based index. Changes: trivial pom upgrade to lucene 4.10.4; got rid of the index version warning on the index summary tab; Luke is now distributed as a tar.gz with the luke binary and a launcher script. A version of Luke built atop Apache Pivot is currently cooking in its own branch; you can already try it out for some basic index loading and search operations: https://github.com/DmitryKey/luke/tree/pivot-luke -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Solr returns incorrect results after sorting
Hi, I am using group.sort to internally sort the values first based on store (using a function), then stock, and finally distance, and to sort the output results based on price, but Solr does not return the correct results after sorting. Below is the sample query: q=*:*&start=0&rows=200&sort=pricecommon_double desc&d=321&spatial=true&sfield=store_location&fl=geodist(),*&pt=37.1037311,-76.5104751&group.ngroups=true&group.limit=1&group.facet=true&group.field=code_string&group=true&group.sort=max(if(exists(query({!v='storeName_string:212'})),2,0),if(exists(query({!v='storeName_string:203'})),1,0)) desc,inStock_boolean desc&geodist() asc I am expecting all the docs to be sorted by price from high to low after grouping, but I see the records not matching that order. Do you see any issues with the query, or are functions in group.sort not supported in Solr? Regards, Raj -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Deleted Docs Issue
bq: If this operation is continuously done I would end up with a large set of deleted docs which will affect the performance of the queries I hit on this solr. No, you won't. They'll be merged away as background segments are merged. Here's a great visualization of the process; the third one down is the default TieredMergePolicy. In general, even in the case of replacing all the docs, you'll have 10% of your corpus be deleted docs. The % of deleted docs in a segment weighs quite heavily when it comes to the decision of which segment to merge (note that merging purges the deleted docs). Also in general, the results of small tests like this simply do not generalize, i.e. the number of deleted docs in a 200 doc sample size can't be extrapolated to a reasonable-sized corpus. Finally, I don't know if this is something temporary, but the implication of "If total commit operations I hit are 20" is that you're committing after every batch of docs is sent to Solr. You should not do this; let your autocommit settings handle it. Here's Mike's blog: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Best, Erick On Mon, Mar 16, 2015 at 8:51 AM, Shawn Heisey apa...@elyograg.org wrote: On 3/16/2015 9:11 AM, vicky desai wrote: I am having an issue with my solr setup. In my solr config I have set the following property *<mergeFactor>10</mergeFactor>* The mergeFactor setting is deprecated ... but you are setting it to the default value of 10 anyway, so that's not really a big deal. It's possible that mergeFactor will no longer work in 5.0, but I'm not sure on that. You should instead use the settings specific to the merge policy, which normally is TieredMergePolicy. Note that when mergeFactor is 10, you *will* end up with more than 10 segments in your index. There are multiple merge tiers, and each one can have up to 10 segments before it is merged. Now consider the following situation. I have *200* documents in my index and I need to update all 200 docs. If the total commit operations I hit are *20*, i.e. I update in batches of 10 docs, merging is done after every 10th update, so the max segment count I can have is 10, which is fine. However, even when merging happens, deleted docs are not cleared and I end up with 100 deleted docs in the index. If this operation is done continuously I would end up with a large set of deleted docs, which will affect the performance of the queries I hit on this Solr. Because there are multiple merge tiers and you cannot easily pre-determine which segments will be chosen for a particular merge, the merge behavior may not be exactly what you expect. The only guaranteed way to get rid of your deleted docs is to do an optimize operation, which forces a merge of the entire index down to a single segment. This gets rid of all deleted docs in those segments. If you index more data while you are doing the optimize, then you may end up with additional deleted docs. Thanks, Shawn
Re: Solr tlog and soft commit
Can someone please reply to these questions? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-tlog-and-soft-commit-tp4193105p4193311.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Whole RAM consumed while Indexing.
First start by lengthening your soft and hard commit intervals substantially. Start with 60000 and work backwards, I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that's a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You are correct. Some **overlapping searchers warning messages** are coming in the logs. The **numDocs numbers** are changing while documents are being added during indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. I'd start with soft commits of 60000 and hard commits of 60000 (60 seconds), meaning that you're going to have to wait 1 minute for docs to show up unless you explicitly commit. You're throwing away all the caches configured in solrconfig.xml more than 3 times a second, executing autowarming, etc, etc, etc. Changing these to longer intervals might cure the problem, but if not then, as Hoss would say, details matter. I suspect you're also seeing overlapping searchers warning messages in your log, and it's _possible_ that what's happening is that you're just exceeding the max warming searchers and never opening a new searcher with the newly-indexed documents. But that's a total shot in the dark. How are you looking for docs (and not finding them)? Does the numDocs number in the solr admin screen change? Best, Erick On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Alexandre, *Hard Commit* is: <autoCommit> <maxTime>${solr.autoCommit.maxTime:3000}</maxTime> <openSearcher>false</openSearcher> </autoCommit> *Soft Commit* is: <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime> </autoSoftCommit> And I am committing 20000 documents each time. Is this a good config for committing? Or am I doing something wrong? On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: What's your commit strategy? Explicit commits? Soft commits/hard commits (in solrconfig.xml)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 23:19, Nitin Solanki nitinml...@gmail.com wrote: Hello, I have written a python script to do 20000 documents indexing each time on Solr. I have 28 GB RAM with 8 CPU. When I started indexing, at that time 15 GB RAM was free. While indexing, all RAM is consumed but **not** a single document is indexed. Why so? And it threw *HTTPError: HTTP Error 503: Service Unavailable* in the python script. I think it is due to heavy load on Zookeeper by which all nodes went down. I am not sure about that. Any help please.. Or is anything else happening? And how do I overcome this issue? Please assist me towards the right path. Thanks.. Warm Regards, Nitin Solanki
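A small SolrJ sketch of the indexing pattern Erick recommends, not from the thread: no explicit commit() calls from the client, just a commitWithin on each add (or rely entirely on autoCommit in solrconfig.xml). The URL, batch size, and field names are placeholders.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder URL

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 20000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);           // placeholder fields
            doc.addField("text", "body of doc " + i);
            batch.add(doc);
            if (batch.size() == 1000) {
                // commitWithin of 60000 ms: no explicit commit() from the client,
                // documents become visible within a minute at the latest
                server.add(batch, 60000);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch, 60000);
        }
        server.shutdown();
    }
}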
Relevancy : Keyword stuffing
Hi all, I have a use case where the data is generated by SEO-minded authors, and more often than not they perfectly guess the synonym expansions for the document titles, skewing results in their favor. At the moment I don't have an offline processing infrastructure to detect these (I can't punish these docs either... just have to level the playing field). I am experimenting with taking the max of the term scores, cutting off scores after a certain number of terms, etc., but would appreciate any hints if anyone has experience dealing with a similar use case in Solr. Much appreciated, Mihran
Re: indexing db records via SolrJ
We import anywhere from five to fifty million small documents a day from a postgres database. I wrestled with getting the DIH stuff to work for us for about a year and was much happier when I ditched that approach and switched to writing the few hundred lines of relatively simple code to handle directly the logic of what gets updated and how it gets queried from postgres ourselves. The DIH stuff is great for lots of cases, but if you are getting to the point of trying to hack its undocumented internals, I suspect you are better off spending a day or two of your time just writing all of the update logic yourself. We found a relatively simple combination of postgres triggers, export to csv based on those triggers, and then just calling update/csv to work best for us. -hal On 3/16/15 9:59 AM, Shawn Heisey wrote: On 3/16/2015 7:15 AM, sreedevi s wrote: I had checked this post. I don't know whether this is possible, but my question is whether I can use the DIH configuration for indexing via SolrJ You can use SolrJ for accessing DIH. I have code that does this, but only for full index rebuilds. It won't be particularly obvious how to do it. Writing code that can interpret DIH status and know when it finishes, succeeds, or fails is very tricky because DIH only uses human-readable status info, not machine-readable, and the info is not very consistent. I can't just share my code, because it's extremely convoluted ... but the general gist is to create a SolrQuery object, use setRequestHandler to set the handler to /dataimport or whatever your DIH handler is, and set the other parameters on the request like command to full-import and so on. Thanks, Shawn -- Hal Roberts Fellow Berkman Center for Internet & Society Harvard University
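A short sketch of the final step Hal mentions: streaming an exported CSV file to Solr with SolrJ. Not from the thread; the file path, core URL, and the /update/csv handler path are assumptions (in some setups CSV goes to /update with a text/csv content type instead).

import java.io.File;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class CsvLoader {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder

        // Stream a CSV file exported from Postgres to the CSV update handler
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
        req.addFile(new File("/tmp/export.csv"), "text/csv; charset=utf-8"); // placeholder path
        req.setParam("commit", "true"); // or rely on autoCommit instead
        server.request(req);
        server.shutdown();
    }
}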
Solr Deleted Docs Issue
Hi, I am having an issue with my solr setup. In my solr config I have set the following property *<mergeFactor>10</mergeFactor>* Now consider the following situation. I have *200* documents in my index and I need to update all 200 docs. If the total commit operations I hit are *20*, i.e. I update in batches of 10 docs, merging is done after every 10th update, so the max segment count I can have is 10, which is fine. However, even when merging happens, deleted docs are not cleared and I end up with 100 deleted docs in the index. If this operation is done continuously I would end up with a large set of deleted docs, which will affect the performance of the queries I hit on this Solr. Can anyone please tell me if I have missed a config or if this is expected behaviour? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Deleted-Docs-Issue-tp4193292.html Sent from the Solr - User mailing list archive at Nabble.com.
thresholdTokenFrequency changes suggestion frequency..
Hi, I don't understand why the suggestion frequency differs from the original frequency. Example - I have a word = *who* and its original frequency is *100*, but when I look up its suggestion, the suggested frequency changes to *50*. I think this is happening because of *thresholdTokenFrequency*. When I set thresholdTokenFrequency to *0.1* it gives one frequency for the 'who' suggestion, while setting it to *0.0001* gives a different frequency. Why so? I don't see the logic behind this. As I understand it, the suggestion frequency should be the same as the original index frequency - *The spellcheck.extendedResults=true parameter provides frequency of each original term in the index (origFreq) as well as the frequency of each suggestion in the index (frequency).*
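For comparing origFreq and the suggestion frequency directly, here is a small SolrJ sketch (not from the thread) that issues a spellcheck request with extendedResults; the /spell handler name and the core URL are assumptions that depend on your solrconfig.xml.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SuggestionFreqCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder

        SolrQuery q = new SolrQuery("who");   // term whose suggestions we want to inspect
        q.setRequestHandler("/spell");        // assumed spellcheck handler name
        q.set("spellcheck", "true");
        q.set("spellcheck.extendedResults", "true");
        q.set("spellcheck.count", "10");

        SpellCheckResponse spell = server.query(q).getSpellCheckResponse();
        if (spell != null) {
            for (SpellCheckResponse.Suggestion s : spell.getSuggestions()) {
                // origFreq of the queried token vs. frequency of each suggested alternative
                System.out.println(s.getToken() + " origFreq=" + s.getOriginalFrequency());
                for (int i = 0; i < s.getAlternatives().size(); i++) {
                    System.out.println("  " + s.getAlternatives().get(i)
                        + " freq=" + s.getAlternativeFrequencies().get(i));
                }
            }
        }
        server.shutdown();
    }
}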
Re: Solr Deleted Docs Issue
On 3/16/2015 9:11 AM, vicky desai wrote: I am having an issue with my solr setup. In my solr config I have set the following property *<mergeFactor>10</mergeFactor>* The mergeFactor setting is deprecated ... but you are setting it to the default value of 10 anyway, so that's not really a big deal. It's possible that mergeFactor will no longer work in 5.0, but I'm not sure on that. You should instead use the settings specific to the merge policy, which normally is TieredMergePolicy. Note that when mergeFactor is 10, you *will* end up with more than 10 segments in your index. There are multiple merge tiers, and each one can have up to 10 segments before it is merged. Now consider the following situation. I have *200* documents in my index and I need to update all 200 docs. If the total commit operations I hit are *20*, i.e. I update in batches of 10 docs, merging is done after every 10th update, so the max segment count I can have is 10, which is fine. However, even when merging happens, deleted docs are not cleared and I end up with 100 deleted docs in the index. If this operation is done continuously I would end up with a large set of deleted docs, which will affect the performance of the queries I hit on this Solr. Because there are multiple merge tiers and you cannot easily pre-determine which segments will be chosen for a particular merge, the merge behavior may not be exactly what you expect. The only guaranteed way to get rid of your deleted docs is to do an optimize operation, which forces a merge of the entire index down to a single segment. This gets rid of all deleted docs in those segments. If you index more data while you are doing the optimize, then you may end up with additional deleted docs. Thanks, Shawn
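A minimal SolrJ sketch of the explicit optimize Shawn refers to (the core URL is a placeholder); note that it rewrites the entire index, so it should be used sparingly, e.g. after a full reindex, if at all.

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ForceMerge {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder

        // optimize() merges the whole index down to a single segment, which is the
        // only guaranteed way to purge all deleted docs; documents indexed while the
        // optimize runs can still leave new deletions behind.
        server.optimize();
        server.shutdown();
    }
}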
RE: Relevancy : Keyword stuffing
Hello - setting (e)dismax's tie breaker to 0, or much lower than the default, would `solve` this for now. Markus -Original message- From:Mihran Shahinian slowmih...@gmail.com Sent: Monday 16th March 2015 16:29 To: solr-user@lucene.apache.org Subject: Relevancy : Keyword stuffing Hi all, I have a use case where the data is generated by SEO-minded authors, and more often than not they perfectly guess the synonym expansions for the document titles, skewing results in their favor. At the moment I don't have an offline processing infrastructure to detect these (I can't punish these docs either... just have to level the playing field). I am experimenting with taking the max of the term scores, cutting off scores after a certain number of terms, etc., but would appreciate any hints if anyone has experience dealing with a similar use case in Solr. Much appreciated, Mihran
maxQueryFrequency v/s thresholdTokenFrequency
Hello everyone, can anybody please explain the difference between maxQueryFrequency and thresholdTokenFrequency? I found this link - http://wiki.apache.org/solr/SpellCheckComponent#thresholdTokenFrequency - but I am unable to understand it. I find the two very confusing. Your help is appreciated. Warm Regards, Nitin
RE: Relevancy : Keyword stuffing
Hello - Chris' suggestion is indeed a good one, but it can be tricky to configure the parameters properly. Regarding position information, you can override dismax to have it use SpanFirstQuery. It allows for setting strict boundaries from the front of the document to a given position. You can also override SpanFirstQuery to incorporate a gradient, to decrease boosting as distance from the front increases. I don't know how you ingest document bodies, but if they are unstructured HTML, you may want to install proper main content extraction if you haven't already. Having decent control over HTML is a powerful tool. You may also want to look at Lucene's BM25 implementation. It is simple to set up and easier to control. It isn't as rough a tool as TFIDF with regard to length normalization. Plus it allows you to smooth TF, which in your case should also help. If you'd like to scrutinize SSS and get some proper results, you are more than welcome to share them here :) Markus -Original message- From:Mihran Shahinian slowmih...@gmail.com Sent: Monday 16th March 2015 22:41 To: solr-user@lucene.apache.org Subject: Re: Relevancy : Keyword stuffing Thank you, Markus and Chris, for the pointers. For SweetSpotSimilarity I am thinking that perhaps a set of closed ranges exposed via the similarity config is easier to maintain as data changes than making adjustments to fit a function. Another piece of info that would've been handy is the average position plus the position info for the first few occurrences of each term. This would perhaps allow higher boosting for term occurrences earlier in the doc. In my case the extra keywords are towards the end of the doc, but that info does not seem to be propagated into the scorer. Thanks again, Mihran On Mon, Mar 16, 2015 at 1:52 PM, Chris Hostetter hossman_luc...@fucit.org wrote: You should start by checking out the SweetSpotSimilarity .. it was heavily designed around the idea of dealing with things like excessively verbose titles, and keyword stuffing in summary text ... so you can configure your expectation for what a normal length doc is, and they will be penalized for being longer than that. Similarly you can say what a 'reasonable' tf is, and docs that exceed that wouldn't get added boost (which in conjunction with the lengthNorm penalty penalizes docs that stuff keywords) https://lucene.apache.org/solr/5_0_0/solr-core/org/apache/solr/search/similarities/SweetSpotSimilarityFactory.html https://lucene.apache.org/core/5_0_0/misc/org/apache/lucene/misc/doc-files/ss.computeLengthNorm.svg https://lucene.apache.org/core/5_0_0/misc/org/apache/lucene/misc/doc-files/ss.hyperbolicTf.svg -Hoss http://www.lucidworks.com/
discrepancy between LuceneQParser and ExtendedDismaxQParser
Hello, I found a discrepancy between LuceneQParser and ExtendedDismaxQParser when executing the following query: ((*:* AND -area) OR area:[100 TO 300]) AND objectId:40105451 When executing it through the Solr Admin panel and placing the query in the q field, I get the following debug output for LuceneQParser -- debug: { rawquerystring: ((*:* AND -area) OR area:[100 TO 300]) AND objectId:40105451, querystring: ((*:* AND -area) OR area:[100 TO 300]) AND objectId:40105451, parsedquery: +((+MatchAllDocsQuery(*:*) -text:area) area:[100 TO 300]) +objectId:40105451, parsedquery_toString: +((+*:* -text:area) area:[100 TO 300]) +objectId: \u0001\u0000\u0000\u0000\u0000\u0000\u0013\u000fkk, explain: { 40105451: \n14.3511 = (MATCH) sum of:\n 0.034590416 = (MATCH) product of:\n0.06918083 = (MATCH) sum of:\n 0.06918083 = (MATCH) sum of:\n 0.06918083 = (MATCH) MatchAllDocsQuery, product of:\n 0.06918083 = queryNorm\n0.5 = coord(1/2)\n 14.316509 = (MATCH) weight(objectId: \u0001\u0000\u0000\u0000\u0000\u0000\u0013\u000fkk in 1109978) [DefaultSimilarity], result of:\n14.316509 = score(doc=1109978,freq=1.0), product of:\n 0.9952025 = queryWeight, product of:\n14.385524 = idf(docFreq=1, maxDocs=1301035)\n0.06918083 = queryNorm\n 14.385524 = fieldWeight in 1109978, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n14.385524 = idf(docFreq=1, maxDocs=1301035)\n1.0 = fieldNorm(doc=1109978)\n }, -- So one object is found, which is expected. For ExtendedDismaxQParser (the only difference is that the edismax checkbox is checked) I see this output -- debug: { rawquerystring: ((*:* AND -area) OR area:[100 TO 300]) AND objectId:40105451, querystring: ((*:* AND -area) OR area:[100 TO 300]) AND objectId:40105451, parsedquery: (+(+((+DisjunctionMaxQuery((text:*\\:*)) -DisjunctionMaxQuery((text:area))) area:[100 TO 300]) +objectId:40105451))/no_coord, parsedquery_toString: +(+((+(text:*\\:*) -(text:area)) area:[100 TO 300]) +objectId: \u0001\u0000\u0000\u0000\u0000\u0000\u0013\u000fkk), explain: {}, -- oops, no objects found! I hastily filed https://issues.apache.org/jira/browse/SOLR-7249 (sorry, my bad). You may refer to it for additional info (not going to duplicate it here). Thanks -- Best regards, Arsen mailto:barracuda...@mail.ru
Re: Nginx proxy for Solritas
On 3/16/2015 2:42 PM, LongY wrote: Thank you for the reply. I also thought the relevant resources (CSS, images, JavaScript) need to be accessible through Nginx. I copied the velocity folder to the solr-webapp/webapp folder. It didn't work. So how can I make the /browse resources accessible via the Nginx rule? The /browse handler causes your browser to make requests directly to Solr on handlers other than /browse. You must figure out what those requests are and allow them in the proxy configuration. I do not know whether they are relative URLs ... I would not be terribly surprised to learn that they have port 8983 in them rather than the port 80 on your proxy. Hopefully that's not the case, or you'll really have problems making it work on port 80. I've never spent any real time with the /browse handler. Requiring direct access to Solr is completely unacceptable for us. Thanks, Shawn
Re: Nginx proxy for Solritas
Thank you for the reply. I also thought the relevant resources (CSS, images, JavaScript) need to be accessible through Nginx. I copied the velocity folder to the solr-webapp/webapp folder. It didn't work. So how can I make the /browse resources accessible via the Nginx rule? -- View this message in context: http://lucene.472066.n3.nabble.com/Nginx-proxy-for-Solritas-tp4193347p4193352.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: discrepancy between LuceneQParser and ExtendedDismaxQParser
There was a Solr release with a bug that required that you put a space between the left parenthesis and the *:*. The edismax parsed query here indicates that the *:* has not parsed properly. You have area, but in your jira you had a range query. -- Jack Krupansky On Mon, Mar 16, 2015 at 6:42 PM, Arsen barracuda...@mail.ru wrote: Hello, Found discrepancy between LuceneQParser and ExtendedDismaxQParser when executing following query: ((*:* AND -area) OR area:[100 TO 300]) AND objectId:40105451 When executing it through Solr Admin panel and placing query in q field I having following debug output for LuceneQParser -- debug: { rawquerystring: ((*:* AND -area) OR area:[100 TO 300]) AND objectId:40105451, querystring: ((*:* AND -area) OR area:[100 TO 300]) AND objectId:40105451, parsedquery: +((+MatchAllDocsQuery(*:*) -text:area) area:[100 TO 300]) +objectId:40105451, parsedquery_toString: +((+*:* -text:area) area:[100 TO 300]) +objectId: \u0001\u\u\u\u\u\u0013\u000fkk, explain: { 40105451: \n14.3511 = (MATCH) sum of:\n 0.034590416 = (MATCH) product of:\n0.06918083 = (MATCH) sum of:\n 0.06918083 = (MATCH) sum of:\n0.06918083 = (MATCH) MatchAllDocsQuery, product of:\n 0.06918083 = queryNorm\n0.5 = coord(1/2)\n 14.316509 = (MATCH) weight(objectId: \u0001\u\u\u\u\u\u0013\u000fkk in 1109978) [DefaultSimilarity], result of:\n14.316509 = score(doc=1109978,freq=1.0), product of:\n 0.9952025 = queryWeight, product of:\n14.385524 = idf(docFreq=1, maxDocs=1301035)\n 0.06918083 = queryNorm\n 14.385524 = fieldWeight in 1109978, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n14.385524 = idf(docFreq=1, maxDocs=1301035)\n 1.0 = fieldNorm(doc=1109978)\n }, -- So, one object found which is expectable For ExtendedDismaxQParser (only difference is checkbox edismax checked) I am seeing this output -- debug: { rawquerystring: ((*:* AND -area) OR area:[100 TO 300]) AND objectId:40105451, querystring: ((*:* AND -area) OR area:[100 TO 300]) AND objectId:40105451, parsedquery: (+(+((+DisjunctionMaxQuery((text:*\\:*)) -DisjunctionMaxQuery((text:area))) area:[100 TO 300]) +objectId:40105451))/no_coord, parsedquery_toString: +(+((+(text:*\\:*) -(text:area)) area:[100 TO 300]) +objectId: \u0001\u\u\u\u\u\u0013\u000fkk), explain: {}, -- oops, no objects found! I hastened to fill https://issues.apache.org/jira/browse/SOLR-7249 (sorry, my bad) You may refer to it for additional info (not going to duplicate it here) Thanks -- Best regards, Arsen mailto:barracuda...@mail.ru
Re: Whole RAM consumed while Indexing.
Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be and still satisfy your use case.. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 60000 and work backwards, I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that's a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You are correct. Some **overlapping searchers warning messages** are coming in the logs. The **numDocs numbers** are changing while documents are being added during indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. I'd start with soft commits of 60000 and hard commits of 60000 (60 seconds), meaning that you're going to have to wait 1 minute for docs to show up unless you explicitly commit. You're throwing away all the caches configured in solrconfig.xml more than 3 times a second, executing autowarming, etc, etc, etc. Changing these to longer intervals might cure the problem, but if not then, as Hoss would say, details matter. I suspect you're also seeing overlapping searchers warning messages in your log, and it's _possible_ that what's happening is that you're just exceeding the max warming searchers and never opening a new searcher with the newly-indexed documents. But that's a total shot in the dark. How are you looking for docs (and not finding them)? Does the numDocs number in the solr admin screen change? Best, Erick On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Alexandre, *Hard Commit* is: <autoCommit> <maxTime>${solr.autoCommit.maxTime:3000}</maxTime> <openSearcher>false</openSearcher> </autoCommit> *Soft Commit* is: <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime> </autoSoftCommit> And I am committing 20000 documents each time. Is this a good config for committing? Or am I doing something wrong? On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: What's your commit strategy? Explicit commits? Soft commits/hard commits (in solrconfig.xml)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 23:19, Nitin Solanki nitinml...@gmail.com wrote: Hello, I have written a python script to do 20000 documents indexing each time on Solr. I have 28 GB RAM with 8 CPU. When I started indexing, at that time 15 GB RAM was free. While indexing, all RAM is consumed but **not** a single document is indexed. Why so? And it threw *HTTPError: HTTP Error 503: Service Unavailable* in the python script. I think it is due to heavy load on Zookeeper by which all nodes went down. I am not sure about that. Any help please.. Or is anything else happening? And how do I overcome this issue? Please assist me towards the right path. Thanks..
Warm Regards, Nitin Solanki
Re: indexing db records via SolrJ
Take a look at some of the integrations people are using with apache storm, we do something similar on a larger scale , having created a pgsql spout and having a solr indexing bolt. -msj On Mon, Mar 16, 2015 at 11:08 AM, Hal Roberts hrobe...@cyber.law.harvard.edu wrote: We import anywhere from five to fifty million small documents a day from a postgres database. I wrestled to get the DIH stuff to work for us for about a year and was much happier when I ditched that approach and switched to writing the few hundred lines of relatively simple code to handle directly the logic of what gets updated and how it gets queried from postgres ourselves. The DIH stuff is great for lots of cases, but if you are getting to the point of trying to hack its undocumented internals, I suspect you are better off spending a day or two of your time just writing all of the update logic yourself. We found a relatively simple combination of postgres triggers, export to csv based on those triggers, and then just calling update/csv to work best for us. -hal On 3/16/15 9:59 AM, Shawn Heisey wrote: On 3/16/2015 7:15 AM, sreedevi s wrote: I had checked this post.I dont know whether this is possible but my query is whether I can use the configuration for DIH for indexing via SolrJ You can use SolrJ for accessing DIH. I have code that does this, but only for full index rebuilds. It won't be particularly obvious how to do it. Writing code that can intepret DIH status and know when it finishes, succeeds, or fails is very tricky because DIH only uses human-readable status info, not machine-readable, and the info is not very consistent. I can't just share my code, because it's extremely convoluted ... but the general gist is to create a SolrQuery object, use setRequestHandler to set the handler to /dataimport or whatever your DIH handler is, and set the other parameters on the request like command to full-import and so on. Thanks, Shawn -- Hal Roberts Fellow Berkman Center for Internet Society Harvard University
Re: indexing db records via SolrJ
Do you have any references to such integrations (Solr + Storm)? Thanks From: mike st. john mstj...@gmail.com Sent: Monday, March 16, 2015 2:39 PM To: solr-user@lucene.apache.org Subject: Re: indexing db records via SolrJ Take a look at some of the integrations people are using with apache storm, we do something similar on a larger scale , having created a pgsql spout and having a solr indexing bolt. -msj On Mon, Mar 16, 2015 at 11:08 AM, Hal Roberts hrobe...@cyber.law.harvard.edu wrote: We import anywhere from five to fifty million small documents a day from a postgres database. I wrestled to get the DIH stuff to work for us for about a year and was much happier when I ditched that approach and switched to writing the few hundred lines of relatively simple code to handle directly the logic of what gets updated and how it gets queried from postgres ourselves. The DIH stuff is great for lots of cases, but if you are getting to the point of trying to hack its undocumented internals, I suspect you are better off spending a day or two of your time just writing all of the update logic yourself. We found a relatively simple combination of postgres triggers, export to csv based on those triggers, and then just calling update/csv to work best for us. -hal On 3/16/15 9:59 AM, Shawn Heisey wrote: On 3/16/2015 7:15 AM, sreedevi s wrote: I had checked this post.I dont know whether this is possible but my query is whether I can use the configuration for DIH for indexing via SolrJ You can use SolrJ for accessing DIH. I have code that does this, but only for full index rebuilds. It won't be particularly obvious how to do it. Writing code that can intepret DIH status and know when it finishes, succeeds, or fails is very tricky because DIH only uses human-readable status info, not machine-readable, and the info is not very consistent. I can't just share my code, because it's extremely convoluted ... but the general gist is to create a SolrQuery object, use setRequestHandler to set the handler to /dataimport or whatever your DIH handler is, and set the other parameters on the request like command to full-import and so on. Thanks, Shawn -- Hal Roberts Fellow Berkman Center for Internet Society Harvard University
Re: Data Import Handler - reading GET
Have you tried it as ${dih.request.foo}? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 16 March 2015 at 14:51, Kiran J kiranjuni...@gmail.com wrote: Hi, In the data import handler, I can read the clean query parameter using ${dih.request.clean} and pass it on to the queries. Is it possible to read any query parameter from the URL, e.g. ${foo}? Thanks
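A hedged sketch of how such a parameter could be passed from SolrJ so it becomes visible inside data-config.xml as ${dih.request.foo}; the handler path, parameter name, and value are placeholders, and the same effect can be had by simply appending &foo=... to the /dataimport URL.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DihCustomParam {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder

        SolrQuery req = new SolrQuery();
        req.setRequestHandler("/dataimport");   // assumed DIH handler path
        req.set("command", "full-import");
        // Any extra request parameter travels with the import request and can be
        // referenced in data-config.xml as ${dih.request.foo}, e.g. in an entity
        // query: WHERE updated_at > '${dih.request.foo}' (hypothetical column)
        req.set("foo", "2015-03-16 00:00:00");
        server.query(req);
        server.shutdown();
    }
}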
Data Import Handler - reading GET
Hi, In the data import handler, I can read the clean query parameter using ${dih.request.clean} and pass it on to the queries. Is it possible to read any query parameter from the URL, e.g. ${foo}? Thanks
Re: Relevancy : Keyword stuffing
You should start by checking out the SweetSpotSimilarity .. it was heavily designed around the idea of dealing with things like excessively verbose titles, and keyword stuffing in summary text ... so you can configure your expectation for what a normal length doc is, and they will be penalized for being longer than that. Similarly you can say what a 'reasonable' tf is, and docs that exceed that wouldn't get added boost (which in conjunction with the lengthNorm penalty penalizes docs that stuff keywords) https://lucene.apache.org/solr/5_0_0/solr-core/org/apache/solr/search/similarities/SweetSpotSimilarityFactory.html https://lucene.apache.org/core/5_0_0/misc/org/apache/lucene/misc/doc-files/ss.computeLengthNorm.svg https://lucene.apache.org/core/5_0_0/misc/org/apache/lucene/misc/doc-files/ss.hyperbolicTf.svg -Hoss http://www.lucidworks.com/
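In Solr these knobs are normally set through SweetSpotSimilarityFactory in the schema, but a plain-Lucene sketch makes the two levers Hoss mentions concrete: the length-norm "plateau" and a dampened tf (the baseline variant here). All parameter values below are made up for illustration, not recommendations.

import org.apache.lucene.misc.SweetSpotSimilarity;

public class SweetSpotDemo {
    public static void main(String[] args) {
        SweetSpotSimilarity sim = new SweetSpotSimilarity();

        // Docs between 200 and 1000 terms get the full length norm; shorter or
        // longer docs are penalized increasingly steeply (illustrative values).
        sim.setLengthNormFactors(200, 1000, 0.5f, true);

        // Baseline tf: occurrences up to the min contribute a flat baseline, and
        // beyond that the contribution grows only like a square root, which blunts
        // the payoff of stuffing extra copies of a keyword (illustrative values).
        sim.setBaselineTfFactors(1.5f, 3f);

        // Print the curves to see where the "sweet spots" sit
        for (int freq = 1; freq <= 41; freq += 10) {
            System.out.println("tf(" + freq + ") = " + sim.tf(freq));
        }
        for (int len = 100; len <= 2100; len += 500) {
            System.out.println("lengthNorm(" + len + ") = " + sim.computeLengthNorm(len));
        }
    }
}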
Re: [Poll]: User need for Solr security
Hi, We tend to recommend ManifoldCF for document-level security, since that is exactly what it is built for, so I doubt we'll see that as a built-in feature in Solr. However, the Solr integration is really not that advanced, and I also see customers implementing similar logic themselves with success. On the document feeding side you need to add a few more fields to all your documents, typically include_acl and exclude_acl. Populate those fields with data from LDAP about who (what groups) does and does not have access to that document. If it is open information, index a special token open in the include field. Then, assuming your search client application has authenticated a user, you would construct a filter with this user's groups, e.g. fq=include_acl:(groupA OR open)&fq=-exclude_acl:(groupA) The filter would be constructed either in your application or in a Solr search component or query parser. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 13 Mar 2015, at 01:48, johnmu...@aol.com wrote: I would love to see record-level (or even field-level) restricted access in Solr / Lucene. This should be group-level, LDAP-like, or some rule-based scheme (which can be dynamic). If the solution means having a second core, so be it. The following is the closest I found: https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security but I cannot use Manifold CF (Connector Framework). Does anyone know how Manifold does it? - MJ -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Thursday, March 12, 2015 6:51 PM To: solr-user@lucene.apache.org Subject: RE: [Poll]: User need for Solr security Jan - we don't really need any security for our products, nor for most clients. However, one client does deal with very sensitive data, so we proposed to encrypt the transfer of data and the data on disk through a Lucene Directory. It won't fill all gaps, but it would adhere to such a client's guidelines. I think many approaches to security in Solr/Lucene would find advocates, be it index encryption or authentication/authorization or transport security, which is now possible. I understand the reluctance of the PMC, and I agree with it, but some users would definitely benefit and it would certainly make Solr/Lucene the search platform to use for some enterprises. Markus -Original message- From:Henrique O. Santos hensan...@gmail.com Sent: Thursday 12th March 2015 23:43 To: solr-user@lucene.apache.org Subject: Re: [Poll]: User need for Solr security Hi, I’m currently working with indexes that need document-level security. Based on the user logged in, query results would omit documents that this user doesn’t have access to, with LDAP integration and such. I think that would be nice to have in a future Solr release. Henrique. On Mar 12, 2015, at 7:32 AM, Jan Høydahl jan@cominvent.com wrote: Hi, Securing various Solr APIs has once again surfaced as a discussion in the developer list. See e.g. SOLR-7236. It would be useful to get some feedback from Solr users about needs in the field. Please reply to this email and let us know what security aspect(s) would be most important for your company to see supported in a future version of Solr. Examples: local user management, AD/LDAP integration, SSL, authenticated login to Admin UI, authorization for Admin APIs, e.g. admin user vs read-only user, etc. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
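A small SolrJ sketch of the filter construction Jan describes, done in the search application; the include_acl/exclude_acl field names and the "open" token follow his example, while the group lookup, query text, and core URL are placeholders.

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class AclSearch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder

        // Groups of the authenticated user, e.g. resolved from LDAP (placeholder values)
        List<String> groups = Arrays.asList("groupA", "groupB");
        StringBuilder joined = new StringBuilder();
        for (String g : groups) {
            if (joined.length() > 0) {
                joined.append(" OR ");
            }
            joined.append(g);
        }

        SolrQuery q = new SolrQuery("some user query"); // placeholder query
        // Only docs whose include_acl matches one of the user's groups (or the
        // special "open" token), and whose exclude_acl does not.
        q.addFilterQuery("include_acl:(" + joined + " OR open)");
        q.addFilterQuery("-exclude_acl:(" + joined + ")");
        System.out.println(server.query(q).getResults().getNumFound());
        server.shutdown();
    }
}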
Re: Nginx proxy for Solritas
The links to the screenshots aren’t working for me. I’m not sure what the issue is - but do be aware that /browse with its out-of-the-box templates refers to resources (CSS, images, JavaScript) that aren’t under /browse, so you’ll need to allow those to be accessible as well with different rules. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com http://www.lucidworks.com/ On Mar 16, 2015, at 3:39 PM, LongY zhangyulin8...@hotmail.com wrote: Dear Community Members, I have searched over the forum and googled a lot but still didn't find a solution, which finally got me here for help. I am implementing an Nginx reverse proxy for Solritas (VelocityResponseWriter) from the example included in Solr. Nginx listens on port 80, and Solr runs on port 8983. This is my Nginx configuration (it only permits localhost to access the /browse request handler): *location ~* /solr/\w+/browse { proxy_pass http://localhost:8983; allow 127.0.0.1; deny all; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header Host $http_host; }* When I input http://localhost/solr/collection1/browse in the browser address bar, the output I get is this: http://lucene.472066.n3.nabble.com/file/n4193346/left.png The expected output should look like this: http://lucene.472066.n3.nabble.com/file/n4193346/right.png I tested the Admin page with this Nginx configuration with some minor modifications and it worked well, but when used with the Velocity templates it did not render the output properly. Any input is welcome. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/Nginx-proxy-for-Solritas-tp4193346.html Sent from the Solr - User mailing list archive at Nabble.com.
Nginx proxy for Solritas
Dear Community Members, I have searched over the forum and googled a lot but still didn't find a solution, which finally got me here for help. I am implementing an Nginx reverse proxy for Solritas (VelocityResponseWriter) from the example included in Solr. Nginx listens on port 80, and Solr runs on port 8983. This is my Nginx configuration (it only permits localhost to access the /browse request handler):

location ~* /solr/\w+/browse {
    proxy_pass http://localhost:8983;
    allow 127.0.0.1;
    deny all;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
}

When I input http://localhost/solr/collection1/browse in the browser address bar, the output I get is this: http://lucene.472066.n3.nabble.com/file/n4193346/left.png The expected output should look like this: http://lucene.472066.n3.nabble.com/file/n4193346/right.png I tested the Admin page with this Nginx configuration with some minor modifications and it worked well, but when used with the Velocity templates it did not render the output properly. Any input is welcome. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/Nginx-proxy-for-Solritas-tp4193346.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 4.7.2 mergeFactor/ Merge policy issue
Hi, I can confirm similar behaviour, but for solr 4.3.1. We use default values for the merge-related settings. Even though mergeFactor=10 by default, there are 13 segments in one core and 30 segments in another. I am not sure it proves there is a bug in the merging, because it depends on the TieredMergePolicy. Relevant discussion from the past: http://lucene.472066.n3.nabble.com/TieredMergePolicy-reclaimDeletesWeight-td4071487.html Apart from other policy parameters you could play with ReclaimDeletesWeight, in case you'd like to affect how the segments with deletes in them get merged. See http://stackoverflow.com/questions/18361300/informations-about-tieredmergepolicy Regarding your attachment: I believe it got cut by the mailing list system, could you share it via a file sharing system? On Sat, Mar 14, 2015 at 7:36 AM, Summer Shire shiresum...@gmail.com wrote: Hi All, Did anyone get a chance to look at my config and the infoStream file? I am very curious to see what you think. thanks, Summer On Mar 6, 2015, at 5:20 PM, Summer Shire shiresum...@gmail.com wrote: Hi All, Here's more of an update on where I am at with this. I enabled infoStream logging and quickly figured out that I need to get rid of maxBufferedDocs. So Erick you were absolutely right on that. I increased my ramBufferSize to 100MB and reduced maxMergeAtOnce to 3 and segmentsPerTier to 3 as well. My config looks like this:

<indexConfig>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <!-- <maxMergeSizeForForcedMerge>9223372036854775807</maxMergeSizeForForcedMerge> -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">3</int>
    <int name="segmentsPerTier">3</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
  <infoStream file="/tmp/INFOSTREAM.txt">true</infoStream>
</indexConfig>

I am attaching a sample infostream log file. In the infoStream logs, though, you can see how the segments keep on adding up, and it shows (just an example) allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0 I looked at TieredMergePolicy.java to see how allowedSegmentCount is getting calculated:

// Compute max allowed segs in the index
long levelSize = minSegmentBytes;
long bytesLeft = totIndexBytes;
double allowedSegCount = 0;
while(true) {
  final double segCountLevel = bytesLeft / (double) levelSize;
  if (segCountLevel < segsPerTier) {
    allowedSegCount += Math.ceil(segCountLevel);
    break;
  }
  allowedSegCount += segsPerTier;
  bytesLeft -= segsPerTier * levelSize;
  levelSize *= maxMergeAtOnce;
}
int allowedSegCountInt = (int) allowedSegCount;

and the minSegmentBytes is calculated as follows:

// Compute total index bytes & print details about the index
long totIndexBytes = 0;
long minSegmentBytes = Long.MAX_VALUE;
for(SegmentInfoPerCommit info : infosSorted) {
  final long segBytes = size(info);
  if (verbose()) {
    String extra = merging.contains(info) ? " [merging]" : "";
    if (segBytes >= maxMergedSegmentBytes/2.0) {
      extra += " [skip: too large]";
    } else if (segBytes < floorSegmentBytes) {
      extra += " [floored]";
    }
    message("  seg=" + writer.get().segString(info) + " size=" + String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
  }
  minSegmentBytes = Math.min(segBytes, minSegmentBytes);
  // Accum total byte size
  totIndexBytes += segBytes;
}

any input is welcome. myinfoLog.rtf thanks, Summer On Mar 5, 2015, at 8:11 AM, Erick Erickson erickerick...@gmail.com wrote: I would, BTW, either just get rid of the maxBufferedDocs altogether or make it much higher, i.e. 100000.
I don't think this is really your problem, but you're creating a lot of segments here. But I'm kind of at a loss as to what would be different about your setup. Is there _any_ chance that you have some secondary process looking at your index that's maintaining open searchers? Any custom code that's perhaps failing to close searchers? Is this a Unix or Windows system? And just to be really clear, you _only_ seeing more segments being added, right? If you're only counting files in the index directory, it's _possible_ that merging is happening, you're just seeing new files take the place of old ones. Best, Erick On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/4/2015 4:12 PM, Erick Erickson wrote: I _think_, but don't know for sure, that the merging stuff doesn't get triggered until you commit, it doesn't just happen. Shot in the dark... I believe that new segments are created when the indexing buffer (ramBufferSizeMB) fills up, even without commits. I'm pretty sure that anytime a new segment is created, the merge policy is