Hey,
Is it possible to change the URL for the Solr admin?
What I want is:
http://192.168.0.89:8983/solr/private/coreName/admin
I want to add /private/ before the coreName. Is that possible? If so, how?
Ankita.
I have figured out what was wrong... The field Warehouse was not marked as
indexed... It was being stored, but not indexed... It is now working as
expected.
Thanks.
--Tiernan
On Wed, Oct 26, 2011 at 1:01 PM, Tiernan OToole lsmart...@gmail.com wrote:
OK, so now I am getting something back,
Hi,
Is a SQL LIKE operator available in Apache Solr, just like the one we have
in SQL?
SQL example below:
*Select * from Employee where employee_name like '%Solr%'*
If not, is it a bug in Solr? If this feature is available, please point me to
some examples.
Thanks!
--
Best Regards,
Arshad
Arshad
Actually it is available; you need to use the ReversedWildcardFilterFactory,
which I am sure you can Google for.
Solr and SQL address different problem sets with some overlap, but there are
significant differences between the two technologies. Incidentally, '%Solr%' is a
worst case for SQL
Hi,
this is not exactly true. In Solr, you can't have the wildcard operator
on both sides of a term.
However, you can tokenize your fields and simply query for "Solr". That
is what Solr is made for. :)
-Kuli
On 01.11.2011 13:24, François Schiettecatte wrote:
Arshad
Actually it is
Kuli
Good point about just tokenizing the fields :)
I ran a couple of tests to double-check my understanding and you can have a
wildcard operator at either or both ends of a term. Adding
ReversedWildcardFilterFactory to your field analyzer will make leading wildcard
searches a lot faster of
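To illustrate the trick behind that speedup, here is a toy sketch in Python (function names are made up for illustration; this is not Solr's actual implementation): ReversedWildcardFilterFactory additionally indexes each token reversed, which turns a slow leading-wildcard query into a fast prefix scan over the reversed tokens.

```python
# Toy sketch of the idea behind ReversedWildcardFilterFactory (hypothetical,
# not Solr's real implementation): index each token both forward and
# reversed, so a leading wildcard becomes a prefix match on reversed tokens.

def index_tokens(tokens):
    """Return the token set stored in the index: originals plus reversed copies."""
    stored = set()
    for t in tokens:
        stored.add(t)
        stored.add(t[::-1])  # reversed copy enables '*suffix' queries
    return stored

def leading_wildcard_match(stored, suffix):
    """Answer a '*suffix' query by prefix-scanning the reversed tokens."""
    prefix = suffix[::-1]
    return any(t.startswith(prefix) for t in stored)

index = index_tokens(["apachesolr", "lucene"])
print(leading_wildcard_match(index, "solr"))   # True: 'apachesolr' ends in 'solr'
print(leading_wildcard_match(index, "sql"))    # False
```

The real filter prepends a marker character to reversed tokens so they cannot collide with forward tokens; this sketch omits that detail.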
Hi all,
Is there a good guide on using Solr components as a dictionary
matcher? I need to do some pre-processing that involves lots of
dictionary lookups, and it doesn't seem right to query Solr for each
instance.
Thanks in advance,
Nagendra
Yes, that's expected behavior. When you optimize, all segments are
copied over to new
segment(s). Since all changed/new segments are replicated to the slave,
you'll (temporarily) have twice the data on your disk.
You can stop optimizing; despite its name, it's often not very useful.
That
NGrams are often used in Solr for this case, but they will also add to
your index size.
It might be worthwhile to look closely at your user requirements
before going ahead
and supporting this functionality
Best
Erick
2011/11/1 François Schiettecatte fschietteca...@gmail.com:
Kuli
Good
Optimization merges the index into a single segment (one huge file), so the entire
index will be copied on replication. So you really do need 2x disk in some cases
then.
Do you really need to optimize? We have a pretty big total index (about 200
million docs) and we never optimize. But we do have a
On 01.11.2011 16:06, Erick Erickson wrote:
NGrams are often used in Solr for this case, but they will also add to
your index size.
It might be worthwhile to look closely at your user requirements
before going ahead
and supporting this functionality
Best
Erick
My opinion. Wildcards are
Eric,
NGrams could you elaborate on that ? -- haven't seen that before.
Thanks.
On Tue, Nov 1, 2011 at 11:06 AM, Erick Erickson erickerick...@gmail.comwrote:
NGrams are often used in Solr for this case, but they will also add to
your index size.
It might be worthwhile to look closely at
Greetings guys,
I have been thinking of using Solr as a simple database due to its
blinding speed -- actually I've used that approach in some projects with
decent success.
Any thoughts on that?
Thanks,
MM.
I guess it could be many things.
Typically an easy one to spot is insufficient heap (i.e.
your 16Gb): the JVM is full-GC'ing constantly, failing to free any
memory, and using lots of CPU. This would make Solr slow and hang
during potentially long GC pauses.
Other than it isn't a database?
If you want a key/value store, use one of those. If you want a full DB with
transactions, use one of those.
wunder
On Nov 1, 2011, at 8:47 AM, Memory Makers wrote:
Greetings guys,
I have been thinking of using Solr as a simple database due to it's
blinding
Well I want something beyond a key value store.
I want to be able to free-text search documents
I want to be able to retrieve documents based on other criteria
I'm not sure how that would compare with something like MongoDB.
Thanks.
On Tue, Nov 1, 2011 at 11:49 AM, Walter Underwood
It is not a horrible idea. Lucene has a pretty reliable index now (it should
not get corrupted). And you can do backups with replication.
If you need ranked results (sort by relevance), and lots of free-text queries
then using it makes sense. If you just need boolean search and maybe some
One other potentially huge consideration is how updatable you need documents
to be. Lucene can only replace existing documents; it cannot modify existing
documents directly (so an update is essentially a delete followed by an insert
of a new document with the same primary key). There are
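The delete-then-insert semantics can be sketched with a toy in-memory model (hypothetical class, not the Lucene API): note that the caller must re-supply the full document, since there is no in-place field update.

```python
# Toy model of Lucene's update semantics (not a real Lucene API): documents
# are immutable, so "updating" one field means deleting the old document and
# re-adding a complete new document under the same primary key.

class ToyIndex:
    def __init__(self):
        self.docs = {}                      # primary key -> full document

    def add(self, doc):
        self.docs[doc["id"]] = doc          # replaces any prior doc with this id

    def update_field(self, doc_id, field, value):
        old = self.docs.pop(doc_id)         # the "delete" half
        new = dict(old, **{field: value})   # caller must keep ALL other fields
        self.add(new)                       # the "insert" half

idx = ToyIndex()
idx.add({"id": "1", "title": "Solr", "views": 10})
idx.update_field("1", "views", 11)
print(idx.docs["1"])   # {'id': '1', 'title': 'Solr', 'views': 11}
```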
Thanks Robert.
We optimize less frequently than we used to. Down to twice a month from once
a day.
Without optimizing the search speed stays the same, however the index size
increases to 70+ GB.
Perhaps there is a different way to restrict disk usage.
Thanks,
Jason
Robert Stewart
Hi,
I was wondering if it's a good idea to expose Solr to the outside world,
so that our clients running on smart phones will be able to use Solr.
If we decide to do this, what are the security concerns?
For example, someone suggested we should limit the number of
rows requested in order
Start here:
http://lucene.apache.org/solr/api/org/apache/solr/analysis/NGramFilterFactory.html
But the idea is that you define a field with the NGramFilterFactory and it
indexes (bigrams, in this example) mysolrstuff as separate tokens: my ys so ol lr
rs st tu uf ff. This supports the %solr% idea if
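A minimal sketch of the bigram emission described above (plain Python, not the actual filter; the real NGramFilterFactory emits a range of sizes between its minGramSize and maxGramSize parameters):

```python
def ngrams(text, n=2):
    """Character n-grams of a single token, fixed size n (bigrams by default
    here; Solr's NGramFilterFactory can emit a range of gram sizes)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(ngrams("mysolrstuff"))
# ['my', 'ys', 'so', 'ol', 'lr', 'rs', 'st', 'tu', 'uf', 'ff']
```

A query term like "solr" is broken into the same bigrams at query time, so it matches anywhere inside an indexed token, giving the %solr% behavior.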
Do you do a lot of deletes (or 'updates' of existing documents)?
Do you store lots of large fields? Maybe you can use compressed fields in that
case (we never have tried it so I cannot confirm how well it works or performs).
You can also turn off things like norms and vectors, etc. if you
Well,
I've done a lot of work with MySQL and content management systems -- and
frankly whenever I have to integrate with Solr or do some Lucene work I am
amazed at the speed -- even when I index web pages for search -- MySQL
pales by comparison when data sets get large (2 million rows)
Thanks,
You would need to setup request handlers in solrconfig.xml to limit what types
of queries people can send to SOLR (and define things like max page size, etc).
You need to restrict people from sending update/delete commands as well.
Then at the minimum, setup some proxy in front of SOLR that
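A sketch of such a front-end guard (hypothetical parameter whitelist and limits, not a real proxy config): drop anything that is not a read-only search parameter and clamp the paging values.

```python
# Hypothetical front-end guard for a public Solr endpoint: whitelist
# read-only query parameters and clamp 'rows' and 'start' to sane maxima.

ALLOWED_PARAMS = {"q", "fq", "sort", "fl", "wt", "rows", "start"}
MAX_ROWS, MAX_START = 50, 1000

def sanitize(params):
    """Return a safe copy of the request parameters."""
    clean = {k: v for k, v in params.items() if k in ALLOWED_PARAMS}
    clean["rows"] = min(int(clean.get("rows", 10)), MAX_ROWS)
    clean["start"] = min(int(clean.get("start", 0)), MAX_START)
    return clean

# 'qt' is stripped so clients cannot reach other request handlers.
print(sanitize({"q": "solr", "rows": "5000", "qt": "/update"}))
# {'q': 'solr', 'rows': 50, 'start': 0}
```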
Thanks Erick,
Will take a look at this article.
Cheers,
Jason
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, November 01, 2011 8:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Replicating Large Indexes
Yes, that's expected behavior. When
This is very good idea and I used it several times over the years with
great success. As long as you understand the limitations (no global
transactions, not being able to update records in place, ...)
On Tue, Nov 1, 2011 at 8:47 AM, Memory Makers memmakers...@gmail.com wrote:
Greetings guys,
I have been
Greetings. We're finally kicking off our little Solr project. We're
indexing a paltry 25,000 records but each has MANY documents attached, so
we're using Tika to parse those documents into a big long string, which we
use in a call to solrj.addField(relateddoccontents,
Rather than mucking with system properties, I find that using JNDI is the
easiest and cleanest way to configure Solr home with Tomcat.
https://wiki.apache.org/solr/SolrTomcat#Configuring_Solr_Home_with_JNDI
...those instructions are fairly simple, and will work on both windows and
linux (just
Hi,
I have 2 issues.
1. I have an enum column in my SQL table. I want to index that column. Which
fieldType should I specify in schema.xml for an enum?
2. Normally we can index one column in a table using the column header as
the entity name and the column data as the value of the entity. Can I index 2
: What I'm looking for is to do everything in single shot in Solr.
: I'm not even sure if it's possible or not.
: Finding the max value and then running another query is NOT my ideal
: solution.
stats component to determine the max value, and a second query to search
for docs containing that
Thanks Robert,
But do you also think limiting the page size inside a request handler is a
good
solution against attackers? Honestly, I'm not sure it is; on its own it
doesn't
save a server from attackers at all. Do you agree with me?
We are not security experts, just developers, but any
: I was wondering if it's a good idea to expose Solr to the outside world,
: so that our clients running on smart phones will be able to use Solr.
As a general rule of thumb, I would say that it is not a good idea to
expose Solr directly to the public internet.
there are exceptions to this
: Sounds like a custom sorting collector would work - one that throws away
: docs with less than some minimum score, so that it only collects/sorts
did you look at the example query Karsten mentioned (and also discussed in
the linked thread)?
there is no need for a custom collector to do this,
Martijn v Groningen-2 wrote:
When using the group.field option values must be the same otherwise
they don't get grouped together. Maybe fuzzy grouping would be nice.
Grouping videos and images based on mimetype should be easy, right?
Videos have a mimetype that start with video/ and images
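That prefix test is easy to precompute into a separate indexed field at index time and then use with group.field; a minimal sketch (hypothetical helper function, not a Solr API):

```python
def media_group(mimetype):
    """Map a MIME type to a coarse group for result grouping (sketch).
    Intended to be computed at index time into its own field."""
    major = mimetype.split("/", 1)[0]
    return major if major in ("video", "image", "audio") else "other"

print(media_group("video/mp4"))    # video
print(media_group("image/png"))    # image
print(media_group("text/html"))    # other
```

Grouping on the derived field (e.g. group.field=media_group) then collapses all videos together regardless of container format.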
What if we just expose '/select' paths - by firewalls and load balancers -
and
also use SSL and HTTP basic or digest access control?
On Tue, Nov 1, 2011 at 2:20 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:
: I was wondering if it's a good idea to expose Solr to the outside world,
: so
Be aware that even /select could have some harmful effects, see
https://issues.apache.org/jira/browse/SOLR-2854 (addressed on trunk).
Even disregarding that issue, /select is a potential gateway to any request
handler defined via /select?qt=/req_handler
Again, in general it's not a good idea
I once had to deal with a severe performance problem caused by a bot that was
requesting results starting at 5000. We disallowed requests over a certain
number of pages in the front end to fix it.
wunder
On Nov 1, 2011, at 12:57 PM, Erik Hatcher wrote:
Be aware that even /select could have
I'm not sure if anybody has asked these questions before or not.
Sorry if they are duplicates.
The problem is that the clients (smart phones) of our Solr machines
are outside the network in which solr machines are located. So, we
need to somehow expose their service to the outside world.
What's
This is definitely an interesting case that i don't think anyone ever
really considered before. It seems like a strong argument in favor of
adding an hl.q param that the HighlightingComponent would use as an
override for whatever the QueryComponent thinks the highlighting query
should be,
sorry, I didn't explain that part. We are the developers of the client code
too,
meaning that only we know the credentials to access the web container,
and we won't run such queries.
Right now, I'm writing a subclass of SearchHandler which changes the
SolrParams
to remove 'qt' parameter and limit
: Subject: Usage of Double quotes for single terms (camelcase) while querying
I think you can address a lot of these concerns by running some proxy in front
of SOLR, such as HAProxy. You should be able to limit only certain URIs (so
you can prevent /select queries). HAProxy is a free software load-balancer,
and it is very configurable and fairly easy to set up.
On
Yeah, actually our firewalls/loadbalancers can handle these issues.
If they don't, then I'll use HAProxy.
Thanks for all info :-)
On Tue, Nov 1, 2011 at 5:42 PM, Robert Stewart bstewart...@gmail.comwrote:
I think you can address a lot of these concerns by running some proxy in
front of SOLR,
: For the record, I figured out something that will work, although it is
: somewhat inelegant. My q parameter is now:
:
: (+content:notes -genre:Citation)^20 (+content:notes genre:Citation)^0.01
:
: Can I improve on that?
not really (although you can probably get cleaner separation of query and
: We optimize less frequently than we used to. Down to twice a month from
once a day.
:
: Without optimizing the search speed stays the same, however the index size
increases to 70+ GB.
:
: Perhaps there is a different way to restrict disk usage.
Consider using the maxSegments option on
: I don't think I'm quite getting this. Instead of going down that low,
: could you make your own ResponseWriter? That has access to all
: the information in the doc, and it seems like you could reach out to
: the DB at that point and get your information merrily adding it to the
: docs.
Agreed.
: But Solr is (intentionally) stupid about dates, and
: requires the (almost) full date format. There are
I'm not sure how i feel about intentionally stupid ... but the
underlying sentiment is correct: Solr requires clients to be *VERY*
explicit about dates, because that way the client is in
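Concretely, Solr expects the full ISO-8601 form in UTC with a trailing 'Z' (e.g. 1995-12-31T23:59:59Z); a small helper illustrating the format (hypothetical function name):

```python
from datetime import datetime, timezone

def to_solr_date(dt):
    """Format an aware datetime the way Solr expects:
    full ISO-8601, converted to UTC, with a trailing 'Z'."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_date(datetime(2011, 11, 1, 13, 24, 0, tzinfo=timezone.utc)))
# 2011-11-01T13:24:00Z
```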
: This would need 2*3*100 = 600 dynamicfields to cover the openinghours. You
: mention this is peanuts for constructing a booleanquery, but how about
: memory consumption?
: I'm particularly concerned about the Lucene FieldCache getting populated for
: each of the 600 fields. (Since I had some
: If using the CommonsHttpSolrServer query() method with parameter wt=json, when
: retrieving QueryResponse, how does one get the JSON result output stream?
when you are using the CommonsHttpSolrServer level of API, the client
takes care of parsing the response (which is typically in an efficient
: I have these queries in Lucene 2.9.4, is there a way to convert these
: exactly to Solr 3.4 but using only the solrconfig.xml? I will figure out the
: queries but I wanted to know if it is even possible to go from here to
: having something like this:
:
: <requestHandler name="/custom"
Grrr cut/paste mistake.
This...
: public class FieldQParserPlugin extends QParserPlugin {
...should have been something like...
public class MyQParserPlugin extends QParserPlugin {
...to match the configuration example...
<queryParser name="customQP"
Wow, 50 lines is tiny! Is that how small you need to go, to get good
highlighting performance?
I'm looking at documents that can be up to 800MB in size, so I've decided to
split them down into 256k chunks. I'm still indexing right now - I'm curious
to see how performance is when the
Oh by the way - what analyzer are you using for your log files? Here's what
I'm trying:
<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter
: Is it possible to change the url for solr admin??
: What i want is :
: http://192.168.0.89:8983/solr/private/coreName/admin
:
: i want to add /private/ before the coreName. Is that possible? If yes how?
You can either do this via settings in your servlet container (to specify
that the
Hi All,
I recently started working on SOLR 3.3 and would need your expertise to
provide a solution. I'm working on a POC, in which I've imported 3.5 million
document records using DIH. We have a source system which publishes change
data capture in an XML format. The requirement is to integrate
Roman,
How frequently do you update your index? I have a need to do real time
add/delete to SOLR documents at a rate of approximately 20/min.
The total number of documents are in the range of 4 million. Will there
be any performance issues?
Thanks,
Shishir
-Original Message-
From: Roman
I am not quite clear on this. Could you explain in a bit more detail or give an example?
Ankita.
On 2 November 2011 06:26, Chris Hostetter hossman_luc...@fucit.org wrote:
: Is it possible to change the url for solr admin??
: What i want is :
: http://192.168.0.89:8983/solr/private/coreName/admin
:
: i
We have a rate of 2K small docs/sec which translates into 90 GB/day of index
space
You should be fine
Roman
Awasthi, Shishir wrote:
Roman,
How frequently do you update your index? I have a need to do real time
add/delete to SOLR documents at a rate of approximately 20/min.
The total