Hi,
I have an app running on WebLogic and Oracle. The Oracle DB is quite huge,
say some 10 million records. I need to integrate Solr for this and I
am planning to use multicore. How can the multicore feature be used to
best effect?
-Raghu
To give you more information.
The error I get is this one:
java.lang.NoClassDefFoundError:
org/apache/solr/request/VelocityResponseWriter (wrong name:
contrib/velocity/src/main/java/org/apache/solr/request/VelocityResponseWriter)
at java.lang.ClassLoader.defineClass1(Native Method) at
On Mon, Nov 17, 2008 at 2:17 PM, Raghunandan Rao
[EMAIL PROTECTED] wrote:
I have an app running on WebLogic and Oracle. The Oracle DB is quite huge,
say some 10 million records. I need to integrate Solr for this and I
am planning to use multicore. How can the multicore feature be used at the
Hey there,
I have posted before about my situation but I think my explanation
was a bit confusing...
I am using DataImportHandler with delta-import and it's working perfectly. I
have also coded my own SqlEntityProcessor to delete expired rows from the
index and database.
Now I need to do
Any update processor can be used with DIH. First of all you may
register your dedupe update processor as you do now. You can either
pass update.processor as a request parameter or you can keep it
in the 'defaults' of the DataImportHandler:
<str name="update.processor">dedupe</str>
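For reference, a hedged sketch of what that solrconfig.xml wiring might look
like, following the SOLR-799 patch (the chain name, signature fields and
classes below are illustrative, not taken from this thread):

  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">signature</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">name,features,cat</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <str name="update.processor">dedupe</str>
    </lst>
  </requestHandler>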
On Mon,
Hi Todd,
Thanks for this answer. OK, but it's not just about showing a field in the
list or not: if a field is not shown but is boosted using qf, do I need to
store it? And what about a language field which needs some special
configuration like stemming...?
thanks a lot for your clear answer,
I believe (someone
Hi,
After 5-6 searches I run out of memory :-(
Examples:
String homeDir = "/var/lib/tomcat5.5/webapps/solr";
File configFile = new File( homeDir, "solr.xml" );
CoreContainer myCoreContainer = new CoreContainer(
homeDir, configFile );
mySolrCore =
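A hedged guess at the leak, with a minimal sketch (assuming Solr 1.3's SolrJ
EmbeddedSolrServer; the holder class and the core name "core0" are
illustrative): if a new CoreContainer is opened for every search and never
shut down, memory will run out after a few queries. Construct it once and
reuse it:

  import java.io.File;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;

  public class SolrHolder {
    private static final SolrServer SERVER;
    static {
      try {
        String homeDir = "/var/lib/tomcat5.5/webapps/solr";
        // Open the container once for the whole webapp, not per request.
        CoreContainer container =
            new CoreContainer(homeDir, new File(homeDir, "solr.xml"));
        SERVER = new EmbeddedSolrServer(container, "core0");
      } catch (Exception e) {
        throw new ExceptionInInitializerError(e);
      }
    }
    public static SolrServer get() { return SERVER; }
  }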
On Thu, Nov 13, 2008 at 10:43 PM, sunnyfr [EMAIL PROTECTED] wrote:
Hi everybody,
I don't really get when I have to re-index data or not.
I did a full import but I realised I stored too many fields which I don't
need.
So I have to change some indexed fields which are stored to not
Thank you so much. I have it sorted.
I am wondering now if there is any more stable way to use deduplication than
adding to the solr source project this patch:
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
(SOLR-799.patch
Marc Sturlese wrote:
Thank you so much. I have it sorted.
I am wondering now if there is any more stable way to use deduplication
than adding to the solr source project this patch:
On Mon, Nov 17, 2008 at 5:18 PM, Marc Sturlese [EMAIL PROTECTED]wrote:
Thank you so much. I have it sorted.
I am wondering now if there is any more stable way to use deduplication
than
adding to the solr source project this patch:
On Nov 17, 2008, at 3:55 AM, JCodina wrote:
java.lang.NoClassDefFoundError:
org/apache/solr/request/VelocityResponseWriter (wrong name:
...
[jar] Building jar:
/home/joan/workspace/solr/contrib/dataimporthandler/target/apache-
solr-dataimporthandler-1.4-dev.jar
dist:
...
[jar]
On Nov 16, 2008, at 1:40 PM, Matthias Epheser wrote:
Matthias and Ryan - let's get SolrJS integrated into contrib/
velocity. Any objections/reservations?
As SolrJS may be used without velocity at all (using eg.
ClientSideWidgets), is it possible to put it into contrib/
javascript and
On Nov 16, 2008, at 6:12 PM, Ian Holsman wrote:
famous last words and all, but you shouldn't be just passing what a
user types directly into an application, should you?
LOL
I'd be parsing out wildcards, boosts, and fuzzy searches (or at
least thinking about the effects).
I mean jakarta
On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote:
my assumption with solrjs is that you are hitting read-only solr
servers that you don't mind if people query directly.
Exactly the assumption I'm going with too.
It would not be appropriate for something where you don't want
people (who
On Nov 16, 2008, at 6:27 PM, Ryan McKinley wrote:
I'd be parsing out wildcards, boosts, and fuzzy searches (or at
least thinking about the effects).
I mean "jakarta apache"~1000 or roam~0.1 aren't as efficient as a
regular query.
Even if you leave the solr instance public, you can still
On Nov 16, 2008, at 6:55 PM, Walter Underwood wrote:
Limiting the maximum number of rows doesn't work, because
they can request rows 2-20100. --wunder
But you could limit how many rows could be returned in a single
request... that'd close off one DoS mechanism.
Erik
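As a sketch of that idea (hedged: solrconfig.xml "invariants" pin a
parameter rather than cap it, so a true maximum would still need custom
code; the handler name and values below are illustrative):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="invariants">
      <int name="rows">100</int>
      <int name="start">0</int>
    </lst>
  </requestHandler>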
On Mon, Nov 17, 2008 at 8:54 AM, Erik Hatcher
[EMAIL PROTECTED] wrote:
Sounds like the perfect case for a query parser plugin... or use dismax as
Ryan mentioned. Shouldn't Solr be hardened for these cases anyway? Or at
least hardenable.
Say you do filtering by user - how would you enforce
Are all the documents in the same search space? That is, for a given
query, could any of the 10MM docs be returned?
If so, I don't think you need to worry about multicore. You may
however need to put part of the index on various machines:
http://wiki.apache.org/solr/DistributedSearch
Hello,
is it possible to use properties from core configuration in data-config.xml?
I want to define the baseDir for DataImportHandler.
I tried the following configuration:
*** solr.xml ***
<solr persistent="false">
  <cores adminPath="null">
    <core name="core0" instanceDir="/opt/solr/cores/core0"
On Nov 17, 2008, at 9:07 AM, Yonik Seeley wrote:
On Mon, Nov 17, 2008 at 8:54 AM, Erik Hatcher
[EMAIL PROTECTED] wrote:
Sounds like the perfect case for a query parser plugin... or use
dismax as
Ryan mentioned. Shouldn't Solr be hardened for these cases
anyway? Or at
least hardenable.
Erik Hatcher schrieb:
On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote:
my assumption with solrjs is that you are hitting read-only solr
servers that you don't mind if people query directly.
Exactly the assumption I'm going with too.
It would not be appropriate for something where you
Limiting the number of rows only handles one attack. The one I mentioned,
fetching one page deep in the result set, caused a big issue on prod at
our site. We needed to limit the max for start as well as rows.
It is possible to make it safe, but a lot of work. We did this for
Ultraseek. I would
On Nov 17, 2008, at 10:22 AM, Walter Underwood wrote:
It is possible to make it safe, but a lot of work. We did this for
Ultraseek. I would always, always front it with Apache, to get some
of Apache's protection.
What protections specifically are you speaking of with Apache in
front?
How would you best deal with a page field in solr?
Possible values are numbers (1 to 1000s) but could also include appendix
pages that use roman numerals and alphabetic characters (i, ii, iii, iv, as
well as a, b, c, etc).
It makes sense people would want to search for things between page 1 to 5
but I
Say you do filtering by user - how would you enforce that the client
(if it's a browser) only send in the proper filter?
Ryan already mentioned his technique... and here's how I'd do it
similarly...
Write a custom servlet Filter that grokked roles/authentication
(this piece you'd need
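A minimal sketch of such a Filter, under stated assumptions (plain
javax.servlet API; the limits, parameter names, and the idea of wrapping the
request to inject a role-based fq are illustrative, not from this thread):

  import java.io.IOException;
  import javax.servlet.*;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  public class SolrGuardFilter implements Filter {
    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
        throws IOException, ServletException {
      HttpServletRequest request = (HttpServletRequest) req;
      int rows = parseInt(request.getParameter("rows"), 10);
      int start = parseInt(request.getParameter("start"), 0);
      // Reject abusive paging outright; adjust the limits to taste.
      if (rows > 100 || start > 1000) {
        ((HttpServletResponse) res).sendError(
            HttpServletResponse.SC_BAD_REQUEST, "rows/start out of bounds");
        return;
      }
      // To enforce a per-user filter, wrap the request in an
      // HttpServletRequestWrapper that appends an fq derived from the
      // authenticated role before passing it on.
      chain.doFilter(req, res);
    }

    private int parseInt(String s, int dflt) {
      try { return s == null ? dflt : Integer.parseInt(s); }
      catch (NumberFormatException e) { return dflt; }
    }
  }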
Ryan McKinley wrote:
solr.jar on the other hand lets you package what you want around
search features to build a setup for your needs. Java already has so
many options for how to secure / authenticate that you can just plug
them into your own app. (if that is appropriate). In the past I
Erik Hatcher schrieb:
However, it isn't currently suitable for wiring to SolrJS - Matthias and
I will have to resolve that.
Just noticed that VelocityResponseWriter in trunk is much reduced compared
to my last patch from 2008-07-25.
Moving the templates into a jar shouldn't be a problem. Setting the
Ryan McKinley schrieb:
however I have found that in any site where
stability/load and uptime are a serious concern, this is better handled
in a tier in front of java -- typically the loadbalancer / haproxy /
whatever -- and managed by people more cautious than me.
Full ack. What do you think
I see value in this in the form of protecting the client from itself.
For example, our Solr isn't accessible from the Internet. It's all
behind firewalls. But, the client applications can make programming
mistakes. I would love the ability to lock them down to a certain number
of rows, just in
Distributed queries:
curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=php'
curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/
On Nov 17, 2008, at 11:45 AM, Matthias Epheser wrote:
Just noticed that VelocityResponseWriter in trunk is much reduced
compared to my last
patch from 2008-07-25.
Right, that was intentional for my own simplicity's sake...
The crucial difference is the missing translation into a solrj
response by
On Nov 17, 2008, at 12:06 PM, Matthias Epheser wrote:
Ryan McKinley schrieb:
however I have found that in any site where
stability/load and uptime are a serious concern, this is better
handled in a tier in front of java -- typically the loadbalancer /
haproxy / whatever -- and managed by
Erik Hatcher schrieb:
On Nov 17, 2008, at 11:45 AM, Matthias Epheser wrote:
Just noticed that VelocityResponseWriter in trunk is much reduced compared
to my last
patch from 2008-07-25.
Right, that was intentional for my own simplicity's sake...
The crucial difference is the missing translation into
Can you elaborate on the use case for why you need the raw response
like that?
I vaguely get it, but want to really understand the need here.
I'm wary of the EmbeddedSolrServer usage in there, as I want to
distill the VrW stuff to be able to use SolrJ's API rather than assume
embedded
On Nov 17, 2008, at 1:35 PM, Erik Hatcher wrote:
Can you elaborate on the use case for why you need the raw response
like that?
I vaguely get it, but want to really understand the need here.
I'm wary of the EmbeddedSolrServer usage in there, as I want to
distill the VrW stuff to be able
Ryan McKinley schrieb:
On Nov 17, 2008, at 1:35 PM, Erik Hatcher wrote:
Can you elaborate on the use case for why you need the raw response
like that?
I vaguely get it, but want to really understand the need here.
I'm wary of the EmbeddedSolrServer usage in there, as I want to
distill
Hello -
I wanted to forward this on, since I thought that people here might be
able to use this to build indexes. So long as the lucene version in
LuSQL matches the version in Solr, it would work fine for indexing -
yea?
Thanks for your time!
Matthew Runo
Software Engineer, Zappos.com
On Nov 17, 2008, at 2:11 PM, Matthias Epheser wrote:
After we added the SolrQueryResponse to the templates first, we
realized that some convenience methods for iterating the result
docs, accessing facets etc. would be nice.
The idea was to reuse the existing wrappers (eg. QueryResponse). It
On Nov 17, 2008, at 2:59 PM, Erik Hatcher wrote:
On Nov 17, 2008, at 2:11 PM, Matthias Epheser wrote:
After we added the SolrQueryResponse to the templates first, we
realized that some convenience methods for iterating the result
docs, accessing facets etc. would be nice.
The idea was to
Yeah, it'd work, though not only does the version of Lucene need to
match, but the field indexing/storage attributes need to jive as well
- and that is the trickier part of the equation.
But yeah, LuSQL looks slick!
Erik
On Nov 17, 2008, at 2:17 PM, Matthew Runo wrote:
Hello -
Erik Hatcher schrieb:
On Nov 17, 2008, at 2:11 PM, Matthias Epheser wrote:
After we added the SolrQueryResponse to the templates first, we realized
that some convenience methods for iterating the result docs, accessing
facets etc. would be nice.
The idea was to reuse the existing wrappers
There was a patch by Sean Timm you should investigate as well.
It limited a query so it would take a maximum of X seconds to execute,
and would just return the rows it had found in that time.
Feak, Todd wrote:
I see value in this in the form of protecting the client from itself.
For
http://issues.apache.org/jira/browse/SOLR-527 (An XML commit only
request handler) is pertinent to this discussion as well.
-Sean
Ian Holsman wrote:
There was a patch by Sean Timm you should investigate as well.
It limited a query so it would take a maximum of X seconds to execute,
and
About that read-only switch for Solr: one of the basic HTTP design
guidelines is that GET should only return values, and should never change
the state of the data. All changes to the data should be made with POST. (In
REST style guidelines, PUT, POST, and DELETE.) This prevents you from
passing
I believe the Solr replication scripts require POSTing a commit to read
in the new index--so at least limited POST capability is required in
most scenarios.
-Sean
Lance Norskog wrote:
About that read-only switch for Solr: one of the basic HTTP design
guidelines is that GET should only return
I've tried searching for this answer all over but have found no results
thus far. I am trying to add a new field to my schema.xml with a
default value of 0. I have a ton of data indexed right now and it would
be very hard to retrieve all of the original sources to rebuild my
index. So my
if that's the case, putting Apache in front of it would be handy.
Something like

<Limit POST>
  Order deny,allow
  Deny from all
  Allow from 192.168.0.1
</Limit>

might be helpful.
Sean Timm wrote:
I believe the Solr replication scripts require POSTing a commit to
read in the new index--so at least
Erik Hatcher schrieb:
On Nov 16, 2008, at 1:40 PM, Matthias Epheser wrote:
Matthias and Ryan - let's get SolrJS integrated into
contrib/velocity. Any objections/reservations?
As SolrJS may be used without velocity at all (using eg.
ClientSideWidgets), is it possible to put it into
Any suggestions?
-----Original Message-----
From: Nguyen, Joe
Sent: Monday, November 17, 2008 9:40 Joe
To: 'solr-user@lucene.apache.org'
Subject: RE: abt Multicore
Are all the documents in the same search space? That is, for a given
query, could any of the 10MM docs be returned?
If so, I
Hi All,
Although the HTMLStripStandardTokenizerFactory will remove HTML tags from
the indexed tokens, the raw HTML is still kept in the stored field and needs
to be stripped again when displaying search results. In my case the HTML
tags are not needed at all, so I created an HTMLStripTransformer for
the DIH to remove the HTML tags and save space in the index. I
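A minimal sketch of such a DIH transformer, under stated assumptions (Solr
1.3's org.apache.solr.analysis.HTMLStripReader; the column name
"description" is illustrative):

  import java.io.IOException;
  import java.io.StringReader;
  import java.util.Map;
  import org.apache.solr.analysis.HTMLStripReader;
  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  public class HtmlStripTransformer extends Transformer {
    public Object transformRow(Map<String, Object> row, Context context) {
      Object value = row.get("description");
      if (value instanceof String) {
        try {
          // HTMLStripReader filters markup out of the character stream.
          HTMLStripReader reader =
              new HTMLStripReader(new StringReader((String) value));
          StringBuilder sb = new StringBuilder();
          int ch;
          while ((ch = reader.read()) != -1) {
            sb.append((char) ch);
          }
          row.put("description", sb.toString());
        } catch (IOException e) {
          // Leave the row unchanged if stripping fails.
        }
      }
      return row;
    }
  }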
trouble is, you can also GET /solr/update, even all on the URL, no
request body...
http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
Solr is a bad RESTafarian.
Getting warmer!
Erik
On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote:
trouble is, you can also GET /solr/update, even all on the URL, no
request body...
http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
Solr is a
Don't know whether this would work... just speculating :-)
A. You'll need to create a new schema with the new field, or you could
use a dynamic field in your current schema (assuming you already configured
the default value of 0).
B. Add a couple of new documents
C. Run optimize script. Since optimize
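For the default itself, a hedged sketch of the schema.xml declaration (the
field name and type below are illustrative; note a schema default only
applies to documents indexed after the change, not retroactively):

  <field name="popularity" type="sint" indexed="true" stored="true" default="0"/>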
Ryan McKinley wrote:
On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote:
trouble is, you can also GET /solr/update, even all on the URL, no
request body...
Hi Alok,
I don't think it's a known issue and 2. a) sounds like the best and most
appreciated approach! :)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: Alok Dhir [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday,
Hello,
I am currently performing a query against a Solr index I've set up and I'm
trying to 1) sort on the score and 2) sort on created_date (a custom field
I've added). The sort command looks like: sort=score+desc,created_date+desc.
The gist of it is that I will 1) first return the most relevant
I find the URL is not the same as the others
--
regards
j.L
Nope, it is not possible as of now; the placeholders are not aware of
the core properties.
Is it possible to pass the values as request params? Request
parameters can be accessed.
You can raise an issue and we can address this separately.
On Mon, Nov 17, 2008 at 7:57 PM, [EMAIL PROTECTED]
On Tue, Nov 18, 2008 at 2:49 AM, Ahmed Hammad [EMAIL PROTECTED] wrote:
Hi All,
Although the HTMLStripStandardTokenizerFactory will remove HTML tags from
the indexed tokens, the raw HTML is still kept in the stored field and needs
to be stripped again when displaying search results. In my case the HTML
tags are not needed at all, so I created
A function query is the likely candidate - no such quantization
function exists, but it would be relatively easy to write one.
-Yonik
On Mon, Nov 17, 2008 at 8:17 PM, Derek Springer [EMAIL PROTECTED] wrote:
Hello,
I am currently performing a query to a Solr index I've set up and I'm trying
to
Some high level thoughts:
On Mon, Nov 17, 2008 at 11:10 PM, Nguyen, Joe [EMAIL PROTECTED]wrote:
Are all the documents in the same search space? That is, for a given
query, could any of the 10MM docs be returned?
If so, I don't think you need to worry about multicore. You may however
need
There may be one way to do this.
Add your property in the invariant section of solrconfig's DataImportHandler
element. For example, add this section:
<lst name="invariants">
  <str name="xmlDataDir">${xmlDataDir}</str>
</lst>
Then you can use it as ${dataimporter.request.xmlDataDir} in your
data-config
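A sketch of the data-config.xml side, under stated assumptions (the
FileListEntityProcessor usage and the file pattern are illustrative):

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="files" processor="FileListEntityProcessor"
              baseDir="${dataimporter.request.xmlDataDir}"
              fileName=".*\.xml" rootEntity="false" dataSource="null">
        <!-- a nested XPathEntityProcessor entity would parse each file -->
      </entity>
    </document>
  </dataConfig>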
If the user is using the new java Solr replication then he can get rid
of the /update and /update/csv handlers altogether. So the slaves are
completely read-only
--Noble
On Tue, Nov 18, 2008 at 2:14 AM, Sean Timm [EMAIL PROTECTED] wrote:
I believe the Solr replication scripts require POSTing a
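For reference, a sketch of the slave side of that java-based replication
(trunk/1.4 ReplicationHandler; the master URL and poll interval below are
illustrative):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>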
: Full ack. What do you think about the only solr related thing left, the
: parameter filtering/blocking (eg. rows1000). Is this suitable to do in a
: Filter delivered by solr? Of course as an optional alternative.
: As eric mentioned earlier, this could be done in a QueryComponent -- the
:
Thanks for the heads up. Can anyone point me to (or provide me with) an
example of writing a function query?
-Derek
On Mon, Nov 17, 2008 at 8:17 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
A function query is the likely candidate - no such quantization
function exists, but it would be
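A minimal sketch of what such a quantization function could look like, under
stated assumptions (the 1.3/trunk ValueSource/DocValues API from
org.apache.solr.search.function; the class name and rounding choice are
illustrative, and wiring it into the query syntax needs a parser hook on
trunk):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.solr.search.function.DocValues;
  import org.apache.solr.search.function.ValueSource;

  // quantize(x, step): rounds x down to the nearest multiple of step.
  public class QuantizeFloatFunction extends ValueSource {
    private final ValueSource source;
    private final float step;

    public QuantizeFloatFunction(ValueSource source, float step) {
      this.source = source;
      this.step = step;
    }

    public DocValues getValues(IndexReader reader) throws IOException {
      final DocValues vals = source.getValues(reader);
      return new DocValues() {
        public float floatVal(int doc) {
          return (float) Math.floor(vals.floatVal(doc) / step) * step;
        }
        public int intVal(int doc) { return (int) floatVal(doc); }
        public long longVal(int doc) { return (long) floatVal(doc); }
        public double doubleVal(int doc) { return floatVal(doc); }
        public String strVal(int doc) { return Float.toString(floatVal(doc)); }
        public String toString(int doc) { return description() + '=' + floatVal(doc); }
      };
    }

    public String description() {
      return "quantize(" + source.description() + "," + step + ")";
    }

    public boolean equals(Object o) {
      if (!(o instanceof QuantizeFloatFunction)) return false;
      QuantizeFloatFunction other = (QuantizeFloatFunction) o;
      return source.equals(other.source) && step == other.step;
    }

    public int hashCode() {
      return source.hashCode() + Float.floatToIntBits(step);
    }
  }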
Hello.
The data:
I have a dataset containing ~500,000 documents.
In each document there is an email address, a name and a user ID.
The problem:
I would like to be able to search in it, but it should work like the MySQL
LIKE operator.
So when a user enters the search term "carsten", the query looks like: