abt Multicore

2008-11-17 Thread Raghunandan Rao
Hi, I have an app running on WebLogic and Oracle. The Oracle DB is quite large, some 10 million records. I need to integrate Solr with it and I am planning to use multicore. How can the multicore feature best be used? -Raghu

Re: Build Solr to run SolrJS

2008-11-17 Thread JCodina
To give you more information. The error I get is this one: java.lang.NoClassDefFoundError: org/apache/solr/request/VelocityResponseWriter (wrong name: contrib/velocity/src/main/java/org/apache/solr/request/VelocityResponseWriter) at java.lang.ClassLoader.defineClass1(Native Method) at

Re: abt Multicore

2008-11-17 Thread Shalin Shekhar Mangar
On Mon, Nov 17, 2008 at 2:17 PM, Raghunandan Rao [EMAIL PROTECTED] wrote: I have an app running on weblogic and oracle. Oracle DB is quite huge; say some 10 millions of records. I need to integrate Solr for this and I am planning to use multicore. How can multicore feature can be at the

using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese
Hey there, I have posted before about my situation but I think my explanation was a bit confusing... I am using DataImportHandler and delta-import and it's working perfectly. I have also coded my own SqlEntityProcessor to delete expired rows from the index and the database. Now I need to do

Re: using deduplication with dataimporthandler

2008-11-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Any update processor can be used with DIH. First of all, register your dedupe update processor as you do now. You can either pass update.processor as a request parameter or keep it in the 'defaults' of the DataImportHandler: <str name="update.processor">dedupe</str> On Mon,

RE: solr 1.3 Modification field in schema.xml

2008-11-17 Thread sunnyfr
Hi Todd, Thanks for this answer. OK, but it's not just about showing it in the list or not: if a field is not shown but is boosted using qf, do I need to store it? For a language field which needs some special configuration like stemming ... thanks a lot for your clear answer, I believe (someone

Re: Need help with SolrIndexSearcher CoreContainer

2008-11-17 Thread Kraus, Ralf | pixelhouse GmbH
Hi, After 5-6 searches I run out of memory :-( Examples: String homeDir = "/var/lib/tomcat5.5/webapps/solr"; File configFile = new File( homeDir, "solr.xml" ); CoreContainer myCoreContainer = new CoreContainer( homeDir, configFile ); mySolrCore =

Re: solr 1.3 Modification field in schema.xml

2008-11-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Nov 13, 2008 at 10:43 PM, sunnyfr [EMAIL PROTECTED] wrote: Hi everybody, I don't really get when I have to reindex data and when I don't. I did a full import but I realised I stored too many fields which I don't need. So I have to change some indexed fields which are stored to not

Re: using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese
Thank you so much. I have it sorted. I am wondering now if there is any more stable way to use deduplication than adding to the solr source project this patch: https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel (SOLR-799.patch

Re: using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese
Marc Sturlese wrote: Thank you so much. I have it sorted. I am wondering now if there is any more stable way to use deduplication than adding to the solr source project this patch:

Re: using deduplication with dataimporthandler

2008-11-17 Thread Shalin Shekhar Mangar
On Mon, Nov 17, 2008 at 5:18 PM, Marc Sturlese [EMAIL PROTECTED] wrote: Thank you so much. I have it sorted. I am wondering now if there is any more stable way to use deduplication than adding to the solr source project this patch:

Re: Build Solr to run SolrJS

2008-11-17 Thread Erik Hatcher
On Nov 17, 2008, at 3:55 AM, JCodina wrote: java.lang.NoClassDefFoundError: org/apache/solr/request/VelocityResponseWriter (wrong name: ... [jar] Building jar: /home/joan/workspace/solr/contrib/dataimporthandler/target/apache-solr-dataimporthandler-1.4-dev.jar dist: ... [jar]

Re: Build Solr to run SolrJS

2008-11-17 Thread Erik Hatcher
On Nov 16, 2008, at 1:40 PM, Matthias Epheser wrote: Matthias and Ryan - let's get SolrJS integrated into contrib/velocity. Any objections/reservations? As SolrJS may be used without velocity at all (using eg. ClientSideWidgets), is it possible to put it into contrib/javascript and

Re: Solr security

2008-11-17 Thread Erik Hatcher
On Nov 16, 2008, at 6:12 PM, Ian Holsman wrote: famous last words and all, but you shouldn't be just passing what a user types directly into an application should you? LOL I'd be parsing out wildcards, boosts, and fuzzy searches (or at least thinking about the effects). I mean jakarta

Re: Solr security

2008-11-17 Thread Erik Hatcher
On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote: my assumption with solrjs is that you are hitting read-only solr servers that you don't mind if people query directly. Exactly the assumption I'm going with too. It would not be appropriate for something where you don't want people (who

Re: Solr security

2008-11-17 Thread Erik Hatcher
On Nov 16, 2008, at 6:27 PM, Ryan McKinley wrote: I'd be parsing out wildcards, boosts, and fuzzy searches (or at least thinking about the effects). I mean jakarta apache~1000 or roam~0.1 aren't as efficient as a regular query. Even if you leave the solr instance public, you can still

Re: Solr security

2008-11-17 Thread Erik Hatcher
On Nov 16, 2008, at 6:55 PM, Walter Underwood wrote: Limiting the maximum number of rows doesn't work, because they can request rows 2-20100. --wunder But you could limit how many rows could be returned in a single request... that'd close off one DoS mechanism. Erik

Re: Solr security

2008-11-17 Thread Yonik Seeley
On Mon, Nov 17, 2008 at 8:54 AM, Erik Hatcher [EMAIL PROTECTED] wrote: Sounds like the perfect case for a query parser plugin... or use dismax as Ryan mentioned. Shouldn't Solr be hardened for these cases anyway? Or at least hardenable. Say you do filtering by user - how would you enforce

Re: abt Multicore

2008-11-17 Thread Ryan McKinley
Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned? If so, I don't think you need to worry about multicore. You may however need to put part of the index on various machines: http://wiki.apache.org/solr/DistributedSearch

Using properties from core configuration in data-config.xml

2008-11-17 Thread gistolero
Hello, is it possible to use properties from core configuration in data-config.xml? I want to define the baseDir for DataImportHandler. I tried the following configuration: *** solr.xml *** <solr persistent="false"> <cores adminPath="null"> <core name="core0" instanceDir="/opt/solr/cores/core0"

Re: Solr security

2008-11-17 Thread Erik Hatcher
On Nov 17, 2008, at 9:07 AM, Yonik Seeley wrote: On Mon, Nov 17, 2008 at 8:54 AM, Erik Hatcher [EMAIL PROTECTED] wrote: Sounds like the perfect case for a query parser plugin... or use dismax as Ryan mentioned. Shouldn't Solr be hardened for these cases anyway? Or at least hardenable.

Re: Solr security

2008-11-17 Thread Matthias Epheser
Erik Hatcher schrieb: On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote: my assumption with solrjs is that you are hitting read-only solr servers that you don't mind if people query directly. Exactly the assumption I'm going with too. It would not be appropriate for something where you

Re: Solr security

2008-11-17 Thread Walter Underwood
Limiting the number of rows only handles one attack. The one I mentioned, fetching one page deep in the result set, caused a big issue on prod at our site. We needed to limit the max for start as well as rows. It is possible to make it safe, but a lot of work. We did this for Ultraseek. I would
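
The point about capping start as well as rows can be sketched as a small helper that a front-end tier might call before forwarding a request to Solr. This is only an illustration: the class name PagingLimits and the limits 1000 and 100 are assumptions, not values from the thread or from Solr itself.

```java
/** Hypothetical helper: clamps client-supplied paging parameters before they reach Solr. */
final class PagingLimits {
    // Assumed limits for illustration only; tune them for the actual deployment.
    static final int MAX_START = 1000;
    static final int MAX_ROWS = 100;

    private PagingLimits() {}

    /** Returns start forced into the range [0, MAX_START]. */
    static int clampStart(int start) {
        return Math.max(0, Math.min(start, MAX_START));
    }

    /** Returns rows forced into the range [0, MAX_ROWS]. */
    static int clampRows(int rows) {
        return Math.max(0, Math.min(rows, MAX_ROWS));
    }

    public static void main(String[] args) {
        // A deep-paging request such as start=20000 is cut back to the cap.
        System.out.println("start=" + clampStart(20000) + "&rows=" + clampRows(100));
    }
}
```

Clamping both values bounds how far into the result set any single request can reach, which is the attack described above; rejecting the request outright instead of clamping would be an equally reasonable choice.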

Re: Solr security

2008-11-17 Thread Erik Hatcher
On Nov 17, 2008, at 10:22 AM, Walter Underwood wrote: It is possible to make it safe, but a lot of work. We did this for Ultraseek. I would always, always front it with Apache, to get some of Apache's protection. What protections specifically are you speaking of with Apache in front?

Solr build with Rich text document plugin added?

2008-11-17 Thread Rav Bhagdev

Solr build with Rich Document (Doc/PDF etc) plugin already added?

2008-11-17 Thread Rav Bhagdev

Advice for indexing page numbers

2008-11-17 Thread Ian Connor
How would you best deal with a page field in solr? Possible ranges are numbers (1 to 1000s) but also could include appendix page that include roman and alphabet characters (i, ii, iii, iv, as well as a, b, c, etc). It makes sense people would want to search for things between page 1 to 5 but I

Re: Solr security

2008-11-17 Thread Ryan McKinley
Say you do filtering by user - how would you enforce that the client (if it's a browser) only send in the proper filter? Ryan already mentioned his technique... and here's how I'd do it similarly... Write a custom servlet Filter that grokked roles/authentication (this piece you'd need

Re: Solr security

2008-11-17 Thread Mark Miller
Ryan McKinley wrote: solr.jar on the other hand lets you package what you want around search features to build a setup for your needs. Java already has so many options for how to secure / authenticate that you can just plug them into your own app. (if that is appropriate). In the past I

Re: Build Solr to run SolrJS

2008-11-17 Thread Matthias Epheser
Erik Hatcher schrieb: However, it isn't currently suitable for wiring to SolrJS - Matthias and I will have to resolve that. Just noticed that VelocityResponseWriter in trunk is much reduced compared to my last patch from 2008-07-25. Moving the templates into a jar shouldn't be a problem. Setting the

Re: Solr security

2008-11-17 Thread Matthias Epheser
Ryan McKinley schrieb: however I have found that in any site where stability/load and uptime are a serious concern, this is better handled in a tier in front of java -- typically the loadbalancer / haproxy / whatever -- and managed by people more cautious than me. Full ack. What do you think

RE: Solr security

2008-11-17 Thread Feak, Todd
I see value in this in the form of protecting the client from itself. For example, our Solr isn't accessible from the Internet. It's all behind firewalls. But, the client applications can make programming mistakes. I would love the ability to lock them down to a certain number of rows, just in

sole 1.3: bug in phps response writer

2008-11-17 Thread Alok Dhir
Distributed queries: curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=php' curl 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/

Re: Build Solr to run SolrJS

2008-11-17 Thread Erik Hatcher
On Nov 17, 2008, at 11:45 AM, Matthias Epheser wrote: Just noticed that VelocityResponseWriter in trunk is much reduced compared to my last patch from 2008-07-25. Right, that was intentional for my own simplicity's sake... The crucial difference is the missing translation into a solrj response by

Re: Solr security

2008-11-17 Thread Ryan McKinley
On Nov 17, 2008, at 12:06 PM, Matthias Epheser wrote: Ryan McKinley schrieb: however I have found that in any site where stability/load and uptime are a serious concern, this is better handled in a tier in front of java -- typically the loadbalancer / haproxy / whatever -- and managed by

Re: Build Solr to run SolrJS

2008-11-17 Thread Matthias Epheser
Erik Hatcher schrieb: On Nov 17, 2008, at 11:45 AM, Matthias Epheser wrote: Just noticed that VelocityResponseWriter in trunk is much reduced compared to my last patch from 2008-07-25. Right, that was intentional for my own simplicity's sake... The crucial difference is the missing translation into

Re: Build Solr to run SolrJS

2008-11-17 Thread Erik Hatcher
Can you elaborate on the use case for why you need the raw response like that? I vaguely get it, but want to really understand the need here. I'm wary of the EmbeddedSolrServer usage in there, as I want to distill the VrW stuff to be able to use SolrJ's API rather than assume embedded

Re: Build Solr to run SolrJS

2008-11-17 Thread Ryan McKinley
On Nov 17, 2008, at 1:35 PM, Erik Hatcher wrote: Can you elaborate on the use case for why you need the raw response like that? I vaguely get it, but want to really understand the need here. I'm wary of the EmbeddedSolrServer usage in there, as I want to distill the VrW stuff to be able

Re: Build Solr to run SolrJS

2008-11-17 Thread Matthias Epheser
Ryan McKinley schrieb: On Nov 17, 2008, at 1:35 PM, Erik Hatcher wrote: Can you elaborate on the use case for why you need the raw response like that? I vaguely get it, but want to really understand the need here. I'm wary of the EmbeddedSolrServer usage in there, as I want to distill

Fwd: Software Announcement: LuSql: Database to Lucene indexing

2008-11-17 Thread Matthew Runo
Hello - I wanted to forward this on, since I thought that people here might be able to use this to build indexes. So long as the lucene version in LuSQL matches the version in Solr, it would work fine for indexing - yea? Thanks for your time! Matthew Runo Software Engineer, Zappos.com

Re: Build Solr to run SolrJS

2008-11-17 Thread Erik Hatcher
On Nov 17, 2008, at 2:11 PM, Matthias Epheser wrote: After we add the SolrQueryResponse to the templates first, we realized that some convenience methods for iterating the result docs, accessing facets etc. would be fine. The idea was to reuse the existing wrappers (eg. QueryResponse). It

Re: Build Solr to run SolrJS

2008-11-17 Thread Ryan McKinley
On Nov 17, 2008, at 2:59 PM, Erik Hatcher wrote: On Nov 17, 2008, at 2:11 PM, Matthias Epheser wrote: After we add the SolrQueryResponse to the templates first, we realized that some convenience methods for iterating the result docs, accessing facets etc. would be fine. The idea was to

Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-17 Thread Erik Hatcher
Yeah, it'd work, though not only does the version of Lucene need to match, but the field indexing/storage attributes need to jibe as well - and that is the trickier part of the equation. But yeah, LuSQL looks slick! Erik On Nov 17, 2008, at 2:17 PM, Matthew Runo wrote: Hello -

Re: Build Solr to run SolrJS

2008-11-17 Thread Matthias Epheser
Erik Hatcher schrieb: On Nov 17, 2008, at 2:11 PM, Matthias Epheser wrote: After we add the SolrQueryResponse to the templates first, we realized that some convenience methods for iterating the result docs, accessing facets etc. would be fine. The idea was to reuse the existing wrappers

Re: Solr security

2008-11-17 Thread Ian Holsman
There was a patch by Sean Timm you should investigate as well. It limited a query so it would take a maximum of X seconds to execute, and would just return the rows it had found in that time. Feak, Todd wrote: I see value in this in the form of protecting the client from itself. For

Re: Solr security

2008-11-17 Thread Sean Timm
http://issues.apache.org/jira/browse/SOLR-527 (An XML commit only request handler) is pertinent to this discussion as well. -Sean Ian Holsman wrote: There was a patch by Sean Timm you should investigate as well. It limited a query so it would take a maximum of X seconds to execute, and

RE: Solr security

2008-11-17 Thread Lance Norskog
About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return values, and should never change the state of the data. All changes to the data should be made with POST. (In REST style guidelines, PUT, POST, and DELETE.) This prevents you from passing

Re: Solr security

2008-11-17 Thread Sean Timm
I believe the Solr replication scripts require POSTing a commit to read in the new index--so at least limited POST capability is required in most scenarios. -Sean Lance Norskog wrote: About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return

Updating schema.xml without deleting index?

2008-11-17 Thread Jeff Lerman
I've tried searching for this answer all over but have found no results thus far. I am trying to add a new field to my schema.xml with a default value of 0. I have a ton of data indexed right now and it would be very hard to retrieve all of the original sources to rebuild my index. So my

Re: Solr security

2008-11-17 Thread Ian Holsman
If that's the case, putting Apache in front of it would be handy. Something like <Limit POST> Order deny,allow Deny from all Allow from 192.168.0.1 </Limit> might be helpful. Sean Timm wrote: I believe the Solr replication scripts require POSTing a commit to read in the new index--so at least

Re: Build Solr to run SolrJS

2008-11-17 Thread Matthias Epheser
Erik Hatcher schrieb: On Nov 16, 2008, at 1:40 PM, Matthias Epheser wrote: Matthias and Ryan - let's get SolrJS integrated into contrib/velocity. Any objections/reservations? As SolrJS may be used without velocity at all (using eg. ClientSideWidgets), is it possible to put it into

RE: abt Multicore

2008-11-17 Thread Nguyen, Joe
Any suggestions? -Original Message- From: Nguyen, Joe Sent: Monday, November 17, 2008 9:40 Joe To: 'solr-user@lucene.apache.org' Subject: RE: abt Multicore Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned? If so, I

Re: Regex Transformer Error

2008-11-17 Thread Ahmed Hammad
Hi All, Although the HTMLStripStandardTokenizerFactory will remove HTML tags, they will still be stored in the index and need to be removed while searching. In my case the HTML tags are not needed at all. So I created HTMLStripTransformer for the DIH to remove the HTML tags and save space on the index. I

Re: Solr security

2008-11-17 Thread Erik Hatcher
trouble is, you can also GET /solr/update, even all on the URL, no request body... http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true Solr is a bad RESTafarian. Getting warmer! Erik
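
Because an update can arrive on any HTTP method, a guard in front of Solr would have to look at the request path rather than only at the method. A minimal sketch of such a check, written as a plain Java predicate that a front-end filter could call; the class name UpdateGuard and the trusted address are hypothetical and used for illustration only.

```java
import java.util.Set;

/** Hypothetical guard: allows update paths only for trusted client addresses. */
final class UpdateGuard {
    // Assumed allow-list for illustration only.
    private static final Set<String> TRUSTED = Set.of("192.168.0.1");

    private UpdateGuard() {}

    /**
     * path is the request path without the query string, e.g. "/solr/update".
     * The HTTP method is deliberately ignored, since GET can modify the index too.
     */
    static boolean isAllowed(String path, String remoteAddr) {
        boolean isUpdate = path.endsWith("/update") || path.contains("/update/");
        return !isUpdate || TRUSTED.contains(remoteAddr);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("/solr/select", "10.0.0.5"));
        System.out.println(isAllowed("/solr/update", "10.0.0.5"));
    }
}
```

Matching on the path segment also covers per-core handlers such as /solr/core0/update and sub-handlers such as /solr/update/csv.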

Re: Solr security

2008-11-17 Thread Ryan McKinley
On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote: trouble is, you can also GET /solr/update, even all on the URL, no request body... http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ESTREAMED%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true Solr is a

RE: Updating schema.xml without deleting index?

2008-11-17 Thread Nguyen, Joe
Don't know whether this would work... Just speculating :-) A. You'll need to create a new schema with the new field, or you could use a dynamic field in your current schema (assuming you have already configured the default value as 0). B. Add a couple of new documents. C. Run the optimize script. Since optimize

Re: Solr security

2008-11-17 Thread Ian Holsman
Ryan McKinley wrote: On Nov 17, 2008, at 4:20 PM, Erik Hatcher wrote: trouble is, you can also GET /solr/update, even all on the URL, no request body...

Re: sole 1.3: bug in phps response writer

2008-11-17 Thread Otis Gospodnetic
Hi Alok, I don't think it's a known issue and 2. a) sounds like the best and most appreciated approach! :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch From: Alok Dhir [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Monday,

Query Response Doc Score - Int Value

2008-11-17 Thread Derek Springer
Hello, I am currently performing a query to a Solr index I've set up and I'm trying to 1) sort on the score and 2) sort on the date_created (a custom field I've added). The sort command looks like: sort=score+desc,created_date+desc. The gist of it is that I will 1) first return the most relevant

Re: sole 1.3: bug in phps response writer

2008-11-17 Thread James liu
I find the URL is not the same as the others -- regards j.L

Re: Using properties from core configuration in data-config.xml

2008-11-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Nope. It is not possible as of now. The placeholders are not aware of the core properties. Is it possible to pass the values as request params? Request parameters can be accessed. You can raise an issue and we can address this separately. On Mon, Nov 17, 2008 at 7:57 PM, [EMAIL PROTECTED]

Re: Regex Transformer Error

2008-11-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Nov 18, 2008 at 2:49 AM, Ahmed Hammad [EMAIL PROTECTED] wrote: Hi All, Although the HTMLStripStandardTokenizerFactory will remove HTML tags, it will be stored in the index and needed to be removed while searching. In my case the HTML tags has no need at all. So I created

Re: Query Response Doc Score - Int Value

2008-11-17 Thread Yonik Seeley
A function query is the likely candidate - no such quantization function exists, but it would be relatively easy to write one. -Yonik On Mon, Nov 17, 2008 at 8:17 PM, Derek Springer [EMAIL PROTECTED] wrote: Hello, I am currently performing a query to a Solr index I've set up and I'm trying to
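
The quantization itself, independent of Solr's function query API, can be sketched in plain Java. This only illustrates the arithmetic; it is not a Solr function query, and the class and method names are hypothetical.

```java
/** Hypothetical illustration of score quantization; not part of Solr. */
final class ScoreQuantizer {
    private ScoreQuantizer() {}

    /** Rounds score down to the nearest multiple of step, so near-equal scores tie. */
    static float quantize(float score, float step) {
        if (step <= 0f) {
            throw new IllegalArgumentException("step must be positive");
        }
        return (float) (Math.floor(score / step) * step);
    }

    public static void main(String[] args) {
        // With step 1.0, both scores fall in the same bucket,
        // so a secondary sort (for example on a date field) decides their order.
        System.out.println(quantize(2.7f, 1.0f) == quantize(2.1f, 1.0f));
    }
}
```

With a step of 1.0 the result is the integer part of the score, which matches the "Int Value" in the thread subject; a smaller step keeps more of the relevance ordering.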

Re: abt Multicore

2008-11-17 Thread Shalin Shekhar Mangar
Some high level thoughts: On Mon, Nov 17, 2008 at 11:10 PM, Nguyen, Joe [EMAIL PROTECTED] wrote: Are all the documents in the same search space? That is, for a given query, could any of the 10MM docs be returned? If so, I don't think you need to worry about multicore. You may however need

Re: Using properties from core configuration in data-config.xml

2008-11-17 Thread Shalin Shekhar Mangar
There may be one way to do this. Add your property in the invariants section of solrconfig's DataImportHandler element. For example, add this section: <lst name="invariants"> <str name="xmlDataDir">${xmlDataDir}</str> </lst> Then you can use it as ${dataimporter.request.xmlDataDir} in your data-config

Re: Solr security

2008-11-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
If the user is using the new java Solr replication then he can get rid of the /update and /update/csv handlers altogether. So the slaves are completely read-only --Noble On Tue, Nov 18, 2008 at 2:14 AM, Sean Timm [EMAIL PROTECTED] wrote: I believe the Solr replication scripts require POSTing a

Re: Solr security

2008-11-17 Thread Chris Hostetter
: Full ack. What do you think about the only solr related thing left, the : paramter filtering/blocking (eg. rows1000). Is this suitable to do it in a : Filter delivered by solr? Of course as an optional alternative. : As eric mentioned earlier, this could be done in a QueryComponent -- the :

Re: Query Response Doc Score - Int Value

2008-11-17 Thread Derek Springer
Thanks for the heads up. Can anyone point me to (or provide me with) an example of writing a function query? -Derek On Mon, Nov 17, 2008 at 8:17 PM, Yonik Seeley [EMAIL PROTECTED] wrote: A function query is the likely candidate - no such quantization function exists, but it would be

Use SOLR like the MySQL LIKE

2008-11-17 Thread Carsten L
Hello. The data: I have a dataset containing ~500.000 documents. In each document there is an email, a name and a user ID. The problem: I would like to be able to search in it, but it should work like the MySQL LIKE. So when a user enters the search term: carsten, then the query looks like: