change solr url

2011-11-01 Thread Ankita Patil
Hey, is it possible to change the URL for the Solr admin? What I want is http://192.168.0.89:8983/solr/private/coreName/admin, i.e. I want to add /private/ before the coreName. Is that possible? If yes, how? Ankita.

Re: MultiValued fields and Facets...

2011-11-01 Thread Tiernan OToole
I have figured out what was wrong... The field Warehouse was not marked as indexed... It was being stored, but not indexed... It is now working as expected. Thanks. --Tiernan On Wed, Oct 26, 2011 at 1:01 PM, Tiernan OToole lsmart...@gmail.com wrote: Ok, so now i am getting something back,

Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread arshad ansari
Hi, is a feature like the SQL LIKE operator available in Apache Solr, just as we have it in SQL? SQL example below: *Select * from Employee where employee_name like '%Solr%'* If not, is it a bug in Solr? If this feature is available, please point me to some examples. Thanks! -- Best Regards, Arshad

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread François Schiettecatte
Arshad, actually it is available; you need to use the ReversedWildcardFilterFactory, which I am sure you can Google for. Solr and SQL address different problem sets with some overlap, but there are significant differences between the two technologies. Actually '%Solr%' is a worst case for SQL

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Michael Kuhlmann
Hi, this is not exactly true. In Solr, you can't have the wildcard operator on both sides of the term. However, you can tokenize your fields and simply query for Solr. This is what Solr is made for. :) -Kuli On 01.11.2011 13:24, François Schiettecatte wrote: Arshad Actually it is

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread François Schiettecatte
Kuli Good point about just tokenizing the fields :) I ran a couple of tests to double-check my understanding and you can have a wildcard operator at either or both ends of a term. Adding ReversedWildcardFilterFactory to your field analyzer will make leading wildcard searches a lot faster of
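
The speed-up from ReversedWildcardFilterFactory mentioned above can be illustrated with a toy sketch. This is not Solr code: the sorted list standing in for a term dictionary and the helper names `build_reversed_index` and `leading_wildcard` are assumptions made purely for illustration. The idea is that if every token is also indexed reversed, a leading-wildcard query such as *solr becomes a cheap prefix lookup on rlos*.

```python
import bisect


def build_reversed_index(tokens):
    """Return a sorted list of reversed tokens (toy stand-in for a term dictionary)."""
    return sorted(token[::-1] for token in set(tokens))


def leading_wildcard(reversed_index, suffix):
    """Answer the query '*suffix' with a prefix scan over the reversed terms."""
    prefix = suffix[::-1]
    start = bisect.bisect_left(reversed_index, prefix)
    matches = []
    for term in reversed_index[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term[::-1])  # un-reverse to recover the original token
    return sorted(matches)


index = build_reversed_index(["solr", "lucene", "mysolr", "solrj"])
print(leading_wildcard(index, "solr"))
```

The scan touches only the terms that share the reversed prefix instead of walking the whole dictionary, which is roughly why leading wildcards get faster at the cost of extra indexed terms.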

Using Solr components for dictionary matching?

2011-11-01 Thread Nagendra Mishr
Hi all, Is there a good guide on using Solr components as a dictionary matcher? I need to do some pre-processing that involves lots of dictionary lookups and it doesn't seem right to query Solr for each instance. Thanks in advance, Nagendra

Re: Replicating Large Indexes

2011-11-01 Thread Erick Erickson
Yes, that's expected behavior. When you optimize, all segments are copied over to new segment(s). Since all changed/new segments are replicated to the slave, you'll (temporarily) have twice the data on your disk. You can stop optimizing; it's often not really very useful despite its name. That

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Erick Erickson
NGrams are often used in Solr for this case, but they will also add to your index size. It might be worthwhile to look closely at your user requirements before going ahead and supporting this functionality Best Erick 2011/11/1 François Schiettecatte fschietteca...@gmail.com: Kuli Good

Re: Replicating Large Indexes

2011-11-01 Thread Robert Stewart
Optimization merges the index into a single segment (one huge file), so the entire index will be copied on replication. So you really do need 2x disk in some cases. Do you really need to optimize? We have a pretty big total index (about 200 million docs) and we never optimize. But we do have a

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Michael Kuhlmann
On 01.11.2011 16:06, Erick Erickson wrote: NGrams are often used in Solr for this case, but they will also add to your index size. It might be worthwhile to look closely at your user requirements before going ahead and supporting this functionality Best Erick My opinion. Wildcards are

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Memory Makers
Erick, could you elaborate on NGrams? I haven't seen that before. Thanks. On Tue, Nov 1, 2011 at 11:06 AM, Erick Erickson erickerick...@gmail.com wrote: NGrams are often used in Solr for this case, but they will also add to your index size. It might be worthwhile to look closely at

simple persistance layer on top of Solr

2011-11-01 Thread Memory Makers
Greetings guys, I have been thinking of using Solr as a simple database due to its blinding speed -- actually I've used that approach in some projects with decent success. Any thoughts on that? Thanks, MM.

Re: Solr Profiling

2011-11-01 Thread Andre Parodi
I guess it could be many things. Typically an easy one to spot is if you have insufficient heap (i.e. your 16GB) and the JVM is doing full GCs constantly, not freeing up any memory and using lots of CPU. This would make Solr slow and cause it to hang during potentially long GC pauses. add:

Re: simple persistance layer on top of Solr

2011-11-01 Thread Walter Underwood
Other than it isn't a database? If you want a key/value store, use one of those. If you want a full DB with transactions, use one of those. wunder On Nov 1, 2011, at 8:47 AM, Memory Makers wrote: Greetings guys, I have been thinking of using Solr as a simple database due to its blinding

Re: simple persistance layer on top of Solr

2011-11-01 Thread Memory Makers
Well, I want something beyond a key/value store. I want to be able to free-text search documents. I want to be able to retrieve documents based on other criteria. I'm not sure how that would compare with something like MongoDB. Thanks. On Tue, Nov 1, 2011 at 11:49 AM, Walter Underwood

Re: simple persistance layer on top of Solr

2011-11-01 Thread Robert Stewart
It is not a horrible idea. Lucene has a pretty reliable index now (it should not get corrupted). And you can do backups with replication. If you need ranked results (sort by relevance), and lots of free-text queries then using it makes sense. If you just need boolean search and maybe some

Re: simple persistance layer on top of Solr

2011-11-01 Thread Robert Stewart
One other potentially huge consideration is how updatable you need documents to be. Lucene can only replace existing documents; it cannot modify existing documents directly (so an update is essentially a delete followed by an insert of a new document with the same primary key). There are
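
The "update is a delete followed by an insert" behaviour described above can be sketched with a toy in-memory index. Nothing here is Lucene or Solr API: the `ToyIndex` class, its methods, and the `id` key field are hypothetical names chosen for illustration. The point is that changing one field means re-submitting the whole document, so the caller needs access to every original field value.

```python
class ToyIndex:
    """Toy index keyed on a unique 'id' field; documents are replaced, never patched."""

    def __init__(self):
        self.docs = {}
        self.deletes = 0  # counts documents dropped as part of a replace

    def add(self, doc):
        """Insert a document; an existing document with the same id is deleted first."""
        key = doc["id"]
        if key in self.docs:
            del self.docs[key]
            self.deletes += 1
        self.docs[key] = dict(doc)

    def update_field(self, doc_id, field, value):
        """Simulate a field update: rebuild the full document, then replace it."""
        full_doc = dict(self.docs[doc_id])  # requires all original fields to be available
        full_doc[field] = value
        self.add(full_doc)


index = ToyIndex()
index.add({"id": "1", "title": "Solr", "views": 1})
index.update_field("1", "views", 2)
print(index.docs["1"], index.deletes)
```

If some fields are indexed but not stored, the rebuild step has nothing to read them from, which is the practical limitation to weigh when using the index as the only copy of the data.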

Re: Replicating Large Indexes

2011-11-01 Thread Jason Biggin
Thanks Robert. We optimize less frequently than we used to. Down to twice a month from once a day. Without optimizing the search speed stays the same, however the index size increases to 70+ GB. Perhaps there is a different way to restrict disk usage. Thanks, Jason Robert Stewart

Questions about Solr's security

2011-11-01 Thread Alireza Salimi
Hi, I was wondering if it's a good idea to expose Solr to the outside world, so that our clients running on smart phones will be able to use Solr. If we decide to do this, what are the security concerns? For example, someone suggested we should limit the number of rows requested in order

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread Erick Erickson
Start here: http://lucene.apache.org/solr/api/org/apache/solr/analysis/NGramFilterFactory.html But the idea is that you define a field with the NGramFilterFactory and it indexes mysolrstuff as separate tokens (bigrams here): my ys so ol lr rs st tu uf ff. This supports the %solr% idea if
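
The bigram tokens listed above can be reproduced with a few lines of Python. This is only a sketch of the tokenization idea, not of NGramFilterFactory itself; the function name `char_ngrams` is made up for illustration.

```python
def char_ngrams(text, n=2):
    """Return every contiguous substring of length n, in order of appearance."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


tokens = char_ngrams("mysolrstuff")
print(" ".join(tokens))

# The query term is split the same way, so its grams can be looked up directly.
query_grams = char_ngrams("solr")
print(all(gram in tokens for gram in query_grams))
```

Every extra gram is an extra indexed token, which is where the index-size cost mentioned earlier in the thread comes from.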

Re: Replicating Large Indexes

2011-11-01 Thread Robert Stewart
Do you do a lot of deletes (or 'updates' of existing documents)? Do you store lots of large fields? Maybe you can use compressed fields in that case (we never have tried it so I cannot confirm how well it works or performs). You can also turn off things like norms and vectors, etc. if you

Re: simple persistance layer on top of Solr

2011-11-01 Thread Memory Makers
Well, I've done a lot of work with MySQL and content management systems -- and frankly whenever I have to integrate with Solr or do some Lucene work I am amazed at the speed -- even when I index web pages for search -- MySQL pales by comparison when data sets get large (2 million rows) Thanks,

Re: Questions about Solr's security

2011-11-01 Thread Robert Stewart
You would need to set up request handlers in solrconfig.xml to limit what types of queries people can send to SOLR (and define things like max page size, etc). You need to restrict people from sending update/delete commands as well. Then at a minimum, set up some proxy in front of SOLR that

RE: Replicating Large Indexes

2011-11-01 Thread Jason Biggin
Thanks Erick, Will take a look at this article. Cheers, Jason -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, November 01, 2011 8:05 AM To: solr-user@lucene.apache.org Subject: Re: Replicating Large Indexes Yes, that's expected behavior. When

Re: simple persistance layer on top of Solr

2011-11-01 Thread Mikhail Garber
This is a very good idea and I have used it several times over the years with great success. As long as you understand the limitations (global transactions, not being able to update records, ...) On Tue, Nov 1, 2011 at 8:47 AM, Memory Makers memmakers...@gmail.com wrote: Greetings guys, I have been

Multivalued fields question

2011-11-01 Thread Travis Low
Greetings. We're finally kicking off our little Solr project. We're indexing a paltry 25,000 records but each has MANY documents attached, so we're using Tika to parse those documents into a big long string, which we use in a call to solrj.addField(relateddoccontents,

Re: Can't find resource 'solrconfig.xml'

2011-11-01 Thread Chris Hostetter
Rather than mucking with system properties, I find that using JNDI is the easiest and cleanest way to configure solr home with tomcat. https://wiki.apache.org/solr/SolrTomcat#Configuring_Solr_Home_with_JNDI ...those instructions are fairly simple, and will work on both windows and linux (just

index enum

2011-11-01 Thread Radha Krishna Reddy
Hi, I have 2 issues. 1. I have an enum column in my SQL table. I want to index that column. Which fieldtype should I specify in the schema.xml for an enum? 2. Normally we can index one column in a table using the column header as entity name and the column data as value of the entity. Can I index 2

Re: Find Documents with field = maxValue

2011-11-01 Thread Chris Hostetter
: What I'm looking for is to do everything in single shot in Solr. : I'm not even sure if it's possible or not. : Finding the max value and then running another query is NOT my ideal : solution. stats component to determine the max value, and a second query to search for docs containing that
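
The two-step approach mentioned above (ask the stats component for the max, then query for documents with that value) can be sketched as two request URLs. The base URL and the field name `price` are placeholders, and the second query assumes a plain `field:value` match is what is wanted; only the standard `stats`, `stats.field`, `rows`, and `wt` parameters are relied on.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # hypothetical base URL


def max_value_url(field):
    """First request: no documents, just statistics (including max) for the field."""
    params = {"q": "*:*", "rows": 0, "stats": "true", "stats.field": field, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


def docs_with_value_url(field, value):
    """Second request: fetch the documents whose field equals the max found above."""
    params = {"q": f"{field}:{value}", "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


first = max_value_url("price")
second = docs_with_value_url("price", 42)  # 42 stands in for the max read from the first response
print(first)
print(second)
```

In the JSON response to the first request, the maximum would be read from the stats section for the requested field before building the second URL.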

Re: Questions about Solr's security

2011-11-01 Thread Alireza Salimi
Thanks Robert, But do you also think limiting the page size inside a request handler is a good defense against attackers? Honestly, I'm not sure it is a good solution; that doesn't save a server from attackers at all. Do you agree with me? We are not security experts, just developers, but any

Re: Questions about Solr's security

2011-11-01 Thread Chris Hostetter
: I was wondering if it's a good idea to expose Solr to the outside world, : so that our clients running on smart phones will be able to use Solr. As a general rule of thumb, I would say that it is not a good idea to expose Solr directly to the public internet. There are exceptions to this

Re: Limit by score? sort by other field

2011-11-01 Thread Chris Hostetter
: Sounds like a custom sorting collector would work - one that throws away : docs with less than some minimum score, so that it only collects/sorts Did you look at the example query Karsten mentioned (and also discussed in the linked thread)? There is no need for a custom collector to do this,

Re: Selective Result Grouping

2011-11-01 Thread entdeveloper
Martijn v Groningen-2 wrote: When using the group.field option values must be the same otherwise they don't get grouped together. Maybe fuzzy grouping would be nice. Grouping videos and images based on mimetype should be easy, right? Videos have a mimetype that starts with video/ and images

Re: Questions about Solr's security

2011-11-01 Thread Alireza Salimi
What if we just expose '/select' paths - by firewalls and load balancers - and also use SSL and HTTP basic or digest access control? On Tue, Nov 1, 2011 at 2:20 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I was wondering if it's a good idea to expose Solr to the outside world, : so

Re: Questions about Solr's security

2011-11-01 Thread Erik Hatcher
Be aware that even /select could have some harmful effects, see https://issues.apache.org/jira/browse/SOLR-2854 (addressed on trunk). Even disregarding that issue, /select is a potential gateway to any request handler defined via /select?qt=/req_handler Again, in general it's not a good idea

Re: Questions about Solr's security

2011-11-01 Thread Walter Underwood
I once had to deal with a severe performance problem caused by a bot that was requesting results starting at 5000. We disallowed requests over a certain number of pages in the front end to fix it. wunder On Nov 1, 2011, at 12:57 PM, Erik Hatcher wrote: Be aware that even /select could have

Re: Questions about Solr's security

2011-11-01 Thread Alireza Salimi
I'm not sure if anybody has asked these questions before or not. Sorry if they are duplicates. The problem is that the clients (smart phones) of our Solr machines are outside the network in which the Solr machines are located. So, we need to somehow expose their service to the outside world. What's

Re: LocalParams, bq, and highlighting

2011-11-01 Thread Demian Katz
This is definitely an interesting case that i don't think anyone ever really considered before. It seems like a strong argument in favor of adding an hl.q param that the HighlightingComponent would use as an override for whatever the QueryComponent thinks the highlighting query should be,

Re: Questions about Solr's security

2011-11-01 Thread Alireza Salimi
Sorry, I didn't explain that part. We are the developers of the client code too, meaning that only we know the credentials to access the web container, and we won't run such queries. Right now, I'm writing a subclass of SearchHandler which changes the SolrParams to remove 'qt' parameter and limit
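
The kind of parameter clean-up discussed in this thread (dropping `qt`, capping page size, refusing very deep paging) can be sketched as a small function that a front end or proxy might apply before forwarding a request. This is not the SearchHandler subclass mentioned above; the function names and both limits are assumptions chosen for illustration.

```python
MAX_ROWS = 50      # hypothetical cap on page size
MAX_START = 1000   # hypothetical cap on paging depth


def _to_int(value, default):
    """Parse an integer parameter, falling back to a default on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def sanitize_params(params):
    """Return a copy of the request parameters that is safer to forward to Solr."""
    clean = {key: value for key, value in params.items() if key != "qt"}
    clean["rows"] = max(0, min(_to_int(clean.get("rows"), 10), MAX_ROWS))
    clean["start"] = max(0, min(_to_int(clean.get("start"), 0), MAX_START))
    return clean


request = {"q": "solr", "qt": "/update", "rows": "100000", "start": "5000"}
print(sanitize_params(request))
```

A whitelist of allowed parameter names would be stricter than removing only `qt`; the sketch keeps to the two measures named in the thread.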

Re: Usage of Double quotes for single terms (camelcase) while querying

2011-11-01 Thread Chris Hostetter
: Subject: Usage of Double quotes for single terms (camelcase) while querying : References: a5f2f6ef-d601-432f-a49f-4ec23578d...@mac.com : camjgjxrykbdxckk4yutpcz8e-8bf+v4qrbsn_yc+b6hvfwg...@mail.gmail.com : 6640582f-568a-4402-8ce7-bb6d8c9fc...@mac.com :

Re: Questions about Solr's security

2011-11-01 Thread Robert Stewart
I think you can address a lot of these concerns by running some proxy in front of SOLR, such as HAProxy. You should be able to limit access to only certain URIs (so you can prevent /select queries). HAProxy is a free software load-balancer, and it is very configurable and fairly easy to set up. On

Re: Questions about Solr's security

2011-11-01 Thread Alireza Salimi
Yeah, actually our firewalls/loadbalancers can handle these issues. If they don't, then I'll use HAProxy. Thanks for all info :-) On Tue, Nov 1, 2011 at 5:42 PM, Robert Stewart bstewart...@gmail.com wrote: I think you can address a lot of these concerns by running some proxy in front of SOLR,

Re: edismax/boost: certain documents should be last

2011-11-01 Thread Chris Hostetter
: For the record, I figured out something that will work, although it is : somewhat inelegant. My q parameter is now: : : (+content:notes -genre:Citation)^20 (+content:notes genre:Citation)^0.01 : : Can I improve on that? Not really (although you can probably get cleaner separation of query and

Re: Replicating Large Indexes

2011-11-01 Thread Chris Hostetter
: We optimize less frequently than we used to. Down to twice a month from once a day. : : Without optimizing the search speed stays the same, however the index size increases to 70+ GB. : : Perhaps there is a different way to restrict disk usage. Consider using the maxSegments option on

Re: How to use an External Database for Fields?

2011-11-01 Thread Chris Hostetter
: I don't think I'm quite getting this. Instead of going down that low, : could you make your own ResponseWriter? That has access to all : the information in the doc, and it seems like you could reach out to : the DB at that point and get your information merrily adding it to the : docs. Agreed.

Re: Uncomplete date expressions

2011-11-01 Thread Chris Hostetter
: But Solr is (intentionally) stupid about dates, and : requires the (almost) full date format. There are I'm not sure how i feel about intentionally stupid ... but the underlying sentiment is correct: Solr requires clients to be *VERY* explicit about dates, because that way the client is in
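
Since Solr expects fully specified UTC timestamps of the form 1995-12-31T23:59:59Z, a client usually normalizes dates before sending them. Below is a minimal sketch of such a helper; the name `to_solr_date` and the choice to treat naive datetimes as already being in UTC are assumptions, not something stated in the thread.

```python
from datetime import datetime, timezone


def to_solr_date(dt):
    """Format a datetime as a full UTC timestamp with a trailing 'Z'."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are already UTC
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


stamp = to_solr_date(datetime(2011, 11, 1, 14, 20, 0))
print(stamp)
```

Doing the conversion on the client keeps the decision about time zones and rounding explicit, which matches the sentiment of the reply above.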

Re: multiple dateranges/timeslots per doc: modeling openinghours.

2011-11-01 Thread Chris Hostetter
: This would need 2*3*100 = 600 dynamicfields to cover the openinghours. You : mention this is peanuts for constructing a booleanquery, but how about : memory consumption? : I'm particularly concerned about the Lucene FieldCache getting populated for : each of the 600 fields. (Since I had some

Re: Newbie question

2011-11-01 Thread Chris Hostetter
: If using CommonsHttpSolrServer query() method with parameter wt=json, when : retrieving QueryResponse, how to do to get JSON result output stream ? when you are using the CommonsHttpSolrServer level of API, the client takes care of parsing the response (which is typically in an efficient

Re: Lucene queries to Solr requestHandler

2011-11-01 Thread Chris Hostetter
: I have these queries in Lucene 2.9.4, is there a way to convert these : exactly to Solr 3.4 but using only the solrconfig.xml? I will figure out the : queries but I wanted to know if it is even possible to go from here to : having something like this: : : requestHandler name=/custom

Re: Lucene queries to Solr requestHandler

2011-11-01 Thread Chris Hostetter
Grrr cut/paste mistake. This... : public class FieldQParserPlugin extends QParserPlugin { ...should have been something like... public class MyQParserPlugin extends QParserPlugin { ...to match the configuration example... queryParser name=customQP

Re: Can Solr handle large text files?

2011-11-01 Thread Peter Spam
Wow, 50 lines is tiny! Is that how small you need to go, to get good highlighting performance? I'm looking at documents that can be up to 800MB in size, so I've decided to split them down into 256k chunks. I'm still indexing right now - I'm curious to see how performance is when the

Re: Can Solr handle large text files?

2011-11-01 Thread Peter Spam
Oh by the way - what analyzer are you using for your log files? Here's what I'm trying: <fieldType name="text_pl" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter

Re: change solr url

2011-11-01 Thread Chris Hostetter
: Is it possible to change the url for solr admin?? : What i want is : : http://192.168.0.89:8983/solr/private/coreName/admin : : i want to add /private/ before the coreName. Is that possible? If yes how? You can either do this via settings in your servlet container (to specify that the

Solr real-time update taking time

2011-11-01 Thread vijay.sampath
Hi All, I recently started working on SOLR 3.3 and would need your expertise to provide a solution. I'm working on a POC, in which I've imported 3.5 million document records using DIH. We have a source system which publishes change data capture in an XML format. The requirement is to integrate

RE: large scale indexing issues / single threaded bottleneck

2011-11-01 Thread Awasthi, Shishir
Roman, How frequently do you update your index? I have a need to do real time add/delete to SOLR documents at a rate of approximately 20/min. The total number of documents is in the range of 4 million. Will there be any performance issues? Thanks, Shishir -Original Message- From: Roman

Re: change solr url

2011-11-01 Thread Ankita Patil
That is not very clear to me. Could you explain in a bit more detail or give an example? Ankita. On 2 November 2011 06:26, Chris Hostetter hossman_luc...@fucit.org wrote: : Is it possible to change the url for solr admin?? : What i want is : : http://192.168.0.89:8983/solr/private/coreName/admin : : i

RE: large scale indexing issues / single threaded bottleneck

2011-11-01 Thread Roman Alekseenkov
We have a rate of 2K small docs/sec, which translates into 90 GB/day of index space. You should be fine. Roman Awasthi, Shishir wrote: Roman, How frequently do you update your index? I have a need to do real time add/delete to SOLR documents at a rate of approximately 20/min. The total