Re: Solr - Tika(?) memory leak

2012-01-17 Thread Otis Gospodnetic
You'll need to reindex everything indeed. Otis  Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html > > From: Wayne W >To: solr-user@lucene.apache.org >Sent: Tuesday, January 17, 2012 12:36 AM >Subject: Re

Re: Can Apache Solr Handle TeraByte Large Data

2012-01-17 Thread Otis Gospodnetic
Could indexing English Wikipedia dump over and over get you there? Otis  Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html > > From: Memory Makers >To: solr-user@lucene.apache.org >Sent: Tuesday, January

Re: DataImportHandler in Solr 4.0

2012-01-17 Thread Rob
Not a java pro, and the documentation hasn't been updated to include these instructions (at least that I could find). What do I need to do to perform the steps that Alexandre is talking about? -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-in-Solr-4-0-tp2563

RE: Question on Reverse Indexing

2012-01-17 Thread Shyam Bhaskaran
Hi Francois, I understand that disabling of ReversedWildcardFilterFactory has improved the performance. But I am puzzled over how the leading wild card search like *lock is working even though I have now disabled the ReversedWildcardFilterFactory and the indexes have been created without Rever

Re: Question on Reverse Indexing

2012-01-17 Thread François Schiettecatte
Using ReversedWildcardFilterFactory will double the size of your dictionary (more or less), maybe the drop in performance that you are seeing is a result of that? François On Jan 17, 2012, at 9:01 PM, Shyam Bhaskaran wrote: > Hi, > > For reverse indexing we are using the ReversedWildcardFilte

Question on Reverse Indexing

2012-01-17 Thread Shyam Bhaskaran
Hi, For reverse indexing we are using the ReversedWildcardFilterFactory on Solr 4.0. ReversedWildcardFilterFactory was helping us to perform leading wildcard searches like *lock. But we observed that the performance of the searches was not good after introducing ReversedWildcardFilterF

Re: Highlighting "text" field when query is for "string" field

2012-01-17 Thread solrdude
Just to be clear, I do a phrase query on a string field like q=keyword_text:"smooth skin". I am expecting highlighting to be done on the excerpt field. What I see is: These numbers are unique IDs of documents. Where are the excerpts with highlighted text? Any idea? Thanks -- View this message in conte

Re: Sorting results within the fields

2012-01-17 Thread aronitin
Hi Jan, Thanks for the reply. Here is the concrete explanation of the problem that I'm trying to solve. *SOLR Schema* Here is the definition of the SOLR schema *There are 3 dynamic fields* *There are 4 searchable fields* *Description*: Data in this field is Whitespace Tokenized,

Re: Trying to understand SOLR memory requirements

2012-01-17 Thread Lance Norskog
Which version of Solr do you use? 3.1 and 3.2 had a memory leak bug in spellchecking. This was fixed in 3.3. On Tue, Jan 17, 2012 at 5:59 AM, Robert Muir wrote: > I committed it already: so you can try out branch_3x if you want. > > you can either wait for a nightly build or compile from svn > (h

Re: Solr Cloud Indexing

2012-01-17 Thread Lance Norskog
Cloud upload bandwidth is free, but download bandwidth costs money. If you upload a lot of data but do not query it often, Amazon can make sense. You can also rent much cheaper hardware in other hosting services where you pay by the month or even by the year. If you know you have a cap on how much

Re: Facet auto-suggest

2012-01-17 Thread Jan Høydahl
Hi, Sure, you can use filters and facets for this. Start a query with ...&facet.field=source&facet.field=topics&facet.field=type When you click a "button", you set the corresponding filter (fq=source:people), and the new query will return the same facets with new counts. In the Audi example, yo
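The filter-and-facet flow described in this reply can be sketched as a pair of request URLs. This is only an illustration: the field names `source`, `topics` and `type` and the filter `source:people` come from the message, while the base URL, the `*:*` query and the helper name `facet_query_url` are assumptions made for the example.

```python
from urllib.parse import urlencode

# Assumed local Solr endpoint; adjust to the real host and core.
SOLR_SELECT = "http://localhost:8983/solr/select"


def facet_query_url(q, facet_fields, filters=()):
    """Build a select URL that asks for facet counts and applies fq filters.

    Hypothetical helper for illustration; it only assembles the query string.
    """
    params = [("q", q), ("facet", "true")]
    # One facet.field parameter per field, as in the message.
    params += [("facet.field", field) for field in facet_fields]
    # One fq parameter per button the user has pressed so far.
    params += [("fq", fq) for fq in filters]
    return SOLR_SELECT + "?" + urlencode(params)


FIELDS = ["source", "topics", "type"]

# Initial page: no button pressed yet.
first = facet_query_url("*:*", FIELDS)

# After the "People" button is clicked, the same facets are requested
# again, now restricted by the corresponding filter query.
second = facet_query_url("*:*", FIELDS, ["source:people"])
```

Facet values that come back with a count of zero under the current filters correspond to buttons that would lead to an empty result page, so the client can disable or hide them.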

Facet auto-suggest

2012-01-17 Thread Jon Drukman
I don't even know what to call this feature. Here's a website that shows the problem: http://pulse.audiusanews.com/pulse/index.php Notice that you can end up in a situation where there are no results. For example, in order, press: People, Performance, Technology, Photos. The client wants it so th

Re: first time query is very slow

2012-01-17 Thread Yonik Seeley
On Tue, Jan 17, 2012 at 9:39 AM, gabriel shen wrote: > For those customers who unluckily send un-prewarmed query, they will suffer > from bad response time, it is not too pleasant anyway. The "warming caches" part isn't about unique queries, but more about caches used for sorting and faceting (an

Re: Sorting results within the fields

2012-01-17 Thread Jan Høydahl
Hi, Complex problems like this are much better explained with concrete examples than generalized text. Please create a real example with real documents and their content, along with real queries. You don't explain what "the score value which is generate by my application" is - which application

Re: really slow performance when trying to get facet.field

2012-01-17 Thread Daniel Bruegge
Ok, I have now changed the static warming in the solrconfig.xml using first- and newSearcher. "Content" is my field to facet on. Now the commits take longer, which is OK for me, but the searches are much faster now. I also reduced the number of documents on my shards to 15mio/shard. So the

Re: Function in facet.query like min,max

2012-01-17 Thread Eric Grobler
Hi Erick Thanks for your feedback. I will try it tomorrow - if it works it will be perfect for my needs. Have a nice day Ericz On Tue, Jan 17, 2012 at 4:28 PM, Erick Erickson wrote: > I don't believe that's the case, have you tried it? From the page > I referenced: > > "The stats component retu

Re: Sorting results within the fields

2012-01-17 Thread aronitin
It's been almost a week and there is no response to the question that I asked. Does the question have too few details, or is there no way to achieve the same in Lucene? -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-results-within-the-fields-tp3656049p3666983.html Sent f

How to return the distance geo distance on solr 3.5 with bbox filtering

2012-01-17 Thread Maxim Veksler
Hello, I'm querying with bbox, which should be faster than geodist. My queries look like this: http://localhost:8983/solr/select?indent=true&fq={!bbox}&sfield=loc&pt=39.738548,-73.130322&d=100&sort=geodist()%20asc&q=trafficRouteId:235 The trouble is that with bbox Solr does not return the
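One possible workaround, not taken from this thread, is to compute the distance on the client from each document's location value once the bbox-filtered results come back. The sketch below assumes the `loc` field is stored and returned as a "lat,lon" string and uses a plain haversine formula with a rounded mean Earth radius; the function names are hypothetical, and the result may differ slightly from what Solr's own geodist() would report.

```python
import math

# Rounded mean Earth radius in kilometres (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_from_point(pt, loc):
    """Distance between the query point and a document location.

    Both arguments are "lat,lon" strings, mirroring the pt parameter
    in the query above.
    """
    lat1, lon1 = (float(x) for x in pt.split(","))
    lat2, lon2 = (float(x) for x in loc.split(","))
    return haversine_km(lat1, lon1, lat2, lon2)


# Distance from the query point to itself is zero.
same_point = distance_from_point("39.738548,-73.130322", "39.738548,-73.130322")
```

The client would call `distance_from_point` once per returned document, using the same `pt` value it sent in the request.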

Re: PositionIncrementGap inside a field

2012-01-17 Thread marotosg
Hi Erik, what I'm trying to achieve here is trying to verify if we can run a query like this: "\""IBM Ltd"~15\" \""Dublin Ireland"~15\""~100 on a field where the gaps are like this: IBM Ireland Ltd *gap of 30* Dublin USA *gap of 300* IBM Ltd *gap of 30* Dublin

Re: PositionIncrementGap inside a field

2012-01-17 Thread Erick Erickson
Hmmm, no I don't know how to do that out of the box. Two things: 1> why do you want to do this? Perhaps if you describe the high-level problem you're trying to solve there might be other ways to approach it. 2> I *think* you could write your own Tokenizer that recognized the special

Re: Function in facet.query like min,max

2012-01-17 Thread Erick Erickson
I don't believe that's the case, have you tried it? From the page I referenced: "The stats component returns simple statistics for indexed numeric fields within the DocSet." And running a very quick test on the example data, I get different results when I used *:* and name:maxtor. That said, I'm
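As a rough illustration of the point made here, a StatsComponent request is sent together with the normal query, so the statistics cover the documents matching `q` rather than the whole index. The query `htc android` and the `price` field come from the original question; the base URL, `rows=0` and the helper name `stats_url` are assumptions for this sketch.

```python
from urllib.parse import urlencode


def stats_url(q, field, base="http://localhost:8983/solr/select"):
    """Build a select URL that requests StatsComponent output for one field.

    Hypothetical helper; it only assembles the query string.
    """
    params = [
        ("q", q),               # statistics are computed over this result set
        ("rows", "0"),          # assumption: only the stats are wanted, no documents
        ("stats", "true"),      # enable the stats component
        ("stats.field", field)  # numeric field to summarise
    ]
    return base + "?" + urlencode(params)


url = stats_url("htc android", "price")
```

Running the same URL with a different `q` (for example `*:*` versus a narrower query) is a quick way to confirm that the returned statistics change with the result set.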

Re: How can I index this?

2012-01-17 Thread Erick Erickson
Well, if you can make an HTTP request, you can parse the return and stuff it into a SolrInputDocument in SolrJ and then send it to Solr. At least that seems possible if I'm understanding your setup. There are other Solr clients that allow similar processes, but the Java version is the one I know be

Re: PositionIncrementGap inside a field

2012-01-17 Thread marotosg
Hi Erick. Thanks for your answer. This is almost what I want to do, but my problem is that I want to be able to introduce two different sizes of gaps. Something like IBM Corporation some information *gap of 30* more information *gap of 100* IBM Limited more info *ga

Re: Function in facet.query like min,max

2012-01-17 Thread Eric Grobler
Yes, I have, but unfortunately it works on the whole index and not for a particular query. On Tue, Jan 17, 2012 at 3:37 PM, Erick Erickson wrote: > have you seen the Stats component? See: > http://wiki.apache.org/solr/StatsComponent > > Best > Erick > > On Tue, Jan 17, 2012 at 8:34 AM, Eric Grob

Re: How can I index this?

2012-01-17 Thread ahammad
Perhaps I was a little confusing... Normally when I have DB access, I do a regular indexing process using DIH. For these two sources, I do not have direct DB access. I can only view the two sources like any end-user would. I do have a java class that can get the information that I need. That clas

Re: How can I index this?

2012-01-17 Thread Erick Erickson
This sounds like, for the database source, that using SolrJ would be the way to go. Assuming you can access the database from Java this is pretty easy. As for the website, Nutch is certainly an option... But I'm a little puzzled. You mention a website, and sharepoint as your sources, then ask abo

Re: Function in facet.query like min,max

2012-01-17 Thread Erick Erickson
have you seen the Stats component? See: http://wiki.apache.org/solr/StatsComponent Best Erick On Tue, Jan 17, 2012 at 8:34 AM, Eric Grobler wrote: > Hi Solr community, > > Is it possible to return the lowest, highest and average price of a search > result using facets? > I tried something like:

Re: Solr Cloud Indexing

2012-01-17 Thread Erick Erickson
This only really makes sense if you don't have enough in-house resources to do your indexing locally, but it certainly is possible. Amazon's EC2 has been used, but really any hosting service should do. Best Erick On Tue, Jan 17, 2012 at 12:09 AM, Sujatha Arun wrote: > Would it make sense to  In

Re: PositionIncrementGap inside a field

2012-01-17 Thread Erick Erickson
This is just adding the field repeatedly, something like IBM Corporation some information IBM limited more info multiValued="true"/> > > >   >       IBM Corporation some information *"here a gap"* more information >   >   >      IBM Limited more info "here a gap" and some more data >

Re: SolrJ Embedded

2012-01-17 Thread Erick Erickson
Quantify slower, does it matter? At issue is that usually Solr spends far more time doing the search than transmitting the query and response over HTTP. Http is not really slow *as a protocol* in the first place. The usual place people have problems here is when there are a bunch of requests made

Re: first time query is very slow

2012-01-17 Thread gabriel shen
Thanks darren, I understand it will take longer before warming up. What I am trying to find out is why, in the situation where we have no cache, the query takes so long to complete, and what the bottleneck is. For example, if I remove all qf, pf fields, the query speed will improve dramati

PositionIncrementGap inside a field

2012-01-17 Thread marotosg
Hi. At the moment I have a multivalued field where I would like to add information with gaps at the end of every line in the multivalued field, and I would like to add gaps as well in the middle of the lines. For instance IBM Corporation some information *"here a gap"* more informati

Re: first time query is very slow

2012-01-17 Thread darren
First query will cause the index caches to be warmed up and this is why the first query takes some time. You can prewarm the caches with a query (when solr starts up) of your choosing in the config file. Google around the SolrWiki on cache/index warming. hth > hi, > > I had an solr3.3 index of 2

[Job] Sales Engineer at Lucid Imagination

2012-01-17 Thread Grant Ingersoll
Hi Solr Users, Lucid Imagination is looking for a sales engineer. If you know search, Solr and like working with customers, the sales engineer job may be of interest to you. I've included the job description below. If you are interested, please send your resume (off-list) to melissa.qu...@lu

Re: Trying to understand SOLR memory requirements

2012-01-17 Thread Robert Muir
I committed it already: so you can try out branch_3x if you want. you can either wait for a nightly build or compile from svn (http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). On Tue, Jan 17, 2012 at 8:35 AM, Dave wrote: > Thank you Robert, I'd appreciate that. Any idea how long

first time query is very slow

2012-01-17 Thread gabriel shen
Hi, I had a Solr 3.3 index of 200,000 documents; all text is stored and the total index size is 27 GB. I used a dismax query with over 10 qf and pf boosting fields each, plus sorting on score and 2 other fields. It took quite a few seconds (5-8) for the first query to return any result (no highlig

How can I index this?

2012-01-17 Thread ahammad
Hello, I am looking into indexing two data sources. One of those is a standard website and the other is a Sharepoint site. The problem is that I have no direct database access. Normally I would just use the DIH and get what I need from the DB. I do have a java DAO (data access object) class that I

Re: really slow performance when trying to get facet.field

2012-01-17 Thread Daniel Bruegge
Evictions are 0 for all cache types. Your server max heap space with 12G is pretty huge. Which is good I think. The CPU on my server is a 8-Core Intel i7 965. Commit frequency is low, because shards are added and old shards exist for historical reasons. Old shards will be then cleaned after coupl

Re: Trying to understand SOLR memory requirements

2012-01-17 Thread Dave
Thank you Robert, I'd appreciate that. Any idea how long it will take to get a fix? Would I be better switching to trunk? Is trunk stable enough for someone who's very much a SOLR novice? Thanks, Dave On Mon, Jan 16, 2012 at 10:08 PM, Robert Muir wrote: > looks like https://issues.apache.org/ji

Function in facet.query like min,max

2012-01-17 Thread Eric Grobler
Hi Solr community, Is it possible to return the lowest, highest and average price of a search result using facets? I tried something like: facet.query={!max(price,0)} Is it possible and what is the correct syntax? q=htc android facet=true facet.query=price:[* TO 10] facet.query=price:[11 TO 100]

Re: really slow performance when trying to get facet.field

2012-01-17 Thread Dmitry Kan
Hi Daniel, My index is 6,5G. I'm sure it can be bigger. facet.limit we ask for is beyond 100 thousand. It is sub-second speed. I run it with -Xms1024m -Xmx12000m under tomcat, it currently takes 5,4G of RAM. Amount of docs is over 6,5 million. Do you see any evictions in your caches? What kind of

Re: really slow performance when trying to get facet.field

2012-01-17 Thread Daniel Bruegge
Hi Dmitry, I had everything on one Solr instance before, but this got too heavy and I had the same issue here, that the 1st facet.query was really slow. When querying the facet: - facet.limit = 100 Cache settings are like this: How big was your index? Did it fit into the RAM wh

Re: Solr - Tomcat new versions

2012-01-17 Thread Erik Hatcher
Perhaps this is the known issue with the 3.5 example schema being used in Tomcat and the VelocityResponseWriter issue? I'm on my mobile now so don't have easy access to a pointer with details but check the archives if this seems to be the issue on how to resolve it. Erik On Jan 17, 2012, a

Re: really slow performance when trying to get facet.field

2012-01-17 Thread Dmitry Kan
I had a similar problem for a similar task. In my case, merging the results from two shards turned out to be the culprit. If you can logically store your data just in one shard, your faceting should become faster. Size wise it should not be a problem for SOLR. Also, you didn't say anything about

really slow performance when trying to get facet.field

2012-01-17 Thread Daniel Bruegge
Hi, I have 2 Solr-shards. One is filled with approx. 25mio documents (local index 6GB), the other with 10mio documents (2.7GB size). I am trying to create some kind of 'word cloud' to see the frequency of words for a *text_general *field. For this I am currently using a facet over this field and I

Re: Solr - Tomcat new versions

2012-01-17 Thread Luca Cavanna
Hi Alessio, in order to help you, we'd need to know something more about what's going wrong. Could you give us a stacktrace or an error you're reading? How do you know solr isn't working? Thanks Luca On Tue, Jan 17, 2012 at 10:52 AM, Alessio Crisantemi < alessio.crisant...@gioconews.it> wrote: >

Re: FacetComponent: suppress original query

2012-01-17 Thread Dmitry Kan
Yes, that's what I have started to use already. Probably, this is the easiest solution. Thanks. On Tue, Jan 17, 2012 at 3:03 AM, Erick Erickson wrote: > Why not just up the maxBooleanClauses parameter in solrconfig.xml? > > Best > Erick > > On Sat, Jan 14, 2012 at 1:41 PM, Dmitry Kan wrote: > >

Re: Solr - Tomcat new versions

2012-01-17 Thread Alessio Crisantemi
Dear Luca, I followed the Solr installation procedure described in the official guide, but it doesn't work with Solr 3.5, while with Solr 1.4.1 everything is fine. I don't know why... but for now I am working with Solr 1.4.1, and one more thing: I would like to install Tika 1.0 on Solr 1.4.1. Is that possible? How can I do it? Can you help me? be

Re: Solr - Tomcat new versions

2012-01-17 Thread Luca Cavanna
Hi Alessio, I've seen Solr 3.5 running within Tomcat 7.0.23, so I guess it shouldn't be a bug. Could you please provide some more details about the problem you have? Do you have a stacktrace? You are upgrading an existing Solr 1.4.1, right? By the way, which jdk are you using? Thanks Luca On Tue, Ja

Re: Query regarding solr custom sort order

2012-01-17 Thread umaswayam
Hi, Let me clarify the situation here in detail. The default sort which WebSphere Commerce provides is based on the name & price of any item, but we have unique values for every item. Hence sorting works fine either as integer or as string, but during preprocessing we generate some temporary tables

Re: SolrJ Embedded

2012-01-17 Thread Maxim Veksler
On Tue, Jan 17, 2012 at 3:13 AM, Erick Erickson wrote: > I don't see why not. I'm assuming a *nix system here so when Solr > updated an index, any deleted files would hang around. > > But I have to ask why bother with the Embedded server in the > first place? You already have a Solr instance up an

Re: Solr - Tomcat new versions

2012-01-17 Thread Alessio Crisantemi
Hi, I installed Apache Tomcat on Windows (Vista) and Solr, but I have a problem between Tomcat 7.0.23 and Solr 3.5. There is no problem if I install Solr 1.4.1 with the same version of Tomcat. (I checked it with both the binary and source installations of Tomcat, but the result is the same.) It's a bug, I think