Personalized Search

2010-05-20 Thread Rih
Has anybody done personalized search with Solr? I'm thinking of including fields such as bought or like per member/visitor via dynamic fields to a product search schema. Another option is to have a multi-value field that can contain user IDs. What are the possible performance issues with this

RE: how to achieve filters

2010-05-20 Thread Doddamani, Prakash
Hi All, I am getting an error in Solr: Error loading class 'Solr.TrieField'. I have added the following in the types section of the schema file: <fieldType name="tint" class="solr.TrieField" omitNorms="true" /> And in the custom fields of the schema I have added: <field name="bitrate" type="tint" indexed="true" stored="true" /> I am using

Re: Personalized Search

2010-05-20 Thread findbestopensource
Hi Rih, Are you going to include either of the two fields, bought or like, per member/visitor, or a unique field per member/visitor? If only one or two common fields are included, there will not be any impact on performance. If you want to include a unique field then you need to consider multi

Re: Personalized Search

2010-05-20 Thread dc tech
Another approach would be to do query-time boosts of 'my' items, under the assumption that the count is limited: - keep the Solr index independent of bought/like - have a db table with user prefs on a per-item basis - at query time, specify boosts for 'my items' We are planning to do this in the
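The query-time boost idea can be sketched roughly as follows. This is only an illustration, not the poster's implementation: the helper name build_boost_params, the field name id, and the boost value 100 are assumptions, and the item ids are presumed to have already been fetched from the per-user preferences table. It relies on the dismax bq (boost query) parameter, which raises the score of matching documents without changing which documents match.

```python
from urllib.parse import urlencode


def build_boost_params(user_query, my_item_ids, id_field="id", boost=100):
    """Build Solr request parameters that boost a user's own items.

    Hypothetical helper: id_field and boost are illustrative defaults,
    not values taken from the thread.
    """
    params = {"q": user_query, "defType": "dismax"}
    if my_item_ids:
        # One boosted clause per item, e.g. "id:101^100 id:205^100".
        params["bq"] = " ".join(
            f"{id_field}:{item_id}^{boost}" for item_id in my_item_ids
        )
    return params


# Ids would normally come from the per-user preferences table.
params = build_boost_params("rock", [101, 205])
query_string = urlencode(params)
```

Because the boost clauses travel with the request, the index itself stays independent of per-user data. The approach only stays practical while the number of items per user is small, and ids containing query-syntax characters would need escaping.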

RE: how to achieve filters

2010-05-20 Thread Ahmet Arslan
I am getting Error in Solr Error loading class 'Solr.TrieField' I have added following in Types of schema file <fieldType name="tint" class="solr.TrieField" omitNorms="true" /> And in custom fields of schema have added <field name="bitrate" type="tint" indexed="true" stored="true" /> I am

RE: how to achieve filters

2010-05-20 Thread Doddamani, Prakash
Hey Ahmet, I have added <field name="bitrate" type="sint" indexed="true" stored="true" default="0"/> And the request I am passing is /solr/select?indent=on&version=2.2&q=rock&fq={!field%20f=content}mp3&fq:bitrate:[* TO 127]&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl= Still I am seeing

RE: how to achieve filters

2010-05-20 Thread Ahmet Arslan
And the request I am passing is /solr/select?indent=on&version=2.2&q=rock&fq={!field%20f=content}mp3&fq:bitrate:[* TO 127]&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl= Still I am seeing documents above bitrate 127 There is a typo: instead of fq: there should be fq=
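One way to avoid this kind of typo is to build the query string from key/value pairs instead of concatenating it by hand. The sketch below is an illustration only; the helper name build_select_url is hypothetical, and the path /solr/select and the filter values are simply the ones quoted in this thread.

```python
from urllib.parse import parse_qs, urlencode


def build_select_url(base, query, filters, start=0, rows=10):
    """Assemble a select URL with one fq parameter per filter.

    Hypothetical helper; a list of pairs is used so that fq can repeat.
    """
    params = [("q", query)]
    params += [("fq", fq) for fq in filters]
    params += [("start", start), ("rows", rows), ("fl", "*,score")]
    return f"{base}?{urlencode(params)}"


url = build_select_url(
    "/solr/select",
    "rock",
    ["{!field f=content}mp3", "bitrate:[* TO 127]"],
)

# Round-trip the query string to confirm both filters were sent as fq=...
sent_filters = parse_qs(url.split("?", 1)[1])["fq"]
```

Each filter ends up as its own properly encoded fq= parameter, so a stray fq: cannot slip into the request.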

RE: how to achieve filters

2010-05-20 Thread Doddamani, Prakash
Oops my bad, Thanks much -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, May 20, 2010 4:31 PM To: solr-user@lucene.apache.org Subject: RE: how to achieve filters And the request I am passing is

Statistics exposed as JSON

2010-05-20 Thread Andreas Jung
Are the Solr 1.4 statistics like #docs, #docsPending etc. exposed in JSON format? Andreas

solr caches from external caching system like memcached

2010-05-20 Thread bharath venkatesh
Hi, Is it possible to use Solr caches such as the query cache, filter cache and document cache from an external caching system like memcached, as it has several advantages such as a centralized caching system and reducing the pause time of the JVM's garbage collection, as we can assign less

Re: index merge

2010-05-20 Thread uma m
Hi All, The problem is resolved. It was purely due to the filesystem: my filesystem was 32-bit, running on a 64-bit OS. I changed to a 64-bit filesystem and all works as expected. Uma

Re: Personalized Search

2010-05-20 Thread MitchK
Hi dc, - at query time, specify boosts for 'my items' items Do you mean something like document-boost or do you want to include something like OR myItemId:100^100 ? Can you tell us how you would specify document-boostings at query-time? Or are you querying something like a boolean field

Re: Personalized Search

2010-05-20 Thread Ken Krugler
On May 19, 2010, at 11:43pm, Rih wrote: Has anybody done personalized search with Solr? I'm thinking of including fields such as bought or like per member/visitor via dynamic fields to a product search schema. Another option is to have a multi-value field that can contain user IDs. What

Machine utilization while indexing

2010-05-20 Thread Thijs
Hi. I have a question about how I can get Solr to index quicker than it does at the moment. I have to index (and re-index) some 3-5 million documents. These documents are preprocessed by a Java application that effectively combines multiple database tables with each other to form the

seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Hey everyone, I've recently been given a requirement that is giving me some trouble. I need to retrieve up to 100 documents, but I can't see a way to do it without making 100 different queries. My schema has a multi-valued field like 'listOfIds'. Each document has between 0 and N of these ids

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
How about throwing a BlockingQueue, http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html, between your document-creator and SolrServer? Give it a size of 10,000 or something, with one thread trying to feed it, and one thread waiting for it to get near full then
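The bounded-queue idea can be sketched with Python's standard queue module, which plays roughly the role of Java's BlockingQueue here. This is an analogy under stated assumptions, not the poster's code: run_pipeline, the batch size, and the send_batch callback (standing in for whatever actually posts documents to Solr) are all hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel telling the consumer that no more docs are coming


def _produce(docs, q):
    for doc in docs:
        q.put(doc)  # blocks while the queue is full, throttling the producer
    q.put(_DONE)


def _consume(q, send_batch, batch_size):
    batch = []
    while True:
        item = q.get()  # blocks while the queue is empty
        if item is _DONE:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            send_batch(batch)
            batch = []
    if batch:
        send_batch(batch)  # flush the final partial batch


def run_pipeline(docs, send_batch, queue_size=10000, batch_size=500):
    """Feed docs through a bounded queue to a batching consumer thread."""
    q = queue.Queue(maxsize=queue_size)
    producer = threading.Thread(target=_produce, args=(docs, q))
    consumer = threading.Thread(target=_consume, args=(q, send_batch, batch_size))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()


# Example run: collect batches in a list instead of sending them to Solr.
sent = []
run_pipeline(({"id": i} for i in range(25)), sent.append, queue_size=10, batch_size=10)
```

The bounded size is the point of the design: document creation stalls when the indexer falls behind, so memory use stays flat, and the indexer never waits as long as the queue has work in it.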

Solr Delta Queries

2010-05-20 Thread Vladimir Sutskever
I have an indexed_timestamp field in my index, which lets me know when a document was indexed: <field name="indexed_timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> For some reason when doing delta indexing via DIH, this field is not being updated. Are timestamp

RE: Machine utilization while indexing

2010-05-20 Thread Dennis Gearon
It takes that long to do indexing? I'm HOPING to have a site that has low 10's of millions of documents to billions. Sounds to me like I will DEFINITELY need a cloud account at indexing time. For the original author of this thread, that's what I'd recommend. 1/ Optimize as best as you can on

Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Tim Gilbert
Hi guys/gals, I am using apache-solr-1.4.0.war deployed to glassfishv3 on my development machine which is Ubuntu 9.10 64-bit. I am using Solrj 1.4 using the CommonsHttpSolrServer connection to that Solr instance (http://localhost:8080/apache-solr-1.4.0) during my development. To simplify

Re: Machine utilization while indexing

2010-05-20 Thread Thijs
I already have a BlockingQueue in place (that's my custom queue) and luckily I'm indexing faster than what you're doing. Currently it takes about 2 hours to index the 5m documents I'm talking about. But I still feel as if my machine is underutilized. Thijs On 20-5-2010 17:16, Nagelberg, Kallin

Re: Machine utilization while indexing

2010-05-20 Thread Thijs
Why would I need faster hardware if my current hardware isn't reaching its max capacity? I'm already using a different machine for querying and indexing so while indexing the queries aren't affected. Pulling an optimized snapshot isn't even noticeable on the query-machines. Thijs On

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
Well to be fair I'm indexing on a modest virtualized machine with only 2 gigs ram, and a doc size of 5-10k maybe substantially larger than what you have. They could be substantially smaller too. As another point of reference my index ends up being about 20Gigs with the 5 million docs. I

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
You're sure it's not blocking on indexing IO? If not then I guess it must be a thread waiting unnecessarily in solr or your loading program. To get my loader running at full speed I hooked it up to jprofiler's thread views to see where the stalls were and optimized from there. -Kallin

RE: Machine utilization while indexing

2010-05-20 Thread Dennis Gearon
Here is a good article from IBM, with code, on how to do hybrid/cloud computing. http://www.ibm.com/developerworks/library/x-cloudpt1/ Dennis Gearon

Re: Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Ahmet Arslan
In my SolrJ-using application, I have a test case which queries for “numéro” and succeeds if I use Embedded and fails if I use CommonsHttpSolrServer… I don’t want to use embedded for a number of reasons including that it's not recommended (http://wiki.apache.org/solr/EmbeddedSolr) I am sorry

Re: seemingly impossible query

2010-05-20 Thread darren
Ok. I think I understand. What's impossible about this? If you have a single field called id that is multivalued then you can retrieve the documents with something like: id:1 OR id:2 OR id:56 ... id:100 then add limit 100. There's probably a more succinct way to do this, but I'll leave

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Thanks Darren, The problem with that is that it may not return one document per id, which is what I need. IE, I could give 100 ids in that OR query and retrieve 100 documents, all containing just 1 of the IDs. -Kallin Nagelberg -Original Message- From: dar...@ontrenet.com

RE: seemingly impossible query

2010-05-20 Thread darren
I see. Well, now you're asking Solr to ignore its prime directive of returning hits that match a query. Hehe. I'm not sure if Solr has a unique attribute. But this sounds, to me, like you will have to filter the results yourself. But at least you hit Solr only once before doing so. Good luck!
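The "hit Solr once, then filter the results yourself" suggestion can be sketched as below. This is a rough illustration with assumptions: the helper names are hypothetical, the docs are presumed to arrive as dictionaries in ranked order, and the first document containing a given id is the one kept for it (so a single document may serve several ids). The field name listOfIds is the one mentioned in the thread.

```python
def build_or_query(field, ids):
    """Build a query such as 'listOfIds:1 OR listOfIds:2 OR listOfIds:56'."""
    return " OR ".join(f"{field}:{value}" for value in ids)


def one_doc_per_id(docs, wanted_ids, field="listOfIds"):
    """Keep the first (highest-ranked) doc found for each wanted id.

    Hypothetical client-side filter; ids with no matching doc are
    simply absent from the result.
    """
    chosen = {}
    remaining = set(wanted_ids)
    for doc in docs:
        for value in doc.get(field, []):
            if value in remaining:
                chosen[value] = doc
                remaining.discard(value)
        if not remaining:
            break  # every wanted id already has a document
    return chosen


query = build_or_query("listOfIds", [1, 2, 3, 4])
docs = [
    {"id": "a", "listOfIds": [1, 2]},
    {"id": "b", "listOfIds": [1]},
    {"id": "c", "listOfIds": [3]},
]
picked = one_doc_per_id(docs, [1, 2, 3, 4])
```

The catch raised in the thread still applies: a single request has to ask for enough rows that every id is likely to appear at least once, which may be far more than 100.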

Re: Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Abdelhamid ABID
I had the same issue within Tomcat. Further to what Ahmet wrote, I recommend plugging a filter into your Solr context that forces responses and requests to be encoded in UTF-8. On Thu, May 20, 2010 at 5:11 PM, Ahmet Arslan iori...@yahoo.com wrote: In my SolrJ using application, I have a test

Re: seemingly impossible query

2010-05-20 Thread Geert-Jan Brits
Would each Id need to return a different doc? If not: you could probably use FieldCollapsing: http://wiki.apache.org/solr/FieldCollapsing i.e.: - collapse on listOfIds (see wiki entry for syntax) - constrain the field to only return the id's you

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Yeah I need something like: (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that.. I'm not sure how I can hit Solr once. If I do try and do them all in one big OR query then I'm probably not going to get a hit for each ID. I would need to request probably 1000 documents to

RE: seemingly impossible query

2010-05-20 Thread darren
The problem here, I think, is that you only want 1 of many _results_ for a particular ID. How would Solr know which result you want to keep? And which to throw away? However... You can do this in two queries if you want. Have a separate solr document with unique ID equal to the listOfIds value

Re: seemingly impossible query

2010-05-20 Thread Geert-Jan Brits
Hi Kallin, again please look at FieldCollapsing (http://wiki.apache.org/solr/FieldCollapsing), that should do the trick. Basically: first you constrain the field 'listOfIds' to only contain docs that contain any of the (up to) 100 random ids, as you know how to do. Next, in the same query, specify

Solr highlighter and custom queries?

2010-05-20 Thread Daniel Shane
Hi all! I'm trying to do some simple highlighting, but I cannot seem to figure out how to make it work. I'm using my own QueryParser which generates custom made queries. I would like Solr to be able to highlight them. I've tried many options in the highlighter but cannot get any snippets to

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Thanks, I'm going to take a look at fieldcollapsingquery as it seems like it should do the trick! -Kallin Nagelberg -Original Message- From: Geert-Jan Brits [mailto:gbr...@gmail.com] Sent: Thursday, May 20, 2010 1:03 PM To: solr-user@lucene.apache.org Subject: Re: seemingly impossible

Debugging - DIH Delta Queries-

2010-05-20 Thread Vladimir Sutskever
Hi All, How can I see all of the queries sent to my DB during a Delta Import? It seems like my documents are not being updated via delta import. When I use Solr's DataImportHandler console with delta-import selected I see <lst name="entity:getall"> <lst name="document#1"/> </lst> <lst

Re: Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Chris Hostetter
: I am using apache-solr-1.4.0.war deployed to glassfishv3 on my ... : INFO: [] webapp=/apache-solr-1.4.0 path=/select : params={indent=on&version=2.2&q=numéro&fq=&start=0&rows=10&fl=*,score&qt=standard&wt=standard&explainOther=&hl.fl=} : hits=0 status=0 QTime=16 ... : In my SolrJ

Re: Machine utilization while indexing

2010-05-20 Thread Chris Hostetter
I'm really only guessing here, but based on your description of what you are doing it sounds like you only have one thread streaming documents to solr (via a single StreamingUpdateSolrServer instance which creates a single HTTP connection) Have you at all attempted to have parallel threads in

RE: Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Tim Gilbert
Chris, You are the best. Switching to POST solved the problem. I hadn't noticed that option earlier but after finding: https://issues.apache.org/jira/browse/SOLR-612 I found the option in the code. Thank you, you just made my day. Secondly, in an effort to narrow down whether this was a

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
StreamingUpdateSolrServer already has multiple threads and uses multiple connections under the covers. At least the api says ' Uses an internal MultiThreadedHttpConnectionManager to manage http connections'. The constructor allows you to specify the number of threads used,

RE: Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Chris Hostetter
: Starting with glassfishv3 (I think) UTF-8 is the default for URI. You : can see this by going to the admin site, clicking on Network Config | : Network Listeners | then select the listener. Select the tab HTTP and : about half way down, you will see URI Encoding: UTF-8. : : HOWEVER, that

Re: Subclassing DIH

2010-05-20 Thread Chris Hostetter
: I am trying to subclass DIH to add I am having a hard time trying to get : access to the current Solr Context. How is this possible? I don't think DIH was particularly designed to be subclassed (I'm surprised it's not final) ... it was built with the assumption that people would write

RE: Machine utilization while indexing

2010-05-20 Thread Chris Hostetter
: StreamingUpdateSolrServer already has multiple threads and uses multiple : connections under the covers. At least the api says ' Uses an internal Hmmm... I think one of us misunderstands the point behind StreamingUpdateSolrServer and its internal threads/queues. (it's very possible that

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Yeah this looks perfect. Too bad it's not in 1.4, I guess I can build from trunk and patch it. This is probably a stupid question but is there any feeling as to when 1.5 might come out? Thanks, -Kallin Nagelberg -Original Message- From: Geert-Jan Brits [mailto:gbr...@gmail.com] Sent:

Re: Subclassing DIH

2010-05-20 Thread Blargy
Ok, to further explain myself. Well, first off I was experiencing a StackOverflow error during my delta-imports after doing a full-import. The strange thing was, it only happened sometimes. Thread is here:

RE: Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Tim Gilbert
I wanted to improve the documentation in the solr wiki by adding in my findings. However, when I try to log in and create a new account, I receive this error message: You are not allowed to do newaccount on this page. Login and try again. Does anyone know how I can get permission to add a page

Re: Solr highlighter and custom queries?

2010-05-20 Thread Daniel Shane
Actually, it's not so much a Solr problem as a Lucene one; as it turns out, the WeightedSpanTermExtractor is in Lucene and not Solr. Why they decided to only highlight queries that are in Lucene I don't know, but what I did to solve this problem was simply to make my queries extend a Lucene

Endeca vs Solr?

2010-05-20 Thread kkieser
First of all, I'd like to apologize in advance for being a pretty raw newbie when it comes to search technologies, so please bear with me! The situation: My company has a system that moderates 15 character free form text fields. We have a dictionary of words in our database that are banned due

Re: Solr Shard - Strange results

2010-05-20 Thread TonyBray
I know this post is old but did you ever get a resolution to this problem? I am running into the exact same issue. I even switched my id from text to string and reindexed as that was the last suggestion and still no resolution. --Tony

Re: Solr Shard - Strange results

2010-05-20 Thread TonyBray
So are we the only ones who never got sharding working with multi-cores? Bummer... Hopefully someone else will chime in with an answer. --Tony

Re: Endeca vs Solr?

2010-05-20 Thread David Smiley (@MITRE.org)
Hello kkieser. I've used both and my name may have come up in your searches. For your system, I would definitely not use Endeca as it's too complicated for the relatively simple needs that you have. You asked if there are technical differences and of course being two different systems, the

Re: Non-English query via Solr Example Admin corrupts text

2010-05-20 Thread Dennis Gearon
rant_by_HTTP_Verb_Nazi Using POST totally violates the access model for an entity in the HTTP Verb model. Basically: GET=READ POST=CREATE PUT=MODIFY DELETE=(drum roll please)DELETE Granted, the whole web uses POST for modify, but let's not make the situation worse by using it for everything.

Re: Endeca vs Solr?

2010-05-20 Thread kkieser
Thanks for your response David! At the moment we have over 40,000 words on our banned list, and only recently added the white list, so we anticipate this number to jump quite quickly. I've heard Solr can handle up to around 2 million records before slowing down so I'm not too worried about

Re: Endeca vs Solr?

2010-05-20 Thread David Smiley (@MITRE.org)
kkieser, It just occurred to me that Solr might actually fit the bill. Your scenario is definitely not a typical use of Solr, but a novel use of Solr that I am about to describe could get you what you want. A Solr index is composed of documents which are typically similar

Re: jmx issue with solr

2010-05-20 Thread Lance Norskog
http://wiki.apache.org/solr/SolrJmx#Remote_Connection_to_Solr_JMX Ask the wiki! On Wed, May 19, 2010 at 6:19 AM, Na_D nabam...@zaloni.com wrote: Thanks for the info , using the above properties solved the issue . -- View this message in context:

How real-time are Solr/Lucene queries?

2010-05-20 Thread Thomas J. Buhr
Hello Solr, Solr looks like an excellent API and it's nice to have a tutorial that makes it easy to discover the basics of what Solr does, I'm impressed. I can see plenty of potential uses of Solr/Lucene and I'm interested now in just how real-time the queries made to an index can be? For

Special Circumstances for embedded Solr

2010-05-20 Thread Ken Krugler
Hi all, We'd started using embedded Solr back in 2007, via a patched version of the in-progress 1.3 code base. I recently was reading http://wiki.apache.org/solr/EmbeddedSolr, and wondered about the paragraph that said: The simplest, safest, way to use Solr is via Solr's standard HTTP

Re: How real-time are Solr/Lucene queries?

2010-05-20 Thread Walter Underwood
Solr is a very good engine, but it is not real-time. You can turn off the caches and reduce the delays, but it is fundamentally not real-time. I work at MarkLogic, and we have a real-time transactional search engine (and repository). If you are curious, contact me directly. I do like Solr for