Re: Building documents using content residing both in database tables and text files

2009-08-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
Isn't it better to make a jar of PlaintextEntityProcessor and drop it into solr.home/lib? On Tue, Aug 11, 2009 at 11:05 PM, Sascha Szottsz...@zib.de wrote: Hi Noble, Noble Paul wrote: isn't it possible to do this by having two datasources (one Jdbc and another File) and two entities. The

Re: DIH problem passing HTTP parameters into data-config

2009-08-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Aug 13, 2009 at 4:08 AM, Erik Hatchererik.hatc...@gmail.com wrote: My hunch, though I'll try to make some time to test this out thoroughly, is that the entity is parsed initially with variables resolved, but not per request.  Variables/expressions do get expanded for fields of course,

Re: DIH problem passing HTTP parameters into data-config

2009-08-13 Thread Shalin Shekhar Mangar
On Thu, Aug 13, 2009 at 3:08 AM, John Lowe jbl...@johnblowe.com wrote: Hmmm...perhaps my original note was a bit TL;DR. Trying again: The v1.3 docs say that one can pass one's own parameters in to DIH via the HTTP request: DIH in Solr 1.3 had a bug due to which request parameters in

Re: facet performance tips

2009-08-13 Thread Jérôme Etévé
Thanks everyone for your advice. I increased my filterCache, and the faceting performance improved greatly. My faceted field can currently have ~4 different terms, so I set a filterCache size of 5 and it works very well. However, I'm planning to increase the number of terms to

Questions about XPath in data import handler

2009-08-13 Thread Andrew Clegg
A couple of questions about the DIH XPath syntax... The docs say it supports: xpath=/a/b/subject[@qualifier='fullTitle'] xpath=/a/b/subject/@qualifier xpath=/a/b/c Does the second one mean select the value of the attribute called qualifier in the /a/b/subject element? e.g. For this

Re: Questions about XPath in data import handler

2009-08-13 Thread Andrew Clegg
Andrew Clegg wrote: subject qualifier=some text / Sorry, Nabble swallowed my XML example. That was supposed to be [a] [b] [subject qualifier=some text /] [/b] [/a] ... but in XML. Andrew.

Re: Questions about XPath in data import handler

2009-08-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Aug 13, 2009 at 6:35 PM, Andrew Cleggandrew.cl...@gmail.com wrote: A couple of questions about the DIH XPath syntax... The docs say it supports: xpath=/a/b/subject[@qualifier='fullTitle'] xpath=/a/b/subject/@qualifier xpath=/a/b/c Does the second one mean select the value

Re: Query with no cache without editing solrconfig?

2009-08-13 Thread Koji Sekiguchi
Jason Rutherglen wrote: Is there a way to do this via a URL? I think - no there isn't. Koji

Re: Distributed query returns time consumed by each Solr shard?

2009-08-13 Thread Grant Ingersoll
Not that I am aware of. I think there is a patch for timing out shards and returning partial results if a shard takes too long. I believe it is slated for 1.4, but it doesn't have any unit tests at the moment. On Aug 12, 2009, at 7:12 PM, Jason Rutherglen wrote: Is there a way to do

Re: Questions about XPath in data import handler

2009-08-13 Thread Andrew Clegg
Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: On Thu, Aug 13, 2009 at 6:35 PM, Andrew Cleggandrew.cl...@gmail.com wrote: Does the second one mean select the value of the attribute called qualifier in the /a/b/subject element? yes you are right. Isn't that the semantics of standard xpath
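
For readers following the XPath question, here is a minimal sketch of what the three expressions select under standard XPath semantics. It uses Python's xml.etree.ElementTree rather than DIH itself, so it only illustrates the semantics discussed in the thread and does not show how DIH evaluates them. The sample document is hypothetical, modelled on the [a] [b] [subject .../] example posted earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the thread); the first <subject>
# and the <c> element are added so that every expression has a match.
SAMPLE = """
<a>
  <b>
    <subject qualifier="fullTitle">Some title</subject>
    <subject qualifier="some text"/>
    <c>plain element</c>
  </b>
</a>
"""

root = ET.fromstring(SAMPLE)  # root is the <a> element

# /a/b/subject[@qualifier='fullTitle'] -> text of the matching element(s)
full_title = [e.text for e in root.findall("./b/subject[@qualifier='fullTitle']")]

# /a/b/subject/@qualifier -> value of the qualifier attribute.
# ElementTree's XPath subset cannot return attributes directly, so the
# elements are selected first and the attribute is read with .get().
qualifiers = [e.get("qualifier") for e in root.findall("./b/subject")]

# /a/b/c -> text of the <c> element
c_text = [e.text for e in root.findall("./b/c")]

print(full_title)
print(qualifiers)
print(c_text)
```

The second expression yields the attribute values themselves, which matches the reading confirmed in the reply above.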

I think this is a bug

2009-08-13 Thread Paul Tomblin
I don't want to join yet another mailing list or register for JIRA, but I just noticed that the Javadocs for SolrInputDocument.addField(String name, Object value, float boost) is incredibly wrong - it looks like it was copied from a deleteAll method. -- http://www.linkedin.com/in/paultomblin

Re: I think this is a bug

2009-08-13 Thread Chris Male
Hi Paul, Yes the comment does look very wrong. I'll open a JIRA issue and include a fix. On Thu, Aug 13, 2009 at 4:43 PM, Paul Tomblin ptomb...@xcski.com wrote: I don't want to join yet another mailing list or register for JIRA, but I just noticed that the Javadocs for

Curl error 26 failed creating formpost data

2009-08-13 Thread Kevin Miller
I am trying to use the curl command located on the Extracting Request Handler page of the Solr Wiki. I am using the command in the following way: curl http://echo12:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true -F myfi...@../../BadNews.doc echo12 is the
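
The archived copy of this command appears to have lost the separators between its query parameters. As a hedged illustration, the sketch below shows how such a URL is normally assembled, with each parameter joined by an ampersand. The parameter names are read from the message above; the value `attr_` for `uprefix` is an assumption, and `build_extract_url` is a hypothetical helper, not part of Solr or curl.

```python
from urllib.parse import urlencode


def build_extract_url(base, params):
    """Join query parameters with '&' and percent-encode their values."""
    return base + "?" + urlencode(params)


# Parameter names as they appear in the message; the uprefix value is assumed.
params = [
    ("literal.id", "doc1"),
    ("uprefix", "attr_"),
    ("fmap.content", "attr_content"),
    ("commit", "true"),
]

url = build_extract_url("http://echo12:8983/solr/update/extract", params)
print(url)
```

When a URL like this is passed to curl from a shell, it generally needs to be quoted, because an unquoted ampersand is interpreted by the shell rather than sent as part of the URL.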

RE: Using Lucene's payload in Solr

2009-08-13 Thread Ensdorf Ken
It looks like things have changed a bit since this subject was last brought up here. I see that there is support in Solr/Lucene for indexing payload data (DelimitedPayloadTokenFilterFactory and DelimitedPayloadTokenFilter). Overriding the Similarity class is straightforward. So

RE: [OT] Solr Webinar

2009-08-13 Thread Chenini, Mohamed
I also registered to attend but I am not going to because here at work a last minute meeting has been scheduled at the same time. Is it possible in the future to schedule such webinars starting 5-6 PM ET? Thanks, Mohamed -Original Message- From: Grant Ingersoll

Re: Using Lucene's payload in Solr

2009-08-13 Thread Bill Au
Thanks for the tip on BFTQ. I have been using a nightly build before that was committed. I have upgraded to the latest nightly build and will use that instead of BTQ. I got DelimitedPayloadTokenFilter to work and see that the terms and payload of the field are correct but the delimiter and

Boosting relevance as terms get nearer to each other

2009-08-13 Thread Michael _
Hello, I'd like to score documents higher that have the user's search terms nearer each other. For example, if a user searches for a AND b AND c the standard query handler should return all documents with [a] [b] and [c] in them, but documents matching the phrase a b c should get a boost over
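
One possible way to express this in Lucene query syntax is to keep the required terms and add an optional sloppy phrase clause with a boost. The sketch below only builds the query string. It is an illustration under stated assumptions, not the answer given in the thread: `proximity_boosted_query` is a hypothetical helper, the slop and boost values are arbitrary, and the terms are assumed to be plain words that need no escaping.

```python
def proximity_boosted_query(terms, slop=10, boost=5.0):
    """Build a query that requires every term and adds a boosted sloppy phrase.

    Assumes the terms are plain words with no query-syntax characters.
    """
    required = " AND ".join(terms)           # every term must match
    phrase = '"' + " ".join(terms) + '"'     # the same terms as a phrase
    # The phrase clause is optional: it matches no additional documents,
    # but contributes extra score when the terms occur near each other.
    return f"({required}) OR {phrase}~{slop}^{boost}"


query = proximity_boosted_query(["a", "b", "c"])
print(query)
```

Because any document matching the sloppy phrase also contains all three terms, the result set stays the same as for the plain AND query and only the ranking changes.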

Re: Using Lucene's payload in Solr

2009-08-13 Thread Grant Ingersoll
On Aug 13, 2009, at 11:58 AM, Bill Au wrote: Thanks for the tip on BFTQ. I have been using a nightly build before that was committed. I have upgraded to the latest nightly build and will use that instead of BTQ. I got DelimitedPayloadTokenFilter to work and see that the terms and

Issue with Collection Distribution

2009-08-13 Thread william pink
Hello, I am having a few problems with the snapinstaller/commit on the slave. I have a pull_from_master script, which is the following: #!/bin/bash cd /opt/solr/solr/bin -v ./snappuller -v -P 18983 ./snapinstaller -v I have been executing snapshooter manually on the master then running the above

Re: Solr 1.4 Clustering / mlt AS search?

2009-08-13 Thread Stanislaw Osinski
Hi, On Tue, Aug 11, 2009 at 22:19, Mark Bennett mbenn...@ideaeng.com wrote: Carrot2 has several pluggable algorithms to choose from, though I have no evidence that they're better than Lucene's. Where TF/IDF is sort of a one step algebraic calculation, some clustering algorithms use iterative

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
I took 1.4 from trunk three days ago, it seems Ok for production (at least for my Master instance which is doing writes-only). I use the same config files. 500 000 terms are Ok too; I am using several millions with pre-1.3 SOLR taken from trunk. However, do not try to facet (probably outdated

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
It seems BOBO-Browse is an alternate faceting engine; would be interesting to compare performance with SOLR... Distributed? -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: August-12-09 6:12 PM To: solr-user@lucene.apache.org Subject: Re: facet

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
Interesting, it has BoboRequestHandler implements SolrRequestHandler - easy to try it; and shards support [Fuad Efendi] It seems BOBO-Browse is an alternate faceting engine; would be interesting to compare performance with SOLR... Distributed? [Jason Rutherglen] For your fields with many terms

Re: Using Lucene's payload in Solr

2009-08-13 Thread Bill Au
I need to boost a field differently according to the content of the field. Here is an example: doc field name=nameSolr/field field name=category payload=3.0information retrieval/category field name=category payload=2.0webapp/category field name=category payload=2.0java/category field

Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
Yeah we need a performance comparison, I haven't had time to put one together. If/when I do I'll compare Bobo performance against Solr bitset intersection based facets, compare memory consumption. For near realtime Solr needs to cache and merge bitsets at the SegmentReader level, and Bobo needs

Re: Issue with Collection Distribution

2009-08-13 Thread Bill Au
Have you checked the Solr log on the slave to see if there was any commit done? It looks to me like you are still using an older version of the commit script that is not compatible with the newer Solr response format. If that's the case, the commit was actually performed. It is just that the script

RE: JVM Heap utilization Memory leaks with Solr

2009-08-13 Thread Fuad Efendi
Most OutOfMemoryException (if not 100%) happening with SOLR are because of http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/FieldCache.html - it is used internally in Lucene to cache Field value and document ID. My very long-term observations: SOLR can run without any problems

Re: Solr 1.4 Clustering / mlt AS search?

2009-08-13 Thread Mark Bennett
* mlb: comments On Thu, Aug 13, 2009 at 9:39 AM, Stanislaw Osinski stac...@gmail.com wrote: Hi, On Tue, Aug 11, 2009 at 22:19, Mark Bennett mbenn...@ideaeng.com wrote: Carrot2 has several pluggable algorithms to choose from, though I have no evidence that they're better than Lucene's.

RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-13 Thread Fuad Efendi
UPDATE: I have 100,000,000 new documents in 24 hours, including possible updates OR possibly adding same document several times. I have two segments now (30Gb total), and network is overloaded (I use web crawler to generate documents). I never had more than 25,000,000 within a month before... I

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
SOLR-1.4-trunk uses terms counting instead of bitset intersects (seems to be); check this http://issues.apache.org/jira/browse/SOLR-475 (and probably http://issues.apache.org/jira/browse/SOLR-711) -Original Message- From: Jason Rutherglen Yeah we need a performance comparison, I haven't

Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
Right, I haven't used SOLR-475 yet and am more familiar with Bobo. I believe there are differences but I haven't gone into them yet. As I'm using Solr 1.4 now, maybe I'll test the UnInvertedField modality. Feel free to report back results as I don't think I've seen much yet? On Thu, Aug 13, 2009

Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-13 Thread Grant Ingersoll
BTW, what version of Solr are you on? On Aug 13, 2009, at 1:43 PM, Fuad Efendi wrote: UPDATE: I have 100,000,000 new documents in 24 hours, including possible updates OR possibly adding same document several times. I have two segments now (30Gb total), and network is overloaded (I use web

Re: Solr 1.4 Clustering / mlt AS search?

2009-08-13 Thread Grant Ingersoll
On Aug 13, 2009, at 1:29 PM, Mark Bennett wrote: * mlb: comments On Thu, Aug 13, 2009 at 9:39 AM, Stanislaw Osinski stac...@gmail.com wrote: Hi, On Tue, Aug 11, 2009 at 22:19, Mark Bennett mbenn...@ideaeng.com wrote: Carrot2 has several pluggable algorithms to choose from, though I

HTTP ERROR: 500 No default field name specified

2009-08-13 Thread Kevin Miller
I have a different error once I direct the curl to look in the correct folder for the file. I am getting an HTTP ERROR: 500 No default field name specified. I am using a test word document in the exampledocs folder. I am issuing the curl command from the exampledocs folder. Following is the

RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-13 Thread Fuad Efendi
I upgraded master to 1.4-dev from trunk 3 days ago. BTW, such performance broke my commodity hardware, most probably the network card... can't SSH to check stats; need to check onsite what happened... -Original Message- From: Grant Ingersoll Sent: August-13-09 4:20 PM To:

Re: Facets with an IDF concept

2009-08-13 Thread wojtekpia
Hi Asif, Did you end up implementing this as a custom sort order for facets? I'm facing a similar problem, but not related to time. Given 2 terms: A: appears twice in half the search results; B: appears once in every search result. I think term A is more interesting. Using facets sorted by

Re: Lock timed out 2 worker running

2009-08-13 Thread renz052496
Yes, I misunderstood your question (re: the crash). Solr did not crash, but we shut down the JVM (Tomcat) gracefully after we killed all our workers. But upon restarting, Solr just throws the error. Regards, /Renz 2009/8/11 Chris Hostetter hossman_luc...@fucit.org : 5) are these errors

Re: [OT] Solr Webinar

2009-08-13 Thread Lukáš Vlček
Hello, they [Lucid Imagination guys] said it should be published on their blog. I hope I understood it correctly. Regards, Lukas http://blog.lukas-vlcek.com/ On Fri, Aug 14, 2009 at 7:52 AM, Mani Kumar manikumarchau...@gmail.com wrote: if anyone has any pointer to this webinar, please share

Re: [OT] Solr Webinar

2009-08-13 Thread Mani Kumar
if anyone has any pointer to this webinar, please share it. thanks! mani On Thu, Aug 13, 2009 at 9:26 PM, Chenini, Mohamed mchen...@geico.com wrote: I also registered to attend but I am not going to because here at work a last minute meeting has been scheduled at the same time. Is it possible