Re: Seattle Hadoop/Scalability/NoSQL Meetup Tonight!

2010-02-25 Thread Bradford Stephens
Thanks for coming, everyone! We had around 25 people. A *huge* success, for Seattle. And a big thanks to 10gen for sending Richard. Can't wait to see you all next month. On Wed, Feb 24, 2010 at 2:15 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: The Seattle Hadoop/Scalability/NoSQL

Is it possible to disable fieldNorm?

2010-02-25 Thread Jason Chaffee
I would like to either disable fieldNorm in the scoring or make sure that it is the same for all documents. I am creating EdgeNGrams and that can cause the number of terms for a document to be variable, but I do not want it to affect the scoring for this field. Is there an easy way to do
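The effect described in this question can be sketched roughly as follows. This is a minimal illustration, assuming the default Lucene length normalisation of 1/sqrt(number of terms) and ignoring index-time boosts and the lossy single-byte encoding of norms; `edge_ngrams` and `length_norm` are hypothetical helper names, not Solr APIs.

```python
import math


def edge_ngrams(token, min_gram=1, max_gram=25):
    """Return the front-anchored n-grams of a token, shortest first."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def length_norm(num_terms):
    """Length normalisation as 1/sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)


# Longer words produce more grams, hence a smaller norm for the same field.
for word in ("solr", "lucene"):
    grams = edge_ngrams(word)
    print(word, len(grams), round(length_norm(len(grams)), 3))
```

Solr's schema.xml does have an `omitNorms` attribute on fields and field types; whether that was the answer given here is not visible in the truncated thread.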

Warning : no lockType configured for...

2010-02-25 Thread Mani EZZAT
Hello, I get this warning even though (I think) everything is set correctly. I'm dynamically creating cores with a new index, using the same schema and solrconfig.xml. I looked at the Solr code (SolrCore, Config, SolrConfig, SolrIndexWriter, etc.) and everything seems fine to me. The log

Re: Autosuggest/Autocomplete with solr 1.4 and EdgeNGrams

2010-02-25 Thread Sachin
Hello Joe, The WhitespaceTokenizerFactory seems to have done the trick. For now I will keep it like this and monitor closely to see if there are any performance implications of using EdgeNGrams, but so far this works like a charm. Thanks! -Original Message- From: Joe

Re: If you could have one feature in Solr...

2010-02-25 Thread Stefano Cherchi
Grant, I'm not a Java developer but a sysadmin, and I've been struggling for a couple of months now to build a full web search engine stack based on hadoop + nutch + solr. I don't know much about the documentation for developers, so I trust you if you say it's good. What I do know is that I

Re: Seattle Hadoop/Scalability/NoSQL Meetup Tonight!

2010-02-25 Thread Tim Terlegård
2010/2/25 Bradford Stephens bradfordsteph...@gmail.com: Thanks for coming, everyone! We had around 25 people. A *huge* success, for Seattle. And a big thanks to 10gen for sending Richard. Can't wait to see you all next month. Did anyone record the event? /Tim

unexpected result using OR in query

2010-02-25 Thread György Frivolt
Hi, I ran into an unexpected behaviour in Solr with query parsing. I need to fetch articles which contain several expressions. However, I noticed the following behaviour: - when I fetch results for query A I get a number of results X - for query B I get a number of results Y - for query A B

Re: If you could have one feature in Solr...

2010-02-25 Thread Robert Muir
Gora, have you tried the Hindi Analyzer in lucene? if you add it to lucene, the results exceed at least everything from FIRE 2008. So I don't really understand where you are getting this information! Actually, the state of the art for NLP in Indian languages is quite poor, at least in the

Re: SolrJ commit options

2010-02-25 Thread Shalin Shekhar Mangar
On Thu, Feb 25, 2010 at 5:34 PM, gunjan_versata gunjanga...@gmail.com wrote: We are using SolrJ to handle commits to our Solr server. All runs fine, but whenever the commit happens, the server becomes slow and stops responding, thereby resulting in TimeOut errors on our production. We are

Re: If you could have one feature in Solr...

2010-02-25 Thread Gora Mohanty
On Thu, 25 Feb 2010 07:37:33 -0500 Robert Muir rcm...@gmail.com wrote: Gora, have you tried the Hindi Analyzer in lucene? if you add it to lucene, the results exceed at least everything from FIRE 2008. [...] Oh! No, sorry, I haven't. So far, I have only looked at search through Solr, and I

Removing duplicate values from multivalued fields

2010-02-25 Thread muneeb
Hi, Is there a way to remove duplicate values from the multivalued fields? I am using Solrj client with solr 1.4 version. Thanks in advance, -Ali

Re: unexpected result using OR in query

2010-02-25 Thread Ahmet Arslan
--- On Thu, 2/25/10, György Frivolt gyorgy.friv...@gmail.com wrote: I ran into an unexpected behaviour in Solr with query parsing. I need to fetch articles which contain several expressions. However, I noticed the following behaviour: - when I fetch results for query A I get a number

Re: Is it possible to disable fieldNorm?

2010-02-25 Thread Ahmet Arslan
I would like to either disable fieldNorm in the scoring or make sure that it is the same for all documents. I am creating EdgeNGrams and that can cause the number of terms for a document to be variable, but I do not want it to affect the scoring for this field. Is there an easy way to do

Solr Extract

2010-02-25 Thread Lee Smith
Hey All, I am having a go at extracting some files as per the wiki guide. I cd to the root directory of the folder and run the command with no success apart from some broken HTML. If you see this here: http://screencast.com/t/MGRiZTU5M it might help to understand what I'm doing wrong. hope

Re: CoreAdmin

2010-02-25 Thread Siddhant Goel
Hi, Did you *really* go through this page - http://wiki.apache.org/solr/CoreAdmin ? On Thu, Feb 25, 2010 at 7:40 PM, Sudhakar_Thangavel reactive...@yahoo.com wrote: Hi, I am new to Solr. I am not getting it clearly from the wiki. Can anyone tell me how to configure CoreAdmin? I need step by step

Schema configurations for setting non-case sensitive search and matching partial word in a search string

2010-02-25 Thread Turner, Robbin J
This is probably stated somewhere, but I've looked and am obviously missing it. Is there a specific field type analyzer or setting for case sensitivity? And something on the WordDelimiterFilterFactory to allow for a match if it's not the whole word in the query, such as animal will match animals?

Re: If you could have one feature in Solr...

2010-02-25 Thread Gora Mohanty
On Thu, 25 Feb 2010 07:54:06 -0500 Robert Muir rcm...@gmail.com wrote: Gora, I wonder perhaps if there is a documentation issue. e.g. Thai, Arabic, Chinese were mentioned here previously, these are all supported, too. Let me know if you have any ideas! Sorry, are you saying that these

Re: sorting

2010-02-25 Thread Claudio Martella
Chris Hostetter wrote: : <str name="bf">title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8</str> : <str name="qf">title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8</str> FWIW: I don't think you understand what the bf param is for ... it's not analogous to qf and pf, it's for expressing a list

Issues with spell checker

2010-02-25 Thread cjkadakia
First of all, I want to thank you guys for your help thus far. It's been very useful during my search-engine integration project. :) So I'm having a few issues with the spell checking component of Solr. For reference, I'm using Solr 1.4. First, I'm not getting any search results period. Here's

new/first searcher

2010-02-25 Thread solrquestion6
Hi, Is it the wrong approach to have the same warmup queries in both new and first searcher? The wiki shows a sorting query for the newSearcher and the same sorting query plus facet/filter queries for the firstSearcher.

Re: Schema configurations for setting non-case sensitive search and matching partial word in a search string

2010-02-25 Thread Erick Erickson
Pipe things through LowerCaseFilterFactory to turn everything into lower case, at both index and query time, assuming you want to perform caseless matches. Depending on the behavior you want as far as partial word matches, it depends (tm). The specific example you cite would be handled by
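The point about lowercasing at both index and query time can be sketched as below. This is a toy inverted index, not Solr's analysis chain: `analyze`, `build_index`, and `search` are hypothetical names, and whitespace tokenization is an assumption made for the example.

```python
def analyze(text):
    """Whitespace-tokenize and lowercase; the same chain serves index and query."""
    return [token.lower() for token in text.split()]


def build_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every analyzed query term."""
    postings = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*postings) if postings else set()


docs = {1: "Animal Farm", 2: "wild ANIMALS of Africa"}
index = build_index(docs)
print(search(index, "ANIMAL farm"))  # case differences no longer matter
print(search(index, "animal"))       # but "animals" is still a different term
```

Lowercasing alone does not make animal match animals; that needs a further analysis step, which the reply goes on to name after the point where this snippet is cut off.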

Delta Query - DIH

2010-02-25 Thread JavaGuy84
Hi, My data config looks like below, entity x query= select * from x entity y query=select * from y where id=x.id entity y query=select * from z where id=x.id I am able to successfully run the Full-Import query without any issue. I am not sure how can I implement a delta query as each of

DIH Issue in Delta Query

2010-02-25 Thread JavaGuy84
Hi, My data config looks like below doc entity x query= select * from x entity y query=select * from y where id=x.id entity z query=select * from z where id=x.id doc I am able to successfully run the Full-Import query without any issue. I am not sure how can I implement a

Re: Performance issue in indexing the data with DIH when using subqueries

2010-02-25 Thread JavaGuy84
Thanks a lot Shalin.. This resolved my issue :). Thanks, Barani Shalin Shekhar Mangar wrote: On Tue, Feb 23, 2010 at 1:01 AM, JavaGuy84 bbar...@gmail.com wrote: Hi, I am facing a performance issue when I am trying to index the data using DIH.. I have a model as below Tables

Re: Extended stats via JMX

2010-02-25 Thread Shalin Shekhar Mangar
On Thu, Feb 25, 2010 at 10:56 AM, Dan Trainor dtrai...@toolbox.com wrote: Right now, being inexperienced with JMX and all, I was wondering if there was a way to pull all Solr-specific items out of there. I see some general counters pertaining to each of my Solr instances, but nothing along

Re: If you could have one feature in Solr...

2010-02-25 Thread Ron Mayer
Erik Hatcher wrote: Ron - I think SOLR-792 meets the need you describe. What do you think? It's tree faceting, allowing you to facet down 2 levels deep arbitrarily on any two fields. Ideally we'd enhance it to be of arbitrary depth too. Nice! It certainly handles my main use case. There

Re: If you could have one feature in Solr...

2010-02-25 Thread Gora Mohanty
On Thu, 25 Feb 2010 13:06:03 -0500 Robert Muir rcm...@gmail.com wrote: Yeah, Thai and Arabic have the stuff in Solr 1.4 For Chinese, if you want to do CJK bigram indexing, this is there too. If you want to do word-based smart indexing, you need to add an additional jar file to your classpath.

Re: If you could have one feature in Solr...

2010-02-25 Thread Shawn Heisey
I would like to be able to do a delta import on arbitrary data, not a last modified date. Specifically, our database has an auto_increment field called DID, or document identifier. For changes to existing data, this field is updated anytime a row is changed in any way, effectively turning it
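The idea of a delta import keyed on DID rather than a timestamp could look roughly like this. It is only a sketch under stated assumptions: the `docs` table, its columns, and `fetch_delta` are hypothetical, and it assumes every change moves a row's DID above the current maximum, which the truncated message suggests but does not spell out. It uses SQLite only to stay self-contained and does not involve DIH itself.

```python
import sqlite3


def fetch_delta(conn, last_did):
    """Return rows with DID above last_did, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT did, title FROM docs WHERE did > ? ORDER BY did", (last_did,)
    ).fetchall()
    new_mark = rows[-1][0] if rows else last_did
    return rows, new_mark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (did INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

rows, mark = fetch_delta(conn, 0)        # first run picks up everything
# Simulate an update that bumps the row's DID past the current maximum.
conn.execute("UPDATE docs SET did = 4, title = 'b2' WHERE did = 2")
changed, mark = fetch_delta(conn, mark)  # only the touched row comes back
print(changed, mark)
```

The importer only has to remember one integer between runs, instead of comparing modification dates.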

Re: If you could have one feature in Solr...

2010-02-25 Thread Robert Muir
Yeah, Thai and Arabic have the stuff in Solr 1.4 For Chinese, if you want to do CJK bigram indexing, this is there too. If you want to do word-based smart indexing, you need to add an additional jar file to your classpath. we can add a wiki page with examples of how to use these maybe to make it

Re: Extended stats via JMX

2010-02-25 Thread Matthew Runo
https://issues.apache.org/jira/browse/SOLR-1750 might help you, since I don't think that all of stats.jsp is exposed via MBeans. I could be wrong about that though.. (apologies, our solr servers are firewalled and I can't connect via JMX at the moment) Thanks for your time! Matthew Runo

Solr Cell RTF Woes

2010-02-25 Thread Bill Engle
Any RTF file I tried to index in Solr 1.4 throws these errors out. I have no issues with doc, pdf. Any thoughts? Thanks. htmlheadtitleApache Tomcat/6.0.18 - Error report/titlestyle!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2

Re: If you could have one feature in Solr...

2010-02-25 Thread Erik Hatcher
Ron - I think SOLR-792 meets the need you describe. What do you think? It's tree faceting, allowing you to facet down 2 levels deep arbitrarily on any two fields. Ideally we'd enhance it to be of arbitrary depth too. Erik On Feb 24, 2010, at 6:40 PM, Ron Mayer wrote: Another

Solr 1.4 distributed search configuration

2010-02-25 Thread Jeffrey Zhao
How do I define a new search handler with a shards parameter? I defined it in the following way but it doesn't work. If I put the shards parameter in default handler, it seems I got an infinite loop. requestHandler name=standard class=solr.SearchHandler default=true !-- default values for query

Re: Solr 1.4 distributed search configuration

2010-02-25 Thread Mark Miller
On 02/25/2010 03:32 PM, Jeffrey Zhao wrote: How do I define a new search handler with a shards parameter? I defined it in the following way but it doesn't work. If I put the shards parameter in default handler, it seems I got an infinite loop. requestHandler name=standard class=solr.SearchHandler

Re: Solr 1.4 distributed search configuration

2010-02-25 Thread Jeffrey Zhao
Hi Mark, Thanks for your reply. I did make a new handler as follows, but it does not work. Is anything wrong with my configuration? Thanks, requestHandler name=search class=solr.SearchHandler !-- default values for query parameters -- lst name=defaults str

Re: If you could have one feature in Solr...

2010-02-25 Thread Smiley, David W.
1. Spatial search 2. Ease of managing a sharded index, multi-server Solr instance. I am aware these are in-progress, slated for Solr 1.5. I may find myself getting involved on these shortly because I'm working on a very large scale search project requiring both. ~ David On Feb 24, 2010, at

How to use dismax and boosting properly?

2010-02-25 Thread Jason Chaffee
I am using dismax and I have configured it to search 3 different fields with one field getting an extra boost so that the results of that field are at the top of result set. Then, I sort the results by another field to get the ordering. My problem is that the scores are being skewed by the

Re: Solr 1.4 distributed search configuration

2010-02-25 Thread Mark Miller
Can you elaborate on "doesn't work" when you put it in the /search handler? You get an error in the logs? Nothing happens? On 02/25/2010 03:47 PM, Jeffrey Zhao wrote: Hi Mark, Thanks for your reply. I did make a new handler as follows, but it does not work. Is anything wrong with my

Advice on deployment

2010-02-25 Thread Shawn Heisey
We are currently using a commercial indexing product based on Lucene for our indexing needs, and would like to replace it with SOLR. The source database for this system has 40 million records, growing by about 30,000 items per day. It is a repository for all the metadata relating to an

RE: How to use dismax and boosting properly?

2010-02-25 Thread Nagelberg, Kallin
Try setting the boost to 0 for the fields you don't want to contribute to the score. Kallin Nagelberg -Original Message- From: Jason Chaffee [mailto:jchaf...@ebates.com] Sent: Thursday, February 25, 2010 4:03 PM To: solr-user@lucene.apache.org Subject: How to use dismax and boosting

Re: Seattle Hadoop/Scalability/NoSQL Meetup Tonight!

2010-02-25 Thread Nick Dimiduk
Not that I'm aware of. 2010/2/25 Tim Terlegård tim.terleg...@gmail.com 2010/2/25 Bradford Stephens bradfordsteph...@gmail.com: Thanks for coming, everyone! We had around 25 people. A *huge* success, for Seattle. And a big thanks to 10gen for sending Richard. Can't wait to see you all

RE: How to use dismax and boosting properly?

2010-02-25 Thread Jason Chaffee
I thought I tried that, but I guess I didn't restart Solr to pick up the configuration. That did the trick. Thanks! -Original Message- From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] Sent: Thursday, February 25, 2010 1:10 PM To: 'solr-user@lucene.apache.org' Subject: RE:

RE: Free Webinar: Mastering Solr 1.4 with Yonik Seeley

2010-02-25 Thread Bernadette Houghton
Yonik, can you please advise whether this event will be recorded and available for later download? (It starts 5am our time ;-) ) Regards Bern -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Thursday, 25 February 2010 10:23 AM To:

Re: new/first searcher

2010-02-25 Thread Otis Gospodnetic
Hi, There is nothing wrong with using the same query for both events. As a matter of fact, it makes sense to use the same (type of) query for both events. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original

Re: Index size

2010-02-25 Thread Otis Gospodnetic
It depends on many factors - how big those docs are (compare a tweet to a news article to a book chapter), whether you store the data or just index it, whether you compress it, how and how much you analyze the data, etc. Otis

Re: Strange search behavior

2010-02-25 Thread Otis Gospodnetic
Jan, If you go to Solr Admin Analysis page and enter your problematic query, what do you see? Otis - Original Message From: Jan Simon Winkelmann

HTTP ERROR: 404 missing core name in path after integrating nutch

2010-02-25 Thread Ian M. Evans
Hi everyone, Last night I was able to get solr up and running. Ran and was able to access: http://localhost:8983/solr/admin This morning, I started on the nutch crawling instructions over at: http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ After adding the following to

Re: Using XSLT with DIH for a URLDataSource

2010-02-25 Thread Lance Norskog
There could be a common 'open a URL' utility method. This would help make the DIH components consistent. 2010/2/24 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com: you are right. The StreamSource class is not throwing the proper exception. Do we really have to handle this? On Thu, Feb 25,

Re: Delta Query - DIH

2010-02-25 Thread Lance Norskog
It may be easier to understand the problem if you create views for the full- and delta-import queries. On Thu, Feb 25, 2010 at 9:09 AM, JavaGuy84 bbar...@gmail.com wrote: Hi, My data config looks like below, entity x query= select * from x entity y query=select * from y where id=x.id

SOLR Multivalued field and length norm

2010-02-25 Thread Pooja Verlani
Hi, I understand if I query on a multivalued field, length norm takes the total length of the multivalued field. Is it possible to use the length of only the particular value in the array of multivalued field? It would be easier and more efficient in searching then. Regards, Pooja

Re: If you could have one feature in Solr...

2010-02-25 Thread Lance Norskog
Error messages that make sense. I have to read the source far too often when a simple change to error handling would make some feature easy to use. If I want to read Java I'll use Lucene! Passive-aggressive error handling is a related problem: when I do something nonsensical I too often get 0

Changing term frequency according to value of one of the fields

2010-02-25 Thread Pooja Verlani
Hi, I want to modify the Similarity class for my app like the following: right now tf is Math.sqrt(termFrequency). I would like to modify it to Math.sqrt(termFrequency/solrDoc.getFieldValue(count)) where count is one of the fields in the particular solr document. Is it possible to do so? Can I import
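The two formulas in this question can be compared with a small worked example. This is plain arithmetic in Python, not a Lucene Similarity subclass; `tf_default` and `tf_scaled` are hypothetical names, and treating `count` as a positive number stored on the document is an assumption.

```python
import math


def tf_default(term_frequency):
    """Term-frequency factor as quoted in the message: sqrt(freq)."""
    return math.sqrt(term_frequency)


def tf_scaled(term_frequency, count):
    """Proposed variant: sqrt(freq / count), with count read from the document."""
    if count <= 0:
        raise ValueError("count must be positive")
    return math.sqrt(term_frequency / count)


print(tf_default(8))    # unscaled factor for a term seen 8 times
print(tf_scaled(8, 2))  # same term, document-level count of 2
```

With `count` equal to 1 the two factors coincide, so the proposal only changes scores for documents whose `count` differs from 1. Whether a per-document field value is reachable from inside the Similarity class is the open question of the thread, and this sketch does not answer it.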

Re: Migrating to Solr

2010-02-25 Thread Bernd Fehling
Hi list, is it true that no downloaded copy of the documentprocessor is available anywhere? Regards, Bernd Bernd Fehling wrote: Was anyone able to get a copy of: http://sesat.no/svn/sesat-documentprocessor/ Unfortunately it is offline. Would be pleased to get a copy. Regards, Bernd