data import scheduling

2010-11-11 Thread Tri Nguyen
Hi, Has anyone gotten solr to schedule data imports at a certain time interval through configuring solr? I tried setting interval=1, which is import every minute but I don't see it happening. I'm trying to avoid cron jobs. Thanks, Tri
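For readers who end up triggering imports from outside Solr, the following is a minimal Python sketch of a periodic trigger that does not rely on cron. It assumes the DataImportHandler is registered at the conventional `/dataimport` path and uses the `command` request parameter; the base URL and the helper names `dataimport_url`, `trigger_import` and `run_every` are hypothetical and only for illustration.

```python
import time
import urllib.parse
import urllib.request


def dataimport_url(base_url, command="delta-import"):
    """Build the DataImportHandler URL (handler path is an assumption)."""
    query = urllib.parse.urlencode({"command": command})
    return f"{base_url.rstrip('/')}/dataimport?{query}"


def trigger_import(base_url, command="delta-import"):
    """Send one import request and return the HTTP status code."""
    with urllib.request.urlopen(dataimport_url(base_url, command)) as response:
        return response.status


def run_every(interval_seconds, action, max_runs=None, sleep=time.sleep):
    """Call action repeatedly, sleeping between calls; return the run count."""
    runs = 0
    while max_runs is None or runs < max_runs:
        action()
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

A call such as `run_every(60, lambda: trigger_import("http://localhost:8983/solr"))` would then request an import once a minute, provided a Solr instance is actually listening at that placeholder address.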

Re: solr dynamic core creation

2010-11-11 Thread nizan
Does anyone have any idea on how to do this? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-dynamic-core-creation-tp1867705p1881374.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: To cache or to not cache

2010-11-11 Thread Em
Jonathan, thanks for your statement. In fact, you are quite right: A lot of people developed great caching mechanisms. However, the solution I had in mind was something like an HTTP-Cache - in most cases on the same box. I talked to some experts who told me that Squid would be a relatively

Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-11 Thread Jakub Godawa
Hi! Sorry for such a break, but I was moving house... anyway: 1. I took the ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java file and modified it (named as StempelFilterFactory.java) in Vim that way: package org.getopt.solr.analysis; import

Error while indexing files with Solr

2010-11-11 Thread Kaustuv Royburman
Hi, I am trying to index documents (PDF, Doc, XLS, RTF) using the ExtractingRequestHandler. I am following the tutorial at http://wiki.apache.org/solr/ExtractingRequestHandler But when I run the following command *curl

index just new articles from rss feeds - Data Import Request Handler

2010-11-11 Thread Matteo Moci
Hello, I'd like to use solr to index some documents coming from an rss feed, like the example at [1], but it seems that the configuration used there is just for a one-time indexing, trying to get all the articles exposed in the rss feed of the website. Is it possible to manage and index just the

IndexTank technology...

2010-11-11 Thread Glen Newton
Does anyone know what technology they are using: http://www.indextank.com/ Is it Lucene under the hood? Thanks, and apologies for cross-posting. -Glen http://zzzoot.blogspot.com

solr 1.3 how to parse rich documents

2010-11-11 Thread Nikola Garafolic
Hi, I use Solr 1.3 with the patch for parsing rich documents, and when uploading, for example, a PDF file, the only thing I see in solr.log is the following: INFO: [] webapp=/solr path=/update/rich

Re: Adding new field after data is already indexed

2010-11-11 Thread Erick Erickson
@Jerry Li What version of Solr were you using? And was there any data in the new field? I have no problems here with a quick test I ran on trunk... Best Erick On Thu, Nov 11, 2010 at 1:37 AM, Jerry Li | 李宗杰 zongjie...@gmail.com wrote: but if I use this field to do sorting, there will be an

Re: solr dynamic core creation

2010-11-11 Thread Robert Sandiford
Hi, nizan. I didn't realize that just replying to a thread from my email client wouldn't get back to you. Here's some info on this thread since your original post: On Nov 10, 2010, at 12:30pm, Bob Sandiford wrote: Why not use replication? Call it inexperience... We're really early into

Issue with facet fields

2010-11-11 Thread gauravshetti
I am facing this weird issue in facet fields. Within the config XML, under <requestHandler name="standard" class="solr.SearchHandler"> <!-- default values for query parameters --> <lst name="defaults"> I have defined the fl as <str name="fl">file_id folder_id display_name file_name priority_text

Re: WELCOME to solr-user@lucene.apache.org

2010-11-11 Thread Solr User
Hi, I have a question about boosting. I have the following fields in my schema.xml: 1. title 2. description 3. ISBN etc I want to boost the field title. I tried index time boosting but it did not work. I also tried Query time boosting but with no luck. Can someone help me on how to implement

Boosting

2010-11-11 Thread Solr User
Hi, I have a question about boosting. I have the following fields in my schema.xml: 1. title 2. description 3. ISBN etc I want to boost the field title. I tried index time boosting but it did not work. I also tried Query time boosting but with no luck. Can someone help me on how to implement

Re: solr dynamic core creation

2010-11-11 Thread nizan
Hi, Thanks for the suggestions, I'll take a deeper look into them. In the suggestions you showed me, if I understand correctly, the call for creation is made on the client side. I need a mechanism that works on the server side. I know it sounds stupid, but I need the client side not to know about

problem with wildcard

2010-11-11 Thread Jean-Sebastien Vachon
Hi All, I'm having some trouble with a query using some wildcard and I was wondering if anyone could tell me why these two similar queries do not return the same number of results. Basically, the query I'm making should return all docs whose title starts with (or contains) the string lowe'. I

Re: solr dynamic core creation

2010-11-11 Thread Robert Sandiford
Hmmm. Maybe you need to define what you mean by 'server' and what you mean by 'client'.

Re: solr dynamic core creation

2010-11-11 Thread nizan
Hi, Maybe I just don't understand the whole concept there and I mix up server and client... Client - The place where I make the http calls (for index, search etc.) - where I use the CommonsHttpSolrServer as the solr server. This machine isn't defined as master or slave, it just uses solr as search

Re: Crawling with nutch and mapping fields to solr

2010-11-11 Thread Jean-Luc
I'm going down the route of patching nutch so I can use this ParseMetaTags plugin: https://issues.apache.org/jira/browse/NUTCH-809 Also wondering whether I will be able to use the XMLParser to allow me to parse well formed XHTML, using xpath would be bonus:

EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
Hi, consider the following fieldtype (used for autocompletion): <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter

Re: solr dynamic core creation

2010-11-11 Thread Robert Sandiford
No - in reading what you just wrote, and what you originally wrote, I think the misunderstanding was mine, based on the architecture of my code. In my code, it is our 'server' level that does the SolrJ indexing calls, but you meant 'server' to be the Solr instance, and what you mean by 'client'

Re: Any Copy Field Caveats?

2010-11-11 Thread Tod
I've noticed that using camelCase in field names causes problems. On 11/5/2010 11:02 AM, Will Milspec wrote: Hi all, we're moving from an old lucene version to solr and plan to use the Copy Field functionality. Previously we had rolled our own implementation, sticking title, description,

Re: Concatenate multiple tokens into one

2010-11-11 Thread Nick Martin
Hi Robert, All, I have a similar problem, here is my fieldType, http://paste.pocoo.org/show/289910/ I want to include stopword removal and lowercase the incoming terms. The idea being to take "Foo Bar Baz Ltd" and turn it into "foobarbaz" for the EdgeNGram filter factory. If anyone can tell me a
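The transformation described above can be sketched in a few lines of Python. This is only an illustration of the intended result, not a Solr filter; it assumes that "ltd" is in the stopword list, and `concat_tokens` is a hypothetical helper name.

```python
def concat_tokens(text, stopwords):
    """Lowercase the whitespace-separated tokens, drop stopwords, join the rest."""
    tokens = [token.lower() for token in text.split()]
    kept = [token for token in tokens if token not in stopwords]
    return "".join(kept)


# Assumed stopword list for the example; a real list would come from stopwords.txt.
EXAMPLE_STOPWORDS = {"ltd"}
joined = concat_tokens("Foo Bar Baz Ltd", EXAMPLE_STOPWORDS)
```

The single joined term is what would then be handed to the edge n-gram step, so that prefixes are taken across the original word boundaries.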

Rollback can't be done after committing?

2010-11-11 Thread Kouta Osabe
Hi, all I have a question about Solr and SolrJ's rollback. I try to rollback like below try{ server.addBean(dto); server.commit(); }catch(Exception e){ if (server != null) { server.rollback();} } I expected that if any Exception is thrown, the rollback process is run, so no data would be updated. but

Re: Rollback can't be done after committing?

2010-11-11 Thread Jonathan Rochkind
What you say is true. Solr is not an RDBMS. Kouta Osabe wrote: Hi, all I have a question about Solr and SolrJ's rollback. I try to rollback like below try{ server.addBean(dto); server.commit(); }catch(Exception e){ if (server != null) { server.rollback();} } I wonder if any Exception thrown,
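The behaviour under discussion can be pictured with a toy in-memory model in Python. This is not SolrJ and not how Solr stores data; `ToyIndex` is a hypothetical class that only mirrors the rule that a rollback discards changes made since the last commit and leaves committed documents alone.

```python
class ToyIndex:
    """Toy model: adds are pending until commit; rollback drops pending adds."""

    def __init__(self):
        self.committed = []
        self.pending = []

    def add(self, doc):
        self.pending.append(doc)

    def commit(self):
        # After this point the pending documents can no longer be rolled back.
        self.committed.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        # Only uncommitted work is discarded.
        self.pending.clear()


index = ToyIndex()
index.add("doc-1")
index.commit()
index.add("doc-2")
index.rollback()  # drops doc-2, keeps doc-1
```

In this model, calling `rollback()` in an exception handler after `commit()` has already succeeded has nothing left to undo, which matches the answer given in the thread.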

using CJKTokenizerFactory for Japanese language

2010-11-11 Thread Kumar Pandey
I am exploring support for Japanese language in solr. Solr seems to provide CJKTokenizerFactory. How useful is this module? Has anyone been using this in production for Japanese language? One shortfall it seems to have from what I have been able to read up on is that it can generate a lot of false

Re: EdgeNGram relevancy

2010-11-11 Thread Ahmet Arslan
You can add an additional field, using KeywordTokenizerFactory instead of WhitespaceTokenizerFactory, and query both these fields with an OR operator. edgytext:(Bill Cl) OR edgytext2:"Bill Cl" You can even apply a boost so that "begins with" matches come first. --- On Thu, 11/11/10, Robert

Re: Issue with facet fields

2010-11-11 Thread Paige Cook
Are you storing the upload_by and business fields? You will not be able to retrieve a field from your index if it is not stored. Check that you have stored=true for both of those fields. - Paige On Thu, Nov 11, 2010 at 10:23 AM, gauravshetti gaurav.she...@tcs.com wrote: I am facing this weird

Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
thanks a lot, that setup works pretty well now. the only problem now is that the StopWords do not work that well anymore. I'll provide an example, but first the 2 fieldtypes: <!-- autocomplete field which finds matches inside strings ("scor" matches "Martin Scorsese") --> <fieldType

Re: Concatenate multiple tokens into one

2010-11-11 Thread Robert Gründler
I've posted a ConcatFilter in my previous mail which does concatenate tokens. This works fine, but I realized that what I wanted to achieve is implemented more easily in another way (by using 2 separate field types). Have a look at a previous mail I wrote to the list and the reply from Ahmet Arslan

Memory used by facet queries

2010-11-11 Thread Charlie Gildawie
Hello All. My first time post so be kind. Developing a document store with lots and lots of very small documents. (200 million at the moment. Final size will probably be double this at 400 million documents). This is Proof of concept development so we are seeing what a single core can do for

Re: EdgeNGram relevancy

2010-11-11 Thread Ahmet Arslan
This setup now causes trouble regarding StopWords, here's an example: Let's say the index contains 2 Strings: "Mr Martin Scorsese" and "Martin Scorsese". "Mr" is in the stopword list. Query: edgytext:Mr Scorsese OR edgytext2:Mr Scorsese^2.0 This way, the only result i get is Mr Martin

Search Result Differences a Puzzle

2010-11-11 Thread Eric Martin
Hi, I cannot find out how this is occurring: Nolosearch/com/search/apachesolr_search/law You can see that the John Paul Stevens result yields more description in the search result because of the keyword relevancy, whereas, the other results just give you a snippet of the title

Retrieving indexed content containing multiple languages

2010-11-11 Thread Tod
My Solr corpus is currently created by indexing metadata from a relational database as well as content pointed to by URLs from the database. I'm using a pretty generic out of the box Solr schema. The search results are presented via an AJAX enabled HTML page. When I perform a search the

Re: Concatenate multiple tokens into one

2010-11-11 Thread Nick Martin
Thanks Robert, I had been trying to get your ConcatFilter to work, but I'm not sure what I need in the classpath and where Token comes from. Will check the thread you mention. Best Nick On 11 Nov 2010, at 18:13, Robert Gründler wrote: I've posted a ConcatFilter in my previous mail which does

Re: Concatenate multiple tokens into one

2010-11-11 Thread Robert Gründler
this is the full source code, but be warned, i'm not a java developer, and i have no background in Lucene/Solr development: // ConcatFilter import java.io.IOException; import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.TokenFilter; import

Re: EdgeNGram relevancy

2010-11-11 Thread Nick Martin
On 12 Nov 2010, at 01:46, Ahmet Arslan iori...@yahoo.com wrote: This setup now causes trouble regarding StopWords, here's an example: Let's say the index contains 2 Strings: "Mr Martin Scorsese" and "Martin Scorsese". "Mr" is in the stopword list. Query: edgytext:Mr Scorsese OR edgytext2:Mr

Re: Retrieving indexed content containing multiple languages

2010-11-11 Thread Dennis Gearon
I look forward to the answers to this one. Dennis Gearon

Re: EdgeNGram relevancy

2010-11-11 Thread Andy
Could anyone help me understand why "Clyde Phillips" appears in the results for "Bill Cl"? "Clyde Phillips" doesn't produce any EdgeNGram that would match "Bill Cl", so why is it even in the results? Thanks. --- On Thu, 11/11/10, Ahmet Arslan iori...@yahoo.com wrote: You can add an additional

Re: problem with wildcard

2010-11-11 Thread Ahmet Arslan
I'm having some trouble with a query using some wildcard and I was wondering if anyone could tell me why these two similar queries do not return the same number of results. Basically, the query I'm making should return all docs whose title starts with (or contains) the string lowe'. I suspect some

Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
according to the fieldtype i posted previously, i think it's because of: 1. WhitespaceTokenizer splits the String "Clyde Phillips" into 2 tokens: "Clyde" and "Phillips" 2. EdgeNGramFilter gets the 2 tokens, and creates an EdgeNGram for each token: "C" "Cl" "Cly" ... AND "P" "Ph" "Phi" ... The Query String
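The two steps above can be mimicked with a small Python sketch. It is a simplified model of the analysis chain, not Solr code: it assumes whitespace tokenization plus lowercasing at index and query time, edge n-grams only at index time, and a default OR operator. The names `edge_ngrams`, `index_terms` and `matches` are hypothetical.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading substrings of a token, e.g. c, cl, cly, ..."""
    limit = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, limit + 1)}


def index_terms(text):
    """Whitespace-tokenize, lowercase, and expand every token into edge n-grams."""
    terms = set()
    for token in text.lower().split():
        terms |= edge_ngrams(token)
    return terms


def matches(query, text, require_all=False):
    """OR semantics by default; require_all=True models an AND default operator."""
    indexed = index_terms(text)
    hits = [token in indexed for token in query.lower().split()]
    return all(hits) if require_all else any(hits)


or_match = matches("Bill Cl", "Clyde Phillips")
and_match = matches("Bill Cl", "Clyde Phillips", require_all=True)
```

Under OR semantics the query token "cl" alone is enough to hit the "cl" n-gram of "Clyde", which is why the document shows up; requiring every query token to match, as suggested elsewhere in the thread with defaultOperator AND, excludes it in this model.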

FAST ESP - Solr migration webinar

2010-11-11 Thread Yonik Seeley
We're holding a free webinar on migration from FAST to Solr. Details below. -Yonik http://www.lucidimagination.com Solr To The Rescue: Successful Migration From FAST ESP to Open Source Search Based on Apache Solr

Re: problem with wildcard

2010-11-11 Thread Jean-Sebastien Vachon
On 2010-11-11, at 3:45 PM, Ahmet Arslan wrote: I'm having some trouble with a query using some wildcard and I was wondering if anyone could tell me why these two similar queries do not return the same number of results. Basically, the query I'm making should return all docs whose title

facet+shingle in autosuggest

2010-11-11 Thread Lukas Kahwe Smith
Hi, I am using a facet.prefix search with shingles in my autosuggest: <fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/>

Re: problem with wildcard

2010-11-11 Thread Ahmet Arslan
select?q=*:*&fq=title:(+lowe')&debugQuery=on&rows=0 wildcard queries are not analyzed http://search-lucene.com/m/pnmlH14o6eM1/ Yeah I found out about this a couple of minutes after I posted my problem. If there is no analyzer then why is Solr not finding any documents when a single

Re: EdgeNGram relevancy

2010-11-11 Thread Andy
Ah I see. Thanks for the explanation. Could you set the defaultOperator to AND? That way both Bill and Cl must be a match and that would exclude Clyde Phillips. --- On Thu, 11/11/10, Robert Gründler rob...@dubture.com wrote: From: Robert Gründler rob...@dubture.com Subject: Re: EdgeNGram

Re: WELCOME to solr-user@lucene.apache.org

2010-11-11 Thread Erick Erickson
There's not much to go on here. Boosting works, and index time as opposed to query time boosting addresses two different needs. Could you add some detail? All you've really said is it didn't work, which doesn't allow a very constructive response. Perhaps you could review:

Re: WELCOME to solr-user@lucene.apache.org

2010-11-11 Thread Solr User
Eric, Thank you so much for the reply, and I apologize for not providing all the details. The following are the field definitions in my schema.xml: <field name="title" type="string" indexed="true" stored="true" omitNorms="false"/> <field name="author" type="string" indexed="true" stored="true" multiValued="true"

Re: WELCOME to solr-user@lucene.apache.org

2010-11-11 Thread Ahmet Arslan
There are several mistakes in your approach: copyField just copies data. Index time boost is not copied. There is no such boosting syntax. /select?q=Eachtitle^9fl=score You are searching on your default field. This is not the cause of your problem, but omitNorms=true disables index time

Re: facet+shingle in autosuggest

2010-11-11 Thread Erick Erickson
I don't know all the implications here, but can't you just insert the StopwordFilterFactory before the ShingleFilterFactory and turn it loose? Best Erick On Thu, Nov 11, 2010 at 4:02 PM, Lukas Kahwe Smith m...@pooteeweet.org wrote: Hi, I am using a facet.prefix search with shingle's in my

Re: using CJKTokenizerFactory for Japanese language

2010-11-11 Thread Koji Sekiguchi
(10/11/12 1:49), Kumar Pandey wrote: I am exploring support for Japanese language in solr. Solr seems to provide CJKTokenizerFactory. How useful is this module? Has anyone been using this in production for Japanese language? CJKTokenizer is used in a lot of places in Japan. One shortfall it

Re: facet+shingle in autosuggest

2010-11-11 Thread Lukas Kahwe Smith
On 11.11.2010, at 17:42, Erick Erickson wrote: I don't know all the implications here, but can't you just insert the StopwordFilterFactory before the ShingleFilterFactory and turn it loose? Haven't tried this, but I would suspect that I would then get in trouble with stuff like united

Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
Did you run your query without using () and "" operators? If yes can you try this? q=edgytext:(Mr Scorsese) OR edgytext2:"Mr Scorsese"^2.0 I didn't use () and "" in my query before. Using the query with those operators works now, stopwords are thrown out as they should, thanks. However, i don't

Re: EdgeNGram relevancy

2010-11-11 Thread Jonathan Rochkind
Without the parens, the edgytext: only applied to Mr, the default field still applied to Scorsese. The double quotes are necessary in the second case (rather than parens), because on a non-tokenized field the standard query parser will pre-tokenize on whitespace before sending
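To keep the grouping and quoting consistent, the query string can be assembled in code. The Python sketch below uses the field names `edgytext` and `edgytext2` from this thread; the helper names and the default boost of 2.0 are assumptions for illustration, and the user input is assumed not to contain other query-syntax characters.

```python
def escape_phrase(text):
    """Escape backslashes and double quotes so the text is safe inside a phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def autocomplete_query(text, boost=2.0):
    """Group the terms for the tokenized field and quote them for the keyword field."""
    grouped = f"edgytext:({text})"
    phrase = f'edgytext2:"{escape_phrase(text)}"^{boost}'
    return f"{grouped} OR {phrase}"


query = autocomplete_query("Mr Scorsese")
```

The parentheses keep both terms on `edgytext`, while the double quotes send the whole input as one term to the field analyzed with KeywordTokenizerFactory.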

Re: WELCOME to solr-user@lucene.apache.org

2010-11-11 Thread Ramavtar Meena
Hi, If you are looking for query time boosting on title field you can do the following: /select?q=title:android^10 Also unless you have a very good reason to use string for date data (in your case pubdate and reldate), you should be using solr.DateField. regards, Ram On Fri, Nov 12, 2010 at
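When such a query is sent over HTTP, the colon and caret should be URL-encoded. A minimal Python sketch, assuming the standard `/select` handler path and a hypothetical helper name `boosted_select_path`:

```python
from urllib.parse import urlencode


def boosted_select_path(field, term, boost, handler="/select"):
    """Build a request path for a query-time boost such as title:android^10."""
    params = {"q": f"{field}:{term}^{boost}"}
    return f"{handler}?{urlencode(params)}"


path = boosted_select_path("title", "android", 10)
```

The resulting path carries `title:android^10` in percent-encoded form and can be appended to the Solr base URL.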

Best practices to rebuild index on live system

2010-11-11 Thread Robert Gründler
Hi again, we're coming closer to the rollout of our newly created solr/lucene based search, and i'm wondering how people handle changes to their schema on live systems. In our case, we have 3 cores (ie. A,B,C), where the largest one takes about 1.5 hours for a full dataimport from the

Re: Best practices to rebuild index on live system

2010-11-11 Thread Jonathan Rochkind
You can do a similar thing to your case #1 with Solr replication, handling a lot of the details for you instead of you manually switching cores and such. Index to a new core, then tell your production solr to be a slave replicating from that master new core. It still may have some of the same

Re: Best practices to rebuild index on live system

2010-11-11 Thread Erick Erickson
If by corrupt index you mean an index that's just not quite up to date, could you do a delta import? In other words, how do you make your Solr index reflect changes to the DB even without a schema change? Could you extend that method to handle your use case? So the scenario is something like this:

Re: Spatial search in Solr 1.5

2010-11-11 Thread Scott K
I just upgraded to a later version of the trunk and noticed my geofilter queries stopped working, apparently because the sfilt function was renamed to geofilt. I realize trunk is not stable, but other than looking at every change, is there an easy way to find changes that are not backward

Re: index just new articles from rss feeds - Data Import Request Handler

2010-11-11 Thread Shalin Shekhar Mangar
On Thu, Nov 11, 2010 at 8:21 AM, Matteo Moci mox...@gmail.com wrote: Hello, I'd like to use solr to index some documents coming from an rss feed, like the example at [1], but it seems that the configuration used there is just for a one-time indexing, trying to get all the articles exposed in

Re: Boosting

2010-11-11 Thread Shalin Shekhar Mangar
On Thu, Nov 11, 2010 at 10:35 AM, Solr User solr...@gmail.com wrote: Hi, I have a question about boosting. I have the following fields in my schema.xml: 1. title 2. description 3. ISBN etc I want to boost the field title. I tried index time boosting but it did not work. I also tried

Looking for help with Solr implementation

2010-11-11 Thread AC
Hi, Not sure if this is the correct place to post but I'm looking for someone to help finish a Solr install on our LAMP based website.  This would be a paid project.  The programmer that started the project got too busy with his full-time job to finish the project.  Solr has been installed

Link to download solr4.0 is not working?

2010-11-11 Thread Deche Pangestu
Hello, Does anyone know where to download solr4.0 source? I tried downloading from this page: http://wiki.apache.org/solr/FrontPage#solr_development but the link is not working... Best, Deche

importing from java

2010-11-11 Thread Tri Nguyen
Hi, I'm restricted to the following in regards to importing. I have access to a list (Iterator) of Java objects I need to import into solr. Can I import the java objects as part of solr's data import interface (whenever an http request to solr to do a dataimport, it'll call my java class to

Re: Rollback can't be done after committing?

2010-11-11 Thread gengshaoguang
Hi, Kouta: No data store supports rollback AFTER commit; rollback works only BEFORE. On Friday, November 12, 2010 12:34:18 am Kouta Osabe wrote: Hi, all I have a question about Solr and SolrJ's rollback. I try to rollback like below try{ server.addBean(dto); server.commit();

Re: Rollback can't be done after committing?

2010-11-11 Thread Pradeep Singh
In some cases you can rollback to a named checkpoint. I am not too sure but I think I read in the lucene documentation that it supported named checkpointing. On Thu, Nov 11, 2010 at 7:12 PM, gengshaoguang gengshaogu...@ceopen.cn wrote: Hi, Kouta: Any data store does not support rollback AFTER

A Newbie Question

2010-11-11 Thread K. Seshadri Iyer
Hi, Pardon me if this sounds very elementary, but I have a very basic question regarding Solr search. I have about 10 storage devices running Solaris with hundreds of thousands of text files (there are other files, as well, but my target is these text files). The directories on the Solaris boxes

Re: importing from java

2010-11-11 Thread Tri Nguyen
another question is, can I write my own DataImportHandler class? thanks, Tri From: Tri Nguyen tringuye...@yahoo.com To: solr user solr-user@lucene.apache.org Sent: Thu, November 11, 2010 7:01:25 PM Subject: importing from java Hi, I'm restricted to the

RE: importing from java

2010-11-11 Thread Eric Martin
http://wiki.apache.org/solr/DIHQuickStart http://wiki.apache.org/solr/DataImportHandlerFaq http://wiki.apache.org/solr/DataImportHandler -Original Message- From: Tri Nguyen [mailto:tringuye...@yahoo.com] Sent: Thursday, November 11, 2010 9:34 PM To: solr-user@lucene.apache.org Subject:

Re: Rollback can't be done after committing?

2010-11-11 Thread gengshaoguang
Oh, Pradeep: I don't think Lucene is an advanced storage app that supports rollback to a historical checkpoint (which would be supported only in distributed systems, such as two-phase commit or transactional web services) yours On Friday, November 12,

Re: Best practices to rebuild index on live system

2010-11-11 Thread Shawn Heisey
On 11/11/2010 4:45 PM, Robert Gründler wrote: So far, i can only think of 2 scenarios for rebuilding the index, if we need to update the schema after the rollout: 1. Create 3 more cores (A1,B1,C1) - Import the data from the database - After importing, switch the application to cores A1, B1,

Re: Link to download solr4.0 is not working?

2010-11-11 Thread Shawn Heisey
On 11/11/2010 7:44 PM, Deche Pangestu wrote: Hello, Does anyone know where to download solr4.0 source? I tried downloading from this page: http://wiki.apache.org/solr/FrontPage#solr_development but the link is not working... Your best bet is to use svn.