How can I get the correct stemmed query?

2010-10-18 Thread Jerad
Hi~. I'm a beginner who wants to build a search system using Solr 1.4.1 and Lucene 2.9.2. I get the correct Lucene query from my custom Analyzer and filter for a given query, but no results are displayed. Here is my Analyzer source.

Re: How can I get the correct stemmed query?

2010-10-18 Thread Ahmet Arslan
Are you using KLTQueryAnalyzer outside of Solr (as a pre-processing step)? Or have you defined a fieldType in schema.xml that uses KLTQueryAnalyzer? Can you append debugQuery=on to your search URL and paste the output? --- On Mon, 10/18/10, Jerad ag...@naver.com wrote: From: Jerad ag...@naver.com Subject: How
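As a minimal sketch of the debugQuery=on suggestion, assuming Solr runs at the example address http://localhost:8983/solr with the standard /select handler, the following builds the request URL using only the JDK (Java 10+ for the Charset overload of URLEncoder.encode). DebugQueryUrl is a hypothetical helper name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a select URL with debugQuery=on, so the response includes the
// rawquerystring and parsedquery entries discussed in this thread.
final class DebugQueryUrl {
    private DebugQueryUrl() {}

    static String build(String solrBase, String query) {
        // Encode the query so characters such as '+', ':' and spaces survive the URL.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return solrBase + "/select?q=" + q + "&debugQuery=on";
    }
}
```

Opening DebugQueryUrl.build("http://localhost:8983/solr", "body:flyaway") in a browser should show the rawquerystring and parsedquery entries in the debug section of the response.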

Re: How can I use the SolrJ binary format for indexing?

2010-10-18 Thread Peter Karich
Hi, you can try to parse the XML yourself in Java and then push the resulting SolrInputDocuments to Solr via SolrJ. Setting the format to binary and using the streaming update processor should improve performance, but I am not sure... and reading XML in Java performantly (and with less memory!) is another topic ... ;-)
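A minimal sketch of the "parse the XML yourself" route, assuming the files use Solr's <add><doc><field name="..."> update format. It uses the JDK's streaming StAX parser rather than a DOM; SolrXmlDocs is a hypothetical name, and converting each map into a SolrInputDocument with SolrJ is only described in a comment, not shown. For brevity it collects every document into a list; a real indexer would hand each document to SolrJ as soon as its </doc> is reached, which is where the memory saving comes from.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Streams through <add><doc><field name="...">value</field>...</doc></add>
// with StAX instead of loading the whole file into a DOM tree. Each document
// becomes a map of field name -> values; with SolrJ, each map would then be
// copied into a SolrInputDocument via addField(name, value) (not shown).
final class SolrXmlDocs {
    private SolrXmlDocs() {}

    static List<Map<String, List<String>>> parse(String xml) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                Map<String, List<String>> current = null;
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String tag = reader.getLocalName();
                    if (tag.equals("doc")) {
                        current = new LinkedHashMap<>();
                        docs.add(current);
                    } else if (tag.equals("field") && current != null) {
                        String name = reader.getAttributeValue(null, "name");
                        // Reads the text content and leaves the cursor on </field>.
                        String value = reader.getElementText();
                        current.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed update XML", e);
        }
        return docs;
    }
}
```

For real files, the StringReader would be replaced by a reader or input stream over each file.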

Re: query between two date

2010-10-18 Thread Savvas-Andreas Moysidis
You'll have to supply your dates in the format Solr expects (e.g. 2010-10-19T08:29:43Z, not 2010-10-19). If you don't need millisecond granularity, you can use the DateMath syntax to specify that. Please also check http://wiki.apache.org/solr/SolrQuerySyntax. On 17 October 2010 10:54, nedaha
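A sketch of the conversion, assuming the search form submits plain yyyy-MM-dd dates (the form's real format is not given in the thread, so that pattern is an assumption) and that each date should be read as midnight UTC. It uses java.time only; SolrDates is a hypothetical helper name.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Converts a form date into the UTC ISO-8601 form Solr's date fields expect,
// e.g. 2010-10-19 -> 2010-10-19T00:00:00Z.
final class SolrDates {
    private SolrDates() {}

    // Assumed form format; adjust the pattern to whatever the search form uses.
    private static final DateTimeFormatter FORM = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static String toSolr(String formDate) {
        LocalDate day = LocalDate.parse(formDate, FORM);
        // Midnight UTC; ISO_INSTANT omits the fractional part when it is zero.
        return DateTimeFormatter.ISO_INSTANT.format(day.atStartOfDay(ZoneOffset.UTC).toInstant());
    }

    // DateMath: appending /DAY asks Solr to round the value down to the start of its day.
    static String roundedToDay(String solrDate) {
        return solrDate + "/DAY";
    }
}
```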

Re: How do you programmatically create new cores?

2010-10-18 Thread Bastian
An HTTP GET call is made simply by entering the URL into your browser, as shown in the example in the wiki: http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schem_file_name.xml&dataDir=data -Original Message-
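To make the same call from code instead of a browser, the URL only needs its parameters joined with & and URL-encoded. A minimal sketch with a hypothetical CoreAdminUrl helper, reusing the placeholder values from the wiki example:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds the CoreAdmin CREATE URL from the wiki example, joining the
// parameters with '&' and URL-encoding each value.
final class CoreAdminUrl {
    private CoreAdminUrl() {}

    static String create(String solrBase, String name, String instanceDir,
                         String config, String schema, String dataDir) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", name);
        params.put("instanceDir", instanceDir);
        params.put("config", config);
        params.put("schema", schema);
        params.put("dataDir", dataDir);
        StringJoiner query = new StringJoiner("&");
        params.forEach((k, v) -> query.add(k + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return solrBase + "/admin/cores?" + query;
    }
}
```

The resulting string can then be requested with java.net.HttpURLConnection or, on Java 11+, java.net.http.HttpClient.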

Re: query between two date

2010-10-18 Thread nedaha
Thanks for your reply. I know about the Solr date format! The check-in and check-out dates are in a user-friendly format that we use in our search form for the system's users, and I change the format via code and then send them to Solr. I want to know how I can make a query to compare a range between

Re: How can I get the correct stemmed query?

2010-10-18 Thread Jerad
Oops, I'm sorry! I found some mistakes in the source I posted previously (the main class name was wrong :) This is the correct analyzer source. --- public class MyCustomQueryAnalyzer extends Analyzer{

Boosting documents based on the vote count

2010-10-18 Thread Alexandru Badiu
Hello all, I have a field in my schema which holds the number of votes a document has. How can I boost documents based on that number? Something like: the document with the maximum number gets a boost of 10, the one with the smallest number gets 0.5, and the values in between get calculated

Re: query between two date

2010-10-18 Thread Savvas-Andreas Moysidis
OK, maybe I don't get this right... are you trying to match something like check-in date 2010-10-19T00:00:00Z AND check-out date 2010-10-21T00:00:00Z *or* check-in date = 2010-10-19T00:00:00Z AND check-out date = 2010-10-21T00:00:00Z? On 18 October 2010 10:05, nedaha neda...@gmail.com wrote:

solr requirements

2010-10-18 Thread satya swaroop
Hi All, I am planning to have a separate server for Solr, and regarding hardware requirements I have a question about what configuration will be needed. I know it will be hard to tell, but I just need a minimum requirement for the following situation: 1) There are 1000 regular

Re: query between two date

2010-10-18 Thread nedaha
The exact query that I want is: check-in date = 2010-10-19T00:00:00Z AND check-out date = 2010-10-21T00:00:00Z, but because of the structure that I have to index, I don't have a specific start date and end date in my Solr index to compare with the check-in and check-out date range. I have some dates

Re: API for using Multi cores with SolrJ

2010-10-18 Thread Peter Karich
I asked this myself... here are some pointers that could help: http://lucene.472066.n3.nabble.com/SolrJ-and-Multi-Core-Set-up-td1411235.html http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-in-Single-Core-td475238.html Hi everyone, I'm trying to write some code for creating and using multi cores.

Re: query between two date

2010-10-18 Thread Savvas-Andreas Moysidis
OK, I see now... well, the only query that comes to mind is something like: check-in date:[2010-10-19T00:00:00Z TO *] AND check-out date:[* TO 2010-10-21T00:00:00Z]. Would something like that work? On 18 October 2010 11:04, nedaha neda...@gmail.com wrote: The exact query that i want is:
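Put together in code, the suggestion is two open-ended range clauses joined with AND. In this sketch the field names checkin_date and checkout_date are hypothetical stand-ins for "check-in date" and "check-out date" (a field name containing a space would need escaping in the query syntax), and both arguments are assumed to already be in Solr's date format.

```java
// Builds the two open-ended range clauses from the suggestion above.
// checkin_date and checkout_date are hypothetical field names.
final class StayRangeQuery {
    private StayRangeQuery() {}

    static String build(String checkIn, String checkOut) {
        // [X TO *] matches values from X upward, [* TO Y] values up to Y; both ends inclusive.
        return "checkin_date:[" + checkIn + " TO *]"
                + " AND checkout_date:[* TO " + checkOut + "]";
    }
}
```

Before sending, the resulting string still has to be URL-encoded as the q (or fq) parameter.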

Re: How can I get the correct stemmed query?

2010-10-18 Thread Ahmet Arslan
rawquerystring = +body:flyaway, parsedquery = +body:fly +body:away shows that your custom filter is working as you expected. However, you are using different tokenizers at query time (StandardTokenizer, hard-coded) and at index time (WhitespaceTokenizer). That may cause numFound=0. For example, if your
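A toy illustration of why mismatched analysis can produce numFound=0. It does not use Lucene: indexTerms is a hypothetical whitespace-only splitter standing in for the index-time analyzer (assuming the custom splitting filter is not applied at index time), and the query terms are the ones shown in parsedquery. Because both clauses are required (+), every query-time term has to exist among the index-time terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not Lucene: a query made only of required (+) clauses matches a
// document only if every query-time term is also an index-time term.
final class AnalysisMismatch {
    private AnalysisMismatch() {}

    // Stand-in for the index-time analysis: split on whitespace, nothing else.
    static Set<String> indexTerms(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }

    // True only if all required query terms were produced at index time.
    static boolean matchesAll(Set<String> indexed, List<String> requiredQueryTerms) {
        return indexed.containsAll(requiredQueryTerms);
    }
}
```

Here a body containing "flyaway home" is indexed with the single term flyaway, so +body:fly +body:away finds nothing; applying the same splitting at both index and query time would make the terms line up.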

Re: Boosting documents based on the vote count

2010-10-18 Thread Ahmet Arslan
I have a field in my schema which holds the number of votes a document has. How can I boost documents based on that number? you can do it with http://wiki.apache.org/solr/FunctionQuery

Re: Boosting documents based on the vote count

2010-10-18 Thread Alexandru Badiu
I know, but I can't figure out which functions to use. :) On Mon, Oct 18, 2010 at 1:38 PM, Ahmet Arslan iori...@yahoo.com wrote: I have a field in my schema which holds the number of votes a document has. How can I boost documents based on that number? you can do it with

Implementing Search Suggestion on Solr

2010-10-18 Thread Pablo Recio Quijano
Hi! I'm trying to implement some kind of search suggestion on a search engine I have implemented. These search suggestions should not be automatic like the ones described for the SpellCheckComponent [1]. I'm looking for something like: SAS oppositions = Public job offers for some-company So

Re: Term is duplicated when updating a document

2010-10-18 Thread Thomas Kellerer
Thanks. Not really the answer I wanted to hear, but at least I know this is not my fault ;) Regards Thomas Erick Erickson, 15.10.2010 20:42: This is actually known behavior. The problem is that when you update a document, it's deleted and re-added, but the original is marked as deleted.

Re: Virtual field, Statistics

2010-10-18 Thread Tanguy Moal
Hello Lance, thank you for your reply. I created the following JIRA issue: https://issues.apache.org/jira/browse/SOLR-2171, as suggested. Can you tell me how new issues are handled by the development teams, and whether there's a way I could help/contribute? -- Tanguy 2010/10/16 Lance Norskog

Re: SOLR DateTime and SortableLongField field type problems

2010-10-18 Thread Ken Stanley
Just following up to see if anybody might have some words of wisdom on the issue. Thank you, Ken It looked like something resembling white marble, which was probably what it was: something resembling white marble. -- Douglas Adams, The Hitchhiker's Guide to the Galaxy On Fri,

Re: indexing mysql database

2010-10-18 Thread Erick Erickson
Also, the little-advertised DIH debug page can help, see: solr/admin/dataimport.jsp Best Erick On Sun, Oct 17, 2010 at 11:56 AM, William Pierce evalsi...@hotmail.com wrote: Two suggestions: a) Noticed that your DIH spec in the solrconfig.xml seems to refer to db-data-config.xml but you

RE: SOLR DateTime and SortableLongField field type problems

2010-10-18 Thread Michael Sokolov
I think if you look closely you'll find the date quoted in the Exception report doesn't match any of the declared formats in the schema. I would suggest, as a first step, hunting through your data to see where that date is coming from. -Mike -Original Message- From: Ken Stanley

Re: How can I use the SolrJ binary format for indexing?

2010-10-18 Thread Jason, Kim
Hi, Gora. I haven't yet tried indexing a huge number of XML files through curl or pure Java (like post.jar). Is indexing through XML really fast? How many files did you index? And how did you do it (using curl or pure Java)? Thanks, Gora

Re: solr requirements

2010-10-18 Thread Erick Erickson
Well, always get the biggest, fastest machine you can <G>... On a serious note, you're right, there's not much info to go on here. And even if there were more info, Solr performance depends on how you search your data as well as how much data you have... About the only way you can really tell is

Re: Virtual field, Statistics

2010-10-18 Thread Erick Erickson
The beauty/problem with open source is that issues are picked up when somebody thinks they're important enough and has the time/energy to work on them. And that person can be you <G>... What usually happens is that someone submits a patch, various people comment on it, look it over, ask for changes or

Re: Boosting documents based on the vote count

2010-10-18 Thread Ahmet Arslan
I know but I can't figure out what functions to use. :) Oh, I see. Why not just use {!boost b=log(vote)}? Maybe scale(vote,0.5,10)?
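For the range Alexandru describes (smallest count to 0.5, largest to 10), scale(vote,0.5,10) is the closer fit: it is a linear interpolation between the smallest and largest vote counts. The sketch below only reproduces that arithmetic in plain Java with a hypothetical VoteBoost helper; it is not Solr's implementation, and the equal-counts fallback is this sketch's own choice.

```java
// Linear interpolation behind scale(vote,0.5,10): the smallest count maps to
// lo (0.5), the largest to hi (10), and values in between proportionally.
final class VoteBoost {
    private VoteBoost() {}

    static double scale(double x, double min, double max, double lo, double hi) {
        if (max == min) {
            // No spread between counts; this sketch simply returns the lower bound.
            return lo;
        }
        return lo + (x - min) * (hi - lo) / (max - min);
    }
}
```

With counts from 0 to 100, a document with 50 votes lands at 5.25. log(vote), by contrast, compresses large counts rather than mapping them onto a fixed range, and a document with 0 votes would produce log(0), so that variant may need an offset such as log(sum(vote,1)).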

Re: How can I use the SolrJ binary format for indexing?

2010-10-18 Thread Gora Mohanty
On Mon, Oct 18, 2010 at 5:26 PM, Jason, Kim hialo...@gmail.com wrote: Hi, Gora I haven't tried yet indexing huge amount of xml files through curl or pure java(like a post.jar). Indexing through xml is really fast? How many files did you index? And How did it(using curl or pure java)? [...]

Re: solr requirements

2010-10-18 Thread satya swaroop
Hi, here is some more info about it. I use Solr to output only the file names (file IDs). Here I enclose the fields in my schema.xml; presently I have only about 40MB of indexed data. <field name="id" type="string" indexed="true" stored="true" required="true" /> <field name="sku" type="textTight"

RE: query between two date

2010-10-18 Thread Jonathan Rochkind
Recommend using the pdate format for faster range queries. Here's how (or one way) to do a range query in Solr: defType=lucene&q=some_field:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z] Does that answer your question? I don't really understand what you're trying to do with your two

Re: SOLR DateTime and SortableLongField field type problems

2010-10-18 Thread Ken Stanley
On Mon, Oct 18, 2010 at 7:52 AM, Michael Sokolov soko...@ifactory.com wrote: I think if you look closely you'll find the date quoted in the Exception report doesn't match any of the declared formats in the schema. I would suggest, as a first step, hunting through your data to see where that

Re: API for using Multi cores with SolrJ

2010-10-18 Thread Tharindu Mathew
Thanks Peter. That helps a lot. It's weird that this is not documented anywhere. :( On Mon, Oct 18, 2010 at 3:42 PM, Peter Karich peat...@yahoo.de wrote: I asked this myself ... here could be some pointers: http://lucene.472066.n3.nabble.com/SolrJ-and-Multi-Core-Set-up-td1411235.html

Re: API for using Multi cores with SolrJ

2010-10-18 Thread Ryan McKinley
On Mon, Oct 18, 2010 at 10:12 AM, Tharindu Mathew mcclou...@gmail.com wrote: Thanks Peter. That helps a lot. It's weird that this is not documented anywhere. :( Feel free to edit the wiki :)

Re: How can I use the SolrJ binary format for indexing?

2010-10-18 Thread Ryan McKinley
Do you already have the files as Solr XML? If so, I don't think you need SolrJ. If you need to build SolrInputDocuments from your existing structure, SolrJ is a good choice. If you are indexing lots of stuff, check the StreamingUpdateSolrServer:

Re: How can I use the SolrJ binary format for indexing?

2010-10-18 Thread Jason, Kim
Thank you for your reply, Gora. But I still have several questions. Did you use separate indexes? If so, did you index 0.7 million XML files per instance and then merge them? Please let me know how you worked with multiple instances and cores in your case. Regards,

Re: Disable (or prohibit) per-field overrides

2010-10-18 Thread Jonathan Rochkind
You know about the 'invariants' that can be set in the request handler, right? Not sure if that will work for you or not, but it sounds related. It was added recently to some wiki page somewhere, although the feature has been there for a long time. Let's see if I can find the wiki page... Ah yes:

Re: Disable (or prohibit) per-field overrides

2010-10-18 Thread Markus Jelsma
Thanks for your reply. But as I replied to Erick's suggestion, which is quite the same: Yes, we're using it, but the problem is that there can be many fields, and that means quite a large list of parameters to set for each request handler, and there can be many request handlers.

query pending commits?

2010-10-18 Thread Ryan McKinley
I have an indexing pipeline that occasionally needs to check if a document is already in the index (even if not committed yet). Any suggestions on how to do this without calling <commit/> before each check? I have a list of document ids and need to know which ones are in the index (actually I need

Commits on service after shutdown

2010-10-18 Thread Ezequiel Calderara
Hi, I'm new to the mailing list. I'm implementing Solr at my current job, and I'm having some problems. I was testing the consistency of the commits. I found, for example, that if we add X documents to the index (without committing) and then restart the service, the documents are committed. They

Re: Commits on service after shutdown

2010-10-18 Thread Israel Ekpo
The documents should be implicitly committed when the Lucene index is closed. When you perform a graceful shutdown, the Lucene index gets closed and the documents get committed implicitly. When the shutdown is abrupt as in a KILL -9, then this does not happen and the updates are lost. You can

RE: How can I use the SolrJ binary format for indexing?

2010-10-18 Thread Sharp, Jonathan
Hi all, I have a huge number of XML files to index. I want to index using the SolrJ binary format to get a performance gain, because I heard that indexing with XML files is quite slow. But I don't know how to index through the SolrJ binary format and can't find examples. Please give some help.

ApacheCon Atlanta Meetup

2010-10-18 Thread Grant Ingersoll
Is there interest in having a Meetup at ApacheCon? Who's going? Would anyone like to present? We could do something less formal, too, and just have drinks and Q&A/networking. Thoughts? -Grant

Spell checking question from a Solr novice

2010-10-18 Thread Xin Li
Hi, I am looking for a quick solution to improve a search engine's spell checking performance. I was wondering if anyone has tried to integrate the Google SpellCheck API with a Solr search engine (if possible). Google spellcheck came to my mind for two reasons. First, it is costly to clean up

Re: Commits on service after shutdown

2010-10-18 Thread Ezequiel Calderara
I understand, but I want to have control over what is committed and what is not. In our scenario, we want to add documents to the index and maybe trigger the commit an hour later. If, in the meantime, we have a server shutdown or some process sends a shutdown signal to the process, I don't want those documents

Re: Commits on service after shutdown

2010-10-18 Thread Matthew Hall
No.. you would just turn autocommit off, and have the thread that is doing updates to your indexes commit every hour. I'd think that this would take care of the scenario that you are describing. Matt On 10/18/2010 3:50 PM, Ezequiel Calderara wrote: I understand, but i want to have control
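A minimal sketch of that setup, assuming autocommit is already disabled in solrconfig.xml. PeriodicCommitter is a hypothetical class, and commitAction stands in for the real commit call (for example a SolrJ commit wrapped so that it throws no checked exceptions), which is not shown here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Runs a commit on a fixed schedule from the indexing application, for use
// when autocommit is switched off. commitAction is a placeholder for the real
// commit call.
final class PeriodicCommitter implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicCommitter(Runnable commitAction, long period, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                commitAction.run();
            } catch (RuntimeException e) {
                // An exception escaping a run would cancel every later run,
                // so report it and keep the schedule alive.
                System.err.println("Scheduled commit failed: " + e);
            }
        };
        // First commit after one full period, then once per period.
        scheduler.scheduleAtFixedRate(guarded, period, period, unit);
    }

    @Override
    public void close() {
        // Stops future commits; a final commit before shutdown is up to the caller.
        scheduler.shutdownNow();
    }
}
```

In the scenario from this thread the period would be 1 with TimeUnit.HOURS. The wrapper catches runtime exceptions because, with scheduleAtFixedRate, an exception escaping one run suppresses all later runs.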

Re: Commits on service after shutdown

2010-10-18 Thread Ezequiel Calderara
But if something happens during that hour, I will have either lost the documents or committed them to the index outside the schedule. How can I handle this scenario? I think that Solr (or Lucene) should ensure the durability (http://en.wikipedia.org/wiki/Durability_(database_systems)) of the data even

RE: Spell checking question from a Solr novice

2010-10-18 Thread Xin Li
Oops, never mind. I just read the Google API policy: a limit of 1000 queries per day, for non-commercial use only. -Original Message- From: Xin Li Sent: Monday, October 18, 2010 3:43 PM To: solr-user@lucene.apache.org Subject: Spell checking question from a Solr novice Hi, I am looking for a

Re: Commits on service after shutdown

2010-10-18 Thread Ezequiel Calderara
I'll see if I can resolve this by adding an extra core with the same schema to hold these documents. So Core0 will act as a queue and Core1 will be the real index, and the commit in Core0 will trigger an add to Core1 and its commit. That way I can be sure of not losing data. It

Re: Spell checking question from a Solr novice

2010-10-18 Thread Jonathan Rochkind
In general, the benefit of the built-in Solr spellcheck is that it can use a dictionary based on your actual index. If you want to use some external API, you certainly can, in your actual client app -- but it doesn't really need to involve Solr at all anymore, does it? Is there any benefit

Re: Spell checking question from a Solr novice

2010-10-18 Thread Pradeep Singh
I think a spellchecker based on your index has clear advantages. You can spellcheck words specific to your domain which may not be available in an outside dictionary. You can always dump the list from WordNet to get a starter English dictionary. But then it also means that misspelled words from

Re: Spell checking question from a Solr novice

2010-10-18 Thread Jason Blackerby
If you know the misspellings, you could prevent them from being added to the dictionary with a StopFilterFactory like so: <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter

Schema required?

2010-10-18 Thread Frank Calfo
We need to index documents where the fields in the document can change frequently. It appears that we would need to update our Solr schema definition before we can reindex using new fields. Is there any way to make the Solr schema optional? --frank

I need to index the first character of a field in another field

2010-10-18 Thread Renato Wesenauer
Hello guys, I need to index the first character of the field autor into another field, inicialautor. Example: autor = Mark Webber, inicialautor = M. I wrote a JavaScript function in the dataimport, but the field inicialautor is indexed empty. The function: function InicialAutor(linha) {

RE: Schema required?

2010-10-18 Thread Tim Gilbert
Hi Frank, Check out the Dynamic Fields option from here http://wiki.apache.org/solr/SchemaXml Tim -Original Message- From: Frank Calfo [mailto:fca...@aravo.com] Sent: Monday, October 18, 2010 5:25 PM To: solr-user@lucene.apache.org Subject: Schema required? We need to index documents
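To illustrate how dynamic fields let new field names through without editing the schema, here is a toy matcher for dynamicField name patterns with a single leading or trailing *. It is not Solr code; the precedence it uses (longer patterns first, then schema order) follows, as far as I know, the description on the SchemaXml wiki page, and the patterns in the example are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Toy matcher for dynamicField name patterns such as "*_s" or "attr_*".
// Longer patterns are tried first; among equal lengths the earlier pattern
// wins, because Stream.sorted is stable for ordered streams.
final class DynamicFieldMatcher {
    private DynamicFieldMatcher() {}

    static Optional<String> match(String fieldName, List<String> patterns) {
        return patterns.stream()
                .sorted(Comparator.<String>comparingInt(String::length).reversed())
                .filter(p -> matches(fieldName, p))
                .findFirst();
    }

    // A pattern is either "*suffix" or "prefix*"; anything else never matches here.
    private static boolean matches(String name, String pattern) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return false;
    }
}
```

With patterns like "*_s" and "attr_*" in the schema, a brand-new field such as attr_color_s is accepted without a schema change; it resolves to "attr_*" because that pattern is longer.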

Admin for spellchecker?

2010-10-18 Thread Pradeep Singh
Do we need an admin screen for spellchecker? Where you can browse the words and delete the ones you don't like so that they don't get suggested?

Re: Spell checking question from a Solr novice

2010-10-18 Thread Ezequiel Calderara
You can check the new words against a dictionary and keep them in the file as Jason described... What Pradeep said is true: it is always better to have suggestions related to your index than suggestions with no results... On Mon, Oct 18, 2010 at 6:24 PM, Jason Blackerby

Re: I need to index the first character of a field in another field

2010-10-18 Thread Ezequiel Calderara
How are you declaring the transformer in the dataconfig? On Mon, Oct 18, 2010 at 6:31 PM, Renato Wesenauer renato.wesena...@gmail.com wrote: Hello guys, I need to indexing the first character of the field autor in another field inicialautor. Example: autor = Mark Webber inicialautor

Re: I need to index the first character of a field in another field

2010-10-18 Thread Pradeep Singh
You can use regular expression based template transformer without writing a separate function. It's pretty easy to use. On Mon, Oct 18, 2010 at 2:31 PM, Renato Wesenauer renato.wesena...@gmail.com wrote: Hello guys, I need to indexing the first character of the field autor in another field

Re: Admin for spellchecker?

2010-10-18 Thread Ezequiel Calderara
I was thinking about that; you would also need to mark a word as valid, so it doesn't get marked as wrong. On Mon, Oct 18, 2010 at 6:37 PM, Pradeep Singh pksing...@gmail.com wrote: Do we need an admin screen for spellchecker? Where you can browse the words and delete the ones you don't like so that

Re: Schema required?

2010-10-18 Thread Jonathan Rochkind
Frank Calfo wrote: We need to index documents where the fields in the document can change frequently. It appears that we would need to update our Solr schema definition before we can reindex using new fields. Is there any way to make the Solr schema optional? No. But you can design your

Re: I need to index the first character of a field in another field

2010-10-18 Thread Jonathan Rochkind
You can just do this with a copyField in your schema.xml instead. Copy to a field which uses a regexp filter or some other analyzer to limit it to the first non-whitespace char (and perhaps force upper case too, if you want). That's what I'd do: it's easier and will work if you index to Solr from something other
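The rule such a field needs (take the first non-whitespace character and upper-case it) can be checked in plain Java before wiring it into the schema. This sketch uses a hypothetical Initial helper with the regex ^\s*(\S); it only demonstrates the logic and makes no claim about the exact parameters of any Solr filter or DIH transformer.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Derives the inicialautor value from autor: the first non-whitespace
// character, upper-cased (e.g. "Mark Webber" -> "M").
final class Initial {
    private Initial() {}

    private static final Pattern FIRST_CHAR = Pattern.compile("^\\s*(\\S)");

    static String of(String autor) {
        if (autor == null) {
            return "";
        }
        Matcher m = FIRST_CHAR.matcher(autor);
        // Empty or all-whitespace input has no initial.
        return m.find() ? m.group(1).toUpperCase(Locale.ROOT) : "";
    }
}
```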

Re: I need to index the first character of a field in another field

2010-10-18 Thread Chris Hostetter
This exact topic was just discussed a few days ago... http://search.lucidimagination.com/search/document/7b6e2cc37bbb95c8/faceting_and_first_letter_of_fields#3059a28929451cb4 My comments on when/where it makes sense to put this logic...

Removing Common Web Page Header and Footer from All Content Fetched by Nutch

2010-10-18 Thread Israel Ekpo
Hi All, I am indexing a web application with approximately 9500 distinct URLs and their contents using Nutch and Solr. I use Nutch to fetch the URLs and links and to crawl the entire web application to extract all the content for all pages. Then I run the solrindex command to send the content to Solr.

Re: How can I get the correct stemmed query?

2010-10-18 Thread Jerad
Thanks for your reply :) 1. I tested q=*:*&fl=body, and 1 doc was returned, as I expected. 2. I edited my schema.xml as you instructed. <analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer"> //No filter description. </analyzer> but no result

Setting solr home directory in websphere

2010-10-18 Thread Kevin Cunningham
I've installed Solr a hundred times using Tomcat (on Windows) but now need to get it going with WebSphere (on Windows). For whatever reason this seems to be black magic :) I've installed the war file but have no idea how to set Solr home to let WebSphere know where the index and config files

Re: Setting solr home directory in websphere

2010-10-18 Thread Israel Ekpo
You need to make sure that the following system property is one of the values specified in the JAVA_OPTS environment variable: -Dsolr.solr.home=path_to_solr_home On Mon, Oct 18, 2010 at 10:20 PM, Kevin Cunningham kcunning...@telligent.com wrote: I've installed Solr a hundred times using

snapshot-4.0 and maven

2010-10-18 Thread Matt Mitchell
I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is this possible to do? If so, could someone give me a tip or two on getting started? Thanks, Matt

Re: snapshot-4.0 and maven

2010-10-18 Thread Tommy Chheng
Once you've built the Solr 4.0 jar, you can use Maven's install command like this: mvn install:install-file -DgroupId=org.apache -DartifactId=solr -Dpackaging=jar -Dversion=4.0-SNAPSHOT -Dfile=solr-4.0-SNAPSHOT.jar -DgeneratePom=true @tommychheng On 10/18/10 7:28 PM, Matt Mitchell wrote: I'd

Re: Spell checking question from a Solr novice

2010-10-18 Thread Dennis Gearon
The first question to ask is: will it work for you? The SECOND question is: do you want Google to know what's in your data? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so

Re: ApacheCon Atlanta Meetup

2010-10-18 Thread Dennis Gearon
I would love to go, but funds are low right now. NEXT year, I'd have something to demo though :-) Dennis Gearon

'Advertising' a site

2010-10-18 Thread Dennis Gearon
When I get my Solr/Lucene-based site going, is it considered polite to post a small paragraph about it with a link? Dennis Gearon

Re: 'Advertising' a site

2010-10-18 Thread Otis Gospodnetic
Hi Dennis, There is a PoweredBy page on the Wiki that's good for that. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dennis Gearon gear...@sbcglobal.net To:

Re: Schema required?

2010-10-18 Thread Otis Gospodnetic
Solr requires a schema. But Lucene does not! :) Otis - Original Message From: Frank Calfo fca...@aravo.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org

Re: 'Advertising' a site

2010-10-18 Thread Dennis Gearon
Cool, thanks! Dennis Gearon

Re: Removing Common Web Page Header and Footer from All Content Fetched by Nutch

2010-10-18 Thread Otis Gospodnetic
Hi Israel, You can use this: http://search-lucene.com/?q=boilerpipe&fc_project=Tika Not sure if it's built into Nutch, though... Otis - Original Message From: Israel

count(*) equivalent in Solr/Lucene

2010-10-18 Thread Dennis Gearon
Is there something in Solr/Lucene that could give me the equivalent of: SELECT COUNT(*) WHERE date_column1 :start_date AND date_column2 :end_date; Provided I take deleted documents into account, of course (i.e., do some sort of averaging or some tracking function over time.) Dennis

Re: 'Advertising' a site

2010-10-18 Thread Chris Hostetter
: There is a PoweredBy page on the Wiki that's good for that. Even better is a post to the list telling folks about your use case, index size, hardware, etc. A lot of new users find that information really helpful for comparison. -Hoss

Re: count(*) equivalent in Solr/Lucene

2010-10-18 Thread Chris Hostetter
: : SELECT : COUNT(*) : WHERE : date_column1 :start_date AND : date_column2 :end_date; q=*:*&fq=column1:[start TO *]&fq=column2:[end TO *]&rows=0 ...every result includes a total count. -Hoss
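Reading the count back is then a matter of taking the numFound attribute from the <result name="response"> element. A minimal sketch using the JDK's DOM parser, assuming the default XML response format; SolrCount is a hypothetical name, and the HTTP request itself (with each parameter URL-encoded) is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Extracts the total hit count from a Solr XML response. With rows=0 the
// <result name="response"> element carries no documents, only numFound.
final class SolrCount {
    private SolrCount() {}

    static long numFound(String responseXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unparseable Solr response", e);
        }
        NodeList results = doc.getElementsByTagName("result");
        for (int i = 0; i < results.getLength(); i++) {
            Element result = (Element) results.item(i);
            if ("response".equals(result.getAttribute("name"))) {
                return Long.parseLong(result.getAttribute("numFound"));
            }
        }
        throw new IllegalArgumentException("No <result name=\"response\"> element found");
    }
}
```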