timezone DIH and dataimport.properties

2011-04-26 Thread stockii
Hello. How can I set the Java timezone in my Java properties? My problem is that dataimport.properties contains the wrong timezone and I don't know how to set the correct one. Thanks.

Re: how to concatenate two nodes of xml with xpathentityprocessor

2011-04-26 Thread Stefan Matheis
Vishal, I don't really understand what you're trying to achieve. Indexing what (complete/sample documents, valid if possible)? And getting what exactly as a result? Regards Stefan On Mon, Apr 25, 2011 at 5:01 PM, vrpar...@gmail.com vrpar...@gmail.com wrote: hello , i am using

Re: timezone DIH and dataimport.properties

2011-04-26 Thread Stefan Matheis
java -Duser.timezone=UTC -jar start.jar ? On Tue, Apr 26, 2011 at 9:54 AM, stockii stock.jo...@googlemail.com wrote: Hello. How can I set the Java timezone in my Java properties? My problem is that dataimport.properties contains the wrong timezone and I don't know how to set the correct
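The reply above changes the JVM's default time zone at startup. As a rough illustration of why a timestamp in dataimport.properties can look wrong, the sketch below assumes (this is not confirmed in the thread) that the value is written as a plain "yyyy-MM-dd HH:mm:ss" string in the JVM's default zone, with no zone marker. It is Python rather than Solr code, and the instant and the +02:00 offset are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Pattern assumed to mirror a "yyyy-MM-dd HH:mm:ss" style: no zone marker.
PATTERN = "%Y-%m-%d %H:%M:%S"


def render(instant, zone):
    """Format one absolute instant as wall-clock time in the given zone."""
    return instant.astimezone(zone).strftime(PATTERN)


# Hypothetical import time, expressed as an absolute instant.
instant = datetime(2011, 4, 26, 7, 54, 0, tzinfo=timezone.utc)

utc_text = render(instant, timezone.utc)
# Assumed server default zone of UTC+2, purely for illustration.
local_text = render(instant, timezone(timedelta(hours=2)))

print(utc_text)
print(local_text)
```

The two strings describe the same moment but differ by two hours, which is the kind of mismatch that pinning the JVM to UTC is meant to avoid.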

Problem with autogeneratePhraseQueries

2011-04-26 Thread Solr Beginner
Hi, I'm new to solr. My solr instance version is: Solr Specification Version: 3.1.0 Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07 Lucene Specification Version: 3.1.0 Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58 Current Time: Tue Apr 26

Re: Problem with autogeneratePhraseQueries

2011-04-26 Thread Robert Muir
What do you have in solrconfig.xml for luceneMatchVersion? If you don't set this, then it's going to default to Lucene 2.9 emulation so that old Solr 1.4 configs work the same way. I tried your example and it worked fine here, and I'm guessing this is probably what's happening. The default in the

Re: Query regarding solr plugin.

2011-04-26 Thread Erick Erickson
Sorry, but there's too much here to debug remotely. I strongly advise you to back way up. Undo (but save) all your changes. Start by doing the simplest thing you can: just get a dummy class in place and get it called. Perhaps create a really dumb logger method that opens a text file, writes a

Re: Problem with autogeneratePhraseQueries

2011-04-26 Thread Solr Beginner
Thank you very much for the answer. You were right. There was no luceneMatchVersion in the solrconfig.xml of our dev core. We thought that values not present in the core configuration were copied from the main solrconfig.xml. I will investigate whether our administrators did something wrong during the upgrade to 3.1. On

Re: how to concatenate two nodes of xml with xpathentityprocessor

2011-04-26 Thread vrpar...@gmail.com
Thanks Stefan. Currently, the XPathEntityProcessor part of my dataconfig file is: <entity name="x" processor="XPathEntityProcessor" forEach="/FULL" url="D:\Files\${_FileName}" dataSource="FD"> <field column="id" xpath="/FULL/Customer/@id" /> <field column="Customer

What initialize new searcher?

2011-04-26 Thread Solr Beginner
Hi, I'm reading the Solr cache documentation - http://wiki.apache.org/solr/SolrCaching There I found: "The current Index Searcher serves requests and when a new searcher is opened". Could you explain when a new searcher is opened? Does it have something to do with index commit? Best Regards, Solr

TermsCompoment + Dist. Search + Large Index + HEAP SPACE

2011-04-26 Thread mdz-munich
Hi! We've got one index split into 4 shards of 70,000 records each, containing large full-text data from (very dirty) OCR. Thus we have a lot of unique terms. Now we are trying to obtain the 400 most common words for CommonGramsFilter via TermsComponent, but the request always runs out of memory. The VM is

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'

2011-04-26 Thread vrpar...@gmail.com
Hello, I got the following error: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423) at

Re: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'

2011-04-26 Thread Stefan Matheis
http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/ On Tue, Apr 26, 2011 at 3:34 PM, vrpar...@gmail.com vrpar...@gmail.com wrote: Hello, i got following source org.apache.solr.common.SolrException: Error loading class

Re: Automatic synonyms for multiple variations of a word

2011-04-26 Thread Robert Muir
On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: But somehow this feels bad (well, so does sticking word variations in what's supposed to be a synonyms file), partly because it means that the person adding new synonyms would need to know what they stem to

WhitespaceTokenizer and scoring(field length)

2011-04-26 Thread roySolr
Hello, I have a problem with the whitespaceTokenizer and scoring. An example:
id  Titel
1   Manchester united
2   Manchester
With the whitespaceTokenizer, Manchester united will be split into Manchester and united. When I search for

RE: TermsCompoment + Dist. Search + Large Index + HEAP SPACE

2011-04-26 Thread Burton-West, Tom
Don't know your use case, but if you just want a list of the 400 most common words you can use the Lucene contrib HighFreqTerms.java with the -t flag. You have to point it at your Lucene index. You also probably don't want Solr to be running and want to give the JVM running HighFreqTerms a

Apache Solr 3.1.0

2011-04-26 Thread Wodek Siebor
I'm trying to tokenize email and IP addresses using StandardTokenizerFactory. It correctly tokenizes IP addresses, but it divides an email address into two tokens: one with the value before '@' and the other with the value after it. It works correctly under Solr 1.4.1. Has anybody else tried similar

RE: Apache Solr 3.1.0

2011-04-26 Thread Steven A Rowe
Hi Wodek, UAX29URLEmailTokenizer includes all of StandardTokenizer's rules and adds rules to tokenize URLs and Emails: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.UAX29URLEmailTokenizerFactory Steve -Original Message- From: Wodek Siebor

Problems with Spellchecker in 3.1

2011-04-26 Thread Bob Sandiford
Hi, all. Sorry for any duplication - seems like what I sent yesterday never made it through... We're having some troubles with the Solr Spellcheck Response. We're running version 3.1. Overview: If we search for something really ugly like: kljhklsdjahfkljsdhf book rck then

Ebay Kleinanzeigen and Auto Suggest

2011-04-26 Thread Eric Grobler
Hi Someone told me that ebay is using solr. I was looking at their Auto Suggest implementation and I guess they are using Shingles and the TermsComponent. I managed to get a satisfactory implementation but I have a problem with category specific filtering. Ebay suggestions are sensitive to

Solr Newbie: Starting embedded server with multicore

2011-04-26 Thread Simon, Richard T
I'm just starting with Solr. I'm using Solr 3.1.0, and I want to use EmbeddedSolrServer with a multicore setup, even though I currently have only one core (various documents I read suggest starting that way even if you have one core, to get the better administrative tools supported by

RE: TermsCompoment + Dist. Search + Large Index + HEAP SPACE

2011-04-26 Thread mdz-munich
Thanks for your suggestion. The problem seems to be the use of shards and TermsComponent together. Now we simply request shard by shard, without the shards and shards.qt params, and merge the results via XSLT. Sebastian -- View this message in context:

Re: What initialize new searcher?

2011-04-26 Thread Erick Erickson
You're on the right track. In a system where the indexing process and search process are on the same machine, commits by the index process cause a new searcher to be opened. In a master/slave situation (assuming you are indexing on the master and searching on the slave), then the searchers are

Re: WhitespaceTokenizer and scoring(field length)

2011-04-26 Thread Erick Erickson
First, you can give us some more data to work with G... In particular, append debugQuery=on to your HTTP request and post the results. That will show how the documents got their scores. Also, show us the fieldType definition and field definition for the field in question. Best Erick On Tue, Apr
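Since the thread subject points at field length, here is a minimal sketch of how length normalization can favor the shorter title. It assumes the default similarity, whose length factor is 1/sqrt(number of terms in the field); the real index also rounds this value when storing it, which the sketch ignores. It is Python, not Lucene code, and only reuses the two example titles from the question.

```python
import math


def whitespace_tokenize(text):
    """Split on whitespace only, as a whitespace tokenizer would."""
    return text.split()


def length_norm(num_terms):
    """Length factor: fields with fewer terms get a larger value."""
    return 1.0 / math.sqrt(num_terms)


titles = {1: "Manchester united", 2: "Manchester"}

# One length factor per document, based on the token count of its title.
norms = {
    doc_id: length_norm(len(whitespace_tokenize(title)))
    for doc_id, title in titles.items()
}

for doc_id in sorted(norms):
    print(doc_id, titles[doc_id], round(norms[doc_id], 4))
```

With everything else equal, a search for Manchester would rank document 2 above document 1 because of this factor. If that is not wanted, setting omitNorms="true" on the field is one option to look into; the debugQuery output requested above should confirm whether the norm is really the cause.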

Question on Batch process

2011-04-26 Thread Charles Wardell
I am sure that this question has been asked a few times, but I can't seem to find the sweet spot for indexing. I have about 100,000 files, each containing 1,000 XML documents, ready to be posted to Solr. My desire is to have it index as quickly as possible and then once completed the daily stream

Re: Automatic synonyms for multiple variations of a word

2011-04-26 Thread Mike Sokolov
Suppose your analysis stack includes lower-casing, but your synonyms are only supposed to apply to upper-case tokens. For example, PET might be a synonym of positron emission tomography, but pet wouldn't be. -Mike On 04/26/2011 09:51 AM, Robert Muir wrote: On Tue, Apr 26, 2011 at 12:24 AM,

Re: Automatic synonyms for multiple variations of a word

2011-04-26 Thread Robert Muir
Mike, thanks a lot for your example: the idea here would be you would put the lowercasefilter after the synonymfilter, and then you get this exact flexibility? e.g. WhitespaceTokenizer SynonymFilter - no lowercasing of tokens are done as it analyzes your synonyms with just the tokenizer
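The ordering idea in this exchange can be sketched as a toy analysis chain: whitespace tokenizer, then a case-sensitive synonym step, then lowercasing. This is a simplified Python model, not Solr's SynonymFilter (it ignores token positions), and the synonym table is a hypothetical one built from the PET example.

```python
# Hypothetical synonym table: only the upper-case acronym is mapped.
SYNONYMS = {"PET": ["positron", "emission", "tomography"]}


def whitespace_tokenize(text):
    return text.split()


def synonym_filter(tokens):
    """Case-sensitive lookup; keep the original token and add its synonyms."""
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend(SYNONYMS.get(tok, []))
    return out


def lowercase_filter(tokens):
    return [tok.lower() for tok in tokens]


def analyze(text):
    # Synonyms are applied first, while case is still intact;
    # lowercasing happens afterwards.
    return lowercase_filter(synonym_filter(whitespace_tokenize(text)))


upper_result = analyze("PET scan")
lower_result = analyze("pet food")

print(upper_result)
print(lower_result)
```

Because the synonym step sees the tokens before they are lowercased, PET is expanded while pet is left alone, yet both end up lowercased in the index.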

Re: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'

2011-04-26 Thread Scott Bigelow
I experienced the same issue. With Solr 1.x, I was copying out the 'example' directory to make my solr installation. However, for the Solr 3.x distributions, the DataImportHandler class exists in a directory that is at the same level as example: dist, not a directory within. You'll either want to

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Robert Petersen
OK this is even more weird... everything is working much better except for one thing: I was testing use cases with our top query terms to make sure the below query settings wouldn't break any existing behavior, and got this most unusual result. The analyzer stack completely eliminated the word

Re: What initialize new searcher?

2011-04-26 Thread Otis Gospodnetic
Hi, Yes, typically after your index has been replicated from master to a slave a commit will be issued and the new searcher will be opened. Before being exposed to regular clients it's a good practice to warm things up. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Re: Automatic synonyms for multiple variations of a word

2011-04-26 Thread Mike Sokolov
Yes, I see. Makes sense. It is a bit hard to see a bad case for your proposal in that light. Here is one other example; I'm not sure whether it presents difficulties or not, and may be a bit contrived, but hey, food for thought at least: Say you have set up synonyms between names and

Re: Ebay Kleinanzeigen and Auto Suggest

2011-04-26 Thread Otis Gospodnetic
Hi Eric, Before using the terms component, allow me to point out: * http://sematext.com/products/autocomplete/index.html (used on http://search-lucene.com/ for example) * http://wiki.apache.org/solr/Suggester Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene

SynonymFilterFactory case changes

2011-04-26 Thread Robert Petersen
So if there is a hit in the synonym filter factory, do I need to put the various case changes for a term so that the following WordDelimiterFilter analyzer can do its 'split on case changes' work? Here we see SynonymFilterFactory makes all terms lowercase because this is what is in my

Re: Question on Batch process

2011-04-26 Thread Otis Gospodnetic
Charlie, How's this:
* -Xmx2g
* ramBufferSizeMB 512
* mergeFactor 10 (default, but you could up it to 20, 30, if ulimit -n allows)
* ignore/delete maxBufferedDocs - not used if you set ramBufferSizeMB
* use StreamingUpdateSolrServer (with params matching your number of CPU cores) or send batches

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Otis Gospodnetic
Hi Robert, I'm no WDFF expert, but all these zeros look suspicious: org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0} A quick visit to

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Robert Petersen
Yeah I am about to try turning one on at a time and see what happens. I had a meeting so couldn't do it yet... (darn those meetings) (lol) -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, April 26, 2011 2:37 PM To:

Reader per query request

2011-04-26 Thread cyang2010
Hi, I was wondering if Solr opens a new Lucene IndexReader for every query request? From a performance point of view, is there any problem with opening a lot of IndexReaders concurrently, or should the application have some logic to reuse the same IndexReader? Thanks, cy

Field Length and Highlight

2011-04-26 Thread Alejandro Delgadillo
Hi, I've been using Solr with ColdFusion 9. I've made a couple of adjustments to it in order to fulfill the needs of my client. I'm using Solr as a document search engine for an online library which has documents larger than 20MB, and some of them have more than 20 pages. The thing is that... At

Re: SynonymFilterFactory case changes

2011-04-26 Thread Erick Erickson
Yes, order does matter. You're right, putting, say, lowercase in front of WordDelimiter... will mess up the operations of WDFF. The admin/analysis page is *extremely* useful for understanding what happens in the analysis of input. Make sure to check the verbose checkbox. Best Erick On Tue, Apr

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Erick Erickson
I second Otis' comments. Is it possible that you've gotten twisted around by trying to modify these settings and would be better off going back to the WDFF settings in the example schema? I've sometimes found that to be very useful. Also (although I don't think it applies in this case) be aware

Re: Reader per query request

2011-04-26 Thread Erick Erickson
See below. On Tue, Apr 26, 2011 at 6:15 PM, cyang2010 ysxsu...@hotmail.com wrote: Hi, I was wondering if solr open a new lucene IndexReader for every query request? No, absolutely not. Solr only opens a reader when the underlying index has changed, say when a commit or a replication happens.

Re: Too many open files exception related to solrj getServer too often?

2011-04-26 Thread cyang2010
Just pushing up the topic and looking for answers. -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-open-files-exception-related-to-solrj-getServer-too-often-tp2808718p2867976.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Reader per query request

2011-04-26 Thread cyang2010
Thanks a lot. That makes sense. -- CY

RE: SynonymFilterFactory case changes

2011-04-26 Thread Robert Petersen
But in this case lowercase is after WDF. The question is: when you get a hit in the SynonymFilter on a synonym, and the entries in the synonyms.txt file are all in lower case, do I need to add the case-changing versions to make WDF work on case changes, because it appears the synonym text

Re: Field Length and Highlight

2011-04-26 Thread Koji Sekiguchi
(11/04/27 7:35), Alejandro Delgadillo wrote: Hi, I've been using Solr with ColdFusion 9. I've made a couple of adjustments to it in order to fulfill the needs of my client. I'm using Solr as a document search engine for an online library which has documents larger than 20MB, and some of them have

Re: Question on Batch process

2011-04-26 Thread Charles Wardell
Thank you Otis. Without trying to appear too stupid: when you refer to having the params matching your # of CPU cores, are you talking about the # of threads I can spawn with the StreamingUpdateSolrServer object? Up until now, I have been just utilizing post.sh or post.jar. Are these capable of

Re: SynonymFilterFactory case changes

2011-04-26 Thread Erick Erickson
Ahhh, I misread your post. First, it's not the SynonymFilterFactory that's lowercasing anything. The ignoreCase=true setting affects the matching, not the output. The output is probably lowercased because you have it that way in the synonyms.txt file. At least that's what I just saw using the analysis
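The distinction between matching and output can be modeled in a few lines. This is a simplified Python sketch, not the SynonymFilterFactory implementation: it assumes comma-separated groups where a matching token is replaced by every term in its group, spelled as in the file. The rule "tv, television" and the helper names are hypothetical.

```python
def load_rules(lines, ignore_case):
    """Parse 'a, b, c' lines into a lookup from each term to its whole group."""
    rules = {}
    for line in lines:
        group = [term.strip() for term in line.split(",")]
        for term in group:
            # ignore_case only changes the lookup key, never the stored output.
            key = term.lower() if ignore_case else term
            rules[key] = group
    return rules


def apply_synonyms(token, rules, ignore_case):
    """Return the group for a matching token, or the token itself unchanged."""
    key = token.lower() if ignore_case else token
    return rules.get(key, [token])


rules = load_rules(["tv, television"], ignore_case=True)

matched = apply_synonyms("TV", rules, ignore_case=True)
unmatched = apply_synonyms("Radio", rules, ignore_case=True)

print(matched)
print(unmatched)
```

In this model the token TV matches despite its case, but what comes out is the lower-case spelling from the file, so a later split-on-case-change step has nothing to split. A token with no rule passes through with its original case.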

Suggester or spellcheck return stored fields

2011-04-26 Thread wakemaster 39
Hello all, I am trying to build an autocomplete solution for a website that I run. The current implementation is going to be used for choosing who you want to send PMs to. I have it basically working up to this point. The UI is done and the suggester is working in returning possible solutions

Re: How to Update Value of One Field of a Document in Index?

2011-04-26 Thread Peter Spam
My schema: id, name, checksum, body, notes, date I'd like for a user to be able to add notes to the notes field, and not have to re-index the document (since the body field may contain 100MB of text). Some ideas: 1) How about creating another core which only contains id, checksum, and notes?

Re: What initialize new searcher?

2011-04-26 Thread Solr Beginner
Thank you for the answers. I'm moving forward and have few more questions but for separate threads. On Tue, Apr 26, 2011 at 10:47 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, Yes, typically after your index has been replicated from master to a slave a commit will be issued and

fieldCache only on stats page

2011-04-26 Thread Solr Beginner
Hi, I can see only fieldCache (nothing about filter, query or document cache) on the stats page. What am I doing wrong? We have two servers with replication. There are two cores (prod, dev) on each server. Maybe I have to add something to the solrconfig.xml of the cores? Best Regards, Solr Beginner

DataImportHandler in Solr 3.1.0: not updating dataimport.properties last_index_time on delta-import?

2011-04-26 Thread Scott Bigelow
Title pretty much says it all; I've configured the DIH in 3.1.0, and it works great, except the delta-imports are always from the last time a full-import happened, not a delta-import. After a delta-import, dataimport.properties is completely untouched. The documentation implies that the