Re: Chinese chars are not indexed ?

2010-06-28 Thread Ahmet Arslan
I am using the sample, not deploying Solr in Tomcat. Is there a place I can modify this setting? Ah, okay, if you are using Jetty with java -jar start.jar then it is fine. But for Chinese you need a special tokenizer, since Chinese is written without spaces between words. tokenizer

Re: Chinese chars are not indexed ?

2010-06-28 Thread go canal
oh yes, *...* works. thanks. I saw tokenizer is defined in schema.xml. There are a few places that define the tokenizer. Wondering if it is enough to define one for: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <!-- this is the

Re: Chinese chars are not indexed ?

2010-06-28 Thread Ahmet Arslan
oh yes, *...* works. thanks. I saw tokenizer is defined in schema.xml. There are a few places that define the tokenizer. Wondering if it is enough to define one for: It is better to define a brand new field type specific to Chinese.

one to many denormalization approach

2010-06-28 Thread Michael Delaney
Hi, I have an architectural question about using Apache Solr/Lucene. I'm building a Solr index for searching a CV database. Basically every CV on there will have some fields like: rate of pay, address, title; these fields are straightforward. The area I need advice on is, skills and job

Question about the mailinglist (junk on my behalf)

2010-06-28 Thread MitchK
Hello community, for the past few days I have been receiving daily mails with suspicious content. They say that some of my mails were rejected because of the file types of the mails' attachments and other things. This puzzles me a lot, because I didn't send any mails with attachments and even the

RE: is there a delete all command in updateHandler?

2010-06-28 Thread Daniel Alheiros
Hi Li, Yes, you can issue a delete all by: curl http://your_solr_server:your_solr_port/solr/update -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'; Hope it helps. Cheers, Daniel -Original Message- From: Li Li [mailto:fancye...@gmail.com] Sent: 28 June 2010 03:41
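
For readers who would rather script this than use curl, here is a minimal Python sketch of the same delete-all request. The host, port and the helper name build_delete_all_request are assumptions made for illustration; the payload and the Content-Type header are the ones from the curl command above. The request is only built, not sent.

```python
import urllib.request

# Hypothetical endpoint (assumption): substitute your own Solr host and port.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_delete_all_request(update_url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request that deletes every document."""
    # Same XML body as the curl example: a delete-by-query matching all docs.
    body = "<delete><query>*:*</query></delete>".encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_delete_all_request()
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

As with the curl version, the deletion should only become visible to searches once a commit has been issued.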

preside != president

2010-06-28 Thread Darren Govoni
Hi, It seems to me that because the stemming does not produce grammatically correct stems in many of the cases, search anomalies can occur like the one I am seeing where I have a document with president in it and it is returned when I search for preside, a different word entirely. Is this

Search limit to the first 50 000 chars for one field

2010-06-28 Thread judauphant
Hi, I use Solr 1.4 to search the contents of documents (pdf, doc, odt ...). I use the module /update/extract. When I search, I am limited to the first 50 000 characters (approximately). Any word or sentence after that is not found (but the field has more than 50 000 characters when I retrieve it

Re: Search limit to the first 50 000 chars for one field

2010-06-28 Thread Ahmet Arslan
I use Solr 1.4 to search the contents of documents (pdf, doc, odt ...). I use the module /update/extract. When I search, I am limited to the first 50 000 characters (approximately). Any word or sentence after that is not found (but the field has more than 50 000 characters when I retrieve

Re: Data Import Handler Rich Format Documents

2010-06-28 Thread Alexey Serba
Ok, I'm trying to integrate the TikaEntityProcessor as suggested.  I'm using Solr Version: 1.4.0 and getting the following error: java.lang.ClassNotFoundException: Unable to load BinURLDataSource or org.apache.solr.handler.dataimport.BinURLDataSource It seems that DIH-Tika integration is not

Strange query behavior

2010-06-28 Thread Marc Ghorayeb
Hello, I have a title that says 3DVIA Studio & Virtools Maya and 3dsMax Exporters. The analysis tool for this field gives me these tokens:3dviadviastudio;virtoolmaya3dsmaxdssystèmmaxexport However, when I search for 3dsmax, I get no results :( Furthermore, if I search for dsmax I get the

Re: Search limit to the first 50 000 chars for one field

2010-06-28 Thread judauphant
Ok thanks, it works. Best regards, Julien

Re: preside != president

2010-06-28 Thread Brendan Grainger
Hi Darren, You might want to look at the KStemmer (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem) instead of the standard PorterStemmer. It essentially has a 'dictionary' of exception words where stemming stops if found, so in your case president won't be stemmed any

Re: preside != president

2010-06-28 Thread darren
Thanks for the tip. Yeah, I think the stemming confounds search results as it stands (Porter stemmer). I was also thinking of using my dictionary of 500,000 words with their complete morphologies and conjugations to create a synonyms.txt that provides accurate English morphology. Is this a good

DataImportHandler $deleteDocById question

2010-06-28 Thread André Maldonado
Hi all. I'm trying to get $deleteDocById working, but no document is being deleted from my index. I'm using Full-Import (withOUT cleaning) and a script with: row.put('$deleteDocById', row.get('codAnuncio')); The script is passing in this line for every document it processes (for testing

custom core admin handler

2010-06-28 Thread Dave Hall
Hi all, I have been using Solr for quite a while, but I never really got into looking at the code. Last week that all changed, I decided to write a custom core admin handler. I've posted something on my blog about it, along with a Drupal centric howto. I'd be interested to know what people

Re: Chinese chars are not indexed ?

2010-06-28 Thread Andy
What if Chinese is mixed with English? I have text that is entered by users and it could be a mix of Chinese, English, etc. What's the best way to handle that? Thanks. --- On Mon, 6/28/10, Ahmet Arslan iori...@yahoo.com wrote: From: Ahmet Arslan iori...@yahoo.com Subject: Re: Chinese chars

Re: preside != president

2010-06-28 Thread Joe Calderon
The general consensus among people who run into the problem you have is to use a plurals-only stemmer, a synonyms file, or a combination of both (for irregular nouns etc.). If you search the archives you can find info on a plurals stemmer. On Mon, Jun 28, 2010 at 6:49 AM, dar...@ontrenet.com wrote:
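
To make the plurals-only idea concrete, here is a toy Python sketch. It is loosely modeled on the classic "S-stemmer" rules and is not the stemmer discussed in the archives; the function name plural_stem and the exact exception lists are assumptions. The point is only that such a stemmer leaves "president" and "preside" as distinct terms while still folding "presidents" into "president".

```python
def plural_stem(word):
    """Strip common English plural endings and nothing else (toy sketch)."""
    w = word.lower()
    if len(w) <= 3:
        # Very short words are left alone.
        return w
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"   # ponies -> pony
    if w.endswith("es") and not w.endswith(("aes", "ees", "oes")):
        return w[:-1]         # drop only the final "s"
    if w.endswith("s") and not w.endswith(("us", "ss")):
        return w[:-1]         # presidents -> president
    return w


if __name__ == "__main__":
    for term in ("presidents", "president", "preside", "ponies", "glass"):
        print(term, "->", plural_stem(term))
```

Irregular plurals are not handled here, which is where the synonyms file mentioned above would come in.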

Re: Strange query behavior

2010-06-28 Thread Joe Calderon
splitOnCaseChange is creating multiple tokens from 3dsMax; disable it or enable catenateAll. Use the analysis page in the admin tool to see exactly how your text will be indexed by analyzers without having to reindex your documents; once you have it right you can do a full reindex. On Mon, Jun 28,
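
The effect described above can be illustrated with a toy Python model. This is not the real word delimiter filter, only a rough ASCII-only stand-in that splits on case changes and letter/digit boundaries; the names split_word and catenate_all are hypothetical. It shows why an index built from "3dsMax" may not contain the single term "3dsmax" unless the parts are also concatenated.

```python
import re

# Runs of digits, runs of lowercase letters optionally led by one capital,
# or runs of capitals. ASCII only: accented letters are ignored by this toy.
_PART = re.compile(r"[0-9]+|[A-Z]?[a-z]+|[A-Z]+")


def split_word(token, catenate_all=False):
    """Split a token into lowercased parts; optionally add the joined form."""
    parts = [p.lower() for p in _PART.findall(token)]
    if catenate_all and len(parts) > 1:
        parts.append("".join(parts))
    return parts


if __name__ == "__main__":
    print(split_word("3dsMax"))                     # indexed parts
    print(split_word("3dsmax"))                     # what a lowercase query yields
    print(split_word("3dsMax", catenate_all=True))  # parts plus the joined term
```

In this model the indexed parts and the query parts differ ("ds" and "max" versus "dsmax"), and only the concatenated form gives the query something to match.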

Re: questions about Solr shards

2010-06-28 Thread Joe Calderon
There is a first-pass query to retrieve all matching document ids from every shard along with relevant sorting information; the document ids are then sorted and limited to the amount needed, then a second query is sent for the rest of the documents' metadata. On Sun, Jun 27, 2010 at 7:32 PM, Babak
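
The two-pass flow described above can be sketched in Python with in-memory dictionaries standing in for shards. Everything here (the SHARDS data, the function names, sorting by score only) is a made-up illustration of the idea, not Solr's implementation.

```python
import heapq

# Hypothetical in-memory "shards": doc id -> (score, stored fields).
SHARDS = [
    {"a1": (0.9, {"title": "alpha"}), "a2": (0.4, {"title": "beta"})},
    {"b1": (0.7, {"title": "gamma"}), "b2": (0.8, {"title": "delta"})},
]


def first_pass(shard, rows):
    """Return only (score, id) pairs: the shard's best `rows` matches."""
    pairs = [(score, doc_id) for doc_id, (score, _fields) in shard.items()]
    return heapq.nlargest(rows, pairs)


def second_pass(shard, doc_ids):
    """Fetch stored fields for the ids that made the global cut."""
    return {d: shard[d][1] for d in doc_ids if d in shard}


def distributed_search(shards, rows):
    # Pass 1: collect ids plus sort information from every shard.
    candidates = []
    for shard in shards:
        candidates.extend(first_pass(shard, rows))
    # Merge: sort globally and keep only the number of rows needed.
    winners = [doc_id for _score, doc_id in heapq.nlargest(rows, candidates)]
    # Pass 2: ask the shards for the remaining fields of the winners only.
    fields = {}
    for shard in shards:
        fields.update(second_pass(shard, winners))
    return [(doc_id, fields[doc_id]) for doc_id in winners]


if __name__ == "__main__":
    print(distributed_search(SHARDS, rows=2))
```

The design point is that full documents are fetched only for the ids that survive the global merge.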

Too Many Open Files

2010-06-28 Thread Anderson vasconcelos
Hi all, when I send a delete query to Solr using SolrJ, I receive this exception: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Too many open files 11:53:06,964 INFO [HttpMethodDirector] I/O exception (java.net.SocketException) caught when processing request:

Re: Too Many Open Files

2010-06-28 Thread Erick Erickson
This probably means you're opening new readers without closing old ones. But that's just a guess. I'm guessing that this really has nothing to do with the delete itself, but the delete is what's finally pushing you over the limit. I know this has been discussed before, try searching the mail

solr data config questions

2010-06-28 Thread Peng, Wei
Hi All, I am a new user of Solr. We are now trying to enable searching on the Digg dataset. It has story_id as the primary key, and each comment_id identifies a comment on a story, so story_id and comment_id have a one-to-many relationship. These comments can in turn be replied to by other users,

Re: preside != president

2010-06-28 Thread Jan Høydahl / Cominvent
Hi, You might also want to check out the new Lucene-Hunspell stemmer at http://code.google.com/p/lucene-hunspell/ It uses OpenOffice dictionaries with known stems in combination with a large set of language specific rules. It handles your example, but it is an early release, so test it

Re: SweetSpotSimilarity

2010-06-28 Thread Blargy
iorixxx wrote: it is in schema.xml: <similarity class="org.apache.lucene.search.SweetSpotSimilarity"/> How would you configure the tfBaselineTfFactors and LengthNormFactors when configuring via schema.xml? Do I have to create a subclass that hardcodes these values?

Re: SweetSpotSimilarity

2010-06-28 Thread Ahmet Arslan
How would you configure the tfBaselineTfFactors and LengthNormFactors when configuring via schema.xml? CustomSimilarityFactory that extends org.apache.solr.schema.SimilarityFactory should do it. There is an example CustomSimilarityFactory.java under src/test/org...

Re: Spatial types and DIH

2010-06-28 Thread Grant Ingersoll
On Jun 24, 2010, at 12:32 AM, Eric Angel wrote: I'm using solr 4.0-2010-06-23_08-05-33 and can't figure out how to add the spatial types (LatLon, Point, GeoHash or SpatialTile) using dataimporthandler. My lat/lngs from the database are in separate fields. Does anyone know how to do this?

Re: Spatial types and DIH

2010-06-28 Thread Eric Angel
Yes. For now, I've gone back to Lucene 1.4 and installed Local Lucene. I just couldn't get the sfilt to work. I'm sure I was probably missing something, but I think I'll just wait until 1.5 is ready to be shipped. On Jun 28, 2010, at 12:02 PM, Grant Ingersoll wrote: On Jun 24, 2010, at

Re: SweetSpotSimilarity

2010-06-28 Thread Blargy
iorixxx wrote: CustomSimilarityFactory that extends org.apache.solr.schema.SimilarityFactory should do it. There is an example CustomSimilarityFactory.java under src/test/org... This is exactly what I was looking for... this is very similar ( no pun intended ;) ) to the

spellcheckcomponent and frequency thresholds

2010-06-28 Thread Matthew Goldfield
Hi, I'm adding the spellCheckComponent to my current configuration of Solr, and I was wondering if there was a way to set a minimum frequency threshold for the IndexBasedSpellChecker through Solr like there is in the deprecated Spell Check Request Handler. I know that you can fix most

Re: Too Many Open Files

2010-06-28 Thread Michel Bottan
Hi Anderson, If you are using SolrJ, it's recommended to reuse the same instance per solr server. http://wiki.apache.org/solr/Solrj#CommonsHttpSolrServer But there are other scenarios which may cause this situation: 1. Other application running in the same Solr JVM which doesn't close properly

Re: solr data config questions

2010-06-28 Thread Alexey Serba
Hi, You can add additional commentreplyjoin entity to story entity, i.e. <entity name="story" ...> ... <entity name="commenttable" ...> ... <entity name="replytable" ...> ... </entity> </entity> <entity name="commentreplyjoin" query="select concat(comment_id, ',',

Very basic questions: Indexing text

2010-06-28 Thread Peter Spam
Hi everyone, I'm looking for a way to index a bunch of (potentially large) text files. I would love to see results like Google, so I went through a few tutorials, but I've still got questions: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to

Re: Very basic questions: Indexing text

2010-06-28 Thread Ahmet Arslan
1) I can get my docs in the index, but when I search, it returns the entire document.  I'd love to have it only return the line (or two) around the search term. Solr can generate Google-like snippets as you describe. http://wiki.apache.org/solr/HighlightingParameters 2) There are one or two

Re: Very basic questions: Indexing text

2010-06-28 Thread Peter Spam
Great, thanks for the pointers. Thanks, Peter On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to have it only return the line (or two) around the search term. Solr can generate Google-like

DIH and denormalizing

2010-06-28 Thread Shawn Heisey
I am trying to do some denormalizing with DIH from a MySQL source. Here's part of my data-config.xml: <entity name="dataTable" pk="did" query="SELECT *,FROM_UNIXTIME(post_date) as pd FROM ncdat WHERE did &gt; ${dataimporter.request.minDid} AND did &lt;= ${dataimporter.request.maxDid} AND (did

Re: Too Many Open Files

2010-06-28 Thread Anderson vasconcelos
Thanks for the responses. I instantiate one instance per request (per delete query, in my case). I have a lot of concurrent processes. If I reuse the same instance (to send, delete and remove data) in Solr, will I have trouble? My concern is that if I do this, Solr will commit documents with data from

Optimizing cache

2010-06-28 Thread Blargy
Here is a screen shot for our cache from New Relic. http://s4.postimage.org/mmuji-31d55d69362066630eea17ad7782419c.png Query cache: 55-65% Filter cache: 100% Document cache: 63% Cache size is 512 for above 3 caches. How do I interpret this data? What are some optimal configuration changes

RE: DIH and denormalizing

2010-06-28 Thread caman
In your query 'query=SELECT webtable as wt FROM ncdat_wt WHERE featurecode='${ncdat.feature}' .. instead of ${ncdat.feature} use ${dataTable.feature} where dataTable is your parent entity name. From: Shawn Heisey-4 [via Lucene] [mailto:ml-node+929151-1527242139-124...@n3.nabble.com]
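
Why the entity name matters can be shown with a toy placeholder resolver in Python. This is not DIH's resolver: the function resolve, the sample value F42, and the choice to turn unknown names into an empty string are all assumptions for this sketch. The only facts taken from the thread are that the parent entity is named dataTable and reads from the table ncdat.

```python
import re

_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(template, namespaces):
    """Replace ${entity.column} using rows keyed by entity name.

    Unknown names become an empty string here; that behaviour is an
    assumption made for this toy model.
    """
    def lookup(match):
        entity, _, column = match.group(1).partition(".")
        return str(namespaces.get(entity, {}).get(column, ""))
    return _VAR.sub(lookup, template)


# Current row of the parent entity, keyed by the entity's *name* (dataTable),
# not by the table it reads from (ncdat). The value is made up.
namespaces = {"dataTable": {"feature": "F42"}}
query = "SELECT webtable as wt FROM ncdat_wt WHERE featurecode='${%s.feature}'"

if __name__ == "__main__":
    print(resolve(query % "ncdat", namespaces))      # placeholder finds nothing
    print(resolve(query % "dataTable", namespaces))  # placeholder is filled in
```

With the table name the placeholder has nothing to resolve against, so the WHERE clause compares against an empty value; with the entity name it picks up the parent row's column.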

unknown handler dataimport

2010-06-28 Thread Lance Hill
Hi, I am trying to get db indexing up and running, but I am having trouble getting it working. In the solrconfig.xml file, I added <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str

Re: DIH and denormalizing

2010-06-28 Thread Shawn Heisey
On 6/28/2010 3:28 PM, caman wrote: In your query 'query=SELECT webtable as wt FROM ncdat_wt WHERE featurecode='${ncdat.feature}' .. instead of ${ncdat.feature} use ${dataTable.feature} where dataTable is your parent entity name. I knew it would be something stupid like that. I thought I

Re: DIH and denormalizing

2010-06-28 Thread Alexey Serba
It seems that ${ncdat.feature} is not being set. Try ${dataTable.feature} instead. On Tue, Jun 29, 2010 at 1:22 AM, Shawn Heisey s...@elyograg.org wrote: I am trying to do some denormalizing with DIH from a MySQL source. Here's part of my data-config.xml: <entity name="dataTable" pk="did"

Re: Too Many Open Files

2010-06-28 Thread Anderson vasconcelos
Another question: why doesn't SolrJ close the StringWriter and OutputStreamWriter? Thanks. 2010/6/28 Anderson vasconcelos anderson.v...@gmail.com Thanks for the responses. I instantiate one instance per request (per delete query, in my case). I have a lot of concurrent processes. Reusing the same

Re: Very basic questions: Indexing text

2010-06-28 Thread Peter Spam
On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to have it only return the line (or two) around the search term. Solr can generate Google-like snippets as you describe.

Re: Very basic questions: Indexing text

2010-06-28 Thread Erick Erickson
try adding hl.fl=text to specify your highlight field. I don't understand why you're only getting the ID field back though. Do note that the highlighting is after the docs, related by the ID. Try a (non highlighting) query of just * to verify that you're pointing at the index you think you are.
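
As a small illustration of the hl.fl suggestion, here is a Python sketch that builds a select URL with highlighting switched on. The host, port, path and the helper name highlight_query_url are assumptions, as are the default values chosen; the parameters hl, hl.fl, hl.snippets and hl.fragsize are the highlighting parameters documented on the Solr wiki. The URL is only built, not requested.

```python
from urllib.parse import urlencode

# Hypothetical base URL (assumption): substitute your own host, port and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def highlight_query_url(query, field="text", snippets=2, fragsize=100):
    """Build a query URL that asks for highlighted snippets from `field`."""
    params = {
        "q": query,
        "hl": "true",            # turn highlighting on
        "hl.fl": field,          # field(s) to generate snippets from
        "hl.snippets": snippets, # maximum snippets per field
        "hl.fragsize": fragsize, # approximate snippet size in characters
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(highlight_query_url("error log"))
```

As noted above, the snippets come back in a separate highlighting section after the documents, keyed by document ID, so the ID field needs to be part of the response to line the two up.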

What is the proper procedure to reopen closed bugs?

2010-06-28 Thread Teruhiko Kurosaka
I'd like to reopen a bug SOLR-1960 https://issues.apache.org/jira/browse/SOLR-1960 http://wiki.apache.org/solr/ : non-English users get generic MoinMoin page instead of the desired information as I submitted a patch. But jira won't let me do it. Do I have to clone it? Teruhiko Kuro

AutoSuggest Question

2010-06-28 Thread Neil Lott
Hi, I've read some on the autosuggest and I would like to know if the following is possible with my current configuration. I'm using solr 1.4. <field name="title" type="text" indexed="true" stored="true" required="true"/> <field name="titleac3" type="autocomplete3" indexed="true" stored="true" omitNorms="true"

Re: Very basic questions: Indexing text

2010-06-28 Thread Michael Lackhoff
On 28.06.2010 23:00 Ahmet Arslan wrote: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to have it only return the line (or two) around the search term. Solr can generate Google-like snippets as you describe.