date:20100628

Re: Chinese chars are not indexed ?

2010-06-28 Thread Ahmet Arslan

I am using the sample, not deploying Solr in Tomcat. Is there a place I can modify this setting ? Ha, okay, if you are using Jetty with java -jar start.jar then it is okay. But for Chinese you need a special tokenizer, since Chinese is written without spaces between words. tokenizer

Re: Chinese chars are not indexed ?

2010-06-28 Thread go canal

oh yes, *...* works. thanks. I saw tokenizer is defined in schema.xml. There are a few places that define the tokenizer. Wondering if it is enough to define one for: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <!-- this is the

Re: Chinese chars are not indexed ?

2010-06-28 Thread Ahmet Arslan

oh yes, *...* works. thanks. I saw tokenizer is defined in schema.xml. There are a few places that define the tokenizer. Wondering if it is enough to define one for: It is better to define a brand new field type specific to Chinese.

one to many denormalization approach

2010-06-28 Thread Michael Delaney

Hi, I have an architectural question about using Apache Solr/Lucene. I'm building a Solr index for searching a CV database. Basically every CV on there will have some fields like: rate of pay, address, title. These fields are straightforward. The area I need advice on is, skills and job

Question about the mailinglist (junk on my behalf)

2010-06-28 Thread MitchK

Hello community, for a few days now I have been receiving daily mails with suspicious content. They say that some of my mails were rejected because of the file types of the mail's attachments and other things. This puzzles me a lot, because I didn't send any mails with attachments and even the

RE: is there a delete all command in updateHandler?

2010-06-28 Thread Daniel Alheiros

Hi Li, Yes, you can issue a delete all by: curl http://your_solr_server:your_solr_port/solr/update -H 'Content-Type: text/xml' --data-binary '<delete><query>*:*</query></delete>' Hope it helps. Cheers, Daniel -Original Message- From: Li Li [mailto:fancye...@gmail.com] Sent: 28 June 2010 03:41
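For readers who script this rather than use curl, the same delete-all request can be sketched in Python. This is a minimal sketch, not part of the original reply: the function name build_delete_all_request and the localhost URL are assumptions for illustration, and the request is only built here, not sent. The deletion normally becomes visible to searches only after a commit is also issued.

```python
import urllib.request


def build_delete_all_request(update_url):
    """Build (but do not send) a POST that deletes every document.

    update_url is an assumption: point it at your own /solr/update handler.
    """
    # Same XML body as in the curl command: a delete-by-query matching *:*
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Hypothetical local Solr instance, for illustration only.
req = build_delete_all_request("http://localhost:8983/solr/update")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Keeping the build step separate from the send step makes it easy to inspect the request before pointing it at a real index.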

preside != president

2010-06-28 Thread Darren Govoni

Hi, It seems to me that because the stemming does not produce grammatically correct stems in many of the cases, search anomalies can occur like the one I am seeing where I have a document with president in it and it is returned when I search for preside, a different word entirely. Is this

Search limit to the first 50 000 chars for one field

2010-06-28 Thread judauphant

Hi, I use Solr 1.4 to search contents in documents (pdf, doc, odt ...). I use the module /update/extract. When I search, I am limited to the first 50 000 characters (approximately). Any word or sentence after that is not found (but the field has more than 50 000 characters when I retrieved it

Re: Search limit to the first 50 000 chars for one field

2010-06-28 Thread Ahmet Arslan

I use Solr 1.4 to search contents in documents (pdf, doc, odt ...). I use the module /update/extract. When I search, I am limited to the first 50 000 characters (approximately). Any word or sentence after that is not found (but the field has more than 50 000 characters when I retrieved

Re: Data Import Handler Rich Format Documents

2010-06-28 Thread Alexey Serba

Ok, I'm trying to integrate the TikaEntityProcessor as suggested. I'm using Solr Version: 1.4.0 and getting the following error: java.lang.ClassNotFoundException: Unable to load BinURLDataSource or org.apache.solr.handler.dataimport.BinURLDataSource It seems that DIH-Tika integration is not

Strange query behavior

2010-06-28 Thread Marc Ghorayeb

Hello, I have a title that says 3DVIA Studio & Virtools Maya and 3dsMax Exporters. The analysis tool for this field gives me these tokens: 3dviadviastudio;virtoolmaya3dsmaxdssystèmmaxexport However, when I search for 3dsmax, I get no results :( Furthermore, if I search for dsmax I get the

Re: Search limit to the first 50 000 chars for one field

2010-06-28 Thread judauphant

Ok thanks, it works. Best regards, Julien -- View this message in context: http://lucene.472066.n3.nabble.com/Search-limit-to-the-first-50-000-chars-for-one-field-tp927635p927725.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: preside != president

2010-06-28 Thread Brendan Grainger

Hi Darren, You might want to look at the KStemmer (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem) instead of the standard PorterStemmer. It essentially has a 'dictionary' of exception words where stemming stops if found, so in your case president won't be stemmed any

Re: preside != president

2010-06-28 Thread darren

Thanks for the tip. Yeah, I think the stemming confounds search results as it stands (porter stemmer). I was also thinking of using my dictionary of 500,000 words with their complete morphologies and conjugations and create a synonyms.txt to provide english accurate morphology. Is this a good

DataImportHandler $deleteDocById question

2010-06-28 Thread André Maldonado

Hi all. I'm trying to get $deleteDocById working, but any document is being deleted from my index. I'm using Full-Import (withOUT cleaning) and a script with: row.put('$deleteDocById', row.get('codAnuncio')); The script is passing in this line for every document it processes (for testing

custom core admin handler

2010-06-28 Thread Dave Hall

Hi all, I have been using Solr for quite a while, but I never really got into looking at the code. Last week that all changed, I decided to write a custom core admin handler. I've posted something on my blog about it, along with a Drupal centric howto. I'd be interested to know what people

Re: Chinese chars are not indexed ?

2010-06-28 Thread Andy

What if Chinese is mixed with English? I have text that is entered by users and it could be a mix of Chinese, English, etc. What's the best way to handle that? Thanks. --- On Mon, 6/28/10, Ahmet Arslan iori...@yahoo.com wrote: From: Ahmet Arslan iori...@yahoo.com Subject: Re: Chinese chars

Re: preside != president

2010-06-28 Thread Joe Calderon

the general consensus among people who run into the problem you have is to use a plurals only stemmer, a synonyms file or a combination of both (for irregular nouns etc) if you search the archives you can find info on a plurals stemmer On Mon, Jun 28, 2010 at 6:49 AM, dar...@ontrenet.com wrote:
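To illustrate what a plurals-only stemmer does differently from the Porter stemmer in this thread, here is a small Python sketch. It is an assumption-laden toy, not the stemmer referred to in the archives: the function name plural_stem and the exact suffix rules are chosen for illustration only. The point is that it strips plural endings and nothing else, so president and preside stay distinct.

```python
def plural_stem(word):
    """Toy plurals-only stemmer (illustrative rules, not a Solr filter)."""
    w = word.lower()
    if w.endswith("ies"):
        # Leave words such as "aies"/"eies" endings untouched.
        if w.endswith(("eies", "aies")):
            return w
        return w[:-3] + "y"  # ponies -> pony
    if w.endswith("es"):
        if w.endswith(("aes", "ees", "oes")):
            return w
        return w[:-1]  # presides -> preside
    if w.endswith("s"):
        # Do not strip from words like "bus" or "glass".
        if w.endswith(("us", "ss")):
            return w
        return w[:-1]  # presidents -> president
    return w


for term in ["president", "presidents", "preside", "presides", "ponies"]:
    print(term, "->", plural_stem(term))
```

Irregular nouns are not handled by rules like these, which is where the synonyms file mentioned above would come in.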

Re: Strange query behavior

2010-06-28 Thread Joe Calderon

splitOnCaseChange is creating multiple tokens from 3dsMax; disable it or enable catenateAll. Use the analysis page in the admin tool to see exactly how your text will be indexed by analyzers without having to reindex your documents; once you have it right you can do a full reindex. On Mon, Jun 28,

Re: questions about Solr shards

2010-06-28 Thread Joe Calderon

there is a first pass query to retrieve all matching document ids from every shard along with relevant sorting information, the document ids are then sorted and limited to the amount needed, then a second query is sent for the rest of the documents metadata. On Sun, Jun 27, 2010 at 7:32 PM, Babak
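The two-pass flow described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not Solr's actual implementation: shards are plain dictionaries, the score is a naive term count standing in for real relevance scoring, and the names distributed_search and shards are hypothetical.

```python
import heapq


def distributed_search(shards, term, rows):
    """Toy model of a two-pass sharded search (not Solr's real code)."""
    # Pass 1: every shard reports only ids plus the sort key (here, a score).
    candidates = []
    for shard_name, docs in shards.items():
        for doc_id, doc in docs.items():
            score = doc["text"].split().count(term)  # stand-in for real scoring
            if score > 0:
                candidates.append((score, doc_id, shard_name))

    # Merge: sort the ids globally and keep only as many as were requested.
    top = heapq.nlargest(rows, candidates)

    # Pass 2: fetch the remaining stored fields only for the winning ids.
    return [
        {"id": doc_id, "score": score, **shards[shard_name][doc_id]}
        for score, doc_id, shard_name in top
    ]


shards = {
    "shard1": {"a": {"text": "solr solr lucene"}, "b": {"text": "lucene"}},
    "shard2": {"c": {"text": "solr"}, "d": {"text": "solr solr solr"}},
}
results = distributed_search(shards, "solr", rows=2)
print([r["id"] for r in results])
```

The design point is that full documents travel over the network only for the final page of results, not for every match on every shard.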

Too Many Open Files

2010-06-28 Thread Anderson vasconcelos

Hi all, When I send a delete query to Solr using SolrJ, I receive this exception: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Too many open files 11:53:06,964 INFO [HttpMethodDirector] I/O exception (java.net.SocketException) caught when processing request:

Re: Too Many Open Files

2010-06-28 Thread Erick Erickson

This probably means you're opening new readers without closing old ones. But that's just a guess. I'm guessing that this really has nothing to do with the delete itself, but the delete is what's finally pushing you over the limit. I know this has been discussed before, try searching the mail

solr data config questions

2010-06-28 Thread Peng, Wei

Hi All, I am a new user of Solr. We are now trying to enable searching on the Digg dataset. It has story_id as the primary key, and comment_id identifies the comments made on story_id, so story_id and comment_id are in a one-to-many relationship. These comment_ids can be replied to by some repliers,

Re: preside != president

2010-06-28 Thread Jan Høydahl / Cominvent

Hi, You might also want to check out the new Lucene-Hunspell stemmer at http://code.google.com/p/lucene-hunspell/ It uses OpenOffice dictionaries with known stems in combination with a large set of language specific rules. It handles your example, but it is an early release, so test it

Re: SweetSpotSimilarity

2010-06-28 Thread Blargy

iorixxx wrote: it is in schema.xml: <similarity class="org.apache.lucene.search.SweetSpotSimilarity"/> How would you configure the tfBaselineTfFactors and LengthNormFactors when configuring via schema.xml? Do I have to create a subclass that hardcodes these values? -- View this message in

Re: SweetSpotSimilarity

2010-06-28 Thread Ahmet Arslan

How would you configure the tfBaselineTfFactors and LengthNormFactors when configuring via schema.xml? CustomSimilarityFactory that extends org.apache.solr.schema.SimilarityFactory should do it. There is an example CustomSimilarityFactory.java under src/test/org...

Re: Spatial types and DIH

2010-06-28 Thread Grant Ingersoll

On Jun 24, 2010, at 12:32 AM, Eric Angel wrote: I'm using solr 4.0-2010-06-23_08-05-33 and can't figure out how to add the spatial types (LatLon, Point, GeoHash or SpatialTile) using dataimporthandler. My lat/lngs from the database are in separate fields. Does anyone know how to do this?

Re: Spatial types and DIH

2010-06-28 Thread Eric Angel

Yes. For now, I've gone back to Lucene 1.4 and installed Local Lucene. I just couldn't get the sfilt to work. I'm sure I was probably missing something, but I think I'll just wait until 1.5 is ready to be shipped. On Jun 28, 2010, at 12:02 PM, Grant Ingersoll wrote: On Jun 24, 2010, at

Re: SweetSpotSimilarity

2010-06-28 Thread Blargy

iorixxx wrote: CustomSimilarityFactory that extends org.apache.solr.schema.SimilarityFactory should do it. There is an example CustomSimilarityFactory.java under src/test/org... This is exactly what I was looking for... this is very similar ( no pun intended ;) ) to the

spellcheckcomponent and frequency thresholds

2010-06-28 Thread Matthew Goldfield

Hi, I'm adding the spellCheckComponent to my current configuration of Solr, and I was wondering if there is a way to set a minimum frequency threshold for the IndexBasedSpellChecker through Solr like there is in the deprecated Spell Check Request Handler. I know that you can fix most

Re: Too Many Open Files

2010-06-28 Thread Michel Bottan

Hi Anderson, If you are using SolrJ, it's recommended to reuse the same instance per solr server. http://wiki.apache.org/solr/Solrj#CommonsHttpSolrServer But there are other scenarios which may cause this situation: 1. Other application running in the same Solr JVM which doesn't close properly

Re: solr data config questions

2010-06-28 Thread Alexey Serba

Hi, You can add an additional commentreplyjoin entity to the story entity, i.e. <entity name="story" ...> ... <entity name="commenttable" ...> ... <entity name="replytable" ...> ... </entity> </entity> <entity name="commentreplyjoin" query="select concat(comment_id, ',',

Very basic questions: Indexing text

2010-06-28 Thread Peter Spam

Hi everyone, I'm looking for a way to index a bunch of (potentially large) text files. I would love to see results like Google, so I went through a few tutorials, but I've still got questions: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to

Re: Very basic questions: Indexing text

2010-06-28 Thread Ahmet Arslan

1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to have it only return the line (or two) around the search term. Solr can generate Google-like snippets as you describe. http://wiki.apache.org/solr/HighlightingParameters 2) There are one or two
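As a rough illustration of the highlighting parameters mentioned above, the following Python sketch builds a select URL that asks for snippets instead of whole documents. The helper name highlight_query_url, the localhost base URL, and the field name text are assumptions for illustration; hl, hl.fl, hl.snippets and hl.fragsize are the highlighting parameters documented on the wiki page linked in the reply.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, query, field, snippets=2, fragsize=100):
    """Build a Solr select URL with highlighting switched on."""
    params = {
        "q": query,
        "hl": "true",            # turn highlighting on
        "hl.fl": field,          # field(s) to generate snippets from
        "hl.snippets": snippets, # maximum snippets per field
        "hl.fragsize": fragsize, # approximate snippet size in characters
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical local Solr instance and field name.
url = highlight_query_url("http://localhost:8983/solr", "error", "text")
print(url)
```

The highlighted field generally needs to be stored for snippets to be generated, so that is worth checking in schema.xml if the highlighting section comes back empty.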

Re: Very basic questions: Indexing text

2010-06-28 Thread Peter Spam

Great, thanks for the pointers. Thanks, Peter On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to have it only return the line (or two) around the search term. Solr can generate Google-like

DIH and denormalizing

2010-06-28 Thread Shawn Heisey

I am trying to do some denormalizing with DIH from a MySQL source. Here's part of my data-config.xml: <entity name="dataTable" pk="did" query="SELECT *,FROM_UNIXTIME(post_date) as pd FROM ncdat WHERE did &gt; ${dataimporter.request.minDid} AND did &lt;= ${dataimporter.request.maxDid} AND (did

Re: Too Many Open Files

2010-06-28 Thread Anderson vasconcelos

Thanks for the responses. I instantiate one instance of per request (per delete query, in my case). I have a lot of concurrent processes. If I reuse the same instance (to send, delete and remove data) in Solr, will I have trouble? My concern is that if I do this, Solr will commit documents with data from

Optimizing cache

2010-06-28 Thread Blargy

Here is a screen shot for our cache from New Relic. http://s4.postimage.org/mmuji-31d55d69362066630eea17ad7782419c.png Query cache: 55-65% Filter cache: 100% Document cache: 63% Cache size is 512 for above 3 caches. How do I interpret this data? What are some optimal configuration changes

RE: DIH and denormalizing

2010-06-28 Thread caman

In your query 'query=SELECT webtable as wt FROM ncdat_wt WHERE featurecode='${ncdat.feature}' .. instead of ${ncdat.feature} use ${dataTable.feature} where dataTable is your parent entity name. From: Shawn Heisey-4 [via Lucene] [mailto:ml-node+929151-1527242139-124...@n3.nabble.com]

unknown handler dataimport

2010-06-28 Thread Lance Hill

Hi, I am trying to get db indexing up and running, but I am having trouble getting it working. In the solrconfig.xml file, I added <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str

Re: DIH and denormalizing

2010-06-28 Thread Shawn Heisey

On 6/28/2010 3:28 PM, caman wrote: In your query 'query=SELECT webtable as wt FROM ncdat_wt WHERE featurecode='${ncdat.feature}' .. instead of ${ncdat.feature} use ${dataTable.feature} where dataTable is your parent entity name. I knew it would be something stupid like that. I thought I

Re: DIH and denormalizing

2010-06-28 Thread Alexey Serba

It seems that ${ncdat.feature} is not being set. Try ${dataTable.feature} instead. On Tue, Jun 29, 2010 at 1:22 AM, Shawn Heisey s...@elyograg.org wrote: I am trying to do some denormalizing with DIH from a MySQL source. Here's part of my data-config.xml: entity name=dataTable pk=did

Re: Too Many Open Files

2010-06-28 Thread Anderson vasconcelos

Another question: why doesn't SolrJ close the StringWriter and OutputStreamWriter? Thanks 2010/6/28 Anderson vasconcelos anderson.v...@gmail.com Thanks for responses. I instantiate one instance of per request (per delete query, in my case). I have a lot of concurrency process. Reusing the same

Re: Very basic questions: Indexing text

2010-06-28 Thread Peter Spam

On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to have it only return the line (or two) around the search term. Solr can generate Google-like snippets as you describe.

Re: Very basic questions: Indexing text

2010-06-28 Thread Erick Erickson

try adding hl.fl=text to specify your highlight field. I don't understand why you're only getting the ID field back though. Do note that the highlighting is after the docs, related by the ID. Try a (non highlighting) query of just * to verify that you're pointing at the index you think you are.

What is the proper procedure to reopen closed bugs?

2010-06-28 Thread Teruhiko Kurosaka

I'd like to reopen a bug SOLR-1960 https://issues.apache.org/jira/browse/SOLR-1960 http://wiki.apache.org/solr/ : non-English users get generic MoinMoin page instead of the desired information as I submitted a patch. But jira won't let me do it. Do I have to clone it? Teruhiko Kuro

AutoSuggest Question

2010-06-28 Thread Neil Lott

Hi, I've read some on the autosuggest and I would like to know if the following is possible with my current configuration. I'm using Solr 1.4. <field name="title" type="text" indexed="true" stored="true" required="true"/> <field name="titleac3" type="autocomplete3" indexed="true" stored="true" omitNorms="true"

Re: Very basic questions: Indexing text

2010-06-28 Thread Michael Lackhoff

On 28.06.2010 23:00 Ahmet Arslan wrote: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to have it only return the line (or two) around the search term. Solr can generate Google-like snippets as you describe.

48 matches
