I am using the sample, not deploying Solr in Tomcat. Is
there a place I can modify this setting?
Ah, okay, if you are using Jetty with java -jar start.jar then it is fine.
But for Chinese you need a special tokenizer, since Chinese is written without
spaces between words.
tokenizer
oh yes, *...* works. thanks.
I saw tokenizer is defined in schema.xml. There are a few places that define
the tokenizer. Wondering if it is enough to define one for:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<!-- this is the
oh yes, *...* works. thanks.
I saw tokenizer is defined in schema.xml. There are a few
places that define the tokenizer. Wondering if it is enough
to define one for:
It is better to define a brand new field type specific to Chinese.
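As a sketch, a dedicated Chinese field type in schema.xml might look like the following. The field type name is made up, and solr.CJKTokenizerFactory is the CJK tokenizer that ships with Solr 1.4 era releases; check your version's analyzer list before relying on it:

```xml
<!-- Hypothetical field type for Chinese text; "text_zh" is an
     illustrative name. CJKTokenizerFactory emits overlapping bigrams,
     which works reasonably for Chinese without a dictionary. -->
<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

You would then declare your Chinese fields with type="text_zh" instead of the default "text" type.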
Hi,
I have an architectural question about using apache solr/lucene.
I'm building a solr index for searching a CV database. Basically every CV on
there will have some fields like:
rate of pay, address, title
these fields are straightforward. The area I need advice on is skills and
job
Hello community,
for a few days now I have been receiving daily mails with suspicious content.
They say that some of my mails were rejected because of the file types of the
mails' attachments and other things.
This surprises me a lot, because I didn't send any mails with attachments and
even the
Hi Li,
Yes, you can issue a delete all by:
curl http://your_solr_server:your_solr_port/solr/update -H
'Content-Type: text/xml' --data-binary
'<delete><query>*:*</query></delete>'
Hope it helps.
Cheers,
Daniel
-Original Message-
From: Li Li [mailto:fancye...@gmail.com]
Sent: 28 June 2010 03:41
Hi,
It seems to me that because the stemming does not produce
grammatically correct stems in many cases,
search anomalies can occur like the one I am seeing, where I have a
document with "president" in it and it is returned
when I search for "preside", a different word entirely.
Is this
Hi,
I use Solr 1.4 to search content in documents (pdf, doc, odt ...). I use
the /update/extract module.
When I search, I am limited to the first 50,000 characters
(approximately).
Any word or sentence after that is not found (but the field has more than
50,000 characters when I retrieve it
I use Solr 1.4 to search content in documents (pdf, doc, odt ...). I use
the /update/extract module.
When I search, I am limited to the first 50,000 characters (approximately).
Any word or sentence after that is not found (but the field has more than
50,000 characters when I retrieve
Ok, I'm trying to integrate the TikaEntityProcessor as suggested. I'm using
Solr Version: 1.4.0 and getting the following error:
java.lang.ClassNotFoundException: Unable to load BinURLDataSource or
org.apache.solr.handler.dataimport.BinURLDataSource
It seems that DIH-Tika integration is not
Hello,
I have a title that says 3DVIA Studio & Virtools Maya and 3dsMax
Exporters. The analysis tool for this field gives me these
tokens: 3dvia dvia studio virtool maya 3dsmax ds systèm max export
However, when I search for 3dsmax, I get no results :( Furthermore, if I
search for dsmax I get the
Ok thanks, it works.
Best regards,
Julien
--
View this message in context:
http://lucene.472066.n3.nabble.com/Search-limit-to-the-first-50-000-chars-for-one-field-tp927635p927725.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi Darren,
You might want to look at the KStemmer
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem) instead of
the standard PorterStemmer. It essentially has a 'dictionary' of exception
words where stemming stops if found, so in your case president won't be stemmed
any
Thanks for the tip. Yeah, I think the stemming confounds search results as
it stands (Porter stemmer).
I was also thinking of using my dictionary of 500,000 words with their
complete morphologies and conjugations to create a synonyms.txt that
provides accurate English morphology.
Is this a good
Hi all.
I'm trying to get $deleteDocById working, but no documents are being deleted
from my index.
I'm using Full-Import (withOUT cleaning) and a script with:
row.put('$deleteDocById', row.get('codAnuncio'));
The script is passing in this line for every document it processes (for
testing
Hi all,
I have been using Solr for quite a while, but I never really got into
looking at the code. Last week that all changed, I decided to write a
custom core admin handler. I've posted something on my blog about it,
along with a Drupal centric howto. I'd be interested to know what
people
What if Chinese is mixed with English?
I have text that is entered by users and it could be a mix of Chinese, English,
etc.
What's the best way to handle that?
Thanks.
--- On Mon, 6/28/10, Ahmet Arslan iori...@yahoo.com wrote:
From: Ahmet Arslan iori...@yahoo.com
Subject: Re: Chinese chars
The general consensus among people who run into the problem you have
is to use a plurals-only stemmer, a synonyms file, or a combination of
both (for irregular nouns etc.).
If you search the archives you can find info on a plurals stemmer.
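As a sketch of that advice in schema.xml terms (the filter chain and synonym entries are illustrative; solr.EnglishMinimalStemFilterFactory is a plurals-only stemmer that only exists in newer releases, so check your version before using it):

```xml
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- synonyms.txt handles irregular nouns, e.g. a line like:
       mice,mouse -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- plurals-only stemming instead of full Porter stemming;
       available in newer Solr releases -->
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
</analyzer>
```

This combination strips regular plurals without producing aggressive stems like "presid", while the synonyms file covers the irregular cases.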
On Mon, Jun 28, 2010 at 6:49 AM, dar...@ontrenet.com wrote:
splitOnCaseChange is creating multiple tokens from 3dsMax; disable it
or enable catenateAll. Use the analysis page in the admin tool to see
exactly how your text will be indexed by the analyzers without having to
reindex your documents. Once you have it right you can do a full
reindex.
On Mon, Jun 28,
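For reference, the two suggested changes are just attribute tweaks on the WordDelimiterFilterFactory definition in schema.xml (the other attribute values shown are illustrative defaults, not from the original poster's config):

```xml
<!-- Either stop splitting on case changes ... -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        splitOnCaseChange="0"/>

<!-- ... or keep the splits but also index the original term
     rejoined, so "3dsMax" still matches "3dsmax": -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        splitOnCaseChange="1" catenateAll="1"/>
```

Remember that an analyzer change only affects newly indexed documents, hence the full reindex at the end.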
There is a first-pass query to retrieve all matching document ids from
every shard along with relevant sorting information. The document ids
are then sorted and limited to the number needed, and a second query
is sent for the rest of the documents' metadata.
On Sun, Jun 27, 2010 at 7:32 PM, Babak
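The two-pass flow described above can be sketched roughly like this (the shard data and the score-based sort are made-up illustrations of the idea, not Solr's internal API):

```python
# Rough sketch of distributed search's two passes: merge ids first,
# fetch stored fields second. Shard contents are invented for illustration.

def first_pass(shards, rows):
    """Collect (doc_id, sort_value) pairs from every shard, then keep the top `rows`."""
    candidates = []
    for shard in shards:
        candidates.extend(shard["hits"])          # (doc_id, score) tuples
    candidates.sort(key=lambda hit: hit[1], reverse=True)
    return [doc_id for doc_id, _ in candidates[:rows]]

def second_pass(shards, doc_ids):
    """Fetch full metadata only for the ids that survived the merge."""
    wanted = set(doc_ids)
    found = {}
    for shard in shards:
        for doc_id, fields in shard["stored"].items():
            if doc_id in wanted:
                found[doc_id] = fields
    return [found[d] for d in doc_ids]

shards = [
    {"hits": [("a", 0.9), ("b", 0.3)],
     "stored": {"a": {"title": "A"}, "b": {"title": "B"}}},
    {"hits": [("c", 0.7)],
     "stored": {"c": {"title": "C"}}},
]
top = first_pass(shards, rows=2)       # ids merged across shards, limited to 2
results = second_pass(shards, top)     # metadata fetched only for survivors
```

The point of the split is that the cheap first pass moves only ids and sort values across the network; the expensive stored fields are fetched just for the final page of results.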
Hi all
When I send a delete query to Solr using SolrJ, I receive this
exception:
org.apache.solr.client.solrj.SolrServerException: java.net.SocketException:
Too many open files
11:53:06,964 INFO [HttpMethodDirector] I/O exception
(java.net.SocketException) caught when processing request:
This probably means you're opening new readers without closing
old ones. But that's just a guess. I'm guessing that this really
has nothing to do with the delete itself, but the delete is what's
finally pushing you over the limit.
I know this has been discussed before, try searching the mail
Hi All,
I am a new user of Solr.
We are now trying to enable searching on the Digg dataset.
It has story_id as the primary key, and comment_id is the id of a comment
on that story_id, so story_id and comment_id have a one-to-many
relationship.
These comment_ids can be replied by some repliers,
Hi,
You might also want to check out the new Lucene-Hunspell stemmer at
http://code.google.com/p/lucene-hunspell/
It uses OpenOffice dictionaries with known stems in combination with a large
set of language specific rules.
It handles your example, but it is an early release, so test it
iorixxx wrote:
it is in schema.xml:
<similarity class="org.apache.lucene.search.SweetSpotSimilarity"/>
How would you configure the tfBaselineTfFactors and LengthNormFactors when
configuring via schema.xml? Do I have to create a subclass that hardcodes
these values?
How would you configure the tfBaselineTfFactors and
LengthNormFactors when
configuring via schema.xml?
CustomSimilarityFactory that extends org.apache.solr.schema.SimilarityFactory
should do it. There is an example CustomSimilarityFactory.java under
src/test/org...
On Jun 24, 2010, at 12:32 AM, Eric Angel wrote:
I'm using solr 4.0-2010-06-23_08-05-33 and can't figure out how to add the
spatial types (LatLon, Point, GeoHash or SpatialTile) using
dataimporthandler. My lat/lngs from the database are in separate fields.
Does anyone know how to do this?
Yes. For now, I've gone back to Lucene 1.4 and installed Local Lucene. I just
couldn't get the sfilt to work. I'm sure I was probably missing something, but
I think I'll just wait until 1.5 is ready to be shipped.
On Jun 28, 2010, at 12:02 PM, Grant Ingersoll wrote:
On Jun 24, 2010, at
iorixxx wrote:
CustomSimilarityFactory that extends
org.apache.solr.schema.SimilarityFactory should do it. There is an example
CustomSimilarityFactory.java under src/test/org...
This is exactly what I was looking for... this is very similar (no pun
intended ;) ) to the
Hi,
I'm adding the spellCheckComponent to my current configuration of Solr, and I
was wondering if there is a way to set a minimum frequency threshold for the
IndexBasedSpellChecker through Solr like there is in the deprecated Spell
Check Request Handler. I know that you can fix most
Hi Anderson,
If you are using SolrJ, it's recommended to reuse the same instance per solr
server.
http://wiki.apache.org/solr/Solrj#CommonsHttpSolrServer
But there are other scenarios which may cause this situation:
1. Other application running in the same Solr JVM which doesn't close
properly
Hi,
You can add an additional commentreplyjoin entity to the story entity, i.e.
<entity name="story" ...>
...
<entity name="commenttable" ...>
...
<entity name="replytable" ...>
...
</entity>
</entity>
<entity name="commentreplyjoin" query="select concat(comment_id,
',',
Hi everyone,
I'm looking for a way to index a bunch of (potentially large) text files. I
would love to see results like Google, so I went through a few tutorials, but
I've still got questions:
1) I can get my docs in the index, but when I search, it returns the entire
document. I'd love to
1) I can get my docs in the index, but when I search, it
returns the entire document. I'd love to have it only
return the line (or two) around the search term.
Solr can generate Google-like snippets as you describe.
http://wiki.apache.org/solr/HighlightingParameters
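Concretely, snippet-style results come from a few extra request parameters (the parameter names come from the wiki page above; the URL and the "content" field name are illustrative, so substitute your own):

```python
from urllib.parse import urlencode

# Illustrative highlighting request; "content" is a hypothetical field
# name -- use the field you actually index your document text into.
params = {
    "q": "content:lucene",
    "hl": "true",            # turn highlighting on
    "hl.fl": "content",      # field(s) to build snippets from
    "hl.fragsize": 100,      # roughly one line of context per snippet
    "hl.snippets": 2,        # up to two snippets per document
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
```

The response then carries a separate highlighting section with the fragments, keyed by document id, alongside the normal result list.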
2) There are one or two
Great, thanks for the pointers.
Thanks,
Peter
On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
1) I can get my docs in the index, but when I search, it
returns the entire document. I'd love to have it only
return the line (or two) around the search term.
Solr can generate Google-like
I am trying to do some denormalizing with DIH from a MySQL source.
Here's part of my data-config.xml:
<entity name="dataTable" pk="did"
query="SELECT *,FROM_UNIXTIME(post_date) as pd FROM ncdat WHERE
did &gt; ${dataimporter.request.minDid} AND did &lt;=
${dataimporter.request.maxDid} AND (did
Thanks for responses.
I instantiate one instance per request (per delete query, in my case).
I have a lot of concurrent processes. If I reuse the same instance (to send,
delete and remove data) in Solr, will I have trouble?
My concern is that if I do this, Solr will commit documents with data from
Here is a screen shot for our cache from New Relic.
http://s4.postimage.org/mmuji-31d55d69362066630eea17ad7782419c.png
Query cache: 55-65%
Filter cache: 100%
Document cache: 63%
Cache size is 512 for above 3 caches.
How do I interpret this data? What are some optimal configuration changes
In your query query="SELECT webtable as wt FROM ncdat_wt WHERE
featurecode='${ncdat.feature}'" .. instead of ${ncdat.feature} use
${dataTable.feature}, where dataTable is your parent entity name.
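A minimal sketch of what that looks like in data-config.xml (the table and column names follow the thread, but the exact queries are illustrative):

```xml
<!-- Child entity variables must be qualified with the PARENT entity's
     name ("dataTable"), not the table name ("ncdat"). -->
<entity name="dataTable" pk="did"
        query="SELECT * FROM ncdat">
  <entity name="webTable"
          query="SELECT webtable AS wt FROM ncdat_wt
                 WHERE featurecode = '${dataTable.feature}'"/>
</entity>
```

DIH resolves ${...} placeholders against entity names defined in the config, which is why ${ncdat.feature} silently evaluates to nothing.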
From: Shawn Heisey-4 [via Lucene]
[mailto:ml-node+929151-1527242139-124...@n3.nabble.com]
Hi,
I am trying to get db indexing up and running, but I am having trouble
getting it working.
In the solrconfig.xml file, I added
<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str
On 6/28/2010 3:28 PM, caman wrote:
In your query query="SELECT webtable as wt FROM ncdat_wt WHERE
featurecode='${ncdat.feature}'" .. instead of ${ncdat.feature} use
${dataTable.feature}, where dataTable is your parent entity name.
I knew it would be something stupid like that. I thought I
It seems that ${ncdat.feature} is not being set.
Try ${dataTable.feature} instead.
On Tue, Jun 29, 2010 at 1:22 AM, Shawn Heisey s...@elyograg.org wrote:
I am trying to do some denormalizing with DIH from a MySQL source. Here's
part of my data-config.xml:
<entity name="dataTable" pk="did"
Another question:
Why doesn't SolrJ close the StringWriter and OutputStreamWriter?
thanks
2010/6/28 Anderson vasconcelos anderson.v...@gmail.com
Thanks for responses.
I instantiate one instance per request (per delete query, in my case).
I have a lot of concurrent processes. Reusing the same
On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
1) I can get my docs in the index, but when I search, it
returns the entire document. I'd love to have it only
return the line (or two) around the search term.
Solr can generate Google-like snippets as you describe.
Try adding hl.fl=text
to specify your highlight field. I don't understand why you're only
getting the ID field back, though. Do note that the highlighting section
comes after the docs in the response, related by the ID.
Try a (non highlighting) query of just * to verify that you're
pointing at the index you think you are.
I'd like to reopen a bug SOLR-1960
https://issues.apache.org/jira/browse/SOLR-1960
http://wiki.apache.org/solr/ : non-English users get generic MoinMoin page
instead of the desired information
as I submitted a patch. But jira won't let me do it.
Do I have to clone it?
Teruhiko Kuro
Hi,
I've read a bit about autosuggest and I would like to know if the following is
possible with my current configuration.
I'm using solr 1.4.
<field name="title" type="text" indexed="true" stored="true" required="true"/>
<field name="titleac3" type="autocomplete3" indexed="true" stored="true"
omitNorms="true"
On 28.06.2010 23:00 Ahmet Arslan wrote:
1) I can get my docs in the index, but when I search, it
returns the entire document. I'd love to have it only
return the line (or two) around the search term.
Solr can generate Google-like snippets as you describe.