Hi, all,
I am using Solr 3.5.0, which uses Tika 0.10 to do language detection,
and I have a couple of questions about this function.
1. I can see the outcome of the language detection in a field
language_s. But what action will be taken according to the different
language code? How to
Hello!
The score of a document is a value calculated for a given document in
the context of the given query, showing how well the document 'fits'
the query.
The maxScore is the maximum score calculated by Lucene for a given
query.
Please have a look at
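For illustration (not from the original thread), a hedged sketch of where maxScore appears in the standard XML response writer's output; the numbers are invented, and the per-document score field only appears when fl includes score:

```xml
<!-- Illustrative only: numFound and score values are made up -->
<result name="response" numFound="2" start="0" maxScore="1.414">
  <doc>
    <float name="score">1.414</float>
  </doc>
  <doc>
    <float name="score">0.87</float>
  </doc>
</result>
```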
Dear all,
What can I do to index a complete directory with many PDF files in Solr?
Alessio Crisantemi
Direttore Responsabile Gioconews.it
www.gioconews.it
t: (+39)0744461296
f: (+39)0744461362
bb: (+39)3477939054
e: alessio.crisant...@gioconews.it
Hi all,
I’m having basically the exact same problem someone described in this email to
the list from just over a year ago (see below). The only suggested solution
given on the thread at the time was to ping the server before sending an add,
which I’m not particularly keen on; least of all
Hi,
I would like to know if there is any way I can get the source
code (baselined code) for the Solr 3.5 version. I got the code for the 3.x
version (3.6) only. Is there a possibility in Eclipse to check out the 3.5
code from svn, or is there a zip file available for the same?
Thanks.
Hello!
Please look at one of the mirrors. There should be a package
apache-solr-3.5.0-src.tgz which contains the source code.
For example the following link should work:
http://ftp.tpnet.pl/vol/d1/apache//lucene/solr/3.5.0/apache-solr-3.5.0-src.tgz
--
Regards,
Rafał Kuć
Hi,
I would
Hi.
I have phone numbers in my solr schema in a field. At the moment i have this
field as string.
I would like to be able to make searches that find parts of a phone number.
For instance:
Number +35384589458
search by *+35384* or search by *84589*.
Do you know if this is possible?
Thanks
Hi Marotosg,
you can index the phone number field with the ngram field type, which allows
for partial (wildcard) searches on this field. Have a look here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory
Cheers,
Patrick
2012/1/19 marotosg
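As a hedged sketch of the suggestion above (the type name, field name, and gram sizes are assumptions, not from the thread), an ngram-based field type for partial phone-number matching might look like:

```xml
<!-- Sketch: index-time ngrams allow substring matches without wildcards -->
<fieldType name="phone_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="phone" type="phone_ngram" indexed="true" stored="true"/>
```

With such a type, a plain query like q=phone:84589 should match +35384589458 without needing wildcard syntax at all, since the substrings are already in the index.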
I think the occasional "Hey, we made something cool you might be
interested in!" notice, even if commercial, is OK
because it addresses numerous issues we struggle with on this list.
Now, if it were something completely off-base or unrelated (e.g. male
enhancement pills), then yeah, I agree.
On Thu, Jan 19, 2012 at 4:51 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Huge is relative. ;)
Huge Solr clusters also often have huge hardware. Servers with 16 cores
and 32 GB RAM are becoming very common, for example.
Another thing to keep in mind is that while lots of
Oh, I see, I hadn't noticed you used Solr 4.0. Luke can only read 3.5 at
most, at the moment.
So when you search with a leading wildcard, do both your app and the Solr
admin search give the same results?
Perhaps you can show the relevant parts of your schema and solrconfig, like
the type definition(s)?
Partially agree. If just the facts are given, and not a complete sales talk
instead, it'll be fine. Don't overdo it like this though.
Cheers,
Patrick
2012/1/19 Darren Govoni dar...@ontrenet.com
I think the occasional "Hey, we made something cool you might be
interested in!" notice, even if
Dimitry,
Yes my app and the Solr admin search results are giving me similar results.
Excerpt from schema.xml:
As you can see solr.ReversedWildcardFilterFactory is commented out.
<types>
<fieldtype name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>
<fieldType
Hi,
Try DisMax parser with tie parameter:
q=solr&qf=name^10.03 description^10.02 location^10.01&tie=0.5&defType=dismax
What will happen now is that the field which scores HIGHEST for the term will
win the max score (10). If all things are equal, name will win above because
it has slightly higher
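To make the tie behavior concrete: with DisMax a document's score for a term is roughly max(field scores) + tie × sum(the other matching fields' scores), so tie=0 keeps only the best field and tie=1 sums them all. A hedged sketch of the same parameters set as request-handler defaults (the handler name is illustrative, not from the thread):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">name^10.03 description^10.02 location^10.01</str>
    <str name="tie">0.5</str>
  </lst>
</requestHandler>
```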
Hi,
Can you paste exactly both fieldType and field definitions from your
schema? omitNorms=true should kill norms.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 19. jan. 2012, at 08:18, remi tassing wrote:
Hi,
just a
Hi,
You may use the string as you choose, for instance filtering (fq=language_s:en)
or for faceting (facet.field=language_s). What are you looking to do?
What would you like to detect on the query side? The language of the search
string? That is very hard since people type very few words into
Agree. There's probably some unwritten etiquette there.
On 01/19/2012 05:52 AM, Patrick Plaatje wrote:
Partially agree. If just the facts are given, and not a complete sales talk
instead, it'll be fine. Don't overdo it like this though.
Cheers,
Patrick
2012/1/19 Darren
Hello Jan,
My schema wasn't changed from the release 3.5.0. The content can be seen
below:
<schema name="nutch" version="1.1">
<types>
<fieldType name="string" class="solr.StrField"
sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.LongField"
A quick immediate observation:
first, in the analysis and query chains you have some custom tokenizer
factory. Could it, by some chance, affect the leading wildcard setting?
This setting does not require storing the reversed tokens in the index. It
is just run-time leading wildcard expansion
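For contrast with the run-time expansion described above, a hedged sketch of an index-time setup using solr.ReversedWildcardFilterFactory (the type name and attribute values are illustrative):

```xml
<fieldType name="text_rev" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- stores reversed tokens alongside the originals -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```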
Hi,
I tried everything I could, changed version but nada!
Is there a working tutorial on how to make Nutch, Solr and Solritas work?
Remi
I get the error:
HTTP ERROR 400
Problem accessing /solr/browse. Reason:
undefined field cat
--
*Powered by Jetty://*
On Thu, Jan 19, 2012 at 2:44 PM, remi tassing tassingr...@gmail.com wrote:
Hi,
I tried everything I could, changed version but nada!
Is
Dimitry,
Our custom tokenizer is similar to the standard tokenizer. Just for testing I
replaced my custom tokenizer with solr.StandardTokenizerFactory and performed
the indexing again, but I still observe the same behavior.
We have built Solr/Lucene 4.0 from the source code trunk.
You say that
On Wed, Jan 18, 2012 at 10:15 PM, gabriel shen xshco...@gmail.com wrote:
Hi Yonik,
The index I am querying against is 20 GB, containing 200,000 documents; some
of the documents are quite big, and the schema contains more than 50 fields.
The main content fields are defined as both stored and indexed,
/browse is defined in solrconfig.xml. Its details need adjusting for datasets
other than the example data that ships with Solr. Templates may also need
adjusting, but it does handle arbitrary facet fields automatically.
Erik
On Jan 19, 2012, at 7:56, remi tassing tassingr...@gmail.com wrote:
I think I get your point.
Is there any solrconfig.xml sample that works with nutch in a default
configuration?
Just something to start playing with
Remi
On Thu, Jan 19, 2012 at 3:02 PM, Erik Hatcher erik.hatc...@gmail.comwrote:
/browse is defined in solrconfig.xml. Its details need adjusting for
Heya,
Question for you guys: I'm trying to use the Solr analysis.jsp tool
to debug a Solr query.
I can't work out how to input sample data for the Field Value (Index)
box when the data is multiValued.
I was wondering if you could explain how to do this or point me to the
documentation where this
Hey Nick,
could you please create a new thread?
Remi
On Thu, Jan 19, 2012 at 3:35 PM, Nicholas Fellows n...@djdownload.comwrote:
Heya,
Question for you guys: I'm trying to use the Solr analysis.jsp tool
to debug a Solr query.
I can't work out how to input sample data for the Field Value
I have a question: SolrPhpClient is not working with multicore.
I have two cores in my Solr, say core1 and core2. While creating the object of
SolrPhpClient I am using the following syntax.
$solr = new Apache_Solr_Service('192.168.12.226',
Hi,
The schema you pasted in your mail is NOT Solr 3.5's default example schema. Did
you get it from the Nutch project?
And the omitNorms parameter is supposed to go in the field tag in
schema.xml, and the content field in the example schema does not have
omitNorms=true. Try to change
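A minimal sketch of the change being suggested, assuming the content field uses a type named text (the names come from the thread; the attribute placement on the field tag is the point):

```xml
<field name="content" type="text" stored="true" indexed="true" omitNorms="true"/>
```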
I have some data in Solr
where the text string could potentially be
"Vic Bobs greatest hits".
How can I ensure that when a user query is made
for "Vic and Bobs greatest hits", a match is made?
This also needs to work the other way round.
I've not found any useful information about this scenario
Sincere apologies
My Bad!
N ...
On 19 January 2012 13:37, remi tassing tassingr...@gmail.com wrote:
Hey Nick,
could you plz create a new thread?
Remi
On Thu, Jan 19, 2012 at 3:35 PM, Nicholas Fellows n...@djdownload.comwrote:
Heya,
Question for you guys, Im trying to use the solr
Heya,
Question for you guys: I'm trying to use the Solr analysis.jsp
tool to debug a Solr query. I can't work out how to input sample data
for the Field Value (Index) box when the data is multiValued.
I was wondering if you could explain how to do this or point me to
the documentation where this is
Hi all,
I am indexing some XML data which describes publications like brochures,
manuals etc.
The XML contains a section perfect for faceting that looks like this:
<search_keywords>
<search_keyword>M2M4-6655ENW</search_keyword>
<search_keyword>folding cartons</search_keyword>
Nitin (and any other interested parties here):
Unfortunately, re-indexing the content did not resolve the problem and the
symptom remains the same. Any additional advice is appreciated.
Tim
Hello solr-user list,
I appear to have a number of issues right now on my slave server.
1. The most confusing one is that my slave index is currently 67 gigs, but
my master index is only 27 gigs. Has anyone seen this before? Does anyone
have an idea of what could cause this?
2. I haven't been
Hi,
I am currently running multiple Solr instances and often write data to
them. I also query them. Both work fine right now, because I don't have that
many search requests. For querying, I noticed that the firstSearcher and
newSearcher static warming with one facet query really brings a
Hello,
I am trying to index PDF files in Solr. When the PDF file is simple,
everything is fine, but when I use a PDF Portfolio
(http://help.adobe.com/en_US/Acrobat/9.0/Standard/WSA2872EA8-9756-4a8c-9F20-8E93D59D91CE.html
)
using Tika it does not work.
Does someone know how to extract data
I want to retract my objection to commercial messages. I think Ted's position
is more reasonable: on-topic commercial messages that are responsive to (and
maybe even anticipatory of) users' needs will likely be welcomed by many
subscribed here.
Producing a policy statement that perfectly
On Thu, Jan 19, 2012 at 7:48 PM, Nicholas Fellows n...@djdownload.com wrote:
I have some data in solr
where the text string could potentially be
Vic Bobs greatest hits
how can i ensure that when a user query is made
for Vic and Bobs greatest hits , a match is made?
this also needs to
On Thu, Jan 19, 2012 at 9:32 PM, Steven A Rowe sar...@syr.edu wrote:
I want to retract my objection to commercial messages. I think Ted's
position is more reasonable: on-topic commercial messages that are responsive
to (and maybe even anticipatory of) users' needs will likely be welcomed by
On Thu, Jan 19, 2012 at 5:56 PM, solr lakshmi2...@gmail.com wrote:
How do I import data from XML files into Solr?
Is this the right command: java -jar post.jar sample.xml
[...]
This should work, but please take a look at the format of
the sample XML files: Solr expects a specific format for
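For reference, a hedged sketch of the XML update format that post.jar sends (the field names here follow the example schema's id and name fields; your own schema may use different fields):

```xml
<add>
  <doc>
    <field name="id">EXAMPLE-1</field>
    <field name="name">An example document</field>
  </doc>
</add>
```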
I'm also seeing the error when I try to start up the SOLR instance:
SEVERE: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:344)
at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:352)
at
It works. The query:
*
http://localhost:8983/solr/select?indent=true&fq={!bbox}&sfield=loc&pt=34.0415954,-118.298797&d=1000.0&sort=score%20asc&fq=trafficRouteId:887&q={!func}geodist()&fl=*,score&rows=1
*
works perfectly, doing all the filtering needed and returning the distance
as score. Thank you very
I don't think the problem is FST, since it sorts offline in your case.
More importantly, what are you trying to put into the FST?
it appears you are indexing terms from your term dictionary, but your
term dictionary is over 1GB, why is that?
what do your terms look like? 1GB for 2,784,937
Jason,
If I understand you correctly, you're referring to a thread
http://search-lucene.com/m/iMCFOqzcmS1/%22Performance+Monitoring+SaaS+for+Solr%22/v=threaded
in which you objected to a commercial tagline.
At the time that thread was active, I didn't agree with you, though I didn't
engage
In my original post I included one of my terms:
Brooklyn, New York, United States?{ |id|: |2620829|,
|timezone|:|America/New_York|,|type|: |3|, |country|: { |id| : |229| },
|region|: { |id| : |3608| }, |city|: { |id|: |2616971|, |plainname|:
|Brooklyn|, |name|: |Brooklyn, New York, United States|
Thanks for the response. I am using Linux (RedHat).
It sounds like it may possibly be related to that bug.
But the thing is, the timestamped index directory is looking to me like
it's the _current_ one, with the non-timestamped one being an old out of
date one. So that does not seem to be
On Thu, Jan 19, 2012 at 8:29 PM, Maxim Veksler ma...@vekslers.org wrote:
It works. The query:
*
http://localhost:8983/solr/select?indent=true&fq={!bbox}&sfield=loc&pt=34.0415954,-118.298797&d=1000.0&sort=score%20asc&fq=trafficRouteId:887&q={!func}geodist()&fl=*,score&rows=1
*
works perfectly,
Hmm, I don't have a replication.properties file, I don't think. Oh
wait, yes I do there it is! I guess the replication process makes this
file?
Okay
I don't see an index directory in the replication.properties file at all
though. Below is my complete replication.properties.
So I'm
On 1/18/2012 1:53 PM, Tomás Fernández Löbbe wrote:
As far as I know, the replication is supposed to delete the old directory
index. However, the initial question is why is this new index directory
being created. Are you adding/updating documents in the slave? what about
optimizing it? Are you
Okay, I do have an index.properties file too, and THAT one does contain
the name of an index directory.
But it's got the name of the timestamped index directory! Not sure how
that happened, could have been Solr trying to recover from running out
of disk space in the middle of a replication?
I really don't think you should put a huge JSON document as a search term.
Just make "Brooklyn, New York, United States" or whatever you intend
the user to actually search on/type in your search term.
Put the rest in different fields (e.g. stored-only, not even indexed
if you don't need that) and
Peter,
My guess is that if you had said something along the lines of "We have
developed some SSD support software that makes Solr work better. I would
like to open a conversation here (link to external discussion)", that would
have been reasonably well received. One of the things that makes SPAM
You can do all the steps to rename the timestamped dir back to index, but I
don't think you have to. Solr will know on restart to use the
timestamped directory so long as it is in the properties file (sorry, I must
have told you to look at the wrong file... I'm working on old memories
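As a hedged illustration (the timestamp in the directory name is invented), the index.properties file on the slave typically contains little more than a pointer like:

```properties
index=index.20120119010203
```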
Hi Gora,
thanks for your reply. I am new to Solr. I have been checking the Solr
tutorial and noticed the example XML files in the exampledocs folder of the
Solr distribution.
1. Here, does indexing mean importing data into Solr?
2. If I want to start a new example instead of the one in the Solr
distribution, how do I proceed? I am a bit confused about Solr
Anybody have any suggestions or hints?
~Nitin
--
View this message in context:
http://lucene.472066.n3.nabble.com/Sorting-results-within-the-fields-tp3656049p3673371.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi all, I've been working with solr for a few months now and so far I only
had a few issues trying to implement some functionality, but this has gone
above my current solr skills, so any help or guidance is greatly
appreciated.
I have a multivalued field which contains destinations like:
HI,
Could you please help me with a quick question: is there a way to restrict
Lucene/Solr fuzzy search to only analyze words that have more than 5 characters
and to ignore words shorter than that (i.e. words of fewer than 6 characters)?
Thanks
-
Lance
How about having a single-valued field named firstDestination that has the
first destination in the list, and then your query could be something like
'destination:"Buenos Aires" firstDestination:"Buenos Aires"'. Docs that match
both should have a higher score and thus will be listed first.
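A hedged sketch of the suggested query (firstDestination is the hypothetical single-valued field proposed above; the quoting assumes the destinations are multi-word phrases):

```
q=destination:"Buenos Aires" firstDestination:"Buenos Aires"
```

Documents matching both clauses score higher than those matching only the multivalued field, which pushes first-destination matches to the top without any custom sorting.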
2. they always *follow* on-topic discussion
Not in the example given.
3. the line is blurry, e.g. nobody will object to including one's employer in
a tagline.
Product placement is not blurry. The incentive is to then answer
someone else's user email, in order to post yet another spam'd
Hi,
What Solr version? 4.0
How many docs? 700
What do you use as autowarm count? 700
If it's too high, it may take time.
Do you use spellcheck and buildOnCommit? No, we don't use this.
I have a project where the client wants to store time series data
(maybe in SOLR if it can work). We want to store daily prices over
last 20 years (about 6000 values with associate dates), for up to
500,000 entities.
This data currently exists in a SQL database. Access to SQL is too
slow for
I'm not sure there is a good way to do this currently. I think you'd just
have to issue a second query with mm=100 to get additional spelling suggestions,
as maxCollationTries is designed to replicate the original query when trying
collations for hits. It might be a worthy enhancement to
Hello,
you can get the source code from the svn repository too :
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_5/
Ludovic.
-
Jouve
France.
Hi,
I'm trying to set up the latest version of Solr. Currently we're
running 1.3, so we're a bit out of date!
I'm having trouble setting up the tika/extraction handler jars etc, but I
think I'm nearly there. However I've got this stack trace that's
complaining about a required field missing. However as
Take a look at OpenTSDB.
You might want to use that as is, or steal some of the concepts. The major
idea to snitch is using a single row of the database (document
in Lucene or Solr) to hold many data points.
Thus, you could consider having documents with the following fields:
key:
Hi,
I want to know if these are possible using the
FieldCollapsing/ResultGrouping feature.
1) I want to group SKUs based on a certain attribute, so I have an attribute
against a SKU that I will use for grouping. Suppose I limit the result set
to 20 to display on the first page. Will I get a total
Hi,
Try lowering your autowarm to, say, 25, and see if it helps.
How often do you call commit? If you have so much warming that it takes longer
than the time between commits, you're lost... You can check the stats admin
page to see the autowarm time.
--
Jan Høydahl, search solution architect
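A hedged sketch of where that autowarm setting lives in solrconfig.xml (the cache sizes are illustrative, not a recommendation):

```xml
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="25"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="25"/>
```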
Shouldn't it be literal.uid=foo, not ext.literal.uid ??
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 19. jan. 2012, at 23:08, Wayne W wrote:
HI,
I'm trying to setup the latest version of Solr. Currently we're
running 1.3
I'm missing some stuff here. The analysis page has nothing to
do with actual indexing. All it does is take the input and run it
through the defined chains and show what tokens come
out the other end and why. There's really nothing to do whatsoever
with multiValued, that's just orthogonal to what
It's generally recommended that you do the indexing on the master
and searches on the slaves. In that case, firstSearcher and
newSearcher sections are irrelevant on the master and shouldn't
be there.
I don't understand why you would need 5 more machines, are you
sharding?
Best
Erick
On Thu, Jan
Another approach is to use stopwords and an appropriate
analyzer chain. Then both "the" and "and" would be removed
from the indexing stream and the query process, and it would
just work.
Best
Erick
On Thu, Jan 19, 2012 at 8:09 AM, Gora Mohanty g...@mimirtech.com wrote:
On Thu, Jan 19, 2012 at
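The stopword approach above can be sketched as a field type with a stop filter applied in both chains (the type name and stopword file name are the usual conventions, not taken from the thread):

```xml
<fieldType name="text_stopped" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- removes "the", "and", etc. at both index and query time -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```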
You cannot directly import arbitrary XML. You'd have to
read it into a program (look at SolrJ), parse it, add it
to SolrInputDocuments, and send those to Solr.
Alternately, you could read the XML and transform it
to Solr-friendly XML and then index those files.
A third possibility is to use the Data
Hi.
Is it possible to integrate Hibernate Search with Solr? I want to use
Hibernate Search in my entities and use Solr to do the work of indexing
and searching. Hibernate Search would call Solr to search the index and then
find the respective objects in the database. Is that possible? Does some
configuration exist for
Hi, Jan Høydahl. You are right. I am hoping to detect the language of a query,
so that the searching can be done according to the language detected. Since
people often type only a few words, which is too few to detect, it is hard to
do that. Let me describe a little bit about the Solr
Normally this is done by putting a field on each document rather than
separating the documents into separate corpora. Keeping them together
makes the final search faster.
At query time, you can add all of the language keys that you think are
relevant based on your language id applied to the
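A hedged sketch of adding language keys at query time, assuming the language_s field from earlier in the thread (the specific language codes are illustrative):

```
fq=language_s:(en OR fr OR de)
```

This keeps a single index over all corpora while restricting each search to the languages your language-id step considers plausible for that query.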
Hi,
Thanks a lot for your reply, will try to get the code from the repository
you provided.
Thanks,
Ravi
Hi Peter,
Has anyone else tried adding SSDs as a cache to boost the performance of Solr
clusters? Can you share your results?
What do you mean by using SSD *as a cache*?
A few years ago, Toke Eskildsen and his colleagues compared Lucene performance
with traditional HDDs and SSDs and of
Hi Daniel,
- Original Message -
From: Daniel Bruegge daniel.brue...@googlemail.com
To: solr-user@lucene.apache.org; Otis Gospodnetic otis_gospodne...@yahoo.com
Cc:
Sent: Thursday, January 19, 2012 5:49 AM
Subject: Re: How can a distributed Solr setup scale to TB-data, if URL
Cuong,
If, when you are indexing your AC suggestions, you know Java Developer appears
twice in the index, why not give it an appropriate index-time boost? Wouldn't
that work for you?
Otis
Performance Monitoring SaaS for Solr -
http://sematext.com/spm/solr-performance-monitoring/index.html
Hi,
I think guessing the language based purely on the query string is OK *if* you
are OK with it not being very accurate and find ways to work around that, say
by giving users the option to switch to another language easily, allowing them
to easily select a default language for them in the future,
Hi Anderson,
Not sure if you saw http://wiki.apache.org/solr/DataImportHandler
Otis
Performance Monitoring SaaS for Solr -
http://sematext.com/spm/solr-performance-monitoring/index.html
- Original Message -
From: Anderson vasconcelos anderson.v...@gmail.com
To: solr-user
I agree that SSD boosts performance... in some rare, not-real-life scenario:
- super frequent commits
That's it, nothing more, except the fact that a Lucene compile including
tests takes up to two minutes on a MacBook with SSD, or forty to fifty minutes
on Windows with HDD.
Of course, with non-empty
Is indexing XML files the same as importing data from XML, or something
different, in Solr terms?
Hi Jan,
In Solr 1.3 we used that format. I'll give it a go
Thx
On Friday, January 20, 2012, Jan Høydahl jan@cominvent.com wrote:
Shouldn't it be literal.uid=foo, not ext.literal.uid ??
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training -
On Fri, Jan 20, 2012 at 11:00 AM, solr lakshmi2...@gmail.com wrote:
Is indexing XML files the same as importing data from XML, or something
different, in Solr terms?
They end up doing the same thing, which is getting the data into Solr.
There are various ways of doing this, as Erick has pointed out, and
Hi, Ted Dunning,
Thank you for your reply. I understand your point on putting a language_s
field on each document and keeping all the files together, which speeds up
searching. But then there is a problem with using analyzers at indexing time.
I assume files encoded in different languages should be
Actually, for search applications there is a reasonable amount of evidence
that holding the index in RAM is actually more cost effective than SSD's
because the throughput is enough faster to make up for the price
differential. There are several papers out of UMass that describe this
trade-off,
With Solr 4.0 you could use relevance functions to give a query time boost if
you don't have the information at index time.
Alternatively you could do term facet based autocomplete which would mean you
could sort by count rather than any other input.
Andrew
Sent on the run.
On 20/01/2012,
Hi everyone.
I'm having a bit of a problem and was hoping you could help me.
My Solr instance is getting a lot of wrongly spelled data in its input, and I
was wondering: is there a way to make Solr perform a spell check on data
before importing it, or, if that's not possible, perform a spell check on
results