Apologies if things were a little vague.
Given the example snippet to index (numbered to show searches needed to
match)...
1: i am a sales-manager in here
2: using asp.net and .net daily
3: working in design.
4: using something called sage 200. and i'm fluent
5: german sausages.
6: busy AE dept
This is true with Lucene as it stands. It would be much faster if there
were a specialized in-memory index such as is typically used with high
performance search engines.
On Tue, Feb 7, 2012 at 9:50 PM, Lance Norskog goks...@gmail.com wrote:
Experience has shown that it is much faster to run
But Solr does not have an in-memory index, am I right?
At 2012-02-08 16:17:49,Ted Dunning ted.dunn...@gmail.com wrote:
This is true with Lucene as it stands. It would be much faster if there
were a specialized in-memory index such as is typically used with high
performance search
A start may be to use a RAM disk for that. Mount it as a normal disk and
have the index files stored there. Have a read here:
http://en.wikipedia.org/wiki/RAM_disk
Cheers,
Patrick
2012/2/8 Ted Dunning ted.dunn...@gmail.com
This is true with Lucene as it stands. It would be much faster if
Hi,
This talk has some interesting details on setting up a Lucene index in RAM:
http://www.lucidimagination.com/devzone/events/conferences/revolution/2011/lucene-yelp
Would be great to hear your findings!
Dmitry
2012/2/8 James ljatreey...@163.com
Is there any practice to load index into
Hi,
I am using solr 3.5 version. I moved the data import handler files from solr
1.4(which I used previously) to the new solr. When I tried to start the solr
3.5, I got the following message in my log
WARNING: XML parse warning in solrres:/dataimport.xml, line 2, column 95:
Include operation
On 08/02/2012 09:17, Ted Dunning wrote:
This is true with Lucene as it stands. It would be much faster if there
were a specialized in-memory index such as is typically used with high
performance search engines.
This could be implemented in Lucene trunk as a Codec. The challenge
though is to
Hi Erick,
if we're not doing geo searches, we filter by location tags that we
attach to places. This is simply a hierarchical regional id, which is
simple to filter for, but much less flexible. We use that on Web a
lot, but not on mobile, where we want to perform searches in
arbitrary radii
Hi,
I found a solution to it.
Adding the Weblogic Server Argument -Dfile.encoding=UTF-8 did not affect
the encoding.
Only a change to the .war file's weblogic.xml and redeployment of the
modified .war solved it.
I added the following to the weblogic.xml:
<charset-params>
<input-charset>
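For reference, the complete element is sketched below (the resource-path value is an assumption; adjust it to your deployment):

```xml
<charset-params>
  <input-charset>
    <!-- apply UTF-8 request decoding to every resource path -->
    <resource-path>/*</resource-path>
    <java-charset-name>UTF-8</java-charset-name>
  </input-charset>
</charset-params>
```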
Hello folks,
i want to reindex about 10 million docs from one Solr (1.4.1) to another
Solr (1.4.1).
I changed my schema.xml (field types sint to slong), so standard
replication would fail.
what is the fastest and smartest way to manage this?
this here sounds great (EntityProcessor):
i want to reindex about 10 million docs from one Solr (1.4.1) to
another
Solr (1.4.1).
I changed my schema.xml (field types sint to slong), so
standard
replication would fail.
what is the fastest and smartest way to manage this?
this here sounds great (EntityProcessor):
I concur with this. As long as index segment files are cached in the OS file
cache, performance is about as good as it gets. Pulling segment files into RAM
inside the JVM process may actually be slower, given Lucene's existing data
structures and algorithms for reading segment file data. If you have
Hi Ahmet,
thanks for quick response:)
I've already thought the same...
And it will be a pain to export and import this huge doc-set as CSV.
Is there another solution?
Regards
Vadim
2012/2/8 Ahmet Arslan iori...@yahoo.com:
i want to reindex about 10Mio. Docs. from one Solr(1.4.1) to
another
Hi,
I am following
http://www.lucidimagination.com/devzone/technical-articles/setting-apache-solr-eclipse
in order to be able to debug Solr in eclipse. I got it working fine.
Now, I usually use ./etc/jetty.xml to set logging configuration. When
starting jetty in eclipse I dont see any log files
Another problem appeared ;)
how can i export my docs in csv-format?
In Solr 3.1+ i can use the query-param wt=csv, but in Solr 1.4.1?
Best Regards
Vadim
2012/2/8 Vadim Kisselmann v.kisselm...@googlemail.com:
Hi Ahmet,
thanks for quick response:)
I've already thought the same...
And it will
Hi all,
I am trying to write a custom document clustering component that should
take all the docs in commit and cluster them; Solr Version:3.5.0
Main Class:
public class KMeansClusteringEngine extends DocumentClusteringEngine
implements SolrEventListener
I added newSearcher event listener, that
Hmmm, seems OK. Did you re-index after any
schema changes?
You'll learn to love admin/analysis for questions like this,
that page should show you what the actual tokenization
results are, make sure to click the verbose check boxes.
Best
Erick
On Tue, Feb 7, 2012 at 10:52 PM, geeky2
Yes, WDDF creates multiple tokens. But that has
nothing to do with the multiValued suggestion.
You can get exactly what you want by:
1) setting multiValued=true in your schema file and re-indexing. Say
positionIncrementGap is set to 100.
2) When you index, add the field for each sentence, so your doc
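A sketch of what step 1) could look like in schema.xml (the type and field names here are made up, and the analyzer is a minimal example):

```xml
<!-- positionIncrementGap=100 stops phrase queries from matching
     across two different values of the multiValued field. -->
<fieldType name="text_sentence" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="sentence" type="text_sentence" indexed="true" stored="true"
       multiValued="true"/>
```

At index time you would then add one `sentence` value per sentence of the document.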
What does your schema for those fields look like?
On Wed, Feb 8, 2012 at 2:41 PM, Radu Toev radut...@gmail.com wrote:
Hi,
I am really new to Solr so I apologize if the question is a little off.
I was playing with DataImportHandler and tried to index a table in a MS SQL
database.
I configured
The schema.xml is the default file that comes with Solr 3.5, didn't change
anything there.
On Wed, Feb 8, 2012 at 2:45 PM, Dmitry Kan dmitry@gmail.com wrote:
What does your schema for those fields look like?
On Wed, Feb 8, 2012 at 2:41 PM, Radu Toev radut...@gmail.com wrote:
Hi,
I am
well, you should add these fields in schema.xml, otherwise solr won't know
them.
On Wed, Feb 8, 2012 at 2:48 PM, Radu Toev radut...@gmail.com wrote:
The schema.xml is the default file that comes with Solr 3.5, didn't change
anything there.
On Wed, Feb 8, 2012 at 2:45 PM, Dmitry Kan
I just realized that as I pushed the send button :P
Thanks, I'll have a look.
On Wed, Feb 8, 2012 at 2:58 PM, Dmitry Kan dmitry@gmail.com wrote:
well, you should add these fields in schema.xml, otherwise solr won't know
them.
On Wed, Feb 8, 2012 at 2:48 PM, Radu Toev radut...@gmail.com
Hi,
run-jetty-run issue #9:
...
In the VM Arguments of your launch configuration set
-Drjrxml=./jetty.xml
If jetty.xml is in the root of your project it will be used (you can also use a
fully qualified path name).
The UI port, context and WebApp dir are ignored, since you can define them in
hello,
thank you for the reply.
yes - i did re-index after the changes to the schema.
also - thank you for the direction on using the analyzer - but i am not sure
if i am interpreting the feedback from the analyzer correctly.
here is what i did:
in the Field value (Index) box - i placed this:
Thanks Erick,
I didn't get confused with multiple tokens vs multiValued :)
Before I go ahead and re-index 4m docs, and believe me I'm using the
analysis page like a mad-man!
What do I need to configure to have the following both indexed with and
without the dots...
.net
sales manager.
£12.50
Add this as well:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.155.5030
On Wed, Feb 8, 2012 at 1:56 AM, Andrzej Bialecki a...@getopt.org wrote:
On 08/02/2012 09:17, Ted Dunning wrote:
This is true with Lucene as it stands. It would be much faster if there
were a specialized
Hi,
According to the Solr documentation the dismax score is calculated with the
formula:
(score of matching clause with the highest score) + ((tie parameter) ×
(sum of scores of any other matching clauses)).
Is there a way to identify the field on which the matching clause score is
the highest?
For
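The formula above can be sanity-checked with a few lines of arithmetic (the scores and tie value below are invented):

```python
def dismax_score(clause_scores, tie):
    """DisMax scoring: the highest-scoring clause wins outright;
    the remaining clauses contribute only a tie-scaled fraction."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)

# Say three fields matched a query with scores 2.0, 1.0 and 0.5,
# and the tie parameter is 0.5:
print(dismax_score([2.0, 1.0, 0.5], 0.5))  # 2.75  (= 2.0 + 0.5 * 1.5)
```

With tie=0 you get pure "max" behaviour; with tie=1 the score degenerates to a plain sum of all matching clauses.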
Hi all,
I want to sort a SolrDocumentList after it has been queried and obtained
from the QueryResponse.getResults(). The reason is i have a SolrDocumentList
obtained after querying using QueryResponse.getResults() and i have added
few docs to it. Now i want to sort this SolrDocumentList based on
Sorry for the inaccurate title.
I have 3 fields (dc_title, dc_title_unicode, dc_unicode_full)
containing the same value:
<title xmlns="http://www.tei-c.org/ns/1.0">cal·lígraf</title>
and these fields are configured accordingly:
<fieldType name="xml" class="solr.TextField" positionIncrementGap="100">
If you can not read this mail easily check this ticket:
https://issues.apache.org/jira/browse/SOLR-3106 This is a copy.
Regards!
Dalius Sidlauskas
On 08/02/12 15:44, Dalius Sidlauskas wrote:
Sorry for inaccurate title.
I have a 3 fields (dc_title, dc_title_unicode, dc_unicode_full)
Hi Dalius,
If not already tried, Check http://localhost:8983/solr/admin/analysis.jsp
(enable verbose output for both Field Value index and query for details)
for your queries and see what all filters/tokenizers are being applied.
Hope it helps!
-param
On 2/8/12 10:48 AM, Dalius Sidlauskas
I have already tried this and it did not help because it does not
highlight matches if a wild-card is used. The field configuration turns
the data into:
dc_title: calligraf
dc_title_unicode: cal·lígraf
dc_title_unicode_full: cal·lígraf
Debug parsedquery says:
[Search for *cal·ligraf*]
Hmmm, that all looks correct, from the output you pasted I'd expect
you to be finding the doc.
So next thing: add debugQuery=on to your query and look at
the debug information after the list of documents, particularly
the parsedQuery bit. Are you searching against the fields you
think you are? If
You'll probably have to index them in separate fields to
get what you want. The question is always whether it's
worth it, is the use-case really well served by having a
variant that keeps dots and things? But that's always more
a question for your product manager
Best
Erick
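For what it's worth, the separate-fields idea could be sketched in schema.xml like this (all names are hypothetical; the verbatim analyzer keeps tokens such as ".net" and "£12.50" intact):

```xml
<!-- Verbatim variant: whitespace tokens only, so ".net", "sales" and
     "£12.50" survive with their punctuation (lowercased for matching). -->
<fieldType name="text_verbatim" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="title_verbatim" type="text_verbatim" indexed="true" stored="true"/>
<!-- Keep the existing WordDelimiter-analyzed field and copy into both. -->
<copyField source="title" dest="title_verbatim"/>
```

Queries can then hit either field, or both via dismax `qf`.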
On Wed, Feb 8,
Attempting to reproduce legacy behaviour (I know!) of simple SQL
substring searching, with and without phrases.
I feel simply NGram'ing 4m CVs may be pushing it?
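If NGrams are the route, here is a sketch of a substring-matching field type (the gram sizes are guesses and directly drive index size, so testing on a sample of the 4m CVs first would be prudent):

```xml
<fieldType name="text_substring" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index every 3-10 character substring of each token -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="10"/>
  </analyzer>
  <analyzer type="query">
    <!-- query side stays un-grammed so a search term matches as one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```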
---
IntelCompute
Web Design Local Online Marketing
http://www.intelcompute.com
On Wed, 8 Feb 2012 11:27:24 -0500, Erick
hello,
thanks for sticking with me on this ...very frustrating
ok - i did perform the query with the debug parms using two scenarios:
1) a successful search (where i insert the period / dot) in to the itemNo
field and the search returns a document.
itemNo:BP2.1UAA
I have already tried this and it did
not help because it does not
highlight matches if a wild-card is used. The field
configuration turns
the data into:
This writeup should explain your scenario :
http://wiki.apache.org/solr/MultitermQueryAnalysis
On Feb 8, 2012, at 10:31 AM, Adeel Qureshi wrote:
I have been using solr for a while and have recently started getting into
solrcloud .. i am a bit confused with some of the concepts ..
1. what exactly is the relationship between a collection and the core ..
can a core has multiple
I want to sort a SolrDocumentList after it has been queried
and obtained
from the QueryResponse.getResults(). The reason is i have a
SolrDocumentList
obtained after querying using QueryResponse.getResults() and
i have added
few docs to it. Now i want to sort this SolrDocumentList
based on
Hi Adeel,
I just started looking into SolrCloud and had some of the same questions.
I wrote a blog with the understanding I gained so far, maybe it will help
you:
http://outerthought.org/blog/491-ot.html
Regards,
Bruno.
On Wed, Feb 8, 2012 at 4:31 PM, Adeel Qureshi
Vadim,
Would using xslt output help?
Otis
Performance Monitoring SaaS for Solr -
http://sematext.com/spm/solr-performance-monitoring/index.html
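Since Solr 1.4 ships the XSLT response writer, one option is a small stylesheet dropped into conf/xslt/ and invoked with wt=xslt&tr=csv.xsl (a sketch; the field names below are made up):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Emit one CSV row per <doc> in the standard XML response. -->
  <xsl:template match="/">
    <xsl:for-each select="response/result/doc">
      <xsl:value-of select="str[@name='id']"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="str[@name='title']"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note this only escapes nothing: fields containing commas or quotes would need additional handling.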
From: Vadim Kisselmann v.kisselm...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday,
Anderson
I would say that this is highly unlikely, but you would need to pay attention
to how they are generated, this would be a good place to start:
http://en.wikipedia.org/wiki/Universally_unique_identifier
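As a quick illustration of why collisions are so unlikely: a version-4 UUID carries 122 random bits, so by the birthday bound even large batches are effectively collision-free (a toy sketch):

```python
import uuid

# 100,000 v4 UUIDs: the birthday-bound collision probability is
# roughly (1e5)^2 / 2^123, i.e. around 1e-27.
ids = {uuid.uuid4() for _ in range(100_000)}
print(len(ids))  # 100000 distinct ids
```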
Cheers
François
On Feb 8, 2012, at 1:31 PM, Anderson vasconcelos wrote:
All,
It appears my attempt at using solr for the application I support is
about to fail. I'm personally and professionally disappointed, but I
wanted to say Many Thanks to those of you who have provided so much
help to so many on this list. In the right hands and in the right
environments, it has
Hi,
I'm running Solr + Tomcat with the following configuration:
I have 16 slaves, which are queried by an aggregator, while the aggregator
is queried by the users.
My slaveUrls property in solr.xml (on the aggregator) looks like: <property
name="slaveUrls"
Please forgive me if this is a dumb question. I've never dealt with SOLR
before, and I'm being asked to determine from the logs when a SOLR index is
kicked off (it is a Windows server). The TOMCAT service runs continually, so
no love there. In parsing the logs, I think
For those that are interested and have not noticed, the latest work on
SolrCloud and distributed indexing is now in trunk.
SolrCloud is our name for a new set of distributed capabilities that improve
upon the old style distributed search and index based replication.
It provides for high
Thanks
2012/2/8 François Schiettecatte fschietteca...@gmail.com
Anderson
I would say that this is highly unlikely, but you would need to pay
attention to how they are generated, this would be a good place to start:
http://en.wikipedia.org/wiki/Universally_unique_identifier
Cheers
Good job on this work. A monumental effort.
On Wed, 8 Feb 2012 16:41:13 -0500, Mark Miller markrmil...@gmail.com
wrote:
For those that are interested and have not noticed, the latest work on
SolrCloud and distributed indexing is now in trunk.
SolrCloud is our name for a new set of
Hi Matthias-
I'm trying to understand how you have your data indexed so we can give
reasonable direction.
What field type are you using for your locations? Is it using the
solr spatial field types? What do you see when you look at the debug
information from debugQuery=true?
From my
okay so after reading Bruno's blog post .. let's add slice to the mix as
well .. so we have got collections, cores, shards, partitions and slices :)
..
The whole point of cores is to be able to have different schemas on the
same Solr server instance. So how does that change with collections ..
hi,
I have a question around document linking in Solr and want to know if it's
possible. Let's say I have a set of blogs and their authors that I want to
index separately. Is it possible to link a document describing a blog to
another document describing an author? If yes, can I search for blogs
On Feb 8, 2012, at 5:26 PM, Adeel Qureshi wrote:
okay so after reading Bruno's blog post .. lets add slice to the mix as
well .. so we have got collections, cores, shards, partitions and slices :)
..
Yeah - heh - this has bugged me, but we have not really all come down on
agreement of
I compared locallucene to spatial search and saw a performance
degradation, even using geohash queries, though perhaps I indexed things
wrong? Locallucene across 6 machines handles 150 queries per second fine,
but using geofilt and geohash I got lots of timeouts even when I was doing
only 50
yes, I am using https://github.com/alexwinston/RunJettyRun which apparently
is a fork of the original project, created out of the need to use a
jetty.xml.
So I am already setting an additional jetty.xml; this can be done in the Run
configuration, no need to use the -D param. But as I mentioned
Mark,
is the recommendation now to have each solr instance be a separate core in
solr cloud? I had thought that the core name was by default the collection
name? Or are you saying that although they have the same name they are
separate because they are in different JVMs?
On Wednesday, February 8,
On Feb 8, 2012, at 9:36 PM, Jamie Johnson wrote:
Mark,
is the recommendation now to have each solr instance be a separate core in
solr cloud? I had thought that the core name was by default the collection
name? Or are you saying that although they have the same name they are
separate
On Feb 8, 2012, at 9:52 PM, Jamie Johnson wrote:
In solr cloud what is a better approach / use of resources having multiple
cores on a single instance or multiple instances with a single core? What
are the benefits and drawbacks of each?
It depends I suppose. If you are talking about on a
Thanks Mark, in regards to failover I completely agree. I am wondering more
about performance and memory usage if the indexes are large, and whether
separate Java instances under heavy load would be more or less
performant. Currently we deploy a single core per instance but deploy
multiple
Thanks for the explanation. It makes sense but I am hoping that you can
clarify things a bit more ..
so now it sounds like in solrcloud the concept of cores has changed a bit
.. as you explained that for me to have 2 cores with different schemas I
will need 2 different collections .. and one
No, the sorting is based on multiple fields. Basically I want to sort them
as a GROUP BY statement would in SQL, based on a few fields, with many
loops to go through. The problem is that I have, say, 1,000,000 Solr docs
after injecting my few Solr docs, and then I want to group by these Solr
docs
Hi all,
I have tried group-by in Solr with multiple shards but it does not work.
Basically I want to do a simple GROUP BY statement, like in SQL, in Solr
with multiple shards. Please suggest how I can do this, as it is not
currently supported OOB by Solr.
Thanks & regards,
Kashif Khan
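Until distributed grouping is supported, one workaround is to pull the matching docs and group them client-side; a rough sketch, with plain dicts standing in for SolrDocuments (field names invented):

```python
from itertools import groupby

# Stand-ins for docs pulled from QueryResponse.getResults().
docs = [
    {"id": 1, "author": "ann", "views": 10},
    {"id": 2, "author": "bob", "views": 5},
    {"id": 3, "author": "ann", "views": 7},
]

def group_docs(docs, field):
    """GROUP BY `field`: groupby() requires its input sorted on the key."""
    key = lambda d: d[field]
    return {k: list(g) for k, g in groupby(sorted(docs, key=key), key=key)}

groups = group_docs(docs, "author")
print(sorted(groups))      # ['ann', 'bob']
print(len(groups["ann"]))  # 2
```

This obviously does not scale to 1,000,000 docs per query; it only works when the result set you need to group is small enough to fetch.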
Hello,
Have you tried specifying debugQuery=on and looking into the explain section?
It's not really performant, but I propose starting from it.
Regards
On Wed, Feb 8, 2012 at 7:32 PM, crisfromnova crisfromn...@gmail.com wrote:
Hi,
According solr documentation the dismax score is