--
Dan A. Dickey | Senior Software Engineer
Savvis
10900 Hampshire Ave. S., Bloomington, MN 55438
Office: 952.852.4803 | Fax: 952.852.4951
E-mail: dan.dic...@savvis.net
--
Lance Norskog
goks...@gmail.com
Has anyone else run into a problem like this? I'm using the Sept 22 nightly build.
- Charlie
The expected result is Gilmore Girls.
If I search on Gilmore, it gives me the result Gilmore Girls in the output as
desired.
However, if I search on the string gilmore* or gilm, it does not work, whereas
we want it to.
Any help is highly appreciated.
Thanks!
attributes case-insensitive while building an index...
I am trying to research this...
Do you have any pointers?
Thanks...
On Mon, Sep 28, 2009 at 2:29 PM, Lance Norskog goks...@gmail.com wrote:
Wildcard terms are not analyzed like other query terms - Gilmore* will
work.
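A common workaround is to lower-case at both index and query time; a sketch of such a field type (the name is made up). Because wildcard terms bypass analysis in this era of Solr, the client should also lower-case the wildcard term itself:

```xml
<fieldType name="text_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same lower-casing applied when indexing and when parsing queries -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```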
On Mon, Sep 28
My question is: why isn't the DateField implementation of ISO 8601 broader, so
that it could include YYYY and YYYY-MM as acceptable date strings? What would
it take to do so?
Nobody ever cared? But yes, you're right, the spurious precision is
annoying. However, there is no fuzzy search for
show that they don't). If not, is there any
plan for adding it in?
Regards,
Steve
--
Lance Norskog
goks...@gmail.com
-search-server-2
gave me more information. The LucidImagination article helps too.
Now that the wiki is up again it is more obvious that I need to add:
<str name="fmap.content">fulltext</str>
<str name="defaultField">text</str>
to my solrconfig.xml.
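For context, those two parameters normally live in the ExtractingRequestHandler defaults in solrconfig.xml; a sketch, with the handler path assumed:

```xml
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Copy Tika's "content" output into the fulltext field -->
    <str name="fmap.content">fulltext</str>
    <!-- Field used for extracted content that has no explicit mapping -->
    <str name="defaultField">text</str>
  </lst>
</requestHandler>
```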
Tricia
Are there cases where optimization can take more than 2x the index size in
disk space? I've heard of cases but have not observed them in my system.
I seem to recall a case where it can be 3x, but I don't know that it
has been observed much.
--
- Mark
http://www.lucidimagination.com
A partial optimize merges down to a given number of segments, and I have no
idea how this will translate to disk space. To minimize disk space, you could
run it repetitively with the number of segments decreasing to one.
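The stepped approach can be driven with the optimize message of the XML update format; a sketch (the segment counts are just examples):

```xml
<!-- POST each to /update in turn, waiting for the previous one to finish -->
<optimize maxSegments="16"/>
<optimize maxSegments="4"/>
<optimize maxSegments="1"/>
```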
On Thu, Oct 1, 2009 at 11:49 AM, Lance Norskog goks...@gmail.com wrote:
I've heard there is a new partial optimize feature
http://googleenterprise.blogspot.com/2009/08/compare-enterprise-search-relevance.html
This is really cool, and a version for Solr would help in doing
relevance experiments. We don't need the select A or B feature, just
seeing search result sets side-by-side would be great.
by popularity.
Does anyone know if there is a way to do that with a single query, or will I
have to send another query with the desired sort criterion after I inspect
the number of hits on my client?
Thx
field is not needed and is likely to be
really large so the queries will be much faster if it isn't returned.
Thanks,
Paul
to get that info.
Lance Norskog wrote:
No, there is only a list of fields, star, and score. You can choose
to index a field and not store it, and then have your application fetch it
from the original data store. This is a common system design pattern
to avoid storing giant text blobs in the index.
Sent from the Solr - User mailing list archive at Nabble.com.
I've added a unit test for the problem down below. It feeds document
field data into the XPathEntityProcessor via the
FieldReaderDataSource, and the XPath EP does not emit unpacked fields.
Running this under the debugger, I can see the supplied StringReader,
with the XML string, being piped into
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
java.lang.Thread.run()
A side note that might help: if I change the dataField from 'db.blob'
to 'blob', this DIH stack emits no documents.
On 10/5/09, Lance Norskog goks...@gmail.com wrote:
I've added a unit test for the problem down below. It feeds document
field data into the XPathEntityProcessor via
the same
problem and it never took up more than 2x.
If your index disks are really bursting at the seams, you could try
creating an empty index on a separate disk and merging your large
index into that index. The resulting index will be mostly optimized.
Is it in solrconfig.xml that this configuration has to be done?
Thanks
--
of a reporting tool which can hook into Solr for creating such
things.
--
Regards,
Shalin Shekhar Mangar.
it would be very helpful.
Thanks,
Eric
Wow! That's great. And it's a lot of work, especially getting it all
keyboard-complete. Thank you.
On 03/14/2013 01:29 AM, Chantal Ackermann wrote:
Hi all,
this is not a question. I just wanted to announce that I've written a blog post
on how to set up Maven for packaging and automatic
Seconded. Single-stepping really is the best way to follow the logic
chains and see how the data mutates.
On 04/05/2013 06:36 AM, Erick Erickson wrote:
Then there's my lazy method. Fire up the IDE and find a test case that
looks close to something you want to understand further. Step through
Outer distance AND NOT inner distance?
On 04/12/2013 09:02 AM, kfdroid wrote:
We currently do a radius search from a given Lat/Long point and it works
great. I have a new requirement to do a search on a larger radius from the
same point, but not include the smaller radius. Kind of a donut
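One way to express the donut is two filter queries: an outer geofilt and a negated inner one. A sketch, assuming a spatial field named location and made-up coordinates and radii:

```
q=*:*
&fq={!geofilt sfield=location pt=45.0,-93.0 d=50}
&fq=-{!geofilt sfield=location pt=45.0,-93.0 d=10}
```

Solr supports pure-negative filter queries, so the second fq excludes everything inside the inner radius.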
Run checksums on all files in both master and slave, and verify that
they are the same.
TCP/IP has a checksum algorithm that was state-of-the-art in 1969.
On 04/18/2013 02:10 AM, Victor Ruiz wrote:
Also, I forgot to say: the same error started to happen again; the index
is corrupted again.
Great! Thank you very much Shawn.
On 05/04/2013 10:55 AM, Shawn Heisey wrote:
On 5/4/2013 11:45 AM, Shawn Heisey wrote:
Advance warning: this is a long reply.
I have condensed some relevant performance problem information into the
following wiki page:
If this is for the US, remove the age range feature before you get sued.
On 05/09/2013 08:41 PM, Kamal Palei wrote:
Dear SOLR experts
I might be asking a very silly question. As I am new to SOLR, kindly guide me.
I have a job site, using SOLR to search resumes. When an HR user enters some
This is great; data like this is rare. Can you tell us any hardware or
throughput numbers?
On 05/17/2013 12:29 PM, Rishi Easwaran wrote:
Hi All,
It's Friday 3:00pm, warm and sunny outside, and it was a good week. Figured I'd
share some good news.
I work for AOL mail team and we use SOLR for our
If the indexed data includes positions, it should be possible to
implement ^ and $ as the first and last positions.
On 05/22/2013 04:08 AM, Oussama Jilal wrote:
There is no ^ or $ in the solr regex since the regular expression will
match tokens (not the complete indexed text). So the results
I will look at these problems. Thanks for trying it out!
Lance Norskog
On 05/28/2013 10:08 PM, Patrick Mi wrote:
Hi there,
Checked out branch_4x and applied the latest patch
LUCENE-2899-current.patch however I ran into 2 problems
Followed the wiki page instruction and set up a field
Let's assume that the Solr record includes the database record's
timestamp field. You can make a more complex DIH stack that does a Solr
query with the SolrEntityProcessor. You can do a query that gets the
most recent timestamp in the index, and then use that in the DB update
command.
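A sketch of the nested-entity shape for this (the core URL, entity, column, and table names are all made up, and the exact way to get the single newest timestamp out of SolrEntityProcessor should be checked against the DIH documentation):

```xml
<document>
  <!-- Inner query against the existing index returns the newest timestamp -->
  <entity name="newest" processor="SolrEntityProcessor"
          url="http://localhost:8983/solr/core1"
          query="*:*" rows="1" fl="timestamp">
    <!-- DB query pulls only rows changed since that timestamp -->
    <entity name="db" processor="SqlEntityProcessor"
            query="SELECT * FROM docs WHERE updated &gt; '${newest.timestamp}'"/>
  </entity>
</document>
```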
Distributed search does the actual search twice: once to get the scores
and again to fetch the documents with the top N scores. This algorithm
does not play well with deep searches.
On 06/02/2013 07:32 PM, Niran Fajemisin wrote:
Thanks Daniel.
That's exactly what I thought as well. I did try
Patrick-
I found the problem with multiple documents. The problem was that the
API for the life cycle of a Tokenizer changed, and I only noticed part
of the change. You can now upload multiple documents in one post, and
the OpenNLPTokenizer will process each document.
You're right, the
Regards,
Patrick
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems
patch LUCENE-2899-x.patch
uploaded on 6th June but still had the same problem.
Regards,
Patrick
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems
In 4.x and trunk there is a close() method on Tokenizers and Filters. In
releases up to 4.3, there is instead a reset(stream) method, which is how a
Tokenizer/Filter is reset for a following document in the same upload.
In both cases I had to track the first time the tokens are consumed,
No, they just learned a few features and then stopped because it was
good enough, and they had a thousand other things to code.
As to REST: yes, it is worth having a coherent API. Solr is behind the
curve here. Look at the HATEOAS paradigm. It's ornate (and a really goofy
name) but it provides
One small thing: German u-umlaut is often flattened as 'ue' instead of
'u'. And the same with o-umlaut, it can be 'oe' or 'o'. I don't know if
Lucene has a good solution for this problem.
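One partial fix is a MappingCharFilter applied at both index and query time; a sketch of the mapping file, noting that a single mapping can only commit to one flattening ('ue' or 'u'), not both:

```
"ü" => "ue"
"ö" => "oe"
"ä" => "ae"
"ß" => "ss"
```

Matching documents that used the other flattening ('u', 'o') would still need something like index-time synonyms.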
On 06/16/2013 06:44 AM, adityab wrote:
Thanks for the explanation Steve. I now see it clearly. In my case
Accumulo is a BigTable/Cassandra style distributed database. It is now
an Apache Incubator project. In the README we find this gem:
Synchronize your accumulo conf directory across the cluster. As a
precaution against mis-configured systems, servers using different
configuration files will not
I do not know what causes the error, but this setup will not work. You need
one or three ZooKeepers. SolrCloud demands that a majority of the ZK
servers agree; with only two ZKs, losing either one breaks the majority.
On 06/29/2013 05:47 AM, Sagar Chaturvedi wrote:
Hi,
I setup 2 solr instances on 2 different
Solr HTTP caching also supports ETags. These are unique keys for the
output of a query. If you send a query twice, and the index has not
changed, the return will be the same. The e-tag is generated from the
query string and the index generation number.
If Varnish supports e-tags, you can keep
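ETag generation is controlled in solrconfig.xml; a minimal sketch (the values shown are examples only):

```xml
<requestDispatcher>
  <!-- lastModifiedFrom ties Last-Modified to the searcher open time;
       the ETag is derived from the etagSeed and the index generation -->
  <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
    <cacheControl>max-age=60, public</cacheControl>
  </httpCaching>
</requestDispatcher>
```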
This usually means the end server timed out.
On 06/30/2013 06:31 AM, Shahar Davidson wrote:
Hi all,
We're getting the below exception sporadically when using distributed search.
(using Solr 4.2.1)
Note that 'core_3' is one of the cores mentioned in the 'shards' parameter.
Any ideas anyone?
The MappingCharFilter allows you to map both characters to one
character. If you do this during indexing and querying, searching with
one should find the other. This is sort of like synonyms, but on a
character-by-character basis.
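A sketch of wiring such a filter into an analyzer chain (the type name and mapping file name are made up); with a single shared analyzer as here, the charFilter runs at both index and query time:

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Character mapping runs before tokenization -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chars.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```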
Lance
On 06/18/2013 11:08 PM, Yash Sharma wrote:
Hi,
we have
Also, watch total index file size: at 200-300 GB, managing an index becomes a pain.
Lance
On 07/08/2013 07:28 AM, Jack Krupansky wrote:
Other than the per-node/per-collection limit of 2 billion documents
per Lucene index, most of the limits of Solr are performance-based
limits - Solr can handle it,
Norms stay in the index even if you delete all of the data. If you just
changed the schema, emptied the index, and tested again, you've still
got norms in there.
You can examine the index with Luke to verify this.
On 07/09/2013 08:57 PM, William Bell wrote:
I have a field that has
I don't know about JVM crashes, but it is known that the Java 6 JVM had
various problems supporting Solr, including the 20-30 update series. A lot of
people use the final JVM release (I think 6u30).
On 07/16/2013 12:25 PM, neoman wrote:
Hello Everyone,
We are using solrcloud with Tomcat in our
Are you feeding Graphite from Solr? If so, how?
On 07/19/2013 01:02 AM, Neil Prosser wrote:
That was overnight so I was unable to track exactly what happened (I'm
going off our Graphite graphs here).
Solr/Lucene does not automatically backfill a new field when you add one, the
way DBMS systems do. Instead, all of a document's fields are added at the
same time it is indexed. To get the new field, you have to reload all of your data.
This is also true for deleting fields. If you remove a field, that data
does not go away until you
Cool!
On 08/05/2013 03:34 AM, Charlie Hull wrote:
On 03/08/2013 00:50, Mark wrote:
We have a set number of known terms we want to match against.
In Index:
term one
term two
term three
I know how to match all terms of a user query against the index but
we would like to know how/if we can
Block-quoting and plagiarism are two different questions.
Block-quoting is simple: break the text apart into sentences or even
paragraphs and make them separate documents. Make facets of the
post-analysis text. Now just pull counts of facets and block quotes will
be clear.
Mahout has a
You need to:
1) crawl the SVN database
2) index the files
3) make a UI that fetches the original file when you click on a search
result.
Solr only has #2. If you run a subversion web browser app, you can
download the developer-only version of the LucidWorks product and crawl
the SVN web
Solr does not by default generate unique IDs. It uses what you give as
your unique field, usually called 'id'.
What software do you use to index data from your RSS feeds? Maybe that
is creating a new 'id' field?
There is no partial update; Solr (Lucene) always rewrites the complete document.
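For reference, the unique field is declared in schema.xml; a minimal sketch with the conventional name:

```xml
<!-- schema.xml: the application supplies this value; Solr does not generate it -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```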
Yes, Solr/Lucene works fine with other indexes this large. There are
many indexes with hundreds of gigabytes and hundreds of millions of
documents. My experience years ago was that at this scale, searching
worked great, sorting and faceting less so, and the real problem was IT: a
200G blob of data
On 10/13/2013 10:02 AM, Shawn Heisey wrote:
On 10/13/2013 10:16 AM, Josh Lincoln wrote:
I have a large solr response in xml format and would like to import it into
a new solr collection. I'm able to use DIH with solrEntityProcessor, but
only if I first truncate the file to a small subset of the
the solr result format while using the xpathentityprocessor
(i.e. a useSolrResultSchema option)
Any other ideas?
On Mon, Oct 14, 2013 at 6:24 PM, Lance Norskog goks...@gmail.com wrote:
, it is working properly, results are
stable and correct.
Please help me to make solr results consistent.
Thanks in Advance.
Yes, you should use a recent Java 7. Java 6 is end-of-life and no longer
supported by Oracle. Also, read up on the various garbage collectors. It
is a complex topic and there are many guides online.
In particular there is a problem in some Java 6 releases that causes a
massive memory leak in
How can I use payloads for boosting?
What are the changes required in schema.xml?
Please provide me some pointers to move ahead.
Thanks in advance.
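For reference, payloads are attached at index time with a delimited-payload filter; a sketch (the type name is made up, and actually using payloads in scoring also requires a payload-aware query or similarity, which is not shown):

```xml
<fieldType name="text_payload" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Index terms like "solr|2.0": the part after | is stored as a float payload -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float" delimiter="|"/>
  </analyzer>
</fieldType>
```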