ot;if", which should be commas. Also, you're
> using some odd syntax in the "exists" value data source which expects a field
> name or a function.
>
> -- Jack Krupansky
>
> -Original Message- From: Walter Underwood
> Sent: Wednesday, January 16, 201
None of the variants worked. I started with that syntax for both exists() and
if(). All gave the same stack trace. --wunder
On Jan 16, 2013, at 3:32 PM, Yonik Seeley wrote:
> On Wed, Jan 16, 2013 at 6:11 PM, Walter Underwood
> wrote:
>> I got the syntax fr
Ah, that would be it. Does 4.0 also give a stack trace if you call a function
that doesn't exist?
I can achieve most of what I want with bq, though that has IDF, which I'd
rather avoid here.
wunder
On Jan 16, 2013, at 3:38 PM, Yonik Seeley wrote:
> On Wed, Jan 16, 2013 at 6:
Or a different design.
You can mark collections for deletion, then delete them in an organized, safe
manner later.
wunder
On Jan 17, 2013, at 12:40 PM, snake wrote:
> Ok, so is there any other way to stop this problem I am having, where any site
> can break Solr by deleting their collection?
> Seems
Have you tried boost query? bq=provider:fred
wunder
On Jan 17, 2013, at 9:08 PM, Jack Krupansky wrote:
> Start with "Query Elevation" and see if that helps:
> http://wiki.apache.org/solr/QueryElevationComponent
>
> Index-time document boost is a possibility.
>
> Maybe an ExternalFileField whe
want to be able to apply
> the boost to arbitrary queries.
>
> The source data comes from MySQL, and this is a seven-shard distributed index
> with 74075200 documents as of a few minutes ago. Although ExternalFileField
> probably wouldn't be impossible, it is rather impract
On Jan 17, 2013, at 10:53 PM, Shawn Heisey wrote:
> On 1/17/2013 11:41 PM, Walter Underwood wrote:
>> As I understand it, the bq parameter is a full Lucene query, but only used
>> for ranking, not for selection. This is the complement of fq.
>>
>> You can use
>>>>>>> big
>>>>>>> textual field.
>>>>>>> The queries on the index are non-trivial and a little long (might be
>>>>>>> hundreds of terms). No query is identical to another.
>>>>>>>
>>>>>>> Now, I want to analyze the cache performance (before setting up the
>>>>>>> whole environment), in order to estimate how much RAM I will need.
>>>>>>>
>>>>>>> filterCache:
>>>>>>> In my scenario, every query has some filters. Let's say that each
>>>>>>> filter matches 1M documents, out of 10M. Should the estimated memory
>>>>>>> usage be 1M * sizeof(uniqueId) * num-of-filters-in-cache?
>>>>>>>
>>>>>>> fieldValueCache:
>>>>>>> Due to the difference between queries, I guess that fieldValueCache is
>>>>>>> the most important factor in query performance. Here comes a generic
>>>>>>> question: I'm indexing new documents into the index constantly. Soft
>>>>>>> commits will be performed every 10 mins. Does that mean the cache is
>>>>>>> meaningless after every 10 minutes?
>>>>>>>
>>>>>>> documentCache:
>>>>>>> enableLazyFieldLoading will be enabled, and "fl" contains a very small
>>>>>>> set of fields. BUT, I need to return highlighting on about (possibly)
>>>>>>> 20 fields. Does the highlighting component use the documentCache? I
>>>>>>> guess that highlighting requires the whole field to be loaded into the
>>>>>>> documentCache. Will it happen only for fields that matched a term from
>>>>>>> the query?
>>>>>>>
>>>>>>> And one more question: I'm planning to hard-commit once a day. Should I
>>>>>>> prepare for significant RAM usage growth between hard-commits?
>>>>>>> (consider a lot of new documents in this period...)
>>>>>>> Does this RAM come from the same pool as the caches? Can an OutOfMemory
>>>>>>> exception happen in this scenario?
>>>>>>>
>>>>>>> Thanks a lot.
>>>>>>
>>>>>
>>>>
>>>
--
Walter Underwood
wun...@wunderwood.org
> [request parameters and debugQuery output, XML tags stripped by the archive;
> recoverable values: query *:*, parsed query MatchAllDocsQuery(*:*), QParser
> LuceneQParser, about 30,000 docs; total process time 617.0 ms, of which
> QueryComponent 516.0 ms and the last component 101.0 ms, all others 0.0 ms]
> Thank you.
> Best regards,
> Lyuba
--
Walter Underwood
wun...@wunderwood.org
Why? Just skip over that in the code. --wunder
On Jan 23, 2013, at 12:50 PM, hassancrowdc wrote:
> No, I wanted it in JSON. I want it to start from where the square bracket
> starts: [. I want to remove everything before that. I can get it in JSON by
> including wt=json. I just want to remove Response
Am I missing something here?
wunder
--
Walter Underwood
wun...@wunderwood.org
The general solution is to add a "deleted" column to your database, or even a
"deleted date" column.
When you update Solr from the DB, issue a delete for each item deleted since
the last successful update.
You can delete those rows after the Solr update or, to be extra safe, delete
them a few days later.
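That sync step can be sketched like this (hypothetical table and column names, using sqlite3 for illustration):

```python
import sqlite3

# Hypothetical schema: a deleted_at column marks rows removed from the app.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, deleted_at TEXT)")
db.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, "first",  None),
    (2, "second", "2013-01-18T09:00:00Z"),
    (3, "third",  "2013-01-10T12:00:00Z"),
])

last_successful_update = "2013-01-15T00:00:00Z"

# Rows deleted since the last successful Solr update: issue a Solr
# delete-by-id for each of these, then (later) purge the rows themselves.
to_delete = [row[0] for row in db.execute(
    "SELECT id FROM items WHERE deleted_at > ?", (last_successful_update,))]
print(to_delete)
```

The ids go to Solr as ordinary deletes; the database rows can be purged once the Solr update succeeds.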
eDS.password}"
> batchSize="-1"/>
>
> solrconfig.xml:
>
> class="org.apache.solr.handler.dataimport.DataImportHandler">
>
> data-config.xml
>
>
> ...
> ...
> ...
> ...
>
>
>
> Did I miss something or is it a bug?
>
> Thanks,
> Boris.
>
--
Walter Underwood
wun...@wunderwood.org
artup is
> only for "playing". You ought to load configs into ZK as a separate operation
> from starting Solrs (and creating collections for that matter). Also see
> recent mail-list dialog "Submit schema definition using curl via SOLR"
>
> Regards, Per Steffen
Oops, that is -DzkHost, not -Dzkhost. --wunder
On Jan 25, 2013, at 10:56 AM, Walter Underwood wrote:
> Thanks, it is working when using just a solr.xml for each node. I can't find
> that anywhere in the docs.
>
> As far as I can tell, the minimum config for a Zookee
This was discussed last week, with two different solutions:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/browser
In general, you can set a Java property, like "-Ddbpass=fred", then use it in
the config files as "${dbpass}".
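For example, a data-config.xml data source might reference such properties (the property and attribute values here are made up):

```xml
<!-- Start Solr with -Ddbuser=... -Ddbpass=fred -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            user="${dbuser}"
            password="${dbpass}"/>
```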
wunder
On Jan 30, 2013, at 3:37 AM, Lapera-Va
ature. Given that the SimplePostTool is becoming
>>> far from simple, I wanted to see whether the feature is likely to be
>>> accepted before I put in the effort. Also, I would need to consider
>>> which parts of the tool to add that to. Currently I only want it for
>>> posting XML docs, but there's also crawling capabilities in it too.
>>>
>>> Thoughts?
>>>
>>> Upayavira
>>
--
Walter Underwood
wun...@wunderwood.org
sted in my changes is another matter.
>
> Upayavira
>
> On Tue, Feb 5, 2013, at 04:43 AM, Walter Underwood wrote:
>> Have you considered writing a script to upload them with curl and running
>> multiple copies of the script in the background?
>>
>> wunder
>>
You cannot do that. Solr does document-level updates with batch commits. The
value will be available after the batch commit completes. With Solr4 you can do
a realtime get after the commit, but it is still two operations.
wunder
On Feb 5, 2013, at 9:09 AM, Marcos Mendez wrote:
> Any ideas on t
This is apples and pomegranates. Lucene is a library, Solr is a server. In
features, they are more alike than different.
wunder
On Feb 12, 2013, at 7:40 AM, JohnRodey wrote:
> I know that Solr web-enables a Lucene index, but I'm trying to figure out
> what other things Solr offers over Lucene.
etter to use Solr
>> instead, and if there's something you need that Solr can't do, put your
>> development team to work writing the required plugin. They would likely
>> spend far less time doing that than writing an entire search system using
>> Lucene.
>>
>> Thanks,
>> Shawn
>>
>
>
>
> --
> -
> http://zzzoot.blogspot.com/
> -
--
Walter Underwood
wun...@wunderwood.org
between a
>> > metadata field and a larger content field in Solr.
>> >
>> > Your current search (guessing here) iterates all terms in the content
>> > fields and take a comparatively large penalty when a large document is
>> > encountered. The inverted index in Solr means that the search terms are
>> > looked up in a dictionary and refer to the documents they belong to. The
>> > penalty for having thousands or millions of terms as compared to tens or
>> > hundreds in a field in an inverted index is very small.
>> >
>> > We're still in "any random machine you've got available"-land, so I second
>> > Michael's suggestion.
>> >
>> > Regards,
>> > Toke Eskildsen
>
--
Walter Underwood
wun...@wunderwood.org
Seems like there is no way to change your vote. I saw the "... but upgrading"
options at the bottom after I'd already voted.
I would just remove those from the poll. They only complicate things.
wunder
On Feb 15, 2013, at 10:27 AM, Otis Gospodnetic wrote:
> Hi,
>
> I think the subject is self
Do you really want the time that Solr first saw it or do you want the time that
the document was really created in the system? I think an external create
timestamp would be a lot more useful.
wunder
On Feb 16, 2013, at 12:37 PM, Isaac Hebsh wrote:
> I opened a JIRA for this improvement request
to Solr set
>> the timestamp when it does so.
>>
>> Upayavira
>>
>> On Sat, Feb 16, 2013, at 08:56 PM, Isaac Hebsh wrote:
>>> Hi,
>>> I do have an externally-created timestamp, but some minutes may pass
>>> before
>>> it will be sent to Solr.
In production, you should have requests arriving at Solr simultaneously. Those
simultaneous requests will be processed in parallel.
For each query, there are many ways to improve response time. It depends on the
query and the schema.
What query response time are you seeing?
wunder
On Feb 20,
That seems fairly fast. We index about 3 million documents in about half that
time. We are probably limited by the time it takes to get the data from MySQL.
Don't optimize. Solr automatically merges index segments as needed. Optimize
forces a full merge. You'll probably never notice the differen
I cannot answer "yes" to any of those options.
Master/slave and cloud have different strengths and weaknesses. We will use
each one where it is appropriate.
The loose coupling in master/slave is a very good thing and increases
robustness for a corpus that does not have tight freshness requireme
erformance penalty of 100 POST requests (of 1 document each) against 1
>> request of 100 docs, if a soft commit is eventually done.
>>
>> Thanks in advance...
--
Walter Underwood
wun...@wunderwood.org
Lower case is safer than upper case. For unicode, uppercasing is a lossy
conversion. There are sets of different lower case characters that convert to
the same upper case character. When you convert back to lower case, you don't
know which one it was originally.
Always use lower case for text.
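A quick Python illustration with Greek sigma, whose two lower-case forms collapse to one upper-case letter:

```python
# Greek sigma has two lower-case forms: medial "σ" and word-final "ς".
# Both map to the single upper-case "Σ", so uppercasing destroys the
# distinction and a later lowercasing cannot restore it.
final_sigma = "ς"
medial_sigma = "σ"

assert final_sigma.upper() == medial_sigma.upper() == "Σ"

# Round-tripping through upper case picks one form arbitrarily:
print("ς".upper().lower())  # prints "σ", not the original "ς"
```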
100 shards on a node will almost certainly be slow, but at least it would be
scalable. 7TB of data on one node is going to be slow regardless of how you
shard it.
I might choose a number with more useful divisors than 100, perhaps 96 or 144.
wunder
On Feb 28, 2013, at 4:25 PM, Mark Miller wrot
Are you trying to strip out HTML tags? There are built-in classes that do that.
Or you might want to parse the XML or HTML before you pass it to Solr. An XML
parser will interpret CDATA so that you never have to think about it. The
parsed data is just text.
wunder
On Mar 1, 2013, at 9:21 AM, S
Don't use wildcards. A leading wildcard matches against every token in the
index. This is the search equivalent of a full table scan in a relational
database.
Instead, create a field type that tokenizes e-mail addresses into pieces, then
use phrase search against that.
The address "f...@yahoo.
That is a good start. Use the Analysis page in the admin UI to see what the
tokenizer does.
wunder
On Mar 1, 2013, at 11:02 AM, girish.gopal wrote:
> Hello Wunder,
> I see your point. Will this help if I search for "giri", "giri@",
> "giri@gmail", "@gmail.com" and other combinations.
> So, if
Your assumption is wrong. Solr and Lucene match entire words.
You can use wildcards, but you need to be aware of the performance issues.
If the words are related parts of speech, like singular and plural, you can
use a stemmer to index a root form.
You can also configure synonyms at index tim
Your servers seem to be about the right size, but as everyone else has said,
it depends on the kinds of queries.
Solr should be the only service on the system. Solr can make heavy use of the
disk which will interfere with other processes. If you are lucky enough to get
the system tuned to run
First, terms used to subset the index should be a filter query, not part of the
main query. That may help, because the filter query terms are not used for
relevance scoring.
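For example (invented field names), the subsetting terms move from q into fq:

```
q=title:(solr performance)&fq=provider:fred&fq=year:[2010 TO 2013]
```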
Have you done any system profiling? Where is the bottleneck: CPU or disk? There
is no point in optimising things before y
host zk(5110) and the url of the
>>> other server(zk port). When i try to start this it give the error: "port
>>> out
>>> of range:-1".
>>>
>>
>> The full log line, ideally with several lines above and below for context,
>> is going to be crucial for figuring this out. Also, the contents of your
>> solr.xml file may be important.
>>
>> Thanks,
>> Shawn
>>
>>
--
Walter Underwood
wun...@wunderwood.org
--
Walter Underwood
wun...@wunderwood.org
hint,
>>> Tom
>>>
> &group=true
>
> &group.field=BusinessDateTime
>
> &group.facet=true
>
> &group.field=NetSales
>
> Now the facet is working properly; however, it is returning the count of the
> documents, whereas I need the sum of the NetSales and the TransCount fields
> instead.
>
> Any help or suggestions would be greatly appreciated.
>
> Thanks,
> Adam
--
Walter Underwood
wun...@wunderwood.org
d love to just keep using the SQL DB that we have been using but
> alas I am not allowed to.
>
> Thanks,
> Adam
>
> -Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: Monday, March 18, 2013 11:58 AM
> To: solr-user@lucene.apache
s not work, what are the best practices for managing dev/test/prod
configs for Solr?
wunder
--
Walter Underwood
wun...@wunderwood.org
Search Guy, Chegg.com
Or you can do a search for two ads with random ordering, then a second search
for ads in the desired order with excludes for the two ads returned in the
first.
You don't have to do everything inside Solr.
wunder
Search Guy, Chegg
On Feb 9, 2012, at 1:04 AM, Tommaso Teofili wrote:
> I think y
Why are you asking us? This is a standard feature of Newrelic, ask them. They
should have the answer.
http://blog.newrelic.com/2010/05/11/got-apache-solr-search-server-use-rpm-to-monitor-troubleshoot-and-tune-solr-operations/
You can use Solr with any servlet container. We use Tomcat in producti
I've looked at the wiki and the changelog, and I'm still confused about what
versions support compressed fields.
We have an index which is rapidly growing through 100Gb, and I'd like to turn
on text field compression without reindexing. Is that possible?
We are on 3.3.0.
wunder
> http://sematext.com/spm/solr-performance-monitoring/index.html
>
>
>
> - Original Message -
>> From: Walter Underwood
>> To: solr-user@lucene.apache.org
>> Cc:
>> Sent: Monday, February 13, 2012 5:51 PM
>> Subject: What versions support compresse
In practice, I expect a linear piecewise function (with sharp corners) would be
indistinguishable from the smoothed function. It is also much easier to read,
test, and debug. It might even be faster.
Try the sharp corners one first.
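A sketch of such a piecewise-linear function (the knot values are made up):

```python
# Piecewise-linear interpolation over a list of (x, y) knots, with flat
# extrapolation outside the range. The knots are made-up example values.
KNOTS = [(0.0, 1.0), (10.0, 2.0), (100.0, 2.5)]

def piecewise(x, knots=KNOTS):
    if x <= knots[0][0]:
        return knots[0][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x <= x1:
            # Linear segment between the two surrounding knots.
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return knots[-1][1]

print(piecewise(5.0))   # midway on the first segment: 1.5
print(piecewise(55.0))  # halfway between 10 and 100: 2.25
```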
wunder
On Feb 14, 2012, at 10:56 AM, Ted Dunning wrote:
> In
In my first try with the DIH, I had several sub-entities and it was making six
queries per document. My 20M doc load was going to take many hours, most of a
day. I re-wrote it to eliminate those, and now it makes a single query for the
whole load and takes 70 minutes. These are small documents,
t to search on field "title".
>> Now my field title holds the value "great smartphone".
>> If I search on "smartphone" the item is found. But I want the item also to
>> be found on "great" or "phone", and it doesn't work.
>> I have been playing around with the tokenizer test function, but have failed
>> to find the definition for the "text" fieldtype I need.
>> Help? :)
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Need-tokenization-that-finds-part-of-stringvalue-tp3785366p3785366.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
--
Walter Underwood
wun...@wunderwood.org
to disable the query elevation stuff by removing it from
your solrconfig.xml.
wunder
Walter Underwood
wun...@wunderwood.org
On Mar 5, 2012, at 1:09 PM, Welty, Richard wrote:
> i googled and found numerous references to this, but no answers that went to
> my specific issues.
>
> i
On Mar 5, 2012, at 1:16 PM, Welty, Richard wrote:
> Walter Underwood [mailto:wun...@wunderwood.org] writes:
>
>> You may be able to have unique keys. At Netflix, I found that there were
>> collisions between >the movie IDs and the person IDs. So, I put an 'm' at
Solr is not relational, so you will probably need to take a fresh look at your
data.
Here is one method.
1. Sketch your search results page.
2. Each result is a document in Solr.
3. Each displayed item is a stored field in Solr.
4. Each searched item is an indexed field in Solr.
It may help to
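As a sketch (all field names invented), a results row that shows a title, a price, and a thumbnail could map to:

```xml
<!-- One Solr document per search result row; invented field names. -->
<field name="id"        type="string" indexed="true"  stored="true"/>
<field name="title"     type="text"   indexed="true"  stored="true"/> <!-- searched and displayed -->
<field name="price"     type="float"  indexed="false" stored="true"/> <!-- displayed only -->
<field name="thumb_url" type="string" indexed="false" stored="true"/> <!-- displayed only -->
```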
ore..?
>
> On Thu, Mar 8, 2012 at 12:12 AM, Walter Underwood
> wrote:
>
>> Solr is not relational, so you will probably need to take a fresh look at
>> your data.
>>
>> Here is one method.
>>
>> 1. Sketch your search results page.
>> 2. Each res
> fq={!join from=customer_id to=fk_phone_customer_id}phone_area_code:212&
> fq=customer_gender:female
>
> But that does not work for me.
>
> Appreciate any thoughts,
>
> Angelyna
--
Walter Underwood
wun...@wunderwood.org
No, the deleted files do not get replicated. Instead, the slaves do the same
thing as the master, holding on to the deleted files after the new files are
copied over.
The optimize is obsoleting all of your index files, so maybe you should quit
doing that. Without an optimize, the deleted files will
If you want to do *anything* across all matches, you probably should be using a
relational database. Search engines, like Solr, are optimized for just the best
matches. Fetching all matches is likely to be slow. Relational databases are
optimized for working with the whole set of matches.
wunder
If you must have real-time search, you might look at systems that are designed
to do that. MarkLogic isn't free, but it is fast and real-time. You can use
their no-charge Express license for development and prototyping:
http://developer.marklogic.com/express
OK, back to Solr.
wunder
Search Guy
Don't. "Optimize" is a poorly-chosen name for a full merge. It doesn't make
that much difference and there is almost never a need to do it on a periodic
basis.
The full merge will mean a longer time between the commit and the time that the
data is first searchable. Do the commit, then search.
at
> various times? Do the deleted documents get removed when doing a
> merge or does that only get done on an optimize?
>
> On Thu, Mar 29, 2012 at 7:08 PM, Walter Underwood
> wrote:
>> Don't. "Optimize" is a poorly-chosen name for a full merge. It doesn'
Quantiles require accessing the entire list of results, or at least, sorting by
the interesting values, checking the total hits, then accessing the results
list at the desired interval. So, with 3000 hits, get deciles by getting the
first row, then the 301st row, the 601st row, etc.
This might
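The offset arithmetic for that sketch, assuming Solr-style start/rows paging:

```python
# For n total hits and q quantiles, fetch one document at each quantile
# boundary with start=<offset>&rows=1. With 3000 hits and deciles, that is
# rows 1, 301, 601, ..., i.e. zero-based offsets 0, 300, 600, ...
def quantile_offsets(num_found, quantiles=10):
    step = num_found // quantiles
    return [i * step for i in range(quantiles)]

print(quantile_offsets(3000))  # [0, 300, 600, ..., 2700]
```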
Try adding a multivalued relatedVideo field to each document, then you won't
need the join.
Almost always, you want to do the joins before you load documents into Solr,
and use a denormalized schema in Solr. That will be faster and simpler at query
time.
wunder
Search Guy, Chegg
On Apr 3, 201
I believe we are talking about two different things. The original question was
about incrementally building up a field during indexing, right?
After a document is committed, a field cannot be separately updated, that is
true in both Lucene and Solr.
wunder
On Apr 4, 2012, at 12:20 PM, Yonik S
Why?
When you reindex, is it OK if they all change?
If you reindex one document, is it OK if it gets a new sequential number?
wunder
On Apr 5, 2012, at 9:23 PM, Manish Bafna wrote:
> We already have a unique key (We use md5 value).
> We need another id (sequential numbers).
>
> On Fri, Apr 6,
do it.
> If we pass the number to the field, it will take that value; if we don't
> pass it, it will do auto-increment.
> Because if we update, i will have old number and i will pass it as a field
> again.
>
> On Fri, Apr 6, 2012 at 9:59 AM, Walter Underwood wrote:
>
>>
You will need to define or customize a field type for text.
The example schema.xml file that is installed with Solr 3.5 has several kinds
of text fields; "text_general" and "text_en" are good places to start. You can
use one of those, then customize it.
wunder
On Apr 9, 2012, at 11:27 AM, s
> ignoreCase="true" expand="true"/>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateNumberParts="1"
>
>
> ignoreCase="true" expand="true"/>
>words="stopwords.txt" enablePositionIncrements="true" />
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
German noun decompounding is a little more complicated than it might seem.
There can be transformations or inflections, like the "s" in "Weihnachtsbaum"
(Weihnachten/Baum).
Internal nouns should be recapitalized, like "Baum" above.
Some compounds probably should not be decompounded, like "Fahrrad
u highlight, you need a
dictionary-based segmenter.
wunder
--
Walter Underwood
wun...@wunderwood.org
valence "Fahrrad = Rad" than
decompounding.
wunder
--
Walter Underwood
wun...@wunderwood.org
It is easy. Create two fields, text_exact and text_stem. Don't use the stemmer
in the first chain, do use the stemmer in the second. Give the text_exact a
bigger weight than text_stem.
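A minimal sketch of the two-field setup (the field names, weights, and edismax qf line are illustrative, not from the original message):

```xml
<!-- Same source text indexed twice: once unstemmed, once stemmed. -->
<copyField source="text" dest="text_exact"/>
<copyField source="text" dest="text_stem"/>
<!-- text_exact's analyzer omits the stemmer; text_stem's includes it. -->
<!-- Then query both, e.g. with edismax: qf=text_exact^4 text_stem -->
```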
wunder
On Apr 12, 2012, at 4:34 PM, Erick Erickson wrote:
> No, I don't think there's an OOB way to make this
, I'd like to have that document be the first result in a *:* query.
>>
>> I'm looking into index time boosting using the boost attribute on the
>> appropriate doc. I haven't tested this yet, and I'm not sure this would do
>> anything for the *:* queries.
>>
>> Thanks for any suggested reading or patterns...
>>
>> Best,
>> Chris
>>
>>
>> --
--
Walter Underwood
wun...@wunderwood.org
There is a third approach. Create two fields and always query both of them,
with the exact field given a higher weight. This works great and performs well.
It is what we did at Netflix and what I'm doing at Chegg.
wunder
On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:
> So I just realized t
te:
> Yes, and you might choose to use different options for different fields. For
> dictionary searches, where users are searching for specific words, and a high
> degree of precision is called for, stemming is less helpful, but for full
> text searches, more so.
>
> -Mike
avoc.
>>
>> I'd like to get your thoughts on the following:
>>
>> - Is it standard practice to avoid boosting the title field much, because of
>> the (generally) high IDF of title field terms?
>> - Are there other strategies for handling the high IDF
Solr will not keep the structure of your XML data. Solr and Lucene have a flat
data model. You can map hierarchy into that, but it can be a lot of work.
I recommend starting with a dedicated XML database. MarkLogic is commercial,
but they have added a free developer license that can be used for
Bigrams across character types seems like a useful thing, especially for
indexing adjective and verb endings.
An n-gram approach is always going to generate a lot of junk along with the
gold. Tighten the rules and good stuff is missed, guaranteed. The only way to
sort it out is to use a tokeniz
a-with-solr-integration-details
wunder
--
Walter Underwood
wun...@wunderwood.org
You'll see katakana used with kanji in noun compounds where one of the words is
foreign.
In Japanese, "Rice University" is not written with the kanji word for "rice".
They use katakana for "rice" and kanji for "university", like this: ライス大学.
This is very common. I expect that "President Obama"
nothing. However for some reason if I search
>> on 'evalu' it finds all the matches. Is that an indexing setting or query
>> setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval'
>> to
>> be a match?
>>
>> Thanks,
>> Ken
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/question-on-tokenization-control-tp3953550.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
--
Walter Underwood
wun...@wunderwood.org
>>> have found no clue about it at all).
>>>
>>> Do you think is it a good idea?
>>>
>>> Do you know services using Solritas as a frontend on a public site?
>>>
>>> My personal opinion is that using Solritas in production is a very bad
>> idea
>>> for us, but have not so much experience with Solr yet, and Solritas
>>> documentation is far from a detailed, up-to-date one, so don't really
>> know
>>> what is it really usable for.
>>>
>>> Thanks,
>>> Andras
>>
>>
--
Walter Underwood
wun...@wunderwood.org
Yes, milliseconds. --wunder
On May 10, 2012, at 8:57 AM, G.Long wrote:
> Hi :)
>
> In what unit of time is the QTime of a QueryResponse expressed? Is it
> milliseconds?
>
> Gary
No. Lucene and Solr commits replace the entire document. --wunder
On May 12, 2012, at 10:00 AM, Mark Laurent wrote:
> Hello,
>
> Is it possible to perform an index commit where Solr would add the incoming
> value to an existing field's value?
>
> I have for example:
>
>
> required="
ext:
> http://lucene.472066.n3.nabble.com/Must-match-and-terms-with-only-one-letter-tp3984139.html
> Sent from the Solr - User mailing list archive at Nabble.com.
--
Walter Underwood
wun...@wunderwood.org
In Unicode, uppercasing characters loses information, because there are some
upper case characters that represent more than one lower case character.
Lower casing text is safe, so always lower-case.
wunder
On May 18, 2012, at 10:41 AM, srinir wrote:
> I am wondering why Solr doesn't have an upp
Why? Query-time boosting is fast and more flexible.
wunder
Search Guy, Netflix & Chegg
On May 24, 2012, at 6:11 AM, Chamnap Chhorn wrote:
> Anyone could help me? I really need index-time field-boosting.
>
> On Thu, May 24, 2012 at 4:21 PM, Chamnap Chhorn
> wrote:
>
>> Hi all,
>>
>> I want t
Am I correct in thinking that a
> multiversion concurrency control (MVCC) locking mechanism now exist for a
> single core or is it lock-free and multi-core?
>
> Many thanks,
> Nicholas Ball (aka incunix)
--
Walter Underwood
wun...@wunderwood.org
tion
> asset. Therefore, some document when matched are more important than
> others. That's what index time boost does, right?
>
> On Thu, May 24, 2012 at 10:10 PM, Walter Underwood
> wrote:
>
>> Why? Query-time boosting is fast and more flexible.
>>
>> wu
A. Never optimize on the slave.
B. You probably do not need to optimize on the master.
"Optimize" does not optimize anything. It is forced merge, combining segments.
Solr automatically combines segments as needed.
wunder
On May 26, 2012, at 1:57 PM, sudarshan wrote:
> Hi All,
> I happen
Solr automatically scales the scores of fuzzy matches by their distance from an
exact match. So, you don't have to change anything.
wunder
On May 26, 2012, at 11:52 PM, Gau wrote:
> Hi Lori,
>
> Yeah. I thought exactly of the same solution. Use a copy field and boost
> the relevancy of the t
You do not need to use optimize at all.
Solr continually merges segments ("optimizes") as needed.
wunder
On May 29, 2012, at 6:08 AM, sudarshan wrote:
> Hi Walter,
> Thank you. Do you mean that optimize need not be used at all?
> If Solr merges segments (when needed as you said), is
Solr does not natively store/index/search arbitrary JSON documents.
It accepts JSON in a specific format for document input.
wunder
On May 29, 2012, at 3:21 PM, rjain15 wrote:
> Hi Gora,
>
> I am working on a Mobile App, which is updating/accessing/searching data and
> I have created a simple
On May 30, 2012, at 11:44 AM, Aaron Daubman wrote:
> The bigger question is: what are the parallel task
> execution paths in Solr and under what conditions are they possible?
I'd go with the general servlet rules, where everything is assumed to have
concurrent access.
wunder
http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor
The defaults are very good. I have never changed them, and I've had Solr in
production at two major sites, Netflix and Chegg.
Don't spend any more time worrying about merges.
wunder
On May 31, 2012, at 10:51 AM, sudarshan wrote:
>
This is a bad idea. Solr is not designed to be exposed to arbitrary internet
traffic and attacks. The best design is to have a front end server make
requests to Solr, then use those to make HTML pages.
wunder
On Jun 7, 2012, at 4:49 AM, Spadez wrote:
> Final comment from me then Ill let someon
Are you requesting a large number of rows? If so, request smaller chunks, like
ten at a time. Then you can show those with a "waiting" note.
wunder
On Jun 7, 2012, at 1:14 PM, Laurent Vaills wrote:
> Hi everyone,
>
> We have some grouping queries that are quite long to execute. Some are too
>
You probably do not want this ranking, because any query with a common word,
like "the", will match most of the corpus in step two.
Instead, use Solr to weight better quality matches more heavily, maybe 4X for
exact matches, 2X for stemmed matches, and 1X for phonetic matches.
wunder
On Dec 20
> www.dataprisma.com.br
> - Original Message -
> From: "Walter Underwood"
> To:
> Sent: Monday, December 20, 2010 2:02 PM
> Subject: Re: about groups of random results + alphabetical result
>
>
> You probably do not want this ranking, because any quer
On Jan 13, 2011, at 1:28 PM, Dennis Gearon wrote:
> Do I even need a body for this message? ;-)
>
> Dennis Gearon
Are you asking "is it" or "should it be"? If the latter, we can also discuss
Emacs and vi.
wunder
--
Walter Underwood
K6WRU