Re: 400 error with boost and exists()

2013-01-16 Thread Walter Underwood
"if", which should be commas. Also, you're > using some odd syntax in the "exists" value data source which expects a field > name or a function. > > -- Jack Krupansky > > -Original Message- From: Walter Underwood > Sent: Wednesday, January 16, 201

Re: 400 error with boost and exists()

2013-01-16 Thread Walter Underwood
None of the variants worked. I started with that syntax for both exists() and if(). All gave the same stack trace. --wunder On Jan 16, 2013, at 3:32 PM, Yonik Seeley wrote: > On Wed, Jan 16, 2013 at 6:11 PM, Walter Underwood > wrote: >> I got the syntax fr

Re: 400 error with boost and exists()

2013-01-16 Thread Walter Underwood
Ah, that would be it. Does 4.0 also give a stack trace if you call a function that doesn't exist? I can achieve most of what I want with bq, though that has IDF, which I'd rather avoid here. wunder On Jan 16, 2013, at 3:38 PM, Yonik Seeley wrote: > On Wed, Jan 16, 2013 at 6:

Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread Walter Underwood
Or a different design. You can mark collections for deletion, then delete them in an organized, safe manner later. wunder On Jan 17, 2013, at 12:40 PM, snake wrote: > Ok so is there any other way to stop this problem I am having where any site > can break solr by deleting their collection? > Seems

Re: Questions about boosting

2013-01-17 Thread Walter Underwood
Have you tried boost query? bq=provider:fred wunder On Jan 17, 2013, at 9:08 PM, Jack Krupansky wrote: > Start with "Query Elevation" and see if that helps: > http://wiki.apache.org/solr/QueryElevationComponent > > Index-time document boost is a possibility. > > Maybe an ExternalFileField whe
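A minimal sketch of the boost query suggested above, assuming the DisMax query parser (where bq is a request parameter) and a hypothetical user query of "ipod". Only bq=provider:fred comes from the thread; the function name and the other values are made up for illustration.

```python
from urllib.parse import urlencode


def boosted_query_params(user_query, boost_query):
    """Build request parameters for a DisMax search with a boost query (bq)."""
    return {
        "q": user_query,
        "defType": "dismax",  # assumption: bq is read by the DisMax family of parsers
        "bq": boost_query,    # influences ranking only, not which documents match
    }


params = boosted_query_params("ipod", "provider:fred")  # "ipod" is a placeholder query
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to a select URL; the host, port, and handler path depend on the installation and are left out on purpose.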

Re: Questions about boosting

2013-01-17 Thread Walter Underwood
want to be able to apply > the boost to arbitrary queries. > > The source data comes from MySQL, and this is a seven-shard distributed index > with 74075200 documents as of a few minutes ago. Although ExternalFileField > probably wouldn't be impossible, it is rather impract

Re: Questions about boosting

2013-01-18 Thread Walter Underwood
On Jan 17, 2013, at 10:53 PM, Shawn Heisey wrote: > On 1/17/2013 11:41 PM, Walter Underwood wrote: >> As I understand it, the bq parameter is a full Lucene query, but only used >> for ranking, not for selection. This is the complement of fq. >> >> You can use

Re: Solr cache considerations

2013-01-20 Thread Walter Underwood
>> big >>>>>>> textual field. >>>>>>> The queries on the index are non-trivial, and a little-bit long >>>> (might >>>>> be >>>>>>> hundreds of terms). No query is identical to another. >>>>>>> >>>>>>> Now, I want to analyze the cache performance (before setting up the >>>>> whole >>>>>>> environment), in order to estimate how much RAM will I need. >>>>>>> >>>>>>> filterCache: >>>>>>> In my scenario, every query has some filters. Let's say that each >>>>> filter >>>>>>> matches 1M documents, out of 10M. Does the estimated memory usage >>>>> should >>>>>> be >>>>>>> 1M * sizeof(uniqueId) * num-of-filters-in-cache? >>>>>>> >>>>>>> fieldValueCache: >>>>>>> Due to the difference between queries, I guess that fieldValueCache >>>> is >>>>>> the >>>>>>> most important factor on query performance. Here comes a generic >>>>>> question: >>>>>>> I'm indexing new documents to the index constantly. Soft commits >>> will >>>>> be >>>>>>> performed every 10 mins. Does it say that the cache is meaningless, >>>>> after >>>>>>> every 10 minutes? >>>>>>> >>>>>>> documentCache: >>>>>>> enableLazyFieldLoading will be enabled, and "fl" contains a very >>>> small >>>>>> set >>>>>>> of fields. BUT, I need to return highlighting on about (possibly) >>> 20 >>>>>>> fields. Does the highlighting component use the documentCache? I >>>> guess >>>>>> that >>>>>>> highlighting requires the whole field to be loaded into the >>>>>> documentCache. >>>>>>> Will it happen only for fields that matched a term from the query? >>>>>>> >>>>>>> And one more question: I'm planning to hard-commit once a day. >>>> Should I >>>>>>> prepare to a significant RAM usage growth between hard-commits? >>>>>> (consider a >>>>>>> lot of new documents in this period...) >>>>>>> Does this RAM comes from the same pool as the caches? An >>> OutOfMemory >>>>>>> exception can happen in this scenario? >>>>>>> >>>>>>> Thanks a lot. >>>>>> >>>>> >>>> >>> -- Walter Underwood wun...@wunderwood.org

Re: Solr 4.0 - timeAllowed in distributed search

2013-01-20 Thread Walter Underwood
> *:* > true > true > true > 3 > 500 > ... > 30,000 docs > *:* name="querystring">*:* name="parsedquery">MatchAllDocsQuery(*:*) name="parsedquery_toString">*:* > LuceneQParser > 617.0 name="prepare">0.0 name="org.apache.solr.handler.component.QueryComponent"> name="time">0.0 > name="time">0.0 > name="time">0.0 > name="time">0.0 > name="time">0.0 > name="time">0.0 > 617.0 name="org.apache.solr.handler.component.QueryComponent"> name="time">516.0 > name="time">0.0 > name="time">0.0 > name="time">0.0 > name="time">0.0 > name="time">101.0 > > Thank you. > Best regards, > Lyuba -- Walter Underwood wun...@wunderwood.org

Re: ResultSet Solr

2013-01-23 Thread Walter Underwood
Why? Just skip over that in the code. --wunder On Jan 23, 2013, at 12:50 PM, hassancrowdc wrote: > no I wanted it in json. i want it to start from where square bracket starts [ > . I want to remove everything before that. I can get it in json by including > wt=json. I just want to remove Response

Starting instances with multiple collections

2013-01-23 Thread Walter Underwood
Am I missing something here? wunder -- Walter Underwood wun...@wunderwood.org

Re: Deletion from database

2013-01-24 Thread Walter Underwood
The general solution is to add a "deleted" column to your database, or even a "deleted date" column. When you update Solr from the DB, issue a delete for each item deleted since the last successful update. You can delete those rows after the Solr update or to be extra safe, delete them a few d
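The "deleted date" approach above can be sketched roughly as follows. This assumes the database rows have already been fetched as (id, deleted_at) pairs; the ids, timestamps, and function names are hypothetical, and the output is the plain Solr XML delete message, not a call into any client library.

```python
from datetime import datetime
from xml.sax.saxutils import escape


def ids_deleted_since(rows, last_success):
    """Return ids whose deleted_at is later than the last successful Solr update."""
    return [doc_id for doc_id, deleted_at in rows
            if deleted_at is not None and deleted_at > last_success]


def build_delete_xml(ids):
    """Build one Solr XML delete message covering every id in the list."""
    return "<delete>" + "".join(f"<id>{escape(str(i))}</id>" for i in ids) + "</delete>"


# Hypothetical rows: (unique key, value of the "deleted date" column or None)
rows = [
    ("book-1", None),
    ("book-2", datetime(2013, 1, 23, 9, 0)),
    ("book-3", datetime(2013, 1, 24, 8, 30)),
]
last_success = datetime(2013, 1, 24, 0, 0)

pending = ids_deleted_since(rows, last_success)
print(build_delete_xml(pending))
```

Recording the time of each successful update is what keeps the process restartable: a failed run simply leaves last_success unchanged, so the same deletes are sent again next time.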

Re: Error in DIH after upgrading from 4.0 to 4.1

2013-01-25 Thread Walter Underwood
eDS.password}" > batchSize="-1"/> > > solrconfig.xml: > > class="org.apache.solr.handler.dataimport.DataImportHandler"> > > data-config.xml > > > ... > ... > ... > ... > > > > Did I miss something or is it a bug? > > Thanks, > Boris. > -- Walter Underwood wun...@wunderwood.org

Re: Starting instances with multiple collections

2013-01-25 Thread Walter Underwood
artup is > only for "playing". You ought to load configs into ZK as a separate operation > from starting Solrs (and creating collections for that matter). Also see > recent mail-list dialog "Submit schema definition using curl via SOLR" > > Regards, Per Steffen

Re: Starting instances with multiple collections

2013-01-25 Thread Walter Underwood
Oops, that is -DzkHost, not -Dzkhost. --wunder On Jan 25, 2013, at 10:56 AM, Walter Underwood wrote: > Thanks, it is working when using just a solr.xml for each node. I can't find > that anywhere in the docs. > > As far as I can tell, the minimum config for a Zookee

Re: configuring datasource for dynamic password and user

2013-01-30 Thread Walter Underwood
This was discussed last week, with two different solutions: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/browser In general, you can set a Java property, like "-Ddbpass=fred", then use it in the config files as "${dbpass}". wunder On Jan 30, 2013, at 3:37 AM, Lapera-Va

Re: Multi-threaded post.jar?

2013-02-04 Thread Walter Underwood
ature. Given that the SimplePostTool is becoming >>> far from simple, I wanted to see whether the feature is likely to be >>> accepted before I put in the effort. Also, I would need to consider >>> which parts of the tool to add that to. Currently I only want it for >>> posting XML docs, but there's also crawling capabilities in it too. >>> >>> Thoughts? >>> >>> Upayavira >> -- Walter Underwood wun...@wunderwood.org

Re: Multi-threaded post.jar?

2013-02-04 Thread Walter Underwood
sted in my changes is another matter. > > Upayavira > > On Tue, Feb 5, 2013, at 04:43 AM, Walter Underwood wrote: >> Have you considered writing a script to upload them with curl and running >> multiple copies of the script in the background? >> >> wunder >>

Re: solr atomic update

2013-02-05 Thread Walter Underwood
You cannot do that. Solr does document-level updates with batch commits. The value will be available after the batch commit completes. With Solr4 you can do a realtime get after the commit, but it is still two operations. wunder On Feb 5, 2013, at 9:09 AM, Marcos Mendez wrote: > Any ideas on t

Re: Benefits of Solr over Lucene?

2013-02-12 Thread Walter Underwood
This is apples and pomegranates. Lucene is a library, Solr is a server. In features, they are more alike than different. wunder On Feb 12, 2013, at 7:40 AM, JohnRodey wrote: > I know that Solr web-enables a Lucene index, but I'm trying to figure out > what other things Solr offers over Lucene.

Re: Benefits of Solr over Lucene?

2013-02-12 Thread Walter Underwood
etter to use Solr >> instead, and if there's something you need that Solr can't do, put your >> development team to work writing the required plugin. They would likely >> spend far less time doing that than writing an entire search system using >> Lucene. >> >> Thanks, >> Shawn >> > > > > -- > - > http://zzzoot.blogspot.com/ > - -- Walter Underwood wun...@wunderwood.org

Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Walter Underwood
between > a >> > meta data field and a larger content field in Solr. >> > >> > Your current search (guessing here) iterates all terms in the content >> > fields and take a comparatively large penalty when a large document is >> > encountered. The inversion of index in Solr means that the search terms >> are >> > looked up in a dictionary and refers to the documents they belong to. > The >> > penalty for having thousands or millions of terms as compared to tens or >> > hundreds in a field in an inverted index is very small. >> > >> > We're still in "any random machine you've got available"-land so I > second >> > Michael's suggestion. >> > >> > Regards, >> > Toke Eskildsen > -- Walter Underwood wun...@wunderwood.org

Re: POLL: Which Solr version are you on?

2013-02-15 Thread Walter Underwood
Seems like there is no way to change your vote. I saw the "... but upgrading" options at the bottom after I'd already voted. I would just remove those from the poll. They only complicate things. wunder On Feb 15, 2013, at 10:27 AM, Otis Gospodnetic wrote: > Hi, > > I think the subject is self

Re: Timestamp field is changed on update

2013-02-16 Thread Walter Underwood
Do you really want the time that Solr first saw it or do you want the time that the document was really created in the system? I think an external create timestamp would be a lot more useful. wunder On Feb 16, 2013, at 12:37 PM, Isaac Hebsh wrote: > I opened a JIRA for this improvement request

Re: Timestamp field is changed on update

2013-02-16 Thread Walter Underwood
to Solr set >> the timestamp when it does so. >> >> Upayavira >> >> On Sat, Feb 16, 2013, at 08:56 PM, Isaac Hebsh wrote: >>> Hi, >>> I do have an externally-created timestamp, but some minutes may pass >>> before >>> it will be sent to Solr.

Re: Threads running while querrying

2013-02-20 Thread Walter Underwood
In production, you should have requests arriving at Solr simultaneously. Those simultaneous requests will be processed in parallel. For each query, there are many ways to improve response time. It depends on the query and the schema. What query response time are you seeing? wunder On Feb 20,

Re: Index optimize takes more than 40 minutes for 18M documents

2013-02-21 Thread Walter Underwood
That seems fairly fast. We index about 3 million documents in about half that time. We are probably limited by the time it takes to get the data from MySQL. Don't optimize. Solr automatically merges index segments as needed. Optimize forces a full merge. You'll probably never notice the differen

Re: Poll: SolrCloud vs. Master-Slave usage

2013-02-25 Thread Walter Underwood
I cannot answer "yes" to any of those options. Master/slave and cloud have different strengths and weaknesses. We will use each one where it is appropriate. The loose coupling in master/slave is a very good thing and increases robustness for a corpus that does not have tight freshness requireme

Re: update fails if one doc is wrong

2013-02-26 Thread Walter Underwood
erformance penalty of 100 POST requests (of 1 document each) against 1 >> request of 100 docs, if a soft commit is eventually done. >> >> Thanks in advance... -- Walter Underwood wun...@wunderwood.org

Re: Solr Case-sensitivity issue with search field name

2013-02-28 Thread Walter Underwood
Lower case is safer than upper case. For Unicode, uppercasing is a lossy conversion. There are sets of different lower case characters that convert to the same upper case character. When you convert back to lower case, you don't know which one it was originally. Always use lower case for text.
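A small illustration of the lossy uppercasing described above, using Python's built-in Unicode case mappings. The two character pairs (long s with s, final sigma with sigma) are examples chosen here, not ones named in the thread.

```python
# Two different lower-case letters can share a single upper-case form.
LONG_S = "\u017f"       # LATIN SMALL LETTER LONG S
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA


def survives_upper_round_trip(text):
    """True if upper-casing and then lower-casing gives back the original text."""
    return text.upper().lower() == text


print("s".upper() == LONG_S.upper())       # both become "S"
print(SIGMA.upper() == FINAL_SIGMA.upper())  # both become capital sigma
print(survives_upper_round_trip(LONG_S))   # the long s cannot be recovered
```

Normalizing to lower case avoids this particular collapse, which is why analysis chains typically lower-case text at both index and query time.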

Re: Solr 4.1 Solr Cloud Shard Structure

2013-02-28 Thread Walter Underwood
100 shards on a node will almost certainly be slow, but at least it would be scalable. 7TB of data on one node is going to be slow regardless of how you shard it. I might choose a number with more useful divisors than 100, perhaps 96 or 144. wunder On Feb 28, 2013, at 4:25 PM, Mark Miller wrot

Re: Defining tokenizer pattern with < character

2013-03-01 Thread Walter Underwood
Are you trying to strip out HTML tags? There are built-in classes that do that. Or you might want to parse the XML or HTML before you pass it to Solr. An XML parser will interpret CDATA so that you never have to think about it. The parsed data is just text. wunder On Mar 1, 2013, at 9:21 AM, S

Re: Email Search Slow

2013-03-01 Thread Walter Underwood
Don't use wildcards. A leading wildcard matches against every token in the index. This is the search equivalent of a full table scan in a relational database. Instead, create a field type that tokenizes e-mail addresses into pieces, then use phrase search against that. The address "f...@yahoo.
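The idea of tokenizing addresses into pieces and matching them as a phrase can be sketched outside Solr like this. It is only a model of the behavior, assuming a tokenizer that splits on anything other than letters and digits; the address and function names are hypothetical, and a real setup would define the equivalent analysis chain in a Solr field type.

```python
import re


def tokenize_address(address):
    """Lower-case an address and split it into pieces on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", address.lower()) if t]


def phrase_matches(indexed_tokens, query_tokens):
    """True if the query tokens occur as a contiguous run in the indexed tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(indexed_tokens[i:i + n] == query_tokens
               for i in range(len(indexed_tokens) - n + 1))


indexed = tokenize_address("Giri@Example.com")  # hypothetical address
print(indexed)
print(phrase_matches(indexed, tokenize_address("@example.com")))
print(phrase_matches(indexed, tokenize_address("giri@example")))
print(phrase_matches(indexed, tokenize_address("gir")))  # partial token, no match
```

The last case shows the trade-off: whole pieces match cheaply through the term dictionary, while fragments of a piece still need n-grams or wildcards.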

Re: Email Search Slow

2013-03-01 Thread Walter Underwood
That is a good start. Use the Analysis page in the admin UI to see what the tokenizer does. wunder On Mar 1, 2013, at 11:02 AM, girish.gopal wrote: > Hello Wunder, > I see your point. Will this help if I search for "giri", "giri@", > "giri@gmail", "@gmail.com" and other combinations. > So, if

Re: Unable to match partial word

2013-03-05 Thread Walter Underwood
Your assumption is wrong. Solr and Lucene match entire words. You can use wildcards, but you need to be aware of the performance issues. If the words are related forms of the same word, like singular and plural, you can use a stemmer to index a root form. You can also configure synonyms at index tim

Re: SOLR - Recommendation on architecture

2013-03-08 Thread Walter Underwood
Your servers seem to be about the right size, but as everyone else has said, it depends on the kinds of queries. Solr should be the only service on the system. Solr can make heavy use of the disk, which will interfere with other processes. If you are lucky enough to get the system tuned to run

Re: How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-08 Thread Walter Underwood
First, terms used to subset the index should be a filter query, not part of the main query. That may help, because the filter query terms are not used for relevance scoring. Have you done any system profiling? Where is the bottleneck: CPU or disk? There is no point in optimising things before y
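As a rough sketch of the first point above, the subset restriction moves out of q and into fq. The field name subset_id and its value are assumptions made up for this example; the thread does not say how the 100,000 documents are identified.

```python
from urllib.parse import urlencode


def subset_search_params(user_query, subset_field, subset_value):
    """Keep the user's terms in q and put the subset restriction in fq."""
    return [
        ("q", user_query),                          # scored for relevance
        ("fq", f"{subset_field}:{subset_value}"),   # restricts the result set, not scored
    ]


params = subset_search_params("solar panels", "subset_id", "42")  # hypothetical values
print(urlencode(params))
```

Because a filter query is cached independently of the main query, repeating the same restriction across many searches should also be cheaper than repeating it inside q.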

Re: SolrCloud: port out of range:-1

2013-03-08 Thread Walter Underwood
host zk(5110) and the url of the >>> other server(zk port). When i try to start this it give the error: "port >>> out >>> of range:-1". >>> >> >> The full log line, ideally with several lines above and below for context, >> is going to be crucial for figuring this out. Also, the contents of your >> solr.xml file may be important. >> >> Thanks, >> Shawn >> >> -- Walter Underwood wun...@wunderwood.org

Re: strange edismax parsing when searching in multiple fields (#TB)

2013-03-13 Thread Walter Underwood
>> immediately notify us by email or telephone and delete the >> original email and attachments >> without using, disseminating or reproducing its contents to >> anyone other than the intended >> recipient. Wolters Kluwer shall not be liable for the >> incorrect or incomplete transmission of >> of this email or any attachments, nor for unauthorized use >> by its employees. >> >> Wolters Kluwer nv has its registered address in Alphen aan >> den Rijn, The Netherlands, and is registered >> with the Trade Registry of the Dutch Chamber of Commerce >> under number 33202517. >> -- Walter Underwood wun...@wunderwood.org

Re: [SPAM] Re: strange edismax parsing when searching in multiple fields (#TB)

2013-03-13 Thread Walter Underwood
hint, >>> Tom

Re: Group By and Sum

2013-03-18 Thread Walter Underwood
> &group=true > > &group.field=BusinessDateTime > > &group.facet=true > > &group.field=NetSales > > Now the facet is working properly however it is returning the count of the > documents however i need the sum of the NetSales and the TransCount fields > instead. > > Any help or suggestions would be greatly appreciated. > > Thanks, > Adam -- Walter Underwood wun...@wunderwood.org

Re: Group By and Sum

2013-03-18 Thread Walter Underwood
d love to just keep using the SQL DB that we have been using but > alas I am not allowed to. > > Thanks, > Adam > > -Original Message----- > From: Walter Underwood [mailto:wun...@wunderwood.org] > Sent: Monday, March 18, 2013 11:58 AM > To: solr-user@lucene.apache

solrcore.properties

2012-02-06 Thread Walter Underwood
s not work, what are the best practices for managing dev/test/prod configs for Solr? wunder -- Walter Underwood wun...@wunderwood.org Search Guy, Chegg.com

Re: How to do this in Solr? random result for the first few results

2012-02-09 Thread Walter Underwood
Or you can do a search for two ads with random ordering, then a second search for ads in the desired order with excludes for the two ads returned in the first. You don't have to do everything inside Solr. wunder Search Guy, Chegg On Feb 9, 2012, at 1:04 AM, Tommaso Teofili wrote: > I think y
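The two-search approach above might look like this on the client side. It assumes the schema defines a random_* dynamic field backed by solr.RandomSortField (as the Solr example schema does), a unique key field named id, and ids that need no query escaping; all names and values here are hypothetical.

```python
def exclude_filter(id_field, ids):
    """Build a negative filter query that excludes the given ids."""
    return "-" + id_field + ":(" + " OR ".join(ids) + ")"


def random_ads_params(query, seed, rows=2):
    """First search: a few ads in random order (assumes a random_* sort field)."""
    return {"q": query, "rows": rows, "sort": f"random_{seed} asc"}


def ordered_ads_params(query, already_shown, sort="score desc"):
    """Second search: the desired order, minus the ads returned by the first search."""
    return {"q": query, "sort": sort, "fq": exclude_filter("id", already_shown)}


first = random_ads_params("type:ad", seed=1234)
shown = ["ad7", "ad9"]  # stand-in for the ids returned by the first search
second = ordered_ads_params("type:ad", shown)
print(first)
print(second)
```

The application then concatenates the two result lists, which keeps the randomization logic out of Solr entirely.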

Re: how to monitor solr in newrelic

2012-02-13 Thread Walter Underwood
Why are you asking us? This is a standard feature of Newrelic, ask them. They should have the answer. http://blog.newrelic.com/2010/05/11/got-apache-solr-search-server-use-rpm-to-monitor-troubleshoot-and-tune-solr-operations/ You can use Solr with any servlet container. We use Tomcat in producti

What versions support compressed text fields?

2012-02-13 Thread Walter Underwood
I've looked at the wiki and the changelog, and I'm still confused about what versions support compressed fields. We have an index which is rapidly growing through 100Gb, and I'd like to turn on text field compression without reindexing. Is that possible? We are on 3.3.0. w

Re: What versions support compressed text fields?

2012-02-13 Thread Walter Underwood
> http://sematext.com/spm/solr-performance-monitoring/index.html > > > > - Original Message - >> From: Walter Underwood >> To: solr-user@lucene.apache.org >> Cc: >> Sent: Monday, February 13, 2012 5:51 PM >> Subject: What versions support compresse

Re: Need help with graphing function (MATH)

2012-02-14 Thread Walter Underwood
In practice, I expect a linear piecewise function (with sharp corners) would be indistinguishable from the smoothed function. It is also much easier to read, test, and debug. It might even be faster. Try the sharp corners one first. wunder On Feb 14, 2012, at 10:56 AM, Ted Dunning wrote: > In

Re: Unusually long data import time?

2012-02-22 Thread Walter Underwood
In my first try with the DIH, I had several sub-entities and it was making six queries per document. My 20M doc load was going to take many hours, most of a day. I re-wrote it to eliminate those, and now it makes a single query for the whole load and takes 70 minutes. These are small documents,

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Walter Underwood
t to search on field "title". >> Now my field title holds the value "great smartphone". >> If I search on "smartphone" the item is found. But I want the item also to >> be found on "great" or "phone" it doesnt work. >> I have been playing around with the tokenizer test function, but have failed >> to find the definition for the "text" fieldtype I need. >> Help? :) >> -- Walter Underwood wun...@wunderwood.org

Re: disabling QueryElevationComponent

2012-03-05 Thread Walter Underwood
to disable the query elevation stuff by removing it from your solrconfig.xml. wunder Walter Underwood wun...@wunderwood.org On Mar 5, 2012, at 1:09 PM, Welty, Richard wrote: > i googled and found numerous references to this, but no answers that went to > my specific issues. > > i

Re: disabling QueryElevationComponent

2012-03-05 Thread Walter Underwood
On Mar 5, 2012, at 1:16 PM, Welty, Richard wrote: > Walter Underwood [mailto:wun...@wunderwood.org] writes: > >> You may be able to have unique keys. At Netflix, I found that there were >> collisions between >the movie IDs and the person IDs. So, I put an 'm' at

Re: schema design help

2012-03-07 Thread Walter Underwood
Solr is not relational, so you will probably need to take a fresh look at your data. Here is one method. 1. Sketch your search results page. 2. Each result is a document in Solr. 3. Each displayed item is a stored field in Solr. 4. Each searched item is an indexed field in Solr. It may help to

Re: schema design help

2012-03-07 Thread Walter Underwood
ore..? > > On Thu, Mar 8, 2012 at 12:12 AM, Walter Underwood > wrote: > >> Solr is not relational, so you will probably need to take a fresh look at >> your data. >> >> Here is one method. >> >> 1. Sketch your search results page. >> 2. Each res

Re: 3 Way Solr Join . . ?

2012-03-10 Thread Walter Underwood
> fq={!join from=customer_id to=fk_phone_customer_id}phone_area_code:212& > fq=customer_gender:female > > But that does not work for me. > > Appreciate any thoughts, > > Angelyna -- Walter Underwood wun...@wunderwood.org

Re: index size with replication

2012-03-15 Thread Walter Underwood
No, the deleted files do not get replicated. Instead, the slaves do the same thing as the master, holding on to the deleted files after the new files are copied over. The optimize is obsoleting all of your index files, so maybe you should quit doing that. Without an optimize, the deleted files will

Re: query score across ALL docs

2012-03-28 Thread Walter Underwood
If you want to do *anything* across all matches, you probably should be using a relational database. Search engines, like Solr, are optimized for just the best matches. Fetching all matches is likely to be slow. Relational databases are optimized for working with the whole set of matches. wunde

Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Walter Underwood
If you must have real-time search, you might look at systems that are designed to do that. MarkLogic isn't free, but it is fast and real-time. You can use their no-charge Express license for development and prototyping: http://developer.marklogic.com/express OK, back to Solr. wunder Search Guy

Re: Optimizing in SolrCloud

2012-03-29 Thread Walter Underwood
Don't. "Optimize" is a poorly-chosen name for a full merge. It doesn't make that much difference and there is almost never a need to do it on a periodic basis. The full merge will mean a longer time between the commit and the time that the data is first searchable. Do the commit, then search.

Re: Optimizing in SolrCloud

2012-03-29 Thread Walter Underwood
at > various times? Do the deleted documents get removed when doing a > merge or does that only get done on an optimize? > > On Thu, Mar 29, 2012 at 7:08 PM, Walter Underwood > wrote: >> Don't. "Optimize" is a poorly-chosen name for a full merge. It doesn'

Re: Quantiles in SOLR ???

2012-03-30 Thread Walter Underwood
Quantiles require accessing the entire list of results, or at least, sorting by the interesting values, checking the total hits, then accessing the results list at the desired interval. So, with 3000 hits, get deciles by getting the first row, then the 301st row, the 601st row, etc. This might
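The row arithmetic above can be written down as a small helper. This is only a sketch: it assumes the results are already sorted by the value of interest and that the total hit count is known from a first request; the function name is made up.

```python
def quantile_starts(total_hits, buckets=10):
    """Zero-based start offsets of the rows that mark each quantile boundary."""
    step = total_hits // buckets
    if step == 0:
        return []  # fewer hits than buckets, so no meaningful boundaries
    return [i * step for i in range(buckets)]


starts = quantile_starts(3000)
print(starts)                   # zero-based offsets for the start parameter
print([s + 1 for s in starts])  # the same rows counted from 1
```

Each offset would be used as start with rows=1, so ten small requests replace one request for all 3000 results.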

Re: solr join

2012-04-03 Thread Walter Underwood
Try adding a multivalued relatedVideo field to each document, then you won't need the join. Almost always, you want to do the joins before you load documents into Solr, and use a denormalized schema in Solr. That will be faster and simpler at query time. wunder Search Guy, Chegg On Apr 3, 201

Re: Incremantally updating a VERY LARGE field - Is this possibe ?

2012-04-04 Thread Walter Underwood
I believe we are talking about two different things. The original question was about incrementally building up a field during indexing, right? After a document is committed, a field cannot be separately updated, that is true in both Lucene and Solr. wunder On Apr 4, 2012, at 12:20 PM, Yonik S

Re: counter field

2012-04-05 Thread Walter Underwood
Why? When you reindex, is it OK if they all change? If you reindex one document, is it OK if it gets a new sequential number? wunder On Apr 5, 2012, at 9:23 PM, Manish Bafna wrote: > We already have a unique key (We use md5 value). > We need another id (sequential numbers). > > On Fri, Apr 6,

Re: counter field

2012-04-05 Thread Walter Underwood
do it. > If we pass the number to the field, it will take that value, if we dont > pass it, it will do auto-increment. > Because if we update, i will have old number and i will pass it as a field > again. > > On Fri, Apr 6, 2012 at 9:59 AM, Walter Underwood wrote: > >>

Re: Solr is indexing but not showing results

2012-04-09 Thread Walter Underwood
You will need to define or customize a field type for text. The example schema.xml file that is installed with Solr 3.5 has several kinds of text fields; "text_general" and "text_en" are good places to start. You can use one of those, then customize it. wunder On Apr 9, 2012, at 11:27 AM, s

Re: Solr is indexing but not showing results

2012-04-09 Thread Walter Underwood
> ignoreCase="true" expand="true"/> >ignoreCase="true" >words="stopwords.txt" >enablePositionIncrements="true" >/> > generateNumberParts="1"

Re: Solr is indexing but not showing results

2012-04-09 Thread Walter Underwood
> > > ignoreCase="true" expand="true"/> >words="stopwords.txt" enablePositionIncrements="true" /> > generateNumberParts="1" catenateWords="0" catenateNumbers="0"

Re: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
German noun decompounding is a little more complicated than it might seem. There can be transformations or inflections, like the "s" in "Weihnachtsbaum" (Weihnachten/Baum). Internal nouns should be recapitalized, like "Baum" above. Some compounds probably should not be decompounded, like "Fahrrad

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
u highlight, you need a dictionary-based segmenter. wunder -- Walter Underwood wun...@wunderwood.org

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
valence "Fahrrad = Rad" than decompounding. wunder -- Walter Underwood wun...@wunderwood.org

Re: Solr Scoring

2012-04-12 Thread Walter Underwood
It is easy. Create two fields, text_exact and text_stem. Don't use the stemmer in the first chain, do use the stemmer in the second. Give the text_exact a bigger weight than text_stem. wunder On Apr 12, 2012, at 4:34 PM, Erick Erickson wrote: > No, I don't think there's an OOB way to make this
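A sketch of querying the two fields with different weights, assuming the DisMax parser and its qf parameter. The field names text_exact and text_stem come from the message above; the weights 2.0 and 1.0 and the sample query are placeholders, since the thread does not give numbers.

```python
from urllib.parse import urlencode


def exact_and_stemmed_params(user_query, exact_weight=2.0, stem_weight=1.0):
    """Search both fields, weighting the unstemmed field above the stemmed one."""
    return {
        "q": user_query,
        "defType": "dismax",  # assumption: qf is read by the DisMax family of parsers
        "qf": f"text_exact^{exact_weight} text_stem^{stem_weight}",
    }


params = exact_and_stemmed_params("running shoes")  # placeholder query
print(params["qf"])
print(urlencode(params))
```

With this setup a document containing the exact word form scores in both fields, while a document that matches only after stemming scores in one, so exact matches tend to rank first without excluding the stemmed ones.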

Re: Can you suggest a method or pattern to consistently promote a document with any query?

2012-04-18 Thread Walter Underwood
, I'd like to have that document be the first result in a *:* query. >> >> I'm looking into index time boosting using the boost attribute on the >> appropriate doc. I haven't tested this yet, and I'm not sure this would do >> anything for the *:* queries. >> >> Thanks for any suggested reading or patterns... >> >> Best, >> Chris >> >> >> -- -- Walter Underwood wun...@wunderwood.org

Re: Deciding whether to stem at query time

2012-04-23 Thread Walter Underwood
There is a third approach. Create two fields and always query both of them, with the exact field given a higher weight. This works great and performs well. It is what we did at Netflix and what I'm doing at Chegg. wunder On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote: > So I just realized t

Re: Deciding whether to stem at query time

2012-04-23 Thread Walter Underwood
te: > Yes, and you might choose to use different options for different fields. For > dictionary searches, where users are searching for specific words, and a high > degree of precision is called for, stemming is less helpful, but for full > text searches, more so. > > -Mike

Re: Title Boosting and IDF

2012-04-25 Thread Walter Underwood
avoc. >> >> I'd like to get your thoughts on the following: >> >> - Is it standard practice to avoid boosting the title field much, because of >> the (generally) high IDF of title field terms? >> - Are there other strategies for handling the high IDF

Re: Does Solr fit my needs?

2012-04-27 Thread Walter Underwood
Solr will not keep the structure of your XML data. Solr and Lucene have a flat data model. You can map hierarchy into that, but it can be a lot of work. I recommend starting with a dedicated XML database. MarkLogic is commercial, but they have added a free developer license that can be used for

Re: CJKBigram filter questons: single character queries, bigrams created across sript/character types

2012-04-27 Thread Walter Underwood
Bigrams across character types seems like a useful thing, especially for indexing adjective and verb endings. An n-gram approach is always going to generate a lot of junk along with the gold. Tighten the rules and good stuff is missed, guaranteed. The only way to sort it out is to use a tokeniz

Re: Benchmark Solr vs Elastic Search vs Sensei

2012-04-27 Thread Walter Underwood
a-with-solr-integration-details wunder -- Walter Underwood wun...@wunderwood.org

Re: CJKBigram filter questons: single character queries, bigrams created across sript/character types

2012-04-30 Thread Walter Underwood
You'll see katakana used with kanji in noun compounds where one of the words is foreign. In Japanese, "Rice University" is not written with the kanji word for "rice". They use katakana for "rice" and kanji for "university", like this: ライス大学. This is very common. I expect that "President Obama"

Re: question on tokenization control

2012-05-01 Thread Walter Underwood
nothing. However for some reason if I search >> on 'evalu' it finds all the matches. Is that an indexing setting or query >> setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' >> to >> be a match? >> >> Thanks, >> Ken >> -- Walter Underwood wun...@wunderwood.org

Re: Solritas in production

2012-05-06 Thread Walter Underwood
>>>> have found no clue about it at all). >>> >>> Do you think is it a good idea? >>> >>> Do you know services using Solritas as a frontend on a public site? >>> >>> My personal opinion is that using Solritas in production is a very bad >> idea >>> for us, but have not so much experience with Solr yet, and Solritas >>> documentation is far from a detailed, up-to-date one, so don't really >> know >>> what is it really usable for. >>> >>> Thanks, >>> Andras >> >> -- Walter Underwood wun...@wunderwood.org

Re: question about solr response qtime

2012-05-10 Thread Walter Underwood
Yes, milliseconds. --wunder On May 10, 2012, at 8:57 AM, G.Long wrote: > Hi :) > > In what unit of time is expressed the QTime of a QueryResponse? Is it > milliseconds? > > Gary
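For anyone reading the value from a JSON response, a small Python sketch follows. The sample response is trimmed and its numbers are invented for illustration; `qtime_seconds` is a made-up helper name.

```python
import json

# Trimmed Solr JSON response; the values are invented for illustration.
sample_response = (
    '{"responseHeader": {"status": 0, "QTime": 12},'
    ' "response": {"numFound": 0, "start": 0, "docs": []}}'
)


def qtime_seconds(raw_json):
    """Read QTime (milliseconds) from the response header and convert to seconds."""
    header = json.loads(raw_json)["responseHeader"]
    return header["QTime"] / 1000.0


print(qtime_seconds(sample_response))
```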

Re: Indexing to add to a field, not replace

2012-05-12 Thread Walter Underwood
No. Lucene and Solr commits replace the entire document. --wunder On May 12, 2012, at 10:00 AM, Mark Laurent wrote: > Hello, > > Is it possible to perform an index commit that Solr would add the incoming > value to an existing fields' value? > > I have for example: > > > required="

Re: Must match and terms with only one letter

2012-05-16 Thread Walter Underwood
ext: > http://lucene.472066.n3.nabble.com/Must-match-and-terms-with-only-one-letter-tp3984139.html > Sent from the Solr - User mailing list archive at Nabble.com. -- Walter Underwood wun...@wunderwood.org

Re: why no uppercase filter in solr

2012-05-18 Thread Walter Underwood
In Unicode, uppercasing characters loses information, because there are some upper case characters that represent more than one lower case character. Lower casing text is safe, so always lower-case. wunder On May 18, 2012, at 10:41 AM, srinir wrote: > I am wondering why solr doesnt have an upp
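A quick Python sketch of the information loss, using the German sharp s. This relies on Python's built-in `str.upper()` and `str.lower()`, not on any Solr filter, and the two helper functions are made up for the example.

```python
def collide_when_uppercased(a, b):
    """True if two different strings become identical after uppercasing."""
    return a != b and a.upper() == b.upper()


def collide_when_lowercased(a, b):
    """True if two different strings become identical after lowercasing."""
    return a != b and a.lower() == b.lower()


# "ß" uppercases to "SS", so both spellings end up as the same token.
print("straße".upper(), "strasse".upper())
print(collide_when_uppercased("straße", "strasse"))
print(collide_when_lowercased("straße", "strasse"))
```

After uppercasing, the index can no longer tell the two spellings apart; after lowercasing, they stay distinct.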

Re: Index-time field boost with DIH

2012-05-24 Thread Walter Underwood
Why? Query-time boosting is fast and more flexible. wunder Search Guy, Netflix & Chegg On May 24, 2012, at 6:11 AM, Chamnap Chhorn wrote: > Anyone could help me? I really need index-time field-boosting. > > On Thu, May 24, 2012 at 4:21 PM, Chamnap Chhorn > wrote: > >> Hi all, >> >> I want t

Re: Solr 4.0 Distributed Concurrency Control Mechanism?

2012-05-24 Thread Walter Underwood
Am I correct in thinking that a > multiversion concurrency control (MVCC) locking mechanism now exist for a > single core or is it lock-free and multi-core? > > Many thanks, > Nicholas Ball (aka incunix) -- Walter Underwood wun...@wunderwood.org

Re: Index-time field boost with DIH

2012-05-24 Thread Walter Underwood
tion > asset. Therefore, some document when matched are more important than > others. That's what index time boost does, right? > > On Thu, May 24, 2012 at 10:10 PM, Walter Underwood > wrote: > >> Why? Query-time boosting is fast and more flexible. >> >> wu

Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-26 Thread Walter Underwood
A. Never optimize on the slave. B. You probably do not need to optimize on the master. "Optimize" does not optimize anything. It is a forced merge, combining segments. Solr automatically combines segments as needed. wunder On May 26, 2012, at 1:57 PM, sudarshan wrote: > Hi All, > I happen

Re: Solr boost relevancy

2012-05-26 Thread Walter Underwood
Solr automatically scales the scores of fuzzy matches by their distance from an exact match. So, you don't have to change anything. wunder On May 26, 2012, at 11:52 PM, Gau wrote: > Hi Lori, > > Yeah. I thought exactly of the same solution. Use a copy field and boost > the relevancy of the t

Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-29 Thread Walter Underwood
You do not need to use optimize at all. Solr continually merges segments ("optimizes") as needed. wunder On May 29, 2012, at 6:08 AM, sudarshan wrote: > Hi Walter, > Thank you. Do you mean that optimize need not be used at all? > If Solr merges segments (when needed as you said), is

Re: MongoDB and Solr

2012-05-29 Thread Walter Underwood
Solr does not natively store/index/search arbitrary JSON documents. It accepts JSON in a specific format for document input. wunder On May 29, 2012, at 3:21 PM, rjain15 wrote: > Hi Gora, > > I am working on a Mobile App, which is updating/accessing/searching data and > I have created a simple
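One way to deal with that is to flatten nested application JSON into plain field/value pairs before sending it. The Python sketch below is only an assumption about how an application might do this: the `flatten` helper, the underscore naming, and the sample record are all invented, and the exact envelope the JSON update handler expects depends on the Solr version, so check its documentation.

```python
import json


def flatten(doc, prefix=""):
    """Turn nested dicts into a single level of field names (hypothetical naming scheme)."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and join names with an underscore.
            flat.update(flatten(value, prefix=name + "_"))
        else:
            # Scalars and lists are kept as-is; a list would need a multiValued field.
            flat[name] = value
    return flat


app_record = {"id": "42", "user": {"name": "fred", "city": "Palo Alto"}, "tags": ["a", "b"]}
print(json.dumps(flatten(app_record)))
```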

Re: Tips on creating a custom QueryCache?

2012-05-30 Thread Walter Underwood
On May 30, 2012, at 11:44 AM, Aaron Daubman wrote: > The bigger question is: what are the parallel task > execution paths in Solr and under what conditions are they possible? I'd go with the general servlet rules, where everything is assumed to have concurrent access. wunde

Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-31 Thread Walter Underwood
http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor The defaults are very good. I have never changed them, and I've had Solr in production at two major sites, Netflix and Chegg. Don't spend any more time worrying about merges. wunder On May 31, 2012, at 10:51 AM, sudarshan wrote: >

Re: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Walter Underwood
This is a bad idea. Solr is not designed to be exposed to arbitrary internet traffic and attacks. The best design is to have a front end server make requests to Solr, then use those to make HTML pages. wunder On Jun 7, 2012, at 4:49 AM, Spadez wrote: > Final comment from me then Ill let someon
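A minimal Python sketch of the front-end idea: the browser only ever supplies a query string and a page number, and the server builds the Solr request itself. The URL, the row limit, and the function name are assumptions for illustration, and a real front end would also need to sanitize the query text.

```python
from urllib.parse import urlencode

SOLR_SELECT_URL = "http://localhost:8983/solr/select"  # assumed internal-only address
MAX_ROWS = 50  # assumed upper bound on page size


def build_solr_url(user_query, page=0, rows=10):
    """Build the Solr request server-side so clients cannot pass arbitrary parameters."""
    rows = max(1, min(int(rows), MAX_ROWS))
    page = max(0, int(page))
    params = {"q": user_query, "start": page * rows, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_solr_url("solr & lucene", page=2, rows=500))
```

Because only `q`, `start`, `rows`, and `wt` are ever sent, a client cannot reach update handlers or add other request parameters through this path.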

Re: timeAllowed flag in the response

2012-06-07 Thread Walter Underwood
Are you requesting a large number of rows? If so, request smaller chunks, like ten at a time. Then you can show those with a "waiting" note. wunder On Jun 7, 2012, at 1:14 PM, Laurent Vaills wrote: > Hi everyone, > > We have some grouping queries that are quite long to execute. Some are too >
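Fetching in small chunks just means stepping the `start` and `rows` parameters. A short Python sketch, with a made-up helper name and chunk size:

```python
def page_params(total_wanted, chunk_size=10):
    """Yield start/rows pairs that cover total_wanted results in small chunks."""
    for start in range(0, total_wanted, chunk_size):
        yield {"start": start, "rows": min(chunk_size, total_wanted - start)}


for params in page_params(25):
    print(params)
```

The first chunk can be shown as soon as it arrives while the later requests are still running.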

Re: about groups of random results + alphabetical result

2010-12-20 Thread Walter Underwood
You probably do not want this ranking, because any query with a common word, like "the", will match most of the corpus in step two. Instead, use Solr to weight better quality matches more heavily, maybe 4X for exact matches, 2X for stemmed matches, and 1X for phonetic matches. wunder On Dec 20
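As a sketch of that weighting, the Python below builds dismax parameters with per-field boosts in `qf`. The three field names are hypothetical; they assume the same text has been indexed three ways (exact, stemmed, phonetic).

```python
from urllib.parse import urlencode

# Hypothetical fields holding the same text analyzed three different ways.
FIELD_WEIGHTS = {"text_exact": 4, "text_stemmed": 2, "text_phonetic": 1}


def weighted_query_params(user_query):
    """Build dismax parameters that weight exact matches above stemmed and phonetic ones."""
    qf = " ".join(f"{field}^{weight}" for field, weight in FIELD_WEIGHTS.items())
    return {"defType": "dismax", "q": user_query, "qf": qf}


params = weighted_query_params("the matrix")
print(params["qf"])
print(urlencode(params))
```

All matches stay in one ranked list, with the better-quality matches scoring higher instead of being forced into separate groups.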

Re: about groups of random results + alphabetical result

2010-12-20 Thread Walter Underwood
> www.dataprisma.com.br > - Original Message - > From: "Walter Underwood" > To: > Sent: Monday, December 20, 2010 2:02 PM > Subject: Re: about groups of random results + alphabetical result > > > You probably do not want this ranking, because any quer

Re: start value in queries zero or one based?

2011-01-13 Thread Walter Underwood
On Jan 13, 2011, at 1:28 PM, Dennis Gearon wrote: > Do I even need a body for this message? ;-) > > Dennis Gearon Are you asking "is it" or "should it be"? If the latter, we can also discuss Emacs and vi. wunder -- Walter Underwood K6WRU
