Re: Lexical analysis tools for German language data

2012-04-13 Thread Michael Ludwig
> From: Tomas Zerolo > > > There can be transformations or inflections, like the "s" in > > > "Weihnachtsbaum" (Weihnachten/Baum). > > > > I remember from my linguistics studies that the terminus technicus > > for these is "Fugenmorphem" (interstitial or joint morpheme) [...] > > IANAL (I am not a l

Re: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
> From: Walter Underwood > German noun decompounding is a little more complicated than it might > seem. > > There can be transformations or inflections, like the "s" in > "Weihnachtsbaum" (Weihnachten/Baum). I remember from my linguistics studies that the terminus technicus for these is "Fugenmorph

Re: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
> From: Markus Jelsma > We've done a lot of tests with the HyphenationCompoundWordTokenFilter > using a FOP XML file generated from TeX for the Dutch language and > have seen decent results. A bonus was that now some tokens can be > stemmed properly because not all compounds are listed in the > dic

Re: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
would slow down the update process but you don't need to split > words during search. > > On 12 Apr 2012 at 11:52, Michael Ludwig wrote: > > > >> Given an input of "Windjacke" (probably "wind jacket" in English), > >> I'd like the co

Re: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
> Given an input of "Windjacke" (probably "wind jacket" in English), > I'd like the code that prepares the data for the index (tokenizer > etc) to understand that this is a "Jacke" ("jacket") so that a > query for "Jacke" would include the "Windjacke" document in its > result set. > > It appears t

Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
Given an input of "Windjacke" (probably "wind jacket" in English), I'd like the code that prepares the data for the index (tokenizer etc) to understand that this is a "Jacke" ("jacket") so that a query for "Jacke" would include the "Windjacke" document in its result set. It appears to me that such
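One way to picture the kind of index-time processing asked about here is a dictionary-based split. The following is only a minimal Java sketch, not a Solr or Lucene API: the class name NaiveDecompounder, the tiny word list, and the "Arbeitszeit" example are assumptions made for illustration. It tries every split point, accepts a split when both parts are dictionary words, and tolerates a linking "s" at the end of the first part.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Naive two-part decompounder; hypothetical illustration, not a Lucene class. */
final class NaiveDecompounder {

    /**
     * Returns the two dictionary words a compound is made of, or the
     * lower-cased input as a single element when no split is found.
     */
    static List<String> split(String compound, Set<String> dictionary) {
        String word = compound.toLowerCase(Locale.GERMAN);
        for (int i = 1; i < word.length(); i++) {
            String head = word.substring(0, i);
            String tail = word.substring(i);
            if (!dictionary.contains(tail)) {
                continue; // the right-hand part must be a known word
            }
            if (dictionary.contains(head)) {
                return List.of(head, tail);
            }
            // Allow a linking "s" between the parts (assumed rule, far from complete).
            if (head.endsWith("s")) {
                String stem = head.substring(0, head.length() - 1);
                if (dictionary.contains(stem)) {
                    return List.of(stem, tail);
                }
            }
        }
        return List.of(word);
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("wind", "jacke", "arbeit", "zeit");
        System.out.println(split("Windjacke", dictionary));   // [wind, jacke]
        System.out.println(split("Arbeitszeit", dictionary)); // [arbeit, zeit]
    }
}
```

At index time the parts could be emitted as extra tokens next to the original word, so that a query for "Jacke" matches without any splitting at search time. A real setup would more likely use Lucene's compound word token filters with a proper dictionary than hand-written code like this.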

Re: Popular keywords statistics .

2009-07-06 Thread Michael Ludwig
Wallace wrote: I'd like to hear what approaches are being used by users to know what people are searching for in their apps. You could process the access log. You could write a filter servlet logging the relevant part of the query string to a dedicated location. Michael Ludwig
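Both suggestions come down to pulling the user's search terms out of a request's query string. A minimal sketch in plain Java, leaving the servlet API aside; the class and method names are hypothetical, and it assumes the terms arrive in a parameter named q, as in Solr's select URLs:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper: extracts the decoded "q" parameter from a raw query string. */
final class QueryParamExtractor {

    static Optional<String> extractQ(String queryString) {
        if (queryString == null) {
            return Optional.empty();
        }
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("q")) {
                // URL-decoding turns "+" into a space and resolves %XX escapes.
                return Optional.of(URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractQ("q=dream+theater&rows=10")); // Optional[dream theater]
        System.out.println(extractQ("rows=10"));                 // Optional.empty
    }
}
```

Inside a servlet filter the input would be the request's query string, and the result could be appended to a dedicated log file for later counting.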

Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread Michael Ludwig
the inclusion of a stopword list result in stopwords being of top importance in the MoreLikeThis query? Michael Ludwig

Re: Monitor search traffic

2009-07-01 Thread Michael Ludwig
Gurjot Singh wrote: Hi, Is there a way to monitor the number of search queries made on the solr index. http://localhost:8983/solr/admin/stats.jsp Look for "requests :". Michael Ludwig

Re: Installing a patch in a solr nightly on Windows

2009-07-01 Thread Michael Ludwig
Koji Sekiguchi wrote: I'm not a Windows user, but I think you can use Linux command (e.g. patch, to apply SOLR-284 patch to Solr nightly build) on cygwin environment. The standalone patch utility for Win32 is another option. http://gnuwin32.sourceforge.net/packages/patch.htm Michael Ludwig

Re: Search for phrase including prepositions

2009-07-01 Thread Michael Ludwig
e ok but more than 3 words resulted zero. Why is happens? Hi Akinori, I guess you're using the DisMax query parser. Please read this entire page: http://wiki.apache.org/solr/DisMaxRequestHandler The parameter that allows you to tweak this is the "mm" parameter. Michael Ludwig

Re: Search for phrase including prepositions

2009-06-30 Thread Michael Ludwig
ou'll probably find that the word "for" is removed as a so-called stopword. Michael Ludwig

Re: SOLR SpeelChecker and german Umlauts

2009-06-30 Thread Michael Ludwig
rFieldType - Michael Ludwig http://markmail.org/thread/dgi4llhc7x5wuroc (BTW, the patch in SOLR-1204 is ready but still awaiting clarification. See comments from June 11 and 18.) My Config is : spellcheck = 'true'; spellcheck.dictionary = 'jarowinkler' spellcheck.onlyMorePop

Re: spelling suggestion in solr.

2009-06-30 Thread Michael Ludwig
Radha C. wrote: The feature "spelling suggestion" is available in solr? If yes, can you tell me some documentations? Have you tried googling for: solr spelling ? First hit: http://wiki.apache.org/solr/SpellCheckComponent Michael Ludwig

Re: nested dismax queries

2009-06-29 Thread Michael Ludwig
nd to think that drop-down boxes (the values of which you control) are a nice match for the filter query, whereas user-entered text is more likely to be a candidate for the main query. Michael Ludwig

Re: nested dismax queries

2009-06-29 Thread Michael Ludwig
fq=y:blub" instead of "fq=x:bla AND y:blub". See: filterCache/@size, queryResultCache/@size, documentCache/@size http://markmail.org/thread/tb6aanicpt43okcm Michael Ludwig
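The difference between the two forms is easier to see when the request parameters are built explicitly. The sketch below assumes that each fq parameter is cached as its own filterCache entry, so two simple filters can be reused independently, while a combined "x:bla AND y:blub" filter is cached only as a whole. The helper name is made up; x:bla and y:blub are the placeholder filters from this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: renders each filter as its own URL-encoded fq parameter. */
final class FilterParams {

    static String toQueryString(List<String> filters) {
        return filters.stream()
                .map(f -> "fq=" + URLEncoder.encode(f, StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Two independent filters, each a separate cache candidate.
        System.out.println(toQueryString(List.of("x:bla", "y:blub")));
        // fq=x%3Abla&fq=y%3Ablub

        // One composite filter, cached only as the combination.
        System.out.println(toQueryString(List.of("x:bla AND y:blub")));
        // fq=x%3Abla+AND+y%3Ablub
    }
}
```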

Re: Searching across multivalued fields

2009-06-19 Thread Michael Ludwig
MilkDud wrote: Michael Ludwig-4 wrote: What do you expect the user to enter? * "dream theater innocence faded" - certainly wrong * dream theater "innocence faded" - much better Most likely they would just enter dream theater innocence faded, no quotes. Without any quot

Re: Searching across multivalued fields

2009-06-18 Thread Michael Ludwig
st exact match which is nothing but unique key = 1001? Yes, it is: q=id:1001 (1) Don't use DisMax here, that will not interpret field names. (2) Replace "id" by whatever name you gave to your unique key field. Michael Ludwig

Re: Distributed querying using solr multicore.

2009-06-18 Thread Michael Ludwig
Rakhi Khatwani wrote: On Thu, Jun 18, 2009 at 3:51 PM, Michael Ludwig wrote: I don't know how we're supposed to use it. I did the following: http://flunder:8983/solr/xpg/select?q=bla&shards=flunder:8983/solr/xpg,flunder:8983/solr/kk i am gettin a page load error... "

Re: Distributed querying using solr multicore.

2009-06-18 Thread Michael Ludwig
r:8983/solr/kk For SolrJ, see this thread: Using SolrJ with multicore/shards - ahammad http://markmail.org/thread/qnytfrk4dytmgjis if so, isnt there a better way to do that? No idea. Michael Ludwig

Re: FilterCache issue

2009-06-18 Thread Michael Ludwig
cumulative_evictions : 61153787 As we can see the cache hit ratio is almost zero. How do I improve the filter cache. Maybe these pages add some ideas to the mix: http://wiki.apache.org/solr/FilterQueryGuidance https://issues.apache.org/jira/browse/SOLR-475 Michael Ludwig

Re: Few Queries regarding indexes in Solr

2009-06-18 Thread Michael Ludwig
stops! Imagine it did one day! Michael Ludwig

Re: Searching across multivalued fields

2009-06-18 Thread Michael Ludwig
ifier title - album title interpret - the musician, possibly multi-valued track - every song or whatever, definitely multi-valued Read up about multi-valued fields (sample schema.xml, for example, or Google) if you're unsure what this is; your posting subject, however, suggests you aren't. Regards, Michael Ludwig

Re: Searching across multivalued fields

2009-06-17 Thread Michael Ludwig
- every song or whatever, definitely multi-valued Michael Ludwig

Re: Searching across multivalued fields

2009-06-17 Thread Michael Ludwig
e" titles :-) Now with a phrase query with a small ps and a large posIncGap that could work. But then I lose the ability to search for artist and track name together. Another thing, are you sure you have enabled "pf" for "track"? Michael Ludwig

Re: fq vs. q

2009-06-17 Thread Michael Ludwig
ather than the terms within a single field. I added the comment in that I think that a wiki page discussing fq vs q should also mention facet.query. It now does: http://wiki.apache.org/solr/FilterQueryGuidance Michael Ludwig

Re: Solr Query | Field:value with dismaxquery

2009-06-17 Thread Michael Ludwig
"&qt=dismaxrequest - return correct results I'd attribute that to the "mm" (minimum match) parameter, the meaning of which you can understand reading the following page, which it would probably make a lot of sense to read anyway: http://wiki.apache.org/solr/DisMaxRequestHandler Michael Ludwig

Re: Could solr build two different indexes?

2009-06-17 Thread Michael Ludwig
://wiki.apache.org/solr/CoreAdmin Michael Ludwig

Re: what date format to pass for search in Solr?

2009-06-17 Thread Michael Ludwig
something like "solr date range query". For example, see: http://www.nabble.com/Date-Range-Query-%2B-Fields-to16108517.html Michael Ludwig

Re: Few Queries regarding indexes in Solr

2009-06-17 Thread Michael Ludwig
's what most people do, though nothing prevents the indexing client from sending the same doc to multiple shards. In some scenarios that's exactly what you want to do. What kind of scenario would that be? Michael Ludwig -- A: Because it messes up the order in which people normally read te

Re: Searching across multivalued fields

2009-06-17 Thread Michael Ludwig
Use the DisMaxRequestHandler and specify all fields you want to use in your query in the qf parameter. artist^3 album^2 track^1 http://wiki.apache.org/solr/DisMaxRequestHandler Michael Ludwig
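As a rough sketch of what such a request could look like, the following Java builds the query string with the boosts from above. The class name is hypothetical, and selecting DisMax via defType=dismax is an assumption; depending on the configuration, a qt parameter naming a configured DisMax handler may be needed instead.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a DisMax query string with weighted query fields. */
final class DisMaxQuery {

    static String build(String userQuery, String queryFields) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", userQuery);
        params.put("defType", "dismax"); // assumption, see lead-in
        params.put("qf", queryFields);
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(build("dream theater", "artist^3 album^2 track^1"));
        // q=dream+theater&defType=dismax&qf=artist%5E3+album%5E2+track%5E1
    }
}
```

With these weights a match in artist counts three times as much as a match in track, which is the point of listing all fields in qf rather than querying one default field.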

Re: Few Queries regarding indexes in Solr

2009-06-16 Thread Michael Ludwig
2 and update the indexes. is it possible to send the differences only into shard 3 and then merge it at shard 3? My (very limited) understanding of shards is that you repartition your documents among shards and send each document to only one shard. (Not sure this is correct.) Michael Ludwig

Re: Joins or subselects in solr

2009-06-15 Thread Michael Ludwig
, but some less regular graph, then the notion of a "main item" needs clarification. Michael Ludwig

Re: fq vs. q

2009-06-15 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: On Mon, Jun 15, 2009 at 4:39 PM, Michael Ludwig wrote: I think if you truncate dates to incomplete dates, you effectively also lose all the date logic. You may still apply it, but what would you take the result to mean? You can't regain precision you'

Re: fq vs. q

2009-06-15 Thread Michael Ludwig
o range faceting for a given field and obtain, say, results reduced from their actual continuum of values to three ranges {A,B,C}, you'd have to define three "facet.query" parameters accordingly. A mere "facet.field", on the other hand, creates as many filters as there are unique values in the field. Is that correct? Michael Ludwig
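To make the three-ranges case concrete, here is a small Java sketch that generates one facet.query parameter per range. The class name, the field name "price" and the boundaries are invented for illustration; the values are shown unencoded and would still need URL-encoding before being sent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one facet.query parameter per adjacent pair of boundaries. */
final class RangeFacets {

    static List<String> facetQueries(String field, int... bounds) {
        List<String> params = new ArrayList<>();
        for (int i = 0; i + 1 < bounds.length; i++) {
            // Square brackets denote an inclusive range in the Lucene query syntax.
            params.add("facet.query=" + field + ":[" + bounds[i] + " TO " + bounds[i + 1] + "]");
        }
        return params;
    }

    public static void main(String[] args) {
        facetQueries("price", 0, 10, 20, 30).forEach(System.out::println);
        // facet.query=price:[0 TO 10]
        // facet.query=price:[10 TO 20]
        // facet.query=price:[20 TO 30]
    }
}
```

Because both ends are inclusive, a document sitting exactly on a boundary is counted in two ranges; shifting the lower bounds by one is a simple way around that for integer fields.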

Re: fq vs. q

2009-06-15 Thread Michael Ludwig
. Bottom line, I think it may make perfect sense to store dates and times in integers, depending on your use case and your client. Michael Ludwig

Re: fq vs. q

2009-06-12 Thread Michael Ludwig
Michael Ludwig wrote: Martin Davidsson wrote: I've tried to read up on how to decide, when writing a query, what criteria goes in the q parameter and what goes in the fq parameter, to achieve optimal performance. Is there [...] some kind of rule of thumb to help me decide how to

Re: Faceting on text fields

2009-06-11 Thread Michael Ludwig
ch is what gets used to analyze the data in order to determine clusters, if I understand correctly. Michael Ludwig

Re: Build Failed

2009-06-11 Thread Michael Ludwig
. BUILD SUCCESSFUL You might want to read up on Ant usage in the Ant User Manual, a copy of which should be part of your installation, or can be found on the web. Quick overview: ant -help When I wrote "ant -verbose", I meant "ant -verbose ", so: ant -verbose example Michael Ludwig

Re: dismax parsing applied to specific fields

2009-06-11 Thread Michael Ludwig
"the DisMaxRequestHandler is simply the standard request handler with the default query parser set to the DisMax Query Parser". So maybe you could program your own CustomDisMaxRequestHandler that reuses the DisMax query parser (and probably other components) to achieve what you want. Michael Ludwig

Re: Build Failed

2009-06-11 Thread Michael Ludwig
rough the files in question, but I can't seem to find the issue. Any suggestions? Run: ant -verbose Michael Ludwig

Re: Customizing results

2009-06-11 Thread Michael Ludwig
might be overkill for your particular situation. Michael Ludwig

Re: copyfield and 'store' and highlighting

2009-06-10 Thread Michael Ludwig
ashokc wrote: Do I have to declare 'field1' also to be stored? 'field1' is never returned in the response. I find the following Wiki page helpful when dealing with @stored, @indexed and friends: http://wiki.apache.org/solr/FieldOptionsByUseCase Michael Ludwig

Re: Solr relevancy score - conversion

2009-06-10 Thread Michael Ludwig
ing to any other language. Michael Ludwig

Re: How to disable posting updates from a remote server

2009-06-10 Thread Michael Ludwig
address. Michael Ludwig

Re: Customizing results

2009-06-10 Thread Michael Ludwig
such as GNU Gettext for this purpose. May or may not make sense in your particular situation. Michael Ludwig

Re: Faceting on text fields

2009-06-10 Thread Michael Ludwig
to be determined. Is that a correct assessment? Michael Ludwig

Re: Faceting on text fields

2009-06-10 Thread Michael Ludwig
Yonik Seeley wrote: Yep, all that sounds right. An additional optimization counts terms for the documents *not* in the set when the base set is over half the size of the index. Cool :-) Thanks for confirming my assumptions! Michael Ludwig

Re: fq vs. q

2009-06-10 Thread Michael Ludwig
Fergus McMenemie wrote: On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig wrote: A filter query is cached, which means that it is the more useful the more often it is repeated. We know how often certain queries arise, or at least have the means to collect that data - so we know what might be

Re: fq vs. q

2009-06-09 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: No, both filters and queries are computed on the entire index. My comment was related to the "A filter query should probably be orthogonal to the primary query..." part. I meant that both kinds of use-cases are common. Got it. Thanks :-) Michael Ludwig

Re: filterCache/@size, queryResultCache/@size, documentCache/@size

2009-06-09 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: On Tue, Jun 9, 2009 at 7:47 PM, Michael Ludwig wrote: Given the following three filtering scenarios of (a) x:bla, (b) y:blub, and (c) x:bla AND y:blub, will I end up with two or three distinct filters? In other words, may filters be composites or are they

Re: fq vs. q

2009-06-09 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig wrote: A filter query should probably be orthogonal to the primary query, which means in plain English: unrelated to the primary query. To give an example, I have a field "category", which is a required

Re: statistics about word distances in solr

2009-06-09 Thread Michael Ludwig
ions is likely to scale as the product of the number of your primary search results, the number of your search terms, and the number of your facets. I assume this is an expensive operation. Michael Ludwig

Re: Faceting on text fields

2009-06-09 Thread Michael Ludwig
), and (b) collecting all the pesky little terms from the new structure mapping documents to term numbers? So basically, depending on expediency, you (a) know the facets and count the documents which display them, or you (b) take the documents and see what facets they have? Michael Ludwig

Re: Faceting on text fields

2009-06-09 Thread Michael Ludwig
side process based on top N (say 100) hits for this but it is my last option. Also a very interesting data mining question! I'm sorry I don't have any answers for you. Maybe someone else does. Best, Michael Ludwig

Re: Field Compression

2009-06-09 Thread Michael Ludwig
f you don't save, say, five or ten percent (YMMV), it might not be worth the effort. Michael Ludwig

Re: filter on millions of IDs from external query

2009-06-09 Thread Michael Ludwig
is so terribly expensive. Michael Ludwig

filterCache/@size, queryResultCache/@size, documentCache/@size

2009-06-09 Thread Michael Ludwig
@size) is concerned? Michael Ludwig

Re: fq vs. q

2009-06-09 Thread Michael Ludwig
category values. I then allow the application to apply filtering by category, incidentally, using faceting, which is a typical usage pattern, I guess. Michael Ludwig

Re: spell checking

2009-06-05 Thread Michael Ludwig
ess invasive. I added two sentences to the "Introduction" of: http://wiki.apache.org/solr/SpellCheckComponent Michael Ludwig

Re: SpellCheckComponent: queryAnalyzerFieldType

2009-06-05 Thread Michael Ludwig
and if possible, give a patch? Please see: https://issues.apache.org/jira/browse/SOLR-1204 Regards, Michael Ludwig

SpellCheckComponent: queryAnalyzerFieldType

2009-06-04 Thread Michael Ludwig
laid out in the thread referred to above, it seems you want to use the spellcheck.q parameter for anything but what can be encoded in ASCII. Is that true? Michael Ludwig

Re: spell checking

2009-06-04 Thread Michael Ludwig
ite different from a spellchecker. IMHO, a name conveying the actual meaning, along the lines of "suggest", would make more sense. Michael Ludwig

Re: French and SpellingQueryConverter

2009-05-19 Thread Michael Ludwig
L}\d_]+:-) Michael Ludwig

Re: French and SpellingQueryConverter

2009-05-19 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: On Mon, May 11, 2009 at 2:46 PM, Michael Ludwig wrote: Could you give an example of how the spellcheck.q parameter can be brought into play to (take non-ASCII characters into account, so that "Käse" isn't mishandled) given the following example:

Re: Replication master+slave

2009-05-15 Thread Michael Ludwig
do entities. C:\MILU\dev\XML # type egpe-net.xml http://lobster.as-guides.com/ds/solr.schema.ent"; > ]> &egpe_from_the_net; &egpe_from_the_local_disk; C:\MILU\dev\XML # type egpe-local.ent Michael Ludwig

Re: Selective Searches Based on User Identity

2009-05-13 Thread Michael Ludwig
I'll plead ignorance of the 'ineluctable filter query' and will have to read up on that one. I meant a filter query that the application tags onto the query on behalf of the user and without the user being able to do anything about it so he cannot circumvent the filter. Best regards, Michael Ludwig

Re: Selective Searches Based on User Identity

2009-05-13 Thread Michael Ludwig
nsidering that I'm a Solr/Lucene newbie, this approach might have a disadvantage that escapes me, which is why other people haven't made this particular suggestion. If so, I'd be happy to learn why this isn't preferable. Michael Ludwig

Re: French and SpellingQueryConverter

2009-05-11 Thread Michael Ludwig
äse")); } } Note the result of the above, which is plain wrong, reads: [(k,0,1,type=), (se,2,4,type=)] Thanks. Michael Ludwig
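The wrong result above is what one would expect if the tokens are found with a pattern built on \w, since in java.util.regex \w matches only ASCII letters, digits and the underscore unless Unicode matching is switched on. Whether the converter actually uses such a pattern is an assumption here; the sketch below, with a made-up class name, merely reproduces the same offsets and contrasts them with a Unicode-aware character class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo: lists regex matches together with their start and end offsets. */
final class WordPatternDemo {

    static List<String> tokens(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "k\u00e4se"; // "käse", written with an escape to stay encoding-safe

        // ASCII-only \w stops at the umlaut and splits the word in two.
        System.out.println(tokens("\\w+", input)); // [k@0-1, se@2-4]

        // \p{L} covers letters of any script, so the word stays in one piece.
        System.out.println(tokens("[\\p{L}\\d_]+", input)); // one token spanning 0-4
    }
}
```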

Re: What are the Unicode encodings supported by Solr?

2009-05-08 Thread Michael Ludwig
with some encoding not getting supported by Solr. Did you make sure to not rely on your platform default encoding (Charset) when constructing the InputStreamReader? If in doubt, take a look at the InputStreamReader constructors. Michael Ludwig
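For illustration, a minimal Java sketch of reading bytes with an explicitly named charset instead of the platform default. The class and method names are made up, and UTF-8 is an assumption about what the input actually contains.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decodes a stream as UTF-8 regardless of the platform default. */
final class ExplicitCharsetRead {

    static String readUtf8(InputStream in) {
        StringBuilder text = new StringBuilder();
        // The two-argument constructor pins the charset; the one-argument
        // constructor would silently use the platform default instead.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // UTF-8 bytes for "Käse": 'K', 0xC3 0xA4 (the umlaut), 's', 'e'
        byte[] utf8 = {0x4B, (byte) 0xC3, (byte) 0xA4, 0x73, 0x65};
        String text = readUtf8(new ByteArrayInputStream(utf8));
        System.out.println(text.length()); // 4
    }
}
```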

Organizing multiple searchers around overlapping subsets of data

2009-05-08 Thread Michael Ludwig
asing overlaps and hence redundancy? Michael Ludwig

Re: unable to run the solr in tomcat 5.0

2009-05-06 Thread Michael Ludwig
u can dream up. Seriously, read the docs, it'll help you :-) Michael Ludwig

Re: unable to run the solr in tomcat 5.0

2009-05-06 Thread Michael Ludwig
uday kumar maddigatla wrote: My intention is to use 8080 as port. Is there any other way that Solr will post the files in 8080 port Solr doesn't post, it listens. Use the curl utility as indicated in the documentation. http://wiki.apache.org/solr/UpdateXmlMessages Michael Ludwig

Re: unable to run the solr in tomcat 5.0

2009-05-06 Thread Michael Ludwig
n the address bar of your browser. Or even do a string replacement s/8983/8080/g on the Solr doc you're viewing. Michael Ludwig

Re: How to index the documents in Apache Solr

2009-05-06 Thread Michael Ludwig
hieve. I think you should start there. http://lucene.apache.org/solr/tutorial.html#Indexing+Data Michael Ludwig

Re: unable to run the solr in tomcat 5.0

2009-05-06 Thread Michael Ludwig
structions in the tutorial and run Solr in Jetty as per the distribution, which works out of the box: http://lucene.apache.org/solr/tutorial.html Michael Ludwig

Re: schema.xml: default values for @indexed and @stored

2009-05-06 Thread Michael Ludwig
Otis Gospodnetic wrote: Attribute values for fields should be inherited from attribute values of their field types. Thanks, that answers my question pertaining to @indexed and @stored in the "fieldtype" and "field" elements in "schema.xml". Michael Ludwig

Re: Multi-index Design

2009-05-06 Thread Michael Ludwig
Matt Weber wrote: http://wiki.apache.org/solr/MultipleIndexes Thanks, Mark. Your explanation and the pointer to the Wiki have clarified things for me. Michael Ludwig

Re: Multi-index Design

2009-05-05 Thread Michael Ludwig
to that type of data that I could limit my search to, as per Otis' post? (4) And is that what's called a "core" here? (5) Or, failing (3), and lumping everything together in one search domain (core?), would I use that "type field" to limit my search to a particular type of data? Michael Ludwig

Re: Externalize database parameters from data-config.xml

2009-05-05 Thread Michael Ludwig
&cred; m...@lobster:~/funkuhr > cat zwei.xml ]> &cred; m...@lobster:~/funkuhr > cat cred.ent ich geheim m...@lobster:~/funkuhr > xmllint --noent eins.xml ]> ich geheim m...@lobster:~/funkuhr > xmllint --noent zwei.xml ]> ich geheim But that doesn'

schema.xml: default values for @indexed and @stored

2009-05-04 Thread Michael Ludwig
g to field/@type? Or do these default to "true" regardless of what's specified in the respective ? Michael Ludwig

Re: Problem adding unicoded docs to Solr through SolrJ

2009-04-30 Thread Michael Ludwig
ke a look at the class java.nio.charset.Charset and the methods encode, decode, newEncoder, newDecoder. Michael Ludwig
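A short sketch of the encode and decode methods mentioned above, showing what happens when text is encoded with one charset and decoded with another. The class name is hypothetical, and ISO-8859-1 merely stands in for whatever single-byte platform default might be in play.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: encodes text with one charset and decodes it with another. */
final class CharsetRoundTrip {

    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        ByteBuffer bytes = encodeWith.encode(text);  // characters -> bytes
        return decodeWith.decode(bytes).toString();  // bytes -> characters
    }

    public static void main(String[] args) {
        String cheese = "K\u00e4se"; // "Käse"

        // Matching charsets give the original text back.
        String good = roundTrip(cheese, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(good.equals(cheese)); // true

        // Mismatched charsets turn the two UTF-8 bytes of the umlaut into two characters.
        String bad = roundTrip(cheese, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(bad.length()); // 5
    }
}
```

The fifth character in the mismatched case is the classic sign of UTF-8 data being read as a single-byte encoding somewhere along the way.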

Highlighting using XML instead of strings?

2009-04-29 Thread Michael Ludwig
lt of favouring XML over strings, I rather want something like this: Eumel NDR Ländermagazine There could be a parameter "hl.xml" which I could use to request modified XML like this: hl.xml=em hl.xml=b This would allow smoother processing with technologies like XSLT. Is such a feature available? Michael Ludwig

Re: Problem adding unicoded docs to Solr through SolrJ

2009-04-29 Thread Michael Ludwig
System.out.println(Charset.defaultCharset().displayName()); System.out.println(new String(bytes)); System.out.println(new String(bytes, Charset.forName("UTF-8"))); } } Output: windows-1252 KÃ¤se (bad) Käse (good) Michael Ludwig

Re: Performance and number of search results

2009-04-29 Thread Michael Ludwig
some profiling for your specific scenario. The rule of thumb here is probably: Get what you need. Michael Ludwig

Re: UTF8 compatibility

2009-04-29 Thread Michael Ludwig
plus 1 𐀀 Maybe the test script output says that such characters cannot be used for querying. Hardly relevant if you consider that the BMP comprises even languages such as Telugu, Bopomofo and French. Best, Michael Ludwig
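The character shown above, U+10000, is the first code point beyond the Basic Multilingual Plane. The following sketch only shows how Java itself represents such a character; it says nothing about how Solr treats it in queries. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: how a code point outside the BMP looks in Java and in UTF-8. */
final class SupplementaryChars {

    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x10000);

        System.out.println(s.length());                      // 2 (a surrogate pair of chars)
        System.out.println(s.codePointCount(0, s.length())); // 1 (one actual character)
        System.out.println(utf8Length(s));                   // 4 (bytes in UTF-8)
        System.out.println(Character.isSupplementaryCodePoint(0x10000)); // true
    }
}
```

Code that counts or slices by char rather than by code point can cut such a pair in half, which is one plausible way for characters outside the BMP to fail where BMP characters work.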