Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Paul Libbrecht
This looks like the stored content is shortened. Can it be? Can you see that inside the docs? paul > Evert R. > 14 February 2016 at 11:26 > Hi There, > > I have a situation where started a techproducts, without any modification, > post a pdf file. When searching

Re: URI is too long

2016-01-31 Thread Paul Libbrecht
How about using POST? paul > Salman Ansari > 31 January 2016 at 15:20 > Hi, > > I am building a long query containing multiple ORs between query terms. I > started to receive the following exception: > > The remote server returned an error: (414) Request-URI Too
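A minimal sketch of the POST suggestion, assuming a Solr select handler at a hypothetical local URL and a hypothetical `id` field. The request is only built here, not sent; the point is that the long query travels in the request body instead of the URI.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: adjust host, port and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_post_query(terms, url=SOLR_SELECT_URL):
    """Build (but do not send) a POST request carrying a long OR query."""
    query = " OR ".join(terms)
    # The parameters go into a form-encoded body, so the URI stays short.
    body = urlencode({"q": query, "wt": "json"}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
        method="POST",
    )


# 5000 OR-ed terms would be far too long for most URI limits.
request = build_post_query([f"id:{n}" for n in range(5000)])
```

Passing `request` to `urllib.request.urlopen` would send it; any HTTP client that can post form data should work the same way.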

Re: Indexing Wikipedia

2015-12-04 Thread Paul Libbrecht
Simply... some fields are not stored so they are only searched through (being indexed) but not given back? (title and text in the tutorial you refer to). Are these the missing fields? Paul > Kate Kas > 5 December 2015 00:23 > Hi, > > I tried to index .xml files from

Re: Arabic analyser

2015-11-10 Thread Paul Libbrecht
Mahmoud, there is an Arabic analyzer: https://wiki.apache.org/solr/LanguageAnalysis#Arabic Doesn't it do what you describe? Synonyms probably work there too. Paul > Mahmoud Almokadem > 9 November 2015 17:47 > Thanks Jack, > > This is a good solution, but we

Re: Boosting a document score when advertised! Please help!

2015-11-05 Thread Paul Libbrecht
Alessandro, none of them seem to match what I'd expect to be done: given an extra param that indicates the author, for each query, add an extra boosting. Christian, I used to do that with a query component (in Java) but I think that nowadays you can do that with the bq parameter of edismax. paul
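A small sketch of the bq idea, assuming a hypothetical `author` field and an arbitrary boost value. It only assembles the request parameters; `defType` and `bq` are the standard edismax parameter names.

```python
from urllib.parse import urlencode


def boosted_params(user_query, author, boost=5.0):
    """Return edismax parameters that boost documents by a given author."""
    return {
        "defType": "edismax",
        "q": user_query,
        # "author" is a hypothetical field name; the boost value is arbitrary.
        "bq": f'author:"{author}"^{boost}',
    }


params = boosted_params("solr highlighting", "Jane Doe")
query_string = urlencode(params)
```

Documents matching the bq clause get extra score, while documents that do not match it are still returned.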

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Paul Libbrecht
I believe that very many installations of Solr actually need a query expansion such as the one you describe below with an indexing of each textual field in multiple forms (string, straight (whitespace/ideograms), stemmed, phonetic). Thanks to edismax, I think, you would do the following

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Paul Libbrecht
Alexandre, I guess you are talking about that post: http://lucidworks.com/blog/2015/06/06/query-autofiltering-extended-language-logic-search/ I think it is very often impossible to solve properly. Words such as "direction" have very many meanings and would come in different fields. In IMDB,

Re: Instant Page Previews

2015-10-08 Thread Paul Libbrecht
This is a very nice start Charlie, I'd warn a bit however, on the value of such previews: automated previews of web pages can be quite far from what users might be remembering a page should look like. In particular all tool pages typically show a quite "empty" or "initial" state in such automatic

Re: Ideas

2015-09-21 Thread Paul Libbrecht
Writing a query component would be pretty easy or? It would throw an exception if crazy numbers are requested... I can provide a simple example of a maven project for a query component. Paul William Bell wrote: > We have some Denial of service attacks on our web site. SOLR threads are > going
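The check such a component would perform can be sketched independently of Solr. This is only the validation logic, written in Python for brevity; a real query component would be Java, and the limits and the exception name below are hypothetical.

```python
# Hypothetical limits; choose values that fit your index and traffic.
MAX_ROWS = 1000
MAX_START = 100000


class ExcessiveRequest(ValueError):
    """Raised when a request asks for an unreasonable slice of results."""


def check_paging(params, max_rows=MAX_ROWS, max_start=MAX_START):
    """Validate the 'rows' and 'start' parameters of a search request."""
    rows = int(params.get("rows", 10))
    start = int(params.get("start", 0))
    if rows < 0 or rows > max_rows:
        raise ExcessiveRequest(f"rows={rows} is outside 0..{max_rows}")
    if start < 0 or start > max_start:
        raise ExcessiveRequest(f"start={start} is outside 0..{max_start}")
    return rows, start
```

Rejecting the request before the search runs keeps a single abusive query from tying up a thread.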

Re: Strange interpretation of invalid ISO date strings

2015-09-06 Thread Paul Libbrecht
Just a word of warning: ISO 8601, the date format standard, is quite big, to say the least, and I thus expect very few implementations to be complete. I survived one such interoperability issue with Safari on iOS6. While they (and JS I think) claim ISO 8601, it was not complete and fine

pre-loaded function-query?

2015-08-18 Thread Paul Libbrecht
Hello Solr experts, I'm writing a query expansion QueryComponent which takes web-app parameters (e.g. profile information) and turns them into a solr query. Thus far I've used lucene TermQuery-ies with success. Now, I would like to use something a bit more elaborate. Either I write it with

Re: pre-loaded function-query?

2015-08-18 Thread Paul Libbrecht
Doug Turnbull wrote: I'm not sure if you mean organizing function queries under the hood in a query component or externally. Externally, I've always followed John Berryman's great advice for working with Solr when dealing with complex/reusable function queries and boosts

require diversity in results?

2015-04-24 Thread Paul Libbrecht
Hello list, I'm wondering if there could be extra parameters or query operators with which I could impose that sorting by relevance should be relaxed so that there's a minimum diversity in some fields in the first page of results. For example, I'd like the search results to contain at least three

Re: Differentiate direction.

2014-12-18 Thread Paul Libbrecht
Kind of depends on how you're going to query. If you're going to query always with a direction then, you can probably prefix all tokens with the direction. If you're going to query always simple text bits, then using phrase search with d1 and d2 being words might also work. If you're going for

Re: comparing feature vectors using Solr/Lucene

2014-11-26 Thread Paul Libbrecht
Upayavira, on the lucene list, two tools are sometimes talked about which might be doing some of what you are searching: - semanticvectors (https://code.google.com/p/semanticvectors) - word2vec https://github.com/kojisekig/word2vec-lucene/i Maybe it helps? I'm under the impression that you are

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Paul Libbrecht
Hello Koji, how would you compare that to SemanticVectors? paul On 20 nov. 2014, at 10:10, Koji Sekiguchi k...@r.email.ne.jp wrote: Hello, It's my pleasure to share that I have an interesting tool word2vec for Lucene available at https://github.com/kojisekig/word2vec-lucene . As you

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Paul Libbrecht
('France') + vector('Italy') results in a vector that is very close to vector('Rome'), and vector('king') - vector('man') + vector('woman') is close to vector('queen') Thanks, Koji (2014/11/20 20:01), Paul Libbrecht wrote: Hello Koji, how would you compare that to SemanticVectors

Re: Facets not supporting multi language?

2014-09-11 Thread Paul Libbrecht
The way this is done in drupal and probably many others is that the facet fields are keywords from a taxonomy. If you want to facet through single language, you probably want to separate the fields where you index each of the languages (so a field text-en, text-ft through which you would facet.

Re: How to implement multilingual word components fields schema?

2014-09-09 Thread Paul Libbrecht
Ilia, one aspect you surely lose with a single field approach is the differentiation of semantic fields in different languages for words that sound the same. The words sitting and directions are easy examples that have fully different semantics in French and English, at least. directions would

Re: what os env you use to develop lucene or solr?

2014-08-11 Thread Paul Libbrecht
I have used MacOSX for development for more than 10 years. It's, by far, the user-friendliest Unix-based system. So copy and paste works correctly from the terminal to the IDE. Find in the terminal is nicely behaving (really!). This is kilometers away from XWindows' terminals and megameters away from

Re: Anybody uses Solr JMX?

2014-08-06 Thread Paul Libbrecht
Hello Otis, this looks like an excellent idea! I'm in need of that, erm… last week and probably this one too. Is there not a risk that reading certain JMX properties actually hogs the process? (or is it by design that MBeans are supposed to be read without any lock effect?). thanks for the

Re: Character encoding problems

2014-07-29 Thread Paul Libbrecht
If you are seeing appelé au téléphone in the browser, I would guess that the data is being rendered in UTF-8 by your server and the content type of the html is set to iso-8859-1 or not being set and your browser is defaulting to iso-8859-1. You can force the encoding to utf-8 in the
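The mismatch described here is easy to reproduce: bytes written as UTF-8 but read as ISO-8859-1 turn each accented letter into two characters. A short sketch, using the phrase from the message as sample data:

```python
original = "appelé au téléphone"

# The server writes UTF-8 bytes, the browser decodes them as ISO-8859-1.
garbled = original.encode("utf-8").decode("iso-8859-1")

# Reversing the wrong decoding step recovers the text, which confirms
# that the stored data is fine and only the declared charset is wrong.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
```

Here `garbled` is "appelÃ© au tÃ©lÃ©phone", the typical symptom of this kind of charset mismatch.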

Re: multilingual search

2014-07-04 Thread Paul Libbrecht
To do just what Jack described, I often write a solr query component that does query expansion. Based on some parameters I can recognize to be a language hint (e.g. the language of the environment they search in, the browser's accept-language) I reformulate the query into a query in the fields

Re: multilingual search

2014-07-04 Thread Paul Libbrecht
1. Modify the qf parameter directly by either adding the _xx language suffix to each field in qf, or replacing the xx for any qf fields that already have an _xx suffix. 2. Have separate qf_xx parameters which are customized for specific languages and then copy the language-specific qf_xx
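Option 1 can be sketched as a small string rewrite. The assumption, not stated in the message, is that language suffixes are always two lowercase letters after an underscore; the field names are hypothetical.

```python
import re

# A trailing two-letter language suffix such as "_en" or "_fr".
# Caveat: a field like "doc_id" would be mistaken for a localized field.
LANG_SUFFIX = re.compile(r"_[a-z]{2}$")


def localize_qf(qf, lang):
    """Add or replace the _xx language suffix on every field in a qf string."""
    rewritten = []
    for item in qf.split():
        # Keep an optional boost such as "^2" attached to its field.
        field, sep, boost = item.partition("^")
        field = LANG_SUFFIX.sub("", field) + "_" + lang
        rewritten.append(field + sep + boost)
    return " ".join(rewritten)
```

For example, `localize_qf("title^2 body_en", "fr")` gives `"title_fr^2 body_fr"`.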

Re: Solr on S3FileSystem, Kosmos, GlusterFS, etc….

2014-06-24 Thread Paul Libbrecht
I've always been under the impression that file-system-access-speed is crucial for Lucene-based storage and have always advocated to not use NFS for that (for which we had slowness of a factor of 5 approximately). Has any performance measurement been made for such a setting? Is FS-caching

Re: Website running Solr

2014-05-11 Thread Paul Libbrecht
Not with certainty as solr may be working far behind another set of tools that make queries (and nothing licensing prevents it). If you get a software that has maybe Solr inside, I think the credits section should include a mention of some sort. However, there may be hints if a website uses

Re: Is it possible to cluster on search results but return only clusters?

2014-05-06 Thread Paul Libbrecht
put rows to zero? Exploit the facets as clusters ? paul Le 6 mai 2014 à 16:42, Sebastián Ramírez sebastian.rami...@senseta.com a écrit : I have this query / URL

Re: Anybody uses Solr JMX?

2014-05-05 Thread Paul Libbrecht
Thank you everybody for the links and explanations. I am still curious whether JMX exposes more details than the Admin UI? I am thinking of a troubleshooting context, rather than long-term monitoring one. JMX is multi-purpose. So, in principle, it can offer considerably more. I've seen

Re: Anybody uses Solr JMX?

2014-05-04 Thread Paul Libbrecht
Also, Zabbix and Nagios do read from JMX. Zabbix has a prototype for SOLR which is a simple way to gather an amount of data from solr and do, for example, archiving and plotting of cache values. paul On 5 May 2014 at 04:37, Ahmet Arslan iori...@yahoo.com wrote: Hi, It looks like JMX

Re: Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Paul Libbrecht
Navaa, you need query expansion for that. E.g. if your query goes through dismax, you need to add the two field names to the qf parameter. The nice thing is that qf can be: text^3.0 text.stemmed^2 text.phonetic^1 And thus exact matches are preferred to stemmed or phonetic matches. This is
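To illustrate the weighting, here is a sketch that builds dismax parameters with such a qf and parses it back into per-field boosts. The field names follow the message; the helper names are made up for this example.

```python
def dismax_params(user_query):
    """Parameters sending one user query to exact, stemmed and phonetic fields."""
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": "text^3.0 text.stemmed^2 text.phonetic^1",
    }


def parse_qf(qf):
    """Split a qf string into {field: boost}; a field without ^boost gets 1.0."""
    weights = {}
    for item in qf.split():
        field, _, boost = item.partition("^")
        weights[field] = float(boost) if boost else 1.0
    return weights


weights = parse_qf(dismax_params("libbrecht")["qf"])
```

The resulting weights are 3.0 for the exact field, 2.0 for the stemmed one and 1.0 for the phonetic one, so an exact hit contributes the most to the score.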

Re: Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Paul Libbrecht
Navaa, You need the query to be sent to the two fields. In dismax, this is easy. Paul On 12 février 2014 14:22:33 HNEC, Navaa navnath.thomb...@xtremumsolutions.com wrote: Hi, I am using solr for searching phoneticly equivalent string my schema contains... fieldType

Re: Remove stemming without reindexing - currently using KStem

2014-02-02 Thread Paul Libbrecht
Abhishek, stemming is applied before the tokens get into the index. Changing the stemming of the indexer cannot be done without reindexing. paul Le 2 févr. 2014 à 06:23, abhishek jain abhishek.netj...@gmail.com a écrit : Hi Friends, Is it possible to remove stemming without having to

Re: Chaining plugins

2013-12-26 Thread Paul Libbrecht
I have subclassed the query component to do so. Using params, you can get almost everything thinkable that is not too much documented. paul On 26 déc. 2013, at 15:59, elmerfudd na...@012.net.il wrote: I would like to develope a search handler that is doing some logic and then just sends the

Re: Call to Solr via TCP

2013-12-10 Thread Paul Libbrecht
Zwer, I think it may be a bit dangerous as jetty may start to do some connection management and expect the client to do so. However, if you look into http/1.0 you have a little chance that doing simple http calls is as simple as socket connections. What could be the reason not to use a decent

Re: How solr text search finding work

2013-11-28 Thread Paul Libbrecht
Viresh, there's two ways to solve this. - Using the CompoundWordsAnalyzer. I still haven't been able to find an easy to embark method into there. That would decompose, at indexing and query time, the term Kreditgeber into kredit and geber. For a higher precision, you probably want to do it at

Re: weak documents

2013-11-27 Thread Paul Libbrecht
Thomas, our experience with Curriki.org is that evaluating what I call the related documents is a procedure that needs access to the complete content and thus is run at the DB level and not the Solr-level. For example, if a user changes a part of its name, we need to reindex all of his

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Paul Libbrecht
I personally felt Tomcat to be in a more appropriate community, that of the Apache Foundation, than Jetty. Also, jetty always has been striving for simplicity and that's really not always what you intend to when you plan an app-server. E.g. features such as the manager or mod_ajp appeared

Re: More on topic of Meta-search/Federated Search with Solr

2013-09-05 Thread Paul Libbrecht
Hello list, A student of a friend of mine made his masters on that topic, especially about federated ranking. I have copied his text here: http://direct.hoplahup.net/tmp/FederatedRanking-Koblischke-2009.pdf Feel free to contact me to contact Robert Koblischke for questions.

Re: More on topic of Meta-search/Federated Search with Solr

2013-08-27 Thread Paul Libbrecht
that, and it is not really my role to try, but something that works without Harvesting and local indexing is obviously desirable to Enterprise Search users. On Mon, Aug 26, 2013 at 4:46 PM, Paul Libbrecht p...@hoplahup.net wrote: Why not simply create a meta search engine that indexes everything of each

Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Paul Libbrecht
Why not simply create a meta search engine that indexes everything of each of the nodes? (I think one calls this harvesting) I believe that this is the way to avoid all sorts of performance bottlenecks. As far as I could analyze, the performance of a federated search is the performance of the

Re: Where is the webapps directory of servlet container?

2013-08-17 Thread Paul Libbrecht
What should I do? Can you help make me understand the work flow? Kamaljeet, in most servlet-containers (e.g. Tomcat or Jetty), there is such a directory called webapps. In Sun Java App Server it is inside domains/domain-name/applications/j2ee-modules/. Maybe it helps? If not, please

Re: Where is the webapps directory of servlet container?

2013-08-17 Thread Paul Libbrecht
Maybe it helps? If not, please indicate the servlet container you chose. I have installed java and solr 4.4.0. I guess I need to install Jetty or Tomcat. Not able to decide among both. But tried with Jetty. Is it necessary to add new user to use Jetty?? Jetty comes bundled with Solr.

Re: Where is the webapps directory of servlet container?

2013-08-17 Thread Paul Libbrecht
Are they referring to solr-4.4.0/example/webapps directory here? https://cwiki.apache.org/confluence/display/solr/Installing+Solr But solr.war is already placed there. Is it fine? I believe it is intended to be fine indeed ;-). However any other installation with a webapps directory would be

Re: searching both english and japanese

2013-07-07 Thread Paul Libbrecht
Shalom, isn't the StandardAnalyzer supposed to take care of forking in case of ideograms? I.e. use a Japanese-friendly analyzer for japanese characters and an English-friendly analyzer otherwise. As Jack pointed out, edismax is nifty to expand a query on multiple fields. If you need to do more

Re: Good Desktop Search?

2013-05-03 Thread Paul Libbrecht
Savia, maybe not very mature yet, but someone on java-us...@lucene.apache.org announced such a tool the other day. I'm copying it below. I do not know of many otherwise. paul Hi everybody, just a simple question is there any solr/lucene based desktop search project around someone might

Re: Book text with chapter line number

2013-04-24 Thread Paul Libbrecht
It's easy to then store a map of term position to line-number and page-number along with each paragraph, or? Paul On 24 avr. 2013, at 16:24, Timothy Potter wrote: Chapter seems too broad and line seems too narrow -- have you thought about paragraph level? Something like: docID, book
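One way to read this suggestion: store, per paragraph, the term positions at which each line and each page begins, then look a hit up by binary search. A sketch with made-up positions:

```python
import bisect


def locate(position, line_starts, page_starts):
    """Return 1-based (line, page) numbers for a term position.

    line_starts and page_starts are sorted lists holding the term position
    at which each line or page begins.
    """
    line = bisect.bisect_right(line_starts, position)
    page = bisect.bisect_right(page_starts, position)
    return line, page


# Hypothetical paragraph: five lines, the second page starting at position 30.
line_starts = [0, 9, 21, 30, 42]
page_starts = [0, 30]
hit = locate(25, line_starts, page_starts)
```

With these values, `hit` is `(3, 1)`: position 25 falls on the third line, still on the first page.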

Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Paul Libbrecht
Nice, Chantal can you indicate there or here what kind of speed for integration tests you've reached with this, from a bare source to a successfully tested application? (e.g. with 100 documents) thanks in advance Paul On 14 mars 2013, at 09:29, Chantal Ackermann wrote: Hi all, this

Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Paul Libbrecht
is certainly not the fastest tool to start up and get going… If you are asking because you want to run rather a lot of requests and test their output - JMeter might be preferable? Hope that was not too vague an answer, Chantal On 14.03.2013 at 09:51, Paul Libbrecht wrote: Nice

Re: Figure out what value was matched in multi valued field

2013-03-13 Thread Paul Libbrecht
Mephisto, Maybe LUCENE-1999 helps you. We've used it with some success. Otherwise, you're left with highlighting. paul On 13 mars 2013, at 14:11, Jack Krupansky wrote: Add debugQuery=true to your query and examine the explain section, which will show the terms/phrases that scored for each

Re: velocity in /srv/www

2013-03-13 Thread Paul Libbrecht
Guy, you'd need a proxy to go from one port (80 for the apache) to port 8983. Apache httpd will not run solr alone. Then the question of where you put the velocity page is just a matter of configuration. A symbolic link probably. paul On 13 mars 2013, at 15:39, Guy Dobson wrote: Fellow

Re: Which fields matched?

2012-12-08 Thread Paul Libbrecht
We've used lucene-1999 with some success in ActiveMath to find the language that was matched. paul Le 8 déc. 2012 à 10:09, Mikhail Khludnev a écrit : Jeff, explain() algorithm is definitely too slow to be used at search time. There is an approach which I'm aware of - watch for scorers

Re: user session id / cookie to record search query

2012-11-21 Thread Paul Libbrecht
Record? E.g. output the cookie value of a given name in the log? Provided you use Apache mod_proxy, we do this by a special log-format. paul Le 21 nov. 2012 à 09:50, Romita Saha a écrit : Hi All, Do anyone have an idea how to use user session id / cookie to record search query from that

Re: Production Release process with Solr 3.5 implementation.

2012-11-01 Thread Paul Libbrecht
Here is what we do for Curriki.org: We run a background indexing by setting up another solr that performs all the indexing. Then we can start the install process. Then we can update the index with the things changed since the background indexing. paul Le 1 nov. 2012 à 21:46, adityab a écrit :

Re: how to display MathML in search results?

2012-10-30 Thread Paul Libbrecht
Joe, if XHTML works fine... why would MathML not? Is it swallowed? I agree with Dave that I see nothing Solr specific. Maybe a namespace issue? If the search results pull from Solr, they would pull from a stored field which you can inspect by using the url /solr/select?q= (this renders XML,

Re: how to display MathML in search results?

2012-10-30 Thread Paul Libbrecht
Le 30 oct. 2012 à 20:30, Joe Corneli a écrit : select?q=... directly in Solr. What's in there? Are MathML islands gone? paul

Re: how to display MathML in search results?

2012-10-30 Thread Paul Libbrecht
:51, Joe Corneli a écrit : On Tue, Oct 30, 2012 at 8:21 PM, Paul Libbrecht p...@hoplahup.net wrote: Le 30 oct. 2012 à 20:30, Joe Corneli a écrit : select?q=... directly in Solr. What's in there? Are MathML islands gone? Yep! Like this: str name=content [...] For example, the universal

Re: Best and quickest Solr Search Front end

2012-10-22 Thread Paul Libbrecht
My experience for the easiest query is solr/itas (aka velocity solr). paul Le 22 oct. 2012 à 11:15, Muwonge Ronald a écrit : Hi all, have done some crawls for certain urls with nutch and indexed them to solr.I kindly request for assistance in getting the best search interface but have no

diversity of search results?

2012-10-19 Thread Paul Libbrecht
Hello SOLR expert, yesterday in our group we realized that a danger we may need to face is that a search result includes very similar results. Of course, one would expect skimming so that duplicates that show almost the same results in a search result would be avoided but we fear that this is

Re: Help with Velocity in SolrItas

2012-10-09 Thread Paul Libbrecht
Lance, this is the kind of fun that happens with Velocity all day long... In general, when it outputs the variable name, it's that the variable is null; this can happen when a method is missing, for example. There are actually effective uses of this brain-dead-debugger-oriented-practice!

Re: Retrieval of large number of documents

2012-09-12 Thread Paul Libbrecht
Isn't XSLT the bottleneck here? I have not yet met an incremental XSLT processor, although I heard XSLT 1 claimed it could be done in principle. If you start to do this kind of processing, I think you have no other choice than write your own output method. Paul Le 12 sept. 2012 à 15:47,

Re: Semantic document format... standards?

2012-09-11 Thread Paul Libbrecht
As Michael hinted, I believe RDF would be the de facto answer. Within it, things such as OWL or SKOS certainly represent classical formats. Processors such as OWLAPI can go pretty far there. As Péter hinted, schema.org might provide a way to complement an existing XML with semantic information.

Re: Semantic document format... standards?

2012-09-11 Thread Paul Libbrecht
Otis, if you have a bit of time to research, I think your document may look a lot like the documents processed by: http://langtech.jrc.it/ which is a flagship multilingual technology implementation and includes a fair amount of entity disambiguation as far as I could hear in Ralph's

Re: SOLR 4.0 / Jetty Security Set Up

2012-09-07 Thread Paul Libbrecht
Erick, I think that should be described differently... You need to set-up protected access for some paths. /update is one of them. And you could make this protected at the jetty level or using Apache proxies and rewrites. Probably /select should be kept open but you need to evaluate if that can

Re: log properties file location with Tomcat

2012-08-27 Thread Paul Libbrecht
Yair, you can create it easily, it will be used. Paul Le 27 août 2012 à 09:16, yair even-zohar a écrit : I'm newbie with Tomcat configurations and am looking to reduce the logging level for Solr Where should I put the logging.properties file and how to point Tomcat to use it?

Re: scanned pdf with solr cell

2012-08-15 Thread Paul Libbrecht
Ahmet, the dock icon appears when AWT starts, e.g. when a font is loaded. You can prevent it using the headless mode but this is likely to trigger an exception. Same if your user is not UI-logged-in. hope it helps. Paul Le 15 août 2012 à 01:30, Ahmet Arslan a écrit : Hi All, I have set

Re: scanned pdf with solr cell

2012-08-15 Thread Paul Libbrecht
Le 15 août 2012 à 13:03, Ahmet Arslan a écrit : Hi Paul, thanks for the explanation. So is it nothing to worry about? it is nothing to worry about except to remember that you can't run this step in a daemon-like process. (on Linux, I had to set-up a VNC-server for similar tasks) paul

Re: HTTP Basic Authentication with HttpSolrServer

2012-08-08 Thread Paul Libbrecht
Villam, this is a question for httpclient, I think you want to enable preemptive authentication so as to avoid the need to repeat the query after the unauthorized response is sent. http://hc.apache.org/httpclient-3.x/authentication.html#Preemptive_Authentication paul Le 8 août 2012
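The idea of preemptive authentication, independent of the HTTP client used, is to send the credentials with the first request instead of waiting for a 401 response. A sketch with Python's standard library (not the Java httpclient discussed here); the URL and credentials are placeholders:

```python
import base64
from urllib.request import Request


def preemptive_basic_auth(url, user, password):
    """Build a request that carries Basic credentials up front."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # No challenge round trip: the header is attached to the first request.
    return Request(url, headers={"Authorization": "Basic " + token})


# Placeholder URL and credentials; the request is built but not sent.
req = preemptive_basic_auth("http://localhost:8983/solr/select?q=*:*", "user", "secret")
```

As Basic credentials are only base64-encoded, this should be combined with HTTPS when the network is not trusted.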

Re: Is SOLR best suited to this application - Finding co-ordinates

2012-07-31 Thread Paul Libbrecht
Solr is definitely well suited for this. Depending on your client, getting json or xml is definitely super high performance for such a data set that barely changes. Make sure you make the right params in the queries, solr caching will then provide you amazing performances. paul Le 31 juil.

Re: solrconfig.xml | registration of JSPs

2012-07-31 Thread Paul Libbrecht
Le 31 juil. 2012 à 14:07, Roland Ucker a écrit : Is it possible to map a request URL to a JSP in the solrconfig.xml? Roland, not in the solrconfig.xml but it's not too hard to make a wrapper that can do that... I have code here that does this (actually forwards the requests to anything, not

Re: Geocoding with Solr

2012-07-29 Thread Paul Libbrecht
Spadez, I've had some success in using the nicely open data of GeoNames.org but that was modestly only to fetch zip-code-to-town-name and long-lat associations, search for postal-codes. It would be lovely that some more best practice examples are distributed on how to handle, e.g., the

Re: language detection and phonetic

2012-07-26 Thread Paul Libbrecht
Le 26 juil. 2012 à 21:22, Alireza Salimi a écrit : The question is: is there any cleaner way to do that? I've always done phonetic match using a separate phonetic field (title-ph for example) and copy-field. There's one considerable advantage to that: using such as dismax, you can say prefer

Re: Solr Monitoring Tool

2012-07-20 Thread Paul Libbrecht
Suneel, there's many monitoring tools out there. Zabbix is one of them, it is in PHP. I think SolrGaze as well (not sure). I've been using HypericHQ, which is pure java, and I have been satisfied with it though it leaves some space for enhancement. Other names include Nagios, also in PHP, and

Re: UTF-8

2012-07-17 Thread Paul Libbrecht
My experience is that this property has made a whole lot of a difference. At least till solr 3.1. The servlet container has not been the only bit. paul Le 18 juil. 2012 à 05:12, William Bell a écrit : -Dfile.encoding=UTF-8... Is this usually recommended for SOLR indexes? Or is the

Re: How to Request several docs ?

2012-07-06 Thread Paul Libbrecht
Le 6 juil. 2012 à 15:43, Bruno Mannina a écrit : I have a list of PN that I want to get and I don't want to do one request by PN and I think it's not clean to do PN1 or PN2 or PN3 or . I've always done this so. paul
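Building such an OR query by hand is mostly a matter of quoting. A sketch, assuming `PN` is the name of the field holding the numbers (the sample values are invented):

```python
def ids_query(field, values):
    """Build 'field:("v1" OR "v2" ...)' with each value quoted as a phrase."""
    quoted = []
    for value in values:
        # Escape backslashes and double quotes so a value cannot break the phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"' + escaped + '"')
    return field + ":(" + " OR ".join(quoted) + ")"


query = ids_query("PN", ["EP1000000", "US2000000"])
```

This yields `PN:("EP1000000" OR "US2000000")`, one request for the whole list.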

Re: Use of Solr as primary store for search engine

2012-07-04 Thread Paul Libbrecht
Amit, not exactly a response to your question but doing this with a lucene index on i2geo.net has resulted in a considerable performance boost (reading from stored-fields instead of reading from the xwiki objects which pull from the SQL database). However, it implied that we had to rewrite

Re: Use of Solr as primary store for search engine

2012-07-04 Thread Paul Libbrecht
Le 4 juil. 2012 à 21:17, Amit Nithian a écrit : Thanks for your response! Were you using the SQL database as an object store to pull XWiki objects or did you have to execute several queries to reconstruct these objects? The first. It's all fairly transparent. There are XWiki Classes and XWiki

Re: what's better for in memory searching?

2012-06-11 Thread Paul Libbrecht
Li Li, have you considered allocating a RAM-Disk? It's not the most flexible thing... but it's certainly close, in performance to a RAMDirectory. MMapping on that is likely to be useless but I doubt you can set it to zero. That'd need experiment. Also, doesn't caching and auto-warming provide

Re: what's better for in memory searching?

2012-06-11 Thread Paul Libbrecht
On 11 June 2012 at 11:16, Li Li wrote: do you mean software RAM disk? Right. OS level. using RAM to simulate disk? Yes. That generally makes a disk which is very fast in reading and writing. How to deal with Persistence? Synchronization (slaving?). paul

Re: return *all* words at levenstein distance = N from query word

2012-06-07 Thread Paul Libbrecht
I would debug somewhere close to the FuzzyQuery. Lucene is doing exactly that (just as PrefixQueries are doing): expand a FuzzyQuery (PrefixQuery) to a disjunction of term-queries for the words that match that fuzzy or prefix queries. Maybe it helps you start? paul Le 7 juin 2012 à 18:15,
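The expansion described here can be illustrated with a naive sketch: scan a vocabulary and keep every word within a given edit distance. This is not how Lucene implements it internally; the vocabulary is invented for the example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def expand_fuzzy(term, vocabulary, max_distance):
    """Return the words a fuzzy query on 'term' would be expanded to."""
    return sorted(w for w in vocabulary if levenshtein(term, w) <= max_distance)


vocabulary = ["solr", "sole", "soar", "lucene", "sold", "color"]
matches = expand_fuzzy("solr", vocabulary, 1)
```

Here `matches` is `["soar", "sold", "sole", "solr"]`; the query then behaves like a disjunction of term queries on those words.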

HypericHQ plugins?

2012-06-05 Thread Paul Libbrecht
Hello SOLR users, is there someone who wrote plugins for HypericHQ to monitor the very many metrics SOLR exposes through JMX? I am a kind of newbie to JMX and the tutorials of Hyperic aren't simple enough to my taste... so I'd be helped if someone did it already. thanks in advance Paul

Re: How many doc/doc in the XML source file before indexing?

2012-05-24 Thread Paul Libbrecht
Bruno, see the solrconfig.xml, you have all sorts of tweaks for this kind of things. paul Le 24 mai 2012 à 09:49, Bruno Mannina a écrit : Hi All, Just a little question concerning the max number of add doc/doc /add that I can write in the xml source file before indexing? only one,

searching when in a solr-component?

2012-05-11 Thread Paul Libbrecht
Hello SOLR experts, can I see the same index while responding another query? If yes how? thanks in advance Paul

Re: Solritas in production

2012-05-09 Thread Paul Libbrecht
Le 7 mai 2012 à 13:30, Marcelo Carvalho Fernandes a écrit : Anything else? If fearing DoS attacks by too large queries (e.g. if having millions of documents), consider writing a query-component that can limit the queries. I believe that there's nothing else. paul

anticipating the indexing completion

2012-05-09 Thread Paul Libbrecht
Hello SOLR experts, I have my own indexing web-application which talks in XML to SOLR. It works wonderfully well. The queue is displayed in the indexer, so that experts can have a track that it went well into the index. However, i see no way currently to display that solr's searcher includes

Re: anticipating the indexing completion

2012-05-09 Thread Paul Libbrecht
time from previous warming and estimate. Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: Paul Libbrecht p...@hoplahup.net To: solr-user@lucene.apache.org Sent: Wednesday, May 9, 2012 6:21 AM Subject

Re: Solritas in production

2012-05-07 Thread Paul Libbrecht
I do not share this reasoning at all. Of course a new UI would need to be developed, solr/itas is just an example. But that precisely is the interest of solr/itas, a view system that is easy to tune. I do not feel, at all, that it means that it is not production ready. There are a zillion ways

Re: Removing old documents

2012-05-02 Thread Paul Libbrecht
With which client? paul Le 2 mai 2012 à 01:29, alx...@aim.com a écrit : all caching is disabled and I restarted jetty. The same results.

Re: Removing old documents

2012-05-01 Thread Paul Libbrecht
I've been surprised to see Firefox cache even after empty-cache was ordered for JSON results... this is quite annoying but I have got accustomed to it by doing the following when I need to debug: add a random parameter extra. But only when debugging! Using wget or curl showed me that the
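The random-parameter trick can be sketched as a URL rewrite. The parameter name `_nocache` is arbitrary (an assumption of this example), and the URL is a placeholder.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, name="_nocache"):
    """Append a random query parameter so a browser cache cannot match the URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))


debug_url = add_cache_buster("http://localhost:8983/solr/select?q=*:*&wt=json")
```

Each call produces a different URL, so this belongs in debugging sessions only: in production it would also defeat useful HTTP caching.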

Re: Solr for routing a webapp

2012-04-26 Thread Paul Libbrecht
Have you tried using mod_rewrite for this? paul Le 26 avr. 2012 à 15:16, Björn Zapadlo a écrit : Hello, I'm thinking about using a Solr index for routing a webapp. I have pregenerated base urls in my index. E.g. /foo/bar1 /foo/bar2 /foo/bar3 /foo/bar4 /bar/foo1 /bar/foo2

Re: Solr for routing a webapp

2012-04-26 Thread Paul Libbrecht
Or write your own query component mapping /solr/* in the web.xml, exposing the request by a thread-local through a filter, and reading this setting the appropriate query parameters... Performance-wise, this seems quite reasonable I think. paul Le 26 avr. 2012 à 16:58, Paul Libbrecht a écrit

Re: Deciding whether to stem at query time

2012-04-24 Thread Paul Libbrecht
On 24 Apr 2012 at 17:16, Otis Gospodnetic wrote: This would not necessarily increase the size of your index that much - you don't need to store both fields, just 1 of them if you really need it for highlighting or displaying. If not, just index. I second this. The query expansion process is

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-14 Thread Paul Libbrecht
Benson, In mid 2009, I had such a question answered with a nifty score bitwise manipulation, and a little precision loss. For each result I could pick the language of a multilingual match. If interested, I can dig. Paul -- Sent from my Android phone with K-9 Mail. Please excuse the brevity.

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-14 Thread Paul Libbrecht
up, it worked for me in Lucene, 2.4.1. We used this to create an auto-completion popup which selected the right language by flagging the right sub-query that was most matched. paul Le 14 avr. 2012 à 15:34, Benson Margulies a écrit : yes please On Apr 14, 2012, at 2:40 AM, Paul Libbrecht p

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-14 Thread Paul Libbrecht
Margulies a écrit : On Sat, Apr 14, 2012 at 12:37 PM, Paul Libbrecht p...@hoplahup.net wrote: Benson, it was in the Lucene world in May 2010: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201005.mbox/%3c469705.48901...@web29016.mail.ird.yahoo.com%3E Mark Harwood pointed me

Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
Michael, I've been on this list and the lucene list for several years and have not found this yet. It's been one of the neglected topics to my taste. There is a CompoundAnalyzer but it requires the compounds to be dictionary based, as you indicate. I am convinced there's a way to build the

Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
Bernd, can you please say a little more? I think this list is ok to contain some description for commercial solutions that satisfy a request formulated on list. Is there any product at BASIS Tech that provides a compound-analyzer with a big dictionary of decomposed compounds in German? If yes,

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
On 12 Apr 2012 at 17:46, Michael Ludwig wrote: Some compounds probably should not be decompounded, like Fahrrad (fahren/Rad). With a dictionary-based stemmer, you might decide to avoid decompounding for words in the dictionary. Good point. More or less, Fahrrad is generally abbreviated

Re: Quantiles in SOLR ???

2012-04-03 Thread Paul Libbrecht
Kashif, my knowledge in probability is limited but I believe the simple similarity function can be seen as a quantile. You can read about it in many places, I believe I read it in the Lucene in Action book. paul Le 3 avr. 2012 à 15:14, Kashif Khan a écrit : Thanks for sharing your

Re: Content privacy, search index

2012-04-01 Thread Paul Libbrecht
Hello Benjamin, On 1 Apr 2012 at 11:48, dbenjamin wrote: You lost me :-) You mean implementing a specific RequestHandler just for my needs ? I think a QueryComponent is enough, it'd extend QueryComponent. Its prepare method reads all the params and calls the ResponseBuilder's setQuery
