terms component misleading results
Hello, I need to know the exact count of certain terms in the documents. I noticed that when I update a document (only one field, for testing), the term count goes up by 1 for that specific term. For example, if I have two documents in the index, each with tag=ccc, and I update one of the documents, the term frequency for ccc becomes 3. When I optimize the index, it goes back down to the correct number (2). Is there any way to get the exact term frequency? Regular querying works well, but I did not quite understand why the term count is misleading. Best Regards, C.B.
upgrade to 3.6
Hello, I have upgraded from 1.4 to 3.6. It went quite smoothly, using the same schema.xml. I have done some testing, and I have not found any problems yet. Soon I will migrate the production system to 3.6. Any recommendations on this matter? Maybe I skipped something? Best Regards, C.B.
Re: upgrade to 3.6
Hello, I have tested, but was not able to replicate the problem. (Basically I indexed a few documents with UTF-8 chars, then searched for them, and found them OK.) According to the issue, as of 27/Apr/12 08:56 the fix is committed to the 3.6 branch. I only recently downloaded 3.6; actually it seems I downloaded it at 2012-04-27 19:27 GMT+2 (from the file stamp). Does that mean that I was lucky? Best, On Fri, May 25, 2012 at 10:17 AM, Sami Siren ssi...@gmail.com wrote: Hi, If you're using non-ASCII data with solrj you might want to test that it works for you properly. See for example https://issues.apache.org/jira/browse/SOLR-3375 -- Sami Siren On Fri, May 25, 2012 at 10:11 AM, Cam Bazz camb...@gmail.com wrote: Hello, I have upgraded from 1.4 to 3.6. It went quite smoothly, using the same schema.xml. I have done some testing, and I have not found any problems yet. Soon I will migrate the production system to 3.6. Any recommendations on this matter? Maybe I skipped something? Best Regards, C.B.
Re: terms component misleading results
Oh ok, I got it. So if I update the document three times, does that mean I have 1 normal document and 2 marked for deletion? I ask because the max difference I saw was 1, no matter how many times I updated. I think I can manage the faceting to do what I need. I guess that will be faster than making a real query and extracting the full docs. Best Regards, -C.B. On Fri, May 25, 2012 at 10:14 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : the terms count go +1 for that specific term. for example, if I have : two documents in index, each with tag=ccc and if I update one of the : documents, the terms frequency for ccc becomes 3. when I optimize the : index, it goes down again to correct number. (2) http://wiki.apache.org/solr/TermsComponent Retrieving terms in index order is very fast since the implementation directly uses Lucene's TermEnum to iterate over the term dictionary. ... The doc frequencies returned are the number of documents that match the term, including any documents that have been marked for deletion but not yet removed from the index. : Is there any way to get the exact term frequency? field faceting. -Hoss
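The field-faceting suggestion above can be sketched as a request URL. This is only an illustration, not a tested recipe for any particular Solr version: the base URL, the field name `tag` and the helper name `facet_count_url` are assumptions for the example, while `facet`, `facet.field`, `facet.prefix` and `facet.mincount` are standard Solr faceting parameters.

```python
from urllib.parse import urlencode


def facet_count_url(base_url, field, term):
    """Build a select URL that asks only for facet counts on one field.

    base_url and the helper name are hypothetical; adjust them to your setup.
    """
    params = [
        ("q", "*:*"),             # match all live (non-deleted) documents
        ("rows", "0"),            # no documents needed, only the counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", term),   # limit the facet to terms starting with `term`
        ("facet.mincount", "1"),
        ("wt", "json"),
    ]
    return base_url + "/select?" + urlencode(params)


url = facet_count_url("http://localhost:8983/solr", "tag", "ccc")
print(url)
```

Facet counts are computed over the documents matching the query, so documents marked for deletion should not be counted, unlike the raw doc frequencies from the TermsComponent.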
representing latlontype in pojo
Hello, I have custom POJOs, and I use solrj to read and index them with the getBeans() method. So now, I want to store a spatially searchable data member in my POJO. I have in my schema.xml: <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/> and <field name="location" type="location" indexed="true" stored="true"/> - so, what object type must I have in my bean? LatLonType does not seem to have a constructor, or getX, getY methods, and I think it is internal to solr. How can I store a 2d point and index it to a field type that is LatLonType, if I am using solrj? Best Regards, C.B.
synonyms file, and example cases
Hello, I have been looking at the example Solr synonym file, and I did not understand some of the notation: aaa = bbb = 1 2 ccc = 1,2 a\=a = b\=b a\,a = b\,b fooaaa,baraaa,bazaaa The first one says what to search for when the query is aaa, am I correct? The second one finds 1 2 when the query is bbb. The third one finds 1 or 2 when the query is ccc. The fourth and fifth ones I have not understood. The last one, I assume, is a group: a bidirectional mapping between fooaaa, baraaa and bazaaa. I am especially interested in this last one: if I write aaa,bbb, will it find aaa and bbb when either aaa or bbb is queried? Am I correct in those assumptions? Best regards, C.B.
making rotating timestamped logs from solr output
Hello, I would like to log the Solr console output. Although Solr logs requests in timestamped format, this only logs the requests, i.e. it does not log the number of hits for a given query, etc. Is there any easy way to do this other than resorting to methods for capturing Solr's output? I usually run Solr on my server by first starting screen, running Solr inside it, then detaching from the console. But it would be nice to have output logging instead of just request logging. Best regards, c.b.
Re: faceting question
Is there no other way than to use the patch, given that the result set of query A is a superset of that of query B? If it is not doable, I will probably use some caching technique. Best. On Sat, Jan 24, 2009 at 9:14 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Jan 24, 2009 at 6:56 AM, Cam Bazz camb...@gmail.com wrote: Hello; I have a multiValued field named tagList which may contain multiple tags. I am making a query like: tagList:a AND tagList:b AND tagList:c and I am also getting a tagList facet returning me some values. What I would like is for Solr to return facets as if the query was: tagList:a AND tagList:b Is it even possible? If I understand correctly, 1. You want to query for tagList:a AND tagList:b AND tagList:c 2. At the same time, you want to request facets for tagList but only for tagList:a and tagList:b If that is correct, you can use the features introduced by https://issues.apache.org/jira/browse/SOLR-911 However you may need to put #1 as fq instead of q. -- Regards, Shalin Shekhar Mangar.
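A rough sketch of how the SOLR-911 feature (tagging a filter and excluding it when faceting) might be applied to this case. It is an assumption-laden illustration rather than the exact setup Shalin had in mind: here only the clause to be ignored by the facet goes into a tagged fq, the tag name `ex1` and the helper name `excluded_facet_params` are made up, and the `{!tag=...}` / `{!ex=...}` local params require a Solr build that includes SOLR-911.

```python
from urllib.parse import urlencode, parse_qs


def excluded_facet_params(kept_clauses, excluded_clause, facet_field, tag="ex1"):
    """Return request parameters where one filter is ignored by the facet.

    The tag name and this helper are hypothetical, for illustration only.
    """
    return [
        ("q", " AND ".join(kept_clauses)),
        # The tagged filter still restricts the returned documents...
        ("fq", "{!tag=%s}%s" % (tag, excluded_clause)),
        ("facet", "true"),
        # ...but the facet is computed as if that filter were absent.
        ("facet.field", "{!ex=%s}%s" % (tag, facet_field)),
    ]


params = excluded_facet_params(["tagList:a", "tagList:b"], "tagList:c", "tagList")
query_string = urlencode(params)
print(query_string)
```

With these parameters the documents returned must match all three tags, while the tagList facet counts should reflect only tagList:a AND tagList:b.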
faceting question
Hello; I have a multiValued field named tagList which may contain multiple tags. I am making a query like: tagList:a AND tagList:b AND tagList:c and I am also getting a tagList facet returning me some values. What I would like is for Solr to return facets as if the query was: tagList:a AND tagList:b Is it even possible? Best, -C.B.
Re: feeding data
Hello Erik, I am especially interested in how to integrate it into a glassfish/ejb3 environment. In the past, I have done something like a proxy servlet to forward the request and get back the response. It is kind of bothersome. Also, for indexing I need some sort of API access. Has anyone done an integration of Solr into a servlet/ejb3 based system? Best Regards, -C.B. On Thu, Sep 4, 2008 at 3:32 PM, Erik Hatcher [EMAIL PROTECTED] wrote: On Sep 4, 2008, at 8:27 AM, Cam Bazz wrote: hello, is there no other way than making xml files and feeding those to solr? I just want to feed solr programmatically. - without xml There are several options. You can feed Solr XML, or CSV, or use any of the Solr client APIs (though those use XML under the covers for indexing documents, but transparently). A more advanced option is to use Solr in embedded mode where you use its Java API directly with no intermediate representation needed. Erik
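As a small illustration of the "client APIs use XML under the covers" point, the add message can be generated in memory instead of being written to files by hand. This is a minimal sketch using only the Python standard library; the helper name `build_add_message` and the field names are assumptions, and posting the result to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of dicts into a Solr <add> message.

    Hypothetical helper: each dict maps a field name to a single value.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="unicode")


message = build_add_message([{"id": "1", "title": "Tom & Jerry"}])
print(message)
# prints <add><doc><field name="id">1</field><field name="title">Tom &amp; Jerry</field></doc></add>
```

Letting a library serialize the XML means special characters in the field values are escaped automatically.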
feeding documents through API
Hello, I have been looking at the API documentation, but I don't know where to look in order to feed documents through the API without using XML files. Any ideas? Best. -C.B.
feeding data
Hello, is there no other way than making XML files and feeding those to Solr? I just want to feed Solr programmatically, without XML. Best.
Re: adding documents with json post
Thanks a bunch. On Mon, Jun 23, 2008 at 4:39 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi Cam, Yes, the various other formats are for responses only, as far as I'm aware. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Cam Bazz [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Sunday, June 22, 2008 5:53:56 PM Subject: adding documents with json post Hello; this probably has been asked before, but how can I add documents through a JSON AJAX submit? Does Solr only accept XML input? Best. -C.B.
html to text based on some sort of uniqueness metric
Hello, I am indexing newspaper articles as an exercise in Solr. When dealing with newspaper articles in the past, I always tried to get the div or the table that contains the actual news, using nekohtml to traverse the DOM tree and get the text from the div or table that contains the article. When dealing with many newspapers, it is a hassle to write custom code to extract the relevant information. There is usually a lot of garbage in the HTML, from categories to ads, and furthermore the pages change, so static coding is problematic. I have been wondering whether I could measure the frequency or uniqueness of each node and find the news automatically, but I have not come up with an implementation. Has anyone done, contemplated or used something similar? Maybe there is already a way, using Lucene or even Hadoop. Best Regards, -C.A.
Re: Solr system and numbers
I have a similar question: how would one normalize, or even detect, whether a string is a phone number? On Mon, Jun 9, 2008 at 4:17 PM, dudes dudes [EMAIL PROTECTED] wrote: great info, thanks a lot all Date: Mon, 9 Jun 2008 05:58:50 -0700 From: [EMAIL PROTECTED] Subject: Re: Solr system and numbers To: solr-user@lucene.apache.org Hi, Solr/Lucene can treat phone numbers as strings. If you want to clean them up and normalize them outside of Solr, you can do that and feed them into Solr as pure numbers. How the phone numbers will be treated after you pump them into Solr depends on the analyzer you choose to use for this data. If you don't need to search on subsets of phone numbers, then just don't tokenize them (i.e. use string type if the phone numbers contain any non-numeric characters, sint otherwise). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: dudes dudes To: solr-user@lucene.apache.org Sent: Monday, June 9, 2008 2:10:20 PM Subject: Solr system and numbers Hello experts, How does Solr deal with numbers or phone numbers? For example if you have 1234 and 12 34 or 1 234, with spaces between the numbers. Or is this dealt with by Lucene? Any documentation or tutorial on this? many thanks, ak
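One way to do the "clean them up and normalize them outside of Solr" step mentioned above is a small pre-processing function. This is only a sketch under stated assumptions: the set of allowed punctuation and the 7 to 15 digit range are guesses chosen for the example, not a standard, and both helper names are made up.

```python
import re


def normalize_phone(raw):
    """Strip everything except digits, so '12 34' and '1 234' both give '1234'."""
    return re.sub(r"\D", "", raw)


def looks_like_phone(raw, min_digits=7, max_digits=15):
    """Heuristic check: only digits and common phone punctuation, plausible length.

    The allowed characters and the digit limits are assumptions for illustration.
    """
    if not re.fullmatch(r"[\d\s().+\-]+", raw.strip()):
        return False
    return min_digits <= len(normalize_phone(raw)) <= max_digits


print(normalize_phone("12 34"))               # prints 1234
print(looks_like_phone("+1 (555) 010-9999"))  # prints True
print(looks_like_phone("hello 1234"))         # prints False
```

The normalized digits can then be indexed in an untokenized field, as suggested in the reply, so that all spellings of the same number match.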
solr query syntax
Hello, how can we write a query that restricts one term to a certain field and searches another term in the default field? For example, can I do: year:1998 AND searchword Best Regards, -C.B.
Re: Announcement of Solr Javascript Client
I have done something similar, and I am using a search servlet that forwards the request to Solr through commons httpclient. Maybe it could be a solution to DoS, although DoS is still possible. Best. -Cam Bazz On Thu, May 29, 2008 at 8:04 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: I just had a look at the demo and reeeally like it! I didn't pay enough attention to this thread, though. Is the main concern that by having a Solr search webapp that is really all in UI and uses your JS library, the backend Solr server is directly exposed and thus somebody could peek in the web page source, figure out Solr's address, and start issuing delete and other damaging requests? I think somebody mentioned a Servlet Filter. Couldn't we simply supply a servlet filter that allows only some request URLs, possibly reading those URLs from an external file, thus allowing easy customization? This dynamic stuff looks very juicy. Question about scalability: How much is cached client-side? With every new letter I type, is JS hitting Solr, or is there some caching (planned) on the client? Danke, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matthew Runo [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, May 29, 2008 12:50:25 PM Subject: Re: Announcement of Solr Javascript Client Wow. This is really pretty cool. You're much further along than I thought you were! I'd love to see this in as an 'official' Solr client. Thanks! Matthew Runo Software Developer Zappos.com 702.943.7833 On May 29, 2008, at 8:15 AM, Matthias Epheser wrote: The server was rebooted yesterday without my knowledge, so the jetty is restarted and should be reachable at http://lovo.test.dev.indoqa.com/mepheser/moobrowser/ As you can see, this first demo uses widget classes and is built with mootools.
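The allow-list idea discussed above (a filter or proxy that forwards only some request URLs) comes down to a path check before forwarding. The sketch below shows just that check, in Python rather than as a real servlet filter; the allowed path `/solr/select` and the helper name `is_allowed` are assumptions, and a path check alone is not complete protection.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; in practice it could be read from an external file.
ALLOWED_PATHS = {"/solr/select"}


def is_allowed(request_url, allowed_paths=ALLOWED_PATHS):
    """Return True only if the request path is on the allow-list."""
    path = urlsplit(request_url).path
    return path.rstrip("/") in allowed_paths


print(is_allowed("http://host/solr/select?q=foo"))                    # prints True
print(is_allowed("http://host/solr/update?stream.body=<delete/>"))    # prints False
```

A proxy servlet or filter would forward the request only when the check passes and reject everything else, for example with a 403 response.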
solr feed problem
Hello, I am trying to feed Solr with XML files that follow my own schema, and I am getting: SEVERE: org.xmlpull.v1.XmlPullParserException: entity reference names can not start with character '\ufffd' My XML is UTF-8 for sure, as is the text inside, but for some reason I get this exception and then Solr crashes. Any ideas? Best Regards, -C.B.
exception while feeding converted text from pdf
Hello, I made a simple Java program to convert my PDFs to text, and then to an XML file. I am getting a strange exception. I think the converted files have some errors. Should I encode the text string that I extract from the PDFs in a special way? Best, -C.B.
SEVERE: org.xmlpull.v1.XmlPullParserException: entity reference names can not start with character ' ' (position: START_TAG seen ...ay\n latitude 59 ... @80:64)
    at org.xmlpull.mxp1.MXParser.parseEntityRef(MXParser.java:2212)
    at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1275)
    at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
    at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
    at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
    at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
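The message "entity reference names can not start with character ' '" suggests, though the post does not confirm it, that the extracted text contains a raw `&` followed by a space, which an XML parser reads as the start of an entity reference. A minimal sketch of cleaning extracted text before putting it into the XML, assuming Python for the pre-processing step; the helper name `clean_for_xml` is made up:

```python
import re
from xml.sax.saxutils import escape

# Anything outside the character ranges that XML 1.0 allows in a document.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_for_xml(text):
    """Replace characters that are illegal in XML 1.0, then escape &, < and >."""
    return escape(_INVALID_XML_CHARS.sub(" ", text))


print(clean_for_xml("R&D <draft>"))  # prints R&amp;D &lt;draft&gt;
```

PDF extraction often leaves control characters such as form feeds in the text, which is why the sketch replaces illegal characters in addition to escaping the markup characters.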
Re: indexing pdf documents
Yes, I have seen the documentation on RichDocumentRequestHandler at the http://wiki.apache.org/solr/UpdateRichDocuments page. However, from what I understand this just feeds documents to Solr. How can I construct something like: document_id, document_name, document_text and feed it in? (I.e. my documents have labels.) Best. -C.B. On Tue, May 13, 2008 at 1:30 AM, Chris Harris [EMAIL PROTECTED] wrote: Solr does not have this support built in, but there's a patch for it: https://issues.apache.org/jira/browse/SOLR-284 On Mon, May 12, 2008 at 2:02 PM, Cam Bazz [EMAIL PROTECTED] wrote: Hello, Before making a little program to extract the text from my PDFs and feed it into Solr as XML, I just wanted to check whether Solr has the capability to digest PDF files in addition to XML. Best Regards, -C.B.
indexing pdf documents
Hello, Before making a little program to extract the text from my PDFs and feed it into Solr as XML, I just wanted to check whether Solr has the capability to digest PDF files in addition to XML. Best Regards, -C.B.