Porting from Solr 1.3 to 3.5
I am porting my app from Lucene 2.x (Solr 1.3) to Lucene 3.x (Solr 3.5), and have hit the following issue. This was valid in 2.x, but 3.5 throws an error:

IndexReader reader = IndexReader.open("/home/path/to/my/dataDir");

2.x accepted a string path, but 3.5 strictly wants a Directory object. Directory is abstract, and the only concrete class I could find to instantiate seemed to be RAMDirectory(). How do I go about this, and how do I point my reader at the desired directory?

P.S.: Our application needs custom logic this way, hence we do it this way instead of going with cores.

-- With Thanks and Regards, Ramprakash Ramamoorthy, Engineer Trainee, Zoho Corporation. +91 9626975420
Help! Confused about using Jquery for the Search query - Want to ditch it
Hi,

My current method of searching involves communicating with Solr using Python. The client's browser communicates with the search API using jQuery/JSON. This works, but I don't like the dependency on JavaScript. Either I keep this method and have a backup system in place for when JavaScript is disabled, or, better yet, I use a system that works both with and without JavaScript.

So I was thinking: instead of using the API and returning JSON to be interpreted by JavaScript, I could create a new handler to render the search results on the server and use POST to submit the query to the server.

So, if I wanted a fast and efficient method of querying Solr and returning the results, all without JavaScript enabled, what choices do I have? Your thoughts would be hugely appreciated because I'm new to this stuff.

James

-- View this message in context: http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr, I have perfomance problem for indexing.
What is your DB schema? Do you need to import all of the schema? (128 joined tables??) Or are the tables all independent? (If so, dump them out and import them using CSV.)

cheers lee c

On 7 June 2012 02:32, Jihyun Suh jhsuh.ourli...@gmail.com wrote:

Each table has 35,000 rows (35 thousand). I will check the log for each step of indexing. I run Solr 3.5.

2012/6/6 Jihyun Suh jhsuh.ourli...@gmail.com

I have 128 tables in MySQL 5.x and each table has 35,000 rows. When I start dataimport (indexing) in Solr, it takes 5 minutes for one table. But when Solr indexes the 20th table, it takes around 10 minutes per table, and by the 40th table around 20 minutes per table. Does Solr have a performance problem with too many documents? Should I set some configuration?
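Lee's suggestion of dumping independent tables to CSV and loading them through Solr's CSV update handler can be sketched roughly as below. This is a hedged example: the table rows and column names are made up, and in practice the rows would come from a MySQL cursor rather than a hard-coded list.

```python
import csv
import io

# Toy stand-in for rows fetched from one MySQL table
# (in practice these would come from cursor.fetchall()).
rows = [
    {"DocID": "1", "Title": "First doc", "url": "http://example.com/1"},
    {"DocID": "2", "Title": "Second doc", "url": "http://example.com/2"},
]

def rows_to_csv(rows, fieldnames):
    """Serialize rows to a CSV string suitable for Solr's CSV update handler."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_body = rows_to_csv(rows, ["DocID", "Title", "url"])
print(csv_body)
```

The resulting file could then be posted to Solr with something like `curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @table.csv -H 'Content-type:text/plain; charset=utf-8'` (hostname and single-core URL are assumptions for a stock install).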
Re: Porting from Solr 1.3 to 3.5
On Thu, Jun 7, 2012 at 1:18 PM, Ramprakash Ramamoorthy youngestachie...@gmail.com wrote:

> 2.X accepted a string, but 3.5 strictly wants a Directory object. How do I go about this and how do I point my reader to the desired directory?

I was able to do it. I just wrapped the path in an FSDirectory implementation:

IndexReader reader = IndexReader.open(new SimpleFSDirectory(new File("my/desired/path")));

Thanks for your time.

-- With Thanks and Regards, Ramprakash Ramamoorthy, Engineer Trainee, Zoho Corporation. +91 9626975420
Re: Help! Confused about using Jquery for the Search query - Want to ditch it
Further to my last reply, how about I do the following: send the request to the server using the GET method and then return the results in XML rather than JSON. Does this sound logical?
Re: Exception when optimizing index
Hi Jack,

it's a virtual machine running on VMware vSphere 5 Enterprise Plus. The machine has 30 GB vRAM, 8 vCPU cores at 3.0 GHz, and 2 TB SATA RAID-10 over iSCSI. The operating system is CentOS 6.2 64-bit. Here is the Java info:

- catalina.base = /usr/share/tomcat6
- catalina.home = /usr/share/tomcat6
- catalina.useNaming = true
- common.loader = ${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar
- file.encoding = UTF-8
- file.encoding.pkg = sun.io
- file.separator = /
- java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
- java.awt.printerjob = sun.print.PSPrinterJob
- java.class.path = /usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
- java.class.version = 50.0
- java.endorsed.dirs =
- java.ext.dirs = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/ext:/usr/java/packages/lib/ext
- java.home = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre
- java.io.tmpdir = /var/cache/tomcat6/temp
- java.library.path = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
- java.naming.factory.initial = org.apache.naming.java.javaURLContextFactory
- java.naming.factory.url.pkgs = org.apache.naming
- java.runtime.name = OpenJDK Runtime Environment
- java.runtime.version = 1.6.0_22-b22
- java.specification.name = Java Platform API Specification
- java.specification.vendor = Sun Microsystems Inc.
- java.specification.version = 1.6
- java.util.logging.config.file = /usr/share/tomcat6/conf/logging.properties
- java.util.logging.manager = org.apache.juli.ClassLoaderLogManager
- java.vendor = Sun Microsystems Inc.
- java.vendor.url = http://java.sun.com/
- java.vendor.url.bug = http://java.sun.com/cgi-bin/bugreport.cgi
- java.version = 1.6.0_22
- java.vm.info = mixed mode
- java.vm.name = OpenJDK 64-Bit Server VM
- java.vm.specification.name = Java Virtual Machine Specification
- java.vm.specification.vendor = Sun Microsystems Inc.
- java.vm.specification.version = 1.0
- java.vm.vendor = Sun Microsystems Inc.
- java.vm.version = 20.0-b11
- javax.sql.DataSource.Factory = org.apache.commons.dbcp.BasicDataSourceFactory
- line.separator =
- os.arch = amd64
- os.name = Linux
- os.version = 2.6.32-220.13.1.el6.x86_64
- package.access = sun.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.,sun.beans.
- package.definition = sun.,java.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.
- path.separator = :
- server.loader =
- shared.loader =
- sun.arch.data.model = 64
- sun.boot.class.path = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/resources.jar:rt.jar:sunrsasign.jar:jsse.jar:jce.jar:charsets.jar:netx.jar:plugin.jar:rhino.jar:modules/jdk.boot.jar (all under the same jre/lib):/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/classes
- sun.boot.library.path = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64
- sun.cpu.endian = little
- sun.cpu.isalist =
- sun.io.unicode.encoding = UnicodeLittle
- sun.java.command = org.apache.catalina.startup.Bootstrap start
- sun.java.launcher = SUN_STANDARD
- sun.jnu.encoding = UTF-8
- sun.management.compiler = HotSpot 64-Bit Tiered Compilers
- sun.os.patch.level = unknown
- tomcat.util.buf.StringCache.byte.enabled = true
- user.country = US
- user.dir = /usr/share/tomcat6
- user.home = /usr/share/tomcat6
- user.language = en
- user.name = tomcat
- user.timezone = Europe/Ljubljana

As far as I can see from the JIRA issue, I have the patch applied (as mentioned, I am running a trunk version from May 12). Any ideas? Many thanks!

On Wed, Jun 6, 2012 at 2:49 PM, Jack Krupansky j...@basetechnology.com wrote:

It could be related to https://issues.apache.org/jira/browse/LUCENE-2975. At least the exception comes from the same function:

Caused by: java.io.IOException: Invalid vInt detected (too many bits) at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)

What hardware and Java version are you running?

-- Jack Krupansky

-Original Message- From: Rok Rejc Sent: Wednesday,
Re: Solr, I have perfomance problem for indexing.
You haven't really told us much about what you're doing here. As Lee hints, we don't know the details of *how* you are doing this. But unless you're doing something odd, Solr shouldn't be the bottleneck here. Often when a database import is slow, the problem is in the data-acquisition bit; that is, your SQL query for some reason gets slow. That said, with DIH it can be hard to know exactly.

You might want to consider using SolrJ instead of DIH. We've found that as the import process gets more complex, using SolrJ is often easier. See: http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

Best
Erick

On Thu, Jun 7, 2012 at 5:26 AM, Lee Carroll lee.a.carr...@googlemail.com wrote:

What is your DB schema? Do you need to import all of the schema? (128 joined tables??) Or are the tables all independent? (If so, dump them out and import them using CSV.)
Solr, db connections remain after indexing a table.
I index many tables, which are declared as entities in data-config.xml. But after indexing one table, the DB connection remains open even though I set holdability=CLOSE_CURSORS_AT_COMMIT. How can I close the connection after indexing a table?

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://hostname/dbname" batchSize="2000"
      user="id" password="passwd" readOnly="true"
      transactionIsolation="TRANSACTION_READ_COMMITTED"
      holdability="CLOSE_CURSORS_AT_COMMIT"
      connectionTimeout="1" readTimeout="24" />
  <document name="doc">
    <entity name="testTbl_0" transformer="RegexTransformer" onError="continue"
        query="SELECT Title, url, DocID, substring_index(body,' ',2048) description FROM testTbl_0 WHERE status in ('1','s')">
      <field column="DocID" name="id" />
      <field column="Title" name="title_t" />
      <field column="description" name="contents_txt" />
      <field column="url" name="url" />
    </entity>
    <entity name="testTbl_1" transformer="RegexTransformer" onError="continue"
        query="SELECT Title, url, DocID, substring_index(body,' ',2048) description FROM testTbl_1 WHERE status in ('1','s')">
      <field column="DocID" name="id" />
      <field column="Title" name="title_t" />
      <field column="description" name="contents_txt" />
      <field column="url" name="url" />
    </entity>
  </document>
</dataConfig>

+-------+------+----------------+-----+---------+------+-------+------+
| Id    | User | Host           | db  | Command | Time | State | Info |
+-------+------+----------------+-----+---------+------+-------+------+
| 88757 | id | hostname:38843 | tmp | Sleep | 2268 | | NULL |
| 88758 | id | hostname:38844 | tmp | Sleep | 2196 | | NULL |
| 88759 | id | hostname:38845 | tmp | Sleep | 2134 | | NULL |
| 88760 | id | hostname:47822 | tmp | Sleep | 2074 | | NULL |
| 88761 | id | hostname:47823 | tmp | Sleep | 2013 | | NULL |
| 88762 | id | hostname:47824 | tmp | Sleep | 1953 | | NULL |
| 88763 | id | hostname:47825 | tmp | Sleep | 1896 | | NULL |
| 88764 | id | hostname:47826 | tmp | Sleep | 1838 | | NULL |
| 88765 | id | hostname:39795 | tmp | Sleep | 1778 | | NULL |
| 88766 | id | hostname:39796 | tmp | Sleep | 1717 | | NULL |
| 88767 | id | hostname:39797 | tmp | Sleep | 1658 | | NULL |
| 88768 | id | hostname:39798 | tmp | Sleep | 1594 | | NULL |
| 88769 | id | hostname:39799 | tmp | Sleep | 1535 | | NULL |
| 88770 | id | hostname:50275 | tmp | Sleep | 1470 | | NULL |
| 88771 | id | hostname:50276 | tmp | Sleep | 1411 | | NULL |
| 88772 | id | hostname:50277 | tmp | Sleep | 1352 | | NULL |
| 88773 | id | hostname:50278 | tmp | Sleep | 1291 | | NULL |
| 88774 | id | hostname:57385 | tmp | Sleep | 1165 | | NULL |
| 88775 | id | hostname:57386 | tmp | Sleep | 1044 | | NULL |
| 88776 | id | hostname:57387 | tmp | Sleep |  923 | | NULL |
| 88777 | id | hostname:53484 | tmp | Sleep |  801 | | NULL |
| 88778 | id | hostname:53485 | tmp | Sleep |  682 | | NULL |
| 88779 | id | hostname:58343 | tmp | Sleep |  560 | | NULL |
| 88780 | id | hostname:58344 | tmp | Sleep |  438 | | NULL |
| 88781 | id | hostname:58345 | tmp | Sleep |  314 | | NULL |
| 88782 | id | hostname:50474 | tmp | Sleep |  193 | | NULL |
| 88783 | id | hostname:50475 | tmp | Sleep |   72 | | NULL |
...
Re: Levenstein Distance
During the analysis phase you could add payloads to the terms using LevensteinDistance and then use that in conjunction with a PayloadSimilarity class (see [1] for an example), or just use a custom Similarity class which uses LevensteinDistance for scoring.

HTH,
Tommaso

[1]: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

2012/6/6 Gau gauravshe...@gmail.com

I have a list of synonyms which is being expanded at query time. This yields a lot of results (in the millions). My use case is name search. I want to sort the results by Levenshtein distance. I know this can be done with the strdist function, but sorting is inefficient, and going through a Solr function adds to its woes and kills performance. I want the results returned as quickly as possible.

One way I think this could work: apply strdist to the synonym file up front and get a score for each synonym, then use those scores to boost the results appropriately; that should be equivalent to ranking by Levenshtein distance. But I am not sure how to do this in Solr, or in fact whether Solr supports it.
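For reference, the plain Levenshtein (edit) distance that strdist and the proposed pre-scoring of the synonym file rely on can be computed with the standard dynamic-programming recurrence. A minimal sketch, independent of Solr (the example strings echo the "huup"/"huub" case from another thread here):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from a[:0] to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[len(b)]

print(levenshtein("huup", "huub"))  # one substitution -> 1
```

Note that strdist with the "edit" measure returns a normalized similarity rather than the raw count; dividing the distance by the longer string's length and subtracting from 1 gives a comparable score.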
Re: Solr, db connections remain after indexing a table.
I read someone's question and answer about DB connections. They said a DB connection stays alive for 10 minutes. But I started indexing (dataimport) over an hour ago, and all of the DB connections have remained open for that hour:

| 88757 | id | localhost:38843 | tmp | Sleep | 3696 | | NULL |
| 88758 | id | localhost:38844 | tmp | Sleep | 3624 | | NULL |

2012/6/7 Jihyun Suh jhsuh.ourli...@gmail.com

I index many tables, which are declared as entities in data-config.xml. But after indexing one table, the DB connection remains open even though I set holdability=CLOSE_CURSORS_AT_COMMIT. How can I close the connection after indexing a table? [...]
Re: filtering number and repeated contents
Thanks Jack, I will try an update processor. By the way, does Solr store tokenized content in fields if the field has stored=true?

On Tue, Jun 5, 2012 at 8:23 PM, Jack Krupansky j...@basetechnology.com wrote:

My (very limited) understanding of boilerpipe in Tika is that it strips out short text, which is great for all the menu and navigation text, but the typical disclaimer at the bottom of an email is not very short and can frequently be longer than the email message body itself. You may have to resort to a custom update processor that is programmed with some disclaimer signature text strings to be removed from field values.

-- Jack Krupansky

-Original Message- From: Mark , N Sent: Tuesday, June 05, 2012 8:28 AM To: solr-user@lucene.apache.org Subject: filtering number and repeated contents

Is it possible to filter out numbers and disclaimers (repeated content) while indexing to Solr? These are all surplus information and I do not want to index them. I have tried using the boilerpipe algorithm to remove surplus information from web pages, such as navigational elements, templates, and advertisements. I think it works well, but I am looking to see if I can filter out disclaimer text too, mainly in email texts.

-- Thanks, *Nipen Mark *
Re: Help! Confused about using Jquery for the Search query - Want to ditch it
Final comment from me, then I'll let someone else speak. The solution we seem to be looking at is: send a GET request to Solr and then send back a rendered page, so we are basically creating the results page on the server rather than on the client side.

I would really like to hear what people have to say about this. Is this a good idea? Are there any major disadvantages? It seems like the only way to have a reliable search site that works without JavaScript.
how to work with solr
Hi all, can anybody suggest how to work with Solr in a web application? Please send me the information.

Regards, Raja
Re: ERROR 400 undefined field
Am 07.06.2012 09:55, schrieb sheethal shreedhar:

http://localhost:8983/solr/select/?q=fruit&version=2.2&start=0&rows=10&indent=on

I get HTTP ERROR 400. Problem accessing /solr/select/. Reason: undefined field text

Look at your schema.xml. You'll find a line like this:

<defaultSearchField>text</defaultSearchField>

Replace text with a field that's defined somewhere in schema.xml. Or change your query to use an explicit field name, like this:

http://localhost:8983/solr/select/?q=somefield:fruit

Or use the (e)dismax handler and configure it accordingly. See http://wiki.apache.org/solr/DisMaxRequestHandler.

Greetings, Kuli
Hiring multiple Lucene/Solr Engineers, Leads, and Architects
Hi,

Best Buy is building a new search platform/ecosystem powered by Lucene/Solr. We are hiring multiple Lucene/Solr engineers, tech leads, and architects, both full-time and consulting, based in Minneapolis, MN. This is a long-term project and the team is fun to work with. Please reach out to me if you are interested at venkat.amb...@bestbuy.com

Thanks,
Venkat Ambati
Sr. Manager, Digital Commerce Tower, GBS IT, Best Buy
RE: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults
Hi,

The search is distributed over all shards. The problem exists locally as well.

Thanks,

-Original message- From: Jack Krupansky j...@basetechnology.com Sent: Wed 06-Jun-2012 17:07 To: solr-user@lucene.apache.org Subject: Re: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

Do single-word queries return hits? Is this a multi-shard environment? Does the request list all the shards needed to give hits for all the collations you expect? Maybe the queries are being done locally and don't have hits for the collations locally.

-- Jack Krupansky

-Original Message- From: Markus Jelsma Sent: Wednesday, June 06, 2012 6:21 AM To: solr-user@lucene.apache.org Subject: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

Hi,

We've had some issues with a bad zero-hits collation being returned for a two-word query where one word was only one edit away from the required collation. With spellcheck.maxCollations set to a reasonable number we saw the various suggestions without the required collation. We decreased thresholdTokenFrequency to make it appear in the list of collations. However, with collateExtendedResults=true the hits field for each collation was zero, which is incorrect. The required collation is "huub stapel" (two hits) and q=huup stapel:

"collation":{ "collationQuery":"heup stapel", "hits":0, "misspellingsAndCorrections":{ "huup":"heup"}},
"collation":{ "collationQuery":"hugo stapel", "hits":0, "misspellingsAndCorrections":{ "huup":"hugo"}},
"collation":{ "collationQuery":"hulp stapel", "hits":0, "misspellingsAndCorrections":{ "huup":"hulp"}},
"collation":{ "collationQuery":"hup stapel", "hits":0, "misspellingsAndCorrections":{ "huup":"hup"}},
"collation":{ "collationQuery":"huub stapel", "hits":0, "misspellingsAndCorrections":{ "huup":"huub"}},
"collation":{ "collationQuery":"huur stapel", "hits":0, "misspellingsAndCorrections":{ "huup":"huur"}}

Now, with maxCollationTries set to 3 or higher we finally get the required collation, which is the only collation able to return results. How can we determine the best value for maxCollationTries with respect to decreasing thresholdTokenFrequency? And why is hits always zero? This is with a build from today and distributed search enabled.

Thanks, Markus
Re: Solr 4.0 Clean Commit for production use
Thanks everyone!
RE: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults
Hello!

-Original message- From: Dyer, James james.d...@ingrambook.com Sent: Wed 06-Jun-2012 17:23 To: solr-user@lucene.apache.org Subject: RE: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

> Markus, with maxCollationTries=0 it is not going out and querying the collations to see how many hits they each produce, so it doesn't know the number of hits. That is why, if you also specify collateExtendedResults=true, all the hit counts are zero. It would probably be better in this case if it did not report hits in the extended response at all. (On the other hand, if you're seeing zeros and maxCollationTries>0, then you've hit a bug!)

I see. It would indeed make sense to get rid of the hits field when it's always zero anyway with maxCollationTries=0. Despite your recent explanations it raises some confusion.

> thresholdTokenFrequency, in my opinion, is a pretty blunt instrument for getting rid of bad suggestions. It takes out all of the rare terms, presuming that if a term is rare in the data it is either a mistake or isn't worthy of ever being suggested. But if you're using maxCollationTries, the suggestions that don't fit will be filtered out automatically, making thresholdTokenFrequency less necessary. (On the other hand, if you're using IndexBasedSpellChecker, thresholdTokenFrequency will make the dictionary smaller and spellcheck.build run faster... This is solved entirely in 4.0 with DirectSolrSpellChecker...)

I forgot to mention this is with the DirectSolrSpellChecker. I guess we'll just have to try working with thresholdTokenFrequency. It's difficult, however, because the index will grow, and chances are that at some point a rare, but correct, token drops below the threshold and is not suggested anymore. We also see benefit from the threshold, since our index is human-edited and contains rare but misspelled words.

> For the apps here, I've been using maxCollationTries=10 and have been getting good results. Keep in mind that even though you're allowing it to try up to 10 queries to find a viable collation, as long as you're setting maxCollations to something low it will (hopefully) seldom need to try more than a couple before finding one with hits. (I always ask for only 1 collation, as we just re-apply the spelling correction automatically if the original query returned nothing.) Also, if spellcheck.count is low it might not have enough terms available to try, so you might need to raise this value as well when raising maxCollationTries.

We have a similar set-up and require only one collation to be returned. I can increase maxCollationTries.

> The worse problem, in my opinion, is the fact that it won't ever suggest words if they're in the index (even if using thresholdTokenFrequency to remove them from the dictionary). For that there is https://issues.apache.org/jira/browse/SOLR-2585, which is part of Solr 4. The only other workaround is onlyMorePopular, which has its own issues (see http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount).

We don't really like onlyMorePopular, since more hits is not always a better suggestion. We decided to turn it off quite some time ago, also because of SOLR-2555. alternativeTermCount may indeed be a solution. Thanks, we'll manage for now.

> James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311
Re: how to work with solr
What language environment are you using? PHP, Python, Ruby, other? Each has its own interface. But ultimately Solr is just another web service with an HTTP and XML or JSON interface. So, it is mostly a question of how your client environment accesses web services that have an HTTP and XML or JSON interface.

There is a little info here: http://www.lucidimagination.com/blog/2010/01/14/solr-search-user-interface-examples/

-- Jack Krupansky

-Original Message- From: sdssfour Sent: Thursday, June 07, 2012 7:38 AM To: solr-user@lucene.apache.org Subject: how to work with solr

Hi all, can anybody suggest how to work with Solr in a web application? Please send me the information.

Regards, Raja
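Whatever the client environment, the JSON responses Jack mentions are straightforward to consume. A minimal sketch in Python; note the response dict below is a hand-made stand-in for what a real call to Solr's select handler with wt=json would return, not output from an actual server:

```python
import json

# Hand-made stand-in for a Solr wt=json response body.
raw = json.dumps({
    "responseHeader": {"status": 0, "QTime": 1},
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"id": "1", "title_t": "First hit"},
            {"id": "2", "title_t": "Second hit"},
        ],
    },
})

# A real client would receive `raw` over HTTP, then do exactly this:
data = json.loads(raw)
docs = data["response"]["docs"]
titles = [d["title_t"] for d in docs]
print(data["response"]["numFound"], titles)
```

The same shape applies in PHP, Ruby, etc.: fetch the select URL, decode the JSON, and walk response.docs.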
Re: filtering number and repeated contents
Solr (Lucene, actually) stores the source form of the data that was fed to Solr, so it is not tokenized and will include all punctuation and whitespace.

-- Jack Krupansky

-Original Message- From: Mark , N Sent: Thursday, June 07, 2012 7:45 AM To: solr-user@lucene.apache.org Subject: Re: filtering number and repeated contents

Thanks Jack, I will try an update processor. By the way, does Solr store tokenized content in fields if the field has stored=true?
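The stored-versus-indexed distinction Jack describes can be illustrated with a toy analyzer. This is a conceptual sketch only; the regex tokenizer below is not Solr's actual analysis chain, just a stand-in for a lowercasing, word-splitting field type:

```python
import re

source = "The Quick, Brown Fox!"

# "stored": Solr keeps the original input verbatim, punctuation and all;
# this is what comes back in the fl field list of search results.
stored_value = source

# "indexed": what a simple lowercase/word-tokenizing analyzer might emit;
# these terms are what queries actually match against.
indexed_terms = re.findall(r"\w+", source.lower())

print(stored_value)
print(indexed_terms)
```

So a stored=true field returns the raw text with whitespace and punctuation intact, while the tokenized terms exist only in the index.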
Re: Boost by Nested Query / Join Needed?
Thanks for your reply. I think the number could eventually get very large (~1B) as our customer base grows, since each customer could possibly have a preference for each candy, but currently we're looking at around 50M.

I've looked at the SOLR-2272 patch for joins, which looks as though it might fit the bill, but I don't want to ignore an underlying scalability issue if my schema organization doesn't make sense. Also, it has recently been brought to my attention that it might be problematic if preferences are updated frequently, which they will be ('candy' records will not be). If it helps things at all, I never have to do any *direct* searches (just indirect, join-type referencing) on the preference data.

Does it make more sense to try to index preference data in a separate core and use another (non-nested) query to obtain it? I had thought of trying a nested query with the query function query, but I need the 'candy' id from the initial query, which amounts to join-like behavior.

Thanks again for your guidance,
-Nick
Re: Help! Confused about using Jquery for the Search query - Want to ditch it
This is a bad idea. Solr is not designed to be exposed to arbitrary internet traffic and attacks. The best design is to have a front end server make requests to Solr, then use those to make HTML pages. wunder On Jun 7, 2012, at 4:49 AM, Spadez wrote: Final comment from me, then I'll let someone else speak. The solution we seem to be looking at is to send a GET request to Solr and then send back a rendered page, so we are basically creating the results page on the server rather than the client side. I would really like to hear what people have to say about this. Is this a good idea? Are there any major disadvantages? It seems like the only way to go to have a reliable search site which works without Javascript. -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988158.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr replication lag
Hello, Boris, If I remember correctly, older versions of Solr report the version of the as-of-yet uncommitted core in the replication page. So if you did a commit on the master and then a replication, you'd see that version on the client. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Thu, Jun 7, 2012 at 3:53 AM, Boris Vorotnikov bori...@auto.ru wrote: Hello, My name is Boris Vorotnikov. I am a developer of the project parts.auto.ru. My team uses Solr v3.5 in this project. Several days ago I noticed that replication between master and slave had a time lag. There were no such lags before. I tried to find the source of the trouble but was unsuccessful. All I found is that the master reports a more recent index version than it actually has. And when I press the Replicate now button on the slave, it receives information about an index that it already has and does nothing. This is the configuration of master and slave replication: requestHandler name=/replication class=solr.ReplicationHandler lst name=master str name=enable${enable.master:false}/str str name=replicateAfterstartup/str str name=replicateAfteroptimize/str str name=confFilesschema.xml,stopwords.txt,synonyms.txt,spellings.txt/str /lst lst name=slave str name=enable${enable.slave:false}/str str name=masterUrlhttp://__solr.master_url__:__solr.port__/solr/parts/replication/str str name=pollInterval01:00:00/str /lst str name=maxNumberOfBackups3/str /requestHandler enable.master is a command-line parameter. It works. Solr replicates only once a day, when we do index optimization by cron. Is there any parameter responsible for keeping the index version, or something else? Best regards, Boris Vorotnikov Developer auto.ru Tel: +7 (499) 780 3780 # 424 E-mail: bori...@auto.ru
Re: Help! Confused about using Jquery for the Search query - Want to ditch it
And keep Solr behind a firewall or authentication or, even better, both! People *will* find and exploit your Solr installation. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Thu, Jun 7, 2012 at 10:31 AM, Walter Underwood wun...@wunderwood.org wrote: This is a bad idea. Solr is not designed to be exposed to arbitrary internet traffic and attacks. The best design is to have a front end server make requests to Solr, then use those to make HTML pages. wunder On Jun 7, 2012, at 4:49 AM, Spadez wrote: Final comment from me, then I'll let someone else speak. The solution we seem to be looking at is to send a GET request to Solr and then send back a rendered page, so we are basically creating the results page on the server rather than the client side. I would really like to hear what people have to say about this. Is this a good idea? Are there any major disadvantages? It seems like the only way to go to have a reliable search site which works without Javascript. -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988158.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Exception when optimizing index
Is the index otherwise usable for queries? And it is only the optimize that is failing? I suppose it is possible that the index could be corrupted, but it is also possible that there is a bug in Lucene. I would suggest running Lucene CheckIndex next. See what it has to say. See: https://builds.apache.org/job/Lucene-trunk/javadoc/core/org/apache/lucene/index/CheckIndex.html#main(java.lang.String[]) -- Jack Krupansky -Original Message- From: Rok Rejc Sent: Thursday, June 07, 2012 5:50 AM To: solr-user@lucene.apache.org Subject: Re: Exception when optimizing index Hi Jack, its the virtual machine running on a VMware vSphere 5 Enterprise Plus. Machine has 30 GB vRAM, 8 core vCPU 3.0 GHz, 2 TB SATA RAID-10 over iSCSI. Operation system is CentOS 6.2 64bit. Here are java infos: - catalina.base/usr/share/tomcat6 - catalina.home/usr/share/tomcat6 - catalina.useNamingtrue - common.loader ${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar - file.encodingUTF-8 - file.encoding.pkgsun.io - file.separator/ - java.awt.graphicsenvsun.awt.X11GraphicsEnvironment - java.awt.printerjobsun.print.PSPrinterJob - java.class.path /usr/share/tomcat6/bin/bootstrap.jar /usr/share/tomcat6/bin/tomcat-juli.jar/usr/share/java/commons-daemon.jar - java.class.version50.0 - java.endorsed.dirs - java.ext.dirs /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/ext /usr/java/packages/lib/ext - java.home/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre - java.io.tmpdir/var/cache/tomcat6/temp - java.library.path /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64 /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64 /usr/java/packages/lib/amd64/usr/lib64/lib64/lib/usr/lib - java.naming.factory.initial org.apache.naming.java.javaURLContextFactory - java.naming.factory.url.pkgsorg.apache.naming - java.runtime.nameOpenJDK Runtime Environment - 
java.runtime.version1.6.0_22-b22 - java.specification.nameJava Platform API Specification - java.specification.vendorSun Microsystems Inc. - java.specification.version1.6 - java.util.logging.config.file /usr/share/tomcat6/conf/logging.properties - java.util.logging.managerorg.apache.juli.ClassLoaderLogManager - java.vendorSun Microsystems Inc. - java.vendor.urlhttp://java.sun.com/ - java.vendor.url.bughttp://java.sun.com/cgi-bin/bugreport.cgi - java.version1.6.0_22 - java.vm.infomixed mode - java.vm.nameOpenJDK 64-Bit Server VM - java.vm.specification.nameJava Virtual Machine Specification - java.vm.specification.vendorSun Microsystems Inc. - java.vm.specification.version1.0 - java.vm.vendorSun Microsystems Inc. - java.vm.version20.0-b11 - javax.sql.DataSource.Factory org.apache.commons.dbcp.BasicDataSourceFactory - line.separator - os.archamd64 - os.nameLinux - os.version2.6.32-220.13.1.el6.x86_64 - package.access sun.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.,sun.beans. - package.definition sun.,java.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper. 
- path.separator: - server.loader - shared.loader - sun.arch.data.model64 - sun.boot.class.path /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/resources.jar /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/sunrsasign.jar /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jsse.jar /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jce.jar /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/charsets.jar /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/netx.jar /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/plugin.jar /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rhino.jar /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/modules/jdk.boot.jar /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/classes - sun.boot.library.path /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64 - sun.cpu.endianlittle - sun.cpu.isalist - sun.io.unicode.encodingUnicodeLittle - sun.java.commandorg.apache.catalina.startup.Bootstrap start - sun.java.launcherSUN_STANDARD - sun.jnu.encodingUTF-8 - sun.management.compilerHotSpot 64-Bit Tiered Compilers - sun.os.patch.levelunknown - tomcat.util.buf.StringCache.byte.enabledtrue - user.countryUS - user.dir/usr/share/tomcat6 - user.home/usr/share/tomcat6 - user.languageen - user.nametomcat - user.timezoneEurope/Ljubljana As far as I see from the JIRA issue I have the patch attached (as mentioned I have a trunk version from May 12). Any ideas? Many thanks! On Wed, Jun 6, 2012 at 2:49 PM, Jack
Re: How to cap facet counts beyond a specified limit
Sounds like an interesting improvement to propose. It will also depend on various factors, such as number of unique terms in a field, field type, etc. Which field types are giving you the most trouble and how many unique values do they have? And do you specify a facet.method or just let it default? What release of Solr are you on? Are you using trie for numeric fields? Are these mostly string fields? Any boolean fields? -- Jack Krupansky -Original Message- From: Andrew Laird Sent: Thursday, June 07, 2012 4:01 AM To: solr-user@lucene.apache.org Subject: How to cap facet counts beyond a specified limit We have an index with ~100M documents and I am looking for a simple way to speed up faceted searches. Is there a relatively straightforward way to stop counting the number of matching documents beyond some specifiable value? For our needs we don't really need to know that a particular facet has exactly 14,203,527 matches - just knowing that there are more than a million is enough. If I could somehow limit the hit counts to a million (say) it seems like that could decrease the work required to compute the values (just stop counting after the limit is reached) and potentially improve faceted search time - especially when we have 20-30 fields to facet on. Has anyone else tried to do something like this? Many thanks for comments and info, Sincerely, andy laird | gettyimages | 206.925.6728
return *all* words at Levenshtein distance = N from query word
Hi all, I am wondering if Solr can return me all words in my text corpus that have a given Levenshtein distance from my query word. Possible? Difficult? Cheers, Giovanni
Re: return *all* words at Levenshtein distance = N from query word
I would debug somewhere close to the FuzzyQuery. Lucene is doing exactly that (just as PrefixQueries are doing): expanding a FuzzyQuery (PrefixQuery) to a disjunction of term queries for the words that match that fuzzy or prefix query. Maybe it helps you start? paul On 7 June 2012 at 18:15, Giovanni Gherdovich wrote: Hi all, I am wondering if Solr can return me all words in my text corpus that have a given Levenshtein distance from my query word. Possible? Difficult? Cheers, Giovanni
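To make the expansion concrete, here is a small plain-Python sketch (outside Solr entirely) of what it amounts to: compute the Levenshtein distance of the query word against every term in the corpus vocabulary and keep the terms within N. FuzzyQuery does effectively this against the index's term dictionary, just far more efficiently:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def words_within(vocabulary, query, n):
    """All vocabulary terms at edit distance <= n from the query word."""
    return sorted(w for w in vocabulary if levenshtein(query, w) <= n)
```

A brute-force scan like this is fine for a small vocabulary; the point of debugging near FuzzyQuery, as suggested above, is to reuse Lucene's much faster term-dictionary traversal.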
Re: Help! Confused about using Jquery for the Search query - Want to ditch it
Thank you for the reply, but I'm afraid I don't understand :( This is how things are set up. On my Python website, I have a keyword and location box. When the form is submitted, it queries the server via a JavaScript GET request, which then sends back the data as JSON. I'm saying that I don't want to be reliant on JavaScript. So I'm confused about the best way to not only send the request to the Solr server, but also how to receive the data. My guess is that a GET request without JavaScript is the right way to send the request to the Solr server, but then what should Solr be spitting out the other end, just an XML file? Then is the idea that my Python site would receive this XML data and display it on the site? -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988246.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boost by Nested Query / Join Needed?
For posterity, I think we're going to remove 'preference' data from Solr indexing and go in the custom Function Query direction with a key-value store. -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-by-Nested-Query-Join-Needed-tp3987818p3988255.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Help! Confused about using Jquery for the Search query - Want to ditch it
I'm new to Solr...but this is more of a web programming question...so I can get in on this :). Your only option to get the data from Solr sans JavaScript is to use Python to pull the results BEFORE the client loads the page. So, if you are asking if you can get AJAX-like results (an already loaded page pulling info from your Solr server) but without using Javascript...no, you cannot do that. You might be able to hack something ugly together using iframes, but trust me, you don't want to. It will look bad, it won't work well, and interacting with data in an iframe is nightmarish. So, basically, if you don't want to use Javascript, your only option is a total page reload every time you need to query Solr (which you then query on the Python side.) -Original Message- From: Spadez [mailto:james_will...@hotmail.com] Sent: Thursday, June 07, 2012 11:37 AM To: solr-user@lucene.apache.org Subject: Re: Help! Confused about using Jquery for the Search query - Want to ditch it Thank you for the reply, but I'm afraid I don't understand :( This is how things are set up. On my Python website, I have a keyword and location box. When the form is submitted, it queries the server via a JavaScript GET request, which then sends back the data as JSON. I'm saying that I don't want to be reliant on JavaScript. So I'm confused about the best way to not only send the request to the Solr server, but also how to receive the data. My guess is that a GET request without JavaScript is the right way to send the request to the Solr server, but then what should Solr be spitting out the other end, just an XML file? Then is the idea that my Python site would receive this XML data and display it on the site? -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988246.html Sent from the Solr - User mailing list archive at Nabble.com.
Quincy and its subsidiaries do not discriminate in the sale of advertising in any medium (broadcast, print, or internet), and will accept no advertising which is placed with an intent to discriminate on the basis of race or ethnicity.
RE: Help! Confused about using Jquery for the Search query - Want to ditch it
Hi Ben, Thank you for the reply. So, If I don't want to use Javascript and I want the entire page to reload each time, is it being done like this? 1. User submits form via GET 2. Solr server queried via GET 3. Solr server completes query 4. Solr server returns XML output 5. XML data put into results page 6. User shown new results page Is this basically how it would work if we wanted Javascript out of the equation? Regards, James -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988272.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Help! Confused about using Jquery for the Search query - Want to ditch it
On 6/7/2012 1:53 PM, Spadez wrote: Hi Ben, Thank you for the reply. So, If I don't want to use Javascript and I want the entire page to reload each time, is it being done like this? 1. User submits form via GET 2. Solr server queried via GET 3. Solr server completes query 4. Solr server returns XML output 5. XML data put into results page 6. User shown new results page Is this basically how it would work if we wanted Javascript out of the equation? Seems to me that you'd still have to have Javascript turn the XML into HTML -- unless you use the XsltResponseWriter (http://wiki.apache.org/solr/XsltResponseWriter) to use XSLT to turn the raw XML into your actual results HTML. The other option is to create a python page that does the call to Solr and spits out just the HTML for your results, then call THAT rather than calling Solr directly. Nick
replication start notification
Is there a programmatic way or otherwise to become aware when the replication operation starts? In looking at the source for ReplicationHandler, there aren't log statements to indicate that it started. Thanks, Jon
Re: Help! Confused about using Jquery for the Search query - Want to ditch it
On Thu, Jun 7, 2012 at 1:59 PM, Nick Chase nch...@earthlink.net wrote: The other option is to create a python page that does the call to Solr and spits out just the HTML for your results, then call THAT rather than calling Solr directly. This is the *only* option if you're listening to Walter and I. Don't give end users direct access to your Solr box! Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com
Re: Help! Confused about using Jquery for the Search query - Want to ditch it
+1 on that! If you do want to provide direct results, ALWAYS send requests through a proxy that can verify that a) all requests are coming from your web app, and b) only acceptable queries are being passed on. Nick On 6/7/2012 2:50 PM, Michael Della Bitta wrote: On Thu, Jun 7, 2012 at 1:59 PM, Nick Chasench...@earthlink.net wrote: The other option is to create a python page that does the call to Solr and spits out just the HTML for your results, then call THAT rather than calling Solr directly. This is the *only* option if you're listening to Walter and I. Don't give end users direct access to your Solr box!
RE: Help! Confused about using Jquery for the Search query - Want to ditch it
Yes (or, at least, I think I understand what you are saying, haha.) Let me clarify. 1. Client sends GET request to web server 2. Web server (via Python, in your case, if I remember correctly) queries Solr Server 3. Solr server sends response to web server 4. You take that data and put it into the page you are creating server-side 5. Server returns static page to client -Original Message- From: Spadez [mailto:james_will...@hotmail.com] Sent: Thursday, June 07, 2012 12:53 PM To: solr-user@lucene.apache.org Subject: RE: Help! Confused about using Jquery for the Search query - Want to ditch it Hi Ben, Thank you for the reply. So, If I don't want to use Javascript and I want the entire page to reload each time, is it being done like this? 1. User submits form via GET 2. Solr server queried via GET 3. Solr server completes query 4. Solr server returns XML output 5. XML data put into results page 6. User shown new results page Is this basically how it would work if we wanted Javascript out of the equation? Regards, James -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988272.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Help! Confused about using Jquery for the Search query - Want to ditch it
Thank you, that helps. The bit I am still confused about is how the Solr server sends the response back to the web server, though. I get the impression that there are different ways this could be done, but is sending an XML response back to the Python server the best way to do it? -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988302.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Help! Confused about using Jquery for the Search query - Want to ditch it
As far as I know, it is the only way to do this. Look around a bit, Python (or PHP, or C, etc., etc.) is able to act as an HTTP client...in fact, that is the most common way that web services are consumed. But, we are definitely beyond the scope of the Solr list at this point. -Original Message- From: Spadez [mailto:james_will...@hotmail.com] Sent: Thursday, June 07, 2012 2:09 PM To: solr-user@lucene.apache.org Subject: RE: Help! Confused about using Jquery for the Search query - Want to ditch it Thank you, that helps. The bit I am still confused about how the server sends the response to the server though. I get the impression that there are different ways that this could be done, but is sending an XML response back to the Python server the best way to do this? -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988302.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Help! Confused about using Jquery for the Search query - Want to ditch it
But, check out things like httplib2 and urllib2. -Original Message- From: Spadez [mailto:james_will...@hotmail.com] Sent: Thursday, June 07, 2012 2:09 PM To: solr-user@lucene.apache.org Subject: RE: Help! Confused about using Jquery for the Search query - Want to ditch it Thank you, that helps. The bit I am still confused about how the server sends the response to the server though. I get the impression that there are different ways that this could be done, but is sending an XML response back to the Python server the best way to do this? -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988302.html Sent from the Solr - User mailing list archive at Nabble.com.
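Putting the advice in this thread together, a minimal server-side sketch in Python 3 might look like the following. The host, port, and field names are placeholder assumptions, and it uses Solr's JSON response writer (wt=json) rather than XML purely because it is less code to parse; the browser never talks to Solr at all:

```python
import json
import urllib.parse
import urllib.request

# Placeholder URL -- keep this host behind a firewall, per the thread.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_query_url(keyword, location, rows=10):
    """Build the Solr select URL the web tier (not the browser) will call."""
    params = {
        "q": keyword,
        "fq": "location:%s" % location,  # a 'location' field is an assumption
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)

def search(keyword, location):
    """Query Solr server-side and return the matching documents."""
    with urllib.request.urlopen(build_query_url(keyword, location)) as resp:
        body = json.load(resp)
    return body["response"]["docs"]
```

The web application then renders these docs into HTML itself and returns a complete page, so no JavaScript is required on the client.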
Re: timeAllowed flag in the response
Are you requesting a large number of rows? If so, request smaller chunks, like ten at a time. Then you can show those with a waiting note. wunder On Jun 7, 2012, at 1:14 PM, Laurent Vaills wrote: Hi everyone, We have some grouping queries that are quite long to execute. Some are too long to execute and are not acceptable. We have setup timeout for the socket but with this we get no result and the query is still running on the Solr side. So, we are now using the timeAllowed parameter which is a good compromise. However, in the response, how can we know that the query was stopped because it was too long ? I need this information for monitoring and to tell the user that the results are not complete. Regards, Laurent
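On the original question of detecting the early stop: when timeAllowed is exceeded, Solr is expected to mark the response header with a partialResults flag, though availability may depend on your Solr version, so verify against your own responses. A sketch of checking it from an already-parsed JSON response:

```python
def is_partial(solr_response):
    """True if Solr stopped the search early (e.g. timeAllowed exceeded).

    'partialResults' in the responseHeader is the flag Solr is expected
    to set; confirm the exact key against your Solr version's output.
    """
    header = solr_response.get("responseHeader", {})
    return bool(header.get("partialResults", False))
```

This gives monitoring and the UI a single place to decide whether to warn the user that results are incomplete.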
Re: replication start notification
SOLR-1855 has a script that checks replication details: /solr/${CORE}/replication?command=details # Get the last time the core replicated correctly. # Get the last time the core failed to replicate. # Is this core replicating (aka pulling index from master) right now? See: https://issues.apache.org/jira/browse/SOLR-1855 -- Jack Krupansky -Original Message- From: Jon Kirton Sent: Thursday, June 07, 2012 2:30 PM To: solr-user@lucene.apache.org Subject: replication start notification Is there a programmatic way or otherwise to become aware when the replication operation starts? In looking at the source for ReplicationHandler, there aren't log statements to indicate that it started. Thanks, Jon
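A rough Python version of that check, for slaves exposing the details command. The key names ('details', 'slave', 'isReplicating') follow typical responses but should be verified against your own instance:

```python
import json
import urllib.request

def parse_is_replicating(details_response):
    """Pull the isReplicating flag out of a parsed details response.

    Note: Solr tends to report booleans in this handler as the strings
    "true"/"false", and the exact keys may vary by version.
    """
    slave = details_response.get("details", {}).get("slave", {})
    return str(slave.get("isReplicating", "false")).lower() == "true"

def is_replicating(core_url):
    """Ask a slave's ReplicationHandler whether a pull is in progress.

    core_url example (hypothetical): http://slave:8983/solr/core0
    """
    url = core_url + "/replication?command=details&wt=json"
    with urllib.request.urlopen(url) as resp:
        return parse_is_replicating(json.load(resp))
```

Polling this from cron or a monitoring script is effectively what the SOLR-1855 script does.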
PorterStemmerTokenizerFactory ?
I've read different suggestions on how to handle cases where synonyms are used and there are multiple versions of the original word that need to point to the same set of synonyms (/responsibility, responsibilities, obligation, duty/). The approach that seems most logical is to configure a SynonymFilterFactory to use a custom TokenizerFactory that stems synonyms by calling out to the PorterStemmer. Does anyone know if a PorterStemmerTokenizerFactory already exists somewhere? Thank you. Carrie Coy
Re: PorterStemmerTokenizerFactory ?
Look at the text_en field type in the Solr 3.6 example schema. -- Jack Krupansky -Original Message- From: Carrie Coy Sent: Thursday, June 07, 2012 5:04 PM To: solr-user@lucene.apache.org Subject: PorterStemmerTokenizerFactory ? I've read different suggestions on how to handle cases where synonyms are used and there are multiple version of the original word that need to point to the same set of synonyms (/responsibility, responsibilities, obligation, duty/ ). The approach that seems most logical is to configure a SynonymFilterFactory to use a custom TokenizerFactory that stems synonyms by calling out to the PorterStemmer. Does anyone know if a PorterStemmerTokenizerFactory already exists somewhere? Thank you. Carrie Coy
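For orientation, one way this is commonly wired up, sketched below: SynonymFilterFactory takes a tokenizerFactory attribute that controls how the entries in synonyms.txt are tokenized, so a custom stemming factory can be plugged in there (the com.example class name is hypothetical; it is the factory Carrie describes, which does not ship with Solr). The token stream itself then needs stemming before the synonym filter so that both sides are compared in stemmed form. Treat the fragment as an illustrative sketch, not a drop-in copy:

```xml
<!-- Illustrative sketch only; com.example.StemmingTokenizerFactory is
     the custom class discussed in this thread and does not ship with
     Solr. It would stem each entry as synonyms.txt is parsed. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stem the token stream first so it matches the stemmed rules. -->
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="com.example.StemmingTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Comparing this against the filter ordering in the example schema's text_en type, as Jack suggests, is a good sanity check.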
Re: Filter query vs Facets
You may want to read the faceting overview: http://wiki.apache.org/solr/SolrFacetingOverview -- Jack Krupansky -Original Message- From: Swetha Shenoy Sent: Thursday, June 07, 2012 5:24 PM To: solr-user@lucene.apache.org Subject: Filter query vs Facets Hi, I had a question regarding the filter query (fq) and faceted search. Both are used to filter search results. Can someone tell me how they are different and when you would use one over the other? Thanks, Swetha
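In short: fq restricts which documents come back (and is cached independently of the main query), while facet.field leaves the result set alone and reports a count per value. A sketch of the two parameter sets, assuming a hypothetical 'category' field:

```python
import urllib.parse

# Filter query: only documents whose category is "books" are returned.
filter_params = urllib.parse.urlencode({
    "q": "harry potter",
    "fq": "category:books",
})

# Faceting: all matches are returned, plus a count per category value,
# which a UI typically renders as clickable filters.
facet_params = urllib.parse.urlencode({
    "q": "harry potter",
    "facet": "true",
    "facet.field": "category",
})
```

A typical UI combines them: render the facet counts first, then, when the user clicks one, reissue the query with the corresponding fq.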
ContentStreamUpdateRequest method addFile in 4.0 release.
In the latest 4.0 release, the addFile() method has a new argument 'contentType': addFile(File file, String contentType) In the context of Solr Cell, how should the addFile() method be called? Specifically I refer to the Wiki example: ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract"); up.addFile(new File("mailing_lists.pdf")); up.setParam("literal.id", "mailing_lists.pdf"); up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); result = server.request(up); assertNotNull("Couldn't upload mailing_lists.pdf", result); rsp = server.query( new SolrQuery( "*:*") ); Assert.assertEquals( 1, rsp.getResults().getNumFound() ); given at URL: http://wiki.apache.org/solr/ExtractingRequestHandler Since Solr Cell is calling Tika under the hood, isn't the file content type already identified by Tika? Looking at the code, it seems passing null would do the job; is that correct? Also for Solr Cell, is the ContentStreamUpdateRequest class the right one to use, or is there a different class that is more appropriate here? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/ContentStreamUpdateRequest-method-addFile-in-4-0-release-tp3988344.html Sent from the Solr - User mailing list archive at Nabble.com.
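For comparison, the raw HTTP shape of the same request (a sketch, not SolrJ): Solr Cell accepts the file body as a POST to /update/extract with a Content-Type header, and Tika can still sniff the type if you send something generic, so an explicit type mainly acts as a hint. The URL and parameters below mirror the wiki example; verify the null/generic-type behavior against your own Solr version:

```python
import urllib.parse
import urllib.request

def extract_request(solr_url, pdf_bytes, doc_id, content_type="application/pdf"):
    """Build (not send) a POST to /update/extract with an explicit type.

    Passing a generic type such as application/octet-stream should leave
    detection to Tika, analogous to passing null for contentType in
    SolrJ, but confirm that against your Solr version.
    """
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return urllib.request.Request(
        solr_url + "/update/extract?" + params,
        data=pdf_bytes,
        headers={"Content-Type": content_type},
    )

# Hypothetical usage; the URL and bytes are placeholders.
req = extract_request("http://localhost:8983/solr", b"%PDF-1.4 ...", "mailing_lists.pdf")
```

Sending the request (urllib.request.urlopen(req)) then behaves like the SolrJ snippet above, minus the assertions.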
Re: Using Data Import Handler to invoke a stored procedure with output (cursor) parameter
Thanks Michael and Lance! I decided to go with an Oracle Pipelined Table function and that took care of it. I think that's what Michael was referring to below. This enabled us to be able to make a simple SQL call. Thanks again. From: Lance Norskog goks...@gmail.com To: solr-user@lucene.apache.org Sent: Sunday, June 3, 2012 12:28 AM Subject: Re: Using Data Import Handler to invoke a stored procedure with output (cursor) parameter Right, or create a view. On Fri, Jun 1, 2012 at 8:11 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Apologies for the terseness of this reply, as I'm on my mobile. To treat the result of a function call as a table in Oracle SQL, use the table() function, like this: select * from table(my_stored_func()) HTH, Michael On Jun 1, 2012 8:01 PM, Niran Fajemisin afa...@yahoo.com wrote: So I was able to run some additional tests today on this. I tried to use a stored function instead of a stored procedure. The hope was that the Stored Function would simply be a wrapper for the Store Procedure and would simply return the cursor as the return value. This unfortunately did not work. My test attempted to call the function from the query attribute of the entity tag as such: {call my_stored_func()} It raised an error stating that: 'my_stored_func' is not a procedure or is undefined. This makes sense because the invocation format above is customarily reserved for a stored procedure. So then I tried the typical approach for invoking a function which would be: {call ? := my_stored_function()} And as expected this resulted in an error stating that: not all variables bound . Again, this is expected as the ? notation would be the placeholder parameter that would be bound to the OracleTypes.CURSOR constant in a typical JDBC program. Note that this function has been tested outside of DIH and it works when properly invoked. I think the bottom-line here is that there is no proper support for stored procedures (or functions for that matter) in DIH. 
This is really unfortunate because anyone thinking of doing any significant processing in the source RDBMS prior to data export would have to look elsewhere. Short of adding this functionality to the JdbcDataSource class of the DIH, I think I'm at a dead end. If anyone knows of any alternatives I would greatly appreciate hearing them. Thanks for the responses as usual. Cheers. From: Lance Norskog goks...@gmail.com To: solr-user@lucene.apache.org; Niran Fajemisin afa...@yahoo.com Sent: Thursday, May 31, 2012 3:09 PM Subject: Re: Using Data Import Handler to invoke a stored procedure with output (cursor) parameter Can you add a new stored procedure that uses your current one? It would operate like the DIH expects. I don't remember if DB cursors are a standard part of JDBC. If they are, it would be a great addition to the DIH if they work right. On Thu, May 31, 2012 at 10:44 AM, Niran Fajemisin afa...@yahoo.com wrote: Thanks for your response, Michael. Unfortunately changing the stored procedure is not really an option here. From what I'm seeing, it would appear that there's really no way of somehow instructing the Data Import Handler to get a handle on the output parameter from the stored procedure. It's a bit surprising though that no one has ran into this scenario but I suppose most people just work around it. Anyone else care to shed some more light on alternative approaches? Thanks again. From: Michael Della Bitta michael.della.bi...@appinions.com To: solr-user@lucene.apache.org Sent: Thursday, May 31, 2012 9:40 AM Subject: Re: Using Data Import Handler to invoke a stored procedure with output (cursor) parameter I could be wrong about this, but Oracle has a table() function that I believe turns the output of a function as a table. So possibly you could wrap your procedure in a function that returns the cursor, or convert the procedure to a function. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. 
http://www.appinions.com On Thu, May 31, 2012 at 8:00 AM, Niran Fajemisin afa...@yahoo.com wrote: Hi all, I've seen a few questions asked around invoking stored procedures from within Data Import Handler but none of them seem to indicate what type of output parameters were being used. I have a stored procedure created in Oracle database that takes a couple input parameters and has an output parameter that is a reference cursor. The cursor is expected to be used as a way of iterating through the returned table rows. I'm using the following format to invoke my stored procedure in the Data Import Handler's data config XML: entity name=entity_name ... query={call my_stored_proc(inParam1, inParam2)} .../entity I have tested that
Solr 4.0 Master slave configuration in JBOSS 5.1.2
I have Solr 4.0 (apache-solr-4.0) and JBoss Application Server 5.1.2 installed on an RHEL 6.2 machine. I was successful in integrating Solr with JBoss and I am able to view the admin console (single core). Now I would like to create a master/slave configuration for the Solr servers. Can anyone help me? Thanks Amit -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Master-slave-configuration-in-JBOSS-5-1-2-tp3988375.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question on addBean and deleteByQuery
On Wed, Jun 6, 2012 at 8:51 PM, Darin Pope da...@planetpope.com wrote: When using SolrJ (1.4.1 or 3.5.0) and calling either addBean or deleteByQuery, the POST body has numbers before and after the XML (47 and 0 as noted in the example below): It looks like this is HTTP chunked transfer encoding. As to whether that's configurable in SolrJ, I defer to the experts on the list. http://en.wikipedia.org/wiki/Chunked_transfer_encoding -- Nick Zadrozny http://websolr.com — hassle-free hosted search, powered by Apache Solr
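Those numbers are exactly the chunk framing: each chunk is a hex length, CRLF, that many bytes of data, CRLF, with a zero-length chunk terminating the body (so 47 hex = 71 bytes, then 0 ends it). A small decoder sketch makes the framing concrete; real HTTP clients and servers handle this transparently, which is why the SolrJ POST still works:

```python
def decode_chunked(raw):
    """Decode an HTTP/1.1 chunked-transfer-encoded body.

    Each chunk is '<hex length>\r\n<data>\r\n'; a '0' length ends the body.
    Trailers are ignored for simplicity.
    """
    body = b""
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol], 16)   # chunk sizes are hexadecimal
        if size == 0:
            return body
        start = eol + 2
        body += raw[start:start + size]
        pos = start + size + 2         # skip the CRLF after the chunk data
```

So the "47" and "0" are not part of the XML at all; any compliant HTTP stack on the receiving end strips them before the body reaches Solr.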