Solr Schema and how?
Hello all,

We have a screen-builder application where users design their own forms. They can create form fields of type date, text, number, large text, etc., up to a total of 500 fields per screen. Once a screen is designed, the system automatically handles type checking for valid data entry on the front end, even though data of every type gets stored as text. So, as you can imagine, the table is huge, with 600+ columns (screenId, recordId, field1 ... field500), and every column is typed as 'text'. The same table stores data for every screen designed in the system.

So basically, here are my questions:

1. How best to index it? I did it using a dynamic field 'field*', which works great.
2. Since everything is text, I'm not sure how to enable filtering on each field. For example, if a user wants 'greater than' or 'less than' queries on a number field (stored as text), that data somehow needs to be stored as a number in Solr, but I don't think I have a way to do that, since 'field2' may be a 'number' field for 'screen1' and a 'date' for 'screen2'.

Would appreciate any ideas on how to handle this. Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Schema-and-how-tp3393989p3393989.html
Sent from the Solr - User mailing list archive at Nabble.com.
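One approach worth noting here (an assumption, not from the thread): instead of a single untyped 'field*' dynamic field, declare one typed dynamic field per suffix and have the indexing client append the suffix matching each screen's field definition. A sketch for schema.xml:

```xml
<!-- Sketch only: the suffix convention (_s, _ti, _tdt) is an assumption,
     as are the fieldType names, which follow the Solr example schema. -->
<dynamicField name="field*_s"   type="string" indexed="true" stored="true"/>
<dynamicField name="field*_ti"  type="tint"   indexed="true" stored="true"/>
<dynamicField name="field*_tdt" type="tdate"  indexed="true" stored="true"/>
```

With this, 'field2' for screen1 would be sent as field2_ti, so range queries such as field2_ti:[10 TO *] work numerically, while screen2's date goes into field2_tdt.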
Re: how can i develop client application with solr url using javascript?
Search for 'ajax-solr' on Google. To avoid exposing the Solr URL directly to the browser, look at setting up a proxy in front of it. Good luck.
Re: Solr 4.0 - Spatial Search - How to
Thanks. Here was the issue: concatenating the two floats (lat, lng) on the MySQL side converted the result to a BLOB, and indexing would fail when storing a BLOB in the 'location' type field. After the BLOB issue was resolved, everything worked. Thank you all for your help.
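For reference, one way to avoid the BLOB result is to cast the concatenation to a character type in the SELECT itself. A sketch (the table and column names are assumptions):

```sql
-- CONCAT of two floats can come back through JDBC as a BLOB;
-- casting to CHAR keeps it a plain "lat,lng" string for the location field
SELECT id, CAST(CONCAT(lat, ',', lng) AS CHAR) AS coord
FROM listings;
```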
Solr 4.0 - Spatial Search - How to
Ok, this could be very easy to do, but I was not able to figure it out. I need to enable location search, i.e. if someone searches for 'New York', show results for New York plus results within 50 miles of New York. We do have latitude/longitude stored in the database for each record, but I'm not sure how to index these values to enable spatial search. Any help would be much appreciated. Thanks.
Re: Solr 4.0 - Spatial Search - How to
Adam, thanks. Yes, that helps, but how does the coord field get populated? All I have is:

<field name="lat" type="tdouble" indexed="true" stored="true"/>
<field name="lng" type="tdouble" indexed="true" stored="true"/>
<field name="coord" type="location" indexed="true" stored="true"/>

Fields 'lat' and 'lng' get populated by DataImportHandler, but 'coord' I am not sure about. Thanks.
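One way to populate the combined field from DataImportHandler is a TemplateTransformer on the importing entity, since the location type expects a "lat,lng" string. A sketch, where the entity name 'record' and the query are assumptions:

```xml
<entity name="record" transformer="TemplateTransformer"
        query="SELECT id, lat, lng FROM listings">
  <!-- build the "lat,lng" string the location field type expects -->
  <field column="coord" template="${record.lat},${record.lng}"/>
</entity>
```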
RE: DIH and denormalizing
In your query query="SELECT webtable as wt FROM ncdat_wt WHERE featurecode='${ncdat.feature}'", instead of ${ncdat.feature} use ${dataTable.feature}, where dataTable is your parent entity's name.

From: Shawn Heisey-4 [via Lucene]
Sent: Monday, June 28, 2010 2:24 PM
To: caman
Subject: DIH and denormalizing

I am trying to do some denormalizing with DIH from a MySQL source. Here's part of my data-config.xml:

<entity name="dataTable" pk="did"
        query="SELECT *, FROM_UNIXTIME(post_date) as pd FROM ncdat
               WHERE did > ${dataimporter.request.minDid}
               AND did <= ${dataimporter.request.maxDid}
               AND (did % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})">
  <entity name="ncdat_wt"
          query="SELECT webtable as wt FROM ncdat_wt WHERE featurecode='${ncdat.feature}'"/>
</entity>

The relationship between feature in ncdat and webtable in ncdat_wt (via featurecode) will be many-to-many. The wt field in schema.xml is set up as multivalued. It seems that ${ncdat.feature} is not being set. I saw a query happening on the server and it was SELECT webtable as wt FROM ncdat_wt WHERE featurecode='' - that last part is an empty string with single quotes around it. From what I can tell, there are no entries in ncdat where feature is blank. I've tried this with both a 1.5-dev checked out months ago (which we are using in production) and a 3.1-dev checked out today. Am I doing something wrong?

Thanks,
Shawn
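The suggested fix, sketched against the thread's config: the child entity's placeholder must reference the parent by its entity name (dataTable), not the table name (ncdat).

```xml
<!-- ${dataTable.feature} resolves against the parent entity named "dataTable" -->
<entity name="ncdat_wt"
        query="SELECT webtable as wt FROM ncdat_wt
               WHERE featurecode='${dataTable.feature}'"/>
```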
RE: Can solr return pretty text as the content?
Define "pretty text":

1) Are you talking about the XML/JSON returned by Solr not being pretty? If yes, try indent=on in your query params.
2) Or are you talking about the data in a certain field? Solr returns what you feed it. Look at your analyzers for that field type; your filters/tokenizer may be stripping the formatting.

From: JohnRodey [via Lucene]
Sent: Wednesday, June 23, 2010 1:19 PM
To: caman
Subject: Can solr return pretty text as the content?

When I feed pretty text into Solr for indexing from Lucene and search for it, the content is always returned as one long line of text. Is there a way for Solr to return the pretty formatted text to me?
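For the first case, a minimal example query (host, port, and core path are placeholders):

```
http://localhost:8983/solr/select?q=*:*&wt=json&indent=on
```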
RE: Stemmed and/or unStemmed field
Ahh, perfect. Will take a look. Thanks.

From: Robert Muir [via Lucene]
Sent: Wednesday, June 23, 2010 4:17 PM
To: caman
Subject: Re: Stemmed and/or unStemmed field

On Wed, Jun 23, 2010 at 3:58 PM, Vishal A. wrote:

Here is what I am trying to do: someone clicks on 'Comforters Pillows', and we would want the results to be filtered where the title has the keyword 'Comforter' or 'Pillows', but we have been getting results with the word 'comfort' in the title. I assume it is because of stemming. What is the right way to handle this?

From your examples, it seems a more lightweight stemmer might be an easy option: https://issues.apache.org/jira/browse/LUCENE-2503

--
Robert Muir
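A sketch of what swapping in the lighter stemmer could look like in schema.xml. This assumes a Solr build recent enough to ship the minimal English stemmer added by LUCENE-2503; the fieldType name and tokenizer choice are assumptions:

```xml
<fieldType name="text_en_light" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- plural-only stemming: "pillows" matches "pillow",
         but "comforters" no longer collapses to "comfort" -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```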
RE: JSON formatted response from SOLR question....
Take a look at the AjaxSolr source code: http://github.com/evolvingweb/ajax-solr — this should give you exactly what you need. Thanks.

From: Tod [via Lucene]
Sent: Monday, May 10, 2010 7:22 AM
To: caman
Subject: JSON formatted response from SOLR question

I apologize, this is such a JSON/javascript question, but I'm stuck and am not finding any resources that address this specifically. I'm doing a faceted search and getting back in my facet_counts.facet_fields response an array of countries. I'm gathering the count of the array elements returned using this notation:

rsp.facet_counts.facet_fields.country.length

... where rsp is the eval'ed JSON response from Solr. From there I just loop through, listing each country with its associated count. The problem I am having is trying to automate this to loop through any one of a number of facets contained in my JSON response, not just country. So instead of the above I would have something like:

rsp.facet_counts.facet_fields.VARIABLE.length

... where VARIABLE would be the name of one of the facets, passed into a javascript function to perform the loop. None of the javascript examples I can find seems to address this. Has anyone run into this? Is there a better list to ask this question? Thanks in advance.
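The generic lookup the poster is after is just bracket notation: facet_fields[name] instead of facet_fields.country. A sketch, assuming wt=json with Solr's default flat [term, count, term, count, ...] layout for facet fields (the sample response below is made up):

```javascript
// Made-up sample response in the flat facet layout
const rsp = {
  facet_counts: {
    facet_fields: {
      country: ["US", 10, "FR", 4],
      language: ["en", 9, "de", 5]
    }
  }
};

// Bracket notation lets the facet name be a variable
function listFacet(rsp, facetName) {
  const flat = rsp.facet_counts.facet_fields[facetName];
  const out = [];
  for (let i = 0; i < flat.length; i += 2) {
    out.push({ term: flat[i], count: flat[i + 1] });
  }
  return out;
}
```

Calling listFacet(rsp, "country") walks the country pairs, and the same function handles any other facet name passed in.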
RE: DIH full-import memory issue
This may help: batchSize, the batch size used on the JDBC connection.
http://wiki.apache.org/solr/DataImportHandler#Configuring_DataSources

From: Geek Gamer [via Lucene]
Sent: Monday, May 10, 2010 9:42 PM
To: caman
Subject: DIH full-import memory issue

Hi, I am facing issues with a DIH full-import. I have a database with 3 million records that will translate into an index size of 6GB. When I try to do a full import, I get an out-of-memory error like:

INFO: Starting Full Import
WARNING: Unable to read: dataimport.properties
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/home/search/SOLR/solr/data/index,segFN=segments_1,version=1273549043650,generation=1,filenames=[segments_1]
INFO: newest commit = 1273549043650
INFO: Creating a connection for entity offer with URL: jdbc:mysql://domU-12-31-39-10-59-01.compute-1.internal/jounce1
INFO: Time taken for getConnection(): 301
Exception in thread Timer-1 java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.newValueIterator(HashMap.java:843)
    at java.util.HashMap$Values.iterator(HashMap.java:910)
    at org.mortbay.jetty.servlet.HashSessionManager.scavenge(HashSessionManager.java:180)
    at org.mortbay.jetty.servlet.HashSessionManager.access$000(HashSessionManager.java:36)
    at org.mortbay.jetty.servlet.HashSessionManager$1.run(HashSessionManager.java:144)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)
SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:1621)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1398)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2816)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:467)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2510)
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1746)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2135)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2536)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2465)
    at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:734)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
    ... 5 more
INFO: start rollback
INFO: end_rollback

I tried allocating 4 gigs of memory to the VM, but no luck. Are the records cached before indexing, or streamed? Any pointers to documents?

thanks in anticipation,
umar
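The batchSize fix from the reply, sketched. With the MySQL Connector/J driver, batchSize="-1" makes DIH request streaming results instead of buffering the entire result set in memory (which is what the stack trace above shows). The URL is the poster's; user and password are placeholders:

```xml
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://domU-12-31-39-10-59-01.compute-1.internal/jounce1"
            user="..." password="..."
            batchSize="-1"/>
```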
RE: Embedded Solr search query
Why not write a custom request handler that can parse, split, and execute your queries and combine the results?

From: Eric Grobler [via Lucene]
Sent: Friday, May 07, 2010 1:01 AM
To: caman
Subject: Embedded Solr search query

Hello Solr community,

When a user searches on our web page, we need to run 3 related but different queries. For SEO reasons, we cannot use Ajax, so at the moment we run the 3 queries sequentially inside a PHP script. Although Solr is super fast, the extra network overhead can make the 3 queries 400ms slower than they need to be.

Thus my question is: is there a way to send 1 query string to Solr with 2 or more embedded search queries, where Solr will split and execute the queries and return the results of the multiple searches in one go? In other words, instead of:

- send searchQuery1, get result1
- send searchQuery2, get result2

you run:

- send searchQuery1+searchQuery2
- get result1+result2

Thanks and Regards
Eric
RE: Help indexing PDF files
Take a look at the Tika library.

From: Leonardo Azize Martins [via Lucene]
Sent: Friday, May 07, 2010 6:37 AM
To: caman
Subject: Help indexing PDF files

Hi, I am new to Solr. I would like to index some PDF files. How can I do this using the example schema from the 1.4.0 version?

Regards,
Leo
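In Solr 1.4, the Tika integration ships as the ExtractingRequestHandler (Solr Cell). A sketch of its solrconfig.xml entry, modeled on the example configuration (the target field name is an assumption tied to the example schema):

```xml
<!-- Solr Cell: parses PDF/Office documents with Tika during indexing -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map the extracted body text into the example schema's "text" field -->
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>
```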
RE: Embedded Solr search query
I would just look at the Solr source code and see how the standard search handler and the dismax search handler are implemented. Look under the package org.apache.solr.handler:
http://hudson.zones.apache.org/hudson/job/Solr-trunk/clover/org/apache/solr/handler/pkg-summary.html

From: Eric Grobler [via Lucene]
Sent: Friday, May 07, 2010 1:33 AM
To: caman
Subject: Re: Embedded Solr search query

Hi Caman,

I was hoping someone had done it already :-) I am also new to Solr/Lucene; can you perhaps point me to a request handler example page?

Thanks and Regards
Eric
RE: run on reboot on windows
Ahmed, best is to take a look at the documentation for Jetty or Tomcat. Solr can run in any web container; it's up to you how you configure your web container to run.

Thanks
Aboxy

From: S Ahmed [via Lucene]
Sent: Sunday, May 02, 2010 4:33 PM
To: caman
Subject: Re: run on reboot on windows

By default it uses Jetty, so you're saying Tomcat on Windows Server 2008 / IIS7 runs as a native Windows service?

On Sun, May 2, 2010 at 12:46 AM, Dave Searle wrote:
Set the tomcat6 service to auto-start on boot (if running Tomcat).

On 2 May 2010, at 02:31, S Ahmed wrote:
Hi, I'm trying to get Solr to run on Windows, such that if it reboots, the Solr service will be running. How can I do this?
RE: run on reboot on windows
Please take a look at this for Tomcat: http://tomcat.apache.org/tomcat-6.0-doc/setup.html#Windows and this for Jetty: http://docs.codehaus.org/display/JETTY/Win32Wrapper

Hope this helps.

From: S Ahmed [via Lucene]
Sent: Sunday, May 02, 2010 4:44 PM
To: caman
Subject: Re: run on reboot on windows

It's not Tomcat/Jetty that's the issue; it's how to get things to restart on a Windows server (Tomcat and Jetty don't run as native Windows services), so I am a little confused. Thanks.
RE: Only one field in the result
I think you are looking for the 'fl' param.

From: pcmanprogrammeur [via Lucene]
Sent: Wednesday, April 28, 2010 12:38 AM
To: caman
Subject: Only one field in the result

Hello, in my schema.xml I have some fields stored and indexed. However, in a particular case, I would like to get only one field in my XML result. Is it possible? Thanks for your help!
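For example (host and the field name 'title' are placeholders), fl restricts the response to just the fields listed:

```
http://localhost:8983/solr/select?q=*:*&fl=title
```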
RE: Problem with DataImportHandler and embedded entities
Are you storing the comment field or only indexing it? A field with stored="false" will not appear in the document.

From: Jason Rutherglen [via Lucene]
Sent: Wednesday, April 21, 2010 10:15 AM
To: caman
Subject: Problem with DataImportHandler and embedded entities

I'm using the following data-config.xml with DataImportHandler. I've never used embedded entities before; however, I'm not seeing the comment show up in the document... I'm not sure what's up.

<dataConfig>
  <dataSource type="JdbcDataSource" name="ch" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/ch" batchSize="-1"
              user="ch" password="ch_on_this"/>
  <document name="ch">
    <entity name="applications" pk="id" dataSource="ch"
            query="SELECT id, updated FROM applications limit 10">
      <entity name="comment" dataSource="ch"
              query="SELECT comment FROM ratings WHERE app = ${applications.id}">
        <field name="comment" column="comment"/>
      </entity>
    </entity>
  </document>
</dataConfig>
RE: Problem with DataImportHandler and embedded entities
Hard to tell. Did you try putting the child entity as a subquery of the main query? I don't think that is the issue, but it's worth a try:

SELECT id, updated, (SELECT comment FROM ratings WHERE app = appParent.id) AS comment
FROM applications appParent LIMIT 10

From: Jason Rutherglen [via Lucene]
Sent: Wednesday, April 21, 2010 10:33 AM
To: caman
Subject: Re: Problem with DataImportHandler and embedded entities

Caman, I'm storing it. This is what I see when DataImportHandler verbose mode is turned on. While the field names don't match, I am seeing that the sub-queries are being performed and data is being returned; it's just not making it into the document.

<lst name="verbose-output">
  <lst name="entity:applications">
    <lst name="document#1">
      <str name="query">SELECT id, updated FROM applications limit 10</str>
      <str name="time-taken">0:0:0.9</str>
      <str>--- row #1 ---</str>
      <int name="id">407</int>
      <date name="updated">2009-11-02T06:35:48Z</date>
      <lst name="entity:added">
        <str name="query">SELECT added FROM ratings WHERE app = 407</str>
        <str name="time-taken">0:0:0.8</str>
      </lst>
    </lst>
  </lst>
</lst>
RE: Problem with DataImportHandler and embedded entities
What is the unique id set in schema? From: Jason Rutherglen [via Lucene] [mailto:ml-node+740744-1209892083-124...@n3.nabble.com] Sent: Wednesday, April 21, 2010 10:56 AM To: caman Subject: Re: Problem with DataImportHandler and embedded entities The other issue now is full-import is only importing 1 document, and that's all. Despite no limits etc... Odd... On Wed, Apr 21, 2010 at 10:48 AM, Jason Rutherglen [hidden email] http://n3.nabble.com/user/SendEmail.jtp?type=nodenode=740744i=0 wrote: I think it's working, it was the lack of the seemingly innocuous sub-entity pk=application_id. After adding that I'm seeing some data returned. On Wed, Apr 21, 2010 at 10:44 AM, Jason Rutherglen [hidden email] http://n3.nabble.com/user/SendEmail.jtp?type=nodenode=740744i=1 wrote: Something's off, for each row, it's performing the following 5 sub-queries. Weird. Below is the updated data-config.xml (compared to the original email I changed the field from comment to added). lst name=document#5 str--- row #1-/str int name=id876/int date name=updated2009-11-02T06:36:28Z/date str-/str - lst name=entity:added str name=querySELECT added FROM ratings WHERE app = 876/str str name=querySELECT added FROM ratings WHERE app = 876/str str name=querySELECT added FROM ratings WHERE app = 876/str str name=querySELECT added FROM ratings WHERE app = 876/str str name=querySELECT added FROM ratings WHERE app = 876/str str name=time-taken0:0:0.0/str str name=time-taken0:0:0.0/str str name=time-taken0:0:0.0/str str name=time-taken0:0:0.0/str str name=time-taken0:0:0.0/str str--- row #1-/str date name=added2010-01-26T18:08:53Z/date str-/str str--- row #2-/str date name=added2010-01-27T20:16:20Z/date str-/str str--- row #3-/str date name=added2010-01-29T00:02:40Z/date str-/str str--- row #4-/str date name=added2010-02-01T16:59:42Z/date str-/str /lst /lst dataConfig dataSource type=JdbcDataSource name=ch driver=com.mysql.jdbc.Driver url=jdbc:mysql://127.0.0.1:3306/ch batchSize=-1 user=ch 
password=ch_on_this/ document name=ch entity name=applications pk=id dataSource=ch query=SELECT id, updated FROM applications limit 10 entity name=comment dataSource=ch query=SELECT * FROM ratings WHERE app = ${applications.id} field name=comment column=comment/ field name=added column=added/ /entity /entity /document /dataConfig On Wed, Apr 21, 2010 at 10:41 AM, caman [hidden email] http://n3.nabble.com/user/SendEmail.jtp?type=nodenode=740744i=2 wrote: Hard to tell. Did you try putting the child entity part of main query with subquery. Don't think that is the issue though but worth a try Select id, updated,( SELECT comment FROM ratings WHERE app = appParent.id) as comment FROM applications appParent limit 10 From: Jason Rutherglen [via Lucene] [mailto:[hidden email] http://n3.nabble.com/user/SendEmail.jtp?type=nodenode=740744i=3 ] Sent: Wednesday, April 21, 2010 10:33 AM To: caman Subject: Re: Problem with DataImportHandler and embedded entities Caman, I'm storing it. This is what I see when DataImportHandler verbose is turned on. While the field names don't match, I am seeing that sub-queries are being performed, data is being returned. It's just not making it into the document. lst name=verbose-output - lst name=entity:applications - lst name=document#1 str name=querySELECT id, updated FROM applications limit 10/str str name=time-taken0:0:0.9/str str--- row #1-/str int name=id407/int date name=updated2009-11-02T06:35:48Z/date str-/str - lst name=entity:added str name=querySELECT added FROM ratings WHERE app = 407/str str name=time-taken0:0:0.8/str /lst /lst On Wed, Apr 21, 2010 at 10:17 AM, caman [hidden email] http://n3.nabble.com/user/SendEmail.jtp?type=node http://n3.nabble.com/user/SendEmail.jtp?type=nodenode=740680i=0 node=740680i=0 wrote: Are you storing the comment field or indexing it? field .. Stored=false ... will not appear in the document. 
From: Jason Rutherglen [via Lucene]
Sent: Wednesday, April 21, 2010 10:15 AM
To: caman
Subject: Problem with DataImportHandler and embedded entities

I'm using the following data-config.xml with DataImportHandler. I've never used
RE: DIH dataimport.properties with
Shawn,

Is this your custom implementation? "For a delta-import, minDid comes from the maxDid value stored after the last successful import" - are you updating the dataTable after the import was successful? How did you handle this? I have a similar scenario and your approach will work for my use-case as well. Thanks.

From: Shawn Heisey-4 [via Lucene]
Sent: Tuesday, April 20, 2010 4:35 PM
To: caman
Subject: Re: DIH dataimport.properties with

Michael,

The SolrEntityProcessor looks very intriguing, but it won't work with the released 1.4 version. If that's OK with you and it looks like it'll do what you want, feel free to ignore the rest of this.

I'm also using MySQL as an import source for Solr. I was unable to use the last_index_time because my database doesn't have a field I can match against it. I believe you can use something similar to the method that I came up with. The point of this post is to show you how to inject values from outside Solr into a DIH request, rather than have Solr provide the milestone that indicates new content. Here's a simplified version of my URL template and entity configuration in data-config.xml.
The did field in my database is an autoincrement BIGINT serving as my primary key, but something similar could likely be cooked up with timestamps too:

http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dataTable=DATATABLE&minDid=MINDID&maxDid=MAXDID

<entity name="dataTable" pk="did"
        query="SELECT * FROM ${dataimporter.request.dataTable}
               WHERE did &gt; ${dataimporter.request.minDid}
               AND did &lt;= ${dataimporter.request.maxDid}"
        deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataTable}"
        deltaImportQuery="SELECT * FROM ${dataimporter.request.dataTable}
               WHERE did &gt; ${dataimporter.request.minDid}
               AND did &lt;= ${dataimporter.request.maxDid}"/>

If I am doing a full-import, I set minDid to zero and maxDid to the highest value in the database. For a delta-import, minDid comes from the maxDid value stored after the last successful import. The deltaQuery is required, but in my case it is a throw-away query that just tells Solr the delta-import needs to be run. My query and deltaImportQuery are identical, though yours may not be.

Good luck, no matter how you choose to approach this.

Shawn

On 4/18/2010 9:02 PM, Michael Tibben wrote:

I don't really understand how this will help. Can you elaborate? Do you mean that the last_index_time can be imported from somewhere outside solr? But I need to be able to *set* what last_index_time is stored in dataimport.properties, not get properties from somewhere else.

On 18/04/10 10:02, Lance Norskog wrote:

The SolrEntityProcessor allows you to query a Solr instance and use the results as DIH properties. You would have to create your own regular query to do the delta-import instead of using the delta-import feature.
View this message in context: http://n3.nabble.com/DIH-dataimport-properties-with-tp722924p738949.html
Sent from the Solr - User mailing list archive at Nabble.com.
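Shawn's parameter scheme can be driven from whatever scheduling script tracks the stored maxDid. A small sketch of building the request URL (the host, core, and table names below are made up, not from the thread):

```python
from urllib.parse import urlencode

def dataimport_url(base, command, data_table, min_did, max_did):
    """Build a DIH request URL following the dataTable/minDid/maxDid scheme."""
    params = {
        "command": command,       # "full-import" or "delta-import"
        "dataTable": data_table,
        "minDid": min_did,        # stored maxDid from the last successful run
        "maxDid": max_did,        # current MAX(did) from the database
    }
    return base + "/dataimport?" + urlencode(params)

# Full import: minDid=0, maxDid = current MAX(did) from the database.
url = dataimport_url("http://localhost:8983/solr/core0", "full-import", "docs", 0, 500000)
```

After a successful run, the script would persist maxDid so the next delta-import can pass it back as minDid.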
RE: dismax vs the standard query handlers
Your answers are here - the wiki describes it pretty well: http://wiki.apache.org/solr/DisMaxRequestHandler

From: Sandhya Agarwal [via Lucene]
Sent: Tuesday, April 20, 2010 9:40 PM
To: caman
Subject: dismax vs the standard query handlers

Hello,

What are the advantages of using the "dismax" query handler vs the "standard" query handler? As I understand, "dismax" queries are parsed differently and provide more flexibility w.r.t. score boosting etc. Do we have any more reasons?

Thanks,
Sandhya

View this message in context: http://n3.nabble.com/dismax-vs-the-standard-query-handlers-tp739071p739081.html
Sent from the Solr - User mailing list archive at Nabble.com.
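To make the contrast concrete, a hedged sketch of the two request styles (the field names title/body and the boost values are made-up examples, not from the thread):

```text
# Standard handler: q is parsed as Lucene query syntax, so fields
# and boosts must be spelled out in the query itself.
/select?q=title:android^2+OR+body:android

# Dismax: q is treated as plain user keywords; the fields searched,
# their boosts, and phrase boosting come from the qf/pf parameters.
/select?defType=dismax&q=android&qf=title^2+body&pf=title^1.5
```

The practical win of dismax is that raw end-user input (including stray colons or parentheses) cannot break the query parser, while boosting policy stays in the handler configuration rather than in every query.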
RE: DIH questions
I had a similar requirement and was not able to figure it out at that time. I was able to use some SQL magic to create a concatenated string for sub-entities and then process it in a transformer, which may or may not work for your use-case. Just a thought. Mention the specifics here please and I can see if anything can be done.

Thanks,
James
http://www.click2money.com

From: Blargy [via Lucene]
Sent: Thursday, April 15, 2010 4:28 PM
To: caman
Subject: Re: DIH questions

Is there any way that a sub-entity can delete/rewrite fields from the document? Is there any way sub-entities can get access to the document's current value for a given field?

View this message in context: http://n3.nabble.com/DIH-questions-tp719892p722676.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: CopyField
As far as I know, no. But why don't you keep another column 'source_final' and populate it with the value from source1 or source2, depending on which has a value (look at transformers, maybe the ScriptTransformer)? Then in schema.xml:

<copyField source="source_final" dest="dest"/>

Thanks,
James
http://www.click2money.com

From: Blargy [via Lucene]
Sent: Thursday, April 15, 2010 5:54 PM
To: caman
Subject: CopyField

Is there any way to instruct copyField to overwrite an existing field, or only accept the first one?

<copyField source="source1" dest="dest"/>
<copyField source="source2" dest="dest"/>

Basically I want to copy source1 to dest (if it exists). If source1 doesn't exist then copy source2 into dest. Is this possible?

View this message in context: http://n3.nabble.com/CopyField-tp722785p722800.html
Sent from the Solr - User mailing list archive at Nabble.com.
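James's 'source_final' suggestion can be sketched with DIH's ScriptTransformer. This is an untested sketch; the entity, column, and function names are made up for illustration:

```xml
<dataConfig>
  <script><![CDATA[
    function pickSource(row) {
      var s1 = row.get('source1');
      // Prefer source1 when it has a value, otherwise fall back to source2.
      row.put('source_final', (s1 != null && s1 != '') ? s1 : row.get('source2'));
      return row;
    }
  ]]></script>
  <document>
    <entity name="item" transformer="script:pickSource"
            query="SELECT id, source1, source2 FROM items"/>
  </document>
</dataConfig>
```

schema.xml then needs only the single copyField from source_final, exactly as suggested above.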
Re: dynamic categorization transactional data
@Grant: Less than a minute. If we go with the meta-retrieval from the index, we will have to keep the index updated down to seconds, but that may not scale well. Probably a hybrid approach? I will look into the classifier. Thanks.

Grant Ingersoll-6 wrote:
On Mar 18, 2010, at 2:44 PM, caman wrote:
1) Took care of the first one by Transformer.
This is often also something done by a classifier that is trained to deal with all the statistical variations in your text. Tools like Weka, Mahout, OpenNLP, etc. can be applied here.
2) Any input on 2 please? I need to store # of views and popularity with each document and that can change pretty often. Recommended to use a database, or can this be updated in Solr directly? My issue with a DB is that with every Solr search hit, we will have to do a DB hit to retrieve meta-data.
Define often, please. Less than a minute or more than a minute?
Any input is appreciated please.

caman wrote:
Hello all, please see below; any help much appreciated.
1) Extracting data out of a text field to assign a category for certain configured words. e.g. If the text is "Google does it again with Android" and 'Google' and 'Android' are the configured words, I want to be able to assign the article to the tags 'Google', 'Android' and 'Technical'. Can I do this with a custom filter during analysis? Similarly, setting up categories for each article based on keywords in the text.
2) How about using Solr as a transactional datastore? Need to keep track of the rating for each document. Would 'ExternalFileField' be a good choice for this use-case?
Thanks in advance.

View this message in context: http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27949786.html
Sent from the Solr - User mailing list archive at Nabble.com.
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search -- View this message in context: http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27970656.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dynamic categorization transactional data
1) Took care of the first one by Transformer.
2) Any input on 2 please? I need to store # of views and popularity with each document and that can change pretty often. Recommended to use a database, or can this be updated in Solr directly? My issue with a DB is that with every Solr search hit, we will have to do a DB hit to retrieve meta-data. Any input is appreciated please.

caman wrote:
Hello all, please see below; any help much appreciated.
1) Extracting data out of a text field to assign a category for certain configured words. e.g. If the text is "Google does it again with Android" and 'Google' and 'Android' are the configured words, I want to be able to assign the article to the tags 'Google', 'Android' and 'Technical'. Can I do this with a custom filter during analysis? Similarly, setting up categories for each article based on keywords in the text.
2) How about using Solr as a transactional datastore? Need to keep track of the rating for each document. Would 'ExternalFileField' be a good choice for this use-case?
Thanks in advance.

View this message in context: http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27949786.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: dynamic categorization transactional data
David, much appreciated. This gives me enough to work with. I missed one important point: our data changes pretty frequently, which means we may be running deltas every 5-10 minutes. In-memory should work. Thanks.

David Smiley @MITRE.org wrote:
You'll probably want to influence your relevancy on this popularity number that is changing often. ExternalFileField looks like a possibility, though I haven't used it. Another would be using an in-memory cache which stores all popularity numbers for any data that has its popularity updated since the last index update (say since the previous night). On second thought, it may need to be absolutely all of them, but these are just #s so no big deal? You could then customize a ValueSource subclass which gets data from this fast in-memory up-to-date source. See FileFloatSource for an example that uses a file instead of an in-memory structure.
~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Mar 18, 2010, at 2:44 PM, caman wrote:
2) Any input on 2 please? I need to store # of views and popularity with each document and that can change pretty often. Recommended to use a database, or can this be updated in Solr directly? My issue with a DB is that with every Solr search hit, we will have to do a DB hit to retrieve meta-data. Any input is appreciated please.

View this message in context: http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27950036.html
Sent from the Solr - User mailing list archive at Nabble.com.
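For reference, a minimal sketch of the ExternalFileField setup being discussed (the field and file names are assumptions). The values live in a text file next to the index rather than in the documents, so they can be refreshed without re-indexing, but the field is only usable in function queries, not for searching or display:

```xml
<!-- schema.xml: keyed float values loaded from an external file -->
<fieldType name="popularityFile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"
           indexed="false" stored="false"/>
<field name="popularity" type="popularityFile"/>
```

The data file (external_popularity in the index data directory) holds one id=value line per document and is re-read when a new searcher opens, i.e. on commit.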
dynamic categorization transactional data
Hello all, please see below; any help much appreciated.
1) Extracting data out of a text field to assign a category for certain configured words. e.g. If the text is "Google does it again with Android" and 'Google' and 'Android' are the configured words, I want to be able to assign the article to the tags 'Google', 'Android' and 'Technical'. Can I do this with a custom filter during analysis? Similarly, setting up categories for each article based on keywords in the text.
2) How about using Solr as a transactional datastore? Need to keep track of the rating for each document. Would 'ExternalFileField' be a good choice for this use-case?
Thanks in advance.

View this message in context: http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27790233.html
Sent from the Solr - User mailing list archive at Nabble.com.
SOLR Index or database
Hello All,

Just struggling with a thought about whether Solr or a database would be the good option for me. Here are my requirements. We index about 600+ news/blogs into our system. The only information we store locally is the title, link and article snippet. We are able to index all these sources into the Solr index and it works perfectly. This is where it gets tricky: we need to store certain meta information as well, e.g.

1. Rating/popularity of an article
2. Sharing of articles between users
3. How many times an article is viewed
4. Comments on each article

So far, we are deciding to store the meta-information in the database and link this data with a document in the index. When a user opens the page, results are combined from the index and the database to render the view. Any reservation about using the above architecture? Is Solr the right fit in this case? We do need full-text search, so Solr is a no-brainer imho, but would love to hear the community view. Any feedback appreciated. Thanks.

View this message in context: http://old.nabble.com/SOLR-Index-or-database-tp27772362p27772362.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing an oracle warehouse table
Thanks. I will give this a shot.

Alexey-34 wrote:
"What would be the right way to point out which field contains the term searched for?" I would use highlighting for all of these fields and then post-process the Solr response in order to check the highlighting tags. But I don't have so many fields usually, and I don't know if it's possible to configure Solr to highlight fields using '*' as dynamic fields.

On Wed, Feb 3, 2010 at 2:43 AM, caman aboxfortheotherst...@gmail.com wrote:
Thanks all. I am on track. Another question: what would be the right way to point out which field contains the term searched for? e.g. If I search for SOLR and the term exists in field788 for a document, how do I pinpoint which field has the term? I copied all the fields into a field called 'body', which makes searching easier, but it would be nice to show the field which has that exact term. Thanks.

caman wrote:
Hello all, hope someone can point me in the right direction. I am trying to index an Oracle warehouse table (TableA) with 850 columns. About 800 of the fields are CLOBs and are good candidates for full-text searching. There are also a few columns with relational links to other tables. I am clear on how to create a root entity and then pull data from the other relational links as child entities. Most columns in TableA are named field1, field2...field800. Now my question is how to organize the schema efficiently:

First option: if my query is 'select * from TableA', do I define <field name="attr1" column="FIELD1"/> for each of those 800 columns? Seems cumbersome. Maybe I can write a script to generate the XML instead of handwriting it in both data-config.xml and schema.xml.

OR

Don't define any <field name="attr1" column="FIELD1"/>, so that the column in Solr will be the same as in the database table. But then the questions are: 1) How do I define a unique field in this scenario? 2) How do I copy all the text fields to a common field for easy searching?

Any help is appreciated. Please feel free to suggest any alternative way.
Thanks -- View this message in context: http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27429352.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27439611.html Sent from the Solr - User mailing list archive at Nabble.com.
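Alexey's post-processing suggestion amounts to scanning the highlighting section of the response for the fields that actually produced highlight tags. A sketch (the doc id, field names, and snippet text below are invented; the dict shape mirrors Solr's JSON highlighting output):

```python
def matched_fields(highlighting):
    """Map each doc id to the fields whose snippets contain highlight tags."""
    result = {}
    for doc_id, fields in highlighting.items():
        result[doc_id] = sorted(
            name for name, snippets in fields.items()
            if any("<em>" in s for s in snippets)
        )
    return result

# Example: the term matched in field788 of doc "42".
sample = {"42": {"field788": ["... <em>SOLR</em> ..."], "field12": ["no match here"]}}
```

The same idea works regardless of whether the fields are declared explicitly or via a dynamic field, since the highlighting section is keyed by the resolved field names.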
Re: Indexing an oracle warehouse table
Anyone please? caman wrote: Hello all, hope someone can point me to right direction. I am trying to index an oracle warehouse table(TableA) with 850 columns. Out of the structure about 800 fields are CLOBs and are good candidate to enable full-text searching. Also have few columns which has relational link to other tables. I am clean on how to create a root entity and then pull data from other relational link as child entities. Most columns in TableA are named as field1,field2...field800. Now my question is how to organize the schema efficiently: First option: if my query is 'select * from TableA', Do I define field name=attr1 column=FIELD1 / for each of those 800 columns? Seems cumbersome. May be can write a script to generate XML instead of handwriting both in data-config.xml and schema.xml. OR Dont define any field name=attr1 column=FIELD1 / so that column in SOLR will be same as in the database table. But questions are 1)How do I define unique field in this scenario? 2) How to copy all the text fields to a common field for easy searching? Any helpful is appreciated. Please feel free to suggest any alternative way. Thanks -- View this message in context: http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27424327.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing an oracle warehouse table
Ron, much appreciated. The search requirements are:
1) Enable search/faceting on author, service, datetime.
2) Enable full-text search on all text columns, which are named col1...col800+ (a total of more than 800 columns).

Here is what I did so far: defined the entities in db-config.xml without any column definitions in the file, which basically means I want to keep the field names the same as in the database. Now in schema.xml I have a field tag for each database field retrieved by the SQL queries in db-config.xml, which is more than 800+ (did not write this by hand; wrote a groovy script to generate it for me from the database).

Multi-valued: yes, this is what I am using to copy all the fields col1...col800+ into one multi-valued field. That field is set as the default for search. You are right about going to the original data source, but I had to take a different approach: the original source is all XML files which do not follow a standard schema for their structure.

I hope what I mentioned above makes sense. Appreciate the response.
Ron Chan wrote:
It depends on what the search requirements are, so without knowing the details here are some vague pointers. You may only need fields for the columns you are going to be categorizing and searching on; this may be a small subset of the 800, and the rest can go into one large field to fulfil the full-text search. Another thing to look into is multi-valued fields; these can sometimes replace the one-to-many relationships in a database. Also, it may sometimes be worthwhile going to the original data source rather than the warehouse table, as the warehouse table is already flattened and denormalised; the flattening and denormalising will most likely be done a different way when indexing database-type data in Solr. You will very likely end up with fewer rows and fewer columns in the Solr index, as each Solr document can be seen as multi-dimensional.

----- Original Message -----
From: caman aboxfortheotherst...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, 2 February, 2010 1:23:01 AM
Subject: Indexing an oracle warehouse table

Hello all, hope someone can point me in the right direction. I am trying to index an Oracle warehouse table (TableA) with 850 columns. About 800 of the fields are CLOBs and are good candidates for full-text searching. There are also a few columns with relational links to other tables. I am clear on how to create a root entity and then pull data from the other relational links as child entities. Most columns in TableA are named field1, field2...field800. Now my question is how to organize the schema efficiently:

First option: if my query is 'select * from TableA', do I define <field name="attr1" column="FIELD1"/> for each of those 800 columns? Seems cumbersome. Maybe I can write a script to generate the XML instead of handwriting it in both data-config.xml and schema.xml.

OR

Don't define any <field name="attr1" column="FIELD1"/>, so that the column in Solr will be the same as in the database table. But then the questions are: 1) How do I define a unique field in this scenario? 2) How do I copy all the text fields to a common field for easy searching?

Any help is appreciated. Please feel free to suggest any alternative way.

Thanks

View this message in context: http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27414263.html
Sent from the Solr - User mailing list archive at Nabble.com.

View this message in context: http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27425156.html
Sent from the Solr - User mailing list archive at Nabble.com.
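The script-generated schema mentioned above (a groovy script in the thread) boils down to emitting one <field/> declaration per column. A sketch of the same idea in Python (the type name and indexed/stored attributes are assumptions):

```python
def field_decls(columns, type_name="text"):
    """Emit one schema.xml <field/> line per database column."""
    return [
        '<field name="%s" type="%s" indexed="true" stored="true"/>' % (c, type_name)
        for c in columns
    ]

# Generate declarations for col1...col800, mirroring the table layout.
decls = field_decls(["col%d" % i for i in range(1, 801)])
```

The generated lines would be pasted (or templated) into schema.xml, so the field list stays mechanically in sync with the table.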
Re: Indexing an Oracle warehouse table
Alexey, this is exactly what I was looking for. Thank you, thank you, thank you... Should have read the documentation a little better. Much appreciated.

Alexey-34 wrote:
"Don't define any <field name="attr1" column="FIELD1"/>, so that the column in Solr will be the same as in the database table." Correct. You can define a dynamic field:

<dynamicField name="field*" type="text" indexed="true" stored="true"/>

(see http://wiki.apache.org/solr/SchemaXml#Dynamic_fields)

"1) How do I define a unique field in this scenario?" You can create a primary key in the database or generate it directly in Solr (see UUID techniques at http://wiki.apache.org/solr/UniqueKey).

"2) How to copy all the text fields to a common field for easy searching?"

<copyField source="field*" dest="field"/>

(see http://wiki.apache.org/solr/SchemaXml#Copy_Fields)

On Tue, Feb 2, 2010 at 4:22 AM, caman aboxfortheotherst...@gmail.com wrote:
Hello all, hope someone can point me in the right direction. I am trying to index an Oracle warehouse table (TableA) with 850 columns. About 800 of the fields are CLOBs and are good candidates for full-text searching. There are also a few columns with relational links to other tables. I am clear on how to create a root entity and then pull data from the other relational links as child entities. Most columns in TableA are named field1, field2...field800. Now my question is how to organize the schema efficiently:

First option: if my query is 'select * from TableA', do I define <field name="attr1" column="FIELD1"/> for each of those 800 columns? Seems cumbersome. Maybe I can write a script to generate the XML instead of handwriting it in both data-config.xml and schema.xml.

OR

Don't define any <field name="attr1" column="FIELD1"/>, so that the column in Solr will be the same as in the database table. But then the questions are: 1) How do I define a unique field in this scenario? 2) How do I copy all the text fields to a common field for easy searching?

Any help is appreciated. Please feel free to suggest any alternative way.
Thanks -- View this message in context: http://old.nabble.com/Indexing-a-oracle-warehouse-table-tp27414263p27414263.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27426206.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing an oracle warehouse table
Thanks all. I am on track. Another question: what would be the right way to point out which field contains the term searched for? e.g. If I search for SOLR and the term exists in field788 for a document, how do I pinpoint which field has the term? I copied all the fields into a field called 'body', which makes searching easier, but it would be nice to show the field which has that exact term. Thanks.

caman wrote:
Hello all, hope someone can point me in the right direction. I am trying to index an Oracle warehouse table (TableA) with 850 columns. About 800 of the fields are CLOBs and are good candidates for full-text searching. There are also a few columns with relational links to other tables. I am clear on how to create a root entity and then pull data from the other relational links as child entities. Most columns in TableA are named field1, field2...field800. Now my question is how to organize the schema efficiently:

First option: if my query is 'select * from TableA', do I define <field name="attr1" column="FIELD1"/> for each of those 800 columns? Seems cumbersome. Maybe I can write a script to generate the XML instead of handwriting it in both data-config.xml and schema.xml.

OR

Don't define any <field name="attr1" column="FIELD1"/>, so that the column in Solr will be the same as in the database table. But then the questions are: 1) How do I define a unique field in this scenario? 2) How do I copy all the text fields to a common field for easy searching?

Any help is appreciated. Please feel free to suggest any alternative way.

Thanks

View this message in context: http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27429352.html
Sent from the Solr - User mailing list archive at Nabble.com.
Indexing an Oracle warehouse table
Hello all, hope someone can point me in the right direction. I am trying to index an Oracle warehouse table (TableA) with 850 columns. About 800 of the fields are CLOBs and are good candidates for full-text searching. There are also a few columns with relational links to other tables. I am clear on how to create a root entity and then pull data from the other relational links as child entities. Most columns in TableA are named field1, field2...field800. Now my question is how to organize the schema efficiently:

First option: if my query is 'select * from TableA', do I define <field name="attr1" column="FIELD1"/> for each of those 800 columns? Seems cumbersome. Maybe I can write a script to generate the XML instead of handwriting it in both data-config.xml and schema.xml.

OR

Don't define any <field name="attr1" column="FIELD1"/>, so that the column in Solr will be the same as in the database table. But then the questions are: 1) How do I define a unique field in this scenario? 2) How do I copy all the text fields to a common field for easy searching?

Any help is appreciated. Please feel free to suggest any alternative way.

Thanks

View this message in context: http://old.nabble.com/Indexing-a-oracle-warehouse-table-tp27414263p27414263.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Document model suggestion
Lance, Makes sense. We are playing around with keeping the security model completely out of Index. We will filter out results before data display based on access rights. But approach you suggested is not ruled out completely. thanks Lance Norskog-2 wrote: Yes, you would have 'role' as a multi-valued field. When you add someone to a role, you don't have to re-index. That's all. On Thu, Dec 17, 2009 at 12:55 PM, caman aboxfortheotherst...@gmail.com wrote: Are you suggesting that roles should be maintained in the index? We do manage out authentication based on roles but at granular level, user rights play a big role as well. I know we need to compromise, just need to find a balance. Thanks Lance Norskog-2 wrote: Role-based authentication is one level of sophistication up from user-based authentication. Users can have different roles, and authentication goes against roles. Documents with multiple viewers would be assigned special roles. All users would also have their own matching role. On Tue, Dec 15, 2009 at 10:01 AM, caman aboxfortheotherst...@gmail.com wrote: Erick, I know what you mean. Wonder if it is actually cleaner to keep the authorization model out of solr index and filter the data at client side based on the user access rights. Thanks all for help. Erick Erickson wrote: Yes, that should work. One hard part is what happens if your authorization model has groups, especially when membership in those groups changes. Then you have to go in and update all the affected docs. FWIW Erick On Tue, Dec 15, 2009 at 12:24 PM, caman aboxfortheotherst...@gmail.comwrote: Shalin, Thanks. much appreciated. Question about: That is usually what people do. The hard part is when some documents are shared across multiple users. What do you recommend when documents has to be shared across multiple users? Can't I just multivalue a field with all the users who has access to the document? 
thanks Shalin Shekhar Mangar wrote: On Tue, Dec 15, 2009 at 7:26 AM, caman aboxfortheotherst...@gmail.comwrote: Appreciate any guidance here please. Have a master-child table between two tables 'TA' and 'TB' where form is the master table. Any row in TA can have multiple row in TB. e.g. row in TA id---name 1---tweets TB: id|ta_id|field0|field1|field2.|field20|created_by 1|1|value1|value2|value2.|value20|User1 snip/ This works fine and index the data.But all the data for a row in TA gets combined in one document(not desirable). I am not clear on how to 1) separate a particular row from the search results. e.g. If I search for 'Android' and there are 5 rows for android in TB for a particular instance in TA, would like to show them separately to user and if the user click on any of the row,point them to an attached URL in the application. Should a separate index be maintained for each row in TB?TB can have millions of rows. The easy answer is that whatever you want to show as results should be the thing that you index as documents. So if you want to show tweets as results, one document should represent one tweet. Solr is different from relational databases and you should not think about both the same way. De-normalization is the way to go in Solr. 2) How to protect one user's data from another user. I guess I can keep a column for a user_id in the schema and append that filter automatically when I search through SOLR. Any better alternatives? That is usually what people do. The hard part is when some documents are shared across multiple users. Bear with me if these are newbie questions please, this is my first day with SOLR. No problem. Welcome to Solr! -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html Sent from the Solr - User mailing list archive at Nabble.com. 
-- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26799016.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26834798.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26881664.html Sent from the Solr - User mailing list archive at Nabble.com.
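If the per-user filter does end up appended on the Solr side after all, it is just an fq parameter the application adds and the user never sees. A sketch (the user_id field name and the escape set are assumptions, not from the thread):

```python
def user_filter(user_id):
    """Build a filter query restricting results to one user's documents."""
    # Escape characters that are special to the Lucene query syntax,
    # so a hostile id value cannot alter the filter.
    escaped = "".join("\\" + ch if ch in '+-&|!(){}[]^"~*?:\\/' else ch
                      for ch in str(user_id))
    return "user_id:%s" % escaped

# The application, not the user, supplies fq alongside the user's q.
params = {"q": "android", "fq": user_filter("user42")}
```

For documents shared across users, the same field made multi-valued would hold every user id with access, and this fq matches any of them.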
Re: Document model suggestion
Are you suggesting that roles should be maintained in the index? We do manage out authentication based on roles but at granular level, user rights play a big role as well. I know we need to compromise, just need to find a balance. Thanks Lance Norskog-2 wrote: Role-based authentication is one level of sophistication up from user-based authentication. Users can have different roles, and authentication goes against roles. Documents with multiple viewers would be assigned special roles. All users would also have their own matching role. On Tue, Dec 15, 2009 at 10:01 AM, caman aboxfortheotherst...@gmail.com wrote: Erick, I know what you mean. Wonder if it is actually cleaner to keep the authorization model out of solr index and filter the data at client side based on the user access rights. Thanks all for help. Erick Erickson wrote: Yes, that should work. One hard part is what happens if your authorization model has groups, especially when membership in those groups changes. Then you have to go in and update all the affected docs. FWIW Erick On Tue, Dec 15, 2009 at 12:24 PM, caman aboxfortheotherst...@gmail.comwrote: Shalin, Thanks. much appreciated. Question about: That is usually what people do. The hard part is when some documents are shared across multiple users. What do you recommend when documents has to be shared across multiple users? Can't I just multivalue a field with all the users who has access to the document? thanks Shalin Shekhar Mangar wrote: On Tue, Dec 15, 2009 at 7:26 AM, caman aboxfortheotherst...@gmail.comwrote: Appreciate any guidance here please. Have a master-child table between two tables 'TA' and 'TB' where form is the master table. Any row in TA can have multiple row in TB. e.g. row in TA id---name 1---tweets TB: id|ta_id|field0|field1|field2.|field20|created_by 1|1|value1|value2|value2.|value20|User1 snip/ This works fine and index the data.But all the data for a row in TA gets combined in one document(not desirable). 
I am not clear on how to 1) separate a particular row from the search results. e.g. If I search for 'Android' and there are 5 rows for android in TB for a particular instance in TA, would like to show them separately to the user and if the user clicks on any of the rows, point them to an attached URL in the application. Should a separate index be maintained for each row in TB? TB can have millions of rows. The easy answer is that whatever you want to show as results should be the thing that you index as documents. So if you want to show tweets as results, one document should represent one tweet. Solr is different from relational databases and you should not think about both the same way. De-normalization is the way to go in Solr. 2) How to protect one user's data from another user. I guess I can keep a column for a user_id in the schema and append that filter automatically when I search through SOLR. Any better alternatives? That is usually what people do. The hard part is when some documents are shared across multiple users. Bear with me if these are newbie questions please, this is my first day with SOLR. No problem. Welcome to Solr! -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26799016.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26834798.html Sent from the Solr - User mailing list archive at Nabble.com.
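The role-based scheme Lance describes can be sketched as a multivalued schema field with the matching filter applied at query time. This is only an illustration, not a tested configuration; the field name 'roles' is made up:

<!-- schema.xml sketch: one role per value; every user also gets a
     personal role so per-user sharing reduces to per-role sharing -->
<field name="roles" type="string" indexed="true" stored="false" multiValued="true"/>

A request on behalf of a user holding roles 'user_42' and 'editors' would then append a filter such as fq=roles:(user_42 OR editors), so only documents tagged with at least one of those roles come back. Erick's caveat still applies: when a group's membership changes, every document carrying that role's grants may need reindexing.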
Re: Document model suggestion
Shalin, Thanks, much appreciated. Question about: That is usually what people do. The hard part is when some documents are shared across multiple users. What do you recommend when documents have to be shared across multiple users? Can't I just multivalue a field with all the users who have access to the document? thanks Shalin Shekhar Mangar wrote: On Tue, Dec 15, 2009 at 7:26 AM, caman aboxfortheotherst...@gmail.comwrote: Appreciate any guidance here please. Have a master-child relationship between two tables 'TA' and 'TB' where form is the master table. Any row in TA can have multiple rows in TB. e.g. row in TA id---name 1---tweets TB: id|ta_id|field0|field1|field2...|field20|created_by 1|1|value1|value2|value2...|value20|User1 <snip/> This works fine and indexes the data. But all the data for a row in TA gets combined in one document (not desirable). I am not clear on how to 1) separate a particular row from the search results. e.g. If I search for 'Android' and there are 5 rows for android in TB for a particular instance in TA, would like to show them separately to the user and if the user clicks on any of the rows, point them to an attached URL in the application. Should a separate index be maintained for each row in TB? TB can have millions of rows. The easy answer is that whatever you want to show as results should be the thing that you index as documents. So if you want to show tweets as results, one document should represent one tweet. Solr is different from relational databases and you should not think about both the same way. De-normalization is the way to go in Solr. 2) How to protect one user's data from another user. I guess I can keep a column for a user_id in the schema and append that filter automatically when I search through SOLR. Any better alternatives? That is usually what people do. The hard part is when some documents are shared across multiple users. Bear with me if these are newbie questions please, this is my first day with SOLR. No problem. Welcome to Solr! 
-- Regards, Shalin Shekhar Mangar. -- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html Sent from the Solr - User mailing list archive at Nabble.com.
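The multivalued-field idea asked about above can be sketched directly; the field name 'allowed_users' is invented for illustration and is not from the thread:

<!-- schema.xml sketch: list every user id permitted to see the document -->
<field name="allowed_users" type="string" indexed="true" stored="false" multiValued="true"/>

At search time the application would append fq=allowed_users:&lt;current user id&gt; to every request, so a document shared with several users simply lists each of them in that one field. The hard part Shalin mentions is maintenance: whenever the sharing list changes, the whole document must be reindexed.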
Re: Document model suggestion
Erick, I know what you mean. Wonder if it is actually cleaner to keep the authorization model out of the Solr index and filter the data on the client side based on the user access rights. Thanks all for help. Erick Erickson wrote: Yes, that should work. One hard part is what happens if your authorization model has groups, especially when membership in those groups changes. Then you have to go in and update all the affected docs. FWIW Erick On Tue, Dec 15, 2009 at 12:24 PM, caman aboxfortheotherst...@gmail.comwrote: Shalin, Thanks, much appreciated. Question about: That is usually what people do. The hard part is when some documents are shared across multiple users. What do you recommend when documents have to be shared across multiple users? Can't I just multivalue a field with all the users who have access to the document? thanks Shalin Shekhar Mangar wrote: On Tue, Dec 15, 2009 at 7:26 AM, caman aboxfortheotherst...@gmail.comwrote: Appreciate any guidance here please. Have a master-child relationship between two tables 'TA' and 'TB' where form is the master table. Any row in TA can have multiple rows in TB. e.g. row in TA id---name 1---tweets TB: id|ta_id|field0|field1|field2...|field20|created_by 1|1|value1|value2|value2...|value20|User1 <snip/> This works fine and indexes the data. But all the data for a row in TA gets combined in one document (not desirable). I am not clear on how to 1) separate a particular row from the search results. e.g. If I search for 'Android' and there are 5 rows for android in TB for a particular instance in TA, would like to show them separately to the user and if the user clicks on any of the rows, point them to an attached URL in the application. Should a separate index be maintained for each row in TB? TB can have millions of rows. The easy answer is that whatever you want to show as results should be the thing that you index as documents. So if you want to show tweets as results, one document should represent one tweet. 
Solr is different from relational databases and you should not think about both the same way. De-normalization is the way to go in Solr. 2) How to protect one user's data from another user. I guess I can keep a column for a user_id in the schema and append that filter automatically when I search through SOLR. Any better alternatives? That is usually what people do. The hard part is when some documents are shared across multiple users. Bear with me if these are newbie questions please, this is my first day with SOLR. No problem. Welcome to Solr! -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26799016.html Sent from the Solr - User mailing list archive at Nabble.com.
Document model suggestion
Appreciate any guidance here please. Have a master-child relationship between two tables 'TA' and 'TB' where form is the master table. Any row in TA can have multiple rows in TB. e.g. row in TA id---name 1---tweets TB: id|ta_id|field0|field1|field2...|field20|created_by 1|1|value1|value2|value2...|value20|User1 This is how I am trying to model this in the SOLR DIH configuration:

<document>
  <entity name="TA" query="select * from TA"
          deltaQuery="select id from TA where (last_updated > '${dataimporter.last_index_time}' or date_created > '${dataimporter.last_index_time}')"
          deltaImportQuery="select * from TA where ID='${dataimporter.delta.id}'">
    <field column="name" name="name"/>
    <field column="name" name="nameSort"/>
    <field column="name" name="alphaNameSort"/>
    <entity name="TB"
            query="select id,field0,field1,field2,field3,field4,ta_id from TB where ta_id='${TA.id}'"
            deltaQuery="select ta_id from TB where (last_updated > '${dataimporter.last_index_time}' or date_created > '${dataimporter.last_index_time}')"
            parentDeltaQuery="select id from TA where id=${TB.ta_id}">
      <field name="dataId" column="id"/>
      <field name="attr0" column="field0"/>
      <field name="attr1" column="field1"/>
      <field name="attr2" column="field2"/>
      <field name="attr3" column="field3"/>
      <field name="attr4" column="field4"/>
    </entity>
  </entity>
</document>

This works fine and indexes the data. But all the data for a row in TA gets combined in one document (not desirable). I am not clear on how to 1) separate a particular row from the search results. e.g. If I search for 'Android' and there are 5 rows for android in TB for a particular instance in TA, would like to show them separately to the user and if the user clicks on any of the rows, point them to an attached URL in the application. Should a separate index be maintained for each row in TB? TB can have millions of rows. 2) How to protect one user's data from another user. I guess I can keep a column for a user_id in the schema and append that filter automatically when I search through SOLR. Any better alternatives? 
Bear with me if these are newbie questions please, this is my first day with SOLR. Thanks -- View this message in context: http://old.nabble.com/Document-model-suggestion-tp26784346p26784346.html Sent from the Solr - User mailing list archive at Nabble.com.
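Following the de-normalization advice given in the replies (one document per TB row), the same DIH configuration can be inverted so TB is the root entity. This is an untested sketch of that idea, reusing the field names from the post:

<!-- data-config.xml sketch: TB as the root entity, so each TB row
     becomes its own Solr document; TA is joined in only for its name -->
<document>
  <entity name="TB" query="select id, ta_id, field0, field1, created_by from TB">
    <field name="dataId" column="id"/>
    <field name="attr0" column="field0"/>
    <field name="attr1" column="field1"/>
    <field name="createdBy" column="created_by"/>
    <entity name="TA" query="select name from TA where id='${TB.ta_id}'">
      <field name="name" column="name"/>
    </entity>
  </entity>
</document>

With this shape, a search for 'Android' returns each matching TB row as its own hit, and indexing created_by enables the per-user filter asked about in question 2.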
Re: An issue with <commit/> using Solr Cell and multiple files
You are right. I got into the same thing. Windows curl gave me an error but cygwin ran without any issues. thanks Lance Norskog-2 wrote: It is a Windows problem (or curl, whatever). This works with double-quotes: C:\Users\work\Downloads\cygwin\home\work\curl-7.19.4\curl.exe http://localhost:8983/solr/update --data-binary "<commit/>" -H "Content-type:text/xml; charset=utf-8" Single-quotes inside double-quotes should work: "<commit waitFlush='false'/>" On Tue, Sep 8, 2009 at 11:59 AM, caman aboxfortheotherst...@gmail.comwrote: seems to be an error with curl Kevin Miller-17 wrote: I am getting the same error message. I am running Solr on a Windows machine. Is the commit command a curl command or is it a Solr command? Kevin Miller Web Services -----Original Message----- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, September 08, 2009 12:52 PM To: solr-user@lucene.apache.org Subject: Re: An issue with <commit/> using Solr Cell and multiple files solr/examples/exampledocs/post.sh does: curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8' Not sure if that helps or how it compares to the book. On Sep 8, 2009, at 1:48 PM, Kevin Miller wrote: I am using the Solr nightly build from 8/11/2009. I am able to index my documents using Solr Cell but when I attempt to send the commit command I get an error. I am using the example found in the Solr 1.4 Enterprise Search Server book (recently released) on page 84. It shows to commit the changes as follows (I am showing where my files are located, not the example in the book): c:\curl\bin\curl http://echo12:8983/solr/update/ -H "Content-Type: text/xml" --data-binary '<commit waitFlush="false"/>' This gives me this error: The system cannot find the file specified. 
I get the same error when I modify it to look like the following: c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit waitFlush="false"/>' c:\curl\bin\curl "http://echo12:8983/solr/update/" -H "Content-Type: text/xml" --data-binary '<commit waitFlush="false"/>' c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit />' c:\curl\bin\curl "http://echo12:8983/solr/update/" '<commit />' I am using the example configuration in Solr so my documents are found in the exampledocs folder; also my curl program is located in the root directory, which is the reason the curl command is being executed this way. I would appreciate any information on where to look or how to get the commit command to execute after indexing multiple files. Kevin Miller Oklahoma Tax Commission Web Services -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- View this message in context: http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25352122.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- View this message in context: http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25394203.html Sent from the Solr - User mailing list archive at Nabble.com.
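The failures above come down to shell quoting: a POSIX shell (bash, or curl under cygwin) strips the single quotes and hands curl the bare XML, while cmd.exe does not treat single quotes specially, so curl receives the quotes as part of the argument and misparses the command. A small illustration of the difference using Python's shlex tokenizer:

```python
import shlex

# How a POSIX shell tokenizes the working command from post.sh:
# the single quotes group the XML into one argument and are removed.
args = shlex.split(
    "curl http://localhost:8983/solr/update "
    "--data-binary '<commit/>' "
    "-H 'Content-type:text/xml; charset=utf-8'"
)
print(args[3])  # -> <commit/>  (clean XML payload)

# posix=False roughly mirrors cmd.exe: the single quotes stay attached
# to the token, which is what broke the commands above on Windows.
win_args = shlex.split("--data-binary '<commit/>'", posix=False)
print(win_args[1])  # -> '<commit/>'  (quotes included)
```

This is why switching to double-quotes (which cmd.exe does process) or running under cygwin fixed the problem.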
RE: An issue with <commit/> using Solr Cell and multiple files
seems to be an error with curl Kevin Miller-17 wrote: I am getting the same error message. I am running Solr on a Windows machine. Is the commit command a curl command or is it a Solr command? Kevin Miller Web Services -----Original Message----- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, September 08, 2009 12:52 PM To: solr-user@lucene.apache.org Subject: Re: An issue with <commit/> using Solr Cell and multiple files solr/examples/exampledocs/post.sh does: curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8' Not sure if that helps or how it compares to the book. On Sep 8, 2009, at 1:48 PM, Kevin Miller wrote: I am using the Solr nightly build from 8/11/2009. I am able to index my documents using Solr Cell but when I attempt to send the commit command I get an error. I am using the example found in the Solr 1.4 Enterprise Search Server book (recently released) on page 84. It shows to commit the changes as follows (I am showing where my files are located, not the example in the book): c:\curl\bin\curl http://echo12:8983/solr/update/ -H "Content-Type: text/xml" --data-binary '<commit waitFlush="false"/>' This gives me this error: The system cannot find the file specified. I get the same error when I modify it to look like the following: c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit waitFlush="false"/>' c:\curl\bin\curl "http://echo12:8983/solr/update/" -H "Content-Type: text/xml" --data-binary '<commit waitFlush="false"/>' c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit />' c:\curl\bin\curl "http://echo12:8983/solr/update/" '<commit />' I am using the example configuration in Solr so my documents are found in the exampledocs folder; also my curl program is located in the root directory, which is the reason the curl command is being executed this way. I would appreciate any information on where to look or how to get the commit command to execute after indexing multiple files. 
Kevin Miller Oklahoma Tax Commission Web Services -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- View this message in context: http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25352122.html Sent from the Solr - User mailing list archive at Nabble.com.