Re: SolrCloud removing shard (how not to lose data)
Mark, I know I still have access to the data and I can wake the shard up again. What I want to do is this: I have 3 shards on 3 nodes, one on each. Now I discover that I don't need 3 nodes and I want only 2. So I want to remove a shard and put its data onto the shards that are left. Is there a way to index that data without forcing a full re-index? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-removing-shard-how-to-not-loose-data-tp4032138p4032459.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: retrieving latest document **only**
On 10.01.2013 11:54, jmozah wrote: I need a query that matches only the most recent ones... because my stats depend on it... But I have a requirement to show **only** the latest documents and the stats along with them... What do you want: 'the most recent ones' or '**only** the latest'? Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs. Uwe
Forwarding authentication credentials in internal node-to-node requests
Hi, I read http://wiki.apache.org/solr/SolrSecurity and know a lot about webcontainer authentication and authorization. I'm sure I will be able to set it up so that each solr-node will require HTTP authentication for (selected) incoming requests. But solr-nodes also make requests among each other, and I'm in doubt whether credentials are forwarded from the original request to the internal sub-requests. E.g. let's say that each solr-node is set up to require authentication for search requests. An outside user makes a distributed request including a correct username/password. Since it is a distributed search, the node which handles the original request from the user will have to make sub-requests to other solr-nodes, but they also require correct credentials in order to accept these sub-requests. Are the credentials from the original request duplicated to the sub-requests, or what options do I have? The same thing goes for e.g. update requests if they are sent to a node which does not run (all) the replicas of the shard to which the documents to be added/updated/deleted belong. The node needs to make sub-requests to other nodes, and that will require forwarding the credentials. Does this just work out of the box, or ... ? Regards, Per Steffensen
Re: Auto completion
in solrconfig.xml:

  <str name="defType">edismax</str>
  <str name="qf">text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0 branch_name^1.1 hq_passout_year^1.4 course_type^10.0 institute_name^5.0 qualification_type^5.0 mail^2.0 state_name^1.0</str>
  <str name="df">text</str>
  <str name="mm">100%</str>
  <str name="q.alt">*:*</str>
  <str name="rows">10</str>
  <str name="fl">*,score</str>
  <str name="mlt.qf">text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0 branch_name^1.1 hq_passout_year^1.4 course_type^10.0 institute_name^5.0 qualification_type^5.0 mail^2.0 state_name^1.0</str>
  <str name="mlt.fl">text,last_name,first_name,course_name,id,branch_name,hq_passout_year,course_type,institute_name,qualification_type,mail,state_name</str>
  <int name="mlt.count">3</int>
  <str name="facet">on</str>
  <str name="facet.field">is_top_institute</str>
  <str name="facet.field">course_name</str>
  <str name="facet.range">cgpa</str>
  <int name="f.cgpa.facet.range.start">0</int>
  <int name="f.cgpa.facet.range.end">10</int>
  <int name="f.cgpa.facet.range.gap">2</int>

and in schema.xml:

  <field name="id" type="text_general" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="first_name" type="text_general" indexed="false" stored="true"/>
  <field name="last_name" type="text_general" indexed="false" stored="true"/>
  <field name="institute_name" type="text_general" indexed="true" stored="true"/>
  ...
  <copyField source="first_name" dest="text"/>
  <copyField source="last_name" dest="text"/>
  <copyField source="institute_name" dest="text"/>
  ...

So please now tell me what the terms.fl parameter in the JavaScript should be, and what goes in conf/velocity/head.vm, and also the 'name' reference in suggest.vm. Please reply... and thanks for the previous reply :-) -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-completion-tp4032267p4032450.html Sent from the Solr - User mailing list archive at Nabble.com.
which way for export
Hello. What is the best/fastest way to get the values of many fields from the index? My problem is that I need to calculate a sum of amounts, and the amount is in my index (stored=true). My PHP script gets all values with paging, but if a request takes too long, Jetty kills the export process. Is it better to get all the fields with wt=csv/json/xml or some other handler? -- View this message in context: http://lucene.472066.n3.nabble.com/which-way-for-export-tp4032487.html Sent from the Solr - User mailing list archive at Nabble.com.
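As an aside, the paging-and-summing loop being described can be sketched as follows. This is an illustrative Python sketch with made-up names: fetch_page stands in for one paged Solr request (start/rows parameters) and just slices a local list here, so this is not Solr client code.

```python
# Sketch: page through results in fixed-size chunks and sum a stored
# "amount" field client-side, instead of fetching everything at once.

def fetch_page(docs, start, rows):
    """Stand-in for one paged Solr response (start/rows)."""
    return docs[start:start + rows]

def sum_amounts(docs, rows=2):
    total = 0.0
    start = 0
    while True:
        page = fetch_page(docs, start, rows)
        if not page:
            break  # past the last page
        total += sum(d["amount"] for d in page)
        start += rows
    return total

docs = [{"id": i, "amount": i * 1.5} for i in range(5)]
print(sum_amounts(docs))  # 0 + 1.5 + 3.0 + 4.5 + 6.0 = 15.0
```

Keeping each page small keeps individual requests fast, which is the usual way around a proxy or container killing one long-running request.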
RE: Forwarding authentication credentials in internal node-to-node requests
Hi, If your credentials are fixed I would configure username:password in your request handler's shardHandlerFactory configuration section and then modify HttpShardHandlerFactory.init() to create an HttpClient with an AuthScope configured with those settings. I don't think you can easily obtain the original credentials inside HttpShardHandlerFactory. Cheers -----Original message----- From: Per Steffensen st...@designware.dk Sent: Fri 11-Jan-2013 13:07 To: solr-user@lucene.apache.org Subject: Forwarding authentication credentials in internal node-to-node requests [...]
Re: retrieving latest document **only**
What do you want: 'the most recent ones' or '**only** the latest'? Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs. Uwe I need **only** the latest documents... In the above query, refdate can vary based on the query. ./zahoor
Re: retrieving latest document **only**
One crude way is to first query and pick the latest date from the result, then issue a query with q=timestamp:[latestDate TO latestDate]. But I don't want to execute two queries... ./zahoor On 11-Jan-2013, at 6:37 PM, jmozah jmo...@gmail.com wrote: [...]
Re: retrieving latest document **only**
Could you use field collapsing? Boost by date and only show one value per group, and you'll have the most recent document only. Upayavira On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote: One crude way is to first query and pick the latest date from the result, then issue a query with q=timestamp:[latestDate TO latestDate]. But I don't want to execute two queries... ./zahoor [...]
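To illustrate what field collapsing buys here, this is a rough Python sketch of the "one newest document per group" behavior. The field names (category, timestamp) are invented for the example; a real setup would use Solr's grouping parameters rather than client-side code like this.

```python
# Sketch: keep only the newest document per group, which is what field
# collapsing (grouping) with a date sort achieves in a single query.

def latest_per_group(docs, group_field="category", ts_field="timestamp"):
    newest = {}
    for doc in docs:
        key = doc[group_field]
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in newest or doc[ts_field] > newest[key][ts_field]:
            newest[key] = doc
    # Order groups newest-first, like sorting groups by date descending.
    return sorted(newest.values(), key=lambda d: d[ts_field], reverse=True)

docs = [
    {"id": 1, "category": "a", "timestamp": "2013-01-09"},
    {"id": 2, "category": "a", "timestamp": "2013-01-11"},
    {"id": 3, "category": "b", "timestamp": "2013-01-10"},
]
print(latest_per_group(docs))  # docs 2 and 3, newest first
```

The point of the suggestion above is that grouping does this server-side in one request, avoiding the two-query approach.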
configuring schema to match database
Hi! I'm quite new to Solr and trying to understand how to create a schema from our postgres database and then search for the content in Solr instead of querying the db. My question should be really easy; it has most likely been asked many times, but still I'm not able to google any answer to it. To make it easy, I have 3 columns: users, courses and languages. Users has columns userid, firstname, lastname. Courses has columns coursename, startdate, enddate. Languages has columns language, writingskill, verbalskill. UserA has taken courseA, courseB and courseC, and has writingskill good / verbalskill good for english and writingskill excellent / verbalskill excellent for spanish. UserB has taken courseA, courseF, courseG and courseH, and has writingskill fluent / verbalskill fluent for english and writingskill good / verbalskill good for italian. I would like to put this data into Solr so I can search for all users who have taken courseA and are fluent in english. Can I do that? The problem is that I'm not sure how to flatten this database into a schema. It's easy to understand the users column, for example:

  <field name="userid" type="string" indexed="true"/>
  <field name="firstname" type="string" indexed="true"/>
  <field name="lastname" type="string" indexed="true"/>

But then I'm not so sure what the schema should look like for courses and languages:

  <field name="userid" type="string" indexed="true"/>
  <field name="coursename" type="string" indexed="true"/>
  <field name="startdate" type="string" indexed="true"/>
  <field name="enddate" type="string" indexed="true"/>

Thanks for any help /Niklas
Re: SolrCloud removing shard (how not to lose data)
Seems I was too lazy. I found this: http://wiki.apache.org/solr/MergingSolrIndexes, and it really works. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-removing-shard-how-to-not-loose-data-tp4032138p4032508.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.0, slow opening searchers
Hi, We're experiencing slow startup times of searchers in Solr when a core contains a large number of documents. We use Solr v4.0 with Jetty and currently have 267,657,634 documents stored, spread across 9 cores. These documents contain keywords, with additional statistics, which we use for suggestions and related keywords. When we (re)start Solr on one of our servers it can take up to two hours before Solr has opened all of its searchers and starts accepting connections again. We can't figure out why it takes so long to open those searchers; the CPU and memory usage of Solr while opening searchers is not extremely high either. Are there any known issues, or tips someone could give us to speed up opening searchers? If you need more details, please ping me. Best regards, Marcel Bremer Vinden.nl BV
Re: Index data from multiple tables into Solr
Hi! I know the pain! ;) That's why I wrote a bit on a blog, so I could remember in the future. Here is the link, in case you would like to read a tutorial on how to set up SOLR with multicore and hook it up to the database: http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/ I hope it helps! D. On Thu, Jan 10, 2013 at 6:19 PM, hassancrowdc hassancrowdc...@gmail.com wrote: Hi, I am trying to index multiple tables in Solr. I am not sure which data config file should be changed; there are so many of them (like solr-data-config, db-data-config). Also, do I have to change the id, name and desc to the names of the columns in my table? And how do I add the solr_details field in the schema? -- View this message in context: http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: configuring schema to match database
Hi Niklas, Maybe this link helps: http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/ D. On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig niklas.lang...@globesoft.com wrote: [...]
SV: configuring schema to match database
Thinking about it some more: perhaps I could have coursename and such as multivalued fields? Or should I have separate indices for users, courses and languages? I get the feeling both would work, but I'm not sure which way is the best to go. When a user is updating/removing/adding a course, it would be nice not to have to query the database for the user's courses and languages and update everything, but instead just update a course document. But perhaps I'm thinking too much in database terms? Still, I'm unsure what the schema should look like. Thanks /Niklas -----Original message----- From: Niklas Langvig [mailto:niklas.lang...@globesoft.com] Sent: 11 January 2013 14:19 To: solr-user@lucene.apache.org Subject: configuring schema to match database [...]
SV: configuring schema to match database
Hmm, I noticed I wrote "I have 3 columns: users, courses and languages". I of course mean I have 3 tables: users, courses and languages. /Niklas -----Original message----- From: Niklas Langvig [mailto:niklas.lang...@globesoft.com] Sent: 11 January 2013 14:19 To: solr-user@lucene.apache.org Subject: configuring schema to match database [...]
SV: configuring schema to match database
Hi Dariusz, To me this example has one table, user, while I have many tables that connect to one user, and that is what I'm unsure how to do. /Niklas -----Original message----- From: Dariusz Borowski [mailto:darius...@gmail.com] Sent: 11 January 2013 14:56 To: solr-user@lucene.apache.org Subject: Re: configuring schema to match database [...]
Re: configuring schema to match database
Hi, No, it actually has two tables, User and Item. The example shown on the blog is for one table, because you repeat the same thing for the other table. Only your data-import.xml file changes; for the rest, just copy and paste it into the conf directory. If you are running your Solr on Linux, you can work with symlinks. D. On Fri, Jan 11, 2013 at 3:12 PM, Niklas Langvig niklas.lang...@globesoft.com wrote: Hi Dariusz, To me this example has one table, user, while I have many tables that connect to one user, and that is what I'm unsure how to do. /Niklas [...]
Re: Forwarding authentication credentials in internal node-to-node requests
Hmmm, that will not work for me. I want the original credentials forwarded in the sub-requests. The credentials are mapped to permissions (authorization), and basically I don't want a user to be able to have something done in the (automatically performed by the contacted solr-node) sub-requests that he is not authorized to do. Forwarding of credentials is a must. So what you are saying is that I should expect to have to make some modifications to Solr in order to achieve what I want? Regards, Per Steffensen On 1/11/13 2:11 PM, Markus Jelsma wrote: Hi, If your credentials are fixed I would configure username:password in your request handler's shardHandlerFactory configuration section and then modify HttpShardHandlerFactory.init() to create an HttpClient with an AuthScope configured with those settings. I don't think you can easily obtain the original credentials inside HttpShardHandlerFactory. Cheers [...]
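At the HTTP level, "forwarding credentials" would amount to copying the Authorization header from the incoming request into each internal sub-request. Here is a minimal Python sketch of that idea; the request and sub-request dicts are purely illustrative (this is not Solr code, and as the thread establishes, Solr does not do this out of the box).

```python
# Sketch: build sub-requests that carry the caller's Authorization header,
# so downstream nodes can authenticate/authorize the original user.

def make_subrequests(incoming_headers, shard_urls):
    auth = incoming_headers.get("Authorization")
    subrequests = []
    for url in shard_urls:
        headers = {}
        if auth is not None:
            headers["Authorization"] = auth  # forward original credentials
        subrequests.append({"url": url, "headers": headers})
    return subrequests

reqs = make_subrequests(
    {"Authorization": "Basic dXNlcjpwYXNz"},
    ["http://shard1/solr/select", "http://shard2/solr/select"],
)
print(reqs[0]["headers"]["Authorization"])  # Basic dXNlcjpwYXNz
```

In Solr terms this logic would have to live where sub-requests are created (the shard handler discussed above), which is why a code change is being suggested.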
Re: Reading properties in data-import.xml
Thanks Alex! This brought me to the solution I wanted to achieve. :) D. On Thu, Jan 10, 2013 at 3:21 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: dataimport.properties is for DIH to store its own properties for delta processing and the like. Try solrcore.properties instead, as per a recent discussion: http://lucene.472066.n3.nabble.com/Reading-database-connection-properties-from-external-file-td4031154.html Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Jan 10, 2013 at 3:58 AM, Dariusz Borowski darius...@gmail.com wrote: I'm having a problem using a property file in my data-import.xml file. My aim is not to hard-code some values inside my xml file, but rather to reuse values from a property file. I'm using multicore, and some of the values change from time to time; I do not want to change them in all my data-import files. For example:

  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://${host}:3306/projectX" user="${username}" password="${password}"/>

I tried everything, but don't know how I can use properties here. I tried to put my values in dataimport.properties, located under SOLR_HOME/conf and under SOLR_HOME/core1/conf, but without any success. Please, could someone help me on this?
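The ${name} placeholders in the config above behave like plain property substitution: values from a properties file are spliced into the template at load time. That behavior can be illustrated with Python's string.Template; the property values below are made up for the example.

```python
# Sketch: emulate ${name} placeholder substitution, the way property
# values get spliced into a config template. Values are illustrative.
from string import Template

template = Template(
    'url="jdbc:mysql://${host}:3306/projectX" '
    'user="${username}" password="${password}"'
)
props = {"host": "db.example.com", "username": "solr", "password": "secret"}
rendered = template.substitute(props)
print(rendered)
```

The key point from the thread is simply *which* file the values are read from: solrcore.properties is consulted for substitution, while dataimport.properties is DIH's own bookkeeping file.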
SV: configuring schema to match database
Ahh sorry, now I understand. OK, that seems like a good solution; I just now need to understand how to query multiple cores :) -----Original message----- From: Dariusz Borowski [mailto:darius...@gmail.com] Sent: 11 January 2013 15:15 To: solr-user@lucene.apache.org Subject: Re: configuring schema to match database [...]
Re: configuring schema to match database
I don't know how to query multiple cores, or whether it's possible at once; otherwise I would create a JOIN SQL script if you need values from multiple tables. D. On Fri, Jan 11, 2013 at 3:27 PM, Niklas Langvig niklas.lang...@globesoft.com wrote: Ahh sorry, now I understand. OK, that seems like a good solution; I just now need to understand how to query multiple cores :) [...]
Re: configuring schema to match database
On 11 January 2013 19:57, Niklas Langvig niklas.lang...@globesoft.com wrote: Ahh sorry, now I understand. Ok, seems like a good solution, I just now need to understand how to query multiple cores :) There is no need to use multiple cores in your setup. Going back to your original problem statement, it can easily be handled with a single core, and it actually makes more sense to do it that way. You will need to give us more details. My question should be really easy, it has most likely been asked many times but still I'm not able to google any answer to it. To make it easy, I have 3 columns: users, courses and languages Presumably, you mean three tables, as you describe each as having columns. How are the tables connected? Is there a foreign key relationship between them? Is the relationship one-to-one, one-to-many, or what? Users has columns userid, firstname, lastname Courses has columns coursename, startdate, enddate Languages has columns language, writingskill, verbalskill [...] I would like to put this data into solr so I can search for all users who have taken courseA and are fluent in english. Can I do that? 1. Your schema for the single core is quite straightforward, and along the lines of what you had described (one field for each database column in each table), e.g., <field name="userid" type="string" indexed="true" /> <field name="firstname" type="string" indexed="true" /> <field name="lastname" type="string" indexed="true" /> <field name="coursename" type="string" indexed="true" /> <field name="startdate" type="date" indexed="true" /> <field name="enddate" type="date" indexed="true" /> <field name="language" type="string" indexed="true" /> <field name="writingskill" type="string" indexed="true" /> <field name="verbalskill" type="string" indexed="true" /> Pay attention to the type. Dates should typically be solr.DateField. The others can be strings, but if they are integers in the database, you might benefit from making these integers in Solr also. 2. One has to stop thinking of Solr as an RDBMS.
Instead, one flattens out data from a typical RDBMS structure. It is difficult to give you complete instructions unless you describe the database relationships, but, e.g., if one has userA with course1, course2, and course3, and userB with course2, course4, the Solr documents would be:
* userA, course1, details for course1...
* userA, course2, details for course2...
* userA, course3, details for course3...
* userB, course2, details for course2...
* userB, course4, details for course4...
This scheme could also be extended to languages, depending on how the tables are related. 3. While indexing into Solr, one has to select from the database, and flatten out the data as above. The two main ways of doing this are using a library like SolrJ for Java (other languages have other libraries, e.g., django-haystack is easy to get started with if one is using Python/Django), or the Solr DataImportHandler (please see http://wiki.apache.org/solr/DataImportHandler ) with nested entities. 4. With such a structure, querying Solr should be simple. Regards, Gora
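The flattening in step 3 can be sketched in a few lines of code. This is purely illustrative and independent of any Solr API: the table and column names come from this thread, while the flatten() helper and the sample names are made up for the example.

```python
# Flatten one-to-many relational data into one Solr document per
# (user, course) pair -- the structure described above.

def flatten(users, courses):
    docs = []
    for user in users:
        for course in courses:
            if course["userid"] == user["userid"]:
                doc = dict(user)                  # userid, firstname, lastname
                doc["coursename"] = course["coursename"]
                docs.append(doc)
    return docs

users = [
    {"userid": "userA", "firstname": "Ann", "lastname": "Andersson"},
    {"userid": "userB", "firstname": "Ben", "lastname": "Berg"},
]
courses = [
    {"userid": "userA", "coursename": "course1"},
    {"userid": "userA", "coursename": "course2"},
    {"userid": "userA", "coursename": "course3"},
    {"userid": "userB", "coursename": "course2"},
    {"userid": "userB", "coursename": "course4"},
]

docs = flatten(users, courses)
# 3 documents for userA and 2 for userB, as in the list above
```

Each dict in docs would then be sent to Solr as one document (e.g. via SolrJ or a DIH query producing the same rows).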
SV: configuring schema to match database
It sounds good not to use more than one core, for sure I do not want to overcomplicate this. Yes, I meant tables. It's pretty simple. Both the courses and languages tables have their own primary key, courseseqno and languagesseqno. Both also have a foreign key userid that references the users table column userid. The relationship from users to courses and languages is one-to-many. But I guess I'm thinking wrong, because my idea would be to have a block of fields connected with one id: <field name="coursename" type="string" indexed="true" /> <field name="startdate" type="date" indexed="true" /> <field name="enddate" type="date" indexed="true" /> These three are connected with a <field name="courseseqno" type="int" indexed="true" /> But also have a <field name="userid" type="int" indexed="true" /> to connect to a specific user? Thanks /Niklas -Original message- From: Gora Mohanty [mailto:g...@mimirtech.com] Sent: 11 January 2013 15:55 To: solr-user@lucene.apache.org Subject: Re: configuring schema to match database On 11 January 2013 19:57, Niklas Langvig niklas.lang...@globesoft.com wrote: Ahh sorry, now I understand. Ok, seems like a good solution, I just now need to understand how to query multiple cores :) There is no need to use multiple cores in your setup. Going back to your original problem statement, it can easily be handled with a single core, and it actually makes more sense to do it that way. You will need to give us more details. My question should be really easy, it has most likely been asked many times but still I'm not able to google any answer to it. To make it easy, I have 3 columns: users, courses and languages Presumably, you mean three tables, as you describe each as having columns. How are the tables connected? Is there a foreign key relationship between them? Is the relationship one-to-one, one-to-many, or what? Users has columns userid, firstname, lastname Courses has columns coursename, startdate, enddate Languages has columns language, writingskill, verbalskill [...]
I would like to put this data into solr so I can search for all users who have taken courseA and are fluent in english. Can I do that? 1. Your schema for the single core is quite straightforward, and along the lines of what you had described (one field for each database column in each table), e.g., <field name="userid" type="string" indexed="true" /> <field name="firstname" type="string" indexed="true" /> <field name="lastname" type="string" indexed="true" /> <field name="coursename" type="string" indexed="true" /> <field name="startdate" type="date" indexed="true" /> <field name="enddate" type="date" indexed="true" /> <field name="language" type="string" indexed="true" /> <field name="writingskill" type="string" indexed="true" /> <field name="verbalskill" type="string" indexed="true" /> Pay attention to the type. Dates should typically be solr.DateField. The others can be strings, but if they are integers in the database, you might benefit from making these integers in Solr also. 2. One has to stop thinking of Solr as an RDBMS. Instead, one flattens out data from a typical RDBMS structure. It is difficult to give you complete instructions unless you describe the database relationships, but, e.g., if one has userA with course1, course2, and course3, and userB with course2, course4, the Solr documents would be: userA course1 details for course1... userA course2 details for course2... userA course3 details for course3... userB course2 details for course2... userB course4 details for course4... This scheme could also be extended to languages, depending on how the tables are related. 3. While indexing into Solr, one has to select from the database, and flatten out the data as above. The two main ways of doing this are using a library like SolrJ for Java (other languages have other libraries, e.g., django-haystack is easy to get started with if one is using Python/Django), or the Solr DataImportHandler (please see http://wiki.apache.org/solr/DataImportHandler ) with nested entities. 4. With such a structure, querying Solr should be simple.
Regards, Gora
Re: Getting Files into Zookeeper
It's a bug that you only see RuntimeException - in 4.1 you will get the real problem - which is likely around connecting to zookeeper. You might try with a single zk host in the zk host string initially. That might make it easier to track down why it won't connect. It's tough to diagnose because the root exception is being swallowed - it's likely a connect to zk failed exception though. - Mark On Jan 10, 2013, at 1:34 PM, Christopher Gross cogr...@gmail.com wrote: I'm trying to get SolrCloud working with more than one configuration going. I have the base schema that Solr 4 comes with, I'd like to push that and one from another project (it does have the _version_ field in it.) I'm having difficulty figuring out how to push things into zookeeper, or if I'm even doing this right. From the SolrCloud page, I'm trying this and I get an error -- $ java -classpath zookeeper-3.3.6.jar:apache-solr-core-4.0.0.jar:apache-solr-solrj-4.0.0.jar:commons-cli-1.2.jar:slf4j-jdk14-1.6.4.jar:slf4j-api-1.6.4.jar:commons-codec-1.7.jar:commons-fileupload-1.2.1.jar:commons-io-2.1.jar:commons-lang-2.6.jar:guava-r05.jar:httpclient-4.1.3.jar:httpcore-4.1.4.jar:httpmime-4.1.3.jar:jcl-over-slf4j-1.6.4.jar:lucene-analyzers-common-4.0.0.jar:lucene-analyzers-kuromoji-4.0.0.jar:lucene-analyzers-phonetic-4.0.0.jar:lucene-core-4.0.0.jar:lucene-grouping-4.0.0.jar:lucene-highlighter-4.0.0.jar:lucene-memory-4.0.0.jar:lucene-misc-4.0.0.jar:lucene-queries-4.0.0.jar:lucene-queryparser-4.0.0.jar:lucene-spatial-4.0.0.jar:lucene-suggest-4.0.0.jar:spatial4j-0.3.jar:wstx-asl-3.2.7.jar org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185 -confdir /solr/data/test/conf -confname myconf Exception in thread "main" java.lang.RuntimeException at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115) at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:83) at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:158) Can anyone
point me in the direction of some documentation or let me know if there's something that I'm missing? Thanks! -- Chris
Re: Setting up new SolrCloud - need some guidance
On Jan 10, 2013, at 12:06 PM, Shawn Heisey s...@elyograg.org wrote: On 1/9/2013 8:54 PM, Mark Miller wrote: I'd put everything into one. You can upload different named sets of config files and point collections either to the same sets or different sets. You can really think about it the same way you would setting up a single node with multiple cores. The main difference is that it's easier to share sets of config files across collections if you want to. You don't need to at all though. I'm not sure if xinclude works with zk, but I don't think it does. Thank you for your assistance. I'll work on recombining my solrconfig.xml. Are there any available full examples of how to set up and start both zookeeper and Solr? I'll be using the included Jetty 8. I'm not sure - there are a few blog posts out there. The wiki does a decent job for Solr but doesn't get into ZooKeeper - the ZooKeeper site has a pretty simple setup guide though. Specific questions that have come to mind: If I'm planning multiple collections with their own configs, do I still need to bootstrap zookeeper when I start Solr, or should I start it up with the zkHost parameter and then use the collection admin to upload information? I have not looked closely at the collection admin yet, I just know that it exists. Currently, there are two main options. Either use the bootstrap param on first startup or use the zkcli cmd line tool to upload config sets and link them to collections. I have heard that if a replica node is down long enough that transaction logs are not enough to fully fix that node, SolrCloud will initiate a full replication. Is that the case? If so, is it necessary to configure the replication handler with a specific path for the name, or does SolrCloud handle that itself? The replication handler should be defined as you see it in the default example solrconfig.xml file. Very bare bones.
Is there an option on updateLog that controls how many transactions are kept, or is that managed automatically by SolrCloud? I have read some things that talk about 100 updates. I expect updates on this to be extremely frequent and small, so 100 updates isn't much, and I may want to increase that. No option - 100 is it, as it has implications on the recovery strategy if it's raised. I'd like to see it configurable in the future, but that would require making some other knobs change as well, if I remember right. Is it expected with future versions of Solr that I could upgrade one of my nodes to 4.2 or 4.3 and have it work with the other node still at 4.1? I would also hope that would mean that the last 4.x release would work with 5.0. That would make it possible to do rolling upgrades with no downtime. I don't think we have committed to anything here yet. Seems like something we need to hash out, but we have not wanted to be too limited initially. For example, the Solr 4.0 to 4.1 upgrade with SolrCloud still needs some explanation and might require some down time. - Mark Thanks, Shawn
RE: Setting up new SolrCloud - need some guidance
FYI: XInclude works fine. We have all request handlers in solrconfig in separate files and include them via XInclude on a running SolrCloud cluster. -Original message- From:Mark Miller markrmil...@gmail.com Sent: Fri 11-Jan-2013 17:13 To: solr-user@lucene.apache.org Subject: Re: Setting up new SolrCloud - need some guidance On Jan 10, 2013, at 12:06 PM, Shawn Heisey s...@elyograg.org wrote: On 1/9/2013 8:54 PM, Mark Miller wrote: I'd put everything into one. You can upload different named sets of config files and point collections either to the same sets or different sets. You can really think about it the same way you would setting up a single node with multiple cores. The main difference is that it's easier to share sets of config files across collections if you want to. You don't need to at all though. I'm not sure if xinclude works with zk, but I don't think it does. Thank you for your assistance. I'll work on recombining my solrconfig.xml. Are there any available full examples of how to set up and start both zookeeper and Solr? I'll be using the included Jetty 8. I'm not sure - there are a few blog posts out there. The wiki does a decent job for Solr but doesn't get in ZooKeeper - the ZooKeeper site has a pretty simple setup guide though. Specific questions that have come to mind: If I'm planning multiple collections with their own configs, do I still need to bootstrap zookeeper when I start Solr, or should I start it up with the zkHost parameter and then use the collection admin to upload information? I have not looked closely at the collection admin yet, I just know that it exists. Currently, there are two main options. Either use the bootstrap param on first startup or use the zkcli cmd line tool to upload config sets and link them to collections. I have heard that if a replica node is down long enough that transaction logs are not enough to fully fix that node, SolrCloud will initiate a full replication. Is that the case? 
If so, is it necessary to configure the replication handler with a specific path for the name, or does SolrCloud handle that itself? The replication handler should be defined as you see it in the default example solrconfig.xml file. Very bare bones. Is there an option on updateLog that controls how many transactions are kept, or is that managed automatically by SolrCloud? I have read some things that talk about 100 updates. I expect updates on this to be extremely frequent and small, so 100 updates isn't much, and I may want to increase that. No option - 100 is it as it has implications on the recovery strategy if it's raised. I'd like to see it configurable in the future, but would require make some other knobs change as well if I remember right. Is it expected with future versions of Solr that I could upgrade one of my nodes to 4.2 or 4.3 and have it work with the other node still at 4.1? I would also hope that would mean that the last 4.x release would work with 5.0. That would make it possible to do rolling upgrades with no downtime. I don't think we have committed to anything here yet. Seems like something we need to hash out, but we have not wanted to be too limited initially. For example, the Solr 4.0 to 4.1 upgrade with SolrCloud still needs some explanation and might require some down time. - Mark Thanks, Shawn
Re: Getting Files into Zookeeper
I changed it to only go to one Zookeeper (localhost:2181) and it still gave me the same stack trace error. I was eventually able to get around this -- I just used the bootstrap arguments when starting up my Tomcat instances to push the configs over -- though I'd rather just do it externally from Tomcat in the future. Thanks Mark. -- Chris On Fri, Jan 11, 2013 at 11:00 AM, Mark Miller markrmil...@gmail.com wrote: It's a bug that you only see RuntimeException - in 4.1 you will get the real problem - which is likely around connecting to zookeeper. You might try with a single zk host in the zk host string initially. That might make it easier to track down why it won't connect. It's tough to diagnose because the root exception is being swallowed - it's likely a connect to zk failed exception though. - Mark On Jan 10, 2013, at 1:34 PM, Christopher Gross cogr...@gmail.com wrote: I'm trying to get SolrCloud working with more than one configuration going. I have the base schema that Solr 4 comes with, I'd like to push that and one from another project (it does have the _version_ field in it.) I'm having difficulty figuring out how to push things into zookeeper, or if I'm even doing this right. 
From the SolrCloud page, I'm trying this and I get an error -- $ java -classpath zookeeper-3.3.6.jar:apache-solr-core-4.0.0.jar:apache-solr-solrj-4.0.0.jar:commons-cli-1.2.jar:slf4j-jdk14-1.6.4.jar:slf4j-api-1.6.4.jar:commons-codec-1.7.jar:commons-fileupload-1.2.1.jar:commons-io-2.1.jar:commons-lang-2.6.jar:guava-r05.jar:httpclient-4.1.3.jar:httpcore-4.1.4.jar:httpmime-4.1.3.jar:jcl-over-slf4j-1.6.4.jar:lucene-analyzers-common-4.0.0.jar:lucene-analyzers-kuromoji-4.0.0.jar:lucene-analyzers-phonetic-4.0.0.jar:lucene-core-4.0.0.jar:lucene-grouping-4.0.0.jar:lucene-highlighter-4.0.0.jar:lucene-memory-4.0.0.jar:lucene-misc-4.0.0.jar:lucene-queries-4.0.0.jar:lucene-queryparser-4.0.0.jar:lucene-spatial-4.0.0.jar:lucene-suggest-4.0.0.jar:spatial4j-0.3.jar:wstx-asl-3.2.7.jar org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185 -confdir /solr/data/test/conf -confname myconf Exception in thread "main" java.lang.RuntimeException at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115) at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:83) at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:158) Can anyone point me in the direction of some documentation or let me know if there's something that I'm missing? Thanks! -- Chris
Re: configuring schema to match database
On 11 January 2013 21:13, Niklas Langvig niklas.lang...@globesoft.com wrote: It sounds good not to use more than one core, for sure I do not want to overcomplicate this. [...] Yes, not only are multiple cores unnecessarily complicated here, your searches will also be less complex, and faster. Both the courses and languages tables have their own primary key, courseseqno and languagesseqno There is no need to index these. Both also have a foreign key userid that references the users table column userid The relationship from users to courses and languages is one-to-many. But I guess I'm thinking wrong, because my idea would be to have a block of fields connected with one id: <field name="coursename" type="string" indexed="true" /> <field name="startdate" type="date" indexed="true" /> <field name="enddate" type="date" indexed="true" /> These three are connected with a <field name="courseseqno" type="int" indexed="true" /> But also have a <field name="userid" type="int" indexed="true" /> to connect to a specific user? [...] You are still thinking of Solr as an RDBMS, where you should not be. In your case, it is easiest to flatten out the data. This increases the size of the index, but that should not really be of concern. As your courses and languages tables are connected only to users, the schema that I described earlier should suffice. To extend my earlier example, given: * userA with courses c1, c2, c3, and languages l1, l2 * userB with c2, c3, and l2 you should flatten it such that you get the following Solr documents:
* userA, c1 name, c1 startdate, ..., l1, l1 writing skill, ...
* userA, c1 name, c1 startdate, ..., l2, l2 writing skill, ...
* userA, c2 name, c2 startdate, ..., l1, l1 writing skill, ...
* ...
* userB, c2 name, c2 startdate, ..., l2, l2 writing skill, ...
* userB, c3 name, c3 startdate, ..., l2, l2 writing skill, ...
i.e., a total of 3 courses x 2 languages = 6 documents for userA, and 2 courses x 1 language = 2 documents for userB. In order to get this form of flattened data into Solr, I would suggest using the DataImportHandler with nested entities. Please see the earlier link to DIH. Also, a Google search for Solr dataimporthandler nested entities turns up many examples, including: http://solr.pl/en/2010/10/11/data-import-handler-%E2%80%93-how-to-import-data-from-sql-databases-part-1/ Please give it a try, and post here with your attempts if you run into any issues. Regards, Gora
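A hedged sketch of what such a DIH data-config.xml could look like. One caveat: nested entities normally attach child values as extra (multivalued) fields on one document per root-entity row, so to get one document per user/course/language combination as in the 3 x 2 = 6 example above, the simplest sketch uses a single joined root query. The table and column names are the ones from this thread; the driver, URL, and credentials are placeholders:

```
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="db_user" password="db_pass"/>
  <document>
    <!-- One row, and hence one Solr document, per user/course/language combination -->
    <entity name="usercourselang"
            query="SELECT u.userid, u.firstname, u.lastname,
                          c.coursename, c.startdate, c.enddate,
                          l.language, l.writingskill, l.verbalskill
                   FROM users u
                   JOIN courses c ON c.userid = u.userid
                   JOIN languages l ON l.userid = u.userid"/>
  </document>
</dataConfig>
```

Each document would still need a unique key; concatenating e.g. userid, courseseqno and languagesseqno in the SELECT (or in a transformer) is one way to provide it.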
How to disable\clear filterCache(from SolrIndexSearcher ) in a custom searchComponent
Hello, thank you in advance for your help! *Context:* I have implemented a custom search component that receives 3 parameters: field, termValue and payloadX. The component should search for termValue in the requested Lucene field and, for each *termValue* hit, check *payloadX* against the information in its associated payload. *Constraints:* I don't want to disable the filterCache (<filterCache class="solr.FastLRUCache" ... />) in solrconfig.xml, since I have other searchComponents that could use the filterCache. I have implemented the payload search using SpanTermQuery and attached it to q=field:termValue:

public class MySearchComponent extends XPatternsSearchComponent {
  public void prepare(ResponseBuilder rb) {
    ...
    rb.setQueryString(parameters.get(CommonParams.Q)...
  }
  public void process(ResponseBuilder rb) {
    ...
    SolrIndexSearcher.QueryResult queryResult = new SolrIndexSearcher.QueryResult(); // ??? question for help
    // search for the payload criteria in the payload of a specific field for a specific term
    CustomSpanTermQuery customFilterQuery = new CustomSpanTermQuery(field, term, payload);
    QueryCommand queryCommand = rb.getQueryCommand().setFilterList(customFilterQuery);
    rb.req.getSearcher().search(queryResult, queryCommand);
    ...
  }
}

*Issue:* If I call the search component with field1, termValue1 and: - *payload1* (the first search): the result of the filtering is saved in the filterCache. - *payload2* (the second time): the results from the first search (filterCache) are returned, not the different expected result set. *Findings:* I noticed that in SolrIndexSearcher the filterCache is private, so I cannot change/clear it through inheritance. I also tried to use rb.getQueryCommand().replaceFlags(), but SolrIndexSearcher.NO_CHECK_FILTERCACHE|NO_CHECK_QCACHE|NO_SET_QCACHE are not public either. *Question:* How to disable/clear the filterCache (from SolrIndexSearcher) *only* for a custom search component? Do I have other options/approaches? Best regards, Radu
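One avenue worth checking (a suggestion, not a way to clear the private cache): since SOLR-2429, Solr lets an individual filter opt out of the filterCache via local params, e.g.:

```
fq={!cache=false cost=100}field:termValue
```

Programmatically, the analogous mechanism is the ExtendedQuery interface: a filter query wrapped in org.apache.solr.search.WrappedQuery with setCache(false) is, to my understanding, skipped by the filterCache in SolrIndexSearcher, so per-payload filters would never be cached in the first place. Verify these class and method names against your Solr version before relying on them.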
RE: Forwarding authentication credentials in internal node-to-node requests
Hmm, you need to set up the HttpClient in HttpShardHandlerFactory, but you cannot access the HttpServletRequest from there; it is only available in SolrDispatchFilter AFAIK. And then, the HttpServletRequest can only return the remote user name, not the password he, she or it provided. I don't know how to obtain the password. -Original message- From: Per Steffensen st...@designware.dk Sent: Fri 11-Jan-2013 15:28 To: solr-user@lucene.apache.org Subject: Re: Forwarding authentication credentials in internal node-to-node requests Hmmm, it will not work for me. I want the original credentials forwarded in the sub-requests. The credentials are mapped to permissions (authorization), and basically I don't want a user to be able to have something done in the (automatically performed by the contacted solr-node) sub-requests that he is not authorized to do. Forwarding of credentials is a must. So what you are saying is that I should expect to have to do some modifications to Solr in order to achieve what I want? Regards, Per Steffensen On 1/11/13 2:11 PM, Markus Jelsma wrote: Hi, If your credentials are fixed I would configure username:password in your request handler's shardHandlerFactory configuration section and then modify HttpShardHandlerFactory.init() to create an HttpClient with an AuthScope configured with those settings. I don't think you can obtain the original credentials very easily from inside HttpShardHandlerFactory. Cheers -Original message- From: Per Steffensen st...@designware.dk Sent: Fri 11-Jan-2013 13:07 To: solr-user@lucene.apache.org Subject: Forwarding authentication credentials in internal node-to-node requests Hi, I read http://wiki.apache.org/solr/SolrSecurity and know a lot about webcontainer authentication and authorization. I'm sure I will be able to set it up so that each solr-node will require HTTP authentication for (selected) incoming requests. But solr-nodes also make requests among each other, and I'm in doubt whether credentials are forwarded from the original request to the internal sub-requests. E.g. let's say that each solr-node is set up to require authentication for search requests. An outside user makes a distributed request including correct username/password. Since it is a distributed search, the node which handles the original request from the user will have to make sub-requests to other solr-nodes, but they also require correct credentials in order to accept this sub-request. Are the credentials from the original request duplicated to the sub-requests, or what options do I have? The same thing goes for e.g. update requests if they are sent to a node which does not run (all) the replicas of the shard to which the documents to be added/updated/deleted belong. The node needs to make sub-requests to other nodes, and that will require forwarding the credentials. Does this just work out of the box, or ... ? Regards, Per Steffensen
Re: link on graph page
They point to the admin UI - or should - that seems right? - Mark On Jan 11, 2013, at 10:57 AM, Christopher Gross cogr...@gmail.com wrote: I've managed to get my SolrCloud set up to have 2 different indexes up and running. However, my URLs aren't right. They just point to http://server:port/solr, not http://server:port/solr/index1 or http://server:port/solr/index2. Is that something that I can set in my solr.xml for that Solr instance, or is it something that I'd have to set in each one's solrconfig.xml. Any help would be appreciated. Thanks! -- Chris
Re: configuring schema to match database
On 01/11/2013 05:23 PM, Gora Mohanty wrote: You are still thinking of Solr as a RDBMS, where you should not be. In your case, it is easiest to flatten out the data. This increases the size of the index, but that should not really be of concern. As your courses and languages tables are connected only to user, the schema that I described earlier should suffice. To extend my earlier example, given: * userA with courses c1, c2, c3, and languages l1, l2 * userB with c2, c3, and l2 you should flatten it such that you get the following Solr documents userA c1 name c1 startdate...l1 l1 writing skill... userA c1 name c1 startdate...l2 l2 writing skill... userA c2 name c2 startdate...l1 l1 writing skill... userB c2 name c2 startdate...l2 l2 writing skill... userB c3 name c3 startdate...l2 l2 writing skill... i.e., a total of 3 courses x 2 languages = 6 documents for userA, and 2 courses x 1 language = 2 documents for userB Actually, that is what you would get when doing a join in an RDBMS, the cross-product of your tables. This is NOT AT ALL what you typically do in Solr. Best start the other way around, think of Solr as a retrieval system, not a storage system. What are your queries? What do you want to find, and what criteria do you use to search for it? If your intention is to find users that match certain criteria, each entry should be a user (with ALL associated information, e.g. all courses, all language skills, etc.), if you want to retrieve courses, each entry should be a course. Let's say you want to find users who have certain language skills, you would have a schema that describes a user: - user id - user name - languages - ... In languages, you could store e.g. things like: en|reading|high es|writing|low, etc. It could be a multivalued field or just have everything separated by space and a tokenizer that splits on whitespace. 
Now you can query: - language:es* -- return all users with some spanish skills - language:en|writing|high -- return all users with high english writing skills - +(language:es* language:fr*) +language:en|writing|high -- return users with high english writing skills and some knowledge of french or spanish If you want to avoid wildcard queries (more costly) you can just add plain en and es, etc. to your field so language:es will match anybody with spanish skills. Best, Jens
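The encoding above can be sketched as a tiny indexing-side helper: one bare language code (so language:es matches without a wildcard) plus one "lang|skill|level" token per skill, destined for a multivalued or whitespace-tokenized field. The helper itself is illustrative, not a Solr API; the token format is the one proposed in the mail above.

```python
# Build the values for the "languages" field of one user document.

def language_tokens(lang, skills):
    tokens = [lang]  # bare code, e.g. "es", for wildcard-free matching
    for skill, level in skills:
        tokens.append("{}|{}|{}".format(lang, skill, level))
    return tokens

tokens = language_tokens("en", [("writing", "high"), ("reading", "high")])
# -> ["en", "en|writing|high", "en|reading|high"]
```

At query time, the example queries above (language:en|writing|high, language:es*) then match these stored tokens directly.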
Re: configuring schema to match database
On 11 January 2013 22:30, Jens Grivolla j+...@grivolla.net wrote: [...] Actually, that is what you would get when doing a join in an RDBMS, the cross-product of your tables. This is NOT AT ALL what you typically do in Solr. Best start the other way around, think of Solr as a retrieval system, not a storage system. What are your queries? What do you want to find, and what criteria do you use to search for it? [...] Um, he did describe his desired queries, and there was a reason that I proposed the above schema design. UserA has taken courseA, courseB and courseC and has writingskill good verbalskill good for english and writingskill excellent verbalskill excellent for spanish UserB has taken courseA, courseF, courseG and courseH and has writingskill fluent verbalskill fluent for english and writingskill good verbalskill good for italian Unless the index is becoming huge, I feel that it is better to flatten everything out rather than combine fields, and post-process the results. Regards, Gora
Re: Solr 4.0, slow opening searchers
Hi Marcel, Are you committing data with hard commits or soft commits? I've seen systems where we've inadvertently only used soft commits, which means that the entire transaction log has to be re-read on startup, which can take a long time. Hard commits flush indexed data to disk, and make it a lot quicker to restart. Alan Woodward a...@flax.co.uk On 11 Jan 2013, at 13:51, Marcel Bremer wrote: Hi, We're experiencing slow startup times of searchers in Solr when containing a large number of documents. We use Solr v4.0 with Jetty and currently have 267.657.634 documents stored, spread across 9 cores. These documents contain keywords, with additional statistics, which we are using for suggestions and related keywords. When we (re)start Solr on one of our servers it can take up to two hours before Solr has opened all of its searchers and starts accepting connections again. We can't figure out why it takes so long to open those searchers. Also the CPU and memory usage of Solr while opening searchers is not extremely high. Are there any known issues or tips someone could give us to speed up opening searchers? If you need more details, please ping me. Best regards, Marcel Bremer Vinden.nl BV
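If the cause is indeed a large transaction log being replayed at startup, the relevant solrconfig.xml fragment looks roughly like the sketch below: a hard autoCommit with openSearcher=false flushes segments to disk regularly without publishing a new searcher, which bounds how much of the log must be replayed. The thresholds are illustrative, not recommendations:

```
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- Hard commit: flush to disk, but don't open a new searcher -->
  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime> <!-- ms, i.e. at least every 5 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Soft commits (or a short autoSoftCommit) can still be used on top of this for search visibility; the hard commit only controls durability and restart time.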
how to perform a delta-import when related table is updated
My delta-import (http://localhost:8983/solr/freemedia/dataimport?command=delta-import) does not correctly update my solr fields. Please see my data-config here:

<entity name="freemedia"
        query="select * from freemedia WHERE categoryid0"
        deltaImportQuery="select * from freemedia WHERE updatedate < getdate() AND id='${dataimporter.delta.id}' AND categoryid0"
        deltaQuery="select id from freemedia where updatedate > '${dataimporter.last_index_time}' AND categoryid0">
  <entity name="lovecount" query="select COUNT(id) as likes FROM freemedialikes WHERE freemediaid=${freemedia.id}"/>
</entity>

Now when a new item is inserted into [freemedialikes] and I perform a delta-import, the Solr index does not show the new total amount of likes. Only after I perform a full-import (http://localhost:8983/solr/freemedia/dataimport?command=full-import) is the correct number shown. So the SQL is returning the correct results; I just don't know how to get the updated likes count via the delta-import. I have reloaded the data-config every time I made a change. -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-perform-a-delta-import-when-related-table-is-updated-tp4032587.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Setting up new SolrCloud - need some guidance
On 1/11/2013 9:15 AM, Markus Jelsma wrote: FYI: XInclude works fine. We have all request handlers in solrconfig in separate files and include them via XInclude on a running SolrCloud cluster. Good to know. I'm still deciding whether I want to recombine or continue to use XInclude. Is the XInclude path relative to solrconfig.xml just as it is now, so I could link to include/indexConfig.xml? Are things partitioned well enough that one collection's config will not overlap into another config when using XInclude and relative paths? The way I do things now, all files in cores/corename/conf (relative to solr.home) are symlinks, such as solrconfig.xml -> ../../../config/X/solrconfig.xml, where X is a general designation for a type of config. I have good separation between instanceDir, data, and real config files. The paths in the XInclude elements are relative to the location of the symlink. Thanks, Shawn
RE: how to perform a delta-import when related table is updated
Peter, See http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command , then scroll down to where it says "The deltaQuery in the above example only detects changes in item but not in other tables...". It shows you two ways to do it. Option 1: add a reference to the last_modified_date (or whatever) from the child table in a where-in clause in the parent entity's deltaQuery. Option 2: add a parentDeltaQuery on the child entity. This is a query that tells DIH which parent-table keys need to update because of child-table updates. In other words, say your child's deltaQuery says that child_id=1 changed. You might have for parentDeltaQuery something like: SELECT ID FROM PARENT P WHERE P.CHILD_ID=${Child.ID}. While this can simplify things for you and save you from needing giant where-in clauses on the parent query, it will double the number of queries that get issued to determine which documents to update. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: PeterKerk [mailto:vettepa...@hotmail.com] Sent: Friday, January 11, 2013 12:02 PM To: solr-user@lucene.apache.org Subject: how to perform a delta-import when related table is updated My delta-import (http://localhost:8983/solr/freemedia/dataimport?command=delta-import) does not correctly update my Solr fields. Please see my data-config here:

<entity name="freemedia"
        query="select * from freemedia WHERE categoryid &lt;&gt; 0"
        deltaImportQuery="select * from freemedia WHERE updatedate &lt; getdate() AND id='${dataimporter.delta.id}' AND categoryid &lt;&gt; 0"
        deltaQuery="select id from freemedia where updatedate &gt; '${dataimporter.last_index_time}' AND categoryid &lt;&gt; 0">
  <entity name="lovecount" query="select COUNT(id) as likes FROM freemedialikes WHERE freemediaid=${freemedia.id}"/>
</entity>

Now when a new item is inserted into [freemedialikes] and I perform a delta-import, the Solr index does not show the new total number of likes. Only after I perform a full-import (http://localhost:8983/solr/freemedia/dataimport?command=full-import) is the correct number shown. So the SQL is returning the correct results, I just don't know how to get the updated likes count via the delta-import. I have reloaded the data-config every time I made a change. -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-perform-a-delta-import-when-related-table-is-updated-tp4032587.html Sent from the Solr - User mailing list archive at Nabble.com.
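Applied to Peter's tables, Option 2 might look roughly like this (a sketch, untested; note that the child's deltaQuery must also select the freemediaid column that the parentDeltaQuery references):

```xml
<entity name="freemedia" pk="id"
        query="select * from freemedia WHERE categoryid &lt;&gt; 0"
        deltaImportQuery="select * from freemedia WHERE id='${dataimporter.delta.id}' AND categoryid &lt;&gt; 0"
        deltaQuery="select id from freemedia where updatedate &gt; '${dataimporter.last_index_time}' AND categoryid &lt;&gt; 0">
  <!-- Child entity: its deltaQuery detects changed likes, and
       parentDeltaQuery maps each changed like back to a parent key. -->
  <entity name="lovecount"
          query="select COUNT(id) as likes FROM freemedialikes WHERE freemediaid=${freemedia.id}"
          deltaQuery="select id, freemediaid from freemedialikes where createdate &gt; '${dataimporter.last_index_time}'"
          parentDeltaQuery="select id from freemedia where id=${lovecount.freemediaid}"/>
</entity>
```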
RE: how to perform a delta-import when related table is updated
Hi James, Ok, so I did this:

<entity name="freemedia"
        query="select * from freemedia WHERE categoryid &lt;&gt; 0"
        deltaImportQuery="select * from freemedia WHERE updatedate &lt; getdate() AND id='${dataimporter.delta.id}' AND categoryid &lt;&gt; 0"
        deltaQuery="select id from freemedia where id in (select freemediaid as id from freemedialikes where createdate &gt; '${dih.last_index_time}') or updatedate &gt; '${dataimporter.last_index_time}' AND categoryid &lt;&gt; 0">

I now get this error in the logfile: SEVERE: Delta Import Failed java.lang.IllegalArgumentException: deltaQuery has no column to resolve to declared primary key pk='ID' Now, my table looks like this:

CREATE TABLE [dbo].[freemedialikes](
  [id] [int] IDENTITY(1,1) NOT NULL,
  [userid] [nvarchar](50) NOT NULL,
  [freemediaid] [int] NOT NULL,
  [createdate] [datetime] NOT NULL,
  CONSTRAINT [PK_freemedialikes] PRIMARY KEY CLUSTERED ([id] ASC)
  WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[freemedialikes] WITH CHECK ADD CONSTRAINT [FK_freemedialikes_freemedia] FOREIGN KEY([freemediaid]) REFERENCES [dbo].[freemedia] ([id]) ON DELETE CASCADE
GO
ALTER TABLE [dbo].[freemedialikes] CHECK CONSTRAINT [FK_freemedialikes_freemedia]
GO
ALTER TABLE [dbo].[freemedialikes] ADD CONSTRAINT [DF_freemedialikes_createdate] DEFAULT (getdate()) FOR [createdate]
GO

So in the deltaQuery I thought I had to reference the freemediaid, like so: select freemediaid as id from freemedialikes Got the same error as above. So then I thought, since there was mention of a PK in the error, I'd just reference the PK of the child table; it didn't make sense, but hey :) select id from freemedialikes But I got the same error again. Any suggestions? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-perform-a-delta-import-when-related-table-is-updated-tp4032587p4032608.html Sent from the Solr - User mailing list archive at Nabble.com.
Accessing raw index data
Hi, I have just set up my first Solr 4.0 instance and have added about one million documents. I would like to access the raw data stored in the index. Can somebody give me a starting point how to do that? As a first step, a simple dump would be absolutely ok. I just want to play around and do some static offline analysis. In the long term, I probably would like to implement custom search components to enrich my search results. So if there's no export for raw data, I would be happy to learn how to implement custom handlers and/or search components. Some guidance on where to start would be much appreciated. kind regards, Achim
Re: Accessing raw index data
On 12 January 2013 01:06, Achim Domma do...@procoders.net wrote: Hi, I have just set up my first Solr 4.0 instance and have added about one million documents. I would like to access the raw data stored in the index. Can somebody give me a starting point how to do that? As a first step, a simple dump would be absolutely ok. I just want to play around and do some static offline analysis. In the long term, I probably would like to implement custom search components to enrich my search results. So if there's no export for raw data, I would be happy to learn how to implement custom handlers and/or search components. Some guidance on where to start would be much appreciated. It is not clear what you mean by raw data, and what level of customisation you are after. Here are two possibilities: * At the base, Solr indexes are Lucene indexes, so one can always drop down to that level. * Also, Solr allows plugins for various components. This link might be of help, depending on the extent of customisation you are after: http://wiki.apache.org/solr/SolrPlugins Maybe you should approach this from the other end: If you could describe what you are trying to achieve, people might be able to offer possibilities. Regards, Gora
Re: Accessing raw index data
At the base, Solr indexes are Lucene indexes, so one can always drop down to that level. That's what I'm looking for. I understand that, at the end, there has to be an inverted index (or rather multiple of them), holding all words which occur in my documents, each word having a list of documents the word was part of. I would like to do some statistics based on this information, would like to analyze how it changes if I change my text processing settings, ... If you would give me a starting point like "Data is stored in Lucene indexes, which are documented at XXX. In a request handler you can access the indexes via YYY.", I would be perfectly happy figuring out the rest on my own. Documentation about 4.0 is a bit limited, so it's hard to find an entry point. cheers, Achim On 11.01.2013 at 20:54, Gora Mohanty wrote: On 12 January 2013 01:06, Achim Domma do...@procoders.net wrote: Hi, I have just set up my first Solr 4.0 instance and have added about one million documents. I would like to access the raw data stored in the index. Can somebody give me a starting point how to do that? As a first step, a simple dump would be absolutely ok. I just want to play around and do some static offline analysis. In the long term, I probably would like to implement custom search components to enrich my search results. So if there's no export for raw data, I would be happy to learn how to implement custom handlers and/or search components. Some guidance on where to start would be much appreciated. It is not clear what you mean by raw data, and what level of customisation you are after. Here are two possibilities: * At the base, Solr indexes are Lucene indexes, so one can always drop down to that level. * Also, Solr allows plugins for various components.
This link might be of help, depending on the extent of customisation you are after: http://wiki.apache.org/solr/SolrPlugins Maybe you should approach this from the other end: If you could describe what you are trying to achieve, people might be able to offer possibilities. Regards, Gora
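As a concrete entry point for dropping down to the Lucene level: the index lives in the core's data/index directory, and the Lucene 4.0 API can read it directly. A minimal sketch (assumes lucene-core on the classpath; the index path and field name are illustrative, and it's safest to read a copy, or an index Solr is not currently writing to):

```java
import java.io.File;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class DumpTerms {
  public static void main(String[] args) throws Exception {
    // Open the core's index directory read-only.
    DirectoryReader reader = DirectoryReader.open(
        FSDirectory.open(new File("/path/to/solr/data/index")));
    // Merged view of one field's terms across all segments.
    Terms terms = MultiFields.getTerms(reader, "text");
    if (terms != null) {
      TermsEnum te = terms.iterator(null);
      BytesRef term;
      while ((term = te.next()) != null) {
        // Term text and how many documents contain it.
        System.out.println(term.utf8ToString() + "\t" + te.docFreq());
      }
    }
    reader.close();
  }
}
```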
Re: Accessing raw index data
On 12 January 2013 02:03, Achim Domma do...@procoders.net wrote: At the base, Solr indexes are Lucene indexes, so one can always drop down to that level. That's what I'm looking for. I understand that, at the end, there has to be an inverted index (or rather multiple of them), holding all words which occur in my documents, each word having a list of documents the word was part of. I would like to do some statistics based on this information, would like to analyze how it changes if I change my text processing settings, ... If you would give me a starting point like "Data is stored in Lucene indexes, which are documented at XXX. In a request handler you can access the indexes via YYY.", I would be perfectly happy figuring out the rest on my own. Documentation about 4.0 is a bit limited, so it's hard to find an entry point. Sadly, you have hit the limits of my knowledge: we have not yet had the need to delve into the details of Lucene indexes, but I am sure that others can fill in. Regards, Gora
Re: Accessing raw index data
Have you looked at the Solr admin interface in detail? Specifically, the analysis section under each core. It provides some of the statistics you seem to want. And it gives you the source code to look at to understand how to create your own version of that. Specifically, the Luke package is what you might be looking for. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Jan 11, 2013 at 3:33 PM, Achim Domma do...@procoders.net wrote: At the base, Solr indexes are Lucene indexes, so one can always drop down to that level. That's what I'm looking for. I understand that, at the end, there has to be an inverted index (or rather multiple of them), holding all words which occur in my documents, each word having a list of documents the word was part of. I would like to do some statistics based on this information, would like to analyze how it changes if I change my text processing settings, ... If you would give me a starting point like "Data is stored in Lucene indexes, which are documented at XXX. In a request handler you can access the indexes via YYY.", I would be perfectly happy figuring out the rest on my own. Documentation about 4.0 is a bit limited, so it's hard to find an entry point. cheers, Achim On 11.01.2013 at 20:54, Gora Mohanty wrote: On 12 January 2013 01:06, Achim Domma do...@procoders.net wrote: Hi, I have just set up my first Solr 4.0 instance and have added about one million documents. I would like to access the raw data stored in the index. Can somebody give me a starting point how to do that? As a first step, a simple dump would be absolutely ok. I just want to play around and do some static offline analysis. In the long term, I probably would like to implement custom search components to enrich my search results.
So if there's no export for raw data, I would be happy to learn how to implement custom handlers and/or search components. Some guidance where to start would be very appreciated. It is not clear what you mean by raw data, and what level of customisation you are after. Here are two possibilities: * At the base, Solr indexes are Lucene indexes, so one can always drop down to that level. * Also, Solr allows plugins for various components. This link might be of help, depending on the extent of customisation you are after: http://wiki.apache.org/solr/SolrPlugins Maybe you should approach this from the other end: If you could describe what you are trying to achieve, people might be able to offer possibilities. Regards, Gora
SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr
I have a somewhat strange use case. When I index a PDF to Solr I use ContentStreamUpdateRequest. The Lucene document then contains in the text field all contained items (the parsed items of the physical PDF). I also need to add these parsed items to another Lucene document. Is there a way to receive/parse these items just in memory, without committing them to Lucene? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SloppyPhraseScorer behavior change
Moreover, just checked: autoGeneratePhraseQueries=true is set for both 3.4 and 4.0 in my schema. Thanks Varun On Fri, Jan 11, 2013 at 1:04 PM, varun srivastava varunmail...@gmail.com wrote: Hi Jack, Is this a new change done in Solr 4.0? It seems the autoGeneratePhraseQueries option has been present since Solr 3.1. Just wanted to confirm this is the difference causing the change in behavior between 3.4 and 4.0. Thanks Varun On Mon, Dec 24, 2012 at 3:00 PM, Jack Krupansky j...@basetechnology.com wrote: Thanks. A sloppy phrase requires that the query terms be in a phrase, but you don't have any quotes in your query. Depending on your schema field type, you may be running into a change in how auto-generated phrase queries are handled. It used to be that apple0ipad would always be treated as the quoted phrase "apple 0 ipad", but now that is only true if your field type has autoGeneratePhraseQueries="true" set. Now, if you don't have that option set, the term gets treated as (apple OR 0 OR ipad), which is a lot looser than the exact phrase. Look at the new example schema for the text_en_splitting field type as an example. -- Jack Krupansky -----Original Message----- From: varun srivastava Sent: Monday, December 24, 2012 5:49 PM To: solr-user@lucene.apache.org Subject: Re: SloppyPhraseScorer behavior change Hi Jack, My query was simply /solr/select?query=ipad apple apple0ipad and the doc contained "apple ipad". If you look at the patch attached to bug 3215, you will find the following comment. I want to confirm whether the behaviour I am observing is in sync with what the patch developer intended, or whether it is just a regression bug. In Solr 3.4 phrase order is honored, whereas in Solr 4.0 phrase order is not honored, i.e. "apple ipad" and "ipad apple" are both treated the same.

/**
 * Score a candidate doc for all slop-valid position-combinations (matches)
 * encountered while traversing/hopping the PhrasePositions.
 * The score contribution of a match depends on the distance:
 * - highest score for distance=0 (exact match).
 * - score gets lower as distance gets higher.
 * Example: for query "a b"~2, a document "x a b a y" can be scored twice:
 * once for "a b" (distance=0), and once for "b a" (distance=2).
 * Possibly not all valid combinations are encountered, because for efficiency
 * we always propagate the least PhrasePosition. This allows to base on
 * PriorityQueue and move forward faster.
 * As result, for example, document "a b c b a"
 * would score differently for queries "a b c"~4 and "c b a"~4, although
 * they really are equivalent.
 * Similarly, for doc "a b c b a f g", query "c b"~2
 * would get same score as "g f"~2, although "c b"~2 could be matched twice.
 * We may want to fix this in the future (currently not, for performance reasons).
 */

On Mon, Dec 24, 2012 at 1:21 PM, Jack Krupansky j...@basetechnology.com wrote: Could you post the full query URL, so we can see exactly what your query was? Or, post the output of debug=query, which will show us what Lucene query was generated. -- Jack Krupansky -----Original Message----- From: varun srivastava Sent: Monday, December 24, 2012 1:53 PM To: solr-user@lucene.apache.org Subject: SloppyPhraseScorer behavior change Hi, Due to the following bug fix, https://issues.apache.org/jira/browse/LUCENE-3215, I am observing a change in the behavior of SloppyPhraseScorer. I just wanted to confirm my understanding with you all. After Solr 3.5 (the bug is fixed in 3.5), if there is a document "a b c d e", then in Solr 3.4 only the query "a b" will match the document, but from Solr 3.5 onwards, both "a b" and "b a" will match. Is that right? Thanks Varun
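For reference, the attribute Jack mentions is set on the field type in schema.xml. A minimal sketch, trimmed from the stock Solr 4.0 example schema (the analyzer chain shown is abbreviated and illustrative):

```xml
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits apple0ipad into apple / 0 / ipad; with
         autoGeneratePhraseQueries="true" those parts are queried
         as a phrase rather than OR'ed together. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```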
Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr
If I understand it, you are sending the file to Solr, which then uses the Tika library to do the preprocessing/extraction and stores the results in the defined fields. If you don't want Solr to do the storing and want to change the extracted fields, just use the Tika library in your client and work with the returned document yourself. This is less of a network load as well, as you don't send the whole file over the wire. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Jan 11, 2013 at 3:55 PM, uwe72 uwe.clem...@exxcellent.de wrote: I have a somewhat strange use case. When I index a PDF to Solr I use ContentStreamUpdateRequest. The Lucene document then contains in the text field all contained items (the parsed items of the physical PDF). I also need to add these parsed items to another Lucene document. Is there a way to receive/parse these items just in memory, without committing them to Lucene? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr
Yes, I don't really want to index/store the PDF document in Lucene. I just need the parsed tokens for other things. So you mean I can use ExtractingRequestHandler.java to retrieve the items? Does anybody have a piece of code doing that? Basically, I give the PDF as input and want the parsed items back (the same as what would be in the text field in the stored Lucene doc). -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636p4032646.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr
OK, seems this works (using org.apache.tika.Tika, with file being a java.io.File): Tika tika = new Tika(); String tokens = tika.parseToString(file); -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636p4032649.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Accessing raw index data
On 1/11/2013 1:33 PM, Achim Domma wrote: At the base, Solr indexes are Lucene indexes, so one can always drop down to that level. That's what I'm looking for. I understand that, at the end, there has to be an inverted index (or rather multiple of them), holding all words which occur in my documents, each word having a list of documents the word was part of. I would like to do some statistics based on this information, would like to analyze how it changes if I change my text processing settings, ... If you would give me a starting point like "Data is stored in Lucene indexes, which are documented at XXX. In a request handler you can access the indexes via YYY.", I would be perfectly happy figuring out the rest on my own. Documentation about 4.0 is a bit limited, so it's hard to find an entry point. There is the TermsComponent, which can be utilized in a terms requestHandler. The example solrconfig.xml found in all downloaded copies of Solr has a /terms request handler. http://wiki.apache.org/solr/TermsComponent As you've already been told, there is a tool called Luke, but a version that works with Solr 4.0.0 is hard to find. The official download location only has a 4.0.0-ALPHA version, and there have been reported problems using it with indexes from the final Solr 4.0.0. Thanks, Shawn
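To illustrate, the stock /terms handler can be queried over HTTP roughly like this (core name, field, and limit are illustrative):

```
http://localhost:8983/solr/collection1/terms?terms.fl=text&terms.limit=20&terms.sort=count
```

This returns the top indexed terms in the given field with their document frequencies, which covers the "dump the inverted index" part of the question without writing any code.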
Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr
Look at the extractOnly parameter. But doing this in your client is the more recommended way, to keep Solr from getting beat up too badly. Erik On Jan 11, 2013, at 15:55, uwe72 uwe.clem...@exxcellent.de wrote: I have a somewhat strange use case. When I index a PDF to Solr I use ContentStreamUpdateRequest. The Lucene document then contains in the text field all contained items (the parsed items of the physical PDF). I also need to add these parsed items to another Lucene document. Is there a way to receive/parse these items just in memory, without committing them to Lucene? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr
Erik, what do you mean by this parameter? I can't find it. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636p4032656.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr
It's an ExtractingRequestHandler parameter (see the wiki). Not quite sure of the Java incantation to set it, but it's definitely possible. Erik On Jan 11, 2013, at 17:14, uwe72 uwe.clem...@exxcellent.de wrote: Erik, what do you mean by this parameter? I can't find it. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636p4032656.html Sent from the Solr - User mailing list archive at Nabble.com.
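For what it's worth, a rough SolrJ sketch of that incantation (untested; assumes a Solr 4.x SolrJ client and the stock /update/extract handler, and the URL and file path are illustrative):

```java
import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;

public class ExtractOnlyExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest req =
        new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("document.pdf"), "application/pdf");
    req.setParam("extractOnly", "true"); // parse with Tika, but do not index
    // The extracted content comes back in the response
    // instead of being committed to the index.
    NamedList<Object> result = server.request(req);
    System.out.println(result);
  }
}
```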
RE: how to perform a delta-import when related table is updated
Awesome! This one line did the trick: <entity name="freemedia" pk="id" query="select * from freemedia WHERE categoryid &lt;&gt; 0" Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-perform-a-delta-import-when-related-table-is-updated-tp4032587p4032671.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: retrieving latest document **only**
Cool… it worked… But the count of all the groups and the count inside the stats component do not match… Is that a bug? ./zahoor On 11-Jan-2013, at 6:48 PM, Upayavira u...@odoko.co.uk wrote: could you use field collapsing? Boost by date and only show one value per group, and you'll have the most recent document only. Upayavira On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote: one crude way is to first query and pick the latest date from the result, then issue a query with q=timestamp:[latestDate TO latestDate] But I don't want to execute two queries... ./zahoor On 11-Jan-2013, at 6:37 PM, jmozah jmo...@gmail.com wrote: What do you want? 'the most recent ones' or '**only** the latest'? Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs. Uwe I need **only** the latest documents... in the above query, refdate can vary based on the query. ./zahoor
Re: retrieving latest document **only**
Not sure exactly what you mean, can you give an example? Upayavira On Sat, Jan 12, 2013, at 06:32 AM, J Mohamed Zahoor wrote: Cool… it worked… But the count of all the groups and the count inside the stats component do not match… Is that a bug? ./zahoor On 11-Jan-2013, at 6:48 PM, Upayavira u...@odoko.co.uk wrote: could you use field collapsing? Boost by date and only show one value per group, and you'll have the most recent document only. Upayavira On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote: one crude way is to first query and pick the latest date from the result, then issue a query with q=timestamp:[latestDate TO latestDate] But I don't want to execute two queries... ./zahoor On 11-Jan-2013, at 6:37 PM, jmozah jmo...@gmail.com wrote: What do you want? 'the most recent ones' or '**only** the latest'? Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs. Uwe I need **only** the latest documents... in the above query, refdate can vary based on the query. ./zahoor
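A sketch of the grouped request Upayavira describes (field names are illustrative):

```
q=*:*&group=true&group.field=itemId&group.limit=1&sort=timestamp desc&stats=true&stats.field=price
```

As far as I know, the StatsComponent in Solr 4.0 computes over all documents matching q, not over the one-document-per-group collapsed set, which would explain the mismatch between the group counts and the stats counts.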