Re: SolrCloud removing shard (how to not loose data)

2013-01-11 Thread mizayah
Mark, I know I still have access to the data and I can wake up the shard again. What I want to do is: I have 3 shards on 3 nodes, one on each. Now I discover that I don't need 3 nodes and I want only 2. So I want to remove a shard and put its data onto those that are left. Is there a way to index that data

Re: retrieving latest document **only**

2013-01-11 Thread Uwe Reh
On 10.01.2013 at 11:54, jmozah wrote: I need a query that matches only the most recent ones... Because my stats depend on it.. But I have a requirement to show **only** the latest documents and the stats along with it.. What do you want? 'the most recent ones' or '**only** the latest'?

Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Per Steffensen
Hi, I read http://wiki.apache.org/solr/SolrSecurity and know a lot about webcontainer authentication and authorization. I'm sure I will be able to set it up so that each solr-node will require HTTP authentication for (selected) incoming requests. But solr-nodes also make requests among

Re: Auto completion

2013-01-11 Thread anurag.jain
In solrconfig.xml: <str name="defType">edismax</str> <str name="qf">text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0 branch_name^1.1 hq_passout_year^1.4 course_type^10.0 institute_name^5.0 qualification_type^5.0 mail^2.0 state_name^1.0</str>

which way for export

2013-01-11 Thread stockii
Hello. Which is the best/fastest way to get the values of many fields from the index? My problem is that I need to calculate a sum of amounts. This amount is in my index (stored=true). My PHP script gets all values with paging, but if a request takes too long, jetty kills this export process.

RE: Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Markus Jelsma
Hi, if your credentials are fixed I would configure username:password in your request handler's shardHandlerFactory configuration section and then modify HttpShardHandlerFactory.init() to create an HttpClient with an AuthScope configured with those settings. I don't think you can obtain the

Re: retrieving latest document **only**

2013-01-11 Thread jmozah
What do you want? 'the most recent ones' or '**only** the latest'? Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs. Uwe I need **only** the latest documents... in the above query, refdate can vary based on the query. ./zahoor

Re: retrieving latest document **only**

2013-01-11 Thread jmozah
One crude way is to first query and pick the latest date from the result, then issue a query with q=timestamp:[latestDate TO latestDate]. But I don't want to execute two queries... ./zahoor On 11-Jan-2013, at 6:37 PM, jmozah jmo...@gmail.com wrote: What do you want? 'the most recent ones' or

Re: retrieving latest document **only**

2013-01-11 Thread Upayavira
Could you use field collapsing? Boost by date and only show one value per group, and you'll have the most recent document only. Upayavira On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote: one crude way is first query and pick the latest date from the result then issue a query with
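Upayavira's field-collapsing suggestion maps onto Solr's result grouping parameters. A sketch of such a request (the `itemid` group field and `timestamp` field names are assumptions for illustration, not taken from the thread):

```
q=*:*&group=true&group.field=itemid&group.sort=timestamp desc&group.limit=1
```

With group.limit=1 and a descending sort on the date field, each group returns only its most recent document.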

configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hi! I'm quite new to Solr and trying to understand how to create a schema from our postgres database and then search the content in Solr instead of querying the db. My question should be really easy, it has most likely been asked many times, but still I'm not able to google any answer

Re: SolrCloud removing shard (how to not loose data)

2013-01-11 Thread mizayah
Seems I'm too lazy. I found this http://wiki.apache.org/solr/MergingSolrIndexes, and it really works. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-removing-shard-how-to-not-loose-data-tp4032138p4032508.html Sent from the Solr - User mailing list archive at
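For reference, the CoreAdmin merge described on that wiki page can be invoked over HTTP roughly like this (a sketch; the core name and index path are placeholders for your own setup):

```
http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/path/to/removed/shard/data/index
```

The target core (core0 here) absorbs the Lucene index directory of the shard being retired; a commit on the target core afterwards makes the merged documents visible.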

Solr 4.0, slow opening searchers

2013-01-11 Thread Marcel Bremer
Hi, we're experiencing slow startup times of searchers in Solr when they contain a large number of documents. We use Solr v4.0 with Jetty and currently have 267,657,634 documents stored, spread across 9 cores. These documents contain keywords, with additional statistics, which we are using for

Re: Index data from multiple tables into Solr

2013-01-11 Thread Dariusz Borowski
Hi! I know the pain! ;) That's why I wrote a bit on a blog, so I could remember it in the future. Here is the link in case you would like to read a tutorial on how to set up Solr with multicore and hook it up to the database: http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/ I

Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
Hi Niklas, Maybe this link helps: http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/ D. On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig niklas.lang...@globesoft.com wrote: Hi! I'm quite new to solr and trying to understand how to create a schema from how our

SV: configuring schema to match database

2013-01-11 Thread Niklas Langvig
When thinking some more: perhaps I could have coursename and such as multivalued? Or should I have separate indices for users, courses and languages? I get the feeling both would work, but I'm not sure which way is best to go. When a user is updating/removing/adding a course it would be nice to

SV: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hmm, I noticed I wrote that I have 3 columns: users, courses and languages. I of course mean I have 3 tables: users, courses and languages. /Niklas -Original message- From: Niklas Langvig [mailto:niklas.lang...@globesoft.com] Sent: 11 January 2013 14:19 To:

SV: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hi Dariusz, to me this example has one table, user, while I have many tables that connect to one user, and that is what I'm unsure how to do. /Niklas -Original message- From: Dariusz Borowski [mailto:darius...@gmail.com] Sent: 11 January 2013 14:56 To:

Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
Hi, no, it actually has two tables, User and Item. The example shown on the blog is for one table, because you repeat the same thing for the other table. Only your data-import.xml file changes. For the rest, just copy and paste it into the conf directory. If you are running your Solr on Linux, then

Re: Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Per Steffensen
Hmmm, it will not work for me. I want the original credentials forwarded in the sub-requests. The credentials are mapped to permissions (authorization), and basically I don't want a user to be able to have something done in the (automatically performed by the contacted solr-node) sub-requests that

Re: Reading properties in data-import.xml

2013-01-11 Thread Dariusz Borowski
Thanks Alex! This brought me to the solution I wanted to achieve. :) D. On Thu, Jan 10, 2013 at 3:21 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: dataimport.properties is for DIH to store its own properties for delta processing and things. Try solrcore.properties instead, as per

SV: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Ahh sorry, now I understand. OK, seems like a good solution; I just now need to understand how to query multiple cores :) -Original message- From: Dariusz Borowski [mailto:darius...@gmail.com] Sent: 11 January 2013 15:15 To: solr-user@lucene.apache.org Subject: Re:

Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
I don't know how to query multiple cores, or whether it's possible at once, but otherwise I would create a JOIN sql script if you need values from multiple tables. D. On Fri, Jan 11, 2013 at 3:27 PM, Niklas Langvig niklas.lang...@globesoft.com wrote: Ahh sorry, Now I understand, Ok seems like

Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 19:57, Niklas Langvig niklas.lang...@globesoft.com wrote: Ahh sorry, Now I understand, Ok seems like a good solution, I just know need to understand how to query multiple cores now :) There is no need to use multiple cores in your setup. Going back to your original problem

SV: configuring schema to match database

2013-01-11 Thread Niklas Langvig
It sounds good not to use more than one core; for sure I do not want to overcomplicate this. Yes, I meant tables. It's pretty simple. Both the courses and languages tables have their own primary key, courseseqno and languagesseqno respectively. Both also have a foreign key userid that references the users table

Re: Getting Files into Zookeeper

2013-01-11 Thread Mark Miller
It's a bug that you only see RuntimeException - in 4.1 you will get the real problem - which is likely around connecting to zookeeper. You might try with a single zk host in the zk host string initially. That might make it easier to track down why it won't connect. It's tough to diagnose

Re: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Mark Miller
On Jan 10, 2013, at 12:06 PM, Shawn Heisey s...@elyograg.org wrote: On 1/9/2013 8:54 PM, Mark Miller wrote: I'd put everything into one. You can upload different named sets of config files and point collections either to the same sets or different sets. You can really think about it the

RE: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Markus Jelsma
FYI: XInclude works fine. We have all request handlers in solrconfig in separate files and include them via XInclude on a running SolrCloud cluster. -Original message- From:Mark Miller markrmil...@gmail.com Sent: Fri 11-Jan-2013 17:13 To: solr-user@lucene.apache.org Subject: Re:
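The XInclude approach Markus describes might look like this inside solrconfig.xml (a sketch; the included file name is a placeholder):

```xml
<!-- inside solrconfig.xml: pull request handler definitions from a separate file -->
<xi:include href="requesthandlers.xml"
            xmlns:xi="http://www.w3.org/2001/XInclude"/>
```

The included file then contains the <requestHandler> elements that would otherwise live inline, which keeps a large solrconfig.xml manageable.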

Re: Getting Files into Zookeeper

2013-01-11 Thread Christopher Gross
I changed it to only go to one Zookeeper (localhost:2181) and it still gave me the same stack trace error. I was eventually able to get around this -- I just used the bootstrap arguments when starting up my Tomcat instances to push the configs over -- though I'd rather just do it externally from

Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 21:13, Niklas Langvig niklas.lang...@globesoft.com wrote: It sounds good not to use more than one core, for sure I do not want to over complicate this. [...] Yes, not only are multiple cores unnecessarily complicated here, your searches will also be less complex, and

How to disable\clear filterCache(from SolrIndexSearcher ) in a custom searchComponent

2013-01-11 Thread radu
Hello, thank you in advance for your help! *Context:* I have implemented a custom search component that receives 3 parameters: field, termValue and payloadX. The component should search for a termValue in the requested Lucene field and for each *termValue* check *payloadX* in its associated

RE: Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Markus Jelsma
Hmm, you need to set up the HttpClient in HttpShardHandlerFactory but you cannot access the HttpServletRequest from there; it is only available in SolrDispatchFilter AFAIK. And then, the HttpServletRequest can only return the remote user name, not the password he, she or it provided. I don't

Re: link on graph page

2013-01-11 Thread Mark Miller
They point to the admin UI - or should - that seems right? - Mark On Jan 11, 2013, at 10:57 AM, Christopher Gross cogr...@gmail.com wrote: I've managed to get my SolrCloud set up to have 2 different indexes up and running. However, my URLs aren't right. They just point to

Re: configuring schema to match database

2013-01-11 Thread Jens Grivolla
On 01/11/2013 05:23 PM, Gora Mohanty wrote: You are still thinking of Solr as a RDBMS, where you should not be. In your case, it is easiest to flatten out the data. This increases the size of the index, but that should not really be of concern. As your courses and languages tables are connected

Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 22:30, Jens Grivolla j+...@grivolla.net wrote: [...] Actually, that is what you would get when doing a join in an RDBMS, the cross-product of your tables. This is NOT AT ALL what you typically do in Solr. Best start the other way around, think of Solr as a retrieval
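A flattened (denormalized) document of the kind Gora and Jens describe might be posted like this (a sketch; all field names and values are invented for illustration, and coursename/language would be declared multiValued="true" in schema.xml):

```xml
<add>
  <doc>
    <field name="userid">42</field>
    <!-- one user document carries all of that user's courses and languages -->
    <field name="coursename">French for beginners</field>
    <field name="coursename">Business English</field>
    <field name="language">fr</field>
    <field name="language">en</field>
  </doc>
</add>
```

Searching on any course or language then returns the user document directly, with no join needed at query time.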

Re: Solr 4.0, slow opening searchers

2013-01-11 Thread Alan Woodward
Hi Marcel, Are you committing data with hard commits or soft commits? I've seen systems where we've inadvertently only used soft commits, which means that the entire transaction log has to be re-read on startup, which can take a long time. Hard commits flush indexed data to disk, and make it
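A hard autoCommit of the kind Alan suggests is configured in solrconfig.xml; a sketch (the interval is an arbitrary example value, not a recommendation):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard-commit at most every 60 seconds -->
    <openSearcher>false</openSearcher> <!-- flush segments to disk without reopening a searcher -->
  </autoCommit>
</updateHandler>
```

With openSearcher=false the hard commit truncates the transaction log without the cost of warming a new searcher, which keeps restart-time log replay short.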

how to perform a delta-import when related table is updated

2013-01-11 Thread PeterKerk
My delta-import (http://localhost:8983/solr/freemedia/dataimport?command=delta-import) does not correctly update my solr fields. Please see my data-config here: <entity name="freemedia" query="select * from freemedia WHERE categoryid0" deltaImportQuery="select * from freemedia

Re: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Shawn Heisey
On 1/11/2013 9:15 AM, Markus Jelsma wrote: FYI: XInclude works fine. We have all request handlers in solrconfig in separate files and include them via XInclude on a running SolrCloud cluster. Good to know. I'm still deciding whether I want to recombine or continue to use xinclude. Is the

RE: how to perform a delta-import when related table is updated

2013-01-11 Thread Dyer, James
Peter, See http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command , then scroll down to where it says The deltaQuery in the above example only detects changes in item but not in other tables... It shows you two ways to do it. Option 1: add a reference to the
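The parentDeltaQuery option James points to looks roughly like this for the item/feature example used on that wiki page (adapted sketch, not Peter's actual tables):

```xml
<entity name="item" pk="ID" query="select * from item"
        deltaQuery="select ID from item where last_modified > '${dataimporter.last_index_time}'">
  <entity name="feature" pk="ITEM_ID"
          query="select DESCRIPTION from feature where ITEM_ID='${item.ID}'"
          deltaQuery="select ITEM_ID from feature where last_modified > '${dataimporter.last_index_time}'"
          parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}"/>
</entity>
```

The child entity's deltaQuery detects changed feature rows, and parentDeltaQuery maps each changed row back to the parent item that must be re-imported.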

RE: how to perform a delta-import when related table is updated

2013-01-11 Thread PeterKerk
Hi James, ok, so I did this: <entity name="freemedia" query="select * from freemedia WHERE categoryid0" deltaImportQuery="select * from freemedia WHERE updatedate < getdate() AND id='${dataimporter.delta.id}' AND categoryid0" deltaQuery="select id from freemedia where id in

Accessing raw index data

2013-01-11 Thread Achim Domma
Hi, I have just set up my first Solr 4.0 instance and have added about one million documents. I would like to access the raw data stored in the index. Can somebody give me a starting point on how to do that? As a first step, a simple dump would be absolutely OK. I just want to play around and do

Re: Accessing raw index data

2013-01-11 Thread Gora Mohanty
On 12 January 2013 01:06, Achim Domma do...@procoders.net wrote: Hi, I have just setup my first Solr 4.0 instance and have added about one million documents. I would like to access the raw data stored in the index. Can somebody give me a starting point how to do that? As a first step, a

Re: Accessing raw index data

2013-01-11 Thread Achim Domma
At the base, Solr indexes are Lucene indexes, so one can always drop down to that level. That's what I'm looking for. I understand that, in the end, there has to be an inverted index (or rather multiple of them) holding all words which occur in my documents, each word having a list of

Re: Accessing raw index data

2013-01-11 Thread Gora Mohanty
On 12 January 2013 02:03, Achim Domma do...@procoders.net wrote: At the base, Solr indexes are Lucene indexes, so one can always drop down to that level. That's what I'm looking for. I understand, that at the end, there has to be an inverse index (or rather multiple of them), holding all

Re: Accessing raw index data

2013-01-11 Thread Alexandre Rafalovitch
Have you looked at Solr admin interface in details? Specifically, analysis section under each core. It provides some of the statistics you seem to want. And, gives you the source code to look at to understand how to create your own version of that. Specifically, the Luke package is what you might

SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
I have a somewhat strange use case. When I index a PDF to Solr I use ContentStreamUpdateRequest. The Lucene document then contains in the text field all contained items (the parsed items of the physical PDF). I also need to add these parsed items to another Lucene document. Is there a way to

Re: SloppyPhraseScorer behavior change

2013-01-11 Thread varun srivastava
Moreover, I just checked: autoGeneratePhraseQueries=true is set for both 3.4 and 4.0 in my schema. Thanks, Varun On Fri, Jan 11, 2013 at 1:04 PM, varun srivastava varunmail...@gmail.comwrote: Hi Jack, Is this a new change done in solr 4.0 ? Seems autoGeneratePhraseQueries option is present

Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread Alexandre Rafalovitch
If I understand it, you are sending the file to Solr, which then uses the Tika library to do the preprocessing/extraction and stores the results in the defined fields. If you don't want Solr to do the storing and want to change extracted fields, just use the Tika library in your client and work with

Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
Yes, I don't really want to index/store the PDF document in Lucene; I just need the parsed tokens for other things. So you mean I can use ExtractingRequestHandler.java to retrieve the items? Does anybody have a piece of code doing that? Actually I give the PDF as input and want the parsed items

Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
OK, it seems this works: Tika tika = new Tika(); String tokens = tika.parseToString(file); -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636p4032649.html Sent from the

Re: Accessing raw index data

2013-01-11 Thread Shawn Heisey
On 1/11/2013 1:33 PM, Achim Domma wrote: At the base, Solr indexes are Lucene indexes, so one can always drop down to that level. That's what I'm looking for. I understand, that at the end, there has to be an inverse index (or rather multiple of them), holding all words which occurre in my

Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread Erik Hatcher
Look at the extractOnly parameter. But doing this in your client is the more recommended way, to keep Solr from getting beat up too badly. Erik On Jan 11, 2013, at 15:55, uwe72 uwe.clem...@exxcellent.de wrote: i have a bit strange usecase. when i index a pdf to solr i use

Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
Erik, what do you mean with this parameter? I don't find it.. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-ContentStreamUpdateRequest-Accessing-parsed-items-without-committing-to-solr-tp4032636p4032656.html Sent from the Solr - User mailing list archive at

Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread Erik Hatcher
It's an ExtractingRequestHandler parameter (see the wiki). Not quite sure of the Java incantation to set it, but it's definitely possible. Erik On Jan 11, 2013, at 17:14, uwe72 uwe.clem...@exxcellent.de wrote: Erik, what do u mean with this parameter, i don't find it.. -- View this
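At the HTTP level, the extractOnly parameter Erik mentions is just a request parameter to the extract handler; a sketch (the file name is a placeholder):

```
curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@document.pdf"
```

With extractOnly=true Solr runs Tika over the stream and returns the extracted content in the response instead of indexing it, which matches uwe72's goal of getting the parsed items without committing a document.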

RE: how to perform a delta-import when related table is updated

2013-01-11 Thread PeterKerk
Awesome! This one line did the trick: <entity name="freemedia" pk="id" query="select * from freemedia WHERE categoryid0"> Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-perform-a-delta-import-when-related-table-is-updated-tp4032587p4032671.html Sent from the Solr

Re: retrieving latest document **only**

2013-01-11 Thread J Mohamed Zahoor
Cool… it worked… But the count of all the groups and the count inside the stats component do not match… Is that a bug? ./zahoor On 11-Jan-2013, at 6:48 PM, Upayavira u...@odoko.co.uk wrote: could you use field collapsing? Boost by date and only show one value per group, and you'll have the

Re: retrieving latest document **only**

2013-01-11 Thread Upayavira
Not sure exactly what you mean; can you give an example? Upayavira On Sat, Jan 12, 2013, at 06:32 AM, J Mohamed Zahoor wrote: Cool… it worked… But the count of all the groups and the count inside stats component does not match… Is that a bug? ./zahoor On 11-Jan-2013, at 6:48 PM,