Re: SolrCloud removing shard (how to not lose data)

2013-01-11 Thread mizayah
Mark, I know I still have access to the data and I can wake the shard up again.

What I want to do is this:


I have 3 shards on 3 nodes, one on each. Now I discover that I don't need 3
nodes and want only 2.
So I want to remove a shard and move its data to the ones that are left.

Is there a way to index that data without being forced to index it again?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-removing-shard-how-to-not-loose-data-tp4032138p4032459.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: retrieving latest document **only**

2013-01-11 Thread Uwe Reh

On 10.01.2013 at 11:54, jmozah wrote:

I need a query that matches only the most recent ones...
Because my stats depend on it..

But I have a requirement to show **only** the latest documents and the
stats along with it..


What do you want?
'the most recent ones' or '**only** the latest'?

Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs.

Uwe



Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Per Steffensen

Hi

I read http://wiki.apache.org/solr/SolrSecurity and know a lot about 
webcontainer authentication and authorization. Im sure I will be able to 
set it up so that each solr-node is will require HTTP authentication for 
(selected) incoming requests.


But solr-nodes also make requests among each other and Im in doubt if 
credentials are forwarded from the original request to the internal 
sub-requests?
E.g. lets say that each solr-node is set up to require authentication 
for search request. An outside user makes a distributed request 
including correct username/password. Since it is a distributed search, 
the node which handles the original request from the user will have to 
make sub-requests to other solr-nodes but they also require correct 
credentials in order to accept this sub-request. Are the credentials 
from the original request duplicated to the sub-requests or what options 
do I have?
Same thing goes for e.g. update requests if they are sent to a node 
which does not run (all) the replica of the shard in which the documents 
to be added/updated/deleted belong. The node needs to make sub-request 
to other nodes, and it will require forwarding the credentials.


Does this just work out of the box, or ... ?

Regards, Per Steffensen


Re: Auto completion

2013-01-11 Thread anurag.jain
in solrconfig.xml 

 
   <str name="defType">edismax</str>
   <str name="qf">
      text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0
      branch_name^1.1 hq_passout_year^1.4
      course_type^10.0 institute_name^5.0 qualification_type^5.0
      mail^2.0 state_name^1.0
   </str>
   <str name="df">text</str>
   <str name="mm">100%</str>
   <str name="q.alt">*:*</str>
   <str name="rows">10</str>
   <str name="fl">*,score</str>

   <str name="mlt.qf">
      text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0
      branch_name^1.1 hq_passout_year^1.4
      course_type^10.0 institute_name^5.0 qualification_type^5.0
      mail^2.0 state_name^1.0
   </str>
   <str name="mlt.fl">text,last_name,first_name,course_name,id,branch_name,hq_passout_year,course_type,institute_name,qualification_type,mail,state_name</str>
   <int name="mlt.count">3</int>

   <str name="facet">on</str>
   <str name="facet.field">is_top_institute</str>
   <str name="facet.field">course_name</str>

   <str name="facet.range">cgpa</str>
   <int name="f.cgpa.facet.range.start">0</int>
   <int name="f.cgpa.facet.range.end">10</int>
   <int name="f.cgpa.facet.range.gap">2</int>



and in schema.xml



   <field name="id" type="text_general" indexed="true" stored="true"
          required="true" multiValued="false" />
   <field name="first_name" type="text_general" indexed="false" stored="true" />
   <field name="last_name" type="text_general" indexed="false" stored="true" />
   <field name="institute_name" type="text_general" indexed="true" stored="true" />
...

<copyField source="first_name" dest="text" />
<copyField source="last_name" dest="text" />
<copyField source="institute_name" dest="text" />
...


So please now tell me what the JavaScript (terms.fl parameter) should be, what
goes in conf/velocity/head.vm, and also the 'name' reference in suggest.vm.

Please reply, and thanks for the previous reply! :-)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-completion-tp4032267p4032450.html
Sent from the Solr - User mailing list archive at Nabble.com.


which way for export

2013-01-11 Thread stockii
Hello.

Which is the best/fastest way to get the values of many fields from the index?

My problem is that I need to calculate a sum of amounts. This amount is in
my index (stored=true). My PHP script gets all the values with paging, but if a
request takes too long, Jetty kills the export process.

Is it better to get all the fields with wt=csv/json/xml or some other
handler?
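
For example, would something along these lines be the better approach (just a
sketch; the core name and the amount field name are placeholders)?

  http://localhost:8983/solr/mycore/select?q=*:*&fl=amount&wt=csv&rows=10000&start=0

Keeping fl down to the one stored field I need to sum should at least shrink
the responses.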



--
View this message in context: 
http://lucene.472066.n3.nabble.com/which-way-for-export-tp4032487.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Markus Jelsma
Hi,

If your credentials are fixed, I would configure username:password in your 
request handler's shardHandlerFactory configuration section and then modify 
HttpShardHandlerFactory.init() to create an HttpClient with an AuthScope 
configured with those settings.
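
Something along these lines inside HttpShardHandlerFactory.init() should do it
(an untested sketch against the HttpClient 4.1 API that ships with Solr; the
shardsUser/shardsPassword parameter names are made up for illustration):

  import org.apache.http.auth.AuthScope;
  import org.apache.http.auth.UsernamePasswordCredentials;
  import org.apache.http.impl.client.DefaultHttpClient;

  // Hypothetical values read from the shardHandlerFactory config section.
  String user = (String) info.initArgs.get("shardsUser");
  String pass = (String) info.initArgs.get("shardsPassword");

  DefaultHttpClient client = new DefaultHttpClient();
  // Present these credentials to any host/port the shard handler talks to.
  client.getCredentialsProvider().setCredentials(
      new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT),
      new UsernamePasswordCredentials(user, pass));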

I don't think you can obtain the original credentials very easily from inside 
HttpShardHandlerFactory.

Cheers 
 
-Original message-
 From:Per Steffensen st...@designware.dk
 Sent: Fri 11-Jan-2013 13:07
 To: solr-user@lucene.apache.org
 Subject: Forwarding authentication credentials in internal node-to-node 
 requests
 
 Hi
 
 I read http://wiki.apache.org/solr/SolrSecurity and know a lot about 
 web-container authentication and authorization. I'm sure I will be able to 
 set it up so that each solr-node will require HTTP authentication for 
 (selected) incoming requests.
 
 But solr-nodes also make requests among each other, and I'm in doubt whether 
 credentials are forwarded from the original request to the internal 
 sub-requests.
 E.g. let's say that each solr-node is set up to require authentication 
 for search requests. An outside user makes a distributed request 
 including a correct username/password. Since it is a distributed search, 
 the node which handles the original request from the user will have to 
 make sub-requests to other solr-nodes, but they also require correct 
 credentials in order to accept these sub-requests. Are the credentials 
 from the original request duplicated to the sub-requests, or what options 
 do I have?
 Same thing goes for e.g. update requests if they are sent to a node 
 which does not run (all) the replicas of the shard to which the documents 
 to be added/updated/deleted belong. The node needs to make sub-requests 
 to other nodes, and that will require forwarding the credentials.
 
 Does this just work out of the box, or ... ?
 
 Regards, Per Steffensen
 


Re: retrieving latest document **only**

2013-01-11 Thread jmozah



 What do you want?
 'the most recent ones' or '**only** the latest' ?
 
 Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs.
 
 Uwe
 


I need **only** the latest documents...
In the above query, refdate can vary based on the query.

./zahoor





Re: retrieving latest document **only**

2013-01-11 Thread jmozah
One crude way is to first query and pick the latest date from the result,
then issue a query with q=timestamp:[latestDate TO latestDate].

But I don't want to execute two queries...

./zahoor

On 11-Jan-2013, at 6:37 PM, jmozah jmo...@gmail.com wrote:

 
 
 
 What do you want?
 'the most recent ones' or '**only** the latest' ?
 
 Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs.
 
 Uwe
 
 
 
 I need **only** the latest documents...
 In the above query, refdate can vary based on the query.
 
 ./zahoor
 
 
 



Re: retrieving latest document **only**

2013-01-11 Thread Upayavira
Could you use field collapsing? Boost by date and only show one value
per group, and you'll have the most recent document only.
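
Something like this, say (a sketch; doc_key stands for whatever field
identifies 'the same' document, and timestamp for your date field):

  q=*:*&group=true&group.field=doc_key&group.sort=timestamp desc&group.limit=1

Each group then comes back with just its newest document.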

Upayavira

On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote:
 One crude way is to first query and pick the latest date from the result,
 then issue a query with q=timestamp:[latestDate TO latestDate].
 
 But I don't want to execute two queries...
 
 ./zahoor
 
 On 11-Jan-2013, at 6:37 PM, jmozah jmo...@gmail.com wrote:
 
  
  
  
  What do you want?
  'the most recent ones' or '**only** the latest' ?
  
  Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs.
  
  Uwe
  
  
  
  I need **only** the latest documents...
  In the above query, refdate can vary based on the query.
  
  ./zahoor
  
  
  
 


configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hi!
I'm quite new to Solr and trying to understand how to create a schema from 
our Postgres database and then search for the content in Solr instead of 
querying the db.

My question should be really easy; it has most likely been asked many times, but 
still I'm not able to google any answer to it.

To make it easy, I have 3 columns: users, courses and languages

Users has columns userid, firstname, lastname
Courses has columns coursename, startdate, enddate
Languages has columns language, writingskill, verbalskill

UserA has taken courseA, courseB and courseC, and has writingskill good / 
verbalskill good for English, and writingskill excellent / verbalskill excellent 
for Spanish.
UserB has taken courseA, courseF, courseG and courseH, and has writingskill 
fluent / verbalskill fluent for English, and writingskill good / verbalskill good 
for Italian.

I would like to put this data into Solr so I can search for all users who have 
taken courseA and are fluent in English.
Can I do that?

The problem is I'm not sure how to flatten this database into a schema.
It's easy to understand the users table, for example:
<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />

But then I'm not so sure what the schema should look like for courses and 
languages:
<field name="userid" type="string" indexed="true" />
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="string" indexed="true" />
<field name="enddate" type="string" indexed="true" />


Thanks for any help
/Niklas


Re: SolrCloud removing shard (how to not lose data)

2013-01-11 Thread mizayah
Seems I was too lazy.
I found this: http://wiki.apache.org/solr/MergingSolrIndexes, and it really
works.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-removing-shard-how-to-not-loose-data-tp4032138p4032508.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.0, slow opening searchers

2013-01-11 Thread Marcel Bremer
Hi,

We're experiencing slow startup times of searchers in Solr when it contains a 
large number of documents.

We use Solr v4.0 with Jetty and currently have 267,657,634 documents stored, 
spread across 9 cores. These documents contain keywords, with additional 
statistics, which we use for suggestions and related keywords. When we 
(re)start Solr on one of our servers it can take up to two hours before Solr 
has opened all of its searchers and starts accepting connections again. We 
can't figure out why it takes so long to open those searchers. Also, the CPU and 
memory usage of Solr while opening searchers is not extremely high.

Are there any known issues or tips someone could give us to speed up opening 
searchers?

If you need more details, please ping me.


Best regards,

Marcel Bremer
Vinden.nl BV


Re: Index data from multiple tables into Solr

2013-01-11 Thread Dariusz Borowski
Hi!

I know the pain! ;)

That's why I wrote a bit on a blog, so I could remember it in the future. Here
is the link, in case you would like to read a tutorial on how to set up Solr
with multicore and hook it up to the database:

http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

I hope it helps!
D.



On Thu, Jan 10, 2013 at 6:19 PM, hassancrowdc hassancrowdc...@gmail.com wrote:

 Hi,
 I am trying to index multiple tables in Solr. I am not sure which data
 config file needs to be changed; there are so many of them (like
 solr-data-config, db-data-config).

 Also, do I have to change the id, name and desc to the names of the columns
 in my table?

 And how do I add the solr_details field in the schema?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
Hi Niklas,

Maybe this link helps:

http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

D.



On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig 
niklas.lang...@globesoft.com wrote:

 Hi!
 I'm quite new to Solr and trying to understand how to create a schema from
 our Postgres database and then search for the content in Solr instead
 of querying the db.

 My question should be really easy; it has most likely been asked many
 times, but still I'm not able to google any answer to it.

 To make it easy, I have 3 columns: users, courses and languages

 Users has columns userid, firstname, lastname
 Courses has columns coursename, startdate, enddate
 Languages has columns language, writingskill, verbalskill

 UserA has taken courseA, courseB and courseC, and has writingskill good
 verbalskill good for English, and writingskill excellent verbalskill
 excellent for Spanish.
 UserB has taken courseA, courseF, courseG and courseH, and has writingskill
 fluent verbalskill fluent for English, and writingskill good verbalskill
 good for Italian.

 I would like to put this data into Solr so I can search for all users who
 have taken courseA and are fluent in English.
 Can I do that?

 The problem is I'm not sure how to flatten this database into a schema.
 It's easy to understand the users table, for example:
 <field name="userid" type="string" indexed="true" />
 <field name="firstname" type="string" indexed="true" />
 <field name="lastname" type="string" indexed="true" />

 But then I'm not so sure what the schema should look like for courses and
 languages:
 <field name="userid" type="string" indexed="true" />
 <field name="coursename" type="string" indexed="true" />
 <field name="startdate" type="string" indexed="true" />
 <field name="enddate" type="string" indexed="true" />


 Thanks for any help
 /Niklas



Re: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Thinking about it some more,
perhaps I could have coursename and such as multivalued fields?

Or should I have separate indices for users, courses and languages?

I get the feeling both would work, but I'm not sure which way is the best to go.

When a user is updating/removing/adding a course it would be nice not to have to 
query the database for the user's courses and languages and update everything, but 
just update the course document.
But perhaps I'm thinking too much in database terms?

But still I'm unsure what the schema should look like.

Thanks
/Niklas

-Original message-
From: Niklas Langvig [mailto:niklas.lang...@globesoft.com] 
Sent: 11 January 2013 14:19
To: solr-user@lucene.apache.org
Subject: configuring schema to match database

Hi!
I'm quite new to Solr and trying to understand how to create a schema from 
our Postgres database and then search for the content in Solr instead of 
querying the db.

My question should be really easy; it has most likely been asked many times, but 
still I'm not able to google any answer to it.

To make it easy, I have 3 columns: users, courses and languages

Users has columns userid, firstname, lastname
Courses has columns coursename, startdate, enddate
Languages has columns language, writingskill, verbalskill

UserA has taken courseA, courseB and courseC, and has writingskill good 
verbalskill good for English, and writingskill excellent verbalskill excellent 
for Spanish. UserB has taken courseA, courseF, courseG and courseH, and has 
writingskill fluent verbalskill fluent for English, and writingskill good 
verbalskill good for Italian.

I would like to put this data into Solr so I can search for all users who have 
taken courseA and are fluent in English.
Can I do that?

The problem is I'm not sure how to flatten this database into a schema.
It's easy to understand the users table, for example:
<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />

But then I'm not so sure what the schema should look like for courses and 
languages:
<field name="userid" type="string" indexed="true" />
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="string" indexed="true" />
<field name="enddate" type="string" indexed="true" />


Thanks for any help
/Niklas


Re: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hmm, I noticed I wrote "I have 3 columns: users, courses and languages".
I of course meant "I have 3 tables: users, courses and languages".

/Niklas

-Original message-
From: Niklas Langvig [mailto:niklas.lang...@globesoft.com] 
Sent: 11 January 2013 14:19
To: solr-user@lucene.apache.org
Subject: configuring schema to match database

Hi!
I'm quite new to Solr and trying to understand how to create a schema from 
our Postgres database and then search for the content in Solr instead of 
querying the db.

My question should be really easy; it has most likely been asked many times, but 
still I'm not able to google any answer to it.

To make it easy, I have 3 columns: users, courses and languages

Users has columns userid, firstname, lastname
Courses has columns coursename, startdate, enddate
Languages has columns language, writingskill, verbalskill

UserA has taken courseA, courseB and courseC, and has writingskill good 
verbalskill good for English, and writingskill excellent verbalskill excellent 
for Spanish. UserB has taken courseA, courseF, courseG and courseH, and has 
writingskill fluent verbalskill fluent for English, and writingskill good 
verbalskill good for Italian.

I would like to put this data into Solr so I can search for all users who have 
taken courseA and are fluent in English.
Can I do that?

The problem is I'm not sure how to flatten this database into a schema.
It's easy to understand the users table, for example:
<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />

But then I'm not so sure what the schema should look like for courses and 
languages:
<field name="userid" type="string" indexed="true" />
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="string" indexed="true" />
<field name="enddate" type="string" indexed="true" />


Thanks for any help
/Niklas


Re: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hi Dariusz,
To me this example has one table, user, whereas I have many tables that connect 
to one user, and that is what I'm unsure how to do.

/Niklas


-Original message-
From: Dariusz Borowski [mailto:darius...@gmail.com] 
Sent: 11 January 2013 14:56
To: solr-user@lucene.apache.org
Subject: Re: configuring schema to match database

Hi Niklas,

Maybe this link helps:

http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

D.



On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig  niklas.lang...@globesoft.com 
wrote:

 Hi!
 I'm quite new to Solr and trying to understand how to create a schema 
 from our Postgres database and then search for the content in Solr 
 instead of querying the db.

 My question should be really easy; it has most likely been asked many 
 times, but still I'm not able to google any answer to it.

 To make it easy, I have 3 columns: users, courses and languages

 Users has columns userid, firstname, lastname
 Courses has columns coursename, startdate, enddate
 Languages has columns language, writingskill, verbalskill

 UserA has taken courseA, courseB and courseC, and has writingskill good 
 verbalskill good for English, and writingskill excellent verbalskill 
 excellent for Spanish. UserB has taken courseA, courseF, courseG and 
 courseH, and has writingskill fluent verbalskill fluent for English, and 
 writingskill good verbalskill good for Italian.

 I would like to put this data into Solr so I can search for all users 
 who have taken courseA and are fluent in English.
 Can I do that?

 The problem is I'm not sure how to flatten this database into a schema. 
 It's easy to understand the users table, for example:
 <field name="userid" type="string" indexed="true" />
 <field name="firstname" type="string" indexed="true" />
 <field name="lastname" type="string" indexed="true" />

 But then I'm not so sure what the schema should look like for courses 
 and languages:
 <field name="userid" type="string" indexed="true" />
 <field name="coursename" type="string" indexed="true" />
 <field name="startdate" type="string" indexed="true" />
 <field name="enddate" type="string" indexed="true" />


 Thanks for any help
 /Niklas



Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
Hi,

No, it actually has two tables: User and Item. The example shown on the
blog is for one table, because you repeat the same thing for the other
table. Only your data-import.xml file changes; for the rest, just copy and
paste it into the conf directory. If you are running your Solr on Linux, then
you can work with symlinks.

D.



On Fri, Jan 11, 2013 at 3:12 PM, Niklas Langvig 
niklas.lang...@globesoft.com wrote:

 Hi Dariusz,
 To me this example has one table, user, whereas I have many tables that
 connect to one user, and that is what I'm unsure how to do.

 /Niklas


 -Original message-
 From: Dariusz Borowski [mailto:darius...@gmail.com]
 Sent: 11 January 2013 14:56
 To: solr-user@lucene.apache.org
 Subject: Re: configuring schema to match database

 Hi Niklas,

 Maybe this link helps:

 http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

 D.



 On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig 
 niklas.lang...@globesoft.com wrote:

  Hi!
  I'm quite new to Solr and trying to understand how to create a schema
  from our Postgres database and then search for the content in Solr
  instead of querying the db.
 
  My question should be really easy; it has most likely been asked many
  times, but still I'm not able to google any answer to it.
 
  To make it easy, I have 3 columns: users, courses and languages
 
  Users has columns userid, firstname, lastname
  Courses has columns coursename, startdate, enddate
  Languages has columns language, writingskill, verbalskill
 
  UserA has taken courseA, courseB and courseC, and has writingskill good
  verbalskill good for English, and writingskill excellent verbalskill
  excellent for Spanish. UserB has taken courseA, courseF, courseG and
  courseH, and has writingskill fluent verbalskill fluent for English, and
  writingskill good verbalskill good for Italian.
 
  I would like to put this data into Solr so I can search for all users
  who have taken courseA and are fluent in English.
  Can I do that?
 
  The problem is I'm not sure how to flatten this database into a schema.
  It's easy to understand the users table, for example:
  <field name="userid" type="string" indexed="true" />
  <field name="firstname" type="string" indexed="true" />
  <field name="lastname" type="string" indexed="true" />
 
  But then I'm not so sure what the schema should look like for courses
  and languages:
  <field name="userid" type="string" indexed="true" />
  <field name="coursename" type="string" indexed="true" />
  <field name="startdate" type="string" indexed="true" />
  <field name="enddate" type="string" indexed="true" />
 
 
  Thanks for any help
  /Niklas
 



Re: Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Per Steffensen
Hmmm, it will not work for me. I want the original credentials 
forwarded in the sub-requests. The credentials are mapped to permissions 
(authorization), and basically I don't want a user to be able to have 
something done in the (automatically performed by the contacted 
solr-node) sub-requests that he is not authorized to do. Forwarding the 
credentials is a must. So what you are saying is that I should expect to 
have to make some modifications to Solr in order to achieve what I want?


Regards, Per Steffensen

On 1/11/13 2:11 PM, Markus Jelsma wrote:

Hi,

If your credentials are fixed, I would configure username:password in your 
request handler's shardHandlerFactory configuration section and then modify 
HttpShardHandlerFactory.init() to create an HttpClient with an AuthScope 
configured with those settings.

I don't think you can obtain the original credentials very easily from inside 
HttpShardHandlerFactory.

Cheers
  
-Original message-

From:Per Steffensen st...@designware.dk
Sent: Fri 11-Jan-2013 13:07
To: solr-user@lucene.apache.org
Subject: Forwarding authentication credentials in internal node-to-node requests

Hi

I read http://wiki.apache.org/solr/SolrSecurity and know a lot about
web-container authentication and authorization. I'm sure I will be able to
set it up so that each solr-node will require HTTP authentication for
(selected) incoming requests.

But solr-nodes also make requests among each other, and I'm in doubt whether
credentials are forwarded from the original request to the internal
sub-requests.
E.g. let's say that each solr-node is set up to require authentication
for search requests. An outside user makes a distributed request
including a correct username/password. Since it is a distributed search,
the node which handles the original request from the user will have to
make sub-requests to other solr-nodes, but they also require correct
credentials in order to accept these sub-requests. Are the credentials
from the original request duplicated to the sub-requests, or what options
do I have?
Same thing goes for e.g. update requests if they are sent to a node
which does not run (all) the replicas of the shard to which the documents
to be added/updated/deleted belong. The node needs to make sub-requests
to other nodes, and that will require forwarding the credentials.

Does this just work out of the box, or ... ?

Regards, Per Steffensen





Re: Reading properties in data-import.xml

2013-01-11 Thread Dariusz Borowski
Thanks Alex!

This brought me to the solution I wanted to achieve. :)

D.



On Thu, Jan 10, 2013 at 3:21 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 dataimport.properties is for DIH to store its own properties for delta
 processing and things. Try solrcore.properties instead, as per the recent
 discussion:

 http://lucene.472066.n3.nabble.com/Reading-database-connection-properties-from-external-file-td4031154.html
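
 For example, something along these lines should work (an untested sketch;
 the property names are arbitrary). In each core's conf/solrcore.properties:

   host=db.example.com
   username=solr
   password=secret

 Reference them in the DIH handler definition in solrconfig.xml, where
 solrcore.properties substitution applies, and pick them up in the data
 config via the dataimporter.request namespace:

   <requestHandler name="/dataimport"
       class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">data-import.xml</str>
       <str name="host">${host}</str>
     </lst>
   </requestHandler>

 and in data-import.xml: url="jdbc:mysql://${dataimporter.request.host}:3306/projectX"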

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Thu, Jan 10, 2013 at 3:58 AM, Dariusz Borowski darius...@gmail.com
 wrote:

  I'm having a problem using a property file in my data-import.xml file.
 
  My aim is to not hard-code some values inside my xml file, but rather to
  reuse the values from a property file. I'm using multicore, and some of
  the values change from time to time; I do not want to change
  them in all my data-import files.
 
  For example:
 
  <dataSource
      type="JdbcDataSource"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://${host}:3306/projectX"
      user="${username}"
      password="${password}" />
 
  I tried everything, but don't know how I can use properties here. I tried
  to put my values in dataimport.properties, located under SOLR-HOME/conf
  and under SOLR-HOME/core1/conf, but without any success.
 
  Please, could someone help me on this?
 



Re: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Ahh sorry,
now I understand.
OK, that seems like a good solution. I just now need to understand how to query 
multiple cores :)

-Original message-
From: Dariusz Borowski [mailto:darius...@gmail.com] 
Sent: 11 January 2013 15:15
To: solr-user@lucene.apache.org
Subject: Re: configuring schema to match database

Hi,

No, it actually has two tables: User and Item. The example shown on the blog is 
for one table, because you repeat the same thing for the other table. Only your 
data-import.xml file changes; for the rest, just copy and paste it into the conf 
directory. If you are running your Solr on Linux, then you can work with 
symlinks.

D.



On Fri, Jan 11, 2013 at 3:12 PM, Niklas Langvig  niklas.lang...@globesoft.com 
wrote:

 Hi Dariusz,
 To me this example has one table, user, whereas I have many tables that 
 connect to one user, and that is what I'm unsure how to do.

 /Niklas


 -Original message-
 From: Dariusz Borowski [mailto:darius...@gmail.com]
 Sent: 11 January 2013 14:56
 To: solr-user@lucene.apache.org
 Subject: Re: configuring schema to match database

 Hi Niklas,

 Maybe this link helps:

 http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1
 /

 D.



 On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig  
 niklas.lang...@globesoft.com wrote:

  Hi!
   I'm quite new to Solr and trying to understand how to create a 
   schema from our Postgres database and then search for the 
   content in Solr instead of querying the db.
  
   My question should be really easy; it has most likely been asked 
   many times, but still I'm not able to google any answer to it.
  
   To make it easy, I have 3 columns: users, courses and languages
  
   Users has columns userid, firstname, lastname
   Courses has columns coursename, startdate, enddate
   Languages has columns language, writingskill, verbalskill
  
   UserA has taken courseA, courseB and courseC, and has writingskill 
   good verbalskill good for English, and writingskill excellent 
   verbalskill excellent for Spanish. UserB has taken courseA, courseF, 
   courseG and courseH, and has writingskill fluent verbalskill fluent 
   for English, and writingskill good verbalskill good for Italian.
  
   I would like to put this data into Solr so I can search for all 
   users who have taken courseA and are fluent in English.
   Can I do that?
  
   The problem is I'm not sure how to flatten this database into a 
   schema. It's easy to understand the users table, for example:
   <field name="userid" type="string" indexed="true" />
   <field name="firstname" type="string" indexed="true" />
   <field name="lastname" type="string" indexed="true" />
  
   But then I'm not so sure what the schema should look like for courses 
   and languages:
   <field name="userid" type="string" indexed="true" />
   <field name="coursename" type="string" indexed="true" />
   <field name="startdate" type="string" indexed="true" />
   <field name="enddate" type="string" indexed="true" />
 
 
  Thanks for any help
  /Niklas
 



Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
I don't know how to query multiple cores, or whether it's possible in one
request, but otherwise I would create a JOIN SQL query if you need values from
multiple tables.

D.



On Fri, Jan 11, 2013 at 3:27 PM, Niklas Langvig 
niklas.lang...@globesoft.com wrote:

 Ahh sorry,
 now I understand.
 OK, that seems like a good solution. I just now need to understand how to query
 multiple cores :)

 -Original message-
 From: Dariusz Borowski [mailto:darius...@gmail.com]
 Sent: 11 January 2013 15:15
 To: solr-user@lucene.apache.org
 Subject: Re: configuring schema to match database

 Hi,

 No, it actually has two tables: User and Item. The example shown on the
 blog is for one table, because you repeat the same thing for the other
 table. Only your data-import.xml file changes; for the rest, just copy and
 paste it into the conf directory. If you are running your Solr on Linux, then
 you can work with symlinks.

 D.



 On Fri, Jan 11, 2013 at 3:12 PM, Niklas Langvig 
 niklas.lang...@globesoft.com wrote:

  Hi Dariusz,
  To me this example has one table, user, whereas I have many tables that
  connect to one user, and that is what I'm unsure how to do.
 
  /Niklas
 
 
  -Original message-
  From: Dariusz Borowski [mailto:darius...@gmail.com]
  Sent: 11 January 2013 14:56
  To: solr-user@lucene.apache.org
  Subject: Re: configuring schema to match database
 
  Hi Niklas,
 
  Maybe this link helps:
 
  http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1
  /
 
  D.
 
 
 
  On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig 
  niklas.lang...@globesoft.com wrote:
 
   Hi!
    I'm quite new to Solr and trying to understand how to create a
    schema from our Postgres database and then search for the
    content in Solr instead of querying the db.
   
    My question should be really easy; it has most likely been asked
    many times, but still I'm not able to google any answer to it.
   
    To make it easy, I have 3 columns: users, courses and languages
   
    Users has columns userid, firstname, lastname
    Courses has columns coursename, startdate, enddate
    Languages has columns language, writingskill, verbalskill
   
    UserA has taken courseA, courseB and courseC, and has writingskill
    good verbalskill good for English, and writingskill excellent
    verbalskill excellent for Spanish. UserB has taken courseA, courseF,
    courseG and courseH, and has writingskill fluent verbalskill fluent
    for English, and writingskill good verbalskill good for Italian.
   
    I would like to put this data into Solr so I can search for all
    users who have taken courseA and are fluent in English.
    Can I do that?
   
    The problem is I'm not sure how to flatten this database into a
    schema. It's easy to understand the users table, for example:
    <field name="userid" type="string" indexed="true" />
    <field name="firstname" type="string" indexed="true" />
    <field name="lastname" type="string" indexed="true" />
   
    But then I'm not so sure what the schema should look like for courses
    and languages:
    <field name="userid" type="string" indexed="true" />
    <field name="coursename" type="string" indexed="true" />
    <field name="startdate" type="string" indexed="true" />
    <field name="enddate" type="string" indexed="true" />
  
  
   Thanks for any help
   /Niklas
  
 



Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 19:57, Niklas Langvig niklas.lang...@globesoft.com wrote:
 Ahh sorry,
 now I understand.
 OK, that seems like a good solution. I just now need to understand how to query 
 multiple cores :)

There is no need to use multiple cores in your setup. Going
back to your original problem statement, it can easily be
handled with a single core, and it actually makes more sense
to do it that way. You will need to give us more details.

  My question should be really easy, it has most likely been asked
  many times but still I'm not able to google any answer to it.
 
  To make it easy, I have 3 columns: users, courses and languages

Presumably, you mean three tables, as you describe each as
having columns. How are the tables connected? Is there a
foreign key relationship between them? Is the relationship
one-to-one, one-to-many, or what?

  Users has columns userid, firstname, lastname; Courses has columns
  coursename, startdate, enddate; Languages has columns language,
  writingskill, verbalskill
[...]
  I would like to put this data into Solr so I can search for all
  users who have taken courseA and are fluent in English.
  Can I do that?

1. Your schema for the single core is quite straightforward,
and along the lines of what you had described (one field for
each database column in each table), e.g.,
<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="date" indexed="true" />
<field name="enddate" type="date" indexed="true" />
<field name="language" type="string" indexed="true" />
<field name="writingskill" type="string" indexed="true" />
<field name="verbalskill" type="string" indexed="true" />
Pay attention to the type. Dates should typically be solr.DateField.
The others can be strings, but if they are integers in the database,
you might benefit from making these integers in Solr also.

2. One has to stop thinking of Solr as an RDBMS. Instead, one
flattens out data from a typical RDBMS structure. It is difficult
to give you complete instructions unless you describe the database
relationships, but, e.g., if one has userA with course1, course2,
and course3, and userB with course2, course4, the Solr documents
would be :
userA course1 details for course1...
userA course2 details for course2...
userA course3 details for course3...
userB course2 details for course2...
userB course4 details for course4...
This scheme could also be extended to languages, depending
on how the tables are related.

3. While indexing into Solr, one has to select from the database,
and flatten out the data as above. The two main ways of
doing this are using a library like SolrJ for Java (other languages
have other libraries, e.g., django-haystack is easy to get started
with if one is using Python/Django), or the Solr DataImportHandler
(please see http://wiki.apache.org/solr/DataImportHandler ) with
nested entities.
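
For illustration, with SolrJ the flattening loop could look roughly like this
(a sketch only: the URL is an example, the User/Course objects stand for
whatever your own code reads from the database, and the synthetic id assumes
your uniqueKey field is called id):

  SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
  for (User u : users) {
    for (Course c : u.getCourses()) {
      // One Solr document per (user, course) combination.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", u.getId() + "-" + c.getSeqNo()); // synthetic uniqueKey
      doc.addField("userid", u.getId());
      doc.addField("firstname", u.getFirstName());
      doc.addField("lastname", u.getLastName());
      doc.addField("coursename", c.getName());
      doc.addField("startdate", c.getStartDate());
      doc.addField("enddate", c.getEndDate());
      server.add(doc);
    }
  }
  server.commit();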

4. With such a structure, querying Solr should be simple.

Regards,
Gora


Re: configuring schema to match database

2013-01-11 Thread Niklas Langvig
It sounds good not to use more than one core; for sure I do not want to 
overcomplicate this.

Yes I meant tables.
It's pretty simple.

Both the courses and languages tables have their own primary keys, courseseqno and 
languagesseqno.
Both also have a foreign key userid that references the users table's 
userid column.
The relationship from users to courses and languages is one-to-many.

But I guess I'm thinking wrong, because my idea would be to have a block of 
fields connected with one id:

<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="date" indexed="true" />
<field name="enddate" type="date" indexed="true" />

These three are connected with a 
<field name="courseseqno" type="int" indexed="true" />
but also have a 
<field name="userid" type="int" indexed="true" />
to connect to a specific user?

Thanks
/Niklas



-Original message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: 11 January 2013 15:55
To: solr-user@lucene.apache.org
Subject: Re: configuring schema to match database

On 11 January 2013 19:57, Niklas Langvig niklas.lang...@globesoft.com wrote:
 Ahh sorry,
 now I understand.
 OK, that seems like a good solution. I just now need to understand how to 
 query multiple cores :)

There is no need to use multiple cores in your setup. Going back to your 
original problem statement, it can easily be handled with a single core, and it 
actually makes more sense to do it that way. You will need to give us more 
details.

  My question should be really easy, it has most likely been asked 
  many times but still I'm not able to google any answer to it.
 
  To make it easy, I have 3 columns: users, courses and languages

Presumably, you mean three tables, as you describe each as having columns. How 
are the tables connected? Is there a foreign key relationship between them? Is 
the relationship one-to-one, one-to-many, or what?

  Users has columns userid, firstname, lastname; Courses has columns 
  coursename, startdate, enddate; Languages has columns language, 
  writingskill, verbalskill
[...]
  I would like to put this data into Solr so I can search for all 
  users who have taken courseA and are fluent in English.
  Can I do that?

1. Your schema for the single core is quite straightforward,
and along the lines of what you had described (one field for
each database column in each table), e.g.,
<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="date" indexed="true" />
<field name="enddate" type="date" indexed="true" />
<field name="language" type="string" indexed="true" />
<field name="writingskill" type="string" indexed="true" />
<field name="verbalskill" type="string" indexed="true" />
Pay attention to the type. Dates should typically be solr.DateField.
The others can be strings, but if they are integers in the database,
you might benefit from making these integers in Solr also.

2. One has to stop thinking of Solr as an RDBMS. Instead, one
flattens out data from a typical RDBMS structure. It is difficult
to give you complete instructions unless you describe the database
relationships, but, e.g., if one has userA with course1, course2,
and course3, and userB with course2, course4, the Solr documents
would be :
userA course1 details for course1...
userA course2 details for course2...
userA course3 details for course3...
userB course2 details for course2...
userB course4 details for course4...
This scheme could also be extended to languages, depending
on how the tables are related.

3. While indexing into Solr, one has to select from the database,
and flatten out the data as above. The two main ways of
doing this are using a library like SolrJ for Java (other languages
have other libraries, e.g., django-haystack is easy to get started
with if one is using Python/Django), or the Solr DataImportHandler
(please see http://wiki.apache.org/solr/DataImportHandler ) with
nested entities.

4. With such a structure, querying Solr should be simple.

Regards,
Gora


Re: Getting Files into Zookeeper

2013-01-11 Thread Mark Miller
It's a bug that you only see RuntimeException - in 4.1 you will get the real 
problem - which is likely around connecting to zookeeper. You might try with a 
single zk host in the zk host string initially. That might make it easier to 
track down why it won't connect. It's tough to diagnose because the root 
exception is being swallowed - it's likely a connect to zk failed exception 
though.

- Mark

On Jan 10, 2013, at 1:34 PM, Christopher Gross cogr...@gmail.com wrote:

 I'm trying to get SolrCloud working with more than one configuration going.
  I have the base schema that Solr 4 comes with, I'd like to push that and
 one from another project (it does have the _version_ field in it.)  I'm
 having difficulty figuring out how to push things into zookeeper, or if I'm
 even doing this right.
 
 From the SolrCloud page, I'm trying this and I get an error --
 
 $ java -classpath
 zookeeper-3.3.6.jar:apache-solr-core-4.0.0.jar:apache-solr-solrj-4.0.0.jar:commons-cli-1.2.jar:slf4j-jdk14-1.6.4.jar:slf4j-api-1.6.4.jar:commons-codec-1.7.jar:commons-fileupload-1.2.1.jar:commons-io-2.1.jar:commons-lang-2.6.jar:guava-r05.jar:httpclient-4.1.3.jar:httpcore-4.1.4.jar:httpmime-4.1.3.jar:jcl-over-slf4j-1.6.4.jar:lucene-analyzers-common-4.0.0.jar:lucene-analyzers-kuromoji-4.0.0.jar:lucene-analyzers-phonetic-4.0.0.jar:lucene-core-4.0.0.jar:lucene-grouping-4.0.0.jar:lucene-highlighter-4.0.0.jar:lucene-memory-4.0.0.jar:lucene-misc-4.0.0.jar:lucene-queries-4.0.0.jar:lucene-queryparser-4.0.0.jar:lucene-spatial-4.0.0.jar:lucene-suggest-4.0.0.jar:spatial4j-0.3.jar:wstx-asl-3.2.7.jar
 org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost
 localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185 -confdir /solr/data/test/conf -confname myconf
 Exception in thread "main" java.lang.RuntimeException
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115)
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:83)
at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:158)
 
 Can anyone point me in the direction of some documentation or let me know
 if there's something that I'm missing?
 
 Thanks!
 
 -- Chris



Re: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Mark Miller

On Jan 10, 2013, at 12:06 PM, Shawn Heisey s...@elyograg.org wrote:

 On 1/9/2013 8:54 PM, Mark Miller wrote:
 I'd put everything into one. You can upload different named sets of config 
 files and point collections either to the same sets or different sets.
 
 You can really think about it the same way you would setting up a single 
 node with multiple cores. The main difference is that it's easier to share 
 sets of config files across collections if you want to. You don't need to at 
 all though.
 
 I'm not sure if xinclude works with zk, but I don't think it does.
 
 Thank you for your assistance.  I'll work on recombining my solrconfig.xml.  
 Are there any available full examples of how to set up and start both 
 zookeeper and Solr?  I'll be using the included Jetty 8.

I'm not sure - there are a few blog posts out there. The wiki does a decent job 
for Solr but doesn't get into ZooKeeper - the ZooKeeper site has a pretty simple 
setup guide, though.

 
 Specific questions that have come to mind:
 
 If I'm planning multiple collections with their own configs, do I still need 
 to bootstrap zookeeper when I start Solr, or should I start it up with the 
 zkHost parameter and then use the collection admin to upload information?  I 
 have not looked closely at the collection admin yet, I just know that it 
 exists.

Currently, there are two main options. Either use the bootstrap param on first 
startup or use the zkcli cmd line tool to upload config sets and link them to 
collections.
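
With zkcli that would be something like this (a sketch; use the same classpath
as the ZkCLI invocation in the Getting Files into Zookeeper thread, and the
names here are just examples):

  java -classpath ... org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2181 -confdir /path/to/conf -confname myconf
  java -classpath ... org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost localhost:2181 -collection collection1 -confname myconf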

 
 I have heard that if a replica node is down long enough that transaction logs 
 are not enough to fully fix that node, SolrCloud will initiate a full 
 replication.  Is that the case?  If so, is it necessary to configure the 
 replication handler with a specific path for the name, or does SolrCloud 
 handle that itself?

The replication handler should be defined as you see it in the default example 
solrconfig.xml file. Very bare bones.

 
 Is there an option on updateLog that controls how many transactions are kept, 
 or is that managed automatically by SolrCloud?  I have read some things that 
 talk about 100 updates.  I expect updates on this to be extremely frequent 
 and small, so 100 updates isn't much, and I may want to increase that.

No option - 100 is it, as it has implications for the recovery strategy if it's 
raised. I'd like to see it configurable in the future, but that would require making 
some other knobs changeable as well, if I remember right.

 
 Is it expected with future versions of Solr that I could upgrade one of my 
 nodes to 4.2 or 4.3 and have it work with the other node still at 4.1?  I 
 would also hope that would mean that the last 4.x release would work with 
 5.0.  That would make it possible to do rolling upgrades with no downtime.

I don't think we have committed to anything here yet. Seems like something we 
need to hash out, but we have not wanted to be too limited initially. For 
example, the Solr 4.0 to 4.1 upgrade with SolrCloud still needs some 
explanation and might require some down time.

- Mark

 
 Thanks,
 Shawn
 



RE: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Markus Jelsma
FYI: XInclude works fine. We have all request handlers in solrconfig in 
separate files and include them via XInclude on a running SolrCloud cluster. 
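
For example, roughly like this (a sketch of the pattern; the file name is
arbitrary):

  <config xmlns:xi="http://www.w3.org/2001/XInclude">
    ...
    <xi:include href="requesthandler-search.xml" />
    ...
  </config>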
 
-Original message-
 From:Mark Miller markrmil...@gmail.com
 Sent: Fri 11-Jan-2013 17:13
 To: solr-user@lucene.apache.org
 Subject: Re: Setting up new SolrCloud - need some guidance
 
 
 On Jan 10, 2013, at 12:06 PM, Shawn Heisey s...@elyograg.org wrote:
 
  On 1/9/2013 8:54 PM, Mark Miller wrote:
  I'd put everything into one. You can upload different named sets of config 
  files and point collections either to the same sets or different sets.
  
  You can really think about it the same way you would setting up a single 
  node with multiple cores. The main difference is that it's easier to share 
  sets of config files across collections if you want to. You don't need to 
  at all though.
  
  I'm not sure if xinclude works with zk, but I don't think it does.
  
  Thank you for your assistance.  I'll work on recombining my solrconfig.xml. 
   Are there any available full examples of how to set up and start both 
  zookeeper and Solr?  I'll be using the included Jetty 8.
 
 I'm not sure - there are a few blog posts out there. The wiki does a decent 
 job for Solr but doesn't get into ZooKeeper - the ZooKeeper site has a pretty 
 simple setup guide, though.
 
  
  Specific questions that have come to mind:
  
  If I'm planning multiple collections with their own configs, do I still 
  need to bootstrap zookeeper when I start Solr, or should I start it up with 
  the zkHost parameter and then use the collection admin to upload 
  information?  I have not looked closely at the collection admin yet, I just 
  know that it exists.
 
 Currently, there are two main options. Either use the bootstrap param on 
 first startup or use the zkcli cmd line tool to upload config sets and link 
 them to collections.
 
  
  I have heard that if a replica node is down long enough that transaction 
  logs are not enough to fully fix that node, SolrCloud will initiate a full 
  replication.  Is that the case?  If so, is it necessary to configure the 
  replication handler with a specific path for the name, or does SolrCloud 
  handle that itself?
 
 The replication handler should be defined as you see it in the default 
 example solrconfig.xml file. Very bare bones.
 
  
  Is there an option on updateLog that controls how many transactions are 
  kept, or is that managed automatically by SolrCloud?  I have read some 
  things that talk about 100 updates.  I expect updates on this to be 
  extremely frequent and small, so 100 updates isn't much, and I may want to 
  increase that.
 
 No option - 100 is it, as it has implications for the recovery strategy if it's 
 raised. I'd like to see it configurable in the future, but that would require making 
 some other knobs changeable as well, if I remember right.
 
  
  Is it expected with future versions of Solr that I could upgrade one of my 
  nodes to 4.2 or 4.3 and have it work with the other node still at 4.1?  I 
  would also hope that would mean that the last 4.x release would work with 
  5.0.  That would make it possible to do rolling upgrades with no downtime.
 
 I don't think we have committed to anything here yet. Seems like something we 
 need to hash out, but we have not wanted to be too limited initially. For 
 example, the Solr 4.0 to 4.1 upgrade with SolrCloud still needs some 
 explanation and might require some down time.
 
 - Mark
 
  
  Thanks,
  Shawn
  
 
 


Re: Getting Files into Zookeeper

2013-01-11 Thread Christopher Gross
I changed it to only go to one Zookeeper (localhost:2181) and it still gave
me the same stack trace error.

I was eventually able to get around this -- I just used the bootstrap
arguments when starting up my Tomcat instances to push the configs over --
though I'd rather just do it externally from Tomcat in the future.
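
For reference, that was roughly these JVM options on Tomcat (a sketch; the
confdir is the one from my upconfig attempt, and the config name is an example):

  -Dbootstrap_confdir=/solr/data/test/conf -Dcollection.configName=myconf -DzkHost=localhost:2181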

Thanks Mark.

-- Chris


On Fri, Jan 11, 2013 at 11:00 AM, Mark Miller markrmil...@gmail.com wrote:

 It's a bug that you only see RuntimeException - in 4.1 you will get the
 real problem - which is likely around connecting to zookeeper. You might
 try with a single zk host in the zk host string initially. That might make
 it easier to track down why it won't connect. It's tough to diagnose
 because the root exception is being swallowed - it's likely a connect to zk
 failed exception though.

 - Mark

 On Jan 10, 2013, at 1:34 PM, Christopher Gross cogr...@gmail.com wrote:

  I'm trying to get SolrCloud working with more than one configuration
 going.
   I have the base schema that Solr 4 comes with, I'd like to push that and
  one from another project (it does have the _version_ field in it.)  I'm
  having difficulty figuring out how to push things into zookeeper, or if
 I'm
  even doing this right.
 
  From the SolrCloud page, I'm trying this and I get an error --
 
  $ java -classpath
 
 zookeeper-3.3.6.jar:apache-solr-core-4.0.0.jar:apache-solr-solrj-4.0.0.jar:commons-cli-1.2.jar:slf4j-jdk14-1.6.4.jar:slf4j-api-1.6.4.jar:commons-codec-1.7.jar:commons-fileupload-1.2.1.jar:commons-io-2.1.jar:commons-lang-2.6.jar:guava-r05.jar:httpclient-4.1.3.jar:httpcore-4.1.4.jar:httpmime-4.1.3.jar:jcl-over-slf4j-1.6.4.jar:lucene-analyzers-common-4.0.0.jar:lucene-analyzers-kuromoji-4.0.0.jar:lucene-analyzers-phonetic-4.0.0.jar:lucene-core-4.0.0.jar:lucene-grouping-4.0.0.jar:lucene-highlighter-4.0.0.jar:lucene-memory-4.0.0.jar:lucene-misc-4.0.0.jar:lucene-queries-4.0.0.jar:lucene-queryparser-4.0.0.jar:lucene-spatial-4.0.0.jar:lucene-suggest-4.0.0.jar:spatial4j-0.3.jar:wstx-asl-3.2.7.jar
  org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost
  localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185 -confdir /solr/data/test/conf -confname myconf
   Exception in thread "main" java.lang.RuntimeException
  at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115)
  at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:83)
  at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:158)
 
  Can anyone point me in the direction of some documentation or let me know
  if there's something that I'm missing?
 
  Thanks!
 
  -- Chris




Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 21:13, Niklas Langvig niklas.lang...@globesoft.com wrote:
 It sounds good not to use more than one core; for sure I do not want to 
 overcomplicate this.
[...]

Yes, not only are multiple cores unnecessarily complicated here,
your searches will also be less complex, and faster.

 Both the courses and languages tables have their own primary keys, courseseqno 
 and languagesseqno.

There is no need to index these.

 Both also have a foreign key userid that references the users table's 
 userid column.
 The relationship from users to courses and languages is one-to-many.

 But I guess I'm thinking wrong, because my idea would be to have a block 
 of fields connected with one id:

 <field name="coursename" type="string" indexed="true" />
 <field name="startdate" type="date" indexed="true" />
 <field name="enddate" type="date" indexed="true" />

 These three are connected with a
 <field name="courseseqno" type="int" indexed="true" />
 but also have a
 <field name="userid" type="int" indexed="true" />
 to connect to a specific user?
[...]

You are still thinking of Solr as an RDBMS, where you should not
be. In your case, it is easiest to flatten out the data. This increases
the size of the index, but that should not really be of concern. As
your courses and languages tables are connected only to user, the
schema that I described earlier should suffice. To extend my
earlier example, given:
* userA with courses c1, c2, c3, and languages l1, l2
* userB with c2, c3, and l2
you should flatten it such that you get the following Solr documents
userA c1 name c1 startdate...l1 l1 writing skill...
userA c1 name c1 startdate...l2 l2 writing skill...
userA c2 name c2 startdate...l1 l1 writing skill...
...
userB c2 name c2 startdate...l2 l2 writing skill...
userB c3 name c3 startdate...l2 l2 writing skill...
i.e., a total of 3 courses x 2 languages = 6 documents for
userA, and 2 courses x 1 language = 2 documents for userB

In order to get this form of flattened data into Solr, I would
suggest using the DataImportHandler with nested entities.
Please see the earlier link to DIH. Also, a Google search
for Solr dataimporthandler nested entities turns up many
examples, including:
http://solr.pl/en/2010/10/11/data-import-handler-%E2%80%93-how-to-import-data-from-sql-databases-part-1/
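
For example, a root entity that joins the tables should produce exactly the
flattened documents above (an untested sketch; driver, URL and credentials are
placeholders, and you will want a synthetic uniqueKey such as
userid-courseseqno-languagesseqno):

<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/yourdb"
              user="dbuser" password="dbpass"/>
  <document>
    <entity name="flat"
            query="SELECT u.userid, u.firstname, u.lastname,
                          c.coursename, c.startdate, c.enddate,
                          l.language, l.writingskill, l.verbalskill
                   FROM users u
                   JOIN courses c ON c.userid = u.userid
                   JOIN languages l ON l.userid = u.userid"/>
  </document>
</dataConfig>

Each row of the join becomes one Solr document; nested entities, as in the
article above, are the alternative if you prefer per-table queries.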
Please give it a try, and post here with your attempts if
you run into any issues.

Regards,
Gora


How to disable/clear filterCache (from SolrIndexSearcher) in a custom searchComponent

2013-01-11 Thread radu

Hello, and thank you in advance for your help!

*Context:*
I have implemented a custom search component that receives 3 parameters: 
field, termValue and payloadX.
The component should search for termValue in the requested Lucene 
field and, for each *termValue* match, check *payloadX* against the 
information in its associated payload.


*Constraints:*
I don't want to disable the filterCache in solrconfig.xml (the <filterCache 
class="solr.FastLRUCache" ... /> element), since I have other searchComponents 
that could use the filterCache.


I have implemented the payload search using SpanTermQuery and 
attached it to q:field=termValue

public class MySearchComponent extends XPatternsSearchComponent {

    public void prepare(ResponseBuilder rb) {
        ... rb.setQueryString(parameters.get(CommonParams.Q)) ...
    }

    public void process(ResponseBuilder rb) {
        ...
        SolrIndexSearcher.QueryResult queryResult =
            new SolrIndexSearcher.QueryResult(); // ??? question for help

        // Search for the payload criteria in the payload of a specific
        // field, for a specific term.
        CustomSpanTermQuery customFilterQuery =
            new CustomSpanTermQuery(field, term, payload);
        QueryCommand queryCommand =
            rb.getQueryCommand().setFilterList(customFilterQuery);

        rb.req.getSearcher().search(queryResult, queryCommand);
        ...
    }
}

*Issue:*
If I call the search component with field1, termValue1 and:
 - *payload1* (the first search): the result from filtering is 
saved in the filterCache.
 - *payload2* (the second time): the results from the first 
search (from the filterCache) are returned, and not the different expected result set.


Findings:
I noticed that in SolrIndexSearcher, filterCache is private, so I cannot 
change\clear it through inheritance.
Also, I tried to use rb.getQueryCommand().replaceFlags(), but 
SolrIndexSearcher.NO_CHECK_FILTERCACHE|NO_CHECK_QCACHE|NO_SET_QCACHE are 
not public either.


*Question*:
How can the filterCache (in SolrIndexSearcher) be disabled or cleared *only* 
for a custom search component?

Do I have other options\approaches?
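One avenue that may be worth checking (a sketch only, assuming Solr 4.x,
not a verified solution): filter queries that implement ExtendedQuery can
opt out of the filter cache per request. Continuing the process() fragment
above, wrapping the custom filter in org.apache.solr.search.WrappedQuery
might look like:

// Sketch: mark this one filter as non-cacheable so the searcher
// bypasses the filterCache for it (ExtendedQuery.getCache() == false).
WrappedQuery uncachedFilter = new WrappedQuery(customFilterQuery);
uncachedFilter.setCache(false);
QueryCommand queryCommand = rb.getQueryCommand().setFilterList(uncachedFilter);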

Best regards,
Radu


RE: Forwarding authentication credentials in internal node-to-node requests

2013-01-11 Thread Markus Jelsma
Hmm, you need to set up the HttpClient in HttpShardHandlerFactory, but you 
cannot access the HttpServletRequest from there; it is only available in 
SolrDispatchFilter, AFAIK. And then, the HttpServletRequest can only return the 
remote user name, not the password he, she or it provided. I don't know how to 
obtain the password.
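For the fixed-credentials setup suggested below, a rough sketch of what a
modified init() might build (Apache HttpClient 4.x; class and config names
are illustrative, not the actual Solr wiring):

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.DefaultHttpClient;

// Sketch: an HttpClient that attaches fixed credentials (e.g. read from the
// shardHandlerFactory configuration) to every internal shard request.
public class ShardAuthClientSketch {
    public static DefaultHttpClient build(String username, String password) {
        DefaultHttpClient client = new DefaultHttpClient();
        client.getCredentialsProvider().setCredentials(
                AuthScope.ANY,
                new UsernamePasswordCredentials(username, password));
        return client;
    }
}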
 
-Original message-
 From:Per Steffensen st...@designware.dk
 Sent: Fri 11-Jan-2013 15:28
 To: solr-user@lucene.apache.org
 Subject: Re: Forwarding authentication credentials in internal node-to-node 
 requests
 
 Hmmm, it will not work for me. I want the original credentials 
 forwarded in the sub-requests. The credentials are mapped to permissions 
 (authorization), and basically I don't want a user to be able to have 
 something done in the (automatically performed by the contacted 
 solr-node) sub-requests that he is not authorized to do. Forwarding of 
 credentials is a must. So what you are saying is that I should expect to 
 have to do some modifications to Solr in order to achieve what I want?
 
 Regards, Per Steffensen
 
 On 1/11/13 2:11 PM, Markus Jelsma wrote:
  Hi,
 
  If your credentials are fixed i would configure username:password in your 
  request handler's shardHandlerFactory configuration section and then modify 
  HttpShardHandlerFactory.init() to create a HttpClient with an AuthScope 
  configured with those settings.
 
  I don't think you can obtain the original credentials very easy when inside 
  HttpShardHandlerFactory.
 
  Cheers

  -Original message-
  From:Per Steffensen st...@designware.dk
  Sent: Fri 11-Jan-2013 13:07
  To: solr-user@lucene.apache.org
  Subject: Forwarding authentication credentials in internal node-to-node 
  requests
 
  Hi
 
  I read http://wiki.apache.org/solr/SolrSecurity and know a lot about
  webcontainer authentication and authorization. Im sure I will be able to
  set it up so that each solr-node is will require HTTP authentication for
  (selected) incoming requests.
 
  But solr-nodes also make requests among each other and Im in doubt if
  credentials are forwarded from the original request to the internal
  sub-requests?
  E.g. lets say that each solr-node is set up to require authentication
  for search request. An outside user makes a distributed request
  including correct username/password. Since it is a distributed search,
  the node which handles the original request from the user will have to
  make sub-requests to other solr-nodes but they also require correct
  credentials in order to accept this sub-request. Are the credentials
  from the original request duplicated to the sub-requests or what options
  do I have?
  Same thing goes for e.g. update requests if they are sent to a node
  which does not run (all) the replica of the shard in which the documents
  to be added/updated/deleted belong. The node needs to make sub-request
  to other nodes, and it will require forwarding the credentials.
 
  Does this just work out of the box, or ... ?
 
  Regards, Per Steffensen
 
 
 


Re: link on graph page

2013-01-11 Thread Mark Miller
They point to the admin UI - or should - that seems right?

- Mark

On Jan 11, 2013, at 10:57 AM, Christopher Gross cogr...@gmail.com wrote:

 I've managed to get my SolrCloud set up to have 2 different indexes up and
 running.  However, my URLs aren't right.  They just point to
 http://server:port/solr, not http://server:port/solr/index1 or
 http://server:port/solr/index2.
 
 Is that something that I can set in my solr.xml for that Solr instance, or
 is it something that I'd have to set in each one's solrconfig.xml?
 
 Any help would be appreciated.  Thanks!
 
 -- Chris



Re: configuring schema to match database

2013-01-11 Thread Jens Grivolla

On 01/11/2013 05:23 PM, Gora Mohanty wrote:

You are still thinking of Solr as a RDBMS, where you should not
be. In your case, it is easiest to flatten out the data. This increases
the size of the index, but that should not really be of concern. As
your courses and languages tables are connected only to user, the
schema that I described earlier should suffice. To extend my
earlier example, given:
* userA with courses c1, c2, c3, and languages l1, l2
* userB with c2, c3, and l2
you should flatten it such that you get the following Solr documents
userA c1 name c1 startdate...l1 l1 writing skill...
userA c1 name c1 startdate...l2 l2 writing skill...
userA c2 name c2 startdate...l1 l1 writing skill...

userB c2 name c2 startdate...l2 l2 writing skill...
userB c3 name c3 startdate...l2 l2 writing skill...
i.e., a total of 3 courses x 2 languages = 6 documents for
userA, and 2 courses x 1 language = 2 documents for userB


Actually, that is what you would get when doing a join in an RDBMS, the 
cross-product of your tables. This is NOT AT ALL what you typically do 
in Solr.


Best start the other way around, think of Solr as a retrieval system, 
not a storage system. What are your queries? What do you want to find, 
and what criteria do you use to search for it?


If your intention is to find users that match certain criteria, each 
entry should be a user (with ALL associated information, e.g. all 
courses, all language skills, etc.), if you want to retrieve courses, 
each entry should be a course.


Let's say you want to find users who have certain language skills, you 
would have a schema that describes a user:

- user id
- user name
- languages
- ...

In languages, you could store entries like "en|reading|high" 
"es|writing|low", etc. It could be a multivalued field, or just have 
everything separated by spaces and a tokenizer that splits on whitespace.


Now you can query:

- language:es* -- return all users with some Spanish skills
- language:en|writing|high -- return all users with high English writing 
skills
- +(language:es* language:fr*) +language:en|writing|high -- return users 
with high English writing skills and some knowledge of French or Spanish


If you want to avoid wildcard queries (more costly), you can just add 
plain "en" and "es", etc. to your field, so language:es will match 
anybody with Spanish skills.
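A quick SolrJ sketch of that last combined query, purely illustrative
(server URL and field name are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Illustrative only: run the combined skills query against a local core.
public class SkillQueryExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("+(language:es* language:fr*) +language:en|writing|high");
        System.out.println("Users found: " + server.query(q).getResults().getNumFound());
    }
}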


Best,
Jens



Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 22:30, Jens Grivolla j+...@grivolla.net wrote:
[...]
 Actually, that is what you would get when doing a join in an RDBMS, the 
 cross-product of your tables. This is NOT AT ALL what you typically do in 
 Solr.

 Best start the other way around, think of Solr as a retrieval system, not a 
 storage system. What are your queries? What do you want to find, and what 
 criteria do you use to search for it?
[...]

Um, he did describe his desired queries, and there was a reason
that I proposed the above schema design.

  UserA has taken courseA, courseB and courseC and has writingskill
  good verbalskill good for english and writingskill excellent
  verbalskill excellent for spanish UserB has taken courseA, courseF,
  courseG and courseH and has writingskill fluent verbalskill fluent
  for english and writingskill good verbalskill good for italian

Unless the index is becoming huge, I feel that it is better to
flatten everything out rather than combining fields and
post-processing the results.

Regards,
Gora


Re: Solr 4.0, slow opening searchers

2013-01-11 Thread Alan Woodward
Hi Marcel,

Are you committing data with hard commits or soft commits?  I've seen systems 
where we've inadvertently only used soft commits, which means that the entire 
transaction log has to be re-read on startup, which can take a long time.  Hard 
commits flush indexed data to disk, and make it a lot quicker to restart.
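
A quick SolrJ illustration of the difference (the three-argument commit is from 
SolrJ 4.x; the URL is assumed):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Soft commit: makes documents searchable but does not flush segments to
// disk, so the transaction log must be replayed on restart.
// Hard commit: flushes to disk, making restarts much faster.
public class CommitExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/core0");
        server.commit(true, true, true);   // waitFlush, waitSearcher, softCommit=true
        server.commit(true, true, false);  // softCommit=false: a hard commit
    }
}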

Alan Woodward
a...@flax.co.uk


On 11 Jan 2013, at 13:51, Marcel Bremer wrote:

 Hi,
 
 We're experiencing slow startup times of searchers in Solr when they contain a 
 large number of documents.
 
 We use Solr v4.0 with Jetty and currently have 267.657.634 documents stored, 
 spread across 9 cores. These documents contain keywords, with additional 
 statistics, which we are using for suggestions and related keywords. When we 
 (re)start Solr on one of our servers it can take up to two hours before Solr 
 has opened all of its searchers and starts accepting connections again. We 
 can't figure out why it takes so long to open those searchers. Also the CPU 
 and memory usage of Solr while opening searchers is not extremely high.
 
 Are there any known issues or tips someone could give us to speed up opening 
 searchers?
 
 If you need more details, please ping me.
 
 
 Best regards,
 
 Marcel Bremer
 Vinden.nl BV



how to perform a delta-import when related table is updated

2013-01-11 Thread PeterKerk
My delta-import
(http://localhost:8983/solr/freemedia/dataimport?command=delta-import) does
not correctly update my solr fields.


Please see my data-config here:
<entity name="freemedia" query="select * from freemedia WHERE
categoryid&gt;0"
        deltaImportQuery="select * from freemedia WHERE updatedate &lt;
getdate()
        AND id='${dataimporter.delta.id}' AND categoryid&gt;0"
        deltaQuery="select id from freemedia where updatedate &gt;
'${dataimporter.last_index_time}' AND categoryid&gt;0">

    <entity name="lovecount" query="select COUNT(id) as likes FROM
freemedialikes WHERE freemediaid=${freemedia.id}"></entity>
</entity>


Now when a new item is inserted into [freemedialikes]
and I perform a delta-import, the Solr index does not show the new total
number of likes. Only after I perform a full-import
(http://localhost:8983/solr/freemedia/dataimport?command=full-import) the
correct number is shown.
So the SQL is returning the correct results, I just don't know how to get
the updated likes count via the delta-import.

I have reloaded the data-config every time I made a change. 





Re: Setting up new SolrCloud - need some guidance

2013-01-11 Thread Shawn Heisey

On 1/11/2013 9:15 AM, Markus Jelsma wrote:

FYI: XInclude works fine. We have all request handlers in solrconfig in 
separate files and include them via XInclude on a running SolrCloud cluster.


Good to know.  I'm still deciding whether I want to recombine or 
continue to use xinclude.  Is the xinclude path relative to 
solrconfig.xml just as it is now, so I could link to 
include/indexConfig.xml?  Are things partitioned well enough that one 
collection's config will not overlap into another config when using 
xinclude and relative paths?
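
For reference, the XInclude form in question looks like this (per the XInclude 
spec, a relative href resolves against the location of the including file, so 
include/indexConfig.xml would sit next to solrconfig.xml):

<config>
  <xi:include href="include/indexConfig.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude"/>
  ...
</config>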


The way I do things now, all files in cores/corename/conf (relative to 
solr.home) are symlinks, such as solrconfig.xml -> 
../../../config/X/solrconfig.xml, where X is a general 
designation for a type of config.  I have good separation between 
instanceDir, data, and real config files.  The paths in the xinclude 
elements are relative to the location of the symlink.


Thanks,
Shawn



RE: how to perform a delta-import when related table is updated

2013-01-11 Thread Dyer, James
Peter,

See http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command , 
then scroll down to where it says "The deltaQuery in the above example only 
detects changes in item but not in other tables...".  It shows you two ways to 
do it.

Option 1:  add a reference to the last_modified_date (or whatever) from the 
child table in a where-in clause in the parent entity's deltaQuery.

Option 2:  add a parentDeltaQuery on the child entity.  This is a query that 
tells DIH which parent-table keys need to update because of child table 
updates.  In other words, say your child's Delta Query says that child_id=1 
changed.  You might have for parentDeltaQuery something like: SELECT ID FROM 
PARENT P WHERE P.CHILD_ID=${Child.ID} .  While this can simplify things for you 
and save you from needing giant where-in clauses on the parent query, 
it will double the number of queries that get issued to determine which 
documents to update.
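
Applied to the tables from this thread, a sketch of Option 2 might look like
the following (column names taken from the posted config; the rest are
assumptions):

<entity name="freemedia" pk="id"
        query="select * from freemedia WHERE categoryid&gt;0"
        deltaImportQuery="select * from freemedia WHERE id='${dataimporter.delta.id}' AND categoryid&gt;0"
        deltaQuery="select id from freemedia where updatedate &gt; '${dataimporter.last_index_time}'">
  <entity name="lovecount" pk="freemediaid"
          query="select COUNT(id) as likes FROM freemedialikes WHERE freemediaid=${freemedia.id}"
          deltaQuery="select freemediaid from freemedialikes where createdate &gt; '${dataimporter.last_index_time}'"
          parentDeltaQuery="select id from freemedia where id=${lovecount.freemediaid}"/>
</entity>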

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Friday, January 11, 2013 12:02 PM
To: solr-user@lucene.apache.org
Subject: how to perform a delta-import when related table is updated

My delta-import
(http://localhost:8983/solr/freemedia/dataimport?command=delta-import) does
not correctly update my solr fields.


Please see my data-config here:
<entity name="freemedia" query="select * from freemedia WHERE
categoryid&gt;0"
        deltaImportQuery="select * from freemedia WHERE updatedate &lt;
getdate()
        AND id='${dataimporter.delta.id}' AND categoryid&gt;0"
        deltaQuery="select id from freemedia where updatedate &gt;
'${dataimporter.last_index_time}' AND categoryid&gt;0">

    <entity name="lovecount" query="select COUNT(id) as likes FROM
freemedialikes WHERE freemediaid=${freemedia.id}"></entity>
</entity>


Now when a new item is inserted into [freemedialikes]
and I perform a delta-import, the Solr index does not show the new total
number of likes. Only after I perform a full-import
(http://localhost:8983/solr/freemedia/dataimport?command=full-import) the
correct number is shown.
So the SQL is returning the correct results, I just don't know how to get
the updated likes count via the delta-import.

I have reloaded the data-config every time I made a change. 







RE: how to perform a delta-import when related table is updated

2013-01-11 Thread PeterKerk
Hi James,

Ok, so I did this:
<entity name="freemedia" query="select * from freemedia WHERE
categoryid&gt;0"
        deltaImportQuery="select * from freemedia WHERE updatedate &lt;
getdate()
        AND id='${dataimporter.delta.id}' AND categoryid&gt;0"
        deltaQuery="select id from freemedia where id in
        (select freemediaid as
        id from freemedialikes where createdate &gt;
        '${dih.last_index_time}') or updatedate &gt;
        '${dataimporter.last_index_time}' AND categoryid&gt;0">

I now get this error in the logfile:


SEVERE: Delta Import Failed
java.lang.IllegalArgumentException: deltaQuery has no column to resolve to
declared primary key pk='ID'



Now, my table looks like this:  


CREATE TABLE [dbo].[freemedialikes](
[id] [int] IDENTITY(1,1) NOT NULL,
[userid] [nvarchar](50) NOT NULL,
[freemediaid] [int] NOT NULL,
[createdate] [datetime] NOT NULL,
 CONSTRAINT [PK_freemedialikes] PRIMARY KEY CLUSTERED 
(
[id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[freemedialikes]  WITH CHECK ADD  CONSTRAINT
[FK_freemedialikes_freemedia] FOREIGN KEY([freemediaid])
REFERENCES [dbo].[freemedia] ([id])
ON DELETE CASCADE
GO

ALTER TABLE [dbo].[freemedialikes] CHECK CONSTRAINT
[FK_freemedialikes_freemedia]
GO

ALTER TABLE [dbo].[freemedialikes] ADD  CONSTRAINT
[DF_freemedialikes_createdate]  DEFAULT (getdate()) FOR [createdate]
GO


So in the deltaQuery I thought I had to reference the freemediaid, like so:
select freemediaid as id from freemedialikes

Got the same error as above.
So then, since there was mention of a PK in the error, I thought I'd just
reference the PK of the child table; it didn't make sense, but hey :)
select id from freemedialikes w

But I got the same error again.


Any suggestions?
Thanks! 





Accessing raw index data

2013-01-11 Thread Achim Domma
Hi,

I have just setup my first Solr 4.0 instance and have added about one million 
documents. I would like to access the raw data stored in the index. Can 
somebody give me a starting point how to do that?

As a first step, a simple dump would be absolutely ok. I just want to play 
around and do some static offline analysis. In the long term, I probably would 
like to implement custom search components to enrich my search results. So if 
there's no export for raw data, I would be happy to learn how to implement 
custom handlers and/or search components. Some guidance where to start would be 
very appreciated.

kind regards,
Achim

Re: Accessing raw index data

2013-01-11 Thread Gora Mohanty
On 12 January 2013 01:06, Achim Domma do...@procoders.net wrote:

 Hi,

 I have just setup my first Solr 4.0 instance and have added about one
 million documents. I would like to access the raw data stored in the index.
 Can somebody give me a starting point how to do that?

 As a first step, a simple dump would be absolutely ok. I just want to play
 around and do some static offline analysis. In the long term, I probably
 would like to implement custom search components to enrich my search
 results. So if there's no export for raw data, I would be happy to learn how
 to implement custom handlers and/or search components. Some guidance where
 to start would be very appreciated.

It is not clear what you mean by raw data, and what level of
customisation you are after. Here are two possibilities:
* At the base, Solr indexes are Lucene indexes, so one can always
  drop down to that level.
* Also, Solr allows plugins for various components. This link might
  be of help, depending on the extent of customisation you are after:
  http://wiki.apache.org/solr/SolrPlugins

Maybe you should approach this from the other end: If you could
describe what you are trying to achieve, people might be able to
offer possibilities.

Regards,
Gora


Re: Accessing raw index data

2013-01-11 Thread Achim Domma
At the base, Solr indexes are Lucene indexes, so one can always
 drop down to that level.

That's what I'm looking for. I understand that, at the end, there has to be an 
inverted index (or rather multiple of them), holding all words which occur 
in my documents, each word having a list of documents the word was part of. 
I would like to do some statistics based on this information, and would like to 
analyze how it changes if I change my text processing settings, ...

If you would give me a starting point like Data is stored in Lucene indexes, 
which are documented at XXX. In a request handler you can access the indexes 
via YYY., I would be perfectly happy figuring out the rest on my own. 
Documentation about 4.0 is a bit limited, so it's hard to find an entry point.

cheers,
Achim

Am 11.01.2013 um 20:54 schrieb Gora Mohanty:

 On 12 January 2013 01:06, Achim Domma do...@procoders.net wrote:
 
 Hi,
 
 I have just setup my first Solr 4.0 instance and have added about one
 million documents. I would like to access the raw data stored in the index.
 Can somebody give me a starting point how to do that?
 
 As a first step, a simple dump would be absolutely ok. I just want to play
 around and do some static offline analysis. In the long term, I probably
 would like to implement custom search components to enrich my search
 results. So if there's no export for raw data, I would be happy to learn how
 to implement custom handlers and/or search components. Some guidance where
 to start would be very appreciated.
 
 It is not clear what you mean by raw data, and what level of
 customisation you are after. Here are two possibilities:
 * At the base, Solr indexes are Lucene indexes, so one can always
  drop down to that level.
 * Also, Solr allows plugins for various components. This link might
  be of help, depending on the extent of customisation you are after:
  http://wiki.apache.org/solr/SolrPlugins
 
 Maybe you should approach this from the other end: If you could
 describe what you are trying to achieve, people might be able to
 offer possibilities.
 
 Regards,
 Gora



Re: Accessing raw index data

2013-01-11 Thread Gora Mohanty
On 12 January 2013 02:03, Achim Domma do...@procoders.net wrote:
 At the base, Solr indexes are Lucene indexes, so one can always
  drop down to that level.

 That's what I'm looking for. I understand, that at the end, there has to be 
 an inverse index (or rather multiple of them), holding all words which 
 occurre in my documents, each word having a list of documents the word 
 was part of. I would like to do some statistics based on this information, 
 would like to analyze how it changes if I change my text processing settings, 
 ...

 If you would give me a starting point like Data is stored in Lucene indexes, 
 which are documented at XXX. In a request handler you can access the indexes 
 via YYY., I would be perfectly happy figuring out the rest on my own. 
 Documentation about 4.0 is a bit limited, so it's hard to find an entry point.

Sadly, you have hit the limits of my knowledge: We
have not yet had the need to delve into details of
Lucene indexes, but I am sure that others can fill in.

Regards,
Gora


Re: Accessing raw index data

2013-01-11 Thread Alexandre Rafalovitch
Have you looked at the Solr admin interface in detail? Specifically, the analysis
section under each core. It provides some of the statistics you seem to
want, and it gives you the source code to look at to understand how to create
your own version of that. In particular, the Luke package is what you
might be looking for.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Jan 11, 2013 at 3:33 PM, Achim Domma do...@procoders.net wrote:

 At the base, Solr indexes are Lucene indexes, so one can always
  drop down to that level.

 That's what I'm looking for. I understand, that at the end, there has to
 be an inverse index (or rather multiple of them), holding all words which
 occurre in my documents, each word having a list of documents the word
 was part of. I would like to do some statistics based on this information,
 would like to analyze how it changes if I change my text processing
 settings, ...

 If you would give me a starting point like Data is stored in Lucene
 indexes, which are documented at XXX. In a request handler you can access
 the indexes via YYY., I would be perfectly happy figuring out the rest on
 my own. Documentation about 4.0 is a bit limited, so it's hard to find an
 entry point.

 cheers,
 Achim

 Am 11.01.2013 um 20:54 schrieb Gora Mohanty:

  On 12 January 2013 01:06, Achim Domma do...@procoders.net wrote:
 
  Hi,
 
  I have just setup my first Solr 4.0 instance and have added about one
  million documents. I would like to access the raw data stored in the
 index.
  Can somebody give me a starting point how to do that?
 
  As a first step, a simple dump would be absolutely ok. I just want to
 play
  around and do some static offline analysis. In the long term, I probably
  would like to implement custom search components to enrich my search
  results. So if there's no export for raw data, I would be happy to
 learn how
  to implement custom handlers and/or search components. Some guidance
 where
  to start would be very appreciated.
 
  It is not clear what you mean by raw data, and what level of
  customisation you are after. Here are two possibilities:
  * At the base, Solr indexes are Lucene indexes, so one can always
   drop down to that level.
  * Also, Solr allows plugins for various components. This link might
   be of help, depending on the extent of customisation you are after:
   http://wiki.apache.org/solr/SolrPlugins
 
  Maybe you should approach this from the other end: If you could
  describe what you are trying to achieve, people might be able to
  offer possibilities.
 
  Regards,
  Gora




SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
I have a bit of a strange use case.

When I index a PDF to Solr I use ContentStreamUpdateRequest.
The Lucene document then contains, in the text field, all contained items
(the parsed items of the physical PDF).

I also need to add these parsed items to another Lucene document.

Is there a way to receive/parse these items just in memory, without
committing them to Lucene?





Re: SloppyPhraseScorer behavior change

2013-01-11 Thread varun srivastava
Moreover, I just checked: autoGeneratePhraseQueries=true is set for both
3.4 and 4.0 in my schema.
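
(For reference, the attribute sits on the field type in schema.xml; a sketch
along the lines of the stock 4.0 example:)

<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <!-- analyzer chain as in the example schema -->
</fieldType>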

Thanks
Varun

On Fri, Jan 11, 2013 at 1:04 PM, varun srivastava varunmail...@gmail.comwrote:

 Hi Jack,
  Is this a new change done in solr 4.0 ? Seems autoGeneratePhraseQueries
 option is present from solr 3.1. Just wanted to confirm this is the
 difference causing change in behavior between 3.4 and 4.0.


 Thanks
 Varun


 On Mon, Dec 24, 2012 at 3:00 PM, Jack Krupansky 
 j...@basetechnology.comwrote:

 Thanks. Sloppy phrase requires that the query terms be in a phrase, but
 you don't have any quotes in your query.

 Depending on your schema field type you may be running into a change in
 how auto-generated phrase queries are handled. It used to be that
  apple0ipad would always be treated as the quoted phrase "apple 0 ipad", but
 now that is only true if your field type has autoGeneratePhraseQueries=true
 set. Now, if you don't have that option set, the term gets treated as
 (apple OR 0 OR ipad), which is a lot looser than the exact phrase.

 Look at the new example schema for the text_en_splitting field type as
 an example.


 -- Jack Krupansky

 -Original Message- From: varun srivastava
 Sent: Monday, December 24, 2012 5:49 PM
 To: solr-user@lucene.apache.org
 Subject: Re: SloppyPhraseScorer behavior change


 Hi Jack,
  My query was simply /solr/select?query=ipad apple apple0ipad
  and the doc contained "apple ipad".

  If you see the patch attached to bug 3215, you will find the following
  comment. I want to confirm whether the behaviour I am observing is in sync
  with what the patch developer intended, or it's just some regression bug. In
  solr 3.4 phrase order is honored, whereas in solr 4.0 phrase order is not
  honored, i.e. "apple ipad" and "ipad apple" are both treated the same.



 

 /**
 +   * Score a candidate doc for all slop-valid position-combinations (matches)
 +   * encountered while traversing/hopping the PhrasePositions.
 +   * <br> The score contribution of a match depends on the distance:
 +   * <br> - highest score for distance=0 (exact match).
 +   * <br> - score gets lower as distance gets higher.
 +   * <br>Example: for query "a b"~2, a document "x a b a y" can be scored twice:
 +   * once for "a b" (distance=0), and once for "b a" (distance=2).
 +   * <br>Possibly not all valid combinations are encountered, because for efficiency
 +   * we always propagate the least PhrasePosition. This allows to base on
 +   * PriorityQueue and move forward faster.
 +   * As result, for example, document "a b c b a"
 +   * would score differently for queries "a b c"~4 and "c b a"~4, although
 +   * they really are equivalent.
 +   * Similarly, for doc "a b c b a f g", query "c b"~2
 +   * would get same score as "g f"~2, although "c b"~2 could be matched twice.
 +   * We may want to fix this in the future (currently not, for performance reasons).
 +   */

 



  On Mon, Dec 24, 2012 at 1:21 PM, Jack Krupansky j...@basetechnology.com
  wrote:

  Could you post the full query URL, so we can see exactly what your query
 was? Or, post the output of debug=query, which will show us what Lucene
 query was generated.

 -- Jack Krupansky

 -Original Message- From: varun srivastava
 Sent: Monday, December 24, 2012 1:53 PM
 To: solr-user@lucene.apache.org
 Subject: SloppyPhraseScorer behavior change


 Hi,
  Due to the following bug fix
 https://issues.apache.org/jira/browse/LUCENE-3215
 I am observing a change
 in behavior of SloppyPhraseScorer. I just wanted to
 confirm my understanding with you all.

 After solr 3.5 (the bug is fixed in 3.5), if there is a document "a b c d e",
 then in solr 3.4 only the query "a b" will match the document, but from solr 3.5
 onwards, both query "a b" and "b a" will match. Is that right?


 Thanks
 Varun






Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread Alexandre Rafalovitch
If I understand it, you are sending the file to Solr, which then uses the Tika
library to do the preprocessing/extraction and stores the results in the
defined fields.

If you don't want Solr to do the storing and want to change the extracted
fields, just use the Tika library in your client and work with the returned
document yourself. This is less of a network load as well, as you don't
send the whole file over the wire.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Jan 11, 2013 at 3:55 PM, uwe72 uwe.clem...@exxcellent.de wrote:

 i have a bit strange usecase.

 when i index a pdf to solr i use ContentStreamUpdateRequest.
 The lucene document then contains in the text field all containing items
 (the parsed items of the physical pdf).

 i also need to add these parsed items to another lucene document.

 is there a way, to receive/parse these items just in memory, without
 comitting them to lucene?






Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
Yes, I don't really want to index/store the PDF document in Lucene.

I just need the parsed tokens for other things.

So you mean I can use ExtractingRequestHandler.java to retrieve the items?

Does anybody have a piece of code doing that?

Actually, I give the PDF as input and want the parsed items back (the same as what
would be in the text field of the stored Lucene doc).







Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
OK, it seems this works:

  // Tika facade: parse the file and return its plain-text content
  Tika tika = new Tika();
  String tokens = tika.parseToString(file);






Re: Accessing raw index data

2013-01-11 Thread Shawn Heisey

On 1/11/2013 1:33 PM, Achim Domma wrote:

At the base, Solr indexes are Lucene indexes, so one can always
  drop down to that level.

That's what I'm looking for. I understand, that at the end, there has to be an inverse index (or rather 
multiple of them), holding all words which occurre in my documents, each word having 
a list of documents the word was part of. I would like to do some statistics based on this 
information, would like to analyze how it changes if I change my text processing settings, ...

If you would give me a starting point like Data is stored in Lucene indexes, which 
are documented at XXX. In a request handler you can access the indexes via YYY., I 
would be perfectly happy figuring out the rest on my own. Documentation about 4.0 is a 
bit limited, so it's hard to find an entry point.


There is the TermsComponent, which can be utilized in a terms 
requestHandler.  The example solrconfig.xml found in all downloaded 
copies of Solr has a /terms request handler.


http://wiki.apache.org/solr/TermsComponent
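
If you want to drop below Solr and read the postings directly, here is a 
rough sketch against the Lucene 4.0 API (index path and field name are 
assumptions):

import java.io.File;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

// Sketch: dump every term of one field with its document frequency,
// read straight from an index directory.
public class TermDump {
    public static void main(String[] args) throws Exception {
        DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(new File("/path/to/solr/data/index")));
        for (AtomicReaderContext leaf : reader.leaves()) {
            Terms terms = leaf.reader().terms("text");
            if (terms == null) continue;
            TermsEnum te = terms.iterator(null);
            BytesRef term;
            while ((term = te.next()) != null) {
                System.out.println(term.utf8ToString() + "\t" + te.docFreq());
            }
        }
        reader.close();
    }
}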

As you've already been told, there is a tool called Luke, but a version 
that works with Solr 4.0.0 is hard to find.  The official download 
location only has a 4.0.0-ALPHA version, and there have been reported 
problems using it with indexes from the final Solr 4.0.0.


Thanks,
Shawn



Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread Erik Hatcher
Look at the extractOnly parameter. 

But doing this in your client is the recommended way, to keep 
Solr from getting beat up too badly. 

Erik

On Jan 11, 2013, at 15:55, uwe72 uwe.clem...@exxcellent.de wrote:

 i have a bit strange usecase.
 
 when i index a pdf to solr i use ContentStreamUpdateRequest.
 The lucene document then contains in the text field all containing items
 (the parsed items of the physical pdf).
 
 i also need to add these parsed items to another lucene document.
 
 is there a way, to receive/parse these items just in memory, without
 comitting them to lucene?
 
 
 


Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread uwe72
Erik, what do you mean by this parameter? I can't find it.





Re: SolrJ |ContentStreamUpdateRequest | Accessing parsed items without committing to solr

2013-01-11 Thread Erik Hatcher
It's an ExtractingRequestHandler parameter (see the wiki).  Not quite sure the 
Java incantation to set that but definitely possible. 
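
A hedged SolrJ sketch (method signatures vary between SolrJ versions; the
addFile signature below is assumed for 4.x):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;

// Sketch: ask /update/extract for extraction only (extractOnly=true),
// so nothing is indexed and the extracted content comes back in the response.
public class ExtractOnlyExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("doc.pdf"), "application/pdf"); // signature assumed for SolrJ 4.x
        req.setParam("extractOnly", "true");
        NamedList<Object> result = server.request(req); // extracted text is in this response
        System.out.println(result);
    }
}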
 
Erik

On Jan 11, 2013, at 17:14, uwe72 uwe.clem...@exxcellent.de wrote:

 Erik, what do u mean with this parameter, i don't find it..
 
 
 


RE: how to perform a delta-import when related table is updated

2013-01-11 Thread PeterKerk
Awesome!

This one line did the trick:
<entity name="freemedia" pk="id" query="select * from freemedia WHERE
categoryid&gt;0" ...

Thanks!





Re: retrieving latest document **only**

2013-01-11 Thread J Mohamed Zahoor
Cool… it worked… But the count of all the groups and the count inside stats 
component does not match…
Is that a bug?
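
(For reference, the field-collapsing request that worked here presumably 
looked something like this; the group field is an assumption:

/solr/select?q=*:*&group=true&group.field=itemid&group.sort=timestamp+desc&group.limit=1

As for the mismatch, a hedged guess: the stats component computes over all 
documents matching the query, not just the single document kept per group, 
which would explain the differing counts.)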

./zahoor


On 11-Jan-2013, at 6:48 PM, Upayavira u...@odoko.co.uk wrote:

 could you use field collapsing? Boost by date and only show one value
 per group, and you'll have the most recent document only.
 
 Upayavira
 
 On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote:
 one crude way is first query and pick the latest date from the result
 then issue a query with q=timestamp[latestDate TO latestDate]
 
 But i dont want to execute two queries...
 
 ./zahoor
 
 On 11-Jan-2013, at 6:37 PM, jmozah jmo...@gmail.com wrote:
 
 
 
 
 What do you want?
 'the most recent ones' or '**only** the latest' ?
 
 Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs.
 
 Uwe
 
 
 
 I need **only** the latest documents...
 in the above query , refdate can vary based on the query.
 
 ./zahoor
 
 
 
 



Re: retrieving latest document **only**

2013-01-11 Thread Upayavira
Not sure exactly what you mean, can you give an example?

Upayavira

On Sat, Jan 12, 2013, at 06:32 AM, J Mohamed Zahoor wrote:
 Cool… it worked… But the count of all the groups and the count inside
 stats component does not match…
 Is that a bug?
 
 ./zahoor
 
 
 On 11-Jan-2013, at 6:48 PM, Upayavira u...@odoko.co.uk wrote:
 
  could you use field collapsing? Boost by date and only show one value
  per group, and you'll have the most recent document only.
  
  Upayavira
  
  On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote:
  one crude way is first query and pick the latest date from the result
  then issue a query with q=timestamp[latestDate TO latestDate]
  
  But i dont want to execute two queries...
  
  ./zahoor
  
  On 11-Jan-2013, at 6:37 PM, jmozah jmo...@gmail.com wrote:
  
  
  
  
  What do you want?
  'the most recent ones' or '**only** the latest' ?
  
  Perhaps a range query q=timestamp:[refdate TO NOW] will match your 
  needs.
  
  Uwe
  
  
  
  I need **only** the latest documents...
  in the above query , refdate can vary based on the query.
  
  ./zahoor