Re: configuring schema to match database

2013-01-14 Thread Jens Grivolla

On 01/11/2013 06:14 PM, Gora Mohanty wrote:

 On 11 January 2013 22:30, Jens Grivolla j+...@grivolla.net wrote:
 [...]
  Actually, that is what you would get when doing a join in an RDBMS, the
  cross-product of your tables. This is NOT AT ALL what you typically do in Solr.

  Best start the other way around, think of Solr as a retrieval system, not a
  storage system. What are your queries? What do you want to find, and what
  criteria do you use to search for it?
 [...]

 Um, he did describe his desired queries, and there was a reason
 that I proposed the above schema design.


He said he wants queries such as "users who have taken courseA and are
fluent in english", which is exactly one case I was describing.



   UserA has taken courseA, courseB and courseC and has writingskill
   good verbalskill good for english and writingskill excellent
   verbalskill excellent for spanish UserB has taken courseA, courseF,
   courseG and courseH and has writingskill fluent verbalskill fluent
   for english and writingskill good verbalskill good for italian

 Unless the index is becoming huge, I feel that it is better to
 flatten everything out rather than combine fields, and
 post-process the results.


Then please show me the query to find users that are fluent in spanish 
and english. Bonus points if you manage to not retrieve the same user 
several times. (Hint, your schema stores only one language skill per row).
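
To make the hint concrete, here is a minimal sketch in plain Python (not Solr's API) of what happens with one row per user/course/language combination. The user "userC" and the row values are hypothetical; the field names follow the schema discussed in this thread, and the matching function only stands in for a per-document query.

```python
# Hypothetical flattened index: one document per (user, course, language) row.
flat_docs = [
    {"userid": "userC", "coursename": "courseA", "language": "english", "verbalskill": "fluent"},
    {"userid": "userC", "coursename": "courseA", "language": "spanish", "verbalskill": "fluent"},
    {"userid": "userC", "coursename": "courseB", "language": "english", "verbalskill": "fluent"},
    {"userid": "userC", "coursename": "courseB", "language": "spanish", "verbalskill": "fluent"},
]


def matches(doc, **criteria):
    """Return True if every field=value criterion holds for this single document."""
    return all(doc.get(field) == value for field, value in criteria.items())


# "fluent in english AND fluent in spanish" has to hold within ONE document,
# but each row stores only one language, so nothing can match.
both = [
    d["userid"]
    for d in flat_docs
    if matches(d, language="english", verbalskill="fluent")
    and matches(d, language="spanish", verbalskill="fluent")
]

# A single-language query does match, but returns the same user once per course row.
english_hits = [
    d["userid"] for d in flat_docs if matches(d, language="english", verbalskill="fluent")
]

print(both)          # []
print(english_hits)  # ['userC', 'userC']
```

The empty result and the repeated user are the two problems named above; both would have to be worked around by post-processing outside the index.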


Regards,
Jens



Re: configuring schema to match database

2013-01-14 Thread Gora Mohanty
On 14 January 2013 16:59, Jens Grivolla j+...@grivolla.net wrote:
[...]
 Then please show me the query to find users that are fluent in spanish and
 english. Bonus points if you manage to not retrieve the same user several
 times. (Hint, your schema stores only one language skill per row).

Doh! You are right, of course. Brainfart from my side.

Regards,
Gora


Re: configuring schema to match database

2013-01-14 Thread Jens Grivolla

On 01/14/2013 12:50 PM, Gora Mohanty wrote:

 On 14 January 2013 16:59, Jens Grivolla j+...@grivolla.net wrote:
 [...]
  Then please show me the query to find users that are fluent in spanish and
  english. Bonus points if you manage to not retrieve the same user several
  times. (Hint, your schema stores only one language skill per row).

 Doh! You are right, of course. Brainfart from my side.


Ok, I was starting to wonder if I was the one missing something.
Re-reading what I wrote, I see I may have sounded a bit rude. That was
not my intention, sorry.


Best,
Jens




Re: configuring schema to match database

2013-01-14 Thread Gora Mohanty
On 14 January 2013 17:28, Jens Grivolla j+...@grivolla.net wrote:
 On 01/14/2013 12:50 PM, Gora Mohanty wrote:
[...]
 Doh! You are right, of course. Brainfart from my side.


 Ok, I was starting to wonder if I was the one missing something. Re-reading
 what I wrote I see I may have sounded a bit rude, that was not my intention,
 sorry.

Did not take it as rude, and in any case am willing to
tolerate a lot of impoliteness when someone shows me
that I was wrong.

Must have been half-asleep when I wrote my original
reply, and was then trying to defend it. At least that's
my story, and I am sticking to it :-)

Regards,
Gora


configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hi!
I'm quite new to solr and trying to understand how to create a schema from 
our postgres database and then search for the content in solr instead of 
querying the db.

My question should be really easy, it has most likely been asked many times but 
still I'm not able to google any answer to it.

To make it easy, I have 3 columns: users, courses and languages

Users has columns userid, firstname, lastname
Courses has columns coursename, startdate, enddate
Languages has columns language, writingskill, verbalskill

UserA has taken courseA, courseB and courseC and has writingskill good 
verbalskill good for english and writingskill excellent verbalskill excellent 
for spanish
UserB has taken courseA, courseF, courseG and courseH and has writingskill 
fluent verbalskill fluent for english and writingskill good verbalskill good 
for italian

I would like to put this data into solr so I can search for all users who have 
taken courseA and are fluent in english.
Can I do that?

The problem is I'm not sure how to flatten this database into a schema.
It's easy to understand the users table, for example
<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />

But then I'm not so sure what the schema should look like for courses and 
languages
<field name="userid" type="string" indexed="true" />
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="string" indexed="true" />
<field name="enddate" type="string" indexed="true" />


Thanks for any help
/Niklas


Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
Hi Niklas,

Maybe this link helps:

http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

D.



On Fri, Jan 11, 2013 at 2:19 PM, Niklas Langvig 
niklas.lang...@globesoft.com wrote:

 [...]



Re: configuring schema to match database

2013-01-11 Thread Niklas Langvig
After thinking some more,
perhaps I could have coursename and such as multivalued fields?

Or should I have separate indexes for users, courses and languages?

I get the feeling both would work, but I'm not sure which way is the best to go.

When a user is updating/removing/adding a course, it would be nice not to have 
to query the database for the user's courses and languages and update 
everything, but to just update a course document.
But perhaps I'm thinking too much in database terms?

But still I'm unsure what the schema should look like

Thanks
/Niklas

-Original message-
From: Niklas Langvig [mailto:niklas.lang...@globesoft.com]
Sent: 11 January 2013 14:19
To: solr-user@lucene.apache.org
Subject: configuring schema to match database

[...]


Re: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hmm, I noticed I wrote "I have 3 columns: users, courses and languages".
I of course meant I have 3 tables: users, courses and languages

/Niklas

-Original message-
From: Niklas Langvig [mailto:niklas.lang...@globesoft.com]
Sent: 11 January 2013 14:19
To: solr-user@lucene.apache.org
Subject: configuring schema to match database

[...]


Re: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Hi Dariusz,
To me this example has one table, user, whereas I have many tables that connect 
to one user, and that is what I'm unsure how to do.

/Niklas


-Original message-
From: Dariusz Borowski [mailto:darius...@gmail.com]
Sent: 11 January 2013 14:56
To: solr-user@lucene.apache.org
Subject: Re: configuring schema to match database

[...]



Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
Hi,

No, it actually has two tables: User and Item. The example shown on the
blog is for one table, because you repeat the same thing for the other
table. Only your data-import.xml file changes. For the rest, just copy and
paste it in the conf directory. If you are running Solr on Linux, then
you can work with symlinks.

D.



On Fri, Jan 11, 2013 at 3:12 PM, Niklas Langvig 
niklas.lang...@globesoft.com wrote:

 [...]
 



Re: configuring schema to match database

2013-01-11 Thread Niklas Langvig
Ahh sorry,
Now I understand,
Ok seems like a good solution, I just need to understand how to query 
multiple cores now :)

-Original message-
From: Dariusz Borowski [mailto:darius...@gmail.com]
Sent: 11 January 2013 15:15
To: solr-user@lucene.apache.org
Subject: Re: configuring schema to match database

[...]
 



Re: configuring schema to match database

2013-01-11 Thread Dariusz Borowski
I don't know how to query multiple cores, or whether it's possible to query
them at once, but otherwise I would write a SQL JOIN query if you need values
from multiple tables.

D.



On Fri, Jan 11, 2013 at 3:27 PM, Niklas Langvig 
niklas.lang...@globesoft.com wrote:

 [...]
 



Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 19:57, Niklas Langvig niklas.lang...@globesoft.com wrote:
 Ahh sorry,
 Now I understand,
 Ok seems like a good solution, I just need to understand how to query 
 multiple cores now :)

There is no need to use multiple cores in your setup. Going
back to your original problem statement, it can easily be
handled with a single core, and it actually makes more sense
to do it that way. You will need to give us more details.

  My question should be really easy, it has most likely been asked
  many times but still I'm not able to google any answer to it.
 
  To make it easy, I have 3 columns: users, courses and languages

Presumably, you mean three tables, as you describe each as
having columns. How are the tables connected? Is there a
foreign key relationship between them? Is the relationship
one-to-one, one-to-many, or what?

  Users has columns userid, firstname, lastname
  Courses has columns coursename, startdate, enddate
  Languages has columns language, writingskill, verbalskill
[...]
  I would like to put this data into solr so I can search for all
  users who have taken courseA and are fluent in english.
  Can I do that?

1. Your schema for the single core is quite straightforward,
and along the lines of what you had described (one field for
each database column in each table), e.g.,
<field name="userid" type="string" indexed="true" />
<field name="firstname" type="string" indexed="true" />
<field name="lastname" type="string" indexed="true" />
<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="date" indexed="true" />
<field name="enddate" type="date" indexed="true" />
<field name="language" type="string" indexed="true" />
<field name="writingskill" type="string" indexed="true" />
<field name="verbalskill" type="string" indexed="true" />
Pay attention to the type. Dates should typically be solr.DateField.
The others can be strings, but if they are integers in the database,
you might benefit from making these integers in Solr also.

2. One has to stop thinking of Solr as an RDBMS. Instead, one
flattens out data from a typical RDBMS structure. It is difficult
to give you complete instructions unless you describe the database
relationships, but, e.g., if one has userA with course1, course2,
and course3, and userB with course2, course4, the Solr documents
would be:
userA course1 details for course1...
userA course2 details for course2...
userA course3 details for course3...
userB course2 details for course2...
userB course4 details for course4...
This scheme could also be extended to languages, depending
on how the tables are related.

3. While indexing into Solr, one has to select from the database,
and flatten out the data as above. The two main ways of
doing this are using a library like SolrJ for Java (other languages
have other libraries, e.g., django-haystack is easy to get started
with if one is using Python/Django), or the Solr DataImportHandler
(please see http://wiki.apache.org/solr/DataImportHandler ) with
nested entities.

4. With such a structure, querying Solr should be simple.

Regards,
Gora


Re: configuring schema to match database

2013-01-11 Thread Niklas Langvig
It sounds good not to use more than one core; I certainly do not want to 
overcomplicate this.

Yes, I meant tables.
It's pretty simple.

Both the courses and languages tables have their own primary keys, courseseqno 
and languagesseqno.
Both also have a foreign key, userid, that references the userid column of the 
users table.
The relationships from users to courses and to languages are one-to-many.

But I guess I'm thinking about this the wrong way, because my idea would be to 
have a block of fields connected by one id:

<field name="coursename" type="string" indexed="true" />
<field name="startdate" type="date" indexed="true" />
<field name="enddate" type="date" indexed="true" />

These three are connected by a 
<field name="courseseqno" type="int" indexed="true" />
but also have a 
<field name="userid" type="int" indexed="true" />
to connect to a specific user?

Thanks
/Niklas



-Original message-
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: 11 January 2013 15:55
To: solr-user@lucene.apache.org
Subject: Re: configuring schema to match database

[...]


Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 21:13, Niklas Langvig niklas.lang...@globesoft.com wrote:
 It sounds good not to use more than one core, for sure I do not want to over 
 complicate this.
[...]

Yes, not only are multiple cores unnecessarily complicated here,
your searches will also be less complex, and faster.

 Both the courses and languages tables have their own primary keys, courseseqno 
 and languagesseqno.

There is no need to index these.

 Both also have a foreign key, userid, that references the userid column of the 
 users table.
 The relationships from users to courses and to languages are one-to-many.

 But I guess I'm thinking about this the wrong way, because my idea would be to 
 have a block of fields connected by one id:

 <field name="coursename" type="string" indexed="true" />
 <field name="startdate" type="date" indexed="true" />
 <field name="enddate" type="date" indexed="true" />

 These three are connected by a
 <field name="courseseqno" type="int" indexed="true" />
 but also have a
 <field name="userid" type="int" indexed="true" />
 to connect to a specific user?
[...]

You are still thinking of Solr as an RDBMS, where you should not
be. In your case, it is easiest to flatten out the data. This increases
the size of the index, but that should not really be of concern. As
your courses and languages tables are connected only to user, the
schema that I described earlier should suffice. To extend my
earlier example, given:
* userA with courses c1, c2, c3, and languages l1, l2
* userB with c2, c3, and l2
you should flatten it such that you get the following Solr documents
userA c1 name c1 startdate...l1 l1 writing skill...
userA c1 name c1 startdate...l2 l2 writing skill...
userA c2 name c2 startdate...l1 l1 writing skill...
...
userB c2 name c2 startdate...l2 l2 writing skill...
userB c3 name c3 startdate...l2 l2 writing skill...
i.e., a total of 3 courses x 2 languages = 6 documents for
userA, and 2 courses x 1 language = 2 documents for userB
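
The document counts above can be checked with a small sketch. It builds the cross-product of each user's courses and languages with itertools.product; the dictionary layout and key names are illustrative assumptions, not a Solr format.

```python
from itertools import product

# The example data from above: userA has 3 courses and 2 languages,
# userB has 2 courses and 1 language.
users = {
    "userA": {"courses": ["c1", "c2", "c3"], "languages": ["l1", "l2"]},
    "userB": {"courses": ["c2", "c3"], "languages": ["l2"]},
}


def flatten(users):
    """Build one flat document per (user, course, language) combination."""
    docs = []
    for userid, data in users.items():
        for course, language in product(data["courses"], data["languages"]):
            docs.append({"userid": userid, "course": course, "language": language})
    return docs


docs = flatten(users)
counts = {u: sum(1 for d in docs if d["userid"] == u) for u in users}
print(counts)  # {'userA': 6, 'userB': 2}
```

The number of documents per user grows as courses times languages, which is the index-size increase mentioned above.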

In order to get this form of flattened data into Solr, I would
suggest using the DataImportHandler with nested entities.
Please see the earlier link to DIH. Also, a Google search
for Solr dataimporthandler nested entities turns up many
examples, including:
http://solr.pl/en/2010/10/11/data-import-handler-%E2%80%93-how-to-import-data-from-sql-databases-part-1/
Please give it a try, and post here with your attempts if
you run into any issues.

Regards,
Gora


Re: configuring schema to match database

2013-01-11 Thread Jens Grivolla

On 01/11/2013 05:23 PM, Gora Mohanty wrote:

 You are still thinking of Solr as an RDBMS, where you should not
 be. In your case, it is easiest to flatten out the data. This increases
 the size of the index, but that should not really be of concern. As
 your courses and languages tables are connected only to user, the
 schema that I described earlier should suffice. To extend my
 earlier example, given:
 * userA with courses c1, c2, c3, and languages l1, l2
 * userB with c2, c3, and l2
 you should flatten it such that you get the following Solr documents
 userA c1 name c1 startdate...l1 l1 writing skill...
 userA c1 name c1 startdate...l2 l2 writing skill...
 userA c2 name c2 startdate...l1 l1 writing skill...
 ...
 userB c2 name c2 startdate...l2 l2 writing skill...
 userB c3 name c3 startdate...l2 l2 writing skill...
 i.e., a total of 3 courses x 2 languages = 6 documents for
 userA, and 2 courses x 1 language = 2 documents for userB


Actually, that is what you would get when doing a join in an RDBMS, the 
cross-product of your tables. This is NOT AT ALL what you typically do 
in Solr.


Best start the other way around, think of Solr as a retrieval system, 
not a storage system. What are your queries? What do you want to find, 
and what criteria do you use to search for it?


If your intention is to find users that match certain criteria, each 
entry should be a user (with ALL associated information, e.g. all 
courses, all language skills, etc.), if you want to retrieve courses, 
each entry should be a course.


Let's say you want to find users who have certain language skills, you 
would have a schema that describes a user:

- user id
- user name
- languages
- ...

In languages, you could store e.g. things like: en|reading|high 
es|writing|low, etc. It could be a multivalued field or just have 
everything separated by space and a tokenizer that splits on whitespace.


Now you can query:

- language:es* -- return all users with some spanish skills
- language:en|writing|high -- return all users with high english writing 
skills
- +(language:es* language:fr*) +language:en|writing|high -- return users 
with high english writing skills and some knowledge of french or spanish


If you want to avoid wildcard queries (more costly) you can just add 
plain en and es, etc. to your field so language:es will match 
anybody with spanish skills.
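
A minimal sketch of this one-document-per-user layout follows, in plain Python rather than Solr itself. The users u1 to u3 and their skills are hypothetical, and the two helper functions only imitate an exact term match and a prefix (wildcard) match on a multivalued field.

```python
def language_tokens(skills):
    """Turn (code, skill, level) tuples into tokens such as 'en|writing|high'.

    The plain language code is added as well, so that a query for 'es'
    can match without a wildcard.
    """
    tokens = []
    for code, skill, level in skills:
        if code not in tokens:
            tokens.append(code)
        tokens.append(f"{code}|{skill}|{level}")
    return tokens


# One document per user, with ALL language skills in one multivalued field.
users = [
    {"userid": "u1", "languages": language_tokens(
        [("en", "writing", "high"), ("es", "writing", "low")])},
    {"userid": "u2", "languages": language_tokens(
        [("en", "writing", "low"), ("fr", "reading", "high")])},
    {"userid": "u3", "languages": language_tokens(
        [("en", "writing", "high")])},
]


def has_token(doc, token):
    """Imitates an exact term query on the languages field."""
    return token in doc["languages"]


def has_prefix(doc, prefix):
    """Imitates a wildcard query such as es* on the languages field."""
    return any(t.startswith(prefix) for t in doc["languages"])


# Some spanish skills (wildcard form).
spanish = [d["userid"] for d in users if has_prefix(d, "es")]

# High english writing skills AND some french or spanish, using plain codes.
combo = [
    d["userid"]
    for d in users
    if (has_token(d, "es") or has_token(d, "fr")) and has_token(d, "en|writing|high")
]

print(spanish)  # ['u1']
print(combo)    # ['u1']
```

Because each user is a single document, a user can appear at most once in a result, and conditions on several languages can be combined in one query.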


Best,
Jens



Re: configuring schema to match database

2013-01-11 Thread Gora Mohanty
On 11 January 2013 22:30, Jens Grivolla j+...@grivolla.net wrote:
[...]
 Actually, that is what you would get when doing a join in an RDBMS, the 
 cross-product of your tables. This is NOT AT ALL what you typically do in 
 Solr.

 Best start the other way around, think of Solr as a retrieval system, not a 
 storage system. What are your queries? What do you want to find, and what 
 criteria do you use to search for it?
[...]

Um, he did describe his desired queries, and there was a reason
that I proposed the above schema design.

  UserA has taken courseA, courseB and courseC and has writingskill
  good verbalskill good for english and writingskill excellent
  verbalskill excellent for spanish UserB has taken courseA, courseF,
  courseG and courseH and has writingskill fluent verbalskill fluent
  for english and writingskill good verbalskill good for italian

Unless the index is becoming huge, I feel that it is better to
flatten everything out rather than combine fields, and
post-process the results.

Regards,
Gora