Re: Flow Chart of Solr

2013-04-07 Thread Lance Norskog
Seconded. Single-stepping really is the best way to follow the logic 
chains and see how the data mutates.



Re: Flow Chart of Solr

2013-04-05 Thread Erick Erickson
Then there's my lazy method. Fire up the IDE and find a test case that
looks close to something you want to understand further. Step through
it all in the debugger. I admit there'll be some fumbling at the start
to _find_ the test case, but they're pretty well named. In IntelliJ,
all you have to do is right-click on the test case and the context
menu says "debug blahblahblah". You can chart the class
relationships you actually wind up in as you go. This seems tedious,
but it saves me getting lost in the class hierarchy.

Also, there are some convenient tools in the IDE that will show you
class hierarchies as you need.

Or attach your debugger to a running Solr, which is actually very
easy. In IntelliJ (and Eclipse has something very similar), create a
remote project. That'll specify some parameters you start up with,
e.g.:
java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5900 -jar start.jar

Now start up the remote debugging session you just created in the IDE
and you are attached to a live Solr instance and able to step through
any code you want.
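
If the IDE refuses to attach, it can help to confirm that the JVM really was
started with the debug agent. A minimal sketch using only the JDK's management
API; the class name JdwpCheck and its helper methods are made up for
illustration and are not part of Solr:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper (not part of Solr): reports whether this JVM was started with a JDWP debug agent. */
final class JdwpCheck {

    /** True if a single JVM argument enables JDWP, in the old (-Xrunjdwp) or newer (-agentlib:jdwp) spelling. */
    static boolean isJdwpArg(String arg) {
        return arg.startsWith("-Xrunjdwp:") || arg.startsWith("-agentlib:jdwp=");
    }

    /** True if any argument in the list enables JDWP. */
    static boolean hasJdwp(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (isJdwpArg(arg)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent requested: " + hasJdwp(jvmArgs));
    }
}
```

On newer JVMs the same options are usually written as
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5900, which is
why the sketch accepts both spellings.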

Either way, you can make the IDE work for you!

FWIW,
Erick

On Wed, Apr 3, 2013 at 12:03 PM, Jack Krupansky j...@basetechnology.com wrote:
 We're using the 4.x branch code as the basis for our writing. So,
 effectively it will be for at least 4.3 when the book comes out in the
 summer.

 Early access will be in about a month or so. O'Reilly will be showing a
 galley proof for 200 pages of the book next week at Big Data TechCon in
 Boston.


 -- Jack Krupansky


Re: Flow Chart of Solr

2013-04-05 Thread Furkan KAMACI
I have read books and wikis on Solr and Lucene, and I had to debug the code
to find out which parts come from which. I will tidy up my notes and share
the big-picture flow and the detailed one. After that I will ask for your
opinions. Thanks.



Re: Flow Chart of Solr

2013-04-03 Thread Furkan KAMACI
So, all in all, is there anybody who can write down just the main steps of
Solr (including parsing, stemming, etc.)?


2013/4/2 Furkan KAMACI furkankam...@gmail.com

 Take myself as an example. I started researching Solr just a few weeks
 ago. I have learned about Solr and its related projects. My next step is
 writing down the main steps of Solr. We have separated the learning curve
 of Solr into two main categories.
 The first is those who use it as out-of-the-box components. The second is
 the developer side.

 Actually, the developer side branches two ways.

 The first is the general steps, e.g. a document comes into Solr (e.g.
 crawled data from Nutch): which analysis processes are going to be done
 (stemming, hamming, etc.), and what happens after parsing, step by step.
 When a search query arrives, what happens step by step, at which step
 scores are calculated, and so on and so forth.
 The second is more code-specific, e.g. which handlers take in the data
 that is going to be indexed (no need to explain every handler at this
 step), which are the analyzer and tokenizer classes and what the flow
 between them is, and how response handlers work and what they are.

 Explaining the cloud side is a separate piece of work.

 Some of these explanations are already on the wiki (but some of them are
 buried very deep in it, and it is not easy to find their parent topic;
 maybe starting the wiki from a top page and branching all other topics
 from it as far as possible would be better).

 If we could show the big picture, and beside it the smaller pictures
 within it, it would be great (if you know the main parts, it is easy to
 go deep into the code; i.e. you don't need to explain every handler, and
 if you show the developer the way, he/she can debug and find what is
 needed).

 Taking myself as an example again: I have to write down the steps of Solr
 in some detail, and even though I have read many pages of the wiki and a
 book about it, I see that even writing down the big picture of the
 developer side is not easy.


 2013/4/2 Alexandre Rafalovitch arafa...@gmail.com

 Yago,

 My point - perhaps lost in too much text - was that Solr is presented -
 and can function - as a black box, which makes it different from more
 traditional open-source projects. So stage 2 happens exactly when the
 non-programmers have to cross the boundary from the black box into a
 code-first approach, and the hand-off is not particularly smooth. Or even
 when - say - a PHP or .NET programmer tries to get beyond the basic
 operations of their client library and has to understand the server-side
 aspects of Solr.

 Regards,
Alex.

 On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro yago.rive...@gmail.com
 wrote:

  Alexandre,
 
  You describe the normal path when a beginner tries to use source code
  they don't understand: black box, reading code, hacking, "OK, now I know
  10% of the project", with luck :p.
 


 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)





Re: Flow Chart of Solr

2013-04-03 Thread Jack Krupansky
Sure, yes. But... it comes down to what level of detail you want and need 
for a specific task. In other words, there are probably a dozen or more 
levels of detail. The reality is that if you are going to work at the Solr 
code level, that is very, very different than being a user of Solr, and at 
that point your first step is to become familiar with the code itself.


When you talk about parsing and stemming, you are really talking about
the user level, not the Solr code level. Maybe what you really need is a
cheat sheet that maps a user-visible feature to the main Solr code component
that implements that user feature.
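
A rough sketch of what such a cheat sheet could look like, written as a plain
Java map. The feature labels are invented for illustration, and the class
names are given as they appear in the 4.x source tree to the best of my
knowledge; treat them as starting points for browsing in the IDE, not as a
complete or authoritative mapping:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative cheat sheet: user-visible feature -> a class to start reading from. */
final class SolrCheatSheet {

    /** Returns the mapping in insertion order; the keys are made-up labels, not Solr terminology. */
    static Map<String, String> featureToClass() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("search request handling", "org.apache.solr.handler.component.SearchHandler");
        m.put("main query execution", "org.apache.solr.handler.component.QueryComponent");
        m.put("faceting", "org.apache.solr.handler.component.FacetComponent");
        m.put("default query parsing", "org.apache.solr.search.LuceneQParserPlugin");
        m.put("index updates", "org.apache.solr.update.DirectUpdateHandler2");
        m.put("schema and field types", "org.apache.solr.schema.IndexSchema");
        m.put("Porter stemming (Lucene)", "org.apache.lucene.analysis.en.PorterStemFilter");
        return m;
    }

    public static void main(String[] args) {
        // Print one "feature -> class" line per entry.
        for (Map.Entry<String, String> e : featureToClass().entrySet()) {
            System.out.printf("%-26s -> %s%n", e.getKey(), e.getValue());
        }
    }
}
```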


There are a number of different forms of parsing in Solr - parsing of 
what? Queries? Requests? Solr documents? Function queries?


Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does that. 
Lucene does all of the token filtering. Are you asking for details on how 
Lucene works? Maybe you meant to ask how term analysis works, which is 
split between Solr and Lucene. Or maybe you simply wanted to know when and 
where term analysis is done. Tell us your specific problem or specific 
question and we can probably quickly give you an answer.
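
To make "term analysis" a little more concrete, here is a toy sketch of the
general shape of an analysis chain: a tokenizer followed by token filters.
It deliberately does not use Lucene's API; the class ToyAnalysisChain, its
methods, and the crude suffix-stripping "stemmer" are all invented for
illustration and are far simpler than what Lucene actually does:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy illustration only: mimics the shape of an analysis chain, not Lucene's actual API. */
final class ToyAnalysisChain {

    /** Tokenizer step: split on runs of characters that are neither letters nor digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Filter step: lowercase every token. */
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Filter step: naive suffix stripping, a stand-in for a real stemmer. */
    static String stem(String token) {
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    /** Runs the whole chain: tokenize, then lowercase, then stem. */
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : lowercase(tokenize(text))) {
            out.add(stem(t));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Indexing documents, searching terms"));
    }
}
```

The point of the sketch is only the ordering: each step consumes the tokens
produced by the previous one, so changing the order or swapping a filter
changes the terms that end up in the index.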


In truth, NOBODY uses flow charts anymore. Sure, there are some user-level 
diagrams, but not down to the code level.


If you could focus on specific questions, we could give you specific 
answers.


Main steps? That depends on what level you are working at. Tell us what 
problem you are trying to solve and we can point you to the relevant areas.


In truth, if you become generally familiar with Solr at the user level 
(study the wikis), you will already know what the main steps are.


So, it is not main steps of Solr, but main steps of some specific 
request of Solr, and for a specified level of detail, and for a specified 
area of Solr if greater detail is needed. Be more specific, and then we can 
be more specific.


For now, the general advice for people who need or want to go far beyond the 
user level is to get familiar with the code - just LOOK at it - a lot of 
the package and class names are OBVIOUS, really, and follow the class 
hierarchy and code flow using the standard features of any modern Java IDE. 
If you are wondering where to start for some specific user-level feature, 
please ask specifically about that feature. But... make a diligent effort to 
discover and learn on your own before asking open-ended questions.


Sure, there are lots of things in Lucene and Solr that are rather complex 
and seemingly convoluted, and not obvious, but people are more than willing 
to help you out if you simply ask a specific question. I mean, not everybody 
needs to know the fine detail of query parsing, analysis, building a 
Lucene-level stemmer, etc. If we tried to put all of that in a diagram, most 
people would be more confused than enlightened.


At which step are scores calculated? That's more of a Lucene question. Or, 
are you really asking what code in Solr invokes Lucene search methods that 
calculate basic scores?
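
For orientation, the term-level factors of the classic TF-IDF scoring can be
sketched in a few lines. This is a simplified illustration based on the
formulas described in the Lucene TFIDFSimilarity documentation as I
understand them; the class ToyTfIdf is invented, the exact formulas differ
between Lucene versions and similarity implementations, and real scores also
involve norms, boosts, and query normalization, which are left out here:

```java
/** Toy illustration of classic TF-IDF term factors; not Lucene code. */
final class ToyTfIdf {

    /** Term-frequency factor: square root of how often the term occurs in the document. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Inverse-document-frequency factor: rarer terms (low docFreq) get a larger value. */
    static double idf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    /** Simplified per-term weight: tf times idf squared, ignoring norms and boosts. */
    static double termWeight(int freq, long docFreq, long numDocs) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf;
    }

    public static void main(String[] args) {
        // Same in-document frequency, but the rarer term gets the larger weight.
        System.out.println("rare term:   " + termWeight(4, 1, 1000));
        System.out.println("common term: " + termWeight(4, 500, 1000));
    }
}
```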


In short, you need to be more specific. Don't force us to guess what problem 
you are trying to solve.


-- Jack Krupansky


Re: Flow Chart of Solr

2013-04-03 Thread Jack Park
There are three books on Solr, two with that in the title, and one,
Taming Text, each of which has been very valuable in understanding
Solr.

Jack


Re: Flow Chart of Solr

2013-04-03 Thread Jack Krupansky

And another one on the way:
http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957

Hopefully that will help a lot as well. Plenty of diagrams. Lots of examples.

-- Jack Krupansky

-Original Message- 
From: Jack Park

Sent: Wednesday, April 03, 2013 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Flow Chart of Solr

There are three books on Solr, two with that in the title, and one,
Taming Text, each of which have been very valuable in understanding
Solr.

Jack

On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky j...@basetechnology.com 
wrote:

Sure, yes. But... it comes down to what level of detail you want and need
for a specific task. In other words, there are probably a dozen or more
levels of detail. The reality is that if you are going to work at the Solr
code level, that is very, very different than being a user of Solr, and 
at

that point your first step is to become familiar with the code itself.

When you talk about parsing and stemming, you are really talking about
the user-level, not the Solr code level. Maybe what you really need is a
cheat sheet that maps a user-visible feature to the main Solr code 
component

for that implements that user feature.

There are a number of different forms of parsing in Solr - parsing of
what? Queries? Requests? Solr documents? Function queries?

Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does 
that.
Lucene does all of the token filtering. Are you asking for details on 
how

Lucene works? Maybe you meant to ask how term analysis works, which is
split between Solr and Lucene. Or maybe you simply wanted to know when and
where term analysis is done. Tell us your specific problem or specific
question and we can probably quickly give you an answer.

In truth, NOBODY uses flow charts anymore. Sure, there are some 
user-level

diagrams, but not down to the code level.

If you could focus on specific questions, we could give you specific
answers.

Main steps? That depends on what level you are working at. Tell us what
problem you are trying to solve and we can point you to the relevant 
areas.


In truth, if you become generally familiar with Solr at the user level
(study the wikis), you will already know what the main steps are.

So, it is not main steps of Solr, but main steps of some specific
request of Solr, and for a specified level of detail, and for a 
specified
area of Solr if greater detail is needed. Be more specific, and then we 
can

be more specific.

For now, the general advice for people who need or want to go far beyond 
the

user level is to get familiar with the code - just LOOK at it - a lot of
the package and class names are OBVIOUS, really, and follow the class
hierarchy and code flow using the standard features of any modern Java 
IDE.

If you are wondering where to start for some specific user-level feature,
please ask specifically about that feature. But... make a diligent effort 
to

discover and learn on your own before asking open-ended questions.

Sure, there are lots of things in Lucene and Solr that are rather complex
and seemingly convoluted, and not obvious, but people are more than 
willing
to help you out if you simply ask a specific question. I mean, not 
everybody

needs to know the fine detail of query parsing, analysis, building a
Lucene-level stemmer, etc. If we tried to put all of that in a diagram, 
most

people would be more confused than enlightened.

At which step are scores calculated? That's more of a Lucene question. Or,
are you really asking what code in Solr invokes Lucene search methods that
calculate basic scores?
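
As a rough illustration of what "calculating a basic score" means at search time, here is a sketch using a textbook TF-IDF formula (term frequency times the log of inverse document frequency). This is an assumption-laden toy: ToyScorer and its methods are hypothetical names, and the formula is the classroom version, not the actual formula or classes Lucene uses.

```java
import java.util.Arrays;
import java.util.List;

/** Toy TF-IDF scorer; a textbook formula, not Lucene's Similarity implementation. */
class ToyScorer {

    /** Number of times the term occurs in one already-analyzed document. */
    static int termFrequency(String term, List<String> docTokens) {
        int count = 0;
        for (String t : docTokens) {
            if (t.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** Number of documents in the corpus that contain the term at least once. */
    static int documentFrequency(String term, List<List<String>> corpus) {
        int count = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** score = tf * ln(N / df); 0 when the term does not occur anywhere. */
    static double score(String term, List<String> docTokens, List<List<String>> corpus) {
        int df = documentFrequency(term, corpus);
        if (df == 0) {
            return 0.0;
        }
        double idf = Math.log((double) corpus.size() / df);
        return termFrequency(term, docTokens) * idf;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("solr", "search", "server"),
                Arrays.asList("lucene", "search", "library", "search"),
                Arrays.asList("nutch", "crawler"));
        // Score every document for the single query term "search".
        for (List<String> doc : corpus) {
            System.out.println(doc + " -> " + score("search", doc, corpus));
        }
    }
}
```

In this toy the score is computed per document at query time from statistics gathered at index time, which is the general shape of the answer even though the real formula and code paths differ.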

In short, you need to be more specific. Don't force us to guess what problem
you are trying to solve.

-- Jack Krupansky

-Original Message- From: Furkan KAMACI
Sent: Wednesday, April 03, 2013 6:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Flow Chart of Solr


So, all in all, is there anybody who can write down just the main steps of
Solr (including parsing, stemming, etc.)?



Re: Flow Chart of Solr

2013-04-03 Thread Jack Park
Jack,

Is that new book up to the 4.+ series?

Thanks
The other Jack

On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky j...@basetechnology.com wrote:
 And another one on the way:
 http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957

 Hopefully that will help a lot as well. Plenty of diagrams. Lots of examples.

 -- Jack Krupansky

 -Original Message- From: Jack Park
 Sent: Wednesday, April 03, 2013 11:25 AM

 To: solr-user@lucene.apache.org
 Subject: Re: Flow Chart of Solr

 There are three books on Solr, two with that in the title, and one,
 Taming Text, each of which has been very valuable in understanding
 Solr.

 Jack

 On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky j...@basetechnology.com
 wrote:

 Sure, yes. But... it comes down to what level of detail you want and need
 for a specific task. In other words, there are probably a dozen or more
 levels of detail. The reality is that if you are going to work at the Solr
 code level, that is very, very different than being a user of Solr, and at
 that point your first step is to become familiar with the code itself.

 When you talk about parsing and stemming, you are really talking about
 the user-level, not the Solr code level. Maybe what you really need is a
 cheat sheet that maps a user-visible feature to the main Solr code
 component that implements that user feature.

 There are a number of different forms of parsing in Solr - parsing of
 what? Queries? Requests? Solr documents? Function queries?

 Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does that.
 Lucene does all of the token filtering. Are you asking for details on how
 Lucene works? Maybe you meant to ask how term analysis works, which is
 split between Solr and Lucene. Or maybe you simply wanted to know when and
 where term analysis is done. Tell us your specific problem or specific
 question and we can probably quickly give you an answer.

 In truth, NOBODY uses flow charts anymore. Sure, there are some user-level
 diagrams, but not down to the code level.

 If you could focus on specific questions, we could give you specific
 answers.

 Main steps? That depends on what level you are working at. Tell us what
 problem you are trying to solve and we can point you to the relevant areas.

 In truth, if you become generally familiar with Solr at the user level
 (study the wikis), you will already know what the main steps are.

 So, it is not main steps of Solr, but main steps of some specific
 request of Solr, and for a specified level of detail, and for a specified
 area of Solr if greater detail is needed. Be more specific, and then we can
 be more specific.

 For now, the general advice for people who need or want to go far beyond the
 user level is to get familiar with the code - just LOOK at it - a lot of
 the package and class names are OBVIOUS, really, and follow the class
 hierarchy and code flow using the standard features of any modern Java IDE.

 If you are wondering where to start for some specific user-level feature,
 please ask specifically about that feature. But... make a diligent effort to
 discover and learn on your own before asking open-ended questions.

 Sure, there are lots of things in Lucene and Solr that are rather complex
 and seemingly convoluted, and not obvious, but people are more than willing
 to help you out if you simply ask a specific question. I mean, not everybody
 needs to know the fine detail of query parsing, analysis, building a
 Lucene-level stemmer, etc. If we tried to put all of that in a diagram, most
 people would be more confused than enlightened.

 At which step are scores calculated? That's more of a Lucene question. Or,
 are you really asking what code in Solr invokes Lucene search methods that
 calculate basic scores?

 In short, you need to be more specific. Don't force us to guess what problem
 you are trying to solve.

 -- Jack Krupansky


Re: Flow Chart of Solr

2013-04-03 Thread Jack Krupansky
We're using the 4.x branch code as the basis for our writing. So, 
effectively it will be for at least 4.3 when the book comes out in the 
summer.


Early access will be in about a month or so. O'Reilly will be showing a
galley proof of 200 pages of the book next week at Big Data TechCon in
Boston.


-- Jack Krupansky

-Original Message- 
From: Jack Park

Sent: Wednesday, April 03, 2013 12:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Flow Chart of Solr

Jack,

Is that new book up to the 4.+ series?

Thanks
The other Jack


Re: Flow Chart of Solr

2013-04-02 Thread Koji Sekiguchi

(13/04/02 21:45), Furkan KAMACI wrote:

Is there any documentation like a flow chart of Solr? I.e., documents
come into Solr (maybe indicating which classes receive the documents), go
through the parsing process (e.g. stemming), and then the inverted
indexes are built, and so on and so forth?



There is an interesting ticket:

Architecture Diagrams needed for Lucene, Solr and Nutch
https://issues.apache.org/jira/browse/LUCENE-2412

koji
--
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html


Re: Flow Chart of Solr

2013-04-02 Thread Andre Bois-Crettez


On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:

(13/04/02 21:45), Furkan KAMACI wrote:

Is there any documentation like a flow chart of Solr? I.e., documents
come into Solr (maybe indicating which classes receive the documents), go
through the parsing process (e.g. stemming), and then the inverted
indexes are built, and so on and so forth?


There is an interesting ticket:

Architecture Diagrams needed for Lucene, Solr and Nutch
https://issues.apache.org/jira/browse/LUCENE-2412

koji


I like this one, it is a bit more detailed :

http://www.cominvent.com/2011/04/04/solr-architecture-diagram/

--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/


Kelkoo SAS
Simplified joint-stock company (Société par Actions Simplifiée)
Share capital: € 4.168.964,30
Registered office: 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended solely for
their addressees. If you are not the intended recipient of this message,
please destroy it and notify the sender.


Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
Actually, maybe one of the most important core things is the Analysis part of
the last diagram, but there is nothing about it (i.e. stemming, lemmatizing,
etc.) in any of them.



Re: Flow Chart of Solr

2013-04-02 Thread Yago Riveiro
For beginners it is complicated to understand the complexity of Solr / Lucene.
I'm trying to develop a custom search component and it's too hard to keep in
mind the flow, inheritance and interaction between classes. I think that there
is a gap between software doc and user doc, or maybe I haven't searched enough
T_T. The Javadoc is not always clear.

The fact that I'm a beginner in the Solr world doesn't help.

Either way, this thread was very helpful, I found some very good resources here
:)

Regards

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)



Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
You are right to distinguish developer doc from user doc. Users divide along
those lines. Some of them use Solr for indexing and monitoring via the admin
interface, and that is quite enough for them; however, some people want to
modify it, so it would be nice if there were some documentation for the
developer side too.



Re: Flow Chart of Solr

2013-04-02 Thread Alexandre Rafalovitch


Re: Flow Chart of Solr

2013-04-02 Thread Yago Riveiro
Alexandre,   

You describe the normal path when a beginner tries to use source code that he
doesn't understand: black-box, reading code, hacking, "ok, now I know 10% of
the project", with luck :p.

First of all, the Solr community is fantastic and always helps when I need it.
IMHO the developer documentation is dispersed across a lot of sources: blogs,
the wiki, the LucidWorks wiki (I know that this wiki was donated to Apache and
that work is in progress to present it to the world as part of the project).

The curve for doing fun things with Solr at the source level is steep. I see a
lot of webinars teaching how to deploy and use Solr, but not how to develop a
ResponseWriter or a SearchComponent.

Unfortunately I don't have the knowledge to contribute right now; in the
future … we'll see.

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 5:24 PM, Alexandre Rafalovitch wrote:

 ommunity. I am trying to do my share throu  



Re: Flow Chart of Solr

2013-04-02 Thread Alexandre Rafalovitch
Yago,

My point - perhaps lost in too much text - was that Solr is presented - and
can function - as a black-box. Which makes it different from more
traditional open-source projects. So, the stage-2 happens exactly when the
non-programmers have to cross the boundary from the black-box into a
code-first approach and the hand-off is not particularly smooth. Or even
when - say - a PHP or .NET programmer tries to get beyond the basic
operations of their client library and has to understand the server-side
aspects of Solr.

Regards,
   Alex.

On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro yago.rive...@gmail.com wrote:

 Alexandre,

 You describe the normal path when a beginner tries to use source code
 that he doesn't understand: black-box, reading code, hacking, "ok, now I know
 10% of the project", with luck :p.



Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
Take me as an example. I have been researching Solr for only a few weeks.
I have learned Solr and its related projects. My next step is writing down
the main steps of Solr. We have separated the learning curve of Solr into
two main categories.
The first is people who use it as out-of-the-box components. The second is
the developer side.

Actually, the developer side branches in two ways.

The first is the general steps, i.e. a document comes into Solr (e.g.
crawled data from Nutch): which analysis processes are going to be done
(stemming, hamming, etc.), and what will be done after parsing, step by step.
When a search query happens, what happens step by step, at which step scores
are calculated, and so on and so forth.
The second is more code specific, i.e. which handlers take in the
data that is going to be indexed (no need to explain every handler at
this step), which are the analyzer and tokenizer classes and what is the
flow between them, and how the response handlers work and what they are.
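
As a very rough picture of the "general steps" described above (a document comes in, its text is analyzed, terms are recorded, and a query is later looked up against them), here is a small sketch of a toy inverted index. All names (ToyInvertedIndex, add, search, analyze) are invented for illustration, and the in-memory map is an assumption made for brevity; it is not how Lucene stores its index and it is not Solr's handler code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index; illustrative only, not how Lucene stores postings. */
class ToyInvertedIndex {

    // term -> ids of the documents that contain it, in insertion order
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Index time: analyze the text, then record the document id under each term. */
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
    }

    /** Query time: run the same analysis on the query term, then look it up. */
    List<Integer> search(String queryTerm) {
        List<String> terms = analyze(queryTerm);
        if (terms.isEmpty()) {
            return new ArrayList<>();
        }
        List<Integer> hits = postings.getOrDefault(terms.get(0), Collections.emptyList());
        return new ArrayList<>(hits);
    }

    /** Minimal analysis: split on non-alphanumerics and lowercase. */
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                out.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Solr is a search server");
        index.add(2, "Lucene is a search library");
        System.out.println(index.search("SEARCH")); // ids of documents containing "search"
        System.out.println(index.search("Lucene"));
    }
}
```

The sketch leaves out everything that makes the real thing interesting (request handlers, update processors, segments, scoring), but it shows why analysis has to happen on both the indexing side and the query side.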

Explaining the cloud side is separate work.

Some of these explanations are currently present on the wiki (but some of
them are at very deep places in the wiki and it is not easy to find their
parent topic; maybe starting the wiki from a top page and branching all other
topics from it as far as possible would be better).

If we could show the big picture, and beside it the smaller pictures
within it, it would be great (if you know the main parts it will be easy to
go deep into the code, i.e. you don't need to explain every handler; if you
show the way to the developer, he/she can debug and find what is needed).

Taking myself as an example again: I have to write down the steps of Solr in
a bit of detail, and even though I have read many pages of the wiki and a
book about it, I see that it is not easy to write down even the big picture
of the developer side.

