Forking Solr

2015-10-16 Thread Ryan Josal
Hi guys, I'd like to get your tips on how to run a Solr fork at my
company.  I know Yonik has a "heliosearch" fork, and I'm sure many others
have a fork.  There have been times where I want to add features to an
existing core plugin, and subclassing isn't possible so I end up copying
the source code into my repo, then using some crazy reflection to get it to
work.  Sometimes there's a little bug in something and I have to do the
same thing.  Sometimes there's something I want to do deeper in core Solr
code that isn't pluggable and I end up doing an interesting workaround.
Sometimes I want to apply a patch from JIRA.  I also think forking Solr
will make it easier for me to contribute patches back.  So here are my
questions:

*) how do I properly fork it outside of github to my own company's git
system?
*) how do I pull new changes?  I think I would expect to sync new changes
when there is a new public release.  What branches do I need to work
with/on?
*) how do I test my changes?  What part of the test suites do I run for
what changes?
*) how do I build a new version when I'm ready to go to prod?  This is
slightly more unclear to me now that it isn't just a war.

Thanks,
Ryan


Re: NullPointerException

2015-10-16 Thread Mark Fenbers
Yes, I'm aware that building an index is expensive and I will remove 
"buildOnStartup" once I have it working.  The field I added was an 
attempt to get it working...


I have attached my latest version of solrconfig.xml and schema.xml (both 
are in the same attachment); I have removed all the block comments to make 
them easier to scrutinize.  The source of the correctly spelled 
words is a RedHat baseline file called /usr/share/dict/linux.words.  
(Does this also mean it is the source of the suggestions?)


thanks for the help!

Mark

On 10/13/2015 7:07 AM, Alessandro Benedetti wrote:

Generally it is highly discouraged to build the spellcheck on startup.
With a big suggestion file, you are going to spend a long time on startup
building the suggester data structures (basically an FST, in memory and
then on disk). You should build your spellchecker only when the file that
is the source of the suggestions changes.
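
For example, with buildOnStartup turned off, you can trigger a rebuild
explicitly through the request handler (a sketch, assuming the handler is
mounted at /spell as in your config):

http://localhost:8983/solr/<core>/spell?q=test&spellcheck=true&spellcheck.build=true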

Checking the snippet, the first thing I see is that you add a field to the
FileBasedSpellchecker config, which is useless. Anyway, it should not be
the problem.
Can you give us the source of the suggestions?
A snippet of the file?

Cheers

On 13 October 2015 at 10:02, Duck Geraint (ext) GBJH <
geraint.d...@syngenta.com> wrote:





[Attachment: solrconfig.xml and schema.xml, concatenated, with block
comments removed. The list archive has stripped the XML markup, leaving
only element text, so most of the file is not recoverable. What can be read
back: luceneMatchVersion 5.3.0; default dataDir and native lock type; an
update log with 65536 version buckets; autoCommit maxTime 15000 with
openSearcher=false and autoSoftCommit disabled; the standard /select,
/query, /export, /update, /update/extract and /terms handlers; a
DataImportHandler reading
/localapps/dev/EventLog/solr/EventLog2/conf/data-config.xml; and, in
schema.xml, uniqueKey id (the field and type definitions lived in XML
attributes and are gone). The spellcheck section, with element names
restored around the posted values from the stock Solr 5.3 config, follows.]

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text_en</str>

    <lst name="spellchecker">
      <str name="name">WordBreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">logtext</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>

    <lst name="spellchecker">
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="field">logtext</str>
      <str name="name">FileDict</str>
      <str name="sourceLocation">/usr/share/dict/linux.words</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">/localapps/dev/EventLog/solr/EventLog2/data/spFile</str>
      <str name="buildOnStartup">true</str>
      <float name="accuracy">0.5</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">4</int>
      <float name="maxQueryFrequency">0.01</float>
    </lst>
  </searchComponent>

  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="spellcheck.dictionary">FileDict</str>
      <str name="spellcheck.dictionary">WordBreak</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

Re: File-based Spelling

2015-10-16 Thread Mark Fenbers

On 10/13/2015 9:30 AM, Dyer, James wrote:

Mark,

The older spellcheck implementations create an n-gram sidecar index, which is 
why you're seeing your name split into 2-grams like this.  See the IR Book by 
Manning et al, section 3.3.4 for more information.  Based on the results you're 
getting, I think it is loading your file correctly.  You should now try a query 
against this spelling index, using words *not* in the file you loaded that are 
within 1 or 2 edits from something that is in the dictionary.  If it doesn't 
yield suggestions, then post the relevant sections of the solrconfig.xml, 
schema.xml and also the query string you are trying.

James Dyer
Ingram Content Group

James, I've already done this.   My query string was "fenbers". This is 
my last name which does *not* occur in the linux.words file.  It is only 
1 edit distance from "fenders" which *is* in the linux.words file.  Yet, 
it claimed it found my misspelled word to be "fenber" without the "s" 
and it gave me these 8 suggestions:

f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

So I'm attaching the entire solrconfig.xml and schema.xml that are in 
effect.  These are in a single file with all the block comments removed.


I'm also puzzled that you say "older implementations create a sidecar 
index"... because I am using v5.3.0, which was the latest version as of 
my download a month or two ago.  So, with my implementation being 
recent, why is an n-gram sidecar index still (seemingly) being produced?


thanks for the help!
Mark





[Attachment: solrconfig.xml and schema.xml — the same tag-stripped copy
already included with "Re: NullPointerException" above.]

Re: Forking Solr

2015-10-16 Thread Doug Turnbull
Ryan,

From a "solr-user" perspective :) I would advise against forking Solr. Some
of our consulting business is "people who forked Solr, need to upgrade, and
now have gotten themselves into hot water."

I would try, in the following order
1. Creating a plugin (sounds like you can't do this)
2. Submitting a patch to Solr that makes it easier to create the plugin you
need
3. Copy-pasting code to create a plugin. I once had to do this for a
highlighter. It's ugly, but it's better than forking.
4
999. Hiring Yonik :)
1000. Forking Solr

999 a prereq for 1000 :)

Even the very heavily customized versions of Solr sold by major vendors
that employ committers are entirely plugin-driven.
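
For reference, a custom plugin is just a class on Solr's classpath wired in
through solrconfig.xml. A minimal sketch, with a hypothetical class name:

  <searchComponent name="mycomponent" class="com.example.solr.MyComponent"/>
  <requestHandler name="/myhandler" class="solr.SearchHandler">
    <arr name="last-components">
      <str>mycomponent</str>
    </arr>
  </requestHandler>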

Cheers
-Doug


On Fri, Oct 16, 2015 at 3:30 PM, Alexandre Rafalovitch 
wrote:

> I suspect these questions should go to the Lucene Dev list instead. This
> one is more for those who build on top of standard Solr.
>
> Regards,
>Alex.



-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
, LLC | 240.476.9983
Author: Relevant Search 


Re: Forking Solr

2015-10-16 Thread Ryan Josal
Thanks for the feedback, forking lucene/solr is my last resort indeed.

1) It's not about creating fresh new plugins.  It's about modifying
existing ones or core Solr code.
2) I want to submit the patch to modify core Solr or Lucene code, but I
also want to run it in prod before it's accepted and released publicly.
Also I think this helps solidify the patch over time.
3) I have to do this all the time, and I agree it's better than forking,
but doing this repeatedly over time has diminishing returns because it
increases the cost of upgrading Solr. It also requires some ugly reflection
in most cases, and in other cases copying a pile of other classes verbatim.

I will send my questions to lucene-dev, thanks!
Ryan

On Friday, October 16, 2015, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Ryan,
>
> From a "solr-user" perspective :) I would advise against forking Solr. Some
> of our consulting business is "people who forked Solr, need to upgrade, and
> now have gotten themselves into hot water."
>
> I would try, in the following order
> 1. Creating a plugin (sounds like you can't do this)
> 2. Submitting a patch to Solr that makes it easier to create the plugin you
> need
> 3. Copy-pasting code to create a plugin. I once had to do this for a
> highlighter. It's ugly, but its better than forking.
> 4
> 999. Hiring Yonik :)
> 1000. Forking Solr
>
> 999 a prereq for 1000 :)
>
> Even very heavily customized versions of Solr sold by major vendors that
> staff committers are entirely plugin driven.
>
> Cheers
> -Doug
>
>
> On Fri, Oct 16, 2015 at 3:30 PM, Alexandre Rafalovitch  >
> wrote:
>
> > I suspect these questions should go the Lucene Dev list instead. This
> > one is more for those who build on top of standard Solr.
> >
> > Regards,
> >Alex.
> >
> > 
> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > http://www.solr-start.com/
> >
> >
> > On 16 October 2015 at 12:07, Ryan Josal >
> wrote:
> > > Hi guys, I'd like to get your tips on how to run a Solr fork at my
> > > company.  I know Yonik has a "heliosearch" fork, and I'm sure many
> others
> > > have a fork.  There have been times where I want to add features to an
> > > existing core plugin, and subclassing isn't possible so I end up
> copying
> > > the source code into my repo, then using some crazy reflection to get
> it
> > to
> > > work.  Sometimes there's a little bug in something and I have to do the
> > > same thing.  Sometimes there's something I want to do deeper in core
> Solr
> > > code that isn't pluggable and I end up doing an interesting workaround.
> > > Sometimes I want to apply a patch from JIRA.  I also think forking solr
> > > will make it easier for me to contribute patches back.  So here are my
> > > questions:
> > >
> > > *) how do I properly fork it outside of github to my own company's git
> > > system?
> > > *) how do I pull new changes?  I think I would expect to sync new
> changes
> > > when there is a new public release.  What branches do I need to work
> > > with/on?
> > > *) how do I test my changes?  What part of the test suites do I run for
> > > what changes?
> > > *) how do I build a new version when I'm ready to go to prod?  This is
> > > slightly more unclear to me now that it isn't just a war.
> > >
> > > Thanks,
> > > Ryan
> >
>
>
>
> --
> *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
> , LLC | 240.476.9983
> Author: Relevant Search 
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>


Re: Forking Solr

2015-10-16 Thread Alexandre Rafalovitch
I suspect these questions should go to the Lucene Dev list instead. This
one is more for those who build on top of standard Solr.

Regards,
   Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 16 October 2015 at 12:07, Ryan Josal  wrote:


Child document and parent document with same key

2015-10-16 Thread Jamie Johnson
I am looking at using child documents and noticed that if I specify a child
and parent with the same key, Solr indexes this fine and I can retrieve both
documents separately.  Is this expected to work?

-Jamie


Efficiency of integer storage/use

2015-10-16 Thread Robert Krüger
Hi,

I have a data model where I would store and index a lot of integer values
with a very restricted range (e.g. 0-255), so theoretically the 32 bits of
Solr's integer fields are complete overkill. I want to be able to do things
like vector distance calculations on those fields. Should I worry about the
"wasted" bits or will Solr compress/organize the index in a way that
compensates for this if there are only 256 (or even fewer) distinct values?

Any recommendations on how my fields should be defined to make things like
numeric functions work as fast as technically possible?

Thanks in advance,

Robert


Re: simple test on solr 5.2.1 wrong leader elected on startup

2015-10-16 Thread Alessandro Benedetti
On 15 October 2015 at 23:54, Matteo Grolla  wrote:

> Don't think so,
> the default behaviour at 4), to my knowledge, is to wait 3 minutes
> (leaderVoteWait) for all replicas to come up, to avoid electing a leader
> with stale data.
>
> So the observed behaviour is unexpected to me.
>

If I read your sequence of events properly, I see that at point 4 there is
no replica recovering.
You have only 8984 in the cluster, and in the history of the cluster 8983
has been a leader (during the period when 8984 was dead).
Let's go directly to the code and clear up the doubts!

The 3-minute wait is here:

> if (!weAreReplacement) {
>   allReplicasInLine = waitForReplicasToComeUp(leaderVoteWait);
> }


In our case weAreReplacement should be true, because 8983 has been a
leader.
So we don't wait; we try to see if we should be the leader, but we
shouldn't, for this reason:

// maybe active but if the previous leader marked us as down and
> // we haven't recovered, then can't be leader
> final Replica.State lirState =
> zkController.getLeaderInitiatedRecoveryState(collection, shardId,
> core.getCoreDescriptor().getCloudDescriptor().getCoreNodeName());
> if (lirState == Replica.State.DOWN || lirState ==
> Replica.State.RECOVERING) {
> log.warn("Although my last published state is Active, the previous leader
> marked me "+core.getName()
> + " as " + lirState.toString()
> + " and I haven't recovered yet, so I shouldn't be the leader.");
> return false;
> }
> log.info("My last published State was Active, it's okay to be the
> leader.");
> return true;
> }
> log.info("My last published State was "
> + core.getCoreDescriptor().getCloudDescriptor().getLastPublished()
> + ", I won't be the leader.");
> // TODO: and if no one is a good candidate?


The second possible wait is in registering the core:

// in this case, we want to wait for the leader as long as the leader might
> // wait for a vote, at least - but also long enough that a large cluster has
> // time to get its act together
> String leaderUrl = getLeader(cloudDesc, leaderVoteWait + 60);
>
I scrolled through the code a little, and I think that because 8984 is the
only live node, this second wait will not happen, as no leader is there and
an election should have happened before.

I have no time now to get into debugging; it would be interesting if you
can. But I would bet that with a single live node, which came alive while
no one was recovering, it will become the leader.
Actually, waiting 3 minutes for the old replica to come back (which might
never happen) is a little counterintuitive, because more new replicas could
come, and it's not so reasonable to keep the cluster without a leader for 3
minutes...
I would suggest some debugging to get a better idea of the internals, and I
would be really interested in better insight!


> I created a cluster of 2 nodes copying the server dir to node1 and node2
> and using those as solrhome for the nodes
> created the collection with
> bin/solr create -c test
> so it's using the builtin schemaless configuration
>
> there's nothing custom, should be all pretty standard
>

Adding some logging should help; maybe you are hitting some additional
waiting time added after 4.10.
We should start the research after we have more insights from the logs.

Cheers



>
> 2015-10-15 17:42 GMT+02:00 Alessandro Benedetti <
> benedetti.ale...@gmail.com>
> :
>
> > Hi Matteo,
> >
> > On 15 October 2015 at 16:16, Matteo Grolla 
> > wrote:
> >
> > > Hi,
> > >   I'm doing this test
> > > collection test is replicated on two solr nodes running on 8983, 8984
> > > using external zk
> > >
> > > 1)turn OFF solr 8984
> > > 2)add,commit a doc x con solr 8983
> > > 3)turn OFF solr 8983
> > > 4)turn ON solr 8984
> > >
> > At this point 8984 will be elected leader because it is the only element
> > in the cluster; it cannot do anything to recover, so it will not
> > replicate doc x
> >
> > > 5)shortly after (leader still not elected) turn ON solr 8983
> > >
> > I assume that even if you are not able to see it, the leader election
> > was actually already starting, not taking 8983 into consideration
> >
> > > 6)8984 is elected as leader
> > >
> > As expected
> >
> > > 7)doc x is present on 8983 but not on 8984 (check issuing a query)
> > >
> > This is expected as well.
> >
> > It is a real edge case, but I expect that at the current status the
> > behaviour you are obtaining is the expected one.
> > Probably the leader election should become smarter; for example, any
> > time a node comes back to the cluster it should be checked, and if it
> > should be the leader, a new election triggered.
> > Just thinking out loud :)
> >
> >
> >
> > >
> > > attached are the logs of both solr
> > >
> > > BTW I'm using Java 1.8.0_45 on OS X Yosemite, and Solr 5.2.1 seems much
> > > slower to start up than Solr 4.10.3; it seems to be waiting on something
> > >
> >
> > I can not see any attached file, do you have any suggester in 

Re: Filtering on a Field with Suggestion

2015-10-16 Thread Salman Ansari
Thanks for pointing that out; I am using SolrCloud 5.3. However, it looks
like they are talking about boolean operations on the context field and not
support for the context field itself. Are you sure that context filtering
is not supported with any lookup prior to 5.4?
On Oct 16, 2015 12:47 PM, "Alessandro Benedetti" 
wrote:

> This will sound silly, but which version of Solr are you using?
> According to
> https://issues.apache.org/jira/browse/SOLR-7888
> this new cool feature will be included in Solr 5.4.
>
> Cheers
>
> On 15 October 2015 at 22:53, Salman Ansari 
> wrote:
>
> > Hi guys,
> >
> > I am working with Solr suggester as explained in this article.
> > https://cwiki.apache.org/confluence/display/solr/Suggester
> >
> > The suggester is working fine but I want to filter the results based on a
> > field (which is "type"). I have tried to follow what was written at the
> > end of the article (about Context Filtering) but still could not get the
> > filter working.
> >
> > My Solr configuration for suggestion is
> >
> >   <searchComponent name="suggest" class="solr.SuggestComponent">
> > <lst name="suggester">
> >   <str name="name">mySuggester</str>
> >   <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
> >   <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> >   <str name="field">entity_autocomplete</str>
> >   <str name="contextField">type</str>
> >   <str name="suggestAnalyzerFieldType">text_auto</str>
> >   <str name="buildOnStartup">false</str>
> > </lst>
> >   </searchComponent>
> >
> > I have two entries for "Bill Gates" one with type=people and the other
> with
> > type=organization.
> >
> > I have tried the following query but still get both records for
> suggestion
> > (The right thing is to get one since I only have one Bill Gates as a type
> > of organization)
> >
> > Here is my query
> >
> > http://[MySolr]/[MyCollection]/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=Bill&suggest.cfq=people
> >
> > Any comments why this is not filtering?
> >
> > Regards,
> > Salman
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Efficiency of integer storage/use

2015-10-16 Thread Alessandro Benedetti
Hi Robert,
current Solr compression works really well, both for stored and DocValues
content.
Regarding the index term dictionaries, I'll ask the other experts for help,
as I have never checked how the actual compression works in there, but I
assume it is quite efficient.

Usually the field type affects performance for specific features and
functions; for example, if you are interested in range queries on integers,
you should try the trieInt implementation.
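
For instance, the stock schema defines a trie-based int type like this (the
precisionStep indexes extra terms to speed up range queries; the field name
below is made up):

  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
  <field name="myvalue" type="tint" indexed="true" stored="true"/>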

Let us know for more details !

Cheers

On 16 October 2015 at 07:53, Robert Krüger  wrote:




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Filtering on a Field with Suggestion

2015-10-16 Thread Alessandro Benedetti
This will sound silly, but which version of Solr are you using?
According to
https://issues.apache.org/jira/browse/SOLR-7888
this new cool feature will be included in Solr 5.4.

Cheers

On 15 October 2015 at 22:53, Salman Ansari  wrote:

> Hi guys,
>
> I am working with Solr suggester as explained in this article.
> https://cwiki.apache.org/confluence/display/solr/Suggester
>
> The suggester is working fine but I want to filter the results based on a
> field (which is "type"). I have tried to follow what was written at the end
> of the article (about Context Filtering) but still could not get the filter
> working.
>
> My Solr configuration for suggestion is
>
>   <searchComponent name="suggest" class="solr.SuggestComponent">
> <lst name="suggester">
>   <str name="name">mySuggester</str>
>   <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>   <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>   <str name="field">entity_autocomplete</str>
>   <str name="contextField">type</str>
>   <str name="suggestAnalyzerFieldType">text_auto</str>
>   <str name="buildOnStartup">false</str>
> </lst>
>   </searchComponent>
>
> I have two entries for "Bill Gates" one with type=people and the other with
> type=organization.
>
> I have tried the following query but still get both records for suggestion
> (The right thing is to get one since I only have one Bill Gates as a type
> of organization)
>
> Here is my query
>
> http://[MySolr]/[MyCollection]/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=Bill&suggest.cfq=people
>
> Any comments why this is not filtering?
>
> Regards,
> Salman
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Filtering on a Field with Suggestion

2015-10-16 Thread Alessandro Benedetti
Yes, as Jan confirmed, I am sure it was not there in 5.3 :)

Cheers

On 16 October 2015 at 12:10, Jan Høydahl  wrote:

> Yes,
>
> Context filtering is a new feature in the yet-to-be-released Solr 5.4.
> So you have to build branch_5x from source yourself to try it out.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Filtering on a Field with Suggestion

2015-10-16 Thread Jan Høydahl
Yes,

Context filtering is a new feature in the yet-to-be-released Solr 5.4.
So you have to build branch_5x from source yourself to try it out.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 16 Oct 2015, at 12:35, Salman Ansari wrote:
> 
> Thanks for pointing that out; I am using SolrCloud 5.3. However, it looks
> like they are talking about boolean operations on the context field and not
> support for the context field itself. Are you sure that context filtering
> is not supported with any lookup prior to 5.4?



Re: Nested entities not imported / do not show up in search?

2015-10-16 Thread Andrea Gazzarini
Hi Matthias,
you should use entityName.COLUMN_NAME in your expressions. So for
example, here

WHERE fb.EBI_NR='${firma.firma_ebi_nr}'

should be

WHERE fb.EBI_NR='${firma.EBI_NR}'
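
A minimal sketch of that wiring with nested (child) documents enabled; the
SELECT lists and column names are assumptions, only the table names, the fb
alias and the ${firma.EBI_NR} reference come from the thread:

  <entity name="firma" query="SELECT EBI_NR, NAME FROM tb_firmen_adressen">
    <field column="EBI_NR" name="firma_ebi_nr"/>
    <field column="NAME" name="firma_namenszeile_1"/>
    <entity name="branche" child="true"
            query="SELECT fb.EBC_CODE AS branche_ebc_code
                   FROM tb_firmen_branchen fb
                   WHERE fb.EBI_NR='${firma.EBI_NR}'"/>
  </entity>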

Best,
Andrea

2015-10-16 13:40 GMT+02:00 Matthias Fischer :

> Hello everybody,
>
> I am trying to import from an Oracle DB 11g2 via DIH using SOLR 5.3.1.
> In my relational DB there are company addresses (table tb_firmen_adressen)
> and branches (table tb_branchen). They have an n:m relationship using the
> join table tb_firmen_branchen.
> Now I would like to find companies by their name and in each company
> result I would like to see the associated branches.
> However I only get the companies without the nested entries. As a newbie
> I'd highly appreciate some help as there are no errors or warnings in the
> log file and I could not find any helpful hints in the documentation or
> elsewhere in the internet concerning my problem.
>
> Why are there no company branches inside the company records? What's wrong
> with my configuration? Any help is appreciated!
>
> Kind regards
> Matthias Fischer
>
>


Re: Recursively scan documents for indexing in a folder in SolrJ

2015-10-16 Thread Jan Høydahl
SolrJ does not have any file crawler built in.
But you are free to steal code from SimplePostTool.java related to directory 
traversal,
and then index each document found using SolrJ.

Note that SimplePostTool.java tries to be smart about which endpoint to post 
files to: xml, csv and json content will be posted to /update, while office 
docs go to /update/extract.
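
For the office-document route, a bare-bones SolrJ sketch could look like
this (the URL, core name, file name and literal.id choice are assumptions,
not something SolrJ prescribes):

  import java.io.File;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

  public class ExtractOneFile {
    public static void main(String[] args) throws Exception {
      SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/gettingstarted");
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
      req.addFile(new File("afolder/report.pdf"), "application/pdf");
      req.setParam("literal.id", "afolder/report.pdf"); // use the path as the unique id
      solr.request(req);   // Tika parses the file server-side
      solr.commit();
      solr.close();
    }
  }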

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 16 Oct 2015, at 05:22, Zheng Lin Edwin Yeo wrote:
> 
> Hi,
> 
> I understand that in SimplePostTool (post.jar), there is this command to
> automatically detect content types in a folder, and recursively scan it for
> documents for indexing into a collection:
> bin/post -c gettingstarted afolder/
> 
> This has been useful for me to do mass indexing of all the files that are
> in the folder. Now that I'm moving to production, I plan to use SolrJ to
> do the indexing, as it can do more things like robustness checks and
> retries for indexes that fail.
>
> However, I can't seem to find a way to do the same in SolrJ. Is it
> possible for this to be done in SolrJ? I'm using Solr 5.3.0
> 
> Thank you.
> 
> Regards,
> Edwin



Nested entities not imported / do not show up in search?

2015-10-16 Thread Matthias Fischer
Hello everybody,

I am trying to import from an Oracle DB 11g2 via DIH using SOLR 5.3.1. 
In my relational DB there are company addresses (table tb_firmen_adressen) and 
branches (table tb_branchen). They have an n:m relationship using the join 
table tb_firmen_branchen. 
Now I would like to find companies by their name and in each company result I 
would like to see the associated branches.
However I only get the companies without the nested entries. As a newbie I'd 
highly appreciate some help as there are no errors or warnings in the log file 
and I could not find any helpful hints in the documentation or elsewhere in the 
internet concerning my problem. 

Here is my data config and the relevant lines from my schema file. [The
list archive has stripped the XML markup here; what remains recoverable is:
a JdbcDataSource for Oracle (url jdbc:oracle:thin:@//x.xxx:1521/pde11, user
myuser), a parent entity over tb_firmen_adressen with the fields
firma_ebi_nr and firma_namenszeile_1, and a child entity over
tb_firmen_branchen (alias fb) joined to tb_branchen with the field
branche_ebc_code, selected WHERE fb.EBI_NR='${firma.firma_ebi_nr}'. In
schema.xml the uniqueKey is firma_ebi_nr, and the fields are declared
required/indexed/stored.]


After restarting Solr and calling 
http://localhost:8983/solr/jcg/dataimport?command=full-import I get "Indexing 
completed. Added/Updated:  documents. Deleted 0 documents."
So basically it seems to work, but my search results look like this:

{
  "responseHeader":{
"status":0,
"QTime":71,
"params":{
  "q":"Der Bunte",
  "defType":"edismax",
  "indent":"true",
  "qf":"firma_namenszeile_1",
  "wt":"json"}},
  "response":{"numFound":85,"start":0,"docs":[
  {
"firma_ebi_nr":123123123,
"firma_namenszeile_1":"Der Bunte Laden",
"_version_":1515185579421073408},
  {
 ...
}

Why are there no company branches inside the company records? What's wrong with 
my configuration? Any help is appreciated!

Kind regards
Matthias Fischer



RE: Recursively scan documents for indexing in a folder in SolrJ

2015-10-16 Thread Duck Geraint (ext) GBJH
Also, check this link for SolrJ example code (including the recursion):
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Geraint


Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com



Re: Highlight with NGram and German S Sharp "ß"

2015-10-16 Thread Jérôme Bernardes

Thanks for your reply Scott.

I tried

bs.language=de&bs.country=de

Unfortunately the problem still occurs.
I have just discovered that the problem does not only affect "ß" but 
also "æ" (which is mapped to "ae" at query and index time):
q=hae   -->   hæna
So it seems to me that the problem is related to any single character 
that is mapped to several characters using
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>


Jérôme

On 13/10/2015 07:46, Scott Stults wrote:

My guess is that the boundary scanner isn't configured right for your
highlighter. Try setting the bs.language and bs.country parameters either
in your request or in the requestHandler.
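
On the request itself the boundary-scanner parameters carry the hl prefix.
A sketch, assuming the FastVectorHighlighter is in use (the field name is
made up):

...&hl=true&hl.fl=content&hl.useFastVectorHighlighter=true&hl.bs.type=WORD&hl.bs.language=de&hl.bs.country=DE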


k/r,
Scott

On Mon, Oct 5, 2015 at 4:57 AM, Jérôme Bernardes 

AW: Nested entities not imported / do not show up in search?

2015-10-16 Thread Matthias Fischer
Thank you, Andrea, for answering so quickly.

However I got further errors. I also had to change the uniqueKey 
"firma_ebi_nr" to "id". But it 
still does not work properly. It seems that an id is auto-generated for the 
company documents but not for the nested ones (the business branches). Any 
ideas how to fix this?

2015-10-16 12:49:29.650 WARN  (Thread-17) [   x:jcg] o.a.s.h.d.SolrWriter Error 
creating document : 
SolrInputDocument(
fields: [firma_ebi_nr=317709682, firma_namenszeile_1=Example Company, 
id=3c7f7421-9d51-4056-a2a0-eebab87a546a, _version_=1515192078460518400, 
_root_=3c7f7421-9d51-4056-a2a0-eebab87a546a], 
children: [
   SolrInputDocument(fields: [branche_ebc_code=7, 
_root_=3c7f7421-9d51-4056-a2a0-eebab87a546a]), 
   SolrInputDocument(fields: [branche_ebc_code=47000, 
_root_=3c7f7421-9d51-4056-a2a0-eebab87a546a]), 
   SolrInputDocument(fields: [branche_ebc_code=47700, 
_root_=3c7f7421-9d51-4056-a2a0-eebab87a546a]), 
   SolrInputDocument(fields: [branche_ebc_code=47790, 
_root_=3c7f7421-9d51-4056-a2a0-eebab87a546a]), 
   SolrInputDocument(fields: [branche_ebc_code=47791, 
_root_=3c7f7421-9d51-4056-a2a0-eebab87a546a])])
org.apache.solr.common.SolrException: [doc=null] missing required field: id
at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:198)
at 
org.apache.solr.update.AddUpdateCommand$1.next(AddUpdateCommand.java:191)
at 
org.apache.solr.update.AddUpdateCommand$1.next(AddUpdateCommand.java:166)
at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:259)
at 
org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:413)
at 
org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1316)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:235)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at 
org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:94)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:71)
at 
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:259)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:524)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)

Kind regards,
Matthias

-Original Message-
From: Andrea Gazzarini [mailto:a.gazzar...@gmail.com] 
Sent: Friday, 16 October 2015 13:59
To: solr-user@lucene.apache.org
Subject: Re: Nested entities not imported / do not show up in search?

Re: Nested entities not imported / do not show up in search?

2015-10-16 Thread Andrea Gazzarini
Hi Matthias,
I guess the company id field is not unique, so you need a "compound"
uniqueKey in Solr, which is not strictly possible. As a consequence, the
(company) UUID is probably created before the indexing phase by an
UpdateRequestProcessor [1], so you should check your solrconfig.xml and, if
I'm right, check whether the same strategy could be used for the nested
entities.

Andrea

[1]
http://lucene.apache.org/solr/5_2_1/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html
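
A minimal sketch of such a chain, assuming the uniqueKey field is "id":

  <updateRequestProcessorChain name="uuid">
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>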

2015-10-16 17:11 GMT+02:00 Matthias Fischer :

> Thank you, Andrea, for answering so quickly.
>
> However I got further errors. I also had to change the uniqueKey
> "firma_ebi_nr" to "id". But
> it still does not work properly. It seems that an id is auto-generated for
> the company documents but not for the nested ones (the business branches).
> Any ideas how to fix this?
>

Solr Geospatial Visualisation

2015-10-16 Thread Vijaya Narayana Reddy Bhoomi Reddy

Hi,

I am aware of Solr’s geospatial capabilities. However, I am wondering what is 
the best way to visualise Solr geospatial data. Is there any native support in 
Solritas, or is there any other mechanism that suits this requirement best? For 
example, if my problem is to find the best possible route using a given set of 
locations, and visualise it over a map, how can it be done?

Please let me know your suggestions. Any help in this regard is greatly 
appreciated.

Thanks
Vijay 





Re: Efficiency of integer storage/use

2015-10-16 Thread Erick Erickson
Under the covers, Lucene stores ints in a packed format, so I'd just count
on that for a first pass.

What is "a lot of integer values"? Hundreds of millions? Billions? Trillions?

Unless you give us some indication of scale, it's hard to say anything
helpful. But unless you have some evidence that you're going to blow out
memory, I'd just ignore the "wasted" bits. Especially if you can use
docValues: that option holds much of the underlying data in MMapDirectory,
which uses swappable OS memory.
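
For example (a schema sketch; the field name is made up):

  <field name="color_r" type="int" indexed="true" stored="false" docValues="true"/>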

Best,
Erick

On Fri, Oct 16, 2015 at 1:53 AM, Robert Krüger  wrote:


Re: Help me read Thread

2015-10-16 Thread Rallavagu
One more observation: Tomcat's acceptor thread for HTTP 
(the http-bio-8080-acceptor thread) disappears, and because of this no 
incoming HTTP connections can be opened. During this time ZK apparently 
still thinks the node is up, and it shows green from the leader.


On 10/13/15 9:17 AM, Erick Erickson wrote:

How heavy is heavy? The proverbial smoking gun here will be messages in any
logs referring to "leader initiated recovery". (note, that's the
message I remember seeing,
it may not be exact).

There's no particular work-around here except to back off the indexing
load. Certainly increasing the
thread pool size allowed this to surface. Also 5.2 has some
significant improvements in this area, see:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/

And a lot depends on how you're indexing, batching up updates is a
good thing. If you go to a
multi-shard setup, using SolrJ and CloudSolrServer (CloudSolrClient in
5.x) would help. More
shards would help as well,  but I'd first take a look at the indexing
process and be sure you're
batching up updates.
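
For instance, a bare-bones batching sketch with SolrJ (the ZK hosts,
collection name and batch size are all made up):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchIndexer {
    public static void main(String[] args) throws Exception {
      CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
      server.setDefaultCollection("collection1");
      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (int i = 0; i < 100000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        doc.addField("text", "doc " + i);
        batch.add(doc);
        if (batch.size() == 1000) { // one request per 1000 docs, not per doc
          server.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) server.add(batch);
      server.commit();
      server.shutdown();
    }
  }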

It's also possible if indexing is a once-a-day process and it fits
with your SLAs to shut off the replicas,
index to the leader, then turn the replicas back on. That's not all
that satisfactory, but I've seen it used.

But with a single shard setup, I really have to ask why indexing at
such a furious rate is
required that you're hitting this. Are you unable to reduce the indexing rate?

Best,
Erick

On Tue, Oct 13, 2015 at 9:08 AM, Rallavagu  wrote:

Also, we have increased the number of connections per host from the default
(20) to 100 for the HTTP thread pool used to communicate with other nodes.
Could this have caused the issues, as it can now spin up many threads to
send updates?


On 10/13/15 8:56 AM, Erick Erickson wrote:


Is this under a very heavy indexing load? There were some
inefficiencies that caused followers to work a lot harder than the
leader, but the leader had to spin off a bunch of threads to send
update to followers. That's fixed int he 5.2 release.

Best,
Erick

On Tue, Oct 13, 2015 at 8:40 AM, Rallavagu  wrote:


Please help me understand what is going on with this thread.

Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat
with
500 threads.


There are 47 threads overall, and the designated leader becomes unresponsive
though it shows "green" from the cloud perspective. This is causing issues.

particularly,

"   at

org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
  ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"



"http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive,
native_blocked, daemon
  at __lll_lock_wait+34(:0)@0x382ba0e262
  at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
  at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
  at _L_unlock_16+44(:0)@0x382ba0f710
  at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
  at

org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
  at

org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
  at

org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
  at

org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
  at

org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
  ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
  at

org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
  at

org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
  at
org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
  at

org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
  at

org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
  at

org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
  at

org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
  at

org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
  at


Re: Recursively scan documents for indexing in a folder in SolrJ

2015-10-16 Thread Zheng Lin Edwin Yeo
Thanks for your advice. I also found this method which so far has been able
to traverse all the documents in the folder and index them in Solr.

// Recursively walks a directory tree (uses java.io.File):
public static void showFiles(File[] files) {
    for (File file : files) {
        if (file.isDirectory()) {
            System.out.println("Directory: " + file.getName());
            showFiles(file.listFiles()); // recurse into the subdirectory
        } else {
            System.out.println("File: " + file.getName());
        }
    }
}

The problem with this is that it indexes all the files regardless of
format, instead of just the formats post.jar handles. So I guess I still
have to "steal" some code from there to detect the file format?

As for files that contain non-English characters (e.g. Chinese characters),
it is currently not able to read them, and they all come out as a series of
"???". Any idea how to solve this problem?

Thank you.

Regards,
Edwin


On 16 October 2015 at 21:16, Duck Geraint (ext) GBJH <
geraint.d...@syngenta.com> wrote:



Re: Forking Solr

2015-10-16 Thread Upayavira


On Fri, Oct 16, 2015, at 04:00 PM, Ryan Josal wrote:

If you want to patch a component, change its package name and fork that
component. I have a custom MoreLikeThisHandler in production quite
happily like this.
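
E.g. (the package name here is hypothetical), the repackaged class goes in a
jar on the classpath and solrconfig.xml points at it:

  <requestHandler name="/mlt" class="com.example.solr.MyMoreLikeThisHandler"/>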

I've also done an SVN checkout of Solr, made my code changes there, and
then created a local git repo in which I can track my own changes for stuff
that will eventually get pushed back to Solr. 

I work concurrently on a number of patches to the Admin UI. They tend to
sit in different JIRAs as patches for a few days before I commit them,
so this local git repo makes it much easier for me to track my changes,
but from the Solr community's perspective, I'm just using SVN.

I could easily push this git repo up to github or such if I thought that
added value.

Then, I regularly run svn update which keeps this checkout up-to-date,
and confirm it hasn't broken things.

If you wanted to run against a specific version in Solr, you could force
SVN to a specific revision (e.g. of the 5x branch) - the one that was
released, and git merge your patches into it, etc, etc, etc.

Upayavira