Re: solrj returning no results but curl can get them

2015-01-30 Thread S L
It was pilot error. I just reviewed my servlet and noticed a parameter in
web.xml that was looking for the new product's data in the production
index, which doesn't have that data yet, while my curl command was running
against the staging index. I rebuilt the servlet with the fixed parameter
and life is now good.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrj-returning-no-results-but-curl-can-get-them-tp4183053p4183119.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Does DocValues improve Grouping performance ?

2015-01-30 Thread Cario, Elaine
Hi Shamik,

We use DocValues for grouping, and although I have nothing to compare it to (we 
started with DocValues), we are seeing similarly poor results as you: easily 
60% overhead compared to non-grouped queries.  We've looked around for a 
solution, but unfortunately no quick fix has presented itself.  
CollapsingQParserPlugin is also too limited for our needs.

-Original Message-
From: Shamik Bandopadhyay [mailto:sham...@gmail.com] 
Sent: Thursday, January 15, 2015 6:02 PM
To: solr-user@lucene.apache.org
Subject: Does DocValues improve Grouping performance ?

Hi,

   Does use of DocValues provide any performance improvement for Grouping?
I've looked at the blog post which mentions improving Grouping performance 
through DocValues.

https://lucidworks.com/blog/fun-with-docvalues-in-solr-4-2/

Right now, group-by queries (which I sadly can't avoid) have become a huge 
bottleneck. They carry an overhead of 60-70% compared to the same query sans 
group-by. Unfortunately, I'm not able to use CollapsingQParserPlugin as it 
doesn't support anything similar to the group.facet feature.

My understanding of DocValues is that it's intended for faceting and sorting. 
Just wondering if anyone has tried DocValues for Grouping and seen any 
improvements?

-Thanks,
Shamik
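For anyone following along, the setup the linked blog post describes boils down to declaring the grouping field with docValues enabled. This is only a sketch; the field and type names below are assumptions, not taken from Shamik's schema:

```xml
<!-- schema.xml sketch (assumed names): a single-valued string field with
     docValues, suitable as a group.field target -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="group_key_s" type="string" indexed="true" stored="false"
       docValues="true" multiValued="false"/>
```

A grouped request would then add group=true&group.field=group_key_s; whether docValues actually speeds this up is exactly the open question in this thread, so treat this as setup rather than a performance claim.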


Re: AW: AW: AW: CoreContainer#createAndLoad, existing cores not loaded

2015-01-30 Thread Shawn Heisey
On 1/29/2015 11:37 PM, Clemens Wyss DEV wrote:
 The recommendation these days is to NOT use the embedded server
 We would love to, as it is clear that this is not the Solr way to go. The 
 reason for us building upon EmbeddedSolrServer is that we have more than 
 150 sites, each with its own index (core). If we went client-server, we 
 could not easily update the Solr server(s) without also updating all clients 
 (i.e. the 150 sites) at the same time. And having a dedicated Solr server for 
 every client/site is not really an option, is it?
 
 Or can, for example, a 4.10.3 client talk to a Solr 5/6 server? Also, when 
 updating the Solr server, doesn't that also require a re-index of all data, as 
 the Lucene storage format might have changed?

Cross-version compatibility between SolrJ and Solr is very high, as long
as you're not running SolrCloud.  SolrCloud is *incredibly* awesome, but
it's not for everyone.

Without SolrCloud, the communication is http only, using very stable
APIs that have been around since pretty much the beginning of Solr.  In
the 1.x and 3.x days, there were occasional code tweaks required for
cross-version compatibility, but the API has been extremely stable since
early 4.x -- for a couple of years now.

SolrCloud is much more recent and far more complex, so problems or
deficiencies are sometimes found with the API.  Fixing those bugs
sometimes requires changes that are incompatible with other versions of
the Java client.  The SolrJ java client is an integral part of Solr
itself, so SolrCloud functionality in the client is tightly coupled to
specifics in the API that are undergoing rapid change from version to
version.

I don't think that SolrCloud is even possible with the embedded server,
because it requires HTTP for inter-server communication.  The embedded
server doesn't listen for HTTP.

Thanks,
Shawn



Removing a stored field from solrcloud 4.4

2015-01-30 Thread Nishanth S
Hello,

I have a field which is indexed and stored in the Solr schema (SolrCloud
4.4). This field is relatively large, and I plan to only index the field
and not store it. Is there a need to re-index the documents once this
change is made?

Thanks,
Nishanth
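For clarity, the change being described is just flipping the stored flag on the field definition; the field and type names below are placeholders, not from Nishanth's schema:

```xml
<!-- before: indexed and stored -->
<field name="big_body" type="text_general" indexed="true" stored="true"/>
<!-- after: index only; new documents no longer keep the stored value -->
<field name="big_body" type="text_general" indexed="true" stored="false"/>
```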


Calling custom request handler with data import

2015-01-30 Thread vineet yadav
Hi,
I am using the Data Import Handler to import data from MySQL, and I want to
identify named entities in it. So I am following this example
(http://www.searchbox.com/named-entity-recognition-ner-in-solr/), where
Stanford NER is used to identify named entities. I am using the following
request handler

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-import.xml</str>
  </lst>
</requestHandler>

for importing data from MySQL, and

<requestHandler name="/ner" class="com.searchbox.ner.NerHandler" />
<updateRequestProcessorChain name="mychain">
  <processor class="com.searchbox.ner.NerProcessorFactory">
    <lst name="queryFields">
      <str name="queryField">content</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mychain</str>
  </lst>
</requestHandler>

for identifying named entities. The NER request handler identifies named
entities in the content field and stores the extracted entities in Solr
fields.

The NER request handler was working when I was using Nutch with Solr. But
when I import data from MySQL, the NER request handler is not invoked, so
entities are not stored in Solr for the imported documents. Can anybody tell
me how to call a custom request handler from the Data Import Handler?

Otherwise, if I can invoke the NER request handler externally, so that it
indexes person, organization and location entities for imported documents,
that is also fine. Any suggestions are welcome.

Thanks
Vineet Yadav


timestamp field and atomic updates

2015-01-30 Thread Bill Au
I have a timestamp field in my schema to track when each doc was indexed:

<field name="timestamp" type="date" indexed="true" stored="true"
       default="NOW" multiValued="false" />

Recently, we switched over to using atomic updates instead of re-indexing
when we need to update a doc in the index.  It looks to me like the
timestamp field is not updated during an atomic update.  I have also looked
into TimestampUpdateProcessorFactory, and it looks like that won't help in
my case.

Is there anything within Solr that I can use to update the timestamp during
atomic update, or do I have to explicitly include the timestamp field as
part of the atomic update?

Bill
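A sketch of the explicit-field workaround the question mentions, with placeholder id and field values; Solr date fields generally accept date-math strings such as NOW on input, but verify that against your field type before relying on it:

```xml
<add>
  <doc>
    <field name="id">doc1</field>
    <!-- the real atomic change -->
    <field name="price" update="set">42.0</field>
    <!-- explicitly refresh the timestamp in the same atomic update -->
    <field name="timestamp" update="set">NOW</field>
  </doc>
</add>
```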


Re: solrj returning no results but curl can get them

2015-01-30 Thread S L
Hi Dmitri,

I do have a question mark in my search. I see that I dropped that
accidentally when I was copying/pasting/formatting the details. 

My curl command is curl http://myserver/myapp/myproduct?fl=*,.;

And, it works fine whether I have .../myproduct/?fl=*, or if I leave out
the / before ?fl=*.

The curl command works perfectly with any of the four request handlers, so I
believe the data to be correct, and my solrj code works perfectly with three
out of four of the request handlers, so I believe the code to be correct as
well.

Thanks.

Sol



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrj-returning-no-results-but-curl-can-get-them-tp4183053p4183116.html
Sent from the Solr - User mailing list archive at Nabble.com.


Hit Highlighting and More Like This

2015-01-30 Thread Tim Hearn
Hi all,

I'm fairly new to Solr.  It seems like it should be possible to enable the
hit highlighting feature and the more-like-this feature at the same time,
with the key words from the MLT query being the terms highlighted.  Is this
possible?  I am trying to do this right now, but I am not getting any
snippets returned to me.

Thanks!


Re: Removing a stored field from solrcloud 4.4

2015-01-30 Thread Erick Erickson
Yes and no. Solr should continue to work fine; it's just that new documents
won't have the stored field to return to clients. As you re-index docs,
subsequent merges will purge the stored data _for the docs you've
re-indexed_.

But I would re-index everything just to get the system into a consistent state.

Best,
Erick




Re: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-30 Thread fabio.bozzo
Nice! It works indeed!
Sorry I didn't notice that before.

But what if I want the same for "iPhone"?
I mean, suggesting "i phone" for users who searched "iphone". A minBreakLength
of 1 is just too small, isn't it?

Il sabato 31 gennaio 2015, Dyer, James-2 [via Lucene] 
ml-node+s472066n4183176...@n3.nabble.com ha scritto:

 You need to decrease this to at least 2, because the length of "go" is < 3.

 <int name="minBreakLength">3</int>

 James Dyer
 Ingram Content Group




-- 
Fabio Bozzo
SW Engineer

3W s.r.l.
Via Luisetti,7
13900-Biella ( BI )
Tel. 015.84.97.804 / 015.89.76.350
Fax 015.84.70.450

Registro imprese Biella n.01965270026
R.E.A. BI 175416


Replication in solrloud

2015-01-30 Thread solr2020
Hi,


We have 4 servers in SolrCloud with one shard. 2 of the servers are not in
sync with the other two. We'd like to force replication manually to keep all
the servers in sync. Do we have a command to force replication (other than a
Solr restart)?


Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-in-solrloud-tp4183103.html
Sent from the Solr - User mailing list archive at Nabble.com.


New UI for SOLR-based projects

2015-01-30 Thread Roman Chyla
Hi everybody,

There exists a new open-source implementation of a search interface for
SOLR. It is written in Javascript (using Backbone), currently in version
v1.0.19 - but new features are constantly coming. Rather than describing it
in words, please see it in action for yourself at http://ui.adslabs.org -
I'd recommend exploring facets, the query form, and visualizations.

The code lives at: http://github.com/adsabs/bumblebee

Best,

  Roman


RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-30 Thread Dyer, James
You need to decrease this to at least 2, because the length of "go" is < 3.

<int name="minBreakLength">3</int>

James Dyer
Ingram Content Group
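For reference, minBreakLength lives in the spellchecker definition in solrconfig.xml. A sketch follows; the field name is borrowed from the documents shown later in the thread, and everything else here is an assumption rather than the poster's actual config:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">descrizione</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <!-- lowered from the default of 3 so "gopro" can break into "go" + "pro" -->
    <int name="minBreakLength">2</int>
  </lst>
</searchComponent>
```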


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Wednesday, January 28, 2015 4:55 PM
To: solr-user@lucene.apache.org
Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

I tried increasing my alternativeTermCount to 5 and enabled extended results.
I also added an fq filter parameter to clarify what I mean:

*Querying for "go pro" is good:*

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "q": "go pro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485581792"
    }
  },
  "response": {
    "numFound": 27,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK00150020",
        "codice_s": "5.BAT.27407",
        "id": "27407",
        "marchio": "GO PRO",
        "barcode_interno_s": "185323000958",
        "prezzo_acquisto_d": 16.12,
        "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
        "descrizione": "BATTERIA GO PRO HERO ",
        "prezzo_vendita_d": 39.9,
        "categoria": "Batterie",
        "_version_": 1491583424191791000
      },
      ...
    ]
  },
  "spellcheck": {
    "suggestions": [
      "go pro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 6,
        "origFreq": 433,
        "suggestion": [
          {
            "word": "gopro",
            "freq": 2
          }
        ]
      },
      "correctlySpelled",
      false,
      "collation",
      [
        "collationQuery",
        "gopro",
        "hits",
        3,
        "misspellingsAndCorrections",
        [
          "go pro",
          "gopro"
        ]
      ]
    ]
  }
}

While querying for "gopro" is not:

{
  "responseHeader": {
    "status": 0,
    "QTime": 6,
    "params": {
      "q": "gopro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485629480"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK0030010",
        "codice_s": "5.VID.39163",
        "id": "38814",
        "marchio": "GO PRO",
        "barcode_interno_s": "818279012477",
        "prezzo_acquisto_d": 150.84,
        "data_aggiornamento_dt": "2014-12-24T00:00:00Z",
        "descrizione": "VIDEOCAMERA GO-PRO HERO 3 WHITE NUOVO SLIM",
        "prezzo_vendita_d": 219,
        "categoria": "Fotografia",
        "_version_": 1491583425479442400
      },
      ...
    ]
  },
  "spellcheck": {
    "suggestions": [
      "gopro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 5,
        "origFreq": 2,
        "suggestion": [
          {
            "word": "giro",
            "freq": 6
          }
        ]
      },
      "correctlySpelled",
      false
    ]
  }
}

---

I'd like "go pro" as a suggestion for "gopro" too.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182735.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: New UI for SOLR-based projects

2015-01-30 Thread Shawn Heisey
On 1/30/2015 1:07 PM, Roman Chyla wrote:
 There exists a new open-source implementation of a search interface for
 SOLR. It is written in Javascript (using Backbone), currently in version
 v1.0.19 - but new features are constantly coming. Rather than describing it
 in words, please see it in action for yourself at http://ui.adslabs.org -
 I'd recommend exploring facets, the query form, and visualizations.
 
 The code lives at: http://github.com/adsabs/bumblebee

I have no wish to trivialize the work you've done.  I haven't looked
into the code, but a high-level glance at the documentation suggests
that you've put a lot of work into it.

I do however have a strong caveat for your users.  I'm the guy holding
the big sign that says the end is near to anyone who will listen!

By itself, this is an awesome tool for prototyping, but without some
additional expertise and work, there are severe security implications.

If this gets used for a public Internet facing service, the Solr server
must be accessible from the end user's machine, which might mean that it
must be available to the entire Internet.

If the Solr server is not sitting behind some kind of intelligent proxy
that can detect and deny attempts to access certain parts of the Solr
API, then Solr will be wide open to attack.  A knowledgeable user that
has unfiltered access to a Solr server will be able to completely delete
the index, change any piece of information in the index, or send denial
of service queries that will make it unable to respond to legitimate
traffic.

Setting up such a proxy is not a trivial task.  I know that some people
have done it, but so far I have not seen anyone share those
configurations.  Even with such a proxy, it might still be possible to
easily send denial of service queries.
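To make the concern concrete, here is one hedged sketch of such a proxy, not a vetted configuration: an Apache httpd fragment that forwards only a single select handler and maps nothing else. The core name and upstream address are assumptions, and this does nothing about denial-of-service queries:

```apache
# Forward only the query handler of one core to Solr; /update, /admin and
# every other Solr path is simply not mapped, so it never reaches Solr.
ProxyPass        "/solr/mycore/select" "http://127.0.0.1:8983/solr/mycore/select"
ProxyPassReverse "/solr/mycore/select" "http://127.0.0.1:8983/solr/mycore/select"
```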

I cannot find any information in your README or the documentation links
that mentions any of these concerns.  I suspect that many who
incorporate this client into their websites will be unaware that their
setup may be insecure, or how to protect it.

Thanks,
Shawn



Re: Calling custom request handler with data import

2015-01-30 Thread Dan Davis
The Data Import Handler isn't pushing data into the /update request
handler.   However, the Data Import Handler can be extended with
transformers.   Two such transformers are the TemplateTransformer and the
ScriptTransformer.   It may be possible to get a script function to load
your custom Java code.   You could also just write a
StanfordNerTransformer.

Hope this helps,

Dan
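A rough sketch of the ScriptTransformer route, under heavy assumptions: com.example.NerWrapper is hypothetical (it stands in for whatever class wraps the Stanford NER tagger), and the entity name, query, and field names are placeholders:

```xml
<dataConfig>
  <script><![CDATA[
    // runs once per row inside the DIH; com.example.NerWrapper is hypothetical
    function addEntities(row) {
      var entities = com.example.NerWrapper.extract(row.get('content'));
      row.put('entities', entities);
      return row;
    }
  ]]></script>
  <document>
    <entity name="article" transformer="script:addEntities"
            query="SELECT id, content FROM articles"/>
  </document>
</dataConfig>
```

The entity would still need its dataSource definition; this only shows where the script hook attaches.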




Re: New UI for SOLR-based projects

2015-01-30 Thread Roman Chyla
I gather from your comment that I should update the readme, because there
could be people inclined to use the bumblebee development server in
production: "Beware those who enter through this gate!" :-)

Your point that so far you haven't seen anybody share their middle layer
can be addressed by pointing to the following projects:

https://github.com/adsabs/solr-service
https://github.com/adsabs/adsws

These are also open source, we use them in production, and they have OAuth,
microservices, REST, and rate limits. We know it is not perfect, but what
is? ;-) Pull requests welcome!

Thanks,

Roman




Re: Calling custom request handler with data import

2015-01-30 Thread Dan Davis
You know, another thing you can do is just write some Java/Perl/whatever to
pull data out of your database and push it to Solr.  Not as convenient
for development perhaps, but it has more legs in the long run.  The Data
Import Handler does not easily multi-thread.






role of the wiki and cwiki

2015-01-30 Thread Dan Davis
I've been thinking of https://wiki.apache.org/solr/ as the Old Wiki and
https://cwiki.apache.org/confluence/display/solr as the New Wiki.

I guess that's the wrong way to think about it - Confluence is being used
for the Solr Reference Guide, and MoinMoin is being used as a wiki.

Is this the correct understanding?


Re: role of the wiki and cwiki

2015-01-30 Thread Shawn Heisey
On 1/30/2015 10:59 PM, Dan Davis wrote:
 I've been thinking of https://wiki.apache.org/solr/ as the Old Wiki and
 https://cwiki.apache.org/confluence/display/solr as the New Wiki.
 
 I guess that's the wrong way to think about it - Confluence is being used
 for the Solr Reference Guide, and MoinMoin is being used as a wiki.
 
 Is this the correct understanding?

Yes, your understanding is correct.

Because the Solr Reference Guide is released as official documentation
in PDF form shortly after each new minor Solr version, only committers
have the ability to edit the confluence wiki.  Anyone can comment on it,
so we do have a feedback mechanism.

Anyone can edit the MoinMoin wiki, after they ask for edit rights and
provide their username for the Solr portion of that wiki.  Asking for
edit permission is typically done via this mailing list or the IRC channel.

Because they have different potential authors, the two systems now serve
different purposes.

There are still some pages on the MoinMoin wiki that contain
documentation that should be in the reference guide, but isn't.

The MoinMoin wiki is still useful, as a place where users can collect
information that is useful to others, but doesn't qualify as official
documentation, or perhaps simply hasn't been verified.  I believe this
means that a lot of information which has been migrated into the
reference guide will eventually be removed from MoinMoin.

Thanks,
Shawn



Re: role of the wiki and cwiki

2015-01-30 Thread Anshum Gupta
Hi Dan,

I would say that the wiki is old and dated, and that gap is only increasing.
I would highly recommend that everyone use the Reference Guide instead of
the wiki, unless there's something they can't find there. If you are unable
to find something in the Reference Guide, it'd be good to comment on
Confluence about the missing content; better still, contribute :-).

Now, about the reference guide: the link you've shared above always points
to the next version of the ref guide, e.g. right now all the content there
is w.r.t. 5.0 and is unreleased. The best way to use the reference guide is
to download the ref guide for the version you're using.






-- 
Anshum Gupta
http://about.me/anshumgupta


Re: New UI for SOLR-based projects

2015-01-30 Thread Lukáš Vlček
Nice work Roman!

Lukas

On Sat, Jan 31, 2015 at 4:36 AM, Roman Chyla roman.ch...@gmail.com wrote:

 I gather from your comment that I should update readme, because there could
 be people who would be inclined to use bumblebee development server in
 production: Beware those who enter through this gate! :-)

 Your point, that so far you haven't seen anybody share their middle layer
 can be addressed by pointing to the following projects:

 https://github.com/adsabs/solr-service
 https://github.com/adsabs/adsws

 These are also open source, we use them in production, and have oauth,
 microservices, rest, and rate limits, we know it is not perfect, but what
 is? ;-) pull requests welcome!

 Thanks,

 Roman
 On 30 Jan 2015 21:51, Shawn Heisey apa...@elyograg.org wrote:

  On 1/30/2015 1:07 PM, Roman Chyla wrote:
   There exists a new open-source implementation of a search interface for
   SOLR. It is written in Javascript (using Backbone), currently in
 version
   v1.0.19 - but new features are constantly coming. Rather than
 describing
  it
   in words, please see it in action for yourself at
 http://ui.adslabs.org
  -
   I'd recommend exploring facets, the query form, and visualizations.
  
   The code lives at: http://github.com/adsabs/bumblebee
 
  I have no wish to trivialize the work you've done.  I haven't looked
  into the code, but a high-level glance at the documentation suggests
  that you've put a lot of work into it.
 
  I do however have a strong caveat for your users.  I'm the guy holding
  the big sign that says the end is near to anyone who will listen!
 
  By itself, this is an awesome tool for prototyping, but without some
  additional expertise and work, there are severe security implications.
 
  If this gets used for a public Internet facing service, the Solr server
  must be accessible from the end user's machine, which might mean that it
  must be available to the entire Internet.
 
  If the Solr server is not sitting behind some kind of intelligent proxy
  that can detect and deny aattempts to access certain parts of the Solr
  API, then Solr will be wide open to attack.  A knowledgeable user that
  has unfiltered access to a Solr server will be able to completely delete
  the index, change any piece of information in the index, or send denial
  of service queries that will make it unable to respond to legitimate
  traffic.
 
  Setting up such a proxy is not a trivial task.  I know that some people
  have done it, but so far I have not seen anyone share those
  configurations.  Even with such a proxy, it might still be possible to
  easily send denial of service queries.
 
  I cannot find any information in your README or the documentation links
  that mentions any of these concerns.  I suspect that many who
  incorporate this client into their websites will be unaware that their
  setup may be insecure, or how to protect it.
 
  Thanks,
  Shawn
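Shawn's proxy point can be made concrete: one building block is a request-path allowlist in front of Solr that only lets read-only search handlers through. Here is a minimal JDK-only sketch of such a check — the path layout and the set of allowed handlers are illustrative assumptions, and note that an allowlist alone does not stop denial-of-service queries:

```java
import java.util.Set;

public class SolrRequestFilter {
    // Hypothetical allowlist: only read-only search handlers may reach Solr.
    // Update, admin, and replication endpoints stay blocked.
    private static final Set<String> ALLOWED_HANDLERS = Set.of("/select", "/get");

    public static boolean isAllowed(String path) {
        // Expect paths shaped like /solr/<core>/<handler>; reject everything
        // else, including /solr/admin/* and /solr/<core>/update.
        String[] parts = path.split("/");
        if (parts.length != 4 || !"solr".equals(parts[1])) {
            return false;
        }
        return ALLOWED_HANDLERS.contains("/" + parts[3]);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("/solr/core1/select")); // true
        System.out.println(isAllowed("/solr/core1/update")); // false
    }
}
```

A real deployment would run a check like this inside the proxy (or a servlet filter) and would also need query inspection, since even /select requests can be made expensive.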
 
 



Re: Does DocValues improve Grouping performance ?

2015-01-30 Thread Joel Bernstein
A few questions so we can better understand the scale of grouping you're
trying to accomplish:

How many distinct groups do you typically have in a search result?

How many distinct groups are there in the field you are grouping on?

How many results are you trying to group in a query?

Joel Bernstein
Search Engineer at Heliosearch

On Fri, Jan 30, 2015 at 4:10 PM, Cario, Elaine 
elaine.ca...@wolterskluwer.com wrote:

 Hi Shamik,

 We use DocValues for grouping, and although I have nothing to compare it
 to (we started with DocValues), we are also seeing similar poor results as
 you: easily 60% overhead compared to non-group queries. We have looked
 around for a solution, but no quick fix has presented itself, unfortunately.
 CollapsingQParserPlugin is also too limited for our needs.

 -Original Message-
 From: Shamik Bandopadhyay [mailto:sham...@gmail.com]
 Sent: Thursday, January 15, 2015 6:02 PM
 To: solr-user@lucene.apache.org
 Subject: Does DocValues improve Grouping performance ?

 Hi,

Does use of DocValues provide any performance improvement for Grouping ?
 I've looked into the blog post which mentions improving Grouping performance
 through DocValues.

 https://lucidworks.com/blog/fun-with-docvalues-in-solr-4-2/

 Right now, group-by queries (which I sadly can't avoid) have become a huge
 bottleneck. They carry an overhead of 60-70% compared to the same query sans
 grouping. Unfortunately, I'm not able to use CollapsingQParserPlugin as it
 doesn't have support similar to the group.facet feature.

 My understanding of DocValues is that they're intended for faceting and
 sorting. Just wondering if anyone has tried DocValues for Grouping and seen
 any improvements?

 -Thanks,
 Shamik
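For reference, grouping on a DocValues field presupposes that the field was indexed with docValues enabled, which is a schema change requiring a full reindex. A minimal fragment of the kind involved — the field name here is purely illustrative, not Shamik's actual schema:

```xml
<!-- schema.xml: single-valued string field used for grouping; docValues
     must be set before indexing, so changing it requires a full reindex -->
<field name="groupKey" type="string" indexed="true" stored="false" docValues="true"/>
```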



AW: AW: AW: CoreContainer#createAndLoad, existing cores not loaded

2015-01-30 Thread Clemens Wyss DEV
I looked into sources of CoreAdminHandler#handleCreateAction
...
  SolrCore core = coreContainer.create(dcore);
  
  // only write out the descriptor if the core is successfully created
  coreContainer.getCoresLocator().create(coreContainer, dcore);
...

I was missing the coreContainer.getCoresLocator().create(coreContainer, dcore) call.
With both calls in place:
a) core.properties is created
AND
b) the cores are loaded upon container startup ;)
:-) 

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Friday, 30 January 2015 07:38
To: solr-user@lucene.apache.org
Subject: AW: AW: AW: CoreContainer#createAndLoad, existing cores not loaded

 The recommendation these days is to NOT use the embedded server
We would love to, as it is clear that this is not the Solr way to go. The 
reason for us building upon EmbeddedSolrServer is that we have more than 150 
sites, each with its own index (core). If we went client-server, we could not 
easily update the Solr server(s) without also updating all clients (i.e. the 
150 sites) at the same time. And having a dedicated Solr server for every 
client/site is not really an option, is it?

Or can, for example, a 4.10.3 client talk to a Solr 5/6 server? Also, when 
updating the Solr server, doesn't that also require a re-index of all data, as 
the Lucene storage format might have changed?

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Thursday, 29 January 2015 20:30
To: solr-user@lucene.apache.org
Subject: Re: AW: AW: CoreContainer#createAndLoad, existing cores not loaded

On 1/29/2015 10:15 AM, Clemens Wyss DEV wrote:
 to put your solr home inside the extracted WAR
 We are NOT using war's
 
 coreRootDirectory
 I don't have this property in my sorl.xml
 
 If there will only be core.properties files in that cores directory
 Again, I see no core.properties file. I am creating my cores through 
 CoreContainer.createCore( CordeDescriptor). The folder(s) are created 
 but  no core.properties file

I am pretty clueless when it comes to the embedded server, but if you are 
creating the cores in the java code every time you create the container, I bet 
what I'm telling you doesn't apply at all.  The solr.xml file may not even be 
used.

The recommendation these days is to NOT use the embedded server.  There are too 
many limitations and it doesn't receive as much user testing as the webapp.  
Start Solr as a separate process and access it over http.
The overhead of http on a LAN is minimal, and over localhost it's almost 
nothing.

To do that, you would just need to change your code to use one of the client 
objects.  That would probably be HttpSolrServer, which is renamed to 
HttpSolrClient in 5.0.  They share the same parent object as 
EmbeddedSolrServer.  Most of the relevant methods used come from the parent 
class, so you would need very few code changes.

Thanks,
Shawn



Re: solrj returning no results but curl can get them

2015-01-30 Thread Dmitry Kan
Hi,

Some sanity checking: does the solr server base url in the code match the
one you use with curl?

What if you curl against http://myserver/myapp/myproduct%5C ?

On Fri, Jan 30, 2015 at 5:58 AM, S L sol.leder...@gmail.com wrote:

 I'm stumped. I've got some solrj 3.6.1 code that works fine against three
 of my request handlers but not the fourth. The very odd thing is that I have
 no trouble retrieving results with curl against all of the request handlers.

 My solrj code sets some parameters:

 ModifiableSolrParams params = new ModifiableSolrParams();

 params.set("fl", "*,score");
 params.set("rows", "500");
 params.set("qt", "/" + product);
 params.set("hl", "on");
 params.set("hl.fl", "title snippet");
 params.set("hl.fragsize", "50");
 params.set("hl.simple.pre", "<span class=\"hlt\">");
 params.set("hl.simple.post", "</span>");

 queryString = "(" + queryString + s[s.length-1] + ")";

 I have various request handlers that key off of the product value. I'll
 call
 the one that doesn't work myproduct.

 I send the parameter string to catalina.out for debugging:

 System.out.println(params.toString());

 I get this:



 fl=*%2Cscore&rows=500&qt=%2Fmyproduct&hl=on&hl.fl=title+snippet&hl.fragsize=50&
 hl.simple.pre=%3Cspan+class%3D%22hlt%22%3E&
 hl.simple.post=%3C%2Fspan%3E&q=title%3A%28brain%29+OR+snippet%3A%28brain%29

 I get no results when I let the solrj code do the search although the code
 works fine with the other three products.

 To convince myself that there is nothing wrong with the data I unencode the
 parameter string and run this command:

 curl 'http://myserver/myapp/myproduct?fl=*,score&rows=500&qt=/myproduct&hl=on&hl.fl=title+snippet&hl.fragsize=50&hl.simple.pre=<span+class="hlt">&hl.simple.post=</span>&q=title:brain%20OR%20snippet:brain'

 It runs just fine.

 How can I debug this?

 Thanks very much.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solrj-returning-no-results-but-curl-can-get-them-tp4183053.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
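As an aside for anyone debugging the same kind of mismatch: the URL-encoded string printed by params.toString() can be decoded programmatically rather than by hand before replaying it with curl. A small sketch using only the JDK (Java 10+ for the Charset overload of URLDecoder; no SolrJ needed):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class ParamDecoder {
    // Turn an application/x-www-form-urlencoded parameter string (the format
    // ModifiableSolrParams.toString() prints) into readable "key = value" lines.
    public static String decode(String encoded) {
        StringBuilder out = new StringBuilder();
        for (String pair : encoded.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue; // skip malformed fragments
            String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String val = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            out.append(key).append(" = ").append(val).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Decodes to: fl = *,score / qt = /myproduct / q = title:(brain)
        System.out.print(decode("fl=*%2Cscore&qt=%2Fmyproduct&q=title%3A%28brain%29"));
    }
}
```

Diffing the decoded output against the working curl command would have surfaced the differing qt target immediately.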


Re: WordDelimiterFilterFactory and position increment.

2015-01-30 Thread Dmitry Kan
Hi,

Do you use WordDelimiterFilter on query side as well?

On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather modather1...@gmail.com
wrote:

 Hi,

 An insight into the behavior of WordDelimiterFilter would be very helpful.
 Please share your inputs.

 Thanks,
 Modassar

 On Thu, Jan 22, 2015 at 2:54 PM, Modassar Ather modather1...@gmail.com
 wrote:

  Hi,
 
  I am using WordDelimiterFilter while indexing. The parser used is edismax.
  Phrase search is failing for terms like "3d image".
 
  On the analysis page it shows the following four tokens for *3d* and their
  positions.

  *token    position*
  3d        1
  3         1
  3d        1
  d         2

  image     3
 
  Here the token d is at position 2, which per my understanding causes the
  phrase search "3d image" to fail.
  "3d image"~1 works fine. The same behavior is present for "wi-fi device" and
  a few other queries starting with a token which is tokenized as shown above
  in the table.
 
  Kindly help me understand the behavior and let me know how phrase
  search is possible in such cases without the slop.
 
  Thanks,
  Modassar
 
 
 




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
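To see why the position table breaks the exact phrase, sloppy phrase matching can be modelled in a few lines. This is a toy model, not Lucene's actual sloppy phrase scorer, with the positions taken from the analysis table above:

```java
import java.util.List;
import java.util.Map;

public class PhrasePositions {
    // Simplified model of a two-term phrase query: with the given slop it
    // matches if some position p1 of the first term and p2 of the second
    // satisfy |p2 - (p1 + 1)| <= slop.
    public static boolean phraseMatches(Map<String, List<Integer>> positions,
                                        String first, String second, int slop) {
        for (int p1 : positions.getOrDefault(first, List.of())) {
            for (int p2 : positions.getOrDefault(second, List.of())) {
                if (Math.abs(p2 - (p1 + 1)) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Index-time positions from the analysis page: "3d" at 1, "image" at 3,
        // because the split part "d" occupies position 2.
        Map<String, List<Integer>> pos = Map.of(
                "3d", List.of(1), "3", List.of(1), "d", List.of(2),
                "image", List.of(3));
        System.out.println(phraseMatches(pos, "3d", "image", 0)); // false
        System.out.println(phraseMatches(pos, "3d", "image", 1)); // true
    }
}
```

With "image" pushed to position 3, the exact phrase (slop 0) expects it at position 2 and fails, while slop 1 tolerates the one-position displacement — exactly the "3d image"~1 behavior reported in the thread.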


Re: WordDelimiterFilterFactory and position increment.

2015-01-30 Thread Modassar Ather
Hi,

An insight into the behavior of WordDelimiterFilter would be very helpful.
Please share your inputs.

Thanks,
Modassar

On Thu, Jan 22, 2015 at 2:54 PM, Modassar Ather modather1...@gmail.com
wrote:

 Hi,

 I am using WordDelimiterFilter while indexing. The parser used is edismax.
 Phrase search is failing for terms like "3d image".

 On the analysis page it shows the following four tokens for *3d* and their
 positions.

 *token    position*
 3d        1
 3         1
 3d        1
 d         2

 image     3

 Here the token d is at position 2, which per my understanding causes the
 phrase search "3d image" to fail.
 "3d image"~1 works fine. The same behavior is present for "wi-fi device" and
 a few other queries starting with a token which is tokenized as shown above
 in the table.

 Kindly help me understand the behavior and let me know how phrase
 search is possible in such cases without the slop.

 Thanks,
 Modassar
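For context, positions like the ones in the table typically come from a WordDelimiterFilter configured to both split and catenate/preserve. An index-time fieldType of roughly this shape reproduces them — this is an illustrative configuration, not Modassar's actual schema:

```xml
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- "3d" -> "3d" (preserved), "3" and "d" (number/word parts), "3d"
         (catenated). The part token "d" lands one position after "3d",
         pushing "image" to position 3, so the exact phrase "3d image"
         no longer matches unless a slop of 1 is allowed. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateAll="1" preserveOriginal="1"/>
  </analyzer>
</fieldType>
```

A common mitigation is to use compatible (usually less aggressive) WordDelimiterFilter options in a separate query-time analyzer, or to accept the slop as in "3d image"~1.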