multivalued field using DIH

2014-03-27 Thread scallawa
I am using Solr 4.7 and am importing data directly from a MySQL database
table using the DIH.  I have a column that looks similar to the example
below, in that it has multiple values in the database.

material  cotton "polyester blend" rayon

I would like the data to look like the following when imported.

<str name="material">cotton</str>
<str name="material">polyester blend</str>
<str name="material">rayon</str>

In other words: if there are multiple data points for a particular column
and the mapped field is multivalued, create multiple <str name=...> fields.
If there are quotes around multiple words, treat them as one token.  Is this
possible?





Re: Solr 4.3.1 memory swapping

2014-03-27 Thread Shawn Heisey
On 3/26/2014 10:26 PM, Darrell Burgan wrote:
 Okay well it didn't take long for the swapping to start happening on one of 
 our nodes.  Here is a screen shot of the Solr console:
 
 https://s3-us-west-2.amazonaws.com/panswers-darrell/solr.png
 
 And here is a shot of top, with processes sorted by VIRT:
 
 https://s3-us-west-2.amazonaws.com/panswers-darrell/top.png
 
 As shown, we have used up more than 25% of the swap space, over 1GB, even 
 though there is 16GB of OS RAM available, and the Solr JVM has been allocated 
 only 10GB. Further, we're only consuming 1.5/4GB of the 10GB of JVM heap.
 
 Top shows that the Solr process 21582 is using 2.4GB resident but has a 
 virtual size of 82.4GB. Presumably that virtual size is due to the memory 
 mapped file. The other Java process 27619 is Zookeeper.
 
 So my question remains - why did we use any swap space at all? Doesn't seem 
 like we're experiencing memory pressure at the moment ... I'm confused.  :-)

The virtual memory value is indeed that large because of the mmapped file.

There is definitely something wrong here.  I don't know whether it's
Java, RHEL, or something strange with the S3 virtual machine, possibly a
bad interaction with the older kernel.  With your -Xmx value, Java
should never use more than about 10.5 GB of physical memory, and the top
output indicates that it's only using 2.4GB of memory.  13GB is used by
the OS disk cache.

You might notice that I'm not mentioning Solr in the list of possible
problems.  This is because an unmodified Solr install only utilizes the
Java heap, so it's Java that is in charge of allocating memory from the
operating system.

Here is a script that will tell you what's using swap and how much.
This will let you be absolutely sure about whether or not Java is the
problem child:

http://stackoverflow.com/a/7180078/2665648

There are instructions in the comments of the script for sorting the output.

The only major thing I saw in your JVM config (aside from perhaps
reducing the max heap) that I would change is the garbage collector
tuning.  I'm the original author mentioned in this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems



Here's a screenshot from my dev solr server, where you can see that
there is zero swap usage:

https://www.dropbox.com/s/mftgi3q2hn7w9qp/solr-centos6-top.png

This is a baremetal server with 16GB of RAM, running CentOS 6.5 and a
pre-release snapshot of Solr 4.7.1.  With an Intel Xeon X3430, I'm
pretty sure the processor architecture is NUMA, but the motherboard only
has one CPU slot, so it's only got one NUMA node.  As you can see by my
virtual memory value, I have a lot more index data on this machine than
you have on yours.  My heap is 7GB.  The other three java processes that
you can see running are in-house software related to Solr.

Performance is fairly slow with that much index and so little disk
cache, but it's a dev server.  The production environment has plenty of
RAM to cache the entire index.

Thanks,
Shawn



Re: multivalued field using DIH

2014-03-27 Thread Shawn Heisey
On 3/27/2014 12:49 AM, scallawa wrote:
 I am using Solr 4.7 and am importing data directly from a MySQL database
 table using the DIH.  I have a column that looks similar to the example
 below, in that it has multiple values in the database.
 
 material  cotton "polyester blend" rayon
 
 I would like the data to look like the following when imported.
 
 <str name="material">cotton</str>
 <str name="material">polyester blend</str>
 <str name="material">rayon</str>
 
 In other words: if there are multiple data points for a particular column
 and the mapped field is multivalued, create multiple <str name=...> fields.
 If there are quotes around multiple words, treat them as one token.  Is this
 possible?

In a direct manner, I do not think so.  If the input data were simply
space separated and didn't have the quoted string that includes a space,
you could use the RegexTransformer in DIH and do a simple 'splitBy' on
the field.
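
For illustration, a minimal sketch of that simple case (the entity and
column names here are assumptions, not from the original message):

  <entity name="product" transformer="RegexTransformer"
          query="SELECT id, material FROM products">
    <!-- splits the space-separated column into multiple values for a
         multiValued="true" field named "material" -->
    <field column="material" splitBy="\s+" />
  </entity>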

If you know how to write a regex that would only match the spaces
outside of the quotes, you could still use that method.  I have no idea
how to do that.

Alternatively, you can write a custom update processor for Solr that
knows how to break up the input, remove the original field, and reinsert
it with the multiple values.  Custom update processors are not very
difficult if you already know how to write a program, but it's not trivial.
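
As a rough sketch of that approach (the class name is hypothetical, the
quote-aware splitting is simplified, and you would still need to register it
through an UpdateRequestProcessorFactory in solrconfig.xml):

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;

  // Hypothetical processor: splits the single "material" value on whitespace,
  // keeping double-quoted phrases together as single values.
  public class MaterialSplittingProcessor extends UpdateRequestProcessor {
    // matches either a double-quoted phrase or a run of non-space characters
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    public MaterialSplittingProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      Object raw = doc.getFieldValue("material");
      if (raw instanceof String) {
        List<String> values = new ArrayList<String>();
        Matcher m = TOKEN.matcher((String) raw);
        while (m.find()) {
          values.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        doc.removeField("material");    // remove the original single value
        for (String v : values) {
          doc.addField("material", v);  // re-insert as multiple values
        }
      }
      super.processAdd(cmd);
    }
  }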

If the database actually has multiple values in a table rather than the
space separation, there are two possibilities: 1) Use nested DIH
entities, which makes a query to the database for every document. 2) Use
a JOIN with GROUP_CONCAT to construct a value with a delimiter other
than space - something that won't ever show up in the actual data.  You
can then use the splitBy method that I already mentioned.
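
A sketch of option 2, combining GROUP_CONCAT (MySQL syntax) with splitBy
(table and column names are assumptions):

  <entity name="product" transformer="RegexTransformer"
          query="SELECT p.id,
                        GROUP_CONCAT(m.name SEPARATOR '|') AS material
                 FROM products p
                 JOIN materials m ON m.product_id = p.id
                 GROUP BY p.id">
    <!-- '|' chosen as a delimiter that never appears in the actual data -->
    <field column="material" splitBy="\|" />
  </entity>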

You'd need to consult a database expert for help with JOIN and GROUP_CONCAT.

Thanks,
Shawn



Re: FE Integration with JSON

2014-03-27 Thread Shawn Heisey
On 3/27/2014 2:11 AM, Bernhard Prange wrote:
 I am looking for a simple solution to construct a frontend search. The
 search provider just gave me a JSON Url.
 
 Anybody has a simple guide or some snippets for that?

There are no details here.  What specifically do you need help with?
Presumably you want help with Solr because you're on the solr-user
mailing list, but the only technology you've mentioned is JSON.

Let's say that you are wanting to add search to System X.  The first
question that comes to mind is:  What programming language is System X
written in?  The answer will make a big difference in where the
discussion goes.

Thanks,
Shawn



AW: Indexing parts of an HTML file differently

2014-03-27 Thread Michael Clivot
Thanks for your answer Jack.
@Gora:

 How are you fetching the HTML content, and indexing it into Solr?

We are using Solr with the OpenText Delivery Server. The Delivery Server
generates HTML representations of the published pages and writes them to the
directory which is used by Solr to get data content.

 It is probably best to handle this requirement at that point. Haven't used 
 Nutch ( http://nutch.apache.org/) recently, but you might be able to use it 
 for this.

Do you mean the web crawler way? At first glance, it does not fit us very
well. In that case we would need to implement the OpenText search layer
ourselves. Theoretically, we could try to teach the Delivery Server to
understand external indexes. But the crawling itself is not the preferred
solution - it is not as responsive as the DS way; with the existing
authorization restrictions, there would have to be a separate crawler user
for every role; etc.

-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Tuesday, 25 March 2014 11:32
To: solr-user@lucene.apache.org
Subject: Re: Indexing parts of an HTML file differently

On 25 March 2014 15:59, Michael Clivot cli...@netmedia.de wrote:
 Hello,

 I have the following issue and need help:

 One HTML file has different parts for different countries.
 For example:

 <!-- Country: FR, BE -->

 Address for France and Benelux

 <!-- Country End -->
 <!-- Country: CH -->

 Address for Switzerland

 <!-- Country End -->

 Depending on a parameter, I show or hide the parts on the website.
 Logically, all parts are in the index and therefore all items are found by
 Solr.
 My question is: how can I have only the items for the current country in my
 result list?

How are you fetching the HTML content, and indexing it into Solr?
It is probably best to handle this requirement at that point. Haven't used 
Nutch ( http://nutch.apache.org/ ) recently, but you might be able to use it 
for this.

Regards,
Gora


Re: FE Integration with JSON

2014-03-27 Thread Bernhard Prange

right :) Thanks Shawn.

It is the frontend of a webpage (HTML5).
The search provider offers me a URL where I get a Solr query result
(in JSON).

That's what I have.

What I need is a how-to for the UI rendering of this file (and the
search query functionality).

The Solr server is at a remote location.







Am 27.03.2014 09:25, schrieb Shawn Heisey:

On 3/27/2014 2:11 AM, Bernhard Prange wrote:

I am looking for a simple solution to construct a frontend search. The
search provider just gave me a JSON Url.

Anybody has a simple guide or some snippets for that?

There are no details here.  What specifically do you need help with?
Presumably you want help with Solr because you're on the solr-user
mailing list, but the only technology you've mentioned is JSON.

Let's say that you are wanting to add search to System X.  The first
question that comes to mind is:  What programming language is System X
written in?  The answer will make a big difference in where the
discussion goes.

Thanks,
Shawn






Re: FE Integration with JSON

2014-03-27 Thread Alexandre Rafalovitch
Still not enough details. But let me try to understand:

There is a third party provider. They are exposing Solr directly to
the internet and you have a particular query that returns Solr results
in JSON form.

You want to know if there are libraries/components that will know how
to parse that Solr JSON result and present it on a screen.

Is that about right? If so, there is one big issue to resolve before
wasting time on anything else.

Specifically, Solr should not be exposed directly to the web as it is
not built for security. Unless this third party provider is
specifically building some sort of hardened-hosted-Solr service, in
which case I am very curious to know who they are. Usually, there is a
middle-ware implementation that talks to Solr (like to a database) and
then sends domain-specific results to the client.

There is also a question of what features you are using. E.g. Facets?
Folding? Auto-complete? Etc.

Regards,
   Alex.


Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Mar 27, 2014 at 3:39 PM, Bernhard Prange
m...@bernhard-prange.de wrote:
 right :) Thanks Shawn.

 It is the frontend of a webpage (HTML5).
 The search provider offers me a URL where I get a Solr query result (in
 JSON).
 That's what I have.

 What I need is a how-to for the UI rendering of this file (and the search
 query functionality).
 The Solr server is at a remote location.







 Am 27.03.2014 09:25, schrieb Shawn Heisey:

 On 3/27/2014 2:11 AM, Bernhard Prange wrote:

 I am looking for a simple solution to construct a frontend search. The
 search provider just gave me a JSON Url.

 Anybody has a simple guide or some snippets for that?

 There are no details here.  What specifically do you need help with?
 Presumably you want help with Solr because you're on the solr-user
 mailing list, but the only technology you've mentioned is JSON.

 Let's say that you are wanting to add search to System X.  The first
 question that comes to mind is:  What programming language is System X
 written in?  The answer will make a big difference in where the
 discussion goes.

 Thanks,
 Shawn





Re: Indexing parts of an HTML file differently

2014-03-27 Thread Alexandre Rafalovitch
Can you get Delivery Server to generate Solr-style XML or JSON update
file? Might be easier than generating and then re-parsing HTML?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Mar 27, 2014 at 3:28 PM, Michael Clivot cli...@netmedia.de wrote:
 Thanks for your answer Jack.
 @Gora:

 How are you fetching the HTML content, and indexing it into Solr?

 We are using Solr with the OpenText Delivery Server. The Delivery Server
 generates HTML representations of the published pages and writes them to the
 directory which is used by Solr to get data content.

 It is probably best to handle this requirement at that point. Haven't used 
 Nutch ( http://nutch.apache.org/) recently, but you might be able to use it 
 for this.

 Do you mean the web crawler way? At first glance, it does not fit us very
 well. In that case we would need to implement the OpenText search layer
 ourselves. Theoretically, we could try to teach the Delivery Server to
 understand external indexes. But the crawling itself is not the preferred
 solution - it is not as responsive as the DS way; with the existing
 authorization restrictions, there would have to be a separate crawler user
 for every role; etc.

 -----Original Message-----
 From: Gora Mohanty [mailto:g...@mimirtech.com]
 Sent: Tuesday, 25 March 2014 11:32
 To: solr-user@lucene.apache.org
 Subject: Re: Indexing parts of an HTML file differently

 On 25 March 2014 15:59, Michael Clivot cli...@netmedia.de wrote:
 Hello,

 I have the following issue and need help:

 One HTML file has different parts for different countries.
 For example:

 <!-- Country: FR, BE -->

 Address for France and Benelux

 <!-- Country End -->
 <!-- Country: CH -->

 Address for Switzerland

 <!-- Country End -->

 Depending on a parameter, I show or hide the parts on the website.
 Logically, all parts are in the index and therefore all items are found by
 Solr.
 My question is: how can I have only the items for the current country in my
 result list?

 How are you fetching the HTML content, and indexing it into Solr?
 It is probably best to handle this requirement at that point. Haven't used 
 Nutch ( http://nutch.apache.org/ ) recently, but you might be able to use it 
 for this.

 Regards,
 Gora


dih data-config.xml onImportEnd event

2014-03-27 Thread Andreas Owen
I would like to call a URL after the import is finished, with the event
<document onImportEnd="...">. How can I do this?


Re: Facetting by field then query

2014-03-27 Thread Alvaro Cabrerizo
I don't think you can do it, as pivot faceting
(http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting)
doesn't let you use facet queries.  The closest query I can imagine is:


   - q=sentence:bar OR sentence:foo
   - facet=true
   - facet.pivot=media_id,sentence

At least the q will make faceting run only over those documents containing
foo or bar, but depending on the size of the sentence field you can get a
huge response.
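
Put together as a single request, that would look something like this (the
collection name is assumed):

  http://localhost:8983/solr/collection1/select?q=sentence:bar+OR+sentence:foo&facet=true&facet.pivot=media_id,sentence&wt=json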

Hope it helps.


On Wed, Mar 26, 2014 at 11:12 PM, David Larochelle 
dlaroche...@cyber.law.harvard.edu wrote:

 I have the following schema

 <field name="id" type="string" indexed="true" stored="true" required="true"
  multiValued="false" />
 <field name="media_id" type="int" indexed="true" stored="true"
  required="false" multiValued="false" />
 <field name="sentence" type="text_general" indexed="true" stored="true"
  termVectors="true" termPositions="true" termOffsets="true" />


 I'd like to be able to facet by a field and then by queries. i.e.


 facet_fields: {
   media_id: [
     1: { sentence:foo: 102410, sentence:bar: 29710 },
     2: { sentence:foo: 600, sentence:bar: 220 },
     3: { sentence:foo: 80, sentence:bar: 2330 }
   ]
 }


 However, when I try:

 http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true&facet=true&facet.query=sentence%3Afoo&facet.query=sentence%3Abar&facet.field=media_id

 the facet counts for the queries and media_id are listed separately rather
 than hierarchically.

 I realize that I could use 2 separate requests and programmatically combine
 the results but would much prefer to use a single Solr request.

 Is there any way to go this in Solr?

 Thanks in advance,


 David



Re: dih data-config.xml onImportEnd event

2014-03-27 Thread Alexandre Rafalovitch
I don't think there is one like that.

But you might be able to use a custom UpdateRequestProcessor? Or a
postCommit hook in solrconfig.xml

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Mar 27, 2014 at 3:58 PM, Andreas Owen a.o...@gmx.net wrote:
 I would like to call a URL after the import is finished, with the event
 <document onImportEnd="...">. How can I do this?


Re: dih data-config.xml onImportEnd event

2014-03-27 Thread Ahmet Arslan
Hi Andreas,

Here is a snippet you can use as a starting point.

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

public class MyEventListener implements EventListener {
  public void onEvent(Context ctx) {

    if (Context.DELTA_DUMP.equals(ctx.currentProcess())) {
      // do something, e.g. call a URL
    }

  }
}

http://wiki.apache.org/solr/DataImportHandler#EventListeners
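
The listener is then registered in data-config.xml via the attribute from
your question (the package name here is an assumption):

  <dataConfig>
    <document onImportEnd="com.example.MyEventListener">
      ...
    </document>
  </dataConfig>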


Ahmet



On Thursday, March 27, 2014 11:08 AM, Alexandre Rafalovitch 
arafa...@gmail.com wrote:
I don't think there is one like that.

But you might be able to use a custom UpdateRequestProcessor? Or a
postCommit hook in solrconfig.xml

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency



On Thu, Mar 27, 2014 at 3:58 PM, Andreas Owen a.o...@gmx.net wrote:
 I would like to call a URL after the import is finished, with the event
 <document onImportEnd="...">. How can I do this?



Re: dih data-config.xml onImportEnd event

2014-03-27 Thread Alexandre Rafalovitch
Oops. Ignore my email. I learnt something today that I have not seen
anybody else use.

Are there live open-source examples of the DIH EventListeners?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Mar 27, 2014 at 4:11 PM, Ahmet Arslan iori...@yahoo.com wrote:
 Hi Andreas,

 Here is a snippet you can use as a starting point.

 import org.apache.solr.handler.dataimport.Context;
 import org.apache.solr.handler.dataimport.EventListener;

 public class MyEventListener implements EventListener {
   public void onEvent(Context ctx) {

     if (Context.DELTA_DUMP.equals(ctx.currentProcess())) {
       // do something, e.g. call a URL
     }

   }
 }

 http://wiki.apache.org/solr/DataImportHandler#EventListeners


 Ahmet



 On Thursday, March 27, 2014 11:08 AM, Alexandre Rafalovitch 
 arafa...@gmail.com wrote:
 I don't think there is one like that.

 But you might be able to use a custom UpdateRequestProcessor? Or a
 postCommit hook in solrconfig.xml

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr 
 proficiency



 On Thu, Mar 27, 2014 at 3:58 PM, Andreas Owen a.o...@gmx.net wrote:
 I would like to call a URL after the import is finished, with the event
 <document onImportEnd="...">. How can I do this?



Re: MergingSolrIndexes not supported by SolrCloud?why?

2014-03-27 Thread rulinma
I used HDFS to test; that did not work.
I tried:
  (1) **/indexDir=hdfs://ip/solr/sample/data/index
  (2) **/indexDir=/solr/sample/data/index
Neither worked.

I also tried:
  (3) **/srcCore=sample
That did not work either.

Can you give me a successful example? Thanks!

When I insert data, index files appear in HDFS, so that part is OK, but the
index merge does not work.
Solr 4.4 and Cloudera HDFS.








facet doesnt display all possibilities after selecting one

2014-03-27 Thread Andreas Owen
When I select a facet in thema_f, all the others in the group disappear,
but the other facets keep their original findings. It seems like it should
work. Maybe the underscore is the wrong character for the separator?


example documents in index

<doc>
  <arr name="thema_f">
    <str>1_Produkte</str>
  </arr>
  <str name="id">dms:381</str>
</doc>
<doc>
  <arr name="thema_f">
    <str>1_Beratung</str>
    <str>1_Beratung_Beratungsportal PK</str>
  </arr>
  <str name="id">dms:2679</str>
</doc>
<doc>
  <arr name="thema_f">
    <str>1_Beratung</str>
    <str>1_Beratung_Beratungsportal PK</str>
  </arr>
  <str name="id">dms:190</str>
</doc>



solrconfig.xml

<requestHandler name="/select2" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">synonym_edismax</str>
    <str name="synonyms">true</str>
    <str name="qf">plain_text^10 editorschoice^200
        title^20 h_*^14
        tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
        productsegment^5 productgroup^5 contentmanager^5 links^5
        last_modified^5 url^5
    </str>
    <str name="bq">(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str>
    <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested -->

    <str name="df">text</str>
    <str name="fl">*,path,score</str>
    <str name="wt">json</str>
    <str name="q.op">AND</str>

    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">plain_text,title</str>
    <str name="hl.fragSize">200</str>
    <str name="hl.simple.pre">&lt;b&gt;</str>
    <str name="hl.simple.post">&lt;/b&gt;</str>

    <!-- <lst name="invariants"> -->
    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.missing">false</str>
    <str name="facet.field">{!ex=inhaltstyp_s}inhaltstyp_s</str>
    <str name="f.inhaltstyp_s.facet.sort">index</str>
    <str name="facet.field">{!ex=doctype}doctype</str>
    <str name="f.doctype.facet.sort">index</str>
    <str name="facet.field">{!ex=thema_f}thema_f</str>
    <str name="f.thema_f.facet.sort">index</str>
    <str name="facet.field">{!ex=productsegment_f}productsegment_f</str>
    <str name="f.productsegment_f.facet.sort">index</str>
    <str name="facet.field">{!ex=productgroup_f}productgroup_f</str>
    <str name="f.productgroup_f.facet.sort">index</str>
    <str name="facet.field">{!ex=author_s}author_s</str>
    <str name="f.author_s.facet.sort">index</str>
    <str name="facet.field">{!ex=sachverstaendiger_s}sachverstaendiger_s</str>
    <str name="f.sachverstaendiger_s.facet.sort">index</str>
    <str name="facet.field">{!ex=veranstaltung_s}veranstaltung_s</str>
    <str name="f.veranstaltung_s.facet.sort">index</str>
    <str name="facet.field">{!ex=kundensegment_aktive_beratung}kundensegment_aktive_beratung</str>
    <str name="f.kundensegment_aktive_beratung.facet.sort">index</str>
    <str name="facet.date">{!ex=last_modified}last_modified</str>
    <str name="facet.date.gap">+1MONTH</str>
    <str name="facet.date.end">NOW/MONTH+1MONTH</str>
    <str name="facet.date.start">NOW/MONTH-36MONTHS</str>
    <str name="facet.date.other">after</str>
  </lst>
</requestHandler>




schema.xml

<fieldType name="text_thema" class="solr.TextField"
 positionIncrementGap="100">

  <!-- <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="_"/>
  </analyzer> -->

  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>


dih data-config.xml onImportEnd event

2014-03-27 Thread Andreas Owen

I would like to call a URL after the import is finished, with the event
<document onImportEnd="...">. How can I do this?


Re: Facetting by field then query

2014-03-27 Thread David Santamauro


For pivot facets in SolrCloud, see
  https://issues.apache.org/jira/browse/SOLR-2894

Resolution: Unresolved
Fix Version/s 4.8

I am waiting patiently ...

On 03/27/2014 05:04 AM, Alvaro Cabrerizo wrote:

I don't think you can do it, as pivot faceting
(http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting)
doesn't let you use facet queries.  The closest query I can imagine is:


- q=sentence:bar OR sentence:foo
- facet=true
- facet.pivot=media_id,sentence

At least the q will make faceting run only over those documents containing
foo or bar, but depending on the size of the sentence field you can get a
huge response.

Hope it helps.


On Wed, Mar 26, 2014 at 11:12 PM, David Larochelle 
dlaroche...@cyber.law.harvard.edu wrote:


I have the following schema

<field name="id" type="string" indexed="true" stored="true" required="true"
 multiValued="false" />
<field name="media_id" type="int" indexed="true" stored="true"
 required="false" multiValued="false" />
<field name="sentence" type="text_general" indexed="true" stored="true"
 termVectors="true" termPositions="true" termOffsets="true" />


I'd like to be able to facet by a field and then by queries. i.e.


facet_fields: {
  media_id: [
    1: { sentence:foo: 102410, sentence:bar: 29710 },
    2: { sentence:foo: 600, sentence:bar: 220 },
    3: { sentence:foo: 80, sentence:bar: 2330 }
  ]
}


However, when I try:

http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true&facet=true&facet.query=sentence%3Afoo&facet.query=sentence%3Abar&facet.field=media_id

the facet counts for the queries and media_id are listed separately rather
than hierarchically.

I realize that I could use 2 separate requests and programmatically combine
the results but would much prefer to use a single Solr request.

Is there any way to go this in Solr?

Thanks in advance,


David







Re: dih data-config.xml onImportEnd event

2014-03-27 Thread Stefan Matheis
I would suggest you read the replies to your last mail (containing the very 
same question) first? 

-Stefan 


On Thursday, March 27, 2014 at 1:56 PM, Andreas Owen wrote:

 I would like to call a URL after the import is finished, with the event
 <document onImportEnd="...">. How can I do this?
 
 




Re: facet doesnt display all possibilities after selecting one

2014-03-27 Thread Yonik Seeley
On Thu, Mar 27, 2014 at 8:56 AM, Andreas Owen ao...@swissonline.ch wrote:
 when I select a facet in thema_f all the others in the group disappear

OK, I see you're excluding filters tagged with thema_f when faceting
on the thema_f field.

 str name=facet.field{!ex=thema_f}thema_f/str

Now all you should need to do is tag the right filter with that when
you select the facet.

fq={!tag=thema_f}thema_f:1_Beratung

http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
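
Since the /select2 defaults above already exclude that tag on the facet
field, the client then only needs to send the tagged filter, e.g. (a sketch;
the core name is assumed):

  http://localhost:8983/solr/collection1/select2?q=*:*&fq={!tag=thema_f}thema_f:1_Beratung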

-Yonik
http://heliosearch.org - solve Solr GC pauses with off-heap filters
and fieldcache




 but
 the other facets keep the original findings. It seems like it should work.
 Maybe the underscore is the wrong character for the separator?

 example documents in index

 <doc>
   <arr name="thema_f">
     <str>1_Produkte</str>
   </arr>
   <str name="id">dms:381</str>
 </doc>
 <doc>
   <arr name="thema_f">
     <str>1_Beratung</str>
     <str>1_Beratung_Beratungsportal PK</str>
   </arr>
   <str name="id">dms:2679</str>
 </doc>
 <doc>
   <arr name="thema_f">
     <str>1_Beratung</str>
     <str>1_Beratung_Beratungsportal PK</str>
   </arr>
   <str name="id">dms:190</str>
 </doc>



 solrconfig.xml

 <requestHandler name="/select2" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">10</int>
     <str name="defType">synonym_edismax</str>
     <str name="synonyms">true</str>
     <str name="qf">plain_text^10 editorschoice^200
         title^20 h_*^14
         tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
         productsegment^5 productgroup^5 contentmanager^5 links^5
         last_modified^5 url^5
     </str>
     <str name="bq">(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str>
     <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested -->

     <str name="df">text</str>
     <str name="fl">*,path,score</str>
     <str name="wt">json</str>
     <str name="q.op">AND</str>

     <!-- Highlighting defaults -->
     <str name="hl">on</str>
     <str name="hl.fl">plain_text,title</str>
     <str name="hl.fragSize">200</str>
     <str name="hl.simple.pre">&lt;b&gt;</str>
     <str name="hl.simple.post">&lt;/b&gt;</str>

     <!-- <lst name="invariants"> -->
     <str name="facet">on</str>
     <str name="facet.mincount">1</str>
     <str name="facet.missing">false</str>
     <str name="facet.field">{!ex=inhaltstyp_s}inhaltstyp_s</str>
     <str name="f.inhaltstyp_s.facet.sort">index</str>
     <str name="facet.field">{!ex=doctype}doctype</str>
     <str name="f.doctype.facet.sort">index</str>
     <str name="facet.field">{!ex=thema_f}thema_f</str>
     <str name="f.thema_f.facet.sort">index</str>
     <str name="facet.field">{!ex=productsegment_f}productsegment_f</str>
     <str name="f.productsegment_f.facet.sort">index</str>
     <str name="facet.field">{!ex=productgroup_f}productgroup_f</str>
     <str name="f.productgroup_f.facet.sort">index</str>
     <str name="facet.field">{!ex=author_s}author_s</str>
     <str name="f.author_s.facet.sort">index</str>
     <str name="facet.field">{!ex=sachverstaendiger_s}sachverstaendiger_s</str>
     <str name="f.sachverstaendiger_s.facet.sort">index</str>
     <str name="facet.field">{!ex=veranstaltung_s}veranstaltung_s</str>
     <str name="f.veranstaltung_s.facet.sort">index</str>
     <str name="facet.field">{!ex=kundensegment_aktive_beratung}kundensegment_aktive_beratung</str>
     <str name="f.kundensegment_aktive_beratung.facet.sort">index</str>
     <str name="facet.date">{!ex=last_modified}last_modified</str>
     <str name="facet.date.gap">+1MONTH</str>
     <str name="facet.date.end">NOW/MONTH+1MONTH</str>
     <str name="facet.date.start">NOW/MONTH-36MONTHS</str>
     <str name="facet.date.other">after</str>
   </lst>
 </requestHandler>




 schema.xml

 <fieldType name="text_thema" class="solr.TextField"
  positionIncrementGap="100">
   <!-- <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="_"/>
   </analyzer> -->

   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
   </analyzer>
 </fieldType>


Block until replication finishes

2014-03-27 Thread Fermin Silva
Hi,

we are moving to native replication with SOLR 3.5.1.
Because we want to control the replication from another program (a cron
job), we decided to curl the slave to issue a fetchIndex command.

The problem we have is that the curl returns immediately, while the
replication still goes in the background.
We need to know when the replication is done, and then resume the cron job.

Is there a way to block on the replication call until it's done similar to
waitForSearcher=true when committing ?
If not, what other possibilities we have?

Just in case, here is the solrconfig part in the slave (we pass masterUrl
in the curl url)

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl"></str>
  </lst>
</requestHandler>


Many thanks in advance

-- 
Fermin Silva


Please remove this thread.

2014-03-27 Thread Baruch
Hello Admin,

 Can you please remove this thread 
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/93279
There is no reason to have this thread live. 

Please and thank you.

Baruch!

Logging which client connected to Solr

2014-03-27 Thread Juha Haaga
Hello,

I’m investigating the possibility of logging the username of the client who did 
the search on Solr along with the normal logging information. The username is 
in the basic auth headers of the request, and the access control is managed by 
an Apache instance proxying to Solr. Is there a way to append that information 
to the Solr query log, so that the log would look like this:

INFO  - 2014-03-27 11:16:24.000; org.apache.solr.core.SolrCore; [generic] 
webapp=/solr path=/select params={lots of params} hits=0 status=0 QTime=49 
username=juha

I need to log both username and the query, and if I do it directly in Apache 
then I lose the information about amount of hits and the query time. If I log 
it with Solr then I get query time and hits, but no username. Username logging 
is higher priority requirement than the hits and query time, but I’m looking 
for solution that covers both cases. 

Has anyone implemented this kind of logging scheme, and how would I accomplish 
this? I couldn’t find this as a configuration option.

Regards,
Juha







Re: Logging which client connected to Solr

2014-03-27 Thread Greg Walters
We do something similar and include the server's hostname in solr's response. 
To accomplish this you'll have to write a class that extends 
org.apache.solr.servlet.SolrDispatchFilter and put your custom class in place 
as the SolrRequestFilter in solr's web.xml.
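
A minimal sketch of such a filter, assuming the Apache proxy forwards the
authenticated user in a request header (the header, class and MDC key names
are all assumptions):

  import java.io.IOException;
  import javax.servlet.FilterChain;
  import javax.servlet.ServletException;
  import javax.servlet.ServletRequest;
  import javax.servlet.ServletResponse;
  import javax.servlet.http.HttpServletRequest;
  import org.apache.solr.servlet.SolrDispatchFilter;
  import org.slf4j.MDC;

  public class UsernameLoggingDispatchFilter extends SolrDispatchFilter {
    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
      String user = null;
      if (request instanceof HttpServletRequest) {
        // configure Apache to set this header from the basic auth user
        user = ((HttpServletRequest) request).getHeader("X-Forwarded-User");
      }
      if (user != null) {
        MDC.put("username", user);  // exposed to log4j as %X{username}
      }
      try {
        super.doFilter(request, response, chain);
      } finally {
        MDC.remove("username");
      }
    }
  }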

Thanks,
Greg

On Mar 27, 2014, at 8:59 AM, Juha Haaga juha.ha...@codenomicon.com wrote:

 Hello,
 
 I’m investigating the possibility of logging the username of the client who 
 did the search on Solr along with the normal logging information. The 
 username is in the basic auth headers of the request, and the access control 
 is managed by an Apache instance proxying to Solr. Is there a way to append 
 that information to the Solr query log, so that the log would look like this:
 
 INFO  - 2014-03-27 11:16:24.000; org.apache.solr.core.SolrCore; [generic] 
 webapp=/solr path=/select params={lots of params} hits=0 status=0 QTime=49 
 username=juha
 
 I need to log both username and the query, and if I do it directly in Apache 
 then I lose the information about amount of hits and the query time. If I log 
 it with Solr then I get query time and hits, but no username. Username 
 logging is higher priority requirement than the hits and query time, but I’m 
 looking for solution that covers both cases. 
 
 Has anyone implemented this kind of logging scheme, and how would I 
 accomplish this? I couldn’t find this as a configuration option.
 
 Regards,
 Juha
 
 
 
 
 



[ANN] Solr in Action book release (Solr 4.7)

2014-03-27 Thread Trey Grainger
I'm excited to announce the final print release of *Solr in Action*, the
newest Solr book by Manning publications covering through Solr 4.7 (the
current version). The book is available for immediate purchase in print and
ebook formats, and the *outline*, some *free chapters* as well as the *full
source code are also available* at http://solrinaction.com.

I would love it if you would check the book out, and I would also
appreciate your feedback on it, especially if you find the book to be a
useful guide as you are working with Solr! Timothy Potter and I (Trey
Grainger) worked tirelessly on the book for nearly 2 years to bring you a
thorough (664 pg.) and fantastic example-driven guide to the best Solr has
to offer.

*Solr in Action* is intentionally designed to be a learning guide as
opposed to a reference manual. It builds from an initial introduction to
Solr all the way to advanced topics such as implementing a predictive
search experience, writing your own Solr plugins for function queries and
multilingual text analysis, using Solr for big data analytics, and even
building your own Solr-based recommendation engine. The book uses fun
real-world examples, including analyzing the text of tweets, searching and
faceting on restaurants, grouping similar items in an ecommerce
application, highlighting interesting keywords in UFO sighting reports, and
even building a personalized job search experience.

For a more detailed write-up about the book and its contents, you can also
visit the Solr homepage at
https://lucene.apache.org/solr/books.html#solr-in-action. Thanks in advance
for checking it out, and I really hope many of you find the book to be
personally useful!

All the best,

Trey Grainger
Co-author,
*Solr in Action*
Director of Engineering, Search & Analytics @CareerBuilder


Re: Logging which client connected to Solr

2014-03-27 Thread Jeff Wartes

You could always just pass the username as part of the GET params for the
query. Solr will faithfully ignore and log any parameters it doesn't
recognize, so it'd show up in your {lots of params}.
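
For example (a sketch; the parameter name is arbitrary, since Solr simply
ignores and logs unknown parameters):

  http://localhost:8983/solr/collection1/select?q=foo&username=juha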

That means your log parser would need more intelligence, and your client
would have to pass in the data, but it would save any custom work on the
server side.



On 3/27/14, 7:07 AM, Greg Walters greg.walt...@answers.com wrote:

We do something similar and include the server's hostname in solr's
response. To accomplish this you'll have to write a class that extends
org.apache.solr.servlet.SolrDispatchFilter and put your custom class in
place as the SolrRequestFilter in solr's web.xml.

Thanks,
Greg

On Mar 27, 2014, at 8:59 AM, Juha Haaga juha.ha...@codenomicon.com
wrote:

 Hello,
 
 I’m investigating the possibility of logging the username of the client
who did the search on Solr along with the normal logging information.
The username is in the basic auth headers of the request, and the access
control is managed by an Apache instance proxying to Solr. Is there a
way to append that information to the Solr query log, so that the log
would look like this:
 
 INFO  - 2014-03-27 11:16:24.000; org.apache.solr.core.SolrCore;
[generic] webapp=/solr path=/select params={lots of params} hits=0
status=0 QTime=49 username=juha
 
 I need to log both username and the query, and if I do it directly in
Apache then I lose the information about amount of hits and the query
time. If I log it with Solr then I get query time and hits, but no
username. Username logging is higher priority requirement than the hits
and query time, but I’m looking for a solution that covers both cases.
 
 Has anyone implemented this kind of logging scheme, and how would I
accomplish this? I couldn’t find this as a configuration option.
 
 Regards,
 Juha
 
 
 
 
 




Timeout when deleting collections or aliases in Solr 4.6.1

2014-03-27 Thread Dave Seltzer
I'm trying to delete some data on a 12 node Solr cloud environment. The
cluster is running Solr 4.6.1.

When I try to delete an alias the collections api returns:

org.apache.solr.common.SolrException: deletealias the collection time
out:60s at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:204)
at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:185)
at
org.apache.solr.handler.admin.CollectionsHandler.handleDeleteAliasAction(CollectionsHandler.java:274)
at
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:154)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:673)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368) at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:744)

It doesn't seem to matter which server in the cluster I run this on.
There's no data being inserted and no queries being performed.

I have the same problem whether I attempt to remove a collection or a
collection alias. How do I find the source of this problem?

Many Thanks,

-D


Re: [ANN] Solr in Action book release (Solr 4.7)

2014-03-27 Thread Mark Miller
Nice, Congrats!
-- 
Mark Miller
about.me/markrmiller

On March 27, 2014 at 11:17:49 AM, Trey Grainger (solrt...@gmail.com) wrote:

I'm excited to announce the final print release of *Solr in Action*, the  
newest Solr book by Manning publications covering through Solr 4.7 (the  
current version). The book is available for immediate purchase in print and  
ebook formats, and the *outline*, some *free chapters* as well as the *full  
source code are also available* at http://solrinaction.com.  

I would love it if you would check the book out, and I would also  
appreciate your feedback on it, especially if you find the book to be a  
useful guide as you are working with Solr! Timothy Potter and I (Trey  
Grainger) worked tirelessly on the book for nearly 2 years to bring you a  
thorough (664 pg.) and fantastic example-driven guide to the best Solr has  
to offer.  

*Solr in Action* is intentionally designed to be a learning guide as  
opposed to a reference manual. It builds from an initial introduction to  
Solr all the way to advanced topics such as implementing a predictive  
search experience, writing your own Solr plugins for function queries and  
multilingual text analysis, using Solr for big data analytics, and even  
building your own Solr-based recommendation engine. The book uses fun  
real-world examples, including analyzing the text of tweets, searching and  
faceting on restaurants, grouping similar items in an ecommerce  
application, highlighting interesting keywords in UFO sighting reports, and  
even building a personalized job search experience.  

For a more detailed write-up about the book and it's contents, you can also  
visit the Solr homepage at  
https://lucene.apache.org/solr/books.html#solr-in-action. Thanks in advance  
for checking it out, and I really hope many of you find the book to be  
personally useful!  

All the best,  

Trey Grainger  
Co-author,  
*Solr in Action*
Director of Engineering, Search & Analytics @CareerBuilder


Re: Please remove this thread.

2014-03-27 Thread Shawn Heisey
On 3/27/2014 7:37 AM, Baruch wrote:
  Can you please remove this thread 
 http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/93279
 There is no reason to have this thread live. 

This is an Apache mailing list.  Apache almost never honors requests to
remove anything from its mailing list archive.

http://www.apache.org/foundation/public-archives.html

The URL that you used to indicate what to remove illustrates one of the
main reasons why Apache's policy exists:  You linked to gmane.org, a
site that Apache does not control.  This list is also mirrored in the
nabble.com forums, and several other places.  Apache cannot make changes
to any of them except its own archive at http://mail-archives.apache.org/ .

Thanks,
Shawn



WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-03-27 Thread Malte Hübner
I am using Solr 4.7 and have got a serious problem with
WordDelimiterFilterFactory.

WordDelimiterFilterFactory behaves differently on hyphenated terms depending
on whether they contain only characters (a-Z) or characters AND numbers.

Splitting up hyphenated terms is deactivated in my configuration.

*This is the fieldType setup from my schema:*

{code}
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0"
            generateWordParts="0" generateNumberParts="0" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
            splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="lang/synonyms_de.txt"
            ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="1" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}



The given search term is: *X-002-99-495*

WordDelimiterFilterFactory indexes the following word parts:

* X-002-99-495
* X (shouldn't be there)
* 00299495 (shouldn't be there)
* X00299495

But the 'X' should not be indexed or queried as a single term. You can see
that splitting is completely deactivated in the schema.

I can move the character part around in the search term:

Searching for *002-abc-99-495* gives me

* 002-abc-99-495
* 002 (shouldn't be there)
* abc (shouldn't be there)
* 99495 (shouldn't be there)
* 002abc99495

Searching for *002-99-495* (no character part) gives me

* 002-99-495
* 00299495

This result is what I would expect.

Any ideas?


Re: Logging which client connected to Solr

2014-03-27 Thread Alexandre Rafalovitch
I assume you are passing extra info to Solr.

Then you can write a servlet filter to put it in the NDC or MDC, which can
then be picked up by the log4j config pattern.

This approach is not Solr specific. Just usual servlet/log stuff.
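
On the log4j side, that might look like this (a sketch, assuming log4j 1.2
as shipped with Solr 4.3+ and an MDC key named "username"):

  log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m username=%X{username}%n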

Regards,
 Alex
On 27/03/2014 9:00 pm, Juha Haaga juha.ha...@codenomicon.com wrote:

 Hello,

 I’m investigating the possibility of logging the username of the client
 who did the search on Solr along with the normal logging information. The
 username is in the basic auth headers of the request, and the access
 control is managed by an Apache instance proxying to Solr. Is there a way
 to append that information to the Solr query log, so that the log would
 look like this:

 INFO  - 2014-03-27 11:16:24.000; org.apache.solr.core.SolrCore; [generic]
 webapp=/solr path=/select params={lots of params} hits=0 status=0 QTime=49
 username=juha

 I need to log both username and the query, and if I do it directly in
 Apache then I lose the information about amount of hits and the query time.
 If I log it with Solr then I get query time and hits, but no username.
 Username logging is higher priority requirement than the hits and query
 time, but I’m looking for solution that covers both cases.

 Has anyone implemented this kind of logging scheme, and how would I
 accomplish this? I couldn’t find this as a configuration option.

 Regards,
 Juha








timeAllowed query parameter not working?

2014-03-27 Thread Mario-Leander Reimer
Hi Solr users,



currently I have some really long-running, user-entered pure wildcard
queries (like *??); these are hogging the CPU for several minutes.

So what I tried is setting the timeAllowed query parameter via the search
handler in solrconfig.xml. But without any luck - the parameter does not
seem to be working. Here is my search handler definition:

<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="df">TEXT</str>
    <int name="timeAllowed">1</int>
  </lst>
</requestHandler>



Thanks for your help!

Leander


Re: Block until replication finishes

2014-03-27 Thread Chris W
Hi

 You can use the details command to check the status of replication:
http://localhost:8983/solr/core_name/replication?command=details

The command returns XML output; look for the isReplicating field in it.
Keep running the command in a loop until the flag becomes false - that's
when you know it's done. I would also recommend you check the number of
docs at source/destination after the replication, to be sure.
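
A minimal polling sketch (assumptions: core name and URL, and naive string
matching instead of real XML parsing):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.URL;

  public class WaitForReplication {
    public static void main(String[] args) throws Exception {
      URL url = new URL(
          "http://localhost:8983/solr/core_name/replication?command=details");
      while (true) {
        StringBuilder body = new StringBuilder();
        BufferedReader in =
            new BufferedReader(new InputStreamReader(url.openStream()));
        try {
          String line;
          while ((line = in.readLine()) != null) body.append(line);
        } finally {
          in.close();
        }
        // the details response contains <str name="isReplicating">true|false</str>
        if (body.indexOf("<str name=\"isReplicating\">false</str>") >= 0) break;
        Thread.sleep(5000);  // poll every 5 seconds
      }
      System.out.println("Replication finished.");
    }
  }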


HTH




On Thu, Mar 27, 2014 at 6:35 AM, Fermin Silva ferm...@olx.com wrote:

 Hi,

 we are moving to native replication with SOLR 3.5.1.
 Because we want to control the replication from another program (a cron
 job), we decided to curl the slave to issue a fetchIndex command.

 The problem we have is that the curl returns immediately, while the
 replication still goes in the background.
 We need to know when the replication is done, and then resume the cron job.

 Is there a way to block on the replication call until it's done similar to
 waitForSearcher=true when committing ?
 If not, what other possibilities we have?

 Just in case, here is the solrconfig part in the slave (we pass masterUrl
 in the curl url)

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="slave">
     <str name="masterUrl"></str>
   </lst>
 </requestHandler>


 Many thanks in advance

 --
 Fermin Silva




-- 
Best
-- 
C


Re: New to Solr can someone help me to know if Solr fits my use case

2014-03-27 Thread Saurabh Agarwal
Can anyone help me, please?

Hi All,

I am new to Solr, and from initial reading I am quite convinced Solr
will be of great help. Can anyone help in making that decision?

Use case:
1. I will have PDF/Word docs generated daily/weekly (lots of them),
which kind of get overwritten frequently.
2. I have a dictionary kind of thing (a list of which words/short
sentences should be part of the above docs, words which cannot be, and
alternatives for some).
3. Now I want Solr to search the docs produced in step 1 for the
words/short sentences from step 2 and give me the doc name/line number
in which they occur.

Will Solr be a good help to me? If anybody can give some examples,
that would be great.

Appreciate your help and patience.

Thanks
Saurabh


Re: [ANN] Solr in Action book release (Solr 4.7)

2014-03-27 Thread Trey Grainger
Hi Philippe,

Yes - if you've purchased the eBook then the PDF is available now, and the
other formats (ePub and Kindle) are supposed to be available for download
on April 8th.
It's also worth mentioning that the eBook formats are all available for
free with the purchase of the print book.

Best regards,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @CareerBuilder


On Thu, Mar 27, 2014 at 12:04 PM, Philippe Soares soa...@genomequest.com
wrote:

 Thanks Trey !
 I just tried to download my copy from my manning account, and this final
 version appears only in PDF format.
 Any idea about when they'll release the other formats ?


Re: Searching multivalue fields.

2014-03-27 Thread Jack Krupansky
Sounds good... for Lucene users, but for Solr users... sounds like a Jira is 
needed.


-- Jack Krupansky

-Original Message- 
From: Ahmet Arslan

Sent: Wednesday, March 26, 2014 4:54 PM
To: solr-user@lucene.apache.org ; kokatnur.vi...@gmail.com
Subject: Re: Searching multivalue fields.

Hi Vijay,

After reading the documentation, it seems that the following query is what
you are after. It will return OrderId:345 without matching OrderId:123:


SpanQuery q1 = new SpanTermQuery(new Term("BookingRecordId", "234"));
SpanQuery q2 = new SpanTermQuery(new Term("OrderLineType", "11"));
SpanQuery q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId");
Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);

Ahmet



On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan iori...@yahoo.com 
wrote:

Hi Vijay,

I personally don't understand joins very well. Just a guess: maybe
FieldMaskingSpanQuery could be used?


http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html


Ahmet




On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur 
kokatnur.vi...@gmail.com wrote:

Hi,

I am bumping this thread again one last time to see if anyone has a
solution.

In its current state, our application is storing child items as multivalued
fields.  Consider some orders, for example -


{
OrderId:123
BookingRecordId : [145, 987, *234*]
OrderLineType : [11, 12, *13*]
.
}
{
OrderId:345
BookingRecordId : [945, 882, *234*]
OrderLineType : [1, 12, *11*]
.
}
{
OrderId:678
BookingRecordId : [444]
OrderLineType : [11]
.
}


Here, if you look up an order with BookingRecordId:234 AND
OrderLineType:11, you will get two orders, with OrderId 123 and 345,
which is correct.  You have two arrays in both orders that satisfy this
condition.

However, for OrderId:123, the value at the 3rd index of the OrderLineType
array is 13 and not 11 (it is 11 only for OrderId:345).  So OrderId 123
should be excluded. This is what I am trying to achieve.

I got some suggestions from a solr-user to use FieldCollapsing, Join,
Block-join or string concatenation.  None of these approaches can be used
without re-indexing against a changed schema.

Has anyone found a non-invasive solution for this?

Thanks,

-Vijay 



What are my options?

2014-03-27 Thread Software Dev
We have a collection named items. These are simply products that we
sell. A large part of our scoring involves boosting on certain metrics
for each product (amount sold, total GMS, ratings, etc). Some of these
metrics are actually split across multiple tables.

We are currently re-indexing the complete document anytime any of
these values changes. I'm wondering if there is a better way?

Some ideas:

1) Partial update the document. Is this even possible?
2) Add a parent-child relationship on Item and its metrics?
3) Dump all metrics to a file and use that as it changes throughout
the day? I forgot the actual component that does it. Either way, can
it handle multiple values?
4) Something else?

I appreciate any feedback. Thanks


Re: What are my options?

2014-03-27 Thread Jack Krupansky
Consider DataStax Enterprise - a true real-time database with rich search 
(Cassandra plus Solr).


-- Jack Krupansky

-Original Message- 
From: Software Dev

Sent: Thursday, March 27, 2014 1:11 PM
To: solr-user@lucene.apache.org
Subject: What are my options?

We have a collection named items. These are simply products that we
sell. A large part of our scoring involves boosting on certain metrics
for each product (amount sold, total GMS, ratings, etc). Some of these
metrics are actually split across multiple tables.

We are currently re-indexing the complete document anytime any of
these values changes. I'm wondering if there is a better way?

Some ideas:

1) Partial update the document. Is this even possible?
2) Add a parent-child relationship on Item and its metrics?
3) Dump all metrics to a file and use that as it changes throughout
the day? I forgot the actual component that does it. Either way, can
it handle multiple values?
4) Something else?

I appreciate any feedback. Thanks 



Re: stored=true vs stored=false, in terms of storage

2014-03-27 Thread Jack Krupansky
You can consider DocValues as well. There you can control whether they ever 
use heap memory or only file space.


See:
https://cwiki.apache.org/confluence/display/solr/DocValues

-- Jack Krupansky

-Original Message- 
From: Pramod Negi

Sent: Wednesday, March 26, 2014 1:27 PM
To: solr-user@lucene.apache.org
Subject: stored=true vs stored=false, in terms of storage

Hi,

I am using Solr and I have one doubt.

If a field has stored=false, does that mean this field is stored on disk
and not in main memory, and will be loaded whenever asked?

The scenario I would like to handle: in my case there is a lot of
information which I need to show when debugQuery=true, so I can take the
latency hit on debugQuery=true.

Can I save all the information in a field with indexed=false and
stored=true?

And how is debug information normally saved?


Regards,
Pramod Negi 



RE: Solr 4.3.1 memory swapping

2014-03-27 Thread Darrell Burgan
Thanks for the advice Shawn - gives me a direction to head. My next step is 
probably to update the operating system and the JVM to see if the behavior 
changes. If not, I'll pull in Red Hat support.
Thanks,
Darrell


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Thursday, March 27, 2014 2:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.1 memory swapping

On 3/26/2014 10:26 PM, Darrell Burgan wrote:
 Okay well it didn't take long for the swapping to start happening on one of 
 our nodes.  Here is a screen shot of the Solr console:
 
 https://s3-us-west-2.amazonaws.com/panswers-darrell/solr.png
 
 And here is a shot of top, with processes sorted by VIRT:
 
 https://s3-us-west-2.amazonaws.com/panswers-darrell/top.png
 
 As shown, we have used up more than 25% of the swap space, over 1GB, even 
 though there is 16GB of OS RAM available, and the Solr JVM has been allocated 
 only 10GB. Further, we're only consuming 1.5/4GB of the 10GB of JVM heap.
 
 Top shows that the Solr process 21582 is using 2.4GB resident but has a 
 virtual size of 82.4GB. Presumably that virtual size is due to the memory 
 mapped file. The other Java process 27619 is Zookeeper.
 
 So my question remains - why did we use any swap space at all? Doesn't 
 seem like we're experiencing memory pressure at the moment ... I'm 
 confused.  :-)

The virtual memory value is indeed that large because of the mmapped file.

There is definitely something wrong here.  I don't know whether it's Java, 
RHEL, or something strange with the S3 virtual machine, possibly a bad 
interaction with the older kernel.  With your -Xmx value, Java should never use 
more than about 10.5 GB of physical memory, and the top output indicates that 
it's only using 2.4GB of memory.  13GB is used by the OS disk cache.

You might notice that I'm not mentioning Solr in the list of possible problems. 
 This is because an unmodified Solr install only utilizes the Java heap, so 
it's Java that is in charge of allocating memory from the operating system.

Here is a script that will tell you what's using swap and how much.
This will let you be absolutely sure about whether or not Java is the problem 
child:

http://stackoverflow.com/a/7180078/2665648

There are instructions in the comments of the script for sorting the output.

The only major thing I saw in your JVM config (aside from perhaps reducing the 
max heap) that I would change is the garbage collector tuning.  I'm the 
original author mentioned in this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems



Here's a screenshot from my dev solr server, where you can see that there is 
zero swap usage:

https://www.dropbox.com/s/mftgi3q2hn7w9qp/solr-centos6-top.png

This is a baremetal server with 16GB of RAM, running CentOS 6.5 and a 
pre-release snapshot of Solr 4.7.1.  With an Intel Xeon X3430, I'm pretty sure 
the processor architecture is NUMA, but the motherboard only has one CPU slot, 
so it's only got one NUMA node.  As you can see by my virtual memory value, I 
have a lot more index data on this machine than you have on yours.  My heap is 
7GB.  The other three java processes that you can see running are in-house 
software related to Solr.

Performance is fairly slow with that much index and so little disk cache, but 
it's a dev server.  The production environment has plenty of RAM to cache the 
entire index.

Thanks,
Shawn



Re: dih data-config.xml onImportEnd event

2014-03-27 Thread Andreas Owen

sorry, the previous conversation was started with a wrong email address.

On Thu, 27 Mar 2014 14:06:57 +0100, Stefan Matheis  
matheis.ste...@gmail.com wrote:


I would suggest you read the replies to your last mail (containing the  
very same question) first?


-Stefan


On Thursday, March 27, 2014 at 1:56 PM, Andreas Owen wrote:


i would like to call a url after the import is finished with the event
attribute <document onImportEnd="...">. how can i do this?
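
For what it's worth, DIH exposes this through its event-listener hook: the
value of onImportEnd names a class implementing
org.apache.solr.handler.dataimport.EventListener, whose onEvent(Context)
method could make the HTTP call. A rough sketch, with
com.example.NotifyUrlListener as a hypothetical class:

<dataConfig>
  <document onImportEnd="com.example.NotifyUrlListener">
    ...
  </document>
</dataConfig>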








--
Using Opera's mail client: http://www.opera.com/mail/


RE: timeAllowed query parameter not working?

2014-03-27 Thread Michael Ryan
Unfortunately the timeAllowed parameter doesn't apply to the part of the 
processing that makes wildcard queries so slow. It only applies to a later part 
of the processing when the matching documents are being collected. There's some 
discussion in the original ticket that implemented this 
(https://issues.apache.org/jira/browse/SOLR-502). I'm not sure if there's a 
newer ticket for implementing an end-to-end timeout.
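
When timeAllowed does take effect, the response is truncated rather than 
failed; if I remember correctly, Solr flags it in the response header, 
roughly like this:

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1004</int>
  <bool name="partialResults">true</bool>
</lst>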

-Michael

-Original Message-
From: Mario-Leander Reimer [mailto:mario-leander.rei...@qaware.de] 
Sent: Thursday, March 27, 2014 12:15 PM
To: solr-user@lucene.apache.org
Subject: timeAllowed query parameter not working?

Hi Solr users,



currently I have some really long-running, user-entered pure wildcard 
queries (like *??); these are hogging the CPU for several minutes.

So what I tried is setting the timeAllowed query parameter via the search 
handler in solrconfig.xml. But no luck: the parameter does not seem to be 
working. Here is my search handler definition:

<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="df">TEXT</str>
    <int name="timeAllowed">1</int>
  </lst>
</requestHandler>



Thanks for your help!

Leander


Stats Filter Exclusion Throwing Error

2014-03-27 Thread Harish Agarwal
I'm using the latest nightly build of 4.8 and testing this patch:

https://issues.apache.org/jira/browse/SOLR-3177

using this set of fq / stats.field query params:

fq={!tag=INTEGER_4}INTEGER_4:(2)&stats.field={!ex=INTEGER_4}INTEGER_4

with Solr throwing the following error:

ERROR - 2014-03-27 16:13:12.164; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: undefined field: {!ex=INTEGER_4}INTEGER_4
    at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1172)
    at org.apache.solr.handler.component.StatsInfo.parse(StatsComponent.java:190)
    at org.apache.solr.handler.component.StatsComponent.modifyRequest(StatsComponent.java:97)
    at org.apache.solr.handler.component.ResponseBuilder.addRequest(ResponseBuilder.java:147)
    at org.apache.solr.handler.component.QueryComponent.createMainQuery(QueryComponent.java:816)
    at org.apache.solr.handler.component.QueryComponent.regularDistributedProcess(QueryComponent.java:649)
    at org.apache.solr.handler.component.QueryComponent.distributedProcess(QueryComponent.java:602)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:253)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1939)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1805)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)



Have I got the syntax wrong?


SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2014-03-27 Thread Rishi Easwaran
All,

I am running SOLR Cloud 4.6; everything looks OK except for this WARN message 
appearing constantly in the logs.


2014-03-27 17:09:03,982 WARN  [commitScheduler-15-thread-1] [] SolrCore - 
[index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2014-03-27 17:09:05,517 WARN  [commitScheduler-15-thread-1] [] SolrCore - 
[index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2014-03-27 17:09:06,774 WARN  [commitScheduler-15-thread-1] [] SolrCore - 
[index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2014-03-27 17:09:08,085 WARN  [commitScheduler-15-thread-1] [] SolrCore - 
[index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2014-03-27 17:09:09,114 WARN  [commitScheduler-15-thread-1] [] SolrCore - 
[index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2014-03-27 17:09:10,238 WARN  [commitScheduler-15-thread-1] [] SolrCore - 
[index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Searched around a bit; my solrconfig.xml looks configured fine, and I verified 
there are no explicit commits sent by our clients.

My solrconfig.xml:

<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>6</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>


Any idea why it's warning every second? The only config with a 1-second 
interval is the soft commit.
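
For reference, I believe the ceiling involved is maxWarmingSearchers in 
solrconfig.xml; the stock example config ships with:

<maxWarmingSearchers>2</maxWarmingSearchers>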

Thanks,
Rishi.



Re: DIH dataimport.properties Zulu time

2014-03-27 Thread Kiran J
Thank you for the response. This works if I invoke start.jar with java. In
my use case, however, I need to invoke start.jar directly (a consoleless
service so that the user cannot close it accidentally). It doesn't pick up
the user.timezone property when invoked this way. Is it possible to do this
using the tag below somehow? I tried setting locale=UTC and it didn't work.

<propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss"
    type="SimplePropertiesWriter" directory="data"
    filename="my_dih.properties" locale="en_US" />



On Tue, Mar 25, 2014 at 7:45 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 26 March 2014 02:44, Kiran J kiranjuni...@gmail.com wrote:
 
  Hi
 
  Is it possible to set up the data import handler so that it keeps track
 of
  the last imported time in Zulu time and not local time ?
 [...]

 Start your JVM with the desired timezone, e.g.,
 java -Duser.timezone=UTC -jar start.jar

 Regards,
 Gora



Re: [ANN] Solr in Action book release (Solr 4.7)

2014-03-27 Thread Jagat Singh
Many congrats!

600+ pages; I can feel the tireless two years of hard work behind it.



On Fri, Mar 28, 2014 at 4:04 AM, Trey Grainger solrt...@gmail.com wrote:

 Hi Philippe,

 Yes, if you've purchased the eBook then the PDF is available now, and the
 other formats (ePub and Kindle) are supposed to be available for download
 on April 8th.
 It's also worth mentioning that the eBook formats are all available for
 free with the purchase of the print book.

 Best regards,

 Trey Grainger
 Co-author, Solr in Action
 Director of Engineering, Search & Analytics @CareerBuilder


 On Thu, Mar 27, 2014 at 12:04 PM, Philippe Soares soa...@genomequest.com
 wrote:
 
  Thanks Trey!
  I just tried to download my copy from my Manning account, and this final
  version appears only in PDF format.
  Any idea about when they'll release the other formats?



Re: Multiple Languages in Same Core

2014-03-27 Thread Trey Grainger
In addition to the two approaches Liu Bo mentioned (separate core per
language and separate field per language), it is also possible to put
multiple languages in a single field. This saves you the overhead of
multiple cores and of having to search across multiple fields at query
time. The idea here is that you can run multiple analyzers (i.e. one for
German, one for English, one for Chinese, etc.) and stack the outputted
TokenStreams for each of these within a single field. It is also possible
to swap out the languages you want to use on a case-by-case basis (i.e.
per-document, per field, or even per word) if you really need to for
advanced use cases.

All three of these methods, including code examples and the pros and cons
of each are discussed in the Multilingual Search chapter of Solr in Action,
which Alexandre referenced. If you don't have the book, you can also just
download and run the code examples for free, though they may be harder to
follow without the context from the book.

Thanks,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @CareerBuilder





On Wed, Mar 26, 2014 at 4:34 AM, Liu Bo diabl...@gmail.com wrote:

 Hi Jeremy

 There are a lot of multi-language discussions, with two main approaches:
  1. like yours, one core per language
  2. all in one core, with each language in its own field

 We have multi-language support in a single core; each multilingual field
 has its own suffix, such as name_en_US. We customized the query handler to
 hide the query details from the client.
 The main reason we do this is NRT indexing and search. Take a product,
 for example:

 a product has price and quantity, which are common fields used for
 filtering and sorting, while name and description are multi-language
 fields. If we split products into different cores, updating a common
 field may end up as an update in all of the multi-language cores.

 As to scalability, we don't change Solr cores/collections when a new
 language is added, but we probably need to update our customized indexing
 process and run a full re-index.

 This approach suits our requirements for now, but you may have your own
 concerns.

 We have a suggest-filter problem similar to yours: we want to return
 suggest results filtered by store. I can't find a way to build the
 dictionary with a query in my version, Solr 4.6.

 What I do is run a query on an N-gram-analyzed field with filter queries
 on the store_id field. The suggestion is actually a query. It may not
 perform as well as the suggester, but it can do the trick.

 You can try building an additional N-gram field for suggestions only and
 searching on it with an fq on your locale field, as sketched below.
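
 A rough sketch of such a field type (all names illustrative):

 <fieldType name="suggest_text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 and then query it with something like q=suggest_text:ca&fq=locale:en_US.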

 All the best

 Liu Bo




 On 25 March 2014 09:15, Alexandre Rafalovitch arafa...@gmail.com wrote:

  Solr In Action has a significant discussion on the multi-lingual
  approach. They also have some code samples out there. Might be worth a
  look
 
  Regards,
 Alex.
  Personal website: http://www.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all
  at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
  book)
 
 
  On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson
  jer...@thomersonfamily.com wrote:
   I recently deployed Solr to back the site search feature of a site I
 work
   on. The site itself is available in hundreds of languages. With the
  initial
   release of site search we have enabled the feature for ten of those
   languages. This is distributed across eight cores, with two Chinese
   languages plus Korean combined into one CJK core and each of the other
   seven languages in their own individual cores. The reason for splitting
   these into separate cores was so that we could have the same field
 names
   across all cores but have different configuration for analyzers, etc,
 per
   core.
  
   Now I have some questions on this approach.
  
   1) Scalability: Considering I need to scale this to many dozens more
   languages, perhaps hundreds more, is there a better way so that I don't
  end
   up needing dozens or hundreds of cores? My initial plan was that many
   languages that didn't have special support within Solr would simply get
   lumped into a single default core that has some default analyzers
 that
   are applicable to the majority of languages.
  
   1b) Related to this: is there a practical limit to the number of cores
  that
   can be run on one instance of Lucene?
  
   2) Auto Suggest: In phase two I intend to add auto-suggestions as a
 user
   types a query. In reviewing how this is implemented and how the
  suggestion
   dictionary is built I have concerns. If I have more than one language
 in
  a
   single core (and I keep the same field name for suggestions on all
   languages within a core) then it seems that I could get suggestions
 from
   another language returned with a suggest query. Is there a way to
 build a
   separate dictionary for 

Re: DIH dataimport.properties Zulu time

2014-03-27 Thread Kiran J
I figured it out. I use SQL Server, so this is my solution:

<propertyWriter dateFormat="yyyy-MM-dd'T'HH:mm:ssXXX"
    type="SimplePropertiesWriter" />

In TSQL, this can be converted to a UTC date time using :

CONVERT(datetimeoffset, '${dih.last_index_time}', 127)

Refs:

http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
http://msdn.microsoft.com/en-us/library/ms187928.aspx




On Thu, Mar 27, 2014 at 2:17 PM, Kiran J kiranjuni...@gmail.com wrote:

 Thank you for the response. This works if I invoke start.jar with java. In
 my usecase however, I need to invoke start.jar directly (consoleless
 service so that the user cannot close it accidentally). It doesnt pickup
 user.timezone property when done this way. Is it possible to do this using
 the tag below somehow. I tried setting locale=UTC and it didnt work.

 <propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss"
     type="SimplePropertiesWriter" directory="data" filename="my_dih.properties"
     locale="en_US" />



 On Tue, Mar 25, 2014 at 7:45 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 26 March 2014 02:44, Kiran J kiranjuni...@gmail.com wrote:
 
  Hi
 
  Is it possible to set up the data import handler so that it keeps track
 of
  the last imported time in Zulu time and not local time ?
 [...]

 Start your JVM with the desired timezone, e.g.,
 java -Duser.timezone=UTC -jar start.jar

 Regards,
 Gora





Re: String Cast Error

2014-03-27 Thread Chris Hostetter

: I have a search that sorts on a boolean field. This search is pulling 
: the following error: java.lang.String cannot be cast to 
: org.apache.lucene.util.BytesRef.

This is almost certainly another manifestation of SOLR-5920...

https://issues.apache.org/jira/browse/SOLR-5920



-Hoss
http://www.lucidworks.com/


Re: New to Solr can someone help me to know if Solr fits my use case

2014-03-27 Thread Alexandre Rafalovitch
This feels somewhat backwards. It's very hard to extract line-number
information out of MS Word and next to impossible from PDF. So the question
is not whether Solr is a good fit here; maybe your whole architecture has a
major issue. Can you do this/what you want by hand at least once? Down to
the precision you want?

If you can, then yes, you probably can automate the searching with
Solr, though you will still have serious issues (sentences crossing
line boundaries, etc.). But I suspect your whole approach will change
once you try to do this manually.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Mar 27, 2014 at 11:46 PM, Saurabh Agarwal
sagarwal1...@gmail.com wrote:
 Can anyone help me, please?

 Hi All,

 I am new to Solr, and from initial reading I am quite convinced Solr
 will be of great help. Can anyone help in making that decision?

 Use case:
 1. I will have PDF/Word docs generated daily/weekly (a lot of them)
 which kind of get overwritten frequently.
 2. I have a dictionary kind of thing (a list of which words/small
 sentences should be part of the above docs, words which cannot be, and
 alternatives for some).
 3. Now I want Solr to search the docs produced in step 1 for the
 words/small sentences from step 2 and give me the doc name/line number
 in which they exist.

 Will Solr be a good help to me? If anybody can give some examples,
 that would be great.

 Appreciate your help and patience.

 Thanks
 Saurabh


Re: document level security filter solution for Solr

2014-03-27 Thread Philip Durbin
Yonik, your reply was incredibly helpful. Thank you very much!

The join approach to document security you explained is somewhat
similar to what I called Option 2 (ACL PostFilter) since permissions
are stored in each document, but it's much simpler in that I'm not
required to write, compile, and distribute my own QParserPlugin. In
addition, by using dynamic fields (for now anyway), I don't even have
to distribute a new schema.xml. It justs works! (Once you re-index.)
At least it seems to work. I'm declaring this a new option, Option 5.
:)

The crux of the solution is creating a new document type to join on,
a new group type. For me, this new group type sits along side some
other document types I had defined already (dataverses, datasets,
and files in my case). Each of my older types, my existing documents,
now get tagged with the id of one or more of the new "group"
documents. It's like saying, "This document can be seen by these
groups I'm tagging it with."

To make this more concrete, I thought I'd post some curl output
showing how I'm now tagging my existing dataverse documents with new
permissions such as group_2 and group_public which represent
actual groups as well as what I'll call User Private Groups (UPG*)
which is one group per user with the user's name. (Unlike your example
where user joe is part of a group called joe I'm putting user1
in the name of the group such as groups_user1. But that's still the
joe group that only joe is a part of.)

At runtime, I'll check to see which groups a user is part of and then
run one or more joins (separated by OR's) for each group. Anonymous
users only get to see documents tagged with the group called public,
as you had illustrated. If you're part of a lot of groups, I guess
there will be a lot of OR's in the filter query.

Output from curl is below. Comments are welcome! (Any objections to
this approach?) Thanks again!

Phil

Exisiting dataverse documents, now tagged with various groups under
the perms_ss field, and two example joins, separated by an OR:

[pdurbin@localhost ~]$ curl -s --globoff
'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&sort=id+desc&q=*&fq=({!join+from=groups_s+to=perms_ss}id:group_public+OR+{!join+from=groups_s+to=perms_ss}id:group_user1)'
| jq '.response.docs[] | {id,perms_ss,dvtype}' | head -17
{
  "dvtype": "dataverses",
  "perms_ss": [
    "group_user1",
    "group_user5",
    "group_2"
  ],
  "id": "dataverse_9"
}
{
  "dvtype": "dataverses",
  "perms_ss": [
    "group_public",
    "group_2"
  ],
  "id": "dataverse_7"
}

New groups documents that are used in the join:

[pdurbin@localhost ~]$ curl -s
'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&sort=id+asc&q=id:group**'
| jq '.response.docs[] | {id,groups_s,dvtype}' | grep group_public -B7
-A6
{
  "dvtype": "groups",
  "groups_s": "group_4",
  "id": "group_4"
}
{
  "dvtype": "groups",
  "groups_s": "group_public",
  "id": "group_public"
}
{
  "dvtype": "groups",
  "groups_s": "group_user1",
  "id": "group_user1"
}

* User Private Groups (UPG) is what Red Hat calls them: "Red Hat
Enterprise Linux uses a user private group (UPG) scheme, which makes
UNIX groups easier to manage. A user private group is created whenever
a new user is added to the system. It has the same name as the user
for which it was created and that user is the only member of the user
private group." --
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/ch-Managing_Users_and_Groups.html#s2-users-groups-private-groups

On Tue, Mar 25, 2014 at 3:40 PM, Yonik Seeley yo...@heliosearch.com wrote:
 Depending on requirements, another option for simple security is to
 store the security info in the index and utilize a join.  This really
 only works when you have a single shard since joins aren't
 distributed.

 # the documents, with permissions
 id:doc1, perms:"public", ...
 id:doc2, perms:"group1 group2 joe", ...
 id:doc3, perms:"group3", ...

 # documents modeling users and what groups they belong to
 id:joe, groups:"joe public group3"
 id:mark, groups:"mark public group1 group2"

 And then if joe does a query, you add a filter query like the following:
 fq={!join from=groups to=perms v=id:joe}

 The user documents can either be in the same collection, or in a
 separate core as long as it's co-located in the same JVM (core
 container), and you can do a cross-core join.
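
 For example, if the user documents lived in a separate core named
 "users", the cross-core form would be a sketch like:

 fq={!join fromIndex=users from=groups to=perms v=id:joe}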

 -Yonik
 http://heliosearch.org - solve Solr GC pauses with off-heap filters
 and fieldcache


 On Tue, Mar 25, 2014 at 3:06 PM, Philip Durbin
 philip_dur...@harvard.edu wrote:
 I'm new to Solr and I'm looking for a document level security filter
 solution. Anonymous users searching my application should be able to
 find public data. Logged in users should be able to find public data
 and private data they have access to.

 Earlier today I wrote about shards as a possible solution. I got a
 great reply from Shalin Shekhar Mangar of LucidWorks explaining how to
 achieve something technical but I'd like to back up a minute and
 consider 

Re: New to Solr can someone help me to know if Solr fits my use case

2014-03-27 Thread Saurabh Agarwal
Thanks a lot, Alex, for your reply; appreciate the same.

So if I leave out the line-number part:
1. I guess putting PDF/Word into Solr for search can be done; these
documents will go into Solr.
2. For searching, is there any automatic way to supply an Excel sheet or a
large set of search keywords?
I.e., I have thousands of words that I want to search for in the docs; can
I do it collectively, or must I send search queries one by one?

Thanks
Saurabh



On Fri, Mar 28, 2014 at 6:48 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:
 This feels somewhat backwards. It's very hard to extract Line-Number
 information out of MSWord and next to impossible from PDF. So, it's
 not whether the Solr is a good fit or not here is that maybe your
 whole architecture has a major issue. Can you do this/what you want by
 hand at least once? Down to the precision you want?

 If you can, then yes you probably can automate the searching with
 Solr, though you will still have serious issues (sentence crossing
 line-boundaries, etc). But I suspect your whole approach will change
 once you try to do this manually.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr 
 proficiency


 On Thu, Mar 27, 2014 at 11:46 PM, Saurabh Agarwal
 sagarwal1...@gmail.com wrote:
 Can anyone help me please.

 Hi All,

 I am  new to Solr and from initial reading i am quite convinced Solr
 will be of great help. Can anyone help in making that decision.

 Usecase:
 1.  I will have PDF,Word docs generated daily/weekly ( lot of them )
 which kinds of get overwritten frequently.
 2. I have a dictionary kind of thing ( having a list of which
 words/small sentences should be part of above docs , words which
 cannot be and alternatives for some  ).
 3. Now i want Solr to search my Docs produced in step 1 to be searched
 for words/small sentences from step 2 and give me my Doc Name/line no
 in which they exist.

 Will Solr be a good help to me, If anybody can help giving some
 examples that will be great.

 Appreciate your help and patience.

 Thanks
 Saurabh


Re: New to Solr can someone help me to know if Solr fits my use case

2014-03-27 Thread Alexandre Rafalovitch
1. You don't actually put PDF/Word into Solr. Instead, each file is run
through a content- and metadata-extraction process, and that output is
indexed. This is important because a computer does not understand what you
are looking for when you open a PDF. It only understands whatever text it
is possible to extract; in the case of PDF that is often not much at all,
unless the file was generated with an accessibility layer in place. You can
experiment with what you can extract by downloading a standalone
Apache Tika install, which has a command-line version, or by using Solr's
extractOnly flag. Solr internally uses Tika, so the results should
be the same.
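
For example, something like this (handler path and file name assume the
stock example config) returns the extracted content without indexing it:

curl "http://localhost:8983/solr/update/extract?extractOnly=true&wt=json" -F "myfile=@sample.pdf"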

2) When you do a search, you can do field:(Keyword1 Keyword2 Keyword3
Keyword4) and get back any document that matches at least one of
those. Not sure about 1000 of them in one go, but certainly a large
number.

On the other hand, if you have same keywords all the time and you are
trying to match documents against them, you might be more interested
in Elastic Search's percolator
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html
) or in Luwak (https://github.com/flaxsearch/luwak).

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Mar 28, 2014 at 10:05 AM, Saurabh Agarwal
sagarwal1...@gmail.com wrote:
 Thanks a lot Alex for your reply, Appreciate the same.

 So if i leave the line no part.
 1. I guess putting pdf/word  in solr for search can be done, These
 documents will go go in solr.
 2. For search any automatic way to give a excel sheet or large search
 keywords to search for .
 ie i have 1000's of words that i want to search in doc can i do it
 collectively or send search queries one by one.

 Thanks
 Saurabh



 On Fri, Mar 28, 2014 at 6:48 AM, Alexandre Rafalovitch
 arafa...@gmail.com wrote:
 This feels somewhat backwards. It's very hard to extract Line-Number
 information out of MSWord and next to impossible from PDF. So, it's
 not whether the Solr is a good fit or not here is that maybe your
 whole architecture has a major issue. Can you do this/what you want by
 hand at least once? Down to the precision you want?

 If you can, then yes you probably can automate the searching with
 Solr, though you will still have serious issues (sentence crossing
 line-boundaries, etc). But I suspect your whole approach will change
 once you try to do this manually.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr 
 proficiency


 On Thu, Mar 27, 2014 at 11:46 PM, Saurabh Agarwal
 sagarwal1...@gmail.com wrote:
 Can anyone help me please.

 Hi All,

 I am  new to Solr and from initial reading i am quite convinced Solr
 will be of great help. Can anyone help in making that decision.

 Usecase:
 1.  I will have PDF,Word docs generated daily/weekly ( lot of them )
 which kinds of get overwritten frequently.
 2. I have a dictionary kind of thing ( having a list of which
 words/small sentences should be part of above docs , words which
 cannot be and alternatives for some  ).
 3. Now i want Solr to search my Docs produced in step 1 to be searched
 for words/small sentences from step 2 and give me my Doc Name/line no
 in which they exist.

 Will Solr be a good help to me, If anybody can help giving some
 examples that will be great.

 Appreciate your help and patience.

 Thanks
 Saurabh


[RE-BALACE of Collection] Re-balancing of collection after adding nodes to clustered node

2014-03-27 Thread Debasis Jana
Hi,

I found the email addresses in a SlideShare deck @
http://www.slideshare.net/thelabdude/tjp-solr-webinar; it's very useful. We
are developing Solr search using Cloudera CDH4 and the embedded Solr
4.4.0-search-1.1.0.

We created a collection when the cluster had 2 slave nodes. Then two extra
nodes were added. The Solr service runs on those extra nodes, but the
ZooKeeper service does not; ZooKeeper runs only on the original nodes.
When the cluster had 2 nodes, the indexing tool ran successfully. But after
the two nodes were added, the indexing tool now throws the error "no
active slice servicing hashcode".
The error suggests that re-balancing of the collection did not happen after
the extra Solr nodes were added, so when the indexing tool runs it tries to
shard/distribute the index data to the extra node(s), which are not aware
of that collection, and it fails. The number of shards is 2, and the
composite routing policy is used.

My question is: is it possible to re-balance the collection after adding
new Solr nodes?
In your slide share it's written that re-balancing is available in
SOLR-5025; what is SOLR-5025?

Thanks & Regards
Debasis


Re: Question on highlighting edgegrams

2014-03-27 Thread Software Dev
Certainly I am not the only user experiencing this?

On Wed, Mar 26, 2014 at 1:11 PM, Software Dev static.void@gmail.com wrote:
 Is this a known bug?

 On Tue, Mar 25, 2014 at 1:12 PM, Software Dev static.void@gmail.com 
 wrote:
 Same problem here:
 http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html

 On Tue, Mar 25, 2014 at 9:39 AM, Software Dev static.void@gmail.com 
 wrote:
 Bump

 On Mon, Mar 24, 2014 at 3:00 PM, Software Dev static.void@gmail.com 
 wrote:
 In 3.5.0 we have the following.

 <fieldType name="autocomplete" class="solr.TextField"
     positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
         maxGramSize="30"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 If we searched for "c" with highlighting enabled we would get back
 results such as:

 <em>c</em>dat
 <em>c</em>rocdile
 <em>c</em>ool beans

 But in the latest Solr (4.7) we get the full words highlighted back.
 Did something change from these versions with regards to highlighting?

 Thanks


Re: Question on highlighting edgegrams

2014-03-27 Thread Shalin Shekhar Mangar
Yes, there are known bugs with EdgeNGram filters. I think they are fixed in 4.4

See https://issues.apache.org/jira/browse/LUCENE-3907

On Fri, Mar 28, 2014 at 10:17 AM, Software Dev
static.void@gmail.com wrote:
 Certainly I am not the only user experiencing this?

 On Wed, Mar 26, 2014 at 1:11 PM, Software Dev static.void@gmail.com 
 wrote:
 Is this a known bug?

 On Tue, Mar 25, 2014 at 1:12 PM, Software Dev static.void@gmail.com 
 wrote:
 Same problem here:
 http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html

 On Tue, Mar 25, 2014 at 9:39 AM, Software Dev static.void@gmail.com 
 wrote:
 Bump

 On Mon, Mar 24, 2014 at 3:00 PM, Software Dev static.void@gmail.com 
 wrote:
 In 3.5.0 we have the following.

 <fieldType name="autocomplete" class="solr.TextField"
     positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
         maxGramSize="30"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 If we searched for "c" with highlighting enabled we would get back
 results such as:

 <em>c</em>dat
 <em>c</em>rocdile
 <em>c</em>ool beans

 But in the latest Solr (4.7) we get the full words highlighted back.
 Did something change from these versions with regards to highlighting?

 Thanks



-- 
Regards,
Shalin Shekhar Mangar.


Product index schema for solr

2014-03-27 Thread Ajay Patel




 Original Message 
Subject:Product index schema for solr
Date:   Fri, 28 Mar 2014 10:46:20 +0530
From:   Ajay Patel apa...@officebeacon.com
To: solr-user-ow...@lucene.apache.org



Hi Solr users & developers.

I am new to the world of the Solr search engine. I have a complex product
database structure in Postgres.

A product has many product_quantity_price attributes, stored as ranges.

E.g., the price ranges for product ID 1 are stored in the
product_quantity_price table in the following manner:

min_qty  max_qty  price_per_qty
1        50       4
51       100      3.5
101      150      3
151      200      2.5

The ranges are not fixed for any product; they can be different for
different products.

Now my question is: how can I save this data in Solr in an optimized way so
that I can create facets on quantity and price?

Thanks in advance.
Ajay Patel.